Getting Started

library(panelPomp)
#> Loading required package: pomp

Introduction

Panel data arise when time series are measured on a collection of units. When the time series for each unit is modeled as a partially observed Markov process (POMP) the collection of these models is a PanelPOMP. The panelPomp package provides facilities for inference on panel data using PanelPOMP models. Monte Carlo methods used for POMP models require adaptation for PanelPOMP models due to the higher dimensionality of panel data. This package builds on the functionality and tools of the popular pomp R package, providing a computationally efficient framework that can be used to simulate from, fit, and diagnose PanelPOMP models. As such, a basic working knowledge of the pomp package is recommended. Here we cover some of the necessary basics. See Getting Started With pomp for an introductory Vignette to the pomp package.

Mathematical Notation

Before discussing features of the panelPomp package, we describe a mathematical notation that is helpful in communicating details about panelPomp code and models. The general scope of the panelPomp package requires notation concerning random variables and their densities in arbitrary spaces. The notation below allows us to talk about these things using the language of mathematics, enabling precise description of models and algorithms.

Units of the panel can be identified with numeric labels \(\{1,2,\dots,U\}\), which we also write as \(1:U\). Let \(N_u\) be the number of measurements collected on unit \(u\), allowing for the possibility that a different number of observations are collected for each unit. The data from the entire panel are written as \(y^*_{1:U,1:N_u} = \{y^*_{1,1}, y^*_{2,1},\dots, y^*_{U, 1}, y^*_{1,2}\ldots, y^*_{u,N_u}\}\) where the \(n^{th}\) observation from unit \(u\) (denoted as \(y^*_{u,n}\)) is collected at time \(t_{u,n}\) with \(t_{u,1}<t_{u,2}<\dots<t_{u,N_u}\). Observation times \(\{t_{u, n}\}\) are often equally spaced, but this general framework and notation permit unequally spaced observations times both across and within units. The data from unit \(u\) are modeled as a realization of an observable stochastic process \(Y_{u,1:N_u}\) which is dependent on a latent Markov process \(\{X_{u}(t),t_{u,0}\le t\le t_{u,N_u}\}\) defined subsequent to an initial time \(t_{u,0}\le t_{u,1}\). Requiring that \(\{X_u(t)\}\) and \(\{Y_{u,i},i\neq n\}\) are independent of \(Y_{u,n}\) given \(X_u(t_{u,n})\), for each \(n\in 1: N_{u}\), completes the partially observed Markov process (POMP) model structure for unit \(u\). For a PanelPOMP we require additionally that all units are modeled as independent.

While the latent process may exist between measurement times, its value at measurement times is of particular interest. We write \(X_{u,n}=X_u(t_{u,n})\) to denote the latent process at the observation times. We suppose that \(X_{u,n}\) and \(Y_{u,n}\) take values in arbitrary spaces \(\mathbb{X}_{u}\) and \(\mathbb{Y}_{u}\) respectively. Using the independence of units, conditional independence of the observable random variables, and the Markov property of the latent states, the joint distribution of the entire collection of latent variables \(\mathbf{X} = \{X_{u,0:N_u}\}_{u = 1}^U\) and observable variables \(\mathbf{Y} = \{Y_{u,1:N_u}\}_{u = 1}^U\) can be written as: \[ f_{\mathbf{X}\mathbf{Y}}(\mathbf{x}, \mathbf{y}) = \prod_{u = 1}^U f_{X_{u, 0}}(x_{u,0}; \theta)\prod_{n = 1}^{N_u} f_{Y_{u, n}|X_{u, n}}(y_{u, n}|x_{u, n}; \theta)f_{X_{u, n}|X_{u, n-1}}(x_{u, n}|x_{u, n-1}; \theta), \] where \(\theta\in\mathbb{R}^{D}\) is a possibly unknown parameter vector. This representation is useful as it demonstrates how any PanelPOMP model can be fully described using three primary components: the transition densities \(f_{X_{u,n}|X_{u,n-1}}(x_{u,n}| x_{u,n-1};\theta)\), measurement densities \(f_{Y_{u,n}|X_{u,n}}(y_{u,n}| x_{u,n}:\theta)\), and initialization densities \(f_{X_{u, 0}}(x_{u,0}; \theta)\). Each class of densities are permitted to depend arbitrarily on \(u\) and \(n\), allowing non-stationary models and the inclusion of covariate time series. In addition to continuous-time dynamics, the framework includes discrete-time dynamic models by specifying \(X_{u,0:N_u}\) directly without ever defining \(\{X_u(t),t_{u,0}\le t\le t_{u,N_u}\}\).

Likelihood Function

The marginal density of \(Y_{u,1:N_u}\) at \(y_{u,1:N_u}\) is \(f_{Y_{u,1:N_u}}(y_{u,1:N_u};\theta)\) and the likelihood function for unit \(u\) is \(L_{u}(\theta) = f_{Y_{u,1:N_u}}(y^*_{u,1:N_u};\theta)\). The likelihood for the entire panel is \(L(\theta) = \prod_{u=1}^{U} L_{u}(\theta)\), and any solution \(\hat\theta=\arg\max L(\theta)\) is a maximum likelihood estimate (MLE). The log likelihood is \(\ell(\theta)=\log L(\theta)\). We also permit the possibility that some parameters may affect only a subset of units, so that the parameter vector can be written as \(\theta=(\phi,\psi_1,\dots,\psi_U)\), where the important densities described above can be written as \[ f_{X_{u,n}\vert X_{u,n-1}}(x_{u,n}| x_{u,n-1} ; \theta)=f_{X_{u,n}|X_{u,n-1}}(x_{u,n}| x_{u,n-1} ; \phi,\psi_u) \tag{1}\]

\[ f_{Y_{u,n}|X_{u,n}}(y_{u,n}| x_{u,n} ; \theta) = f_{Y_{u,n}|X_{u,n}}(y_{u,n}| x_{u,n} ; \phi,\psi_u) \tag{2}\]

\[ f_{X_{u,0}}(x_{u,0} ; \theta) = f_{X_{u,0}}(x_{u,0} ; \phi,\psi_u) \tag{3}\]

Then, \(\psi_{u}\) is a vector of unit-specific parameters for unit \(u\), and \(\phi\) is a shared parameter vector. We suppose \(\phi\in\mathbb{R}^{A}\) and \(\psi\in\mathbb{R}^{B}\), so the dimension of the parameter vector \(\theta\) is \(D=A+B U\). In practice, the densities in Equation 1Equation 3 serve two primary roles in PanelPOMP models: evaluation and simulation. The way these fundamental goals are represented in the panelPomp package is described in Table 1.

Table 1: Basic mathematical functions of all PanelPOMP models and their representation in the panelPomp
Method Mathematical terminology
rprocess Simulate from Equation 1: \(f_{X_{u,n}\vert X_{u,n-1}}(x_{u,n}| x_{u,n-1} ; \theta)\)
dprocess Evaluate Equation 1: \(f_{X_{u,n}| X_{u,n-1}}(x_{u,n}| x_{u,n-1} ; \theta)\)
rmeasure Simulate from Equation 2: \(f_{Y_{u,n}|X_{u,n}}(y_{u,n}| x_{u,n} ; \phi,\psi_u)\)
dmeasure Evaluate Equation 2: \(f_{Y_{u,n}|X_{u,n}}(y_{u,n}| x_{u,n} ; \phi,\psi_u)\)
rinit Simulate from Equation 3: \(f_{X_{u,0}}(x_{u,0} ; \phi,\psi_u)\)
dinit Evaluate Equation 3: \(f_{X_{u,0}}(x_{u,0} ; \phi,\psi_u)\)

Each independent unit in a panel is POMP model, represented using the pomp package. Each pomp object contains the same components described in Table 1, with the exception that parameters cannot be shared across individual units in the panel. As such, the functions listed in Table 1 are available in the panelPomp package through the pomp package. Additional functions of interest that are not listed in Table 1 include: rprior() and dprior(), enabling the use of Bayesian analysis if desired; emeasure() and vmeasure(), which describe the conditional expectation and covariance of the measurement model for algorithms that rely on these values (such as the Kalman filter).

panelPomp objects

The panelPomp package is written in a functional object oriented programming framework. Key to the most important features of the package is the panelPomp class, which is implemented using the S4 system. This S4 class contains three slots:

  • unit_objects: A list of pomp objects.
  • shared: a named numeric vector containing the names and values of parameters that are shared for each unit of the panel.
  • specific: a numeric matrix with row and column names; row names correspond to the parameter names, and column names to the unit names of the panel.

Notably, the functions listed in Table 1 are not part of a panelPomp object directly, rather they are part of the individual unit objects saved in the slot unit_objects. These objects can be extracted using the extracter function: unit_objects(<object>).

Constructing panelPomp objects

The fundamental mathematical functions that define a PanelPOMP model are made available via the pomp objects in the unit_objects slot. As such, constructing a panelPomp object is simple if you are already familiar with constructing pomp objects, or if you already have access to pomp objects. Here we describe how to create a panelPomp object if the unit-specific pomp objects are already created. In the next sub-section, we give a brief demonstration of how to construct a panelPomp object from scratch, including each individual pomp object. Here, we construct a panelPomp object representing a panel of stochastic Gompertz population models with log-normal measurement error. The latent state process is defined as: \[ X_{n + 1} = K^{1-S}X_n^S\epsilon_n, \] where \(S = \exp^{-r}\) and the \(\epsilon_n\) are i.i.d. lognormal random variables with variance \(\sigma^2\). The measurement model for the observed variables \(Y_n\) are distributed as: \[ Y_n \sim \text{Lognormal}\big(\log X_n, \tau \big). \] The parameters of this model are:

  • Per-capita growth rate \(r\).
  • The carrying capacity \(K\).
  • The process noise standard deviation \(\sigma\).
  • The measurement error standard deviation \(\tau\).
  • The initial condition \(X_0\).

This particular model class has a constructor function gompertz() from the pomp package. Here, we create 5 unique instances of this model, and use these instances to create a single panelPomp object:

mod1 <- pomp::gompertz()  # Using default values 
mod2 <- pomp::gompertz(K = 2, r = 0.01)  # Overwriting some of the defaults 
mod3 <- pomp::gompertz(K = 1.5, sigma = 0.15, X_0 = 5)
mod4 <- pomp::gompertz(K = 1.5, r = 0.05, X_0 = 5)
mod5 <- pomp::gompertz(K = 5, sigma = 0.08)

panelMod1 <- panelPomp(
  object = list(mod1, mod2, mod3, mod4, mod5)
)

One important thing to note above the above construction is that each individual model already has parameter values present. When this is the case, the panelPomp() constructor sets all parameters to be unit specific:

print(specific(panelMod1))
#>          unit
#> parameter unit1 unit2 unit3 unit4 unit5
#>     K       1.0  2.00  1.50  1.50  5.00
#>     r       0.1  0.01  0.10  0.05  0.10
#>     sigma   0.1  0.10  0.15  0.10  0.08
#>     tau     0.1  0.10  0.10  0.10  0.10
#>     X_0     1.0  1.00  5.00  5.00  1.00
print(shared(panelMod1))
#> numeric(0)

In the panelMod1 object, all five parameters are listed as unit specific. Notably, because \(\tau\) was not modified in any of the unit specific objects, it has the same value across all five units. In such cases, it might make sense to list parameters that have the same value across all units as shared parameters, which can be done in the model constructor:

panelMod2 <- panelPomp(
  object = list(mod1, mod2, mod3, mod4, mod5),
  shared = c("tau" = 0.1)
)

specific(panelMod2)
#>          unit
#> parameter unit1 unit2 unit3 unit4 unit5
#>     K       1.0  2.00  1.50  1.50  5.00
#>     r       0.1  0.01  0.10  0.05  0.10
#>     sigma   0.1  0.10  0.15  0.10  0.08
#>     X_0     1.0  1.00  5.00  5.00  1.00
shared(panelMod2)
#> tau 
#> 0.1

In this case we did not need to explicitly specify unit-specific parameters; if parameter values are present in the unit pomp objects that comprise the panel, parameters are assumed to be unit-specific unless otherwise specified. However, it is possible to explicitly provide a matrix of unit specific parameters in the constructor, if desired. This is especially important if the individual pomp objects that make up the panel have missing parameter values.

Unit-specific parameters can be expressed in two ways: as a matrix with rows corresponding to parameter values and columns the corresponding unit (as seen above), or as a named numeric vector that follows the convention <param>[<unit name>]:

specific(panelMod2, format = 'vector')
#>     K[unit1]     r[unit1] sigma[unit1]   X_0[unit1]     K[unit2]     r[unit2] 
#>         1.00         0.10         0.10         1.00         2.00         0.01 
#> sigma[unit2]   X_0[unit2]     K[unit3]     r[unit3] sigma[unit3]   X_0[unit3] 
#>         0.10         1.00         1.50         0.10         0.15         5.00 
#>     K[unit4]     r[unit4] sigma[unit4]   X_0[unit4]     K[unit5]     r[unit5] 
#>         1.50         0.05         0.10         5.00         5.00         0.10 
#> sigma[unit5]   X_0[unit5] 
#>         0.08         1.00

It is often convenient to modify which parameters are shared and which are unit-specific on existing panelPomp objects rather than creating new objects from scratch. This can be done with the shared<- and specific<- setter functions:

shared(panelMod2) <- c('r' = 0.05, 'sigma' = 0.1)
specific(panelMod2) <- c('tau[unit1]' = 0.11, 'tau[unit4]' = 0.09)

print(shared(panelMod2))
#>     r sigma 
#>  0.05  0.10
print(specific(panelMod2))
#>      unit
#> param unit1 unit2 unit3 unit4 unit5
#>   K    1.00   2.0   1.5  1.50   5.0
#>   tau  0.11   0.1   0.1  0.09   0.1
#>   X_0  1.00   1.0   5.0  5.00   1.0

Notice above that if a shared parameter (tau) is changed to a unit-specific parameter and not all values of the unit-specific parameter are explicitly provided, the parameters that are not specified default to the original shared value. The unit-specific setter function also works in matrix format:

specific(panelMod2) <- matrix(
  data = rbind(c(1.24, 1.78), 
               c(   2,    3)),
  nrow = 2, 
  dimnames = list(
    param = c("K", "X_0"), 
    unit = c('unit2', 'unit4')
  )
)

specific(panelMod2)
#>      unit
#> param unit1 unit2 unit3 unit4 unit5
#>   K    1.00  1.24   1.5  1.78   5.0
#>   X_0  1.00  2.00   5.0  3.00   1.0
#>   tau  0.11  0.10   0.1  0.09   0.1

Neither the shared<- nor the specific<- setter functions allow a user to add new parameters (or unit names) that are not already part of the model:

shared(panelMod2) <- c("foo" = 1)
#> Error: in 'shared<-': 'value' contains parameters not found in 'object'.
specific(panelMod2) <- c("tau[unit6]" = 1)
#> Error: in 'specific<-': 'value' contains unit names not in 'object'.