Historically, the gravity model has been used to model connectivity among locations based on the distances among \(i\) origins and \(j\) destinations (\(d_{ij}\)) and population sizes of each location (\(N_i\) and \(N_j\)). A major advantage of the gravity model is that distance and population size are simple covariates that can be obtained in almost any context, allowing a researcher to infer connectivity with little information.
When fitting the gravity model to data, we also need a matrix of observed trip volume (\(M\)). Trip volume is an intentionally vague term here because a mobility data matrix can be populated with a variety of measurements that capture relative amounts of travel among locations. For example, if we are using Call Data Records (CDR) which are measures of mobile phone usage supplied by mobile phone companies, the unit of travel volume is likely to be the total number of person trips per unit time. Or if we are using a travel survey that researchers have collated from questionares given to residents of an area, the measurement of travel volume might be the total number of individuals that reported travelling from \(i\) to \(j\) in the past month.
The actual unit of travel volume is less important when modeling connectivity because the gravity model formula estimates relative travel volume that is normalized across all \(j\) destinations of an origin, so that the rows of \(M\) sum to 1. However, it is important that all \(ij\) cells in \(M\) have the same unit of travel volume per unit time because the model estimates parameters based on relative differences in trip volume.
The rest of this vignette shows an example of how to estimate and simulate connectivity (\(\hat{\pi}_{ij}\)) for all possible \(i \rightarrow j\) travel routes found in supplied data matrices (\(M\), \(D\), and \(N\)).
Before we can fit a gravity model, we must build data matrices from the longform travel data. The utility functions get_mob_matrix()
, get_distance_matrix()
, and get_pop_vec()
can be used to generate data matrices representing travel volume among locations (\(M\)), along with distances (\(D\)) and population sizes (\(N\)). Here we use pre-built matrices from simulated data in the mobiliy_matrices()
data object.
M <- mobility_matrices$M D <- mobility_matrices$D N <- mobility_matrices$N
By design, we have added additional stochasticity around trip counts in the movement matrix (\(M\)) and randomly sampled 80% of observed routes to simulate missing observations, which are shown as grey cells in the plot below.
Gravity model parameters can be estimated using the fit_gravity()
function. This function uses the distance among locations (\(D\)) and population sizes (\(N\)) as covariates in the gravity equation, which is fitted to the movement matrix (\(M\)) with a Poisson likelihood link function: \[
\begin{aligned}
m_{ij} &\sim \text{Poisson}\big(\pi_{ij}N_{i}\big)\\
\pi_{ij} &= c_{ij}/\sum_{\forall j}c_{ij} \\
c_{ij} &= \theta \Bigg(
\frac{
N_{i}^{\omega_1} N_{j}^{\omega_2}
}{
d_{ij}
} \Bigg)
\end{aligned}
\] Gravity model parameters are fit through normalized connectivity values (\(\pi_{ij}\)) that scale the Poission likelihood so that it is proportional to the observed trip counts (\(m_{ij}\)). The exponential parameters \(\omega_1\) and \(\omega_2\) are weights that modify the contribution of origin and destination population sizes, and \(\theta\) is a proportionality constant. The denominator of the gravity model, \(d_{ij}^\gamma\), serves as the dispersal kernel function. The fitting function fit_gravity()
estimates the posterior distribution of model parameters using Bayesian MCMC inference.
mod <- fit_gravity(M=M, D=D, N=N, n_chain=2, n_burn=1000, n_samp=1000, n_thin=2, DIC=TRUE) #> ::Fitting gravity model for 10 origins and 10 destinations:: #> Using uniformative priors #> Compiling model graph #> Resolving undeclared variables #> Allocating nodes #> Graph information: #> Observed stochastic nodes: 74 #> Unobserved stochastic nodes: 30 #> Total graph size: 1196 #> #> Initializing model #> #> NOTE: Stopping adaptation
The above code fits the gravity model using 4 sampling chains, discards the first 1000 samples as ‘burin in’, and then takes 1000 samples thinning by 2 to give 500 total samples per chain. The DIC=TRUE
argument tells the function to calculate the Deviance Information Criterion (DIC).
Running a gravity model on a large number of locations (e.g. \(\gt 100\)) or drawing a large number of samples from posterior distributions (e.g. 10000) may take several minutes. If computation is cumbersome, then the model can run sampling chains in parallel instead of sequentially by specifying parallel = TRUE
.
The fitting function fit_gravity()
returns results as a coda::mcmc.list()
object shown below. The coda::mcmc.list()
is a common object class that holds the posterior samples for each parameter and each sampling chain.
str(mod) #> List of 2 #> $ : 'mcmc' num [1:500, 1:7] 35316 35319 35319 35316 35319 ... #> ..- attr(*, "dimnames")=List of 2 #> .. ..$ : NULL #> .. ..$ : chr [1:7] "deviance" "gamma" "omega_1" "omega_2" ... #> ..- attr(*, "mcpar")= num [1:3] 1 500 1 #> $ : 'mcmc' num [1:500, 1:7] 35318 35320 35316 35319 35318 ... #> ..- attr(*, "dimnames")=List of 2 #> .. ..$ : NULL #> .. ..$ : chr [1:7] "deviance" "gamma" "omega_1" "omega_2" ... #> ..- attr(*, "mcpar")= num [1:3] 1 500 1 #> - attr(*, "class")= chr "mcmc.list"
To calculate summary statistics of estimated parameter values, you can supply a fitted model object to the summarize_mobility()
function. This function is a wrapper for the MCMCvis::MCMCsummary()
function that calculates summary statistics for each parameter across each chain along with convergance diagnosics like the Gelman-Rubin convergence diagnostic and (\(\hat{R}\)) and samples auto-correlation foreach parameter.
mod <- summarize_mobility(mod) mod #> Mean SD HPD2.5 HPD97.5 Rhat SSeff AC2 AC5 #> gamma 1.724926e+00 0.01304628 1.698391e+00 1.749850e+00 1.00 921 0.07 0.01 #> omega_1 1.450205e+00 0.77755603 4.945579e-01 3.045838e+00 1.00 368 0.42 0.11 #> omega_2 3.157012e-02 0.01556911 3.486917e-03 5.960560e-02 1.00 923 0.07 0.01 #> theta 1.039387e+00 0.71954153 8.013179e-02 2.506090e+00 1.04 577 0.27 0.00 #> DIC 3.532856e+04 2.41229794 3.532560e+04 3.533320e+04 1.00 757 0.14 0.01 #> deviance 3.531930e+04 2.41229794 3.531634e+04 3.532394e+04 1.00 757 0.14 0.01 #> pD 4.632423e+00 NA NA NA NA NA NA NA #> AC10 #> gamma -0.02 #> omega_1 0.03 #> omega_2 0.05 #> theta 0.05 #> DIC -0.04 #> deviance -0.04 #> pD NA
The check_mobility()
function provides goodness of fit metrics and summary plots.
check_mobility(M, D, N, mod)
#> $DIC
#> [1] 35328.56
#>
#> $RMSE
#> [1] 776.7042
#>
#> $MAPE
#> [1] 4.824089
#>
#> $R2
#> [1] 0.9049106
Now that we have verified that our model fits the data adequately, we can simulate connectivity values among the off-diagonal locations by using the estimated parameter values from the fitted gravity model in the sim_gravity()
function.
pi_hat <- sim_gravity(N=N, D=D, theta=mod['theta', 'Mean'], omega_1=mod['omega_1', 'Mean'], omega_2=mod['omega_2', 'Mean'], gamma=mod['gamma', 'Mean'])