The SpatialInference package provides tools for
statistical inference with spatially correlated data. Its main features
are:
Load the package and data containing the centroids of all 3,108 counties of the contiguous United States:
library(SpatialInference)
library(sf); library(ggplot2)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE
library(lfe)
#> Loading required package: Matrix
library(modelsummary)
data("US_counties_centroids")
ggplot(US_counties_centroids) + geom_sf(size = .1) + theme_bw()Even though the two noise variables were drawn independently (so the true coefficient is zero), a naive regression finds a highly significant relationship:
spuriouslm <- fixest::feols(noise1 ~ noise2, data = US_counties_centroids, vcov = "HC1")
spuriouslm
#> OLS estimation, Dep. Var.: noise1
#> Observations: 3,108
#> Standard-errors: Heteroskedasticity-robust
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 6.890000e-16 0.017872 3.860000e-14 1.0000e+00
#> noise2 8.738022e-02 0.015535 5.624836e+00 2.0216e-08 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 0.996015 Adj. R2: 0.007316This is because the OLS errors \(\varepsilon_i\) are not i.i.d. — nearby residuals are positively correlated, which inflates the \(t\)-statistic. A map of the regression residuals reveals the spatial clustering:
US_counties_centroids$resid <- spuriouslm$residuals
ggplot(US_counties_centroids) +
geom_sf(aes(col = resid), size = .1) + theme_bw() + scale_color_viridis_c()Before computing Conley standard errors, we need to choose a
bandwidth: the distance within which residuals are allowed to be
correlated. The covgm_range() function estimates this from
the data by computing the empirical covariogram of the regression
residuals and identifying the distance at which covariation first
crosses zero (Lehner 2026; Pebesma
2004).
The plot shows the empirical covariogram — each point represents the average cross-product of residual pairs within a distance bin. The red vertical line marks the estimated correlation range \(\hat{\varsigma}\) (here around 831 km), and the red dashed line marks zero covariance. Beyond the estimated range, the covariogram fluctuates around zero, indicating that residual correlation has effectively ceased.
This is the key input for the Conley standard error: the bandwidth should match the distance over which residuals are actually correlated.
To extract just the numeric range for programmatic use:
# Compute the covariogram manually
covgm <- gstat::variogram(
spuriouslm$residuals ~ 1,
data = sf::as_Spatial(US_counties_centroids),
covariogram = TRUE,
width = 2e4,
cutoff = as.numeric(max(sf::st_distance(US_counties_centroids))) * (2/3)
)
estimated_range <- extract_corr_range(covgm)
estimated_range
#> [1] 820.0129A critical insight documented by Lehner (2026) is that the relationship between the bandwidth and the magnitude of Conley standard errors follows an inverse-U shape. This means that both too narrow and too wide bandwidths lead to underestimated standard errors — contradicting the conventional wisdom that wider bandwidths are always more conservative.
The inverseu_plot_conleyrange() function visualises
this:
The grey dashed line represents the HC1 standard error. As the cutoff distance increases from zero, the Conley SE first rises (as positively correlated residual pairs are included), reaches a peak near the true correlation range, and then declines (as uncorrelated pairs dilute the estimate). At very wide bandwidths, the Conley SE can fall below the HC1 level — making wide-bandwidth Conley errors less conservative than heteroskedasticity-robust errors.
With the estimated range in hand, we compute the spatial HAC standard
error using conley_SE():
spuriouslm_fe <- felm(noise1 ~ noise2 | unit + year | 0 | lat + lon,
data = US_counties_centroids, keepCX = TRUE)
regfe_conley <- conley_SE(reg = spuriouslm_fe, unit = "unit", time = "year",
lat = "lat", lon = "lon",
kernel = "epanechnikov", dist_fn = "Haversine",
dist_cutoff = 831)
conley <- sapply(regfe_conley, function(x) diag(sqrt(x)))[2] %>% round(5)
conley
#> Spatial.noise2
#> 0.137The Conley SE at the estimated bandwidth is substantially larger than the HC1 error. Constructing a corrected \(t\)-statistic:
The \(t\)-statistic is far below 1.96, so we correctly fail to reject the null hypothesis — as expected, given that the two variables are genuinely independent.
The compute_conley_lfe() function provides a
shortcut:
For a complete workflow — regression, Moran’s I test, and Conley SEs
in one call — use lm_sac():
lmsac.out <- lm_sac("noise1 ~ noise2 | unit + year | 0 | lat + lon",
US_counties_centroids,
conley_cutoff = 831, conley_kernel = "epanechnikov")
lmsac.out$conley_SE
#> [1] 0.0000000 0.1370043
gm.param <- list(
list("raw" = "nobs", "clean" = "Observations", "fmt" = 0),
list("raw" = "Moran_y", "clean" = "Moran's I [y]", "fmt" = 3),
list("raw" = "Moran_resid", "clean" = "Moran's I [resid]", "fmt" = 3),
list("raw" = "y_mean", "clean" = "Mean [y]", "fmt" = 3),
list("raw" = "y_SD", "clean" = "Std.Dev. [y]", "fmt" = 3)
)
modelsummary(lmsac.out,
estimate = c("{estimate}"),
statistic = c("({std.error})", "([{conley}])"),
gof_omit = "Model|Range_resid|Range_resp|Range_y",
gof_map = gm.param)
#> Found litedown! Enabling r-universe template| (1) | |
|---|---|
| (Intercept) | 0.000 |
| (0.018) | |
| ([0.137]) | |
| noise2 | 0.087 |
| (0.018) | |
| ([0.137]) | |
| Observations | 3108 |
The spatial HAC estimate is sensitive not only to the bandwidth but also to the kernel function. The package includes six kernels with different distance-decay profiles. The uniform kernel weights all pairs equally within the cutoff; the others downweight more distant pairs with varying decay rates:
compute_conley_lfe(spuriouslm_fe, cutoff = 831, kernel_choice = "uniform")
#> [1] 0.13964
compute_conley_lfe(spuriouslm_fe, cutoff = 831, kernel_choice = "epanechnikov")
#> [1] 0.137
compute_conley_lfe(spuriouslm_fe, cutoff = 831, kernel_choice = "bartlett")
#> [1] 0.12253
compute_conley_lfe(spuriouslm_fe, cutoff = 831, kernel_choice = "parzen")
#> [1] 0.11584
compute_conley_lfe(spuriouslm_fe, cutoff = 831, kernel_choice = "gaussian")
#> [1] 0.13799
compute_conley_lfe(spuriouslm_fe, cutoff = 831, kernel_choice = "biweight")
#> [1] 0.13087Based on Monte Carlo evidence in Lehner (2026), the Bartlett and Epanechnikov kernels tend to deliver the best size control in practice.