Title: | Conduct Multiple Types of Geographic Regression Discontinuity Designs |
---|---|
Description: | Spatial versions of Regression Discontinuity Designs (RDDs) are becoming increasingly popular as tools for causal inference. However, conducting state-of-the-art analyses often involves tedious and time-consuming steps. This package offers comprehensive functionalities for executing all required spatial and econometric tasks in a streamlined manner. Moreover, it equips researchers with tools for performing essential placebo and balancing checks comprehensively. The fact that researchers do not have to rely on 'APIs' of external 'GIS' software ensures replicability and raises the standard for spatial RDDs. |
Authors: | Alexander Lehner [aut, cre] |
Maintainer: | Alexander Lehner <[email protected]> |
License: | GPL-3 |
Version: | 0.1.0.9000 |
Built: | 2025-02-14 02:57:02 UTC |
Source: | https://github.com/axlehner/spatialrdd |
Creates a vector with 0's and 1's to determine on which side of the cut-off each observation is. For this it is useful to have a polygon that fully describes the "treated area".
If you do not have such a polygon there is a (preliminary and patchy) way implemented in the package via points2line
and cutoff2polygon
that lets you go from points to line to "treated polygon" in a very crude way.
assign_treated(data, polygon, id = NA)
assign_treated(data, polygon, id = NA)
data |
sf data frame containing point data (if you have polygons, convert first with sf::st_centroid()) |
polygon |
sf object with polygon geometry that fully describes the area(s) that contain the treated points |
id |
string that represents the name of the column in the data that represents the unique identifier for each observation |
A vector of type factor with 0's and 1's. Convert with as.numeric() if you want real numbers/integers.
This is essentially a wrapper of sf::st_intersection
.
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id")
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id")
Creates n
segments of a line (the RD cut-off) and assigns the closest border segment for each observation in the sf data frame.
Computationally these tasks are quite demanding when the sample size is big and thus might take a few seconds to complete.
border_segment(data, cutoff, n = 10)
border_segment(data, cutoff, n = 10)
data |
sf data frame containing point data |
cutoff |
the RDD border in the form of a line (preferred) or borderpoints |
n |
the number of segments to be produced |
a vector with factors, each category representing one segment
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) points_samp.sf$segment10 <- border_segment(points_samp.sf, cut_off, 3)
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) points_samp.sf$segment10 <- border_segment(points_samp.sf, cut_off, 3)
Unifies shift_border
, cutoff2polygon
, assign_treated
in one function to carry out a myriad of placebo checks at once.
The output is either a data.frame (with or without geometry of the respective placeboline) or a coefplot.
Requires operations data.frame that contains all desired operations (columns shift.x, shift.y, scale, angle, orientation.1, orientation.2, endpoint.1, endpoint.2),
if you don't need a certain operation just use default values (e.g. 0 for angle and 1 for scale), but the column has to be there.
create_placebos( data, cutoff, formula, operations, bw_dist, coefplot = FALSE, geometry = FALSE )
create_placebos( data, cutoff, formula, operations, bw_dist, coefplot = FALSE, geometry = FALSE )
data |
sf data.frame that contains all units of observation |
cutoff |
initial RD cutoff as an sj object |
formula |
provide the formula you want to use for OLS, omit the treatetment dummy (if you want a univariate regression just on "treated", then provide y ~ 1 as formula) |
operations |
container that has all the information in it on how to change the border for each placeboregression |
bw_dist |
what is the distance for the bandwith (in CRS units, thus ideally metres) |
coefplot |
provide coefplot instead of a data.frame |
geometry |
set to |
either a coefplot or data.frame containing results of placebo regressions
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") operations.df <- data.frame(operation = c("shift"), shift.x = c(0), shift.y = c(0), scale = 1, angle = 0, orientation.1 = c("west"), orientation.2 = c("west"), endpoint.1 = c(.8), endpoint.2 = c(.2)) create_placebos(data = points_samp.sf, cutoff = cut_off, formula = id ~ 1, operations = operations.df, bw_dist = 3000)
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") operations.df <- data.frame(operation = c("shift"), shift.x = c(0), shift.y = c(0), scale = 1, angle = 0, orientation.1 = c("west"), orientation.2 = c("west"), endpoint.1 = c(.8), endpoint.2 = c(.2)) create_placebos(data = points_samp.sf, cutoff = cut_off, formula = id ~ 1, operations = operations.df, bw_dist = 3000)
sf multilinestring representing a spatial RD cut-off
data(cut_off)
data(cut_off)
A spatial data.frame of class sf
Lehner, Alexander (2023) Culture, Institutions, and the Roots of Gender Inequality: 450 Years of Portuguese Colonialism in India
Creates an approximation of a "treated/untreated polygon" to assign the status again to each observation after the border has been shifted.
The function extends both ends of the provided cutoff to the edge of the (imaginary) bounding box of the provided data (this ensures all observations will be included).
Key is that you provide a 2-tuple that indicates in which side of the bounding box each end should go (1st element is the one with lower x-coordinate, i.e. leftern most). Always check the output manually by plotting the polygon (e.g. with tm_shape(your.polygon) + tm_polygons()
).
If the output polygon looks odd, a first check should be to just switch the elements from the orientation vector around! See vignette(shifting_borders)
for details and illustrative examples.
cutoff2polygon(data, cutoff, orientation = NA, endpoints = c(0, 0))
cutoff2polygon(data, cutoff, orientation = NA, endpoints = c(0, 0))
data |
study dataset to determine the bounding box (so that all observations are covered by the new polygons) in sf format |
cutoff |
sf object of the (placebo) cut-off |
orientation |
in which side of the bounding box does each of the extensions of the cutoff go into? First element refers to endpoint of border with smaller x-coordinate ("westernmost") (takes two of "north", "east", "south", "west" in a vector, e.g. |
endpoints |
at what position on the edge should each polygon end? (vector with two numbers between 0 and 1, where 0.5 e.g. means right in the middle of the respective edge) |
a polygon as an sf object
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) cutoff2polygon(data = points_samp.sf, cutoff = cut_off, orientation = c("west", "west"), endpoints = c(.8, .2))
points_samp.sf <- sf::st_sample(polygon_full, 100) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) cutoff2polygon(data = points_samp.sf, cutoff = cut_off, orientation = c("west", "west"), endpoints = c(.8, .2))
Takes in a border in the form of a polyline (or borderpoints) and converts it into point data. These points are later used to run separate non-parametric RD estimations which eventually allows to visualise potential heterogeneous treatment effects alongside the cut-off.
discretise_border( cutoff, n = 10, random = FALSE, range = FALSE, ymax = NA, ymin = NA, xmax = NA, xmin = NA )
discretise_border( cutoff, n = 10, random = FALSE, range = FALSE, ymax = NA, ymin = NA, xmax = NA, xmin = NA )
cutoff |
sf object of the RD cut-off in the form of a line (not preferred, but also boundarypoints are possible) |
n |
the number of borderpoints to be created |
random |
whether they are randomly chosen (not desireable in most cases) |
range |
default = FALSE, if there is a specific range (N-S or E-W) for which the points are to be drawn (useful in order to exclude sparse borderpoints with little/no oberservations around because the non-parametric RD estimation will fail) |
ymax |
if range = TRUE: y coordinates |
ymin |
if range = TRUE: y coordinates |
xmax |
if range = TRUE: x coordinates |
xmin |
if range = TRUE: x coordinates |
an sf object with selected (and evenly spaced) borderpoints
borderpoints <- discretise_border(cutoff = cut_off, n = 10)
borderpoints <- discretise_border(cutoff = cut_off, n = 10)
Produces plot of GRDDseries and optionally of a map that visualises every point estimate in space.
plotspatialrd(SpatialRDoutput, map = FALSE)
plotspatialrd(SpatialRDoutput, map = FALSE)
SpatialRDoutput |
spatial object that is produced by an estimation with |
map |
TRUE/FALSE depending on whether mapplot is desired (make sure to set |
plots produced with ggplot2
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] # create distance to cutoff points_samp.sf$dist2cutoff <- as.numeric(sf::st_distance(points_samp.sf, cut_off)) points_samp.sf$distrunning <- points_samp.sf$dist2cutoff # give the non-treated one's a negative score points_samp.sf$distrunning[points_samp.sf$treated == 0] <- -1 * points_samp.sf$distrunning[points_samp.sf$treated == 0] # create borderpoints borderpoints.sf <- discretise_border(cutoff = cut_off, n = 10) borderpoints.sf$id <- 1:nrow(borderpoints.sf) # finally, carry out estimation alongside the boundary: results <- spatialrd(y = "education", data = points_samp.sf, cutoff.points = borderpoints.sf, treated = "treated", minobs = 20, spatial.object = FALSE) plotspatialrd(results)
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] # create distance to cutoff points_samp.sf$dist2cutoff <- as.numeric(sf::st_distance(points_samp.sf, cut_off)) points_samp.sf$distrunning <- points_samp.sf$dist2cutoff # give the non-treated one's a negative score points_samp.sf$distrunning[points_samp.sf$treated == 0] <- -1 * points_samp.sf$distrunning[points_samp.sf$treated == 0] # create borderpoints borderpoints.sf <- discretise_border(cutoff = cut_off, n = 10) borderpoints.sf$id <- 1:nrow(borderpoints.sf) # finally, carry out estimation alongside the boundary: results <- spatialrd(y = "education", data = points_samp.sf, cutoff.points = borderpoints.sf, treated = "treated", minobs = 20, spatial.object = FALSE) plotspatialrd(results)
Small function that connects dots and makes them one line which can later be used as a cutoff for the RD.
points2line(borderpoints, crs)
points2line(borderpoints, crs)
borderpoints |
a set of points on a boundary |
crs |
set the coordinate reference system (CRS) |
a line as an sf object
points_samp.sf <- sf::st_sample(polygon_full, 2) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) points2line(points_samp.sf, crs = sf::st_crs(points_samp.sf))
points_samp.sf <- sf::st_sample(polygon_full, 2) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) points2line(points_samp.sf, crs = sf::st_crs(points_samp.sf))
sf multipolygon
data(polygon_full)
data(polygon_full)
A spatial data.frame of class sf
Lehner, Alexander (2023) Culture, Institutions, and the Roots of Gender Inequality: 450 Years of Portuguese Colonialism in India
sf multipolygon
data(polygon_treated)
data(polygon_treated)
A spatial data.frame of class sf
Lehner, Alexander (2023) Culture, Institutions, and the Roots of Gender Inequality: 450 Years of Portuguese Colonialism in India
Preliminary function, styling with e.g. kable and kableExtra has to be done by the user individually.
You could also just use the package of your choice to print out columns of the output from spatialrd
.
printspatialrd(SpatialRDoutput)
printspatialrd(SpatialRDoutput)
SpatialRDoutput |
output file from the |
A table with results from the spatialrd
function
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] # create distance to cutoff points_samp.sf$dist2cutoff <- as.numeric(sf::st_distance(points_samp.sf, cut_off)) points_samp.sf$distrunning <- points_samp.sf$dist2cutoff # give the non-treated one's a negative score points_samp.sf$distrunning[points_samp.sf$treated == 0] <- -1 * points_samp.sf$distrunning[points_samp.sf$treated == 0] # create borderpoints borderpoints.sf <- discretise_border(cutoff = cut_off, n = 10) borderpoints.sf$id <- 1:nrow(borderpoints.sf) # finally, carry out estimation alongside the boundary: results <- spatialrd(y = "education", data = points_samp.sf, cutoff.points = borderpoints.sf, treated = "treated", minobs = 20, spatial.object = FALSE) printspatialrd(results)
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] # create distance to cutoff points_samp.sf$dist2cutoff <- as.numeric(sf::st_distance(points_samp.sf, cut_off)) points_samp.sf$distrunning <- points_samp.sf$dist2cutoff # give the non-treated one's a negative score points_samp.sf$distrunning[points_samp.sf$treated == 0] <- -1 * points_samp.sf$distrunning[points_samp.sf$treated == 0] # create borderpoints borderpoints.sf <- discretise_border(cutoff = cut_off, n = 10) borderpoints.sf$id <- 1:nrow(borderpoints.sf) # finally, carry out estimation alongside the boundary: results <- spatialrd(y = "education", data = points_samp.sf, cutoff.points = borderpoints.sf, treated = "treated", minobs = 20, spatial.object = FALSE) printspatialrd(results)
Text
randinf( depvar, points.sf, lambdapoisl, nruns, id = "id", geometry = TRUE, sentinel_tries = 10, ... )
randinf( depvar, points.sf, lambdapoisl, nruns, id = "id", geometry = TRUE, sentinel_tries = 10, ... )
depvar |
the dependent variable as a string |
points.sf |
sf data.frame containing all points and the relevant variables |
lambdapoisl |
the parameter for the poisson line process (see |
nruns |
number of randomization runs (ideally in the thousands, start by trying a few dozen to check performance and speed) |
id |
the unique id column in the points frame |
geometry |
should the return results frame contain geometries so that all placebo/randomization lines can be plotted on a map? |
sentinel_tries |
sometimes the poisson process creates lines that do not cross any points, thus leading to no control units (expected to happen by chance). This parameter determines how often the function should retry in case this happnes (default is 10). |
... |
pass arguments to |
a randomization inference p-value or, alternatively, a data frame containing all simulated lines with the respective estimates
Baddeley A, Turner R (2005). “spatstat: An R Package for Analyzing Spatial Point Patterns.” Journal of Statistical Software, 12(6), 1–42. doi:10.18637/jss.v012.i06. Baddeley A, Rubak E, Turner R (2015). Spatial Point Patterns: Methodology and Applications with R. Chapman and Hall/CRC Press, London. ISBN 9781482210200
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] results.ri <- randinf("education", points_samp.sf, 0.0001, 100)
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] results.ri <- randinf("education", points_samp.sf, 0.0001, 100)
This functions takes in a border and can either shift, shrink, or rotate it. All of them can be done together as well.
This usually takes a bit of trial and error, so make sure to plot the result each time.
For a detailed walk through check out the according vignette: vignette(shifting_borders)
.
shift_border( border, operation = c("shift", "scale", "rotate"), shift = c(0, 0), scale = 1, angle = 0 )
shift_border( border, operation = c("shift", "scale", "rotate"), shift = c(0, 0), scale = 1, angle = 0 )
border |
sf object with line geometry |
operation |
|
shift |
if |
scale |
if |
angle |
if |
a new border in the form of an sf object
shift_border(border = cut_off, operation = c("shift", "scale"), shift = c(-5000, -3000), scale = .85) shift_border(border = cut_off, operation = "rotate", angle = 10)
shift_border(border = cut_off, operation = c("shift", "scale"), shift = c(-5000, -3000), scale = .85) shift_border(border = cut_off, operation = "rotate", angle = 10)
This function loops over all boundary points and locally estimates a non-parametric RD (using local linear regression)
using the rdrobust
function from the rdrobust
package from Calonico, Cattaneo, Titiunik (2014).
It takes in the discretized cutoff point file (the RDcutoff, a linestring chopped into parts by the discretise_border
function)
and the sf object (which essentially is just a conventional data.frame with a geometry()
column) containing all the observations (treated and untreated).
The treated indicator variable has to be assigned before (potentially with assign_treated
) and be part of the sf object as a column.
spatialrd( y, data, cutoff.points, treated, minobs = 50, bwfix_m = NA, sample = FALSE, samplesize = NA, sparse.exclusion = FALSE, store.CIs = FALSE, spatial.object = TRUE, ... )
spatialrd( y, data, cutoff.points, treated, minobs = 50, bwfix_m = NA, sample = FALSE, samplesize = NA, sparse.exclusion = FALSE, store.CIs = FALSE, spatial.object = TRUE, ... )
y |
The name of the dependent variable in the points frame in the form of a string |
data |
sf data.frame with points that describe the observations |
cutoff.points |
sf object of borderpoints (provided by user or obtained with |
treated |
column that contains the treated dummy (as string) |
minobs |
the minimum amount of observations in each estimation for the point estimate to be included (default is 50) |
bwfix_m |
fixed bandwidth in meters (in case you want to impose one yourself) |
sample |
draw a random sample of points (default is FALSE) |
samplesize |
if random, how many points |
sparse.exclusion |
in case we want to try to exclude sparse border points before the estimation (should reduce warnings) |
store.CIs |
set TRUE of confidence intervals should be stored |
spatial.object |
return a spatial object (deafult is TRUE, needed if you want to plot the point estimates on a map)? |
... |
in addition you can use all options in |
This function nests rdrobust
. All its options (aside from running variable x
and cutoff c
) are available here as well (e.g. bw selection, cluster level, kernel, weights).
Check the documentation in the rdrobust
package for details. (bandwidth selection default in rdrobust
is bwselect = 'mserd')
To visualise the output, use plotspatialrd
for a graphical representation. You can use printspatialrd
(or an R package of your choice) for a table output. .
a data.frame or spatial data.frame (sf object) in case spatial.object = TRUE (default)
Calonico, Cattaneo and Titiunik (2014): Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs, Econometrica 82(6): 2295-2326.
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] # create distance to cutoff points_samp.sf$dist2cutoff <- as.numeric(sf::st_distance(points_samp.sf, cut_off)) points_samp.sf$distrunning <- points_samp.sf$dist2cutoff # give the non-treated one's a negative score points_samp.sf$distrunning[points_samp.sf$treated == 0] <- -1 * points_samp.sf$distrunning[points_samp.sf$treated == 0] # create borderpoints borderpoints.sf <- discretise_border(cutoff = cut_off, n = 10) borderpoints.sf$id <- 1:nrow(borderpoints.sf) # finally, carry out estimation alongside the boundary: results <- spatialrd(y = "education", data = points_samp.sf, cutoff.points = borderpoints.sf, treated = "treated", minobs = 20, spatial.object = FALSE)
points_samp.sf <- sf::st_sample(polygon_full, 1000) # create points # make it an sf object bc st_sample only created the geometry list-column (sfc): points_samp.sf <- sf::st_sf(points_samp.sf) # add a unique ID to each observation: points_samp.sf$id <- 1:nrow(points_samp.sf) # assign treatment: points_samp.sf$treated <- assign_treated(points_samp.sf, polygon_treated, id = "id") # first we define a variable for the number of "treated" and control NTr <- length(points_samp.sf$id[points_samp.sf$treated == 1]) NCo <- length(points_samp.sf$id[points_samp.sf$treated == 0]) # the treated areas get a 10 percentage point higher literacy rate points_samp.sf$education[points_samp.sf$treated == 1] <- 0.7 points_samp.sf$education[points_samp.sf$treated == 0] <- 0.6 # and we add some noise, otherwise we would obtain regression coeffictions with no standard errors points_samp.sf$education[points_samp.sf$treated == 1] <- rnorm(NTr, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 1] points_samp.sf$education[points_samp.sf$treated == 0] <- rnorm(NCo, mean = 0, sd = .1) + points_samp.sf$education[points_samp.sf$treated == 0] # create distance to cutoff points_samp.sf$dist2cutoff <- as.numeric(sf::st_distance(points_samp.sf, cut_off)) points_samp.sf$distrunning <- points_samp.sf$dist2cutoff # give the non-treated one's a negative score points_samp.sf$distrunning[points_samp.sf$treated == 0] <- -1 * points_samp.sf$distrunning[points_samp.sf$treated == 0] # create borderpoints borderpoints.sf <- discretise_border(cutoff = cut_off, n = 10) borderpoints.sf$id <- 1:nrow(borderpoints.sf) # finally, carry out estimation alongside the boundary: results <- spatialrd(y = "education", data = points_samp.sf, cutoff.points = borderpoints.sf, treated = "treated", minobs = 20, spatial.object = FALSE)