Spatio-Causal Patterns of Sample Growth

Authors: Andre F. Ribeiro

JAIR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate these theoretic patterns in the full American census from 1840 to 1940, and samples ranging from the street-level all the way to the national. This reveals new conditions for the generalizability of samples over space and time, and connections among the Shapley value, counterfactual statistics, and hyperbolic geometry. We then consider 100 years of the American census (and all variables in the census) as case study. For each cross-section (decade), we consider the important task of predicting economic growth for over 60K individual locations under increasing spatial samples. We demonstrate how (1) generalizability tradeoffs evolve across spatial levels, and (2) repeat the validation of generalizability limits derived in [27] for the spatial domain, and with the current census micro-data.
Researcher Affiliation | Academia | ANDRE F. RIBEIRO, Harvard University, USA and University of Sao Paulo, Brazil
Pseudocode | No | The paper describes methods and processes using mathematical formulations and descriptive text, but it does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code | No | The paper does not contain an explicit statement about releasing source code, nor does it provide any links to a code repository.
Open Datasets | Yes | The datasets analyzed are available in the IPUMS repository [17]. IPUMS. U.S. Individual-level Census (United States Bureau of the Census). 2022. url: https://usa.ipums.org/usa/complete_count.shtml.
Dataset Splits | No | The paper mentions using 'held-out sample' for accuracy calculation and that 'One million location and year were chosen randomly', but it does not specify the percentages, exact counts, or methodology used for training, validation, or test splits of the dataset.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, or cluster specifications) used to run the experiments.
Software Dependencies | No | The paper lists various types of models and algorithms used (e.g., 'Neural Network Models, Generalized Linear Models, Boosting Models'), but it does not specify the names or version numbers of any particular software libraries, frameworks, or solvers.
Experiment Setup | No | The paper states that 'Detailed description of algorithms used, and their hyperparameter optimization, can be found on [27],' deferring the crucial experimental setup details to a separate publication rather than providing them in the main text.