reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Poly2Vec: Polymorphic Fourier-Based Encoding of Geospatial Objects for GeoAI Applications

Authors: Maria Despoina Siampou, Jialiang Li, John Krumm, Cyrus Shahabi, Hua Lu

ICML 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate POLY2VEC on five diverse tasks, organized into two categories. The first empirically demonstrates that POLY2VEC consistently outperforms object-specific baselines in preserving three key spatial relationships: topology, direction, and distance. The second shows that integrating POLY2VEC into a state-of-the-art GeoAI workflow improves the performance in two popular tasks: population prediction and land use inference.
Researcher Affiliation	Academia	¹Department of Computer Science, University of Southern California, Los Angeles, USA; ²Department of People and Technology, Roskilde University, Denmark; ³Department of Computer Science, Aalborg University (Copenhagen campus), Denmark. Correspondence to: Maria Despoina Siampou <EMAIL>.
Pseudocode	No	The paper describes mathematical formulations and methodologies but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code available at https://github.com/USC-InfoLab/poly2vec
Open Datasets	Yes	We utilized publicly available OpenStreetMap (OSM) datasets for Singapore and New York, obtained from Geofabrik (https://download.geofabrik.de/) in .osm.pbf format. Geospatial objects, including POIs, roads, and buildings, were extracted using OSM-specific tags (amenity, shop, tourism, leisure for POIs; motorway, trunk, primary, secondary for roads; and building for buildings). Region partitions were derived from Singapore Subzones and NYC Census Tracts. Dataset statistics are presented in Table 5.
Dataset Splits	Yes	The training, validation, and testing ratio for the datasets corresponding to these tasks is 60:20:20. All experiments were run 5 times, and we report the average performance and standard deviation.
Hardware Specification	Yes	Our experiments are performed on a cluster node equipped with an 18-core Intel i9-9980XE CPU, 125 GB of memory, and two 11 GB NVIDIA GeForce RTX 2080 Ti GPUs.
Software Dependencies	Yes	Furthermore, all neural network models are implemented based on PyTorch version 2.3.0 with CUDA 11.8 using Python version 3.9.19.
Experiment Setup	Yes	We set the minimum frequency f_min = 0.1, the maximum frequency f_max = 1.0, and W = 10, resulting in 210 frequencies. We set the final size of the geometry embedding v to d = 32. All the MLPs consist of two layers with ReLU activation functions. For training on the spatial reasoning tasks, we utilize the AdamW optimizer and set the learning rate lr = 10^-4 and weight decay wd = 10^-8. The batch size is set to 128, and the downstream models were trained for 20 epochs.
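
The Dataset Splits and Experiment Setup rows can be collected into a small configuration object, together with a seeded 60:20:20 split and a mean/standard-deviation summary over repeated runs. The following is a minimal sketch using only the Python standard library; it is not taken from the Poly2Vec repository. The names `ReportedSetup`, `split_indices`, and `summarize` are hypothetical, the learning rate and weight decay are read as 10^-4 and 10^-8, and the use of one seed per run and of the sample standard deviation are assumptions, since the quoted text does not specify either.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; field names are hypothetical."""
    f_min: float = 0.1              # minimum frequency
    f_max: float = 1.0              # maximum frequency
    w: int = 10                     # W, reported to yield 210 frequencies
    embedding_dim: int = 32         # d, size of the geometry embedding
    mlp_layers: int = 2             # two-layer MLPs with ReLU
    learning_rate: float = 1e-4     # AdamW; exponent sign assumed negative
    weight_decay: float = 1e-8      # AdamW; exponent sign assumed negative
    batch_size: int = 128
    epochs: int = 20
    split_percent: tuple = (60, 20, 20)  # train : validation : test
    num_runs: int = 5


def split_indices(n, percents, seed):
    """Shuffle range(n) with a fixed seed and cut it into train/val/test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = n * percents[0] // 100
    n_val = n * percents[1] // 100
    # Whatever remains after train and validation goes to the test set.
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def summarize(scores):
    """Mean and sample standard deviation over repeated runs (assumed choice)."""
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    cfg = ReportedSetup()
    for run in range(cfg.num_runs):
        # Assumption: each run uses its own seed; the source does not say.
        train, val, test = split_indices(1000, cfg.split_percent, seed=run)
        print(run, len(train), len(val), len(test))
```

Integer percentages are used for the split so that the partition sizes do not depend on floating-point rounding; any remainder from the integer division ends up in the test partition.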