SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery

Authors: Konstantin Klemmer, Esther Rolf, Caleb Robinson, Lester Mackey, Marc Rußwurm

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we use SatCLIP embeddings to improve performance on nine diverse geospatial prediction tasks including temperature prediction, animal recognition, and population density estimation. SatCLIP consistently outperforms alternative location encoders and shows promise for improving geographic domain adaptation.
Researcher Affiliation | Collaboration | 1Microsoft Research, 2CU Boulder, 3Microsoft AI for Good Research Lab, 4Wageningen University & Research. EMAIL, EMAIL, EMAIL
Pseudocode | No | The paper describes the SatCLIP objective and encoder architectures using mathematical formulas and text, but it does not include a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps.
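Although the paper gives no pseudocode block, the objective it describes in formulas is a CLIP-style symmetric contrastive (InfoNCE) loss between location and image embeddings. A minimal NumPy sketch of that loss follows; the temperature value and batch/embedding sizes here are illustrative assumptions, not values from the paper:

```python
import numpy as np

def clip_loss(loc_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired (location, image) embeddings."""
    # L2-normalize both embedding sets so logits are scaled cosine similarities
    loc = loc_emb / np.linalg.norm(loc_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = loc @ img.T / temperature  # (B, B) similarity matrix

    def xent_diag(l):
        # cross-entropy with the matched pair (the diagonal) as the target class
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the location-to-image and image-to-location directions
    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))         # stand-in location embeddings
mismatched = rng.normal(size=(8, 16))  # unrelated stand-in image embeddings
```

As a sanity check, the loss for perfectly matched pairs (identical embeddings) is lower than for random, mismatched pairs.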
Open Source Code | Yes | We release the pretrained encoder as a PyTorch model. We also release our new globally distributed pretraining dataset, S2-100K. ... Code for SatCLIP pretraining and downstream experiments as well as the S2-100K dataset is available at https://github.com/microsoft/satclip.
Open Datasets | Yes | To construct our pretraining dataset, S2-100K, we sample 100,000 tiles of 256 × 256 pixel, multi-spectral (12-channel) Sentinel-2 satellite imagery and their associated centroid locations. ... Code for SatCLIP pretraining and downstream experiments as well as the S2-100K dataset is available at https://github.com/microsoft/satclip. ... In all datasets, the inputs are raw latitude/longitude coordinates, which we transform into location embeddings. The nine downstream datasets we choose for evaluation span socioeconomic and environmental applications. To evaluate the degree to which location embeddings capture socioeconomic factors, we regress Median Income (Jia and Benson 2020), California Housing prices (Pace and Barry 2003), and logged Population Density (Rolf et al. 2021). We predict variables including Air Temperature (Hooker, Duveiller, and Cescatti 2018) and Elevation (Rolf et al. 2021) from coordinates as environmental regression objectives. We additionally classify Biomes, Ecoregions (Dinerstein et al. 2017), and compile a new country code classification task Countries. Lastly, we classify iNaturalist species (Horn et al. 2018).
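The downstream pipeline the quote describes (raw latitude/longitude in, frozen embeddings out, a shallow model on top) can be sketched end to end. Everything here is a placeholder: `toy_location_encoder` is a hypothetical random-Fourier-feature stand-in for the released SatCLIP PyTorch encoder, and the regression target is synthetic rather than one of the nine real tasks:

```python
import numpy as np

rng = np.random.default_rng(42)

def toy_location_encoder(lonlat, dim=64):
    """Hypothetical stand-in for a pretrained location encoder: random
    Fourier features of (lon, lat) in radians. The real SatCLIP encoder
    is a trained PyTorch model; this only makes the pipeline runnable."""
    W = np.random.default_rng(0).normal(scale=0.05, size=(2, dim // 2))
    proj = np.radians(lonlat) @ W
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=1)

# synthetic downstream task: regress a smooth function of latitude
coords = np.column_stack([rng.uniform(-180, 180, 500),
                          rng.uniform(-90, 90, 500)])
y = np.sin(np.radians(coords[:, 1])) + 0.1 * rng.normal(size=500)

# ridge regression on top of the frozen embeddings
X = toy_location_encoder(coords)
lam = 1e-3
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
pred = X @ w

# goodness of fit
r2 = 1 - np.sum((pred - y) ** 2) / np.sum((y - y.mean()) ** 2)
```

The design point this illustrates is that the encoder is frozen: only the cheap linear head is fit per downstream task.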
Dataset Splits | Yes | We use 90% of the data points, selected uniformly at random, for pretraining and reserve the remaining 10% as a validation set to monitor overfitting. ... All results are reported for an unseen test set. ... For these experiments, we deploy a spatial train/test split strategy: We hold out entire continents, either Africa or Asia, as test sets and use the remaining data for model training and validation; this emulates a frequent problem of spatial data gaps in real-world applications. We test a few-shot domain adaptation setting, where we add a small sample (1%, uniformly sampled) of test continent points to the training set...
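The continent hold-out with a 1% few-shot sample moved back into training can be sketched as follows. The dataset and continent tags here are synthetic stand-ins, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)

# toy dataset: each point carries a continent tag
n = 1000
continents = rng.choice(["Africa", "Asia", "Europe", "Americas"], size=n)
idx = np.arange(n)

# spatial split: hold out one entire continent as the test set
holdout = "Africa"
test_idx = idx[continents == holdout]
train_idx = idx[continents != holdout]

# few-shot domain adaptation: move a uniform 1% sample of the held-out
# continent's points into the training set
k = max(1, int(0.01 * test_idx.size))
shot = rng.choice(test_idx, size=k, replace=False)
train_idx = np.concatenate([train_idx, shot])
test_idx = np.setdiff1d(test_idx, shot)
```

Keeping the split by continent rather than uniformly at random is what emulates the real-world spatial data gaps the quote mentions; a random split would leak nearby points across the boundary.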
Hardware Specification | Yes | We train models for 500 epochs on an A100 GPU.
Software Dependencies | No | The paper mentions 'PyTorch' and architectures such as 'SIREN', 'ResNet18', 'ResNet50', and 'ViT16', but it does not specify version numbers for these software components or for any other libraries, which a reproducible description requires.
Experiment Setup | Yes | During pretraining, we found that batch sizes of 8k help the model to learn more fine-grained representations... We train models for 500 epochs on an A100 GPU. ... Hyperparameters like learning rate, number of layers, or hidden dimensions are tuned using a random search on an independent validation set. ... The spatial smoothness of the representation is controlled by the number of Legendre polynomials L. This effectively defines the resolution of the location encoding and its capacity to learn small- and large-scale geospatial patterns, with larger L corresponding to finer spatial resolution.
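The role of L can be made concrete with a one-dimensional stand-in for the location encoder: a Legendre basis over latitude only (the paper's encoder operates on both coordinates, so this is a simplification, and the fine-scale target signal is synthetic). A small L cannot represent high-frequency latitudinal patterns; a larger L can:

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_features(lat_deg, L):
    """Evaluate Legendre polynomials P_0..P_{L-1} at sin(latitude).
    Larger L adds higher-degree, finer-scale basis functions."""
    x = np.sin(np.radians(np.asarray(lat_deg)))
    return np.stack([legendre.legval(x, np.eye(L)[l]) for l in range(L)],
                    axis=-1)

lat = np.linspace(-85, 85, 400)
target = np.sin(6 * np.radians(lat))  # fine-scale latitudinal pattern

def fit_residual(L):
    """Least-squares fit of the target with an L-term basis."""
    X = legendre_features(lat, L)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return np.linalg.norm(X @ coef - target)
```

Comparing `fit_residual(3)` with `fit_residual(20)` shows the resolution effect the quote describes: the 20-term basis fits the oscillatory signal far better than the 3-term one, at the cost of a larger (less smooth) representation.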