Advancing Generalization Across a Variety of Abstract Visual Reasoning Tasks

Authors: Mikołaj Małkiński, Jacek Mańdziuk

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. The experiments demonstrate strong generalization capabilities of the proposed model, which in several settings outperforms existing methods from the literature.
Researcher Affiliation: Academia. 1) Warsaw University of Technology, Warsaw, Poland; 2) AGH University of Krakow, Krakow, Poland. EMAIL, EMAIL
Pseudocode: No. The paper describes the model architecture (Fig. 5) and its components in detail but does not present any structured pseudocode or algorithm blocks.
Open Source Code: Yes. The code for reproducing all experiments is publicly accessible at https://github.com/mikomel/raven
Open Datasets: Yes. We utilize four RPM datasets: PGM [Barrett et al., 2018], I-RAVEN [Hu et al., 2021], I-RAVEN-Mesh and A-I-RAVEN [Małkiński and Mańdziuk, 2025a]. We extend the evaluation of PoNG beyond RPMs to two benchmarks comprising visual analogies with both synthetic [Hill et al., 2019] and real-world [Bitton et al., 2023] images.
Dataset Splits: Yes.
- PGM: each regime contains 1.42M RPMs, with 1.2M / 20K / 200K in the train / validation / test splits, respectively.
- I-RAVEN: 10K matrices per configuration (70K in total), split into train / validation / test at a 60/20/20 ratio.
- VAP: each regime contains 710K matrices, with 600K / 10K / 100K devoted to the train / validation / test splits, respectively.
- VASR: Silver data is used for training, with 150K / 2.25K / 2.55K matrices in the train / validation / test splits, respectively.
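As a quick sanity check on the ratio-based split above, the 60/20/20 division of I-RAVEN's 70K matrices can be computed directly; the helper below is a minimal sketch (the function name is hypothetical, not from the released code):

```python
def split_sizes(total, ratios=(0.6, 0.2, 0.2)):
    """Return (train, val, test) sizes for a total count and ratio triple.

    Any rounding remainder is assigned to the training split so the
    three sizes always sum to `total`.
    """
    sizes = [int(total * r) for r in ratios]
    sizes[0] += total - sum(sizes)
    return tuple(sizes)

# I-RAVEN: 70K matrices at a 60/20/20 ratio
print(split_sizes(70_000))  # -> (42000, 14000, 14000)

# PGM splits are stated absolutely: 1.2M + 20K + 200K per 1.42M regime
assert 1_200_000 + 20_000 + 200_000 == 1_420_000
```

The remainder-to-train convention is one common choice; the datasets themselves ship with fixed splits, so this is only an arithmetic check.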
Hardware Specification: Yes. All experiments are performed on a single GPU (NVIDIA DGX A100).
Software Dependencies: No. The paper mentions the Adam optimizer, Vision Transformer (ViT), and TCN, but does not specify any software versions for these or other libraries.
Experiment Setup: Yes. PoNG is trained using a standard training strategy involving the Adam optimizer [Kingma and Ba, 2014] with default hyperparameters (λ = 0.001, β1 = 0.9, β2 = 0.999, ϵ = 10^-8). The learning rate λ is reduced by a factor of 10 after 5 epochs without improvement in validation loss. Early stopping is applied after 10 epochs without validation loss reduction. We use batch size B = 128 for experiments on RAVEN-like datasets and B = 256 in the remaining cases to reduce training time on large datasets.
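The plateau-based schedule described above (reduce the learning rate by 10x after 5 stagnant epochs, stop after 10) can be sketched in framework-agnostic Python; the class and method names here are hypothetical illustrations, not taken from the released code:

```python
class PlateauController:
    """Tracks validation loss across epochs.

    Divides the learning rate by 10 every `lr_patience` consecutive
    epochs without improvement, and signals early stopping once
    `stop_patience` stagnant epochs have accumulated, mirroring the
    schedule described in the experiment setup.
    """

    def __init__(self, lr=1e-3, lr_patience=5, stop_patience=10):
        self.lr = lr
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.best = float("inf")
        self.stagnant = 0

    def step(self, val_loss):
        """Record one epoch's validation loss; return True to stop training."""
        if val_loss < self.best:
            self.best = val_loss
            self.stagnant = 0
            return False
        self.stagnant += 1
        if self.stagnant % self.lr_patience == 0:
            self.lr /= 10  # reduce learning rate by a factor of 10
        return self.stagnant >= self.stop_patience
```

In a framework such as PyTorch, the same behavior is typically obtained by combining a plateau-based LR scheduler (factor 0.1, patience 5) with a separate early-stopping check (patience 10) in the training loop.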