A-I-RAVEN and I-RAVEN-Mesh: Two New Benchmarks for Abstract Visual Reasoning

Authors: Mikołaj Małkiński, Jacek Mańdziuk

IJCAI 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | We evaluate 13 strong models from the AVR literature on the introduced datasets, revealing their specific shortcomings in generalization and knowledge transfer.
Researcher Affiliation | Academia | ¹Warsaw University of Technology, Warsaw, Poland; ²AGH University of Krakow, Krakow, Poland; EMAIL, EMAIL
Pseudocode | No | The paper describes methods and processes but does not contain a clearly labeled pseudocode or algorithm block.
Open Source Code | Yes | The code for reproducing all experiments is publicly accessible at: https://github.com/mikomel/raven
Open Datasets | Yes | First, we introduce Attributeless-I-RAVEN (A-I-RAVEN), comprising 10 generalization regimes. Next, we propose I-RAVEN-Mesh, a variant of I-RAVEN with a new grid-like structure overlaid on the matrices. The released code allows generating all datasets from scratch, eliminating the dependency on file-hosting services otherwise required to distribute the data.
Dataset Splits | Yes | In each experiment, we utilize 42 000 training, 14 000 validation, and 14 000 test matrices, following the standard data split protocol adopted in prior works [Zhang et al., 2019a; Hu et al., 2021].
Hardware Specification | Yes | Experiments were run on a worker with a single NVIDIA DGX A100 GPU.
Software Dependencies | No | The paper mentions using the Adam optimizer with specific parameters and notes that the training job is packaged as a Docker image with fixed dependencies, but it does not explicitly list software dependencies with specific version numbers in the text.
Experiment Setup | Yes | In all experiments we use the Adam optimizer [Kingma and Ba, 2014] with β1 = 0.9, β2 = 0.999, ϵ = 10⁻⁸ and a batch size of 128. The learning rate is initialized to 0.001 and reduced 10-fold (at most 3 times) if no progress is seen in the validation loss over 5 subsequent epochs, and training stops early after 10 epochs without progress.
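The learning-rate schedule and early-stopping policy from the Experiment Setup row can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code: the class and method names are invented here, and only the stated hyperparameters (initial LR 0.001, 10-fold reduction after 5 stagnant epochs, at most 3 reductions, stop after 10 stagnant epochs) come from the paper.

```python
class TrainingSchedule:
    """Illustrative sketch (not the paper's implementation) of the schedule:
    reduce LR 10-fold after 5 epochs without validation-loss improvement
    (at most 3 reductions), stop after 10 epochs without improvement."""

    def __init__(self, lr=1e-3, factor=0.1, patience=5,
                 max_reductions=3, stop_patience=10):
        self.lr = lr                        # current learning rate
        self.factor = factor                # 10-fold reduction => factor 0.1
        self.patience = patience            # epochs of stagnation before LR drop
        self.max_reductions = max_reductions
        self.stop_patience = stop_patience  # epochs of stagnation before stopping
        self.best = float("inf")            # best validation loss seen so far
        self.stagnant = 0                   # epochs since last improvement
        self.reductions = 0                 # LR reductions applied so far

    def step(self, val_loss):
        """Update state after one epoch; return False when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.stagnant = 0
            return True
        self.stagnant += 1
        if self.stagnant >= self.stop_patience:
            return False  # early stopping
        if self.stagnant % self.patience == 0 and self.reductions < self.max_reductions:
            self.lr *= self.factor
            self.reductions += 1
        return True
```

In a framework such as PyTorch, the same behavior is typically obtained by combining `Adam` with a plateau-based scheduler plus a separate early-stopping counter; the standalone class above just makes the described policy explicit and checkable.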