GenVP: Generating Visual Puzzles with Contrastive Hierarchical VAEs
Authors: Kalliopi Basioti, Pritish Sahu, Qingze Liu, Zihao Xu, Hao Wang, Vladimir Pavlovic
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on five different datasets indicate that GenVP achieves state-of-the-art (SOTA) performance both in puzzle-solving accuracy and out-of-distribution (OOD) generalization in 22 OOD scenarios. |
| Researcher Affiliation | Collaboration | Kalliopi Basioti¹, Pritish Sahu², Qingze Tony Liu¹, Zihao Xu¹, Hao Wang¹, Vladimir Pavlovic¹; ¹Rutgers University, ²SRI International |
| Pseudocode | No | The paper describes the generative and inference models with equations and textual descriptions, and presents a graphical model in Figure 1, but it does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We assessed GenVP with the RAVEN-based datasets (RAVEN (Zhang et al., 2019a), I-RAVEN (Hu et al., 2021), and RAVEN-FAIR (Benny et al., 2021)), as well as VAD (Hill et al., 2019) and PGM (Barrett et al., 2018). |
| Dataset Splits | Yes | Each training set consists of 1.2 million puzzles, and each testing set consists of 200,000 puzzles. The training set consists of 600K examples, and the testing set consists of 100K. |
| Hardware Specification | Yes | All the models are trained on a server with 24GB NVIDIA RTX A5000 GPUs, 512GB RAM, and Ubuntu 20.04. For the efficiency and scalability evaluations, we used a server with 48GB NVIDIA RTX A6000 GPUs and dual AMD EPYC 7352 CPUs @ 2.3GHz (48 cores, 96 vCores). |
| Software Dependencies | No | The paper mentions the 'AdamW algorithm (Loshchilov & Hutter, 2017)' and 'PyTorch' but does not specify their version numbers. |
| Experiment Setup | Yes | In both cases, we used the AdamW algorithm (Loshchilov & Hutter, 2017) with a learning rate of 10⁻⁴. We set the batch size to B = {RAVEN-based: 100, PGM: 400, VAD: 400} RPM puzzles, which means that we use B valid puzzles for ELBO and global contrasting, and a batch size of A = {RAVEN-based: 7, PGM: 7, VAD: 3} for the local contrasting loss. For the β hyperparameters we set β1 = 1, β2 = 0, β3 = 1, β4 = 1, β5 = 1, β6 = 1, βR = 250, βG = 20, βL = 20. |
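The reported experiment setup can be collected into a single configuration object. The sketch below is a hypothetical, stdlib-only rendering of those hyperparameters (the dict keys and the `batch_sizes` helper are illustrative names, not from any released code); it assumes a PyTorch-style AdamW optimizer would consume `learning_rate` at training time.

```python
# Hypothetical config sketch assembling the hyperparameters reported in the
# paper; key names and the helper function are illustrative assumptions.
config = {
    "optimizer": "AdamW",        # Loshchilov & Hutter, 2017
    "learning_rate": 1e-4,
    # Batch size B of valid puzzles (ELBO + global contrasting), per dataset:
    "batch_size_B": {"RAVEN-based": 100, "PGM": 400, "VAD": 400},
    # Batch size A for the local contrasting loss, per dataset:
    "batch_size_A": {"RAVEN-based": 7, "PGM": 7, "VAD": 3},
    # Weights for the loss terms:
    "betas": {"beta1": 1, "beta2": 0, "beta3": 1, "beta4": 1,
              "beta5": 1, "beta6": 1, "betaR": 250, "betaG": 20, "betaL": 20},
}

def batch_sizes(dataset: str) -> tuple:
    """Return the (B, A) batch-size pair for a dataset family."""
    return config["batch_size_B"][dataset], config["batch_size_A"][dataset]
```

A config dict like this makes the reproducibility-relevant values greppable and easy to log alongside results.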