Digi-Q: Learning VLM Q-Value Functions for Training Device-Control Agents

Authors: Hao Bai, Yifei Zhou, Li Erran Li, Sergey Levine, Aviral Kumar

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The goal of our experiments is to evaluate the efficacy of Digi-Q in producing effective Q-functions that in turn are able to train strong Android device control agents. Our experiments will answer the following questions: (1) How does Digi-Q compare with other state-of-the-art agent training algorithms, previously studied in the context of Android device control tasks? and (2) Can Digi-Q learn effectively from past interaction data? In addition, we perform several ablation experiments to understand the effects of various components of Digi-Q: to understand the benefits of using representation fine-tuning and to validate the efficacy of the Best-of-N reranking approach for training the policy using the value function.
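The Best-of-N reranking idea mentioned above can be sketched in a few lines. This is a hypothetical stand-in, not the paper's implementation: `propose_action` and `q_value` are placeholder names for an action proposer and a learned Q-function, and the toy scalar "actions" are purely illustrative.

```python
import random

def best_of_n_action(q_value, propose_action, state, n=8):
    """Sample N candidate actions from a proposal policy and keep the one
    the learned Q-function scores highest (Best-of-N reranking)."""
    candidates = [propose_action(state) for _ in range(n)]
    return max(candidates, key=lambda a: q_value(state, a))

# Toy illustration: a stand-in "Q-function" that prefers actions near 0.5.
random.seed(0)
toy_q = lambda s, a: -abs(a - 0.5)
toy_proposer = lambda s: random.random()
best = best_of_n_action(toy_q, toy_proposer, state=None, n=16)
```

In this scheme the Q-function never needs to be differentiated through the policy; it only ranks a finite candidate set, which is what makes reranking attractive for large VLM policies.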
Researcher Affiliation | Collaboration | Hao Bai (UIUC), Yifei Zhou (UC Berkeley), Erran Li (Amazon), Sergey Levine (UC Berkeley), Aviral Kumar (CMU). Equal contribution; correspondence to: EMAIL, yifei EMAIL
Pseudocode | Yes | A DETAILS ON THE ALGORITHM: For completeness, we include a detailed pseudo-code of Digi-Q in Algorithm 1. Algorithm 1: Digi-Q: Practical Framework.
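Algorithm 1 itself appears in the paper's appendix; as a rough structural sketch only (a tabular toy stand-in, not the authors' pseudocode, which uses a VLM-based Q-function), the loop alternates TD-style Q-function fitting on offline transitions with greedy policy extraction from the learned Q-function:

```python
from collections import defaultdict

def fit_q_td(transitions, gamma=0.9, lr=0.5, epochs=50, actions=(0, 1)):
    """Fit Q(s, a) by TD(0) regression over a fixed offline dataset
    of (state, action, reward, next_state, done) tuples."""
    q = defaultdict(float)
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
            q[(s, a)] += lr * (target - q[(s, a)])
    return q

def extract_policy(q, states, actions=(0, 1)):
    """Greedy policy extraction: pick the action the Q-function ranks highest."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}

# Toy offline dataset: from state 0, action 1 reaches a rewarding terminal state.
data = [(0, 1, 1.0, 1, True), (0, 0, 0.0, 1, True)]
policy = extract_policy(fit_q_td(data), states=[0])
# policy[0] == 1
```

The key property mirrored here is that both stages consume only previously collected transitions, which is what lets Digi-Q learn from past interaction data without fresh environment rollouts during Q-function training.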
Open Source Code | Yes | The project is open-sourced at https://github.com/DigiRL-agent/digiq
Open Datasets | Yes | We evaluate our results on Android-in-the-Wild (AitW) with an offline dataset containing 1296 trajectories for the AitW Web Shopping subset and 1008 trajectories for the AitW General subset, collected from a pre-trained AutoUI checkpoint, following Bai et al. (2024).
Dataset Splits | No | The paper mentions that it evaluates results with the autonomous evaluator on the first 96 instructions in the train and test sets, and collects 1296 trajectories for the Web Shopping subset and 1008 for the General subset. However, it does not explicitly provide the percentages or counts of the training, validation, and test splits used for model training from these trajectories.
Hardware Specification | No | We thank Google Cloud for providing Gemini 1.5 Pro credit donations for academic use and some GPU and TPU resources. We also thank the NCSA Delta cluster admins for providing us with GPU resources for training.
Software Dependencies | No | We encode the text strings with BERT and images with a BLIP-2 model. Then we concatenate all these feature vectors and pass them through an MLP that predicts the V value. We use LLaVA-1.5 (Liu et al., 2024a) as the backbone VLM for our Q- and V-functions.
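The described value head (concatenated text and image embeddings fed to an MLP that regresses a scalar V value) can be sketched as follows. This is a minimal NumPy illustration under assumptions: the embedding dimensions, hidden width, and initialization are hypothetical, and in practice the features would come from actual BERT and BLIP-2 encoders.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, IMG_DIM, HIDDEN = 768, 1408, 256  # illustrative sizes only

# Randomly initialized two-layer MLP weights (stand-in for a trained head).
W1 = rng.standard_normal((TEXT_DIM + IMG_DIM, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, 1)) * 0.01
b2 = np.zeros(1)

def v_value(text_emb, img_emb):
    """Concatenate per-modality feature vectors and predict a scalar V value."""
    x = np.concatenate([text_emb, img_emb])
    h = np.maximum(x @ W1 + b1, 0.0)  # ReLU hidden layer
    return float(h @ W2 + b2)

v = v_value(rng.standard_normal(TEXT_DIM), rng.standard_normal(IMG_DIM))
```

Freezing the encoders and training only a small head like this keeps value regression cheap relative to fine-tuning the full VLM backbone.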
Experiment Setup | Yes | Hyperparameters for Digi-Q are carefully tuned through binary search on the training sets of the General and Web Shopping subsets. The final choice of hyperparameters for both methods can be found in Table 4. ... Table 4: Hyperparameters for Digi-Q on both the General and Web Shopping subsets of AitW. If multiple values are displayed, the bolded value represents the selected value after hyperparameter sweeping.