Active Fine-Tuning of Multi-Task Policies
Authors: Marco Bagatella, Jonas Hübotter, Georg Martius, Andreas Krause
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experiment section is designed to evaluate active multi-task fine-tuning and provide an empirical answer to several questions. |
| Researcher Affiliation | Academia | 1Department of Computer Science, ETH Zürich, Zürich, Switzerland 2Max Planck Institute for Intelligent Systems, Tübingen, Germany 3University of Tübingen, Tübingen, Germany. Correspondence to: Marco Bagatella <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 AMF. Input: initial policy π₀, budget N, desired task distribution µ_c. Output: fine-tuned policy π_N. Initialize dataset D₀ = ∅; for n ∈ [0, …, N − 1] do: compute cₙ as the solution to Eq. 2; collect a new demonstration τₙ for task cₙ; if (n + 1) mod B = 0 then D_{n+1} = D_{n+1−B} ∪ {c_{n−B+1:n}, τ_{n−B+1:n}} and update π_{n+1} from π_{n+1−B} with D_{n+1}; end if; end for. |
| Open Source Code | Yes | In order to ease reproducibility, we open-source our codebase on the project's repo (github.com/marbaga/amf). |
| Open Datasets | Yes | In Metaworld (Yu et al., 2020) we create a scene... In Franka Kitchen (Fu et al., 2020)... We consider the Robomimic benchmark... Octo is pretrained on a large-scale real-world robotic dataset (Collaboration, 2023)... |
| Dataset Splits | No | The paper describes how pre-training demonstrations are allocated across tasks and how evaluation is performed over task distributions, but does not specify fixed training/test/validation splits for the policy learning process itself. The data collection is interactive. |
| Hardware Specification | No | The paper mentions 'GPU acceleration' but does not specify any particular GPU models or other detailed hardware specifications for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'sentence transformer (all-MiniLM-L6-v2)' and the 'AdamW optimizer' but does not provide specific version numbers for key software libraries or programming languages used for implementation. |
| Experiment Setup | Yes | The MLP policy has 2 layers with 256 units per layer and layer normalization (Ba, 2016). Policies are pre-trained for 200 epochs with a batch size of 256 and a learning rate of 10⁻⁴ using the AdamW optimizer (Loshchilov & Hutter, 2019). Each fine-tuning round involves 3000 gradient steps, each with a batch size of 256. |
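The control flow of Algorithm 1 (AMF) can be sketched in plain Python. This is a hedged illustration, not the authors' implementation: the task-selection criterion of Eq. 2 is abstracted behind a caller-supplied `select_task` callable, and `collect_demo`, `update_policy`, and `amf_loop` are hypothetical names introduced here for clarity.

```python
def amf_loop(policy, select_task, collect_demo, update_policy,
             budget_n, batch_b):
    """Sketch of the AMF active fine-tuning loop (Algorithm 1).

    select_task(policy)      -- stands in for solving Eq. 2 for task c_n
    collect_demo(c_n)        -- collects one demonstration tau_n for task c_n
    update_policy(policy, d) -- fine-tunes the policy on the dataset d
    """
    dataset = []   # D_0 = empty set
    pending = []   # (task, demonstration) pairs gathered since the last update
    for n in range(budget_n):
        c_n = select_task(policy)          # choose the next task to demonstrate
        tau_n = collect_demo(c_n)          # collect a demonstration for c_n
        pending.append((c_n, tau_n))
        if (n + 1) % batch_b == 0:         # update only every B rounds
            dataset.extend(pending)        # D_{n+1} = D_{n+1-B} u new pairs
            policy = update_policy(policy, dataset)
            pending = []
    return policy, dataset
```

With a budget of N = 10 and update interval B = 4, the loop performs two policy updates (after rounds 4 and 8), leaving the last two demonstrations uncommitted; any non-multiple remainder would similarly be dropped unless a final update is added.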