Conformalized Interactive Imitation Learning: Handling Expert Shift and Intermittent Feedback
Authors: Michelle Zhao, Henny Admoni, Reid Simmons, Aaditya Ramdas, Andrea Bajcsy
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare Conformal DAgger to prior uncertainty-aware DAgger methods in scenarios where the distribution shift is (and isn’t) present because of changes in the expert’s policy. We find that in simulated and hardware deployments on a 7DOF robotic manipulator, Conformal DAgger detects high uncertainty when the expert shifts and increases the number of interventions compared to baselines, allowing the robot to more quickly learn the new behavior. [...] We instantiate Conformal DAgger in a simulated 4D robot goal-reaching task and in hardware on a 7 degree-of-freedom robotic manipulator [...] |
| Researcher Affiliation | Academia | a Robotics Institute, School of Computer Science, Carnegie Mellon University b Departments of Statistics and Machine Learning, Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1 Conformal DAgger (changes from DAgger (Ross et al., 2011) highlighted) |
| Open Source Code | No | The paper states: "Project page at cmu-intentlab.github.io/conformalized-interactive-il/." This is a project page and does not explicitly state that the source code for the methodology is provided, nor is it a direct link to a code repository. |
| Open Datasets | Yes | We test on three benchmark datasets from Angelopoulos et al. (2023): (1) Amazon stock prices, (2) Google stock prices (Nguyen, 2018), and the (3) Elec2 dataset (Harries, 1999). |
| Dataset Splits | No | The paper describes an iterative online learning process where data is aggregated after each deployment episode. For the time series datasets, it mentions a 'lookback window k = 100 timesteps' for the nonconformity score calculation but does not specify explicit training/test/validation splits for model training or evaluation of the conformal intervals. |
| Hardware Specification | No | The paper mentions "hardware deployments on a 7DOF robotic manipulator" and "a 7 degree-of-freedom robotic manipulator" and a "Meta Quest 3 remote controller". However, it does not specify the computational hardware (e.g., GPU/CPU models, memory) used for running or training the models. |
| Software Dependencies | No | The paper states that base prediction models were "all trained via darts (Herzen et al., 2022)" and refers to using a "CNN-based diffusion policy (Chi et al., 2023)" and a "ResNet-18 visual encoder". However, it does not provide specific version numbers for any of these software components or libraries. |
| Experiment Setup | Yes | Conformal DAgger uses an uncertainty threshold of 0.06, temperature β = 100, lookback window k = 100, lr = 0.6, and initial q_0^{lo,hi} = 0.01. [...] Ensemble DAgger [...] We use 3 ensemble members, an uncertainty threshold of 0.06 for the ensemble disagreement, and a safety classifier threshold of s = 0.03. Lazy DAgger uses s = 0.03 to begin human intervention, and only switches back to autonomous mode when the deviation between the learner's prediction and the expert's action falls below a context-switching threshold of 0.1·s. To make Safe DAgger's number of initial interventions comparable, we decrease the safety classifier threshold to s = 0.01. The initial robot policy at i = 0 is trained on a dataset D0 of 10 expert trajectories with synthetically injected noise drawn from N(1, 0.5). |
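The setup row above reports an online quantile update (lr = 0.6, initial q = 0.01) in the style of the online conformal methods of Angelopoulos et al. (2023) that the paper benchmarks against. The sketch below is a minimal illustration of that kind of quantile-tracking update, not the paper's actual implementation; the target miscoverage level `alpha` and the synthetic score distribution are assumptions for demonstration only.

```python
import numpy as np

def update_quantile(q, score, alpha, lr):
    """One step of an online quantile-tracking update: widen the
    conformal interval after a miscoverage event (score exceeds the
    current quantile), shrink it slightly otherwise."""
    err = 1.0 if score > q else 0.0
    return q + lr * (err - alpha)

# Toy run using the reported settings: lr = 0.6, initial quantile 0.01.
rng = np.random.default_rng(0)
q = 0.01
alpha = 0.1  # target miscoverage level (assumed, not stated in the table)
scores = rng.normal(1.0, 0.5, size=200)  # stand-in nonconformity scores
for s in scores:
    q = update_quantile(q, s, alpha, lr=0.6)
```

After enough steps, `q` settles near the (1 − alpha) empirical quantile of the score stream, which is the behavior such updates are designed to achieve regardless of the score distribution.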