Fully General Online Imitation Learning

Authors: Michael K. Cohen, Marcus Hutter, Neel Nanda

JMLR 2022

Reproducibility Variable Result LLM Response
Research Type Experimental We now walk through a toy example, in which our imitation learner has about a half-million demonstrator models in its model class Π... Running it with 20 different random seeds, the number of queries required is 486.75 ± 52.63 (out of 2^15 timesteps), and no client ever quit. Returning to run depicted in Figure 1, Table 1 works through an example of the posterior and the imitator's behavior.
Researcher Affiliation Collaboration Michael K. Cohen (EMAIL): Department of Engineering Science, University of Oxford; Future of Humanity Institute, Oxford, UK OX1 3PJ. Marcus Hutter (EMAIL): DeepMind; Department of Computer Science, Australian National University, Acton, ACT, Australia 2601. Neel Nanda (EMAIL): Independent.
Pseudocode No The paper describes the imitator's policy in Section 4, 'Imitation', using mathematical equations (3) and (4) but does not present the steps in a structured pseudocode or algorithm block format.
Open Source Code Yes The code for this toy example can be found at https://tinyurl.com/imitation-toy-example.
Open Datasets No The action space A of the demonstrator is ∅ ∪ {0, 1}^4. The observation space O is {∅, 1, 2, 3}. ... Each observation is randomly sampled; it is 1 with probability 1/4, 2 with probability 1/16, 3 with probability 1/64, and otherwise ∅.
Dataset Splits No The paper describes a synthetic environment for a 'toy example' where observations are randomly sampled during online learning. It mentions running with '20 different random seeds' but does not specify train/test/validation splits for a fixed dataset.
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, or memory) used for running the experiments or simulations.
Software Dependencies No The paper mentions a URL for the code, which might imply a programming language like Python, but it does not specify any particular software libraries, frameworks, or their version numbers.
Experiment Setup Yes For an imitator with α = 1e-14, Figure 1 shows how often it has to query the demonstrator to pick the restaurant features. Recommendations are random, and this is only one run. Running it with 20 different random seeds, the number of queries required is 486.75 ± 52.63 (out of 2^15 timesteps), and no client ever quit.
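
The observation distribution quoted in the Open Datasets row can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the names `OBS_PROBS`, `sample_observation`, and `empirical_frequencies` are hypothetical, and `None` stands in for the null observation. The quoted probabilities 1/4, 1/16, and 1/64 leave 43/64 for the null observation.

```python
import random
from fractions import Fraction

# Probabilities quoted in the "Open Datasets" row.
# None is a stand-in (an assumption of this sketch) for the null observation.
OBS_PROBS = {1: Fraction(1, 4), 2: Fraction(1, 16), 3: Fraction(1, 64)}
OBS_PROBS[None] = 1 - sum(OBS_PROBS.values())  # remaining mass: 43/64


def sample_observation(rng):
    """Draw one observation by inverting the cumulative distribution."""
    u = rng.random()
    cumulative = 0.0
    for obs, p in OBS_PROBS.items():
        cumulative += float(p)
        if u < cumulative:
            return obs
    return None  # guards against floating-point round-off


def empirical_frequencies(n, seed=0):
    """Sample n observations and return the observed frequency of each value."""
    rng = random.Random(seed)
    counts = {obs: 0 for obs in OBS_PROBS}
    for _ in range(n):
        counts[sample_observation(rng)] += 1
    return {obs: c / n for obs, c in counts.items()}
```

Calling `empirical_frequencies(10_000)` should return frequencies close to 0.25, 0.0625, 0.0156, and 0.672 for the observations 1, 2, 3, and null respectively; the exact values depend on the seed.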