Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online Non-stochastic Control with Partial Feedback

Authors: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, empirical studies in both synthetic linear and simulated nonlinear tasks validate our method's effectiveness, thus supporting the theoretical findings." Keywords: online non-stochastic control, partial feedback, dynamic regret, online ensemble, online learning with memory, bandit convex optimization
Researcher Affiliation | Academia | Yu-Hu Yan (EMAIL), Peng Zhao (EMAIL), Zhi-Hua Zhou (EMAIL); National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China
Pseudocode | Yes | Algorithm 1 (Base Algorithm). Input: memory H, dimension d, domain K, shrinkage α, perturbation δ, step size η. 1: Initialize the corresponding variables. 2: Initialize w_1, ..., w_H with any feasible decisions for the first H rounds. For t = H+1, ..., T do: 3: Submit decision w_t and receive cost f_t(w_{t−H:t}) + ε_t. 4: Draw a random bit b_t ~ Bernoulli(1/H). If t ≥ H and b_t · ∏_{i=1}^{H−1} (1 − b_{t−i}) = 1 then: 5: Estimate the gradient via (4.3). 6: Update the decision via (4.4). Else: 7: Maintain the decision w_{t+1} = w_t.
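The control flow of Algorithm 1 can be sketched in code. The paper's gradient estimator (4.3) and update rule (4.4) are not reproduced in this report, so the sketch below substitutes a standard one-point spherical gradient estimate and a projected gradient step as placeholder assumptions; the ball-shaped domain, function names, and default hyperparameter values are likewise illustrative, not the authors' implementation.

```python
import numpy as np

def project_to_ball(w, radius):
    """Euclidean projection onto an l2 ball (stand-in for projection
    onto the shrunk domain (1 - alpha) * K)."""
    norm = np.linalg.norm(w)
    return w if norm <= radius else w * (radius / norm)

def base_algorithm(cost_oracle, T, H, d, radius=1.0, alpha=0.1,
                   delta=0.1, eta=0.01, rng=None):
    """Sketch of the bandit base algorithm's control flow.

    cost_oracle(window) returns the noisy scalar cost
    f_t(w_{t-H}, ..., w_t) + eps_t for the last H+1 decisions.
    """
    rng = np.random.default_rng(rng)
    # Line 2: any feasible decisions for the first H rounds.
    ws = [np.zeros(d) for _ in range(H)]
    w = np.zeros(d)
    recent_bits = [0] * (H - 1)      # b_{t-1}, ..., b_{t-H+1}
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)           # unit perturbation direction

    for t in range(H, T):
        # Line 3: submit a (perturbed) decision, observe a noisy cost.
        w_play = w + delta * u
        ws.append(w_play)
        cost = cost_oracle(ws[-(H + 1):])
        # Line 4: draw an exploration bit b_t ~ Bernoulli(1/H).
        b_t = rng.binomial(1, 1.0 / H)
        # Update only when b_t = 1 and the previous H-1 bits were all 0,
        # i.e. b_t * prod_{i=1}^{H-1} (1 - b_{t-i}) = 1.
        if b_t == 1 and all(b == 0 for b in recent_bits):
            # One-point gradient estimate (placeholder for Eq. (4.3)).
            grad_est = (d / delta) * cost * u
            # Projected step on the shrunk domain (placeholder for (4.4)).
            w = project_to_ball(w - eta * grad_est, (1 - alpha) * radius)
            # Fresh perturbation direction for the next estimate.
            u = rng.standard_normal(d)
            u /= np.linalg.norm(u)
        # else (line 7): keep w_{t+1} = w_t.
        recent_bits = ([b_t] + recent_bits)[: H - 1]
    return w
```

The Bernoulli(1/H) gate and the "previous H−1 bits all zero" check ensure updates are spaced at least H rounds apart on average, so a single update's effect on the H-step memory of the cost can be controlled.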
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | No | The paper describes experiments in "synthetic linear" and "simulated nonlinear environments" such as "pendulum," "cartpole," and "data center cooling." These are either generated data or standard simulation setups, but the paper does not provide concrete access information (e.g., URLs, DOIs, specific citations to public dataset repositories) for any external publicly available datasets.
Dataset Splits | No | The paper describes experiments in synthetic and simulated environments (time-varying linear dynamical system, pendulum, cartpole, data center cooling) and discusses how cost functions change (gradual, abrupt, mixture). It mentions restarting algorithms periodically to handle nonlinearity. However, it does not specify explicit training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) in the traditional sense for empirical evaluation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU models, CPU models, or memory specifications) used for running its experiments. It only mentions general experimental settings without hardware specifics.
Software Dependencies | No | The paper mentions OpenAI Gym in the context of the cartpole environment. However, it does not provide specific version numbers for OpenAI Gym or any other software libraries, frameworks, or programming languages used in the implementation.
Experiment Setup | No | The paper states: "All hyperparameters are set to be theoretically optimal except the learning rate of the meta learners, which are scaled by constants to speed up the learning process." This is a general statement about hyperparameter tuning but does not provide concrete values for any specific hyperparameters (e.g., learning rate, batch size, number of epochs, specific optimizer settings) used in the experiments.