Decision-Focused Learning: Foundations, State of the Art, Benchmark and Future Opportunities

Authors: Jayanta Mandi, James Kotary, Senne Berden, Maxime Mulamba, Victor Bucarey, Tias Guns, Ferdinando Fioretto

JAIR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | It evaluates the strengths and limitations of these techniques and includes an extensive empirical evaluation of eleven methods across seven problems. The survey also offers insights into recent advancements and future research directions in DFL. Section 5 brings forth seven benchmark DFL tasks from public datasets, with a comparative evaluation of eleven DFL techniques.
Researcher Affiliation | Academia | Jayanta Mandi (KU Leuven, Belgium), James Kotary (University of Virginia, USA), Senne Berden (KU Leuven, Belgium), Maxime Mulamba (Vrije Universiteit Brussel, Belgium), Víctor Bucarey (Universidad de O'Higgins, Chile), Tias Guns (KU Leuven, Belgium), Ferdinando Fioretto (University of Virginia, USA)
Pseudocode | Yes | Algorithm 1: Gradient-descent in prediction-focused learning; Algorithm 2: Gradient-descent in decision-focused learning with regret as task loss
Open Source Code | Yes | The code and data used in the benchmarking are accessible through https://github.com/PredOpt/predopt-benchmarks.
Open Datasets | Yes | The code and data used in the benchmarking are accessible through https://github.com/PredOpt/predopt-benchmarks. Section 5 brings forth seven benchmark DFL tasks from public datasets, with a comparative evaluation of eleven DFL techniques.
Dataset Splits | Yes | For the sake of completeness, the data generation process is described below. Each problem instance has a cost vector of dimension |E| = 40 and a feature vector of dimension p = 5. The training data consists of {(z_i, c_i)}_{i=1}^N, which are generated synthetically. ... Furthermore, for each setting, a different training set of size 1000 is used. In each case, the final performance of the model is evaluated on a test set of size 10,000.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU/CPU models, memory specifications, or computing-cluster details.
Software Dependencies | Yes | Each model for each setup is trained using pytorch (Paszke et al., 2019) and pytorch-lightning (Falcon et al., 2019) with the adam optimizer (Kingma & Ba, 2015) and the ReduceLROnPlateau (PyTorch, 2017) learning rate scheduler. For QPTL, the QP problems are solved using cvxpylayers (Agrawal et al., 2019a). For other techniques, which treat the CO solver as a black-box solver, gurobi (Gurobi Optimization, 2021) or ortools (Perron & Furnon, 2020) is used as the solver.
Experiment Setup | Yes | In the experimental evaluations, hyperparameter tuning is performed via grid search: each hyperparameter is tried over a predetermined set of values. The hyperparameter of each model for each experiment is selected based on performance on the validation dataset. For each hyperparameter, the range of values defined in Table 3 is considered.
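The regret task loss named in Algorithm 2 measures the extra true cost incurred by optimizing over predicted rather than true costs. A minimal sketch, assuming a linear objective and a small feasible set that can be enumerated (the function names `solve` and `regret` and the brute-force solver are illustrative, not the paper's implementation):

```python
import numpy as np

def solve(cost, feasible):
    """Brute-force oracle: the feasible solution x minimizing cost . x
    (stands in for the paper's CO solvers such as gurobi or ortools)."""
    return feasible[np.argmin(feasible @ cost)]

def regret(pred_cost, true_cost, feasible):
    """Extra true cost paid when the decision is made from predicted costs."""
    x_pred = solve(pred_cost, feasible)  # decision induced by the prediction
    x_true = solve(true_cost, feasible)  # optimal decision in hindsight
    return true_cost @ x_pred - true_cost @ x_true
```

For example, with the feasible set {(1, 0), (0, 1)}, true costs (1, 2), and predicted costs (3, 1), the prediction selects the second solution (true cost 2) while the optimum is the first (true cost 1), giving regret 1. Regret is always non-negative and is zero whenever the predicted costs induce an optimal decision, which is why it serves as the task loss in decision-focused learning.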
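The dataset-splits row quotes a synthetic setup with 5-dimensional features, 40-dimensional cost vectors, training sets of size 1000, and test sets of size 10,000. A sketch of such a split, assuming a hypothetical linear ground-truth mapping with Gaussian noise (the paper's actual generative process differs; see its Section 5):

```python
import numpy as np

rng = np.random.default_rng(0)
p, d = 5, 40                 # feature and cost-vector dimensions from the paper
n_train, n_test = 1000, 10_000

# Illustrative ground-truth mapping from features z to costs c; the matrix B
# and the 0.1 noise scale are assumptions, not the paper's specification.
B = rng.normal(size=(d, p))

def make_split(n):
    """Draw n synthetic (feature, cost) pairs."""
    Z = rng.normal(size=(n, p))
    C = Z @ B.T + 0.1 * rng.normal(size=(n, d))
    return Z, C

Z_train, C_train = make_split(n_train)
Z_test, C_test = make_split(n_test)
```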
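The experiment-setup row describes grid search with selection on a validation set. The mechanics can be sketched as follows (the `grid_search` helper and the toy scoring function are illustrative, not the paper's tuning code; the actual value ranges are in the paper's Table 3):

```python
from itertools import product

def grid_search(grid, evaluate):
    """Try every combination of hyperparameter values in the grid and keep
    the configuration with the lowest validation score."""
    best_cfg, best_score = None, float("inf")
    for combo in product(*grid.values()):
        cfg = dict(zip(grid.keys(), combo))
        score = evaluate(cfg)  # e.g. validation regret of a model trained with cfg
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

In practice `evaluate` would train a model with the given configuration and return its regret on the validation split, so the selected hyperparameters are the ones the paper's protocol would report for that experiment.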