Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation

Authors: Enpeng Yuan, Wenbo Chen, Pascal Van Hentenryck

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on the New York City dataset show that the RLOP approach significantly reduces both relocation costs and computation time compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity. The proposed RLOP framework is evaluated on Yellow Taxi Data in Manhattan, New York City (NYC, 2019). Section 6 reports the experimental results on a large-scale case study in New York City.
Researcher Affiliation | Academia | Enpeng Yuan EMAIL, Wenbo Chen EMAIL, Pascal Van Hentenryck EMAIL; School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Pseudocode | Yes | Algorithm 1: RLOP
Open Source Code | No | The paper does not explicitly state that the authors' implementation code is open source, nor does it provide a link to a code repository. It mentions using Gurobi and PyTorch, which are third-party tools.
Open Datasets | Yes | The proposed RLOP framework is evaluated on Yellow Taxi Data in Manhattan, New York City (NYC, 2019). ... NYC (2019). NYC Taxi & Limousine Commission Trip Record Data. Accessed: 2020-10-01.
Dataset Splits | Yes | The optimization proxy is trained from 2017/01 to 2017/05, 8am to 9am, Monday to Friday ... In total, 15,000 data points are used in training and 2,500 data points are held out for testing. ... The policy is validated on other instances in 2017/05 after each training episode.
Hardware Specification | Yes | All the models are solved using Gurobi 9.1 with 24 cores of a 2.1 GHz Intel Skylake Xeon CPU (Gurobi Optimization, 2021).
Software Dependencies | Yes | All the models are solved using Gurobi 9.1 ... It is trained in PyTorch by the Adam optimizer.
Experiment Setup | Yes | Specifically, the MLP has two hidden layers of (128, 128) units with hyperbolic tangent (tanh) activation functions. It is trained in PyTorch by the Adam optimizer with batch size 32 and learning rate 10^-3 ... Algorithm 1 with the baseline is run with α = 0.005, β = 0.75, and γ = 0.75. The sampling variance Σ_ii is taken as 0.05·a_i^0.75, where a_i^0.75 is the 75th percentile of action a_i in the supervised-learning data set.
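The reported network and optimizer settings can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the input and output dimensions (`INPUT_DIM`, `OUTPUT_DIM`) and the MSE loss are placeholders, since the excerpt does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the excerpt does not state the proxy's input/output sizes.
INPUT_DIM, OUTPUT_DIM = 64, 10

# Two hidden layers of (128, 128) units with tanh activations, as reported.
model = nn.Sequential(
    nn.Linear(INPUT_DIM, 128),
    nn.Tanh(),
    nn.Linear(128, 128),
    nn.Tanh(),
    nn.Linear(128, OUTPUT_DIM),
)

# Adam with learning rate 10^-3, as reported; the loss function is a guess.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """One supervised-learning step on a minibatch."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# One minibatch of the stated size 32, with random stand-in data.
x = torch.randn(32, INPUT_DIM)
y = torch.randn(32, OUTPUT_DIM)
loss = train_step(x, y)
```

Training would loop such steps over the roughly 15,000 training points, holding out the 2,500-point test set for evaluation.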