Reinforcement Learning from Optimization Proxy for Ride-Hailing Vehicle Relocation

Authors: Enpeng Yuan, Wenbo Chen, Pascal Van Hentenryck

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on the New York City dataset show that the RLOP approach significantly reduces both relocation costs and computation time compared to the optimization model, while pure reinforcement learning fails to converge due to computational complexity. The proposed RLOP framework is evaluated on Yellow Taxi Data in Manhattan, New York City (NYC, 2019). Section 6 reports the experimental results on a large-scale case study in New York City.
Researcher Affiliation | Academia | Enpeng Yuan EMAIL, Wenbo Chen EMAIL, Pascal Van Hentenryck EMAIL; School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
Pseudocode | Yes | Algorithm 1: RLOP
Open Source Code | No | The paper does not explicitly state that the authors' implementation code is open source, nor does it provide a link to a code repository. It mentions using Gurobi and PyTorch, which are third-party tools.
Open Datasets | Yes | The proposed RLOP framework is evaluated on Yellow Taxi Data in Manhattan, New York City (NYC, 2019). ... NYC (2019). NYC Taxi & Limousine Commission Trip Record Data. Accessed: 2020-10-01.
Dataset Splits | Yes | The optimization proxy is trained from 2017/01 to 2017/05, 8am to 9am, Monday to Friday ... In total, 15,000 data points are used in training and 2,500 data points are held out for testing. ... The policy is validated on other instances in 2017/05 after each training episode.
Hardware Specification | Yes | All the models are solved using Gurobi 9.1 with 24 cores of a 2.1 GHz Intel Skylake Xeon CPU (Gurobi Optimization, 2021).
Software Dependencies | Yes | All the models are solved using Gurobi 9.1 ... It is trained in PyTorch by the Adam optimizer.
Experiment Setup | Yes | Specifically, the MLP has two hidden layers of (128, 128) units with hyperbolic tangent (tanh) activation functions. It is trained in PyTorch by the Adam optimizer with batch size 32 and learning rate 10^-3 ... Algorithm 1 with the baseline is run with α = 0.005, β = 0.75, and γ = 0.75. The sampling variance Σ_ii is taken as 0.05·a_i^0.75, where a_i^0.75 is the 75th percentile of action a_i in the supervised-learning data set.
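The reported network and optimizer settings can be sketched in PyTorch as follows. This is a minimal illustration, not the authors' code: the input and output dimensions (`INPUT_DIM`, `OUTPUT_DIM`) and the MSE loss are placeholders, since the excerpt does not specify them.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions; the excerpt does not state the proxy's input/output sizes.
INPUT_DIM, OUTPUT_DIM = 64, 10

# Two hidden layers of (128, 128) units with tanh activations, as reported.
model = nn.Sequential(
    nn.Linear(INPUT_DIM, 128),
    nn.Tanh(),
    nn.Linear(128, 128),
    nn.Tanh(),
    nn.Linear(128, OUTPUT_DIM),
)

# Adam with learning rate 10^-3, as reported; the loss function is a guess.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(x: torch.Tensor, y: torch.Tensor) -> float:
    """One supervised-learning step on a minibatch."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

# One minibatch of the stated size 32, with random stand-in data.
x = torch.randn(32, INPUT_DIM)
y = torch.randn(32, OUTPUT_DIM)
loss = train_step(x, y)
```

Training would loop such steps over the roughly 15,000 training points, holding out the 2,500-point test set for evaluation.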