Multi-Objective Quantile-Based Reinforcement Learning for Modern Urban Planning

Authors: Lukasz Pelcner, Leandro Soriano Marcolino, Matheus Aparecido do Carmo Alves, Paula A. Harrison, Peter M. Atkinson

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental We also present experimental results, validating our approach and discussing its broader implications for future urban planning, along with potential extensions to enhance scalability and applicability. The primary metrics used for evaluation include: To evaluate the performance of our proposed method, we performed experiments considering three stakeholders' preferences: We benchmarked our framework against three state-of-the-art baselines across four scenarios reflecting diverse stakeholder preferences. Our method consistently outperformed these baselines, achieving statistically significant improvements in key metrics such as land-use alignment, woodland preservation, and urban proximity.
Researcher Affiliation Academia 1 Lancaster University; 2 University of São Paulo; 3 UK Centre for Ecology & Hydrology. EMAIL, EMAIL and EMAIL
Pseudocode Yes Pseudocode for QOLU closely follows a Distributional DQN with quantile regression, as shown in Algorithm 1. Algorithm 2 defines BUIA's pseudocode, showing the critical role of the attractiveness metric in guiding bottom-up decisions.
Open Source Code No The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a direct link to a code repository. The text discusses implementation details but not public availability.
Open Datasets Yes 6 UKCEH Dataset. In environmental research, the availability of extensive and long-term datasets is crucial for the development and application of advanced algorithms. The UK Centre for Ecology & Hydrology (UKCEH) stands as a key contributor, offering a valuable repository of data that not only informs scientific endeavors but also facilitates practical applications in real-life scenarios. This section highlights this dataset's utility as a crucial piece in developing and applying an RL MAS. As users of the UKCEH data, our primary objective is to leverage the land use dataset to enhance our understanding of the environmental dynamics in the southwest region of the United Kingdom. Spanning the years 2015 to 2021, this dataset serves as a vital component in our larger effort to create and implement a reinforcement learning multi-agent system, designed to navigate the complexities of real-world scenarios. Our interest lies in the practical application of this dataset as we work towards developing algorithms that can adapt and learn within dynamic environmental contexts. By incorporating the UKCEH's land use data, we aim to enrich our understanding of the region, enabling our reinforcement learning multi-agent system to operate effectively in real-life settings. [UK Centre for Ecology & Hydrology, 2023] UK Centre for Ecology & Hydrology. UKCEH Land Cover Map (2015–2021 series). https://www.ceh.ac.uk/services/land-cover-map, 2023. Accessed: 2025-06-10.
Dataset Splits No The paper describes a simulated environment where agents interact and make decisions, rather than using a static dataset with predefined training, validation, and test splits. The experimental setup details how the environment is partitioned and how parcels are sampled during the simulation, but not in the context of standard dataset splits. The text states: "The environment is partitioned into blocks of 40 × 40 pixel units, referred to as parcels, as shown in Figure 1." and "Simulation Iteration and Land Conversion. An iteration of the simulation is deemed complete once the reward allocation is finalized. It is noteworthy that during each iteration, 50 parcels are sampled with replacement, allowing for the potential conversion of up to 11 land types into urban areas."
Hardware Specification No The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, memory specifications, or accelerator types.
Software Dependencies No The paper mentions the use of 'Optimizer Adam' in Table 1 but does not specify version numbers for any key software components, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup Yes Table 1 (QOLU's architecture and hyper-parameters information): Conv. block: 3x3 kernels, 32, 64, and 128 filters; Flatten layer; Residual MLP: 2 fully-connected layers (256 units) + skip connection around the pair; Quantile head: 51 atoms with V ∈ [-200, 200]; Optimizer: Adam; Mini-batch size: 256; Discount factor γ: 0.99; Exploration: ϵ-greedy (schedule not fixed). Table 2 (BUIA's architecture information): Conv. block: 1 convolutional layer, kernel and filter counts matching the input; MLP: 128, 128, 64; Output: softmax over current candidate parcels. The process is repeated for 1000 iterations.
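
The rows above quote three mechanical steps: partitioning the environment into 40 × 40 pixel parcels, sampling 50 parcels with replacement per iteration, and a softmax output over the current candidate parcels. The following is a minimal Python sketch of those steps, not the authors' implementation. The function names, the handling of partial edge blocks (dropped here), and the placeholder scores are assumptions made for illustration only.

```python
import math
import random

PARCEL_SIZE = 40            # parcels are 40 x 40 pixel blocks (quoted setup)
PARCELS_PER_ITERATION = 50  # parcels sampled with replacement per iteration


def partition_into_parcels(height, width, size=PARCEL_SIZE):
    """Return the (row, col) pixel origin of every full size x size parcel.

    Assumption: partial blocks at the right and bottom edges are dropped.
    """
    return [
        (row, col)
        for row in range(0, height - size + 1, size)
        for col in range(0, width - size + 1, size)
    ]


def sample_parcels(parcels, k=PARCELS_PER_ITERATION, rng=random):
    """Sample k parcels with replacement, so a parcel may be drawn twice."""
    return rng.choices(parcels, k=k)


def softmax(scores):
    """Numerically stable softmax over a list of candidate-parcel scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)                       # fixed seed for repeatability
    parcels = partition_into_parcels(400, 800)   # hypothetical map size
    batch = sample_parcels(parcels, rng=rng)
    # Placeholder scores; in the paper these would come from BUIA's network.
    probabilities = softmax([0.0] * len(batch))
    print(len(parcels), len(batch), sum(probabilities))
```

Sampling with `random.choices` rather than `random.sample` is what gives the with-replacement behaviour described in the quoted text. A full run would repeat the sampling step for the 1000 iterations mentioned in the setup.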