reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Conformalized Interval Arithmetic with Symmetric Calibration

Authors: Rui Luo, Zhixin Zhou

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	7 Application and Experiment In this section, we will introduce two main applications of the proposed methods. Then we will analyze the empirical performance of our algorithms on public datasets. [...] 7.4 Results We split the calibration and test sets equally as indicated by (7), repeating this process 100 times to record the average results and the standard deviation. Figure 2 presents the results for constructing prediction sets for subsets without overlaps, specifically for the Bike Sharing, Community Crime, and Medical Expenditure Panel Survey datasets.
Researcher Affiliation	Collaboration	Rui Luo¹*, Zhixin Zhou² ¹City University of Hong Kong, Hong Kong SAR, China ²Alpha Benito Research, Los Angeles, USA
Pseudocode	Yes	Algorithm 1: CIA with Symmetric Calibration [...] Algorithm 2: Stratified CIA with Symmetric Calibration
Open Source Code	Yes	Code https://github.com/luo-lorry/CIA [...] The code of our method will be published as open source.
Open Datasets	Yes	1. Bike Sharing (Fanaee-T 2013): This dataset is used to investigate the factors influencing bike rental demand. [...] 2. Community Crime (Redmond 2009): This dataset is used to predict the per capita violent crime rate of a community [...] 3. Medical Expenditure Panel Survey (AHRQ 2021): This dataset is used to predict the utilization of medical services [...] Dataset: Road Traffic in Anaheim and Chicago (Bar-Gera, Stabler, and Sall 2023).
Dataset Splits	Yes	We use 70% of data for training the quantile regression model and 30% for calibration and testing. [...] and allocate 50%, 10%, and 40% for training, validation, and calibration and testing.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment.
Experiment Setup	No	The paper describes general training procedures and data splits but lacks specific hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) for the models used in the experiments.
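
The split protocol quoted in the Research Type and Dataset Splits rows (70% for training, the remaining 30% divided equally into calibration and test sets, repeated 100 times) could be reproduced roughly as in the sketch below. This is an illustration only, not the authors' code: the function name `split_indices`, its parameters, the seed, and the example size of 1000 are assumptions, and the sketch does not cover the 50%/10%/40% split mentioned in the same row.

```python
import numpy as np

def split_indices(n, train_frac=0.7, n_repeats=100, seed=0):
    """Hypothetical helper: one fixed train/holdout split, then repeated
    equal calibration/test splits of the holdout portion."""
    rng = np.random.default_rng(seed)  # seed is an assumption; the paper gives none
    perm = rng.permutation(n)
    n_train = int(round(train_frac * n))
    train_idx, holdout_idx = perm[:n_train], perm[n_train:]

    splits = []
    for _ in range(n_repeats):
        # Reshuffle the holdout set and cut it in half each repetition.
        shuffled = rng.permutation(holdout_idx)
        half = len(shuffled) // 2
        splits.append((shuffled[:half], shuffled[half:]))
    return train_idx, splits

# Example with an assumed dataset size of 1000 rows.
train_idx, splits = split_indices(1000)
cal_idx, test_idx = splits[0]
```

Metrics such as coverage and interval width would then be computed once per (calibration, test) pair and summarized by their mean and standard deviation over the 100 repetitions.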