Logic-Logit: A Logic-Based Approach to Choice Modeling
Authors: Shuhan Zhang, Wendi Ren, Shuang Li
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation, conducted on both synthetic datasets and real-world data from commercial and healthcare domains, demonstrates that Logic-Logit significantly outperforms baseline models in terms of interpretability and accuracy. ... To validate the effectiveness of our proposed method, this experimental section begins with evaluations of synthetic datasets with clearly defined predicates, ground truth consumer types, and decision rules. We then assess distribution learning and event prediction accuracy using two complex real-world datasets. ... We report the training and testing losses as negative log-likelihoods (Equation 3) and the top-1 accuracy for both datasets, where top-1 accuracy is the proportion of instances where the most selected product is predicted to have the highest choice probability. All results are reported as the average over 10 runs. |
| Researcher Affiliation | Academia | Shuhan Zhang, Wendi Ren & Shuang Li, CUHK-Shenzhen |
| Pseudocode | Yes | The pseudo-code and additional details are provided in Appendix A. ... Algorithm 1 Solving Frank-Wolfe Step using Column Generation ... Algorithm 2 Generate Candidate Rules |
| Open Source Code | No | The paper does not explicitly state that the source code is available, nor does it provide any links to a code repository. |
| Open Datasets | Yes | The Expedia Hotel Dataset1 was released in 2013 as part of a Kaggle competition to improve Expedia's recommendation system. ... 1https://www.kaggle.com/datasets/vijeetnigam26/expedia-hotel ... MIMIC-IV2 is an electronic health record dataset of patients admitted to the intensive care unit (ICU) (Johnson et al., 2023). ... 2https://mimic.mit.edu/ |
| Dataset Splits | Yes | To prepare our dataset, we divided it into training and testing subsets using a 3:1 ratio based on search timestamps. ... Finally, we randomly sampled 10,000 instances for the training set and 1,000 instances for the test set. ... Finally, we randomly selected 3000 records as the training set and 300 records as the test set. |
| Hardware Specification | Yes | Our experiments are applicable to both CPU and GPU. For this paper, we use an NVIDIA GeForce RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify any software libraries or frameworks with version numbers used for implementation. |
| Experiment Setup | Yes | Hyperparameter settings of deep-learning-driven choice models: We apply similar hyperparameters for all neural-network-based models, following the setting in RUMNet (Aouad & Désir, 2022). We select the standard Adam optimizer with a learning rate of 0.001 and batch sizes of 32. We set label smoothing to 0.0001, a norm-based regularization method on the neural network's outputs. ... Overall Convergence Criterion: The NLL loss decreases by less than 0.001 after a new preference type is found and the proportion is updated. The final output rule number for each preference type is set to 30 for Table 6, and to 30 and 100 for Table 8. The maximum rule number of the rule set (Rule Prune Threshold) R during column generation is set to 100. The candidate rule set size generated during each column generation iteration is set to 100 * searching rule length. The number of percentile thresholds placed on the continuous features to obtain predicates is set to 30, i.e. one threshold per 3.3%. We search rules up to 3 conjunctions. For length-1 to length-3 conjunctions we search for 10, 50, and 100 iterations, respectively. |
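The top-1 accuracy quoted under "Research Type" (the proportion of instances where the most selected product is predicted to have the highest choice probability) can be sketched as follows. This is a minimal illustrative implementation, not the authors' code; the function and variable names are assumptions.

```python
import numpy as np

def top1_accuracy(pred_probs, choice_counts):
    """Fraction of instances where the product with the highest
    predicted choice probability is also the most selected product.

    pred_probs:    (n_instances, n_products) predicted choice probabilities
    choice_counts: (n_instances, n_products) observed selection counts
    """
    pred_top = np.argmax(pred_probs, axis=1)   # predicted favorite per instance
    true_top = np.argmax(choice_counts, axis=1)  # most selected product per instance
    return float(np.mean(pred_top == true_top))

# Toy example: 3 instances, 2 products; the prediction matches the
# observed favorite in 2 of 3 instances.
probs = np.array([[0.7, 0.3], [0.2, 0.8], [0.6, 0.4]])
counts = np.array([[10, 2], [3, 9], [1, 5]])
print(top1_accuracy(probs, counts))  # → 0.666...
```

Note that `np.argmax` breaks ties by taking the first index, so instances with tied counts resolve to the lower-indexed product.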
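The losses reported above are negative log-likelihoods (Equation 3 in the paper). A minimal sketch, assuming a logit-style parameterization with a softmax over item utilities; the paper's actual likelihood mixes over preference types, which is omitted here for brevity:

```python
import numpy as np

def softmax(u):
    # Subtract the row max for numerical stability before exponentiating.
    z = u - u.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(utilities, chosen):
    """Average negative log-likelihood of the observed choices.

    utilities: (n_instances, n_products) item utilities/scores
    chosen:    (n_instances,) index of the chosen product per instance
    """
    probs = softmax(utilities)
    picked = probs[np.arange(len(chosen)), chosen]
    return float(-np.mean(np.log(picked)))

# Two instances, two products; the chosen item has the higher utility
# in both, so the NLL is small.
u = np.array([[2.0, 0.0], [0.0, 1.0]])
print(nll(u, np.array([0, 1])))  # ≈ 0.220
```

Under the quoted setup, this loss would be minimized with Adam at a learning rate of 0.001 and batch size 32, stopping once it improves by less than 0.001.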