Optimal Threshold Labeling for Ordinal Regression Methods
Authors: Ryoya Yamasaki
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments with real-world datasets, changing the labeling procedure of existing 1DT-based methods to the proposed one improved the classification performance in many of the tried cases. We performed numerical experiments using real-world (RW) datasets to answer the question of whether a modified 1DT-based method with the EOT labeling can yield better classification performance (i.e., a smaller test task risk) for the OR task than existing 1DT-based methods using other labeling functions. |
| Researcher Affiliation | Academia | Ryoya Yamasaki, Department of Systems Science, Graduate School of Informatics, Kyoto University, 36-1 Yoshida-Honmachi, Sakyo-ku, Kyoto 606-8501, Japan |
| Pseudocode | Yes | The threshold parameters for the EOT labeling can be computed with a dynamic programming-based algorithm (Algorithm 1) mentioned in Lin & Li (2006); see also a researcher's site (https://home.work.caltech.edu/~htlin/program/orensemble/) of Lin & Li (2006), and Section C of our paper for its optimality guarantee. This algorithm first sorts unique elements of {a(x_i)}_{i=1}^{n} to {a_j}, and takes advantage of the recurrence relation (46) of the minimizer of the empirical task risk for sample points i s.t. a(x_i) ≤ a_j, along the ascending order of {a_j}, to calculate threshold parameters minimizing the empirical task risk. It costs a computational complexity of quasi-linear order O(n log n) regarding the training sample size n, which stems from the sorting operation in Line 2, while the remaining operations in Lines 3–15 cost a computational complexity of O(nK). |
| Open Source Code | Yes | One can obtain the various-domain datasets from a researcher's site (http://www.uco.es/grupos/ayrna/orreview) of Gutierrez et al. (2015) or our GitHub repository (https://github.com/yamasakiryoya/OTL). ... See https://github.com/yamasakiryoya/OTL for the program codes that we used. |
| Open Datasets | Yes | In the experiments, we used the 17 various-domain datasets, COL (contact-lenses), PAS (pasture), SQ1 (squash-stored), SQ2 (squash-unstored), BON (bondrate), TAE (tae), AUT (automobile), NEW (newthyroid), TOY (da Costa et al., 2008), ESL (employee selection), BAS (balance-scale), EUQ (eucalyptus), LEV (lectures evaluation), ERA (employee rejection/acceptance), SWD (social workers decision), WQR (winequality-red), and CAR (car evaluation) datasets, and the 3 face-age datasets, MORPH (MORPH Album2), CACD, and AFAD (Ricanek & Tesafaye, 2006; Chen et al., 2014; Niu et al., 2016). ... One can obtain the various-domain datasets from a researcher's site (http://www.uco.es/grupos/ayrna/orreview) of Gutierrez et al. (2015) or our GitHub repository (https://github.com/yamasakiryoya/OTL). We purchased the MORPH dataset at https://ebill.uncw.edu/C20231_ustores/web/... The CACD dataset can be downloaded from https://bcsiriuschen.github.io/CARC/... The AFAD dataset is obtainable at https://github.com/afad-dataset/tarball... |
| Dataset Splits | Yes | For the various-domain datasets, we randomly divided each dataset into 72% training, 8% validation, and 20% test sets. For the face-age datasets, we resized all images to 128×128×3 pixels (3 stems from the RGB channels) and randomly divided each dataset into 72% training, 8% validation, and 20% test sets. The training phase used images randomly cropped to 120×120×3 pixels as input to improve the stability of the model against differences in facial position, while the validation and test phases used images center-cropped to the same size, following the procedures of Cao et al. (2020). |
| Hardware Specification | No | The paper does not explicitly specify any hardware details like GPU/CPU models, memory amounts, or cloud instance types used for running experiments. It mentions using ResNet-34 architecture and Adam optimizer, but these are not hardware specifications. |
| Software Dependencies | No | The paper mentions using 'Adam' as an optimization procedure. It also refers to 'program codes published in https://github.com/Raschka-research-group/coral-cnn by Cao et al. (2020)' and 'mlxtend: Providing machine learning and data science utilities and extensions to python's scientific computing stack. Journal of Open Source Software, 3(24):638, 2018.' for preprocessing. However, it does not explicitly list the specific versions of the programming languages (e.g., Python), deep learning frameworks (e.g., PyTorch, TensorFlow), or CUDA used for the main experimental setup, which are necessary for reproducible software dependencies. |
| Experiment Setup | Yes | In each trial, we trained the network using Adam with a learning rate of 10^{-2.5} and a mini-batch size of 256 (or 16 when the training sample size is less than 256) as an optimization procedure, for 500 epochs when n_tot ≤ 2000 (i.e., for the various-domain datasets) or 100 epochs otherwise (i.e., for the face-age datasets). |
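The sort-then-dynamic-programming threshold search quoted in the Pseudocode row can be sketched as follows. This is our own minimal reconstruction of the O(n log n) + O(nK) scheme attributed to Lin & Li (2006), not the authors' code: the function name `eot_thresholds`, the zero-one loss standing in for the general task loss, and the midpoint placement of thresholds are all assumptions.

```python
def eot_thresholds(scores, labels, K):
    """Hypothetical sketch: find K-1 ordered thresholds on 1DT scores that
    minimize the empirical zero-one task risk, via sorting (O(n log n))
    followed by a dynamic program over sorted samples and classes (O(nK))."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # the sorting step
    s = [scores[i] for i in order]
    y = [labels[i] for i in order]
    INF = float("inf")
    # dp[k][j]: min errors over the first j sorted samples using classes 1..k
    dp = [[INF] * (n + 1) for _ in range(K + 1)]
    came_up = [[False] * (n + 1) for _ in range(K + 1)]
    for k in range(K + 1):
        dp[k][0] = 0  # an empty prefix costs nothing (classes may stay empty)
    for k in range(1, K + 1):
        for j in range(1, n + 1):
            err = 0 if y[j - 1] == k else 1  # cost of assigning sample j to class k
            stay, move = dp[k][j - 1], dp[k - 1][j - 1]
            if move < stay:  # a class boundary is placed just before sample j
                dp[k][j] = move + err
                came_up[k][j] = True
            else:
                dp[k][j] = stay + err
    # Backtrack the boundaries; put each threshold midway between neighbors.
    thresholds, k, j = [], K, n
    while k > 1:
        if j == 0:
            thresholds.append(s[0] - 1.0)  # leftover classes stay empty
            k -= 1
        elif came_up[k][j]:
            thresholds.append((s[j - 2] + s[j - 1]) / 2 if j >= 2 else s[0] - 1.0)
            k, j = k - 1, j - 1
        else:
            j -= 1
    thresholds.reverse()
    return thresholds, dp[K][n]  # thresholds and minimal empirical errors
```

For example, scores [1, 2, 3] with labels [1, 2, 3] and K = 3 recover thresholds (1.5, 2.5) with zero empirical errors. The paper's Algorithm 1 minimizes a general empirical task risk; the same recurrence applies with per-class losses in place of the 0/1 indicator used here.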
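The 72%/8%/20% split and the 128 → 120 cropping protocol quoted in the Dataset Splits row can be restated as a small sketch. The function names, the fixed seed, and the integer-truncation handling below are our assumptions, not details taken from the paper.

```python
import random

def split_indices(n, seed=0):
    """Randomly partition n sample indices into 72% train / 8% val / 20% test,
    as quoted above (rounding of fractional sizes is an assumption)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_tr, n_va = int(0.72 * n), int(0.08 * n)
    return idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]

def crop_offset(training, full=128, crop=120, rng=random.Random(0)):
    """Top-left corner of a crop window: random in [0, full - crop] during
    training, centered for validation/test, per the quoted protocol
    following Cao et al. (2020)."""
    if training:
        return rng.randint(0, full - crop), rng.randint(0, full - crop)
    c = (full - crop) // 2
    return c, c
```

With n = 100 this yields 72/8/20 indices exactly, and the evaluation-time crop is fixed at offset (4, 4) for 128 → 120 pixels.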
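The optimizer settings in the Experiment Setup row amount to the following selection rule. The dict layout is ours, and reading "less than 256" against the training sample size and n_tot ≤ 2000 as the epoch cutoff are our interpretation of the quoted text; only the values come from it.

```python
def training_config(n_train, n_total):
    """Restate the quoted Adam setup: learning rate 10^-2.5, mini-batch 256
    (16 for small training sets), 500 epochs for small datasets
    (various-domain) and 100 otherwise (face-age)."""
    return {
        "optimizer": "Adam",
        "learning_rate": 10 ** -2.5,
        "batch_size": 256 if n_train >= 256 else 16,
        "epochs": 500 if n_total <= 2000 else 100,
    }
```

For instance, a small various-domain dataset (n_train = 150, n_tot = 200) would use batch size 16 and 500 epochs, while a face-age dataset would use batch size 256 and 100 epochs.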