Transfer Learning in Information Criteria-based Feature Selection
Authors: Shaohan Chen, Nikolaos V. Sahinidis, Chuanhou Gao
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, simulation studies and applications with real data demonstrate the usefulness of the TLCp scheme. |
| Researcher Affiliation | Academia | Shaohan Chen EMAIL School of Mathematical Sciences Zhejiang University Hangzhou 310027, China; Nikolaos V. Sahinidis EMAIL H. Milton Stewart School of Industrial & Systems Engineering and School of Chemical & Biomolecular Engineering Georgia Institute of Technology Atlanta, GA 30332, USA; Chuanhou Gao EMAIL School of Mathematical Sciences Zhejiang University Hangzhou 310027, China |
| Pseudocode | Yes | Algorithm 1 Using the approximate Cp method to select features; Algorithm 2 Using the approximate TLCp method to select features for the target task |
| Open Source Code | Yes | The source code for reproducing the experimental results is available at https://github.com/Shaohan-Chen/Transfer-learning-in-Mallows-Cp. |
| Open Datasets | Yes | In this subsection, we evaluate the performance of the proposed TLCp method on school data used by Bakker and Heskes (2003), Argyriou et al. (2008) and Zhou et al. (2011)... We finally test the proposed TLCp methods using the Parkinson’s telemonitoring data set from the UCI Machine Learning Repository (Tsanas et al., 2009). |
| Dataset Splits | Yes | For each target data size (n = 210, 250, 290), we randomly split the target data set (furnace A) 300 times with n samples as the training set and the remaining 100 samples as the test set. For each target sample size (n = 130, 150, 170), we divide the target data set into 10000 random splits with n samples as the training data and the remaining 30 samples as the test data. For each sample size (n = 100, 110), we randomly split the target data set 5000 times with n samples as the training set and the remaining 30 as the test set. |
| Hardware Specification | Yes | All experiments in this paper were conducted on a computer with a 6-core, 2.60-GHz CPU and 16-GB memory. |
| Software Dependencies | Yes | We use the software package from Zhou et al. (2011) and Mathworks (2017) to solve these two multi-task methods. We implement the aforementioned benchmarks based on the statistics and machine learning toolbox (Mathworks, 2017). |
| Experiment Setup | Yes | We chose the tuning parameter of the Cp model (4) as λ = 2, and set the parameters of the TLCp model (8), λ_1, λ_2, λ_3, λ_4, according to the tuning rules stated in Corollary 15 or Theorem 20, as λ_1 = 1, λ_2 = 1, λ_3^i = 4/δ_i² (i = 1, …, k), λ_4 = 2. We tune the hyperparameters of the proposed TLCp methods with two tasks based on Theorem 20, as λ*_1 = σ̂_2², λ*_2 = σ̂_1², λ_3^t = 4σ̂_1²σ̂_2²/δ̂_t² (t = 1, …, k), and λ*_4 = min_{i ∈ {1, …, k}}... |
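The setup above fixes the Cp tuning parameter at λ = 2, the classic Mallows' Cp penalty. As a rough illustration of what such an information criterion-based selection does (not the paper's TLCp method, and with a hypothetical function name `mallows_cp_select`), a minimal sketch of exhaustive Cp-style subset selection might look like:

```python
import itertools
import numpy as np

def mallows_cp_select(X, y, lam=2.0):
    """Pick the feature subset minimizing a Cp-style criterion
    RSS(S)/sigma2_hat + lam * |S|, with sigma2_hat estimated from
    the full least-squares fit. Exhaustive search; only feasible
    for a small number of candidate features k."""
    n, k = X.shape
    # Noise variance estimate from the full model.
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta_full) ** 2)
    sigma2 = rss_full / (n - k)
    best_subset, best_cp = (), np.inf
    # Score every non-empty subset of the k features.
    for r in range(1, k + 1):
        for subset in itertools.combinations(range(k), r):
            Xs = X[:, subset]
            beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
            rss = np.sum((y - Xs @ beta) ** 2)
            cp = rss / sigma2 + lam * len(subset)
            if cp < best_cp:
                best_subset, best_cp = subset, cp
    return best_subset, best_cp
```

On synthetic data where only a couple of features carry signal, the λ·|S| penalty discourages the selector from keeping irrelevant columns; raising `lam` makes the selection sparser. The paper's approximate Cp/TLCp algorithms (Algorithms 1 and 2) avoid this exponential subset enumeration.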