Dimension Reduction and MARS

Authors: Yu Liu, Degui Li, Yingcun Xia

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical studies, including both simulations and empirical applications, show its effectiveness in dimension reduction and its improvement over MARS and other commonly used nonparametric methods in regression estimation and prediction.
Researcher Affiliation | Academia | Yu Liu, School of Mathematical Sciences, University of Electronic Science and Technology of China, China; Degui Li, Department of Mathematics, University of York, UK; Yingcun Xia, Department of Statistics and Data Science, National University of Singapore, Singapore, and School of Mathematical Sciences, University of Electronic Science and Technology of China, China.
Pseudocode | No | The paper describes the MARS algorithm in paragraph text in Section 2 but does not present it as structured pseudocode or an algorithm block.
Open Source Code | Yes | The source code for gKDR and drMARS, together with all relevant files, can be downloaded from https://github.com/liuyu-star/drMARS.
Open Datasets | Yes | The data (https://archive.ics.uci.edu/ml/datasets/concrete+compressive+strength) concern concrete compressive strength (Y) and its dependence on the concrete's ingredients and age (X). The data (www.kaggle.com/harlfoxem/housesalesprediction) contain house sale prices for King County, USA, including Seattle, between May 2014 and May 2015. The data (archive.ics.uci.edu/ml/datasets/Parkinsons+Telemonitoring) are composed of a range of biomedical voice measurements... The data (https://archive.ics.uci.edu/ml/datasets/Residential+Building+Data+Set) contain construction cost, project variables, and economic variables... The data (www.kaggle.com/datasets/muratkokludataset/pistachio-dataset) include a total of N = 2,148 images... The data (https://archive.ics.uci.edu/ml/datasets/Hill-Valley) contain N = 1,212 records... The data (www.kaggle.com/datasets/cnic92/200-financial-indicators-of-us-stocks-20142018) include N = 986 US stocks in 2018...
Dataset Splits | Yes | We randomly select n = min(1000, N/3) or n = min(2000, 2N/3) observations as the training set, use the remaining observations as the testing set, and repeat the random splitting 100 times. The dimension of the SDR space is selected using the 10-fold CV described in Section 4.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or memory specifications) used to run its experiments.
Software Dependencies | No | All methods are implemented in R. Specifically, package dr (Weisberg, 2002) for pHd, the cve function in package CVarE for CVE, package MAVE for MAVE, package earth (Milborrow et al., 2017) for MARS, the svm function in package e1071 (Dimitriadou et al., 2008) for SVM, and package randomForest (Liaw and Wiener, 2002) for RF are used in our numerical studies. The paper lists software packages but does not specify their version numbers or the version of R used.
Experiment Setup | Yes | For all the R functions, the default values of their tuning parameters are used. In addition, as random rotation is a commonly used ensemble method (e.g., Blaser and Fryzlewicz, 2016; Cannings and Samworth, 2017; Bagnall et al., 2018), we also include it in our comparison, denoted by RAND. We randomly select n = min(1000, N/3) or n = min(2000, 2N/3) observations as the training set, use the remaining observations as the testing set, and repeat the random splitting 100 times. The dimension of the SDR space is selected using the 10-fold CV described in Section 4.
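The evaluation protocol in the Dataset Splits and Experiment Setup rows (training size n, 100 random train/test splits, 10-fold CV for the SDR dimension) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' R code. It assumes N/3 and 2N/3 are rounded down. The names `training_size`, `random_splits`, `select_dimension` and the scheme labels are hypothetical. `fold_error` is a hypothetical user-supplied function standing in for the CV criterion of Section 4, which is not reproduced here.

```python
import numpy as np


def training_size(N, scheme="third"):
    """Training-set size n = min(1000, N/3) or n = min(2000, 2N/3).

    Rounding N/3 and 2N/3 down to an integer is an assumption of this sketch.
    """
    if scheme == "third":
        return min(1000, N // 3)
    if scheme == "two_thirds":
        return min(2000, 2 * N // 3)
    raise ValueError(f"unknown scheme: {scheme!r}")


def random_splits(N, n, n_rep=100, seed=0):
    """Yield n_rep independent random (train, test) index splits of range(N)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_rep):
        perm = rng.permutation(N)
        yield perm[:n], perm[n:]  # first n shuffled indices train, the rest test


def select_dimension(X, y, dims, fold_error, k=10, seed=0):
    """Return the candidate SDR dimension with the smallest k-fold CV error.

    fold_error(X_tr, y_tr, X_va, y_va, d) is a hypothetical user-supplied
    function: it fits a model using d reduced directions on the training
    folds and returns its error on the held-out fold.
    """
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % k  # random, near-equal fold labels
    cv = {}
    for d in dims:
        errs = [
            fold_error(X[folds != j], y[folds != j], X[folds == j], y[folds == j], d)
            for j in range(k)
        ]
        cv[d] = float(np.mean(errs))
    best = min(cv, key=cv.get)
    return best, cv
```

For example, the Hill-Valley data have N = 1,212, so under the floor assumption `training_size(1212)` gives 404 training observations and `training_size(1212, "two_thirds")` gives 808, with the rest of each split used for testing.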
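The RAND comparison in the Experiment Setup row relies on random rotations of the predictors. The page does not say how RAND draws its rotations or combines the fitted models, so the following is only an illustrative sketch under assumptions. Rotations are drawn uniformly (Haar measure) from the special orthogonal group via the QR decomposition of a Gaussian matrix. Predictions of a base learner are averaged over rotations. `random_rotation`, `rotation_ensemble_predict`, and `fit_predict` are hypothetical names, and `fit_predict` stands in for any base regressor, for example a MARS fit.

```python
import numpy as np


def random_rotation(p, rng):
    """Draw a p x p rotation matrix uniformly (Haar measure) from SO(p)."""
    A = rng.standard_normal((p, p))
    Q, R = np.linalg.qr(A)
    # Rescale column j by sign(R[j, j]) so Q is Haar-distributed on O(p).
    Q = Q * np.sign(np.diag(R))
    # Flip one column if needed so det(Q) = +1 (a proper rotation).
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q


def rotation_ensemble_predict(X_train, y_train, X_test, fit_predict, n_rot=10, seed=0):
    """Average base-learner predictions over randomly rotated inputs.

    fit_predict(X_tr, y_tr, X_te) is a hypothetical base learner that is
    fitted on (X_tr, y_tr) and returns predictions for X_te.
    """
    rng = np.random.default_rng(seed)
    p = X_train.shape[1]
    preds = []
    for _ in range(n_rot):
        Q = random_rotation(p, rng)
        # The same rotation is applied to training and test predictors.
        preds.append(fit_predict(X_train @ Q, y_train, X_test @ Q))
    return np.mean(preds, axis=0)
```

A linear least-squares base learner gives identical predictions under every rotation. The rotations matter for learners built on coordinate-wise pieces, such as the hinge functions used by MARS, because each rotation changes which directions are axis-aligned.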