LCEN: A Nonlinear, Interpretable Feature Selection and Machine Learning Algorithm
Authors: Pedro Seber, Richard Braatz
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a wide variety of artificial and empirical datasets, LCEN constructed sparse and frequently more accurate models than other methods, including sparse, nonlinear methods. LCEN was empirically observed to be robust against many issues typically present in datasets and modeling, including noise, multicollinearity, and data scarcity. As a feature selection algorithm, LCEN matched or surpassed the thresholded elastic net while being, on average, 10.3-fold faster in our experiments. LCEN for feature selection can also rediscover multiple physical laws from empirical data. As a machine learning algorithm, when tested on processes with no known physical laws, LCEN achieved better results than many other dense and sparse methods, and was comparable to or better than ANNs on multiple datasets. |
| Researcher Affiliation | Academia | Pedro Seber EMAIL Department of Chemical Engineering Massachusetts Institute of Technology Richard D. Braatz EMAIL Department of Chemical Engineering Massachusetts Institute of Technology |
| Pseudocode | Yes | Algorithm 1 LASSO-Clip-EN (LCEN) Input: X and y data; lists of hyperparameters alpha, l1_ratio, degree, lag; hyperparameters cutoff, trans_type, interaction, transform_y # LASSO step: filters features without requiring a combinatorially large number of potential hyperparameters, as l1_ratio is fixed. Determines the degree and lag hyperparameters for feature expansion. Temporarily set l1_ratio = 1. ... Algorithm 2 Feature expansion for LCEN |
| Open Source Code | No | LCEN is free, open-source, and easy to use, allowing even non-specialists in machine learning to benefit from and use it. Its main limitations are that (1) LCEN is not a universal function approximator, as it can model only the functions present in the expansion of dataset features, (2) its feature expansion algorithm is better suited to numerical data over image or text data, and (3) LCEN is not always as accurate as a dense deep learning method. If enough compute and time are available for model training, users in scenarios that focus on accuracy above anything else or with non-numerical data types may prefer to use a deep learning method. |
| Open Datasets | Yes | Table A1: Datasets used in this work and their sources. ... CARMENES star data Schweitzer et al. (2019) [link to dataset]; Kepler's 3rd Law Kepler et al. (1997) (Original from 1619); Diesel Freezing Point Hutzler & Westbrook (2000) [link to dataset]; Abalone Nash et al. (1995); Concrete Compressive Strength Yeh (1998) [dataset: Yeh (2007)]; Boston housing (modified by us) Harrison & Rubinfeld (1978) [link to dataset]; GEFCom 2014 Hong et al. (2016) [link to dataset] |
| Dataset Splits | Yes | All models tested in this work had their hyperparameters selected by 5-fold cross-validation (CV), and this CV procedure was repeated for 3 different seeds so that the average and standard deviation of results can be reported, except for models trained on the GEFCom 2014 dataset, which used time series cross-validation. ... For the Diesel freezing point dataset, 30% of the dataset was randomly separated to form the test set. For the Abalone dataset, the last 1,044 entries (25%) were used as the test set as per Waugh (1995) and Clark et al. (1996). For the Concrete Compressive Strength dataset, 25% of the dataset was randomly separated to form the test set as per Yeh (1998). For the Boston housing dataset, 20% of the dataset was randomly separated to form the test set. For the GEFCom 2014 dataset, the data from task 1 were used as the training set and all data from tasks 2–15 were used as the test set. |
| Hardware Specification | Yes | All experiments were done on a personal computer equipped with a 13th Gen Intel Core i5-13600K CPU, 64 GB of DDR4 RAM, and an NVIDIA GeForce RTX 4090 GPU. |
| Software Dependencies | No | For the LASSO and ridge regression models: α = 0 and 20 log-spaced values between 10^-4.3 and 10^0 (as per np.logspace(-4.3,0,20)). ... Using sklearn's PolynomialFeatures function, generate polynomial (and, if the hyperparameter interaction is True, interaction) transforms of the X data for the given degree. ... Learning rates equal to [0.0005, 0.001, 0.005, 0.01, 0.05], the AdamW optimizer, the ReLU and tanhshrink activation functions, a batch size of 32, weight decay with λ equal to [0, 0.01, 0.05, 0.08, 0.1], 100 epochs, and a cosine scheduler with a minimum learning rate equal to 1/16 of the original learning rate and 10 epochs of warm-up were also used. |
| Experiment Setup | Yes | A3 Appendix: List of hyperparameters used in this work. All possible permutations of the hyperparameters below were cross-validated. 1. For the LASSO and ridge regression models: α = 0 and 20 log-spaced values between 10^-4.3 and 10^0 (as per np.logspace(-4.3,0,20)). 2. For the elastic net (EN) models: α as above and L1 ratios equal to [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99]. ... 7. For the LCEN models: α and L1 ratios as above. degree values equal to [1, 2, 3] were typically used, except when otherwise indicated (such as in the Relativistic energy dataset). lag = 0 was used, except for the GEFCom 2014 dataset, which used lag = 168. cutoff values between 1×10^-3 and 5.5×10^-1 were used; higher values were used only when intentionally creating models with fewer selected features. A cutoff = 0 is used in the ablation tests for the LASSO-EN model (Section A4). |
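The three-step structure named in the Pseudocode row (LASSO step, clip step, elastic net step) can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the alpha, cutoff, and l1_ratio values are illustrative single choices rather than the cross-validated grids, and interpreting the clip threshold relative to the largest coefficient magnitude is an assumption.

```python
import numpy as np
from sklearn.linear_model import Lasso, ElasticNet
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.05, size=200)

# Feature expansion (the degree hyperparameter of Algorithm 2)
X_exp = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
X_exp = StandardScaler().fit_transform(X_exp)

# LASSO step: l1_ratio temporarily fixed at 1, i.e. a pure L1 penalty
lasso = Lasso(alpha=0.01).fit(X_exp, y)

# Clip step: discard features with small coefficients (relative
# thresholding against the largest coefficient is our assumption)
cutoff = 1e-2
keep = np.abs(lasso.coef_) > cutoff * np.max(np.abs(lasso.coef_))

# EN step: refit an elastic net on the surviving features only
en = ElasticNet(alpha=0.01, l1_ratio=0.9).fit(X_exp[:, keep], y)
print(f"{keep.sum()} of {X_exp.shape[1]} expanded features kept")
```

In the actual algorithm, each step's hyperparameters would be selected by cross-validation over the grids listed in the Experiment Setup row rather than fixed as here.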
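The α grid described in items 1 and 2 above can be reproduced directly; note that np.logspace(-4.3, 0, 20) yields 20 values from 10^-4.3 (about 5.0×10^-5) up to 10^0 = 1, so prepending α = 0 gives 21 candidates in total:

```python
import numpy as np

# alpha grid: 0 plus 20 log-spaced values from 10**-4.3 to 10**0
alphas = np.concatenate(([0.0], np.logspace(-4.3, 0, 20)))

# L1-ratio grid for the elastic net and LCEN models, as listed above
l1_ratios = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99]

print(len(alphas), "alpha values;", len(l1_ratios), "L1 ratios")
```

Cross-validating "all possible permutations" then means evaluating the Cartesian product of these lists (plus degree, lag, and cutoff for LCEN).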
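The two cross-validation schemes named in the Dataset Splits row (repeated 5-fold CV for most datasets, time series CV for GEFCom 2014) can be mirrored with scikit-learn's splitters. The fold count for the time-series scheme is an assumption; the report only states that time series cross-validation was used.

```python
import numpy as np
from sklearn.model_selection import KFold, TimeSeriesSplit

X = np.arange(100).reshape(-1, 1)  # toy stand-in for a real dataset

# Most datasets: 5-fold CV, repeated over 3 seeds
for seed in (0, 1, 2):
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    # every sample appears in exactly one validation fold per seed
    assert sum(len(te) for _, te in kf.split(X)) == len(X)

# GEFCom 2014: time series CV (no shuffling; training data
# always precede validation data; n_splits=5 is an assumption)
tscv = TimeSeriesSplit(n_splits=5)
for tr, te in tscv.split(X):
    assert tr.max() < te.min()
```

Shuffled folds would leak future values into the training set for the hourly GEFCom 2014 load data, which is why the ordered splitter is required there.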