A Selective Under-Sampling (SUS) Method for Imbalanced Regression

Authors: Jovana Aleksic, Miguel García-Remesal

JAIR 2025

Reproducibility assessment (variable, result, and LLM response):

- Research Type: Experimental. "We assessed this method on 15 regression data sets from different imbalanced domains, 5 synthetic high-dimensional imbalanced data sets, and 2 more complex imbalanced age estimation image data sets. Our results suggest that SUS and SUSiter typically outperform other state-of-the-art techniques, such as SMOGN or random under-sampling, when used with neural networks as learners."
- Researcher Affiliation: Academia. Jovana Aleksic (Universidad Politécnica de Madrid, Spain; Weill Cornell Medicine in Qatar, Qatar); Miguel García-Remesal (Biomedical Informatics Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Spain).
- Pseudocode: Yes. Algorithm 1 (SUS) and Algorithm 2 (SUSiter).
- Open Source Code: No. The paper makes no explicit statement about the release of source code and includes no links to repositories for the described methodology.
- Open Datasets: Yes. "For evaluating the performance of the presented approaches, we used three different types of data sets: standard imbalanced data, synthetic high-dimensional imbalanced data, and more complex age estimation image data sets, IMDB-WIKI (Rothe et al., 2018) and AgeDB (Moschoglou et al., 2017)." Table 1: standard data sets information. Table 2: synthetic high-dimensional data sets information. Table 3: image data sets information.
- Dataset Splits: Yes. "The test and validation data are balanced as well. The selected test data covers 20% of the whole corresponding data set and is completely held out from the training process. After curation, the final dataset consists of 191,500 images for training, with 11,000 images each allocated to validation and testing."
- Hardware Specification: Yes. "We compare speeds on a desktop iMac 2017 machine with a 3.5 GHz quad-core Intel Core i5 processor."
- Software Dependencies: No. The paper mentions a "Python implementation by Kunz (2020)", "Adam optimization (Kingma & Ba, 2014)", and a "ResNet50 model (He et al., 2016)", but does not provide specific version numbers for Python, PyTorch/TensorFlow, or the other key libraries used in the experiments.
- Experiment Setup: Yes. "Training is run for 300 epochs, we use Adam optimization (Kingma & Ba, 2014), and a learning rate of 10^-2. For all experiments involving the IMDB-WIKI and AgeDB datasets, we utilize the ResNet50 model (He et al., 2016). Each model is trained over 90 epochs using the Adam optimizer (Kingma & Ba, 2014), starting with an initial learning rate of 10^-3, which is reduced by a factor of 0.1 at the 60th and 80th epochs. The batch size is consistently set to 256."
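The staged learning-rate schedule quoted in the experiment setup (initial rate 10^-3, multiplied by 0.1 at epochs 60 and 80, over 90 epochs) can be sketched in plain Python. This is a minimal illustration of the quoted hyperparameters only; the function name and keyword defaults are not from the paper.

```python
def lr_at_epoch(epoch, base_lr=1e-3, milestones=(60, 80), gamma=0.1):
    """Step-decay schedule matching the quoted image-experiment setup:
    epochs 0-59 train at 1e-3, epochs 60-79 at 1e-4, epochs 80-89 at 1e-5.
    (Illustrative helper, not code from the paper.)"""
    lr = base_lr
    for milestone in milestones:
        if epoch >= milestone:
            lr *= gamma  # reduce by a factor of 0.1 at each milestone
    return lr
```

In a PyTorch training loop the same behavior would typically come from `torch.optim.lr_scheduler.MultiStepLR` with `milestones=[60, 80]` and `gamma=0.1` wrapped around an Adam optimizer.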