A Selective Under-Sampling (SUS) Method for Imbalanced Regression
Authors: Jovana Aleksic, Miguel García-Remesal
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assessed this method on 15 regression data sets from different imbalanced domains, 5 synthetic high-dimensional imbalanced data sets, and 2 more complex imbalanced age estimation image data sets. Our results suggest that SUS and SUSiter typically outperform other state-of-the-art techniques like SMOGN or random under-sampling when used with neural networks as learners. |
| Researcher Affiliation | Academia | Jovana Aleksic (EMAIL), Universidad Politécnica de Madrid, Spain; Weill Cornell Medicine in Qatar, Qatar. Miguel García-Remesal (EMAIL), Biomedical Informatics Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, Spain. |
| Pseudocode | Yes | Algorithm 1: SUS; Algorithm 2: SUSiter |
| Open Source Code | No | The paper does not provide any explicit statements about the release of source code, nor does it include links to repositories for the methodology described. |
| Open Datasets | Yes | For evaluating the performance of the presented approaches, we used three different types of data sets: standard imbalanced data, synthetic high-dimensional imbalanced data, and more complex age estimation image data sets, IMDB-WIKI (Rothe et al., 2018) and AgeDB (Moschoglou et al., 2017). Table 1: Standard data sets information. Table 2: Synthetic high-dimensional data sets information. Table 3: Image data sets information. |
| Dataset Splits | Yes | The test and validation data is balanced as well. Selected test data covers 20% of the whole corresponding data set and is completely held away from the training process. After curation, the final dataset consists of 191,500 images for training, with 11,000 images each allocated to validation and testing. |
| Hardware Specification | Yes | We compare speeds on a desktop iMac (2017) with a 3.5 GHz quad-core Intel Core i5 processor. |
| Software Dependencies | No | The paper mentions 'Python implementation by Kunz 2020', 'Adam optimization (Kingma & Ba, 2014)', and 'ResNet50 model (He et al., 2016)', but does not provide specific version numbers for Python, PyTorch/TensorFlow, or other key libraries used for the experiments. |
| Experiment Setup | Yes | Training is run for 300 epochs, we use Adam optimization (Kingma & Ba, 2014), and a learning rate of 10⁻². For all experiments involving the IMDB-WIKI and AgeDB datasets, we utilize the ResNet50 model (He et al., 2016). Each model is trained over 90 epochs using the Adam optimizer (Kingma & Ba, 2014), starting with an initial learning rate of 10⁻³, which is reduced by a factor of 0.1 at the 60th and 80th epochs. The batch size is consistently set to 256. |
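The image-experiment schedule quoted above (initial learning rate 10⁻³, reduced by a factor of 0.1 at the 60th and 80th epochs of a 90-epoch run) can be sketched as a plain step-decay function. This is a minimal illustrative sketch of the stated schedule only; the function name and structure are not from the paper, and the authors' actual implementation is not released.

```python
def step_decay_lr(epoch, base_lr=1e-3, milestones=(60, 80), gamma=0.1):
    """Step-decay schedule matching the quoted setup:
    start at 1e-3 and multiply by 0.1 at the 60th and 80th epochs.
    (Illustrative sketch; names and structure are assumptions.)"""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# Learning rate across the 90-epoch run described in the paper:
# epochs 0-59 use 1e-3, epochs 60-79 use ~1e-4, epochs 80-89 use ~1e-5.
schedule = [step_decay_lr(e) for e in range(90)]
```

In a PyTorch training loop this would typically be expressed with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[60, 80], gamma=0.1)` wrapped around the Adam optimizer.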