Enhancing Prediction Performance through Influence Measure
Authors: Shuguang Yu, Wenqian Xu, Xinyi Zhou, Xuechun Wang, Hongtu Zhu, Fan Zhou
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The effectiveness of these methods is demonstrated through extensive simulations and real-world datasets. In this section, we present experimental results that illustrate how our newly proposed metrics, FIutil and FIactive, contribute to enhancing model prediction performance. We compare our algorithms with state-of-the-art strategies in both data trimming and active learning scenarios, thereby validating the effectiveness of our approach on both simulated and real-world datasets. |
| Researcher Affiliation | Academia | Shuguang Yu (a), Wenqian Xu (a), Xinyi Zhou (a), Xuechun Wang (a), Hongtu Zhu (b), and Fan Zhou (a). (a) Shanghai University of Finance and Economics, Shanghai, China; (b) University of North Carolina at Chapel Hill, North Carolina, USA |
| Pseudocode | Yes | Algorithm 1 Calculation of FIutil; Algorithm 2 Calculation of FIactive; Algorithm 3 Data Trimming; Algorithm 4 Active Learning; Algorithm 5 Calculation of FIutil approx; Algorithm 6 Data Trimming with Approximation; Algorithm 7 Calculation of FIactive approx; Algorithm 8 Active Learning with Approximation |
| Open Source Code | No | The paper discusses methods and experiments but does not provide an explicit statement about releasing its own source code nor a link to a repository for the methodology described in this paper. |
| Open Datasets | Yes | Adult. Access Link: Adult Database; Bank. Access Link: Bank Database; CelebA. Access Link: CelebA Database; Jigsaw Toxicity. Access Link: Jigsaw Toxicity Database; MNIST. Access Link: MNIST Database; EMNIST. Access Link: EMNIST Database; CIFAR10. Access Link: CIFAR10 Database; Office-31. Access Link: Office-31 Dataset; AG News. Access Link: AG News Dataset |
| Dataset Splits | Yes | Validation on 2D Linear Model. Each dataset comprises 150 training samples, 100 validation samples, and 600 test samples. Validation on 2D Nonlinear Model. Each dataset comprises 500 training samples, 250 validation samples, and 250 test samples. Adult. This dataset contains 37,692 samples, with 30,162 for training and 7,530 for testing. ... we randomly sample 5,000 instances from the original training set to serve as the new training set, and similarly, 4,200 instances from the original test set to serve as the new test set. |
| Hardware Specification | No | The paper discusses computational complexity and performance but does not provide specific hardware details such as GPU/CPU models or memory used for running its experiments. |
| Software Dependencies | No | All differentiation operations can be easily computed using backpropagation (Goodfellow et al., 2016) in deep learning libraries such as TensorFlow (Abadi et al., 2016) and PyTorch (Paszke et al., 2017). However, specific version numbers for these libraries are not provided. |
| Experiment Setup | Yes | Table 3: Parameter Settings of Data Trimming. Adult, Bank, CelebA, Jigsaw Toxicity: optimizer SGD/Adam, learning rate 1e-2/1e-4, weight decay 1e-2/1e-6. Adult +noise, Bank +noise, CelebA +noise, Jigsaw Toxicity +noise: optimizer SGD/Adam, learning rate 1e-2/1e-1, weight decay 1e-2/1e-4. |
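The data-trimming idea referenced above (score each training point's influence on validation performance, then drop harmful points) can be loosely illustrated with a brute-force leave-one-out sketch. This is only a proxy for the paper's FIutil measure and Algorithms 1/3, whose exact formulas are not reproduced in this report; the function names and the logistic-regression setup below are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss


def loo_influence_scores(X_tr, y_tr, X_val, y_val):
    """Leave-one-out proxy for a utility-style influence score.

    Positive score => removing point i *lowers* validation loss,
    i.e. the point is harmful to generalization.
    """
    base = LogisticRegression().fit(X_tr, y_tr)
    base_loss = log_loss(y_val, base.predict_proba(X_val))
    scores = []
    for i in range(len(X_tr)):
        keep = np.arange(len(X_tr)) != i
        model_i = LogisticRegression().fit(X_tr[keep], y_tr[keep])
        loss_i = log_loss(y_val, model_i.predict_proba(X_val))
        scores.append(base_loss - loss_i)
    return np.array(scores)


def trim(X_tr, y_tr, X_val, y_val, k):
    """Drop the k training points whose removal most reduces validation loss."""
    s = loo_influence_scores(X_tr, y_tr, X_val, y_val)
    # Keep the n-k points with the lowest (most helpful) scores.
    keep_idx = np.argsort(s)[: len(X_tr) - k]
    return X_tr[keep_idx], y_tr[keep_idx]
```

In practice, leave-one-out retraining is O(n) model fits and is exactly what influence-function approximations (and the paper's approximate Algorithms 5-8) are designed to avoid; the sketch only conveys the trimming logic, not an efficient implementation.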