Discrepancies are Virtue: Weak-to-Strong Generalization through the Lens of Intrinsic Dimension
Authors: Yijun Dong, Yicheng Li, Yunai Li, Jason D. Lee, Qi Lei
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4. Experiments We conduct experiments to validate the theoretical findings on both synthetic and real tasks. In this section, we focus on two illustrative settings: synthetic regression (Section 4.1) and real-world image regression (Section 4.2). For brevity, we defer more experiments on image and sentiment classification tasks to Appendices E.2 and E.3, respectively. |
| Researcher Affiliation | Academia | 1New York University 2Shanghai Jiaotong University 3Princeton University. Correspondence to: Yijun Dong <EMAIL>, Qi Lei <EMAIL>. |
| Pseudocode | No | The paper describes methods and analyses using mathematical formulations and textual explanations but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about the release of source code, nor does it provide links to code repositories or mention code in supplementary materials. |
| Open Datasets | Yes | 4.2. UTKFace regression Beyond the synthetic regression, we investigate W2S on a real-world image regression task age estimation on the UTKFace dataset (Zhang et al., 2017). |
| Dataset Splits | Yes | UTKFace (Aligned & Cropped) (Zhang et al., 2017) consists of 23,708 face images with age labels... We preprocess the images... and split the dataset into training and testing sets of sizes 20,000 and 3,708. |
| Hardware Specification | No | The experiments are supported by the PLI computing cluster. YD acknowledges support of NYU Courant Instructorship. JDL acknowledges support of Open Philanthropy, NSF IIS 2107304, NSF CCF 2212262, NSF CAREER Award 2144994, and NSF CCF 2019844. This material is based upon work supported by the U.S. Department of Energy, Office of Science Energy Earthshot Initiative as part of the project Learning reduced models under extreme data conditions for design and rapid decision-making in complex systems under Award #DE-SC0024721. |
| Software Dependencies | No | We use ridge regression with a small fixed regularization hyperparameter α_w, α_w2s, α_s, α_c = 10^-6, close to the machine epsilon of single-precision floating point numbers. ... We train the models with cross-entropy loss and AdamW optimizer. ... All training is conducted via Adam optimizers (Kingma & Ba, 2014) with a learning rate of 5e-5, a cosine learning rate schedule, and 40 warmup steps. |
| Experiment Setup | Yes | We use ridge regression with a small fixed regularization hyperparameter α_w, α_w2s, α_s, α_c = 10^-6, close to the machine epsilon of single-precision floating point numbers. ... We preprocess the images to 224 x 224 pixels and split the dataset into training and testing sets of sizes 20,000 and 3,708. ... We train the models with cross-entropy loss and AdamW optimizer. We tune the training hyperparameters of weak and strong models using a validation set and train them for 800 steps with a learning rate 1e-3 and weight decay 1e-6. ... All training is conducted via Adam optimizers (Kingma & Ba, 2014) with a learning rate of 5e-5, a cosine learning rate schedule, and 40 warmup steps. We train for 3 epochs, which is sufficient for the train and validation losses to stabilize. |
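Since the paper does not release code, the ridge-regression setup quoted above can only be approximated. The sketch below illustrates the reported configuration (a fixed regularization strength of 10^-6 shared across all fits, and a train/test split) using scikit-learn's `Ridge`. The synthetic features, labels, dimensions, and split sizes here are illustrative assumptions, not the paper's actual data or feature extractors.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption: the paper's real features are not released).
n_train, n_test, d = 200, 50, 32
X = rng.normal(size=(n_train + n_test, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n_train + n_test)

X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

# Small fixed regularization, matching the 10^-6 value quoted in the report.
model = Ridge(alpha=1e-6)
model.fit(X_train, y_train)

test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
print(f"test MSE: {test_mse:.4f}")
```

With regularization this close to machine epsilon, `Ridge` behaves essentially like ordinary least squares; the tiny `alpha` serves only to keep the normal equations numerically well-conditioned.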