Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Authors: Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that GradDrop outperforms the state-of-the-art multiloss methods within traditional multitask and transfer learning settings, and we discuss how GradDrop reveals links between optimal multiloss training and gradient stochasticity. |
| Researcher Affiliation | Industry | Zhao Chen (Waymo LLC, Mountain View, CA 94043, EMAIL); Jiquan Ngiam (Google Research, Mountain View, CA 94043, EMAIL); Yanping Huang (Google Research, Mountain View, CA 94043, EMAIL); Thang Luong (Google Research, Mountain View, CA 94043, EMAIL); Henrik Kretzschmar (Waymo LLC, Mountain View, CA 94043, EMAIL); Yuning Chai (Waymo LLC, Mountain View, CA 94043, EMAIL); Dragomir Anguelov (Waymo LLC, Mountain View, CA 94043, EMAIL) |
| Pseudocode | Yes | Algorithm 1: Gradient Sign Dropout Layer (GradDrop Layer) |
| Open Source Code | No | The paper does not provide an unambiguous statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | We also rely exclusively on standard public datasets, and thus move discussion of most dataset properties to the Appendices. [...] We first test GradDrop on the multitask learning dataset CelebA [26] [...] We transfer ImageNet2012 [5] to CIFAR-100 [21] [...] 3D vehicle detection from point clouds on the Waymo Open Dataset [42]. |
| Dataset Splits | No | The paper states that it 'relies exclusively on standard public datasets' and conducts 'training runs', but does not explicitly provide specific details on the train/validation/test dataset splits (e.g., percentages, sample counts, or a detailed splitting methodology) required for reproduction. |
| Hardware Specification | Yes | All experiments are run on NVIDIA V100 GPU hardware. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library or solver names with versions like Python 3.8, CPLEX 12.4) needed to replicate the experiment. |
| Experiment Setup | Yes | We will provide relevant hyperparameters within the main text, but we relegate a complete listing of hyperparameters to the Appendix. For many of our experiments, we renormalize the final gradients so that \|\|r\|\|2 remains constant throughout the GradDrop process. For our final GradDrop model we use a leak parameter ℓ set to 1.0 for the source set. All runs include gradient clipping at norm 1.0. |
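The Pseudocode row refers to Algorithm 1 (the GradDrop layer): per-task gradients vote elementwise on a sign via a "positive sign purity" score, a single uniform sample decides which sign survives at each coordinate, and gradients of the losing sign are dropped (optionally softened by a per-task leak parameter ℓ). A minimal NumPy sketch of that procedure, with our own function and parameter names (`graddrop`, `leak`, `rng` are assumptions, not the authors' code):

```python
import numpy as np

def graddrop(grads, leak=None, rng=None):
    """Sketch of a Gradient Sign Dropout step over a list of per-task
    gradient tensors of identical shape. Not the authors' implementation."""
    rng = np.random.default_rng() if rng is None else rng
    grads = [np.asarray(g, dtype=float) for g in grads]
    if leak is None:
        leak = [0.0] * len(grads)  # leak ℓ_i = 0: pure sign dropout

    total = sum(grads)
    abs_total = sum(np.abs(g) for g in grads)
    # Gradient positive sign purity: P = 0.5 * (1 + Σ_i ∇_i / Σ_i |∇_i|),
    # elementwise; P = 1 when all tasks agree on positive sign, 0 on negative.
    P = 0.5 * (1.0 + total / np.maximum(abs_total, 1e-12))

    # One shared uniform sample per coordinate picks the surviving sign.
    U = rng.random(total.shape)

    out = np.zeros_like(total)
    for g, l in zip(grads, leak):
        keep = ((U < P) & (g > 0)) | ((U >= P) & (g < 0))
        mask = l + (1.0 - l) * keep  # ℓ_i = 1 lets task i pass unmasked
        out += mask * g
    return out
```

With fully agreeing signs the layer reduces to a plain gradient sum, and with ℓ = 1.0 for a task (as quoted above for the source set in transfer learning) that task's gradient always passes through.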