Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Unsupervised Sound Separation Using Mixture Invariant Training
Authors: Scott Wisdom, Efthymios Tzinis, Hakan Erdogan, Ron Weiss, Kevin Wilson, John Hershey
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 experiments. Separation performance is measured using scale-invariant signal-to-noise ratio (SI-SNR) [25]. Results on single anechoic and reverberant 2-source mixtures are shown in Figure 2, and results on single-source inputs are in Appendix E. |
| Researcher Affiliation | Collaboration | Scott Wisdom Google Research EMAIL Efthymios Tzinis UIUC EMAIL Hakan Erdogan Google Research EMAIL Ron J. Weiss Google Research EMAIL Kevin Wilson Google Research EMAIL John R. Hershey Google Research EMAIL |
| Pseudocode | No | The paper describes the method using text and mathematical equations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | MixIT code on GitHub: https://github.com/google-research/sound-separation/tree/master/models/neurips2020_mixit |
| Open Datasets | Yes | For speech separation experiments, we use the WSJ0-2mix [17] and Libri2Mix [9] datasets, sampled at 8 kHz and 16 kHz. We also employ the reverberant spatialized version of WSJ0-2mix [44] and a reverberant version of Libri2Mix we created... For our experiments, we use the recently released Free Universal Sound Separation (FUSS) dataset [47, 48] |
| Dataset Splits | No | For training, 3-second clips are used for WSJ0-2mix, and 10-second clips for Libri2Mix. Evaluation always uses single mixtures of two sources. On a held-out test set, the supervised model achieves 15.0 dB SI-SNRi for speech, and the unsupervised MixIT model achieves 11.4 dB SI-SNRi. The paper discusses training and testing data usage, but does not specify validation splits or exact percentages/counts for train/test/validation splits for reproduction. |
| Hardware Specification | Yes | All models are trained on 4 Google Cloud TPUs (16 chips) with the Adam optimizer [22], a batch size of 256, and a learning rate of 10^-3. |
| Software Dependencies | No | The paper mentions software components but does not provide a dependency list or version numbers needed for reproduction. |
| Experiment Setup | Yes | All models are trained on 4 Google Cloud TPUs (16 chips) with the Adam optimizer [22], a batch size of 256, and a learning rate of 10^-3. |
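
The training hyperparameters reported in the table above can be captured in a small configuration sketch. This is an illustrative summary only, not the authors' code; the dictionary keys and structure are our own assumptions:

```python
# Illustrative sketch of the training setup reported in the paper.
# Values are taken from the "Hardware Specification" and "Experiment
# Setup" rows; the config structure itself is hypothetical.
train_config = {
    "optimizer": "Adam",          # Adam optimizer [22]
    "learning_rate": 1e-3,        # reported learning rate of 10^-3
    "batch_size": 256,            # reported batch size
    "hardware": "4 Google Cloud TPUs (16 chips)",
}

print(train_config["optimizer"], train_config["learning_rate"])
```

The actual training code, including model definitions and the MixIT loss, is available in the linked GitHub repository.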