SONICS: Synthetic Or Not - Identifying Counterfeit Songs

Authors: Md Awsafur Rahman, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, Shaikh Anowarul Fattah

ICLR 2025

Reproducibility

Variable | Result | LLM Response
Research Type | Experimental | The comparative analysis of the proposed SpecTTTra models against other existing models is presented in Table 4. The results reveal a significant performance gain (6% for ConvNeXt, 8% for EfficientViT, 10% for ViT, and 17% for SpecTTTra-α) in the overall F1 score when using long songs. This finding substantiates our claim that leveraging long-context information is crucial for enhancing fake song detection. [...] We conduct an ablation study to highlight the importance of both temporal and spectral tokens, with the findings summarized in Table 7.
Researcher Affiliation | Academia | Md Awsafur Rahman, UC Santa Barbara, USA, EMAIL; Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Virginia Tech, USA, EMAIL; Bishmoy Paul, Santa Clara University, USA, EMAIL; Shaikh Anowarul Fattah, BUET, Bangladesh, EMAIL
Pseudocode | Yes | Second, the pseudo-code for the Spectro-Temporal Tokenizer of the SpecTTTra model is presented in the Appendix.
Open Source Code | Yes | Code & Data available at https://github.com/awsaf49/sonics
Open Datasets | Yes | Code & Data available at https://github.com/awsaf49/sonics [...] Finally, as these fake songs are generated through paid subscriptions that allow for the use and sharing of content, our dataset will be made publicly available under a CC BY-NC 4.0 license.
Dataset Splits | Yes | We conduct all experiments using the proposed SONICS dataset, which is divided into train, valid, and test sets. To ensure comprehensive evaluation, the valid and test sets include cases with unseen algorithms (e.g., Suno v2, Suno v3, Udio 32) and unseen singers. We also prevent data leakage by ensuring that song pairs from the same (lyrics, style) inputs are exclusively in either the training or valid-test sets, not in both. The distribution of the train, test, and valid sets is shown in Table 3.
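The leakage-prevention rule described above (songs generated from the same (lyrics, style) input never straddle the train/valid boundary) amounts to a group-aware split. A minimal sketch, assuming a hypothetical `group_split` helper not taken from the SONICS codebase:

```python
import random

def group_split(pairs, valid_frac=0.2, seed=42):
    """Split song ids so that all songs sharing a (lyrics, style)
    generation key land in the same partition, preventing leakage.

    pairs: list of (song_id, (lyrics, style)) tuples.
    Returns (train_ids, valid_ids).
    """
    # Group song ids by their generation key.
    groups = {}
    for song_id, key in pairs:
        groups.setdefault(key, []).append(song_id)

    keys = sorted(groups)            # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(keys)

    # Hold out a fraction of *keys* (not songs) for validation.
    n_valid = max(1, int(len(keys) * valid_frac))
    valid_keys = set(keys[:n_valid])

    train_ids, valid_ids = [], []
    for key, ids in groups.items():
        (valid_ids if key in valid_keys else train_ids).extend(ids)
    return train_ids, valid_ids
```

Splitting at the key level rather than the song level is what guarantees that a real/fake pair produced from the same prompt cannot leak across sets.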
Hardware Specification | Yes | We conduct our training on an NVIDIA A6000 GPU with 48GB RAM, using WandB for tracking. [...] To comprehensively evaluate the efficiency of the proposed SpecTTTra model alongside other methods, we measure various metrics across different song lengths using a P100 16GB GPU.
Software Dependencies | No | We use ViT-small (patch size = 16) and ConvNeXt-tiny along with EfficientViT-B2 from the timm (Wightman, 2019) library. [...] For calculating FLOPs, we employed the fvcore (FAIR, 2023) library.
Experiment Setup | Yes | To train models, we resampled both real and fake songs to 16 kHz and generated spectrograms with n_fft = win_length = 2048, hop_length = 512, and n_mels = 128, yielding a 128×128 spectrogram for 5 sec and 128×3744 for 120 sec audio. Any song shorter than input length is zero-padded randomly, while for longer songs, a random crop is used. We also apply MixUp (Zhang, 2017) and SpecAugment (Park et al., 2019) augmentations during training to improve generalization. [...] We train all models for 50 epochs from scratch using Binary Cross Entropy loss with 0.02 label smoothing (Szegedy et al., 2016). Optimization is performed with AdamW (Loshchilov, 2017) and a cosine learning rate scheduler from timm, including a 5-epoch warm-up.
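The random zero-padding / random cropping step quoted above can be sketched as follows. This is an illustrative numpy implementation under stated assumptions (function name, RNG handling, and padding placement are choices of this sketch, not taken from the SONICS code):

```python
import numpy as np

def fix_length(audio, target_len, rng=None):
    """Bring a waveform to exactly target_len samples:
    shorter clips are zero-padded at a random offset,
    longer clips get a random contiguous crop."""
    rng = rng or np.random.default_rng()
    n = len(audio)
    if n < target_len:
        # random zero-padding: place the clip at a random position
        pad = target_len - n
        left = int(rng.integers(0, pad + 1))
        return np.pad(audio, (left, pad - left))
    if n > target_len:
        # random crop: pick a random start index
        start = int(rng.integers(0, n - target_len + 1))
        return audio[start:start + target_len]
    return audio

# e.g. a 5 s input window at 16 kHz corresponds to 5 * 16000 = 80000 samples
```

Randomizing the pad offset and crop position acts as a light augmentation, so the model sees the same song at different temporal alignments across epochs.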