Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Wavelet Networks: Scale-Translation Equivariant Learning From Raw Time-Series
Authors: David W. Romero, Erik J. Bekkers, Jakub M. Tomczak, Mark Hoogendoorn
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically evaluate wavelet networks. To this end, we take existing neural architectures designed to process raw signals and construct equivalent wavelet networks (W-Nets). We then compare the performance of W-Nets and the corresponding baselines on tasks defined on raw environmental sounds, raw audio and raw electric signals. We replicate as close as possible the training regime of the corresponding baselines and utilize their implementation as a baseline whenever possible. Detailed descriptions of the specific architectures as well as the hyperparameters used for each experiment are provided in Appx. C. |
| Researcher Affiliation | Collaboration | David W. Romero (NVIDIA Research), Erik J. Bekkers (Universiteit van Amsterdam), Jakub M. Tomczak (Technische Universiteit Eindhoven), Mark Hoogendoorn (Vrije Universiteit Amsterdam) |
| Pseudocode | No | The paper describes the architecture and methods verbally and mathematically, but does not include any specific sections or figures labeled as pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/dwromero/wavelet_networks. |
| Open Datasets | Yes | First, we consider the task of classifying environmental sounds on the UrbanSound8K (US8K) dataset (Salamon et al., 2014). The US8K dataset consists of 8732 audio clips uniformly drawn from 10 environmental sounds, e.g., siren, jackhammer, etc., of 4 seconds or less, with a total of 9.7 hours of audio. Next, we consider the task of automatic music tagging on the MagnaTagATune (MTAT) dataset (Law et al., 2009). The MTAT dataset consists of 25879 audio clips with a total of 170 hours of audio, along with several per-song tags. |
| Dataset Splits | Yes | For the comparison with the 1DCNN of Abdoli et al. (2019), we select the 50999-1DCNN as baseline, as it is the network type that requires the least human engineering. We note, however, that we were unable to replicate the results reported in Abdoli et al. (2019). In contrast to the 83 ± 1.3% reported, we were only able to obtain a final accuracy of 62.0 ± 6.79. This inconsistency is further detailed in Appx. C.1. To compare to models other than Mn-nets and 1DCNNs, e.g., Pons et al. (2017a); Tokozume & Harada (2017), we also provide 10-fold cross-validation results. This is done by taking 8 of the 10 official subsets for training, one for validation and one for test. We consistently select the (n − 1) mod 10 subset for validation when testing on the n-th subset. Finally, we also validate wavelet networks for the task of condition monitoring in induction motors. To this end, we classify healthy and faulty bearings from raw data provided by Samotics. The dataset consists of 246 clips of 15 seconds sampled at 20 kHz. The dataset is slightly unbalanced, containing 155 healthy and 91 faulty recordings [155, 91]. The dataset is pre-split into a training set of [85, 52] and a test set of [70, 39] samples, respectively. These splits ensure that measurements from the same motor are not included in both the train and the test set. We utilize 20% of the training set for validation. |
| Hardware Specification | Yes | Our experiments are carried out on an Nvidia TITAN RTX GPU. |
| Software Dependencies | Yes | Any omitted parameters can safely be considered to be the default values in PyTorch 1.5.0. |
| Experiment Setup | Yes | Following the implementation of Dai et al. (2017), we utilize the Adam optimizer (Kingma & Ba, 2014) with lr=1e-2 and weight_decay=1e-4, and perform training on the official first 9 folds and test on the 10th fold. We noticed that reducing the learning rate from 1e-2 to 1e-3 increased the performance of our W-Nets. The reported results of the W-Net variants are obtained with this learning rate. We utilize batches of size 16 and perform training for 400 epochs. The learning rate is reduced by half after 20 epochs of no improvement in validation loss. Following Abdoli et al. (2019), we utilize a sampling rate of 16 kHz during our experiments. We zero-pad signals shorter than 4 seconds so that all input signals have a constant length of 64000 samples. Following the experimental description of the paper, we utilize the AdaDelta optimizer (Zeiler, 2012) with lr=1.0 and perform training in a 10-fold cross-validation setting as described in Sec. 6. We use batches of size 100 and perform training for 100 epochs. |
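The fold-selection rule quoted under *Dataset Splits* (test on the n-th official subset, validate on the (n − 1) mod 10 subset, train on the remaining eight) can be sketched as follows. This is a minimal illustration of the stated scheme, not code from the paper's repository; the function name `fold_split` is our own.

```python
def fold_split(test_fold, n_folds=10):
    """Per the quoted scheme: test on fold n, validate on fold
    (n - 1) mod n_folds, train on the remaining n_folds - 2 folds.
    Folds are assumed to be numbered 0..n_folds-1."""
    val_fold = (test_fold - 1) % n_folds
    train_folds = [f for f in range(n_folds) if f not in (test_fold, val_fold)]
    return train_folds, val_fold, test_fold

# Example: testing on fold 3 validates on fold 2 and trains on the rest.
train, val, test = fold_split(3)
```

Note that the modulo wrap-around means testing on fold 0 validates on fold 9, so every fold serves exactly once as test set and once as validation set across the 10 runs.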
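The preprocessing step quoted under *Experiment Setup* (zero-padding every clip to a constant 64000 samples, i.e. 4 s at 16 kHz) can be sketched in pure Python. The function name `pad_to_length` and the defensive truncation of over-length inputs are our assumptions; the paper only states that shorter signals are zero-padded, since US8K clips are 4 seconds or less.

```python
def pad_to_length(signal, target_len=64000):
    """Zero-pad a raw waveform (sequence of samples) to target_len samples,
    matching the quoted setup of 4 s at 16 kHz -> 64000 samples.
    Signals already at or above target_len are truncated (an assumption)."""
    signal = list(signal)
    if len(signal) >= target_len:
        return signal[:target_len]
    return signal + [0.0] * (target_len - len(signal))
```

In a PyTorch pipeline this would typically be done with `torch.nn.functional.pad` inside the dataset's `__getitem__`, but the padding logic is the same.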
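The learning-rate schedule quoted above (halve the rate after 20 epochs without validation-loss improvement) corresponds to PyTorch's `ReduceLROnPlateau` with `factor=0.5, patience=20`. A minimal stand-alone sketch of that behavior, with the class name `HalveOnPlateau` being our own:

```python
class HalveOnPlateau:
    """Halve the learning rate after `patience` consecutive epochs without
    validation-loss improvement (mirrors the quoted schedule)."""

    def __init__(self, lr, patience=20):
        self.lr = lr
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= 0.5
                self.bad_epochs = 0
        return self.lr
```

Whether the paper's schedule resets its patience counter after each reduction is not stated in the quote; the reset here follows `ReduceLROnPlateau`'s default behavior.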