Innovations Autoencoder and its Application in One-class Anomalous Sequence Detection
Authors: Xinyi Wang, Lang Tong
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We then demonstrate, using field-collected and synthetic datasets, the effectiveness of the proposed approach on detecting system anomalies in a microgrid (Pignati et al., 2015). From Sec. 5 (Performance Evaluation): We present two sets of evaluations based on a combination of field-collected datasets from actual systems and synthetic datasets designed to test specific properties. |
| Researcher Affiliation | Academia | Xinyi Wang EMAIL Department of Electrical and Computer Engineering Cornell University Ithaca, NY 14850, USA Lang Tong EMAIL Department of Electrical and Computer Engineering Cornell University Ithaca, NY 14850, USA |
| Pseudocode | Yes | A pseudocode that implements the IAE learning is shown in the Appendix (Appendix B, Pseudocode: Algorithm 1, Training the Innovations Autoencoder). |
| Open Source Code | No | IAE was implemented by adapting the Wasserstein GAN with a few modifications (footnote 6: https://keras.io/examples/generative/wgan_gp/). The text does not provide an explicit statement of code release for the methodology described in this paper, nor a direct link to a repository containing their specific implementation. |
| Open Datasets | Yes | The BESS dataset contained direct bus voltage measurements sampled at 50 kHz at a medium-voltage (20 kV) substation collected from the EPFL campus smart grid as described by Sossan et al. (2016). The second field-collected dataset (UTK) contained direct samples of voltage waveform at 6 kHz collected at the University of Tennessee. Besides the two field datasets (BESS and UTK), we also designed several synthetic datasets to evaluate specific properties of IAE and IAE-based anomaly detections. These datasets are described in Sec. 5.2 and Sec. 5.3. Table 1: Test Synthetic Datasets. νt i.i.d. U[0, 1]; 1(·) is the indicator function. Table 3: Data Detection Test Cases. νt i.i.d. N(0, 1), ν′t i.i.d. U[−1.5, 1.5] |
| Dataset Splits | No | Anomaly-free training samples are given, and 100,000 samples were used for training in all cases. To construct the anomaly samples, we added a comparably small Gaussian mixture noise to the anomaly-free measurements. The paper describes using anomaly-free samples for training and anomaly samples for testing, and states the total training sample size, but it does not specify explicit train/validation/test split percentages or a methodology for partitioning the overall datasets into these sets. |
| Hardware Specification | No | The paper does not contain any specific details regarding the hardware (e.g., GPU/CPU models, memory) used for running the experiments or training the models. |
| Software Dependencies | No | IAE was implemented by adapting the Wasserstein GAN with a few modifications (footnote 6: https://keras.io/examples/generative/wgan_gp/). The paper mentions Keras and Wasserstein GAN but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For all cases in this paper, similar neural network structures were used: the encoder and decoder both contained three hidden layers with 100, 50, and 25 neurons respectively, with hyperbolic tangent activation. The discriminator contained three hidden layers with 100, 50, and 25 neurons, of which the first two used hyperbolic tangent activation and the last one linear activation. The tuning parameters used for each case are presented in the Appendix. Appendix C (Neural Network Parameters): All the neural networks (encoder, decoder, and discriminator) in the paper had three hidden layers, with 100, 50, and 25 neurons respectively. The input dimension for the generator was chosen such that n = 3m; m = 20 was used for the synthetic cases and m = 100 for the real-data cases. The encoder and decoder both used hyperbolic tangent activation. The first two layers of the discriminator adopted hyperbolic tangent activation, and the last one linear activation. The tuning parameters were chosen to be the same for all synthetic cases, with µ = 0.1, λ = 5, α = 0.0002, β1 = 0.9, β2 = 0.999. For the two real-data cases, the hyperparameters were set to µ = 0.01, λ = 3, α = 0.001, β1 = 0.9, β2 = 0.999. |
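The layer geometry quoted above can be made concrete with a minimal sketch. This is not the authors' code (which was adapted from the Keras WGAN-GP example in TensorFlow); it is a plain-NumPy illustration of the reported shapes for the synthetic setting (m = 20, n = 3m). The latent dimension and the discriminator's input size are assumptions, since the paper excerpt does not state them.

```python
# Illustrative sketch only, NOT the paper's implementation.
# Reported shapes: encoder/decoder have 3 hidden layers (100, 50, 25), tanh;
# the discriminator has the same hidden sizes, tanh on the first two hidden
# layers and linear on the last. m = 20 and n = 3m follow the synthetic case.
import numpy as np

def mlp(sizes, activations, rng):
    """Return a forward function for a fully connected net with given sizes."""
    params = [(rng.standard_normal((a, b)) * 0.1, np.zeros(b))
              for a, b in zip(sizes[:-1], sizes[1:])]
    def forward(x):
        for (W, b), act in zip(params, activations):
            x = act(x @ W + b)
        return x
    return forward

tanh, linear = np.tanh, lambda x: x
m = 20                # output window size (synthetic case, per the paper)
n = 3 * m             # encoder input dimension, n = 3m
latent = m            # latent (innovations) dimension -- an assumption here

rng = np.random.default_rng(0)
encoder = mlp([n, 100, 50, 25, latent], [tanh, tanh, tanh, tanh], rng)
decoder = mlp([latent, 100, 50, 25, m], [tanh, tanh, tanh, tanh], rng)
# Discriminator: tanh on first two hidden layers, linear thereafter,
# ending in a single WGAN-style score.
discriminator = mlp([latent, 100, 50, 25, 1], [tanh, tanh, linear, linear], rng)
```

A forward pass on a batch of 4 windows maps (4, 60) → (4, 20) through the encoder, back to (4, 20) through the decoder, and to a (4, 1) score through the discriminator, matching the dimensions reported in the table.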