Denise: Deep Robust Principal Component Analysis for Positive Semidefinite Matrices

Authors: Calypso Herrera, Florian Krach, Anastasis Kratsios, Pierre Ruyssen, Josef Teichmann

TMLR 2023 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments show that Denise matches state-of-the-art performance in terms of decomposition quality, while being approximately 2000× faster than the state-of-the-art, principal component pursuit (PCP), and 200× faster than the current speed-optimized method, fast PCP. In this section we provide numerical results of Denise. We first train Denise with the supervised loss function on a synthetic training dataset and evaluate it on a synthetic test dataset."
Researcher Affiliation | Collaboration | "Calypso Herrera (EMAIL), Department of Mathematics, ETH Zürich; Florian Krach (EMAIL), Department of Mathematics, ETH Zürich; Anastasis Kratsios (EMAIL), Department of Mathematics, McMaster University; Pierre Ruyssen (EMAIL), Google Brain, Google Zürich; Josef Teichmann (EMAIL), Department of Mathematics, ETH Zürich"
Pseudocode | Yes | "A schematic version of these supervised and unsupervised training schemes is given in the pseudo-Algorithm 1." (Algorithm 1: Training of Denise)
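To make the quoted scheme concrete, here is a minimal numpy sketch of Denise's forward pass. The only structural facts taken from the paper are that a network maps the input matrix M to a factor U and returns L = UUᵀ (positive semidefinite and of rank at most k by construction) together with S = M − L; the single linear layer, the weight shapes, and the name `denise_forward` are illustrative stand-ins, not the paper's actual architecture.

```python
import numpy as np

def denise_forward(M, weights, k=3):
    """Sketch of Denise's forward pass: map the lower-triangular part of M
    through a (stand-in) linear layer to a factor U, then return
    L = U @ U.T (PSD, rank <= k) and S = M - L."""
    n = M.shape[0]
    idx = np.tril_indices(n)
    x = M[idx]                      # vectorized lower triangle, length n(n+1)/2
    u = weights @ x                 # hypothetical single linear layer
    U = u.reshape(n, k)
    L = U @ U.T                     # PSD by construction, rank <= k
    S = M - L                       # the decomposition M = L + S holds exactly
    return L, S

# Usage on a random symmetric 20x20 matrix with random weights
rng = np.random.default_rng(0)
n, k = 20, 3
M = rng.standard_normal((n, n)); M = (M + M.T) / 2
W = 0.1 * rng.standard_normal((n * k, n * (n + 1) // 2))
L, S = denise_forward(M, W, k)
```

Whatever the network's internals, the factorization L = UUᵀ guarantees the PSD and low-rank constraints without any projection step, which is the architectural point the pseudo-algorithm trains.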
Open Source Code | Yes | "The source code is available at https://github.com/DeepRPCA/Denise."
Open Datasets | No | "We create a synthetic dataset in order to train Denise using the Monte Carlo approximation (7) of the supervised loss function (3). ... We consider a real-world dataset of about 1,000 20-by-20 correlation matrices of daily stock returns (on closing prices), for consecutive trading days, shifted every 5 days, between 1989 and 2019. The considered stocks belong to the S&P 500 and have been sorted by the GICS sectors."
Dataset Splits | Yes | "We create a synthetic dataset consisting of 10 million matrices for the training set. ... We create a synthetic test dataset consisting of 10,000 matrices for each of the test settings. ... The first 77% of the data is used as training set and the remaining 23% as test set."
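The synthetic samples described above can be sketched as follows: each training matrix is a sum M = L0 + S0 of a rank-k0 PSD part and a sparse symmetric part. The exact sampling distributions and the mechanism keeping M well-conditioned in the paper are not reproduced here; the Gaussian entries and the Bernoulli sparsity mask below are assumptions for illustration only.

```python
import numpy as np

def make_sample(n=20, k0=3, s0=0.95, rng=None):
    """Sketch of one synthetic example M = L0 + S0:
    L0 = U @ U.T is PSD with rank k0; S0 is a sparse symmetric matrix
    whose entries are zero with probability s0 (sparsity level)."""
    rng = rng or np.random.default_rng()
    U = rng.standard_normal((n, k0))
    L0 = U @ U.T                             # PSD, rank k0 (almost surely)
    mask = rng.random((n, n)) > s0           # keep ~(1 - s0) of the entries
    S0 = rng.standard_normal((n, n)) * mask
    S0 = np.triu(S0, 1); S0 = S0 + S0.T      # symmetric, zero diagonal
    return L0 + S0, L0, S0

M, L0, S0 = make_sample(rng=np.random.default_rng(1))
```

Sampling 10 million such pairs (M, L0, S0) would give a training set of the size quoted above, with the ground-truth decomposition available for the supervised loss.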
Hardware Specification | Yes | "In this setting, we trained our model using 16 Google Cloud TPU-v2 hardware accelerators. ... A machine with 2 Intel Xeon CPU E5-2697 v2 (12 cores) 2.70 GHz and 256 GiB of RAM."
Software Dependencies | No | "To implement Denise, we used the machine learning framework TensorFlow (Abadi et al., 2015) with Keras APIs (Chollet et al., 2015). All algorithms are implemented as part of the LRSLibrary for MATLAB (Sobral et al., 2015; Bouwmans et al., 2016)."
Experiment Setup | Yes | "All results were similar, hence we only present those using size n = 20, sparsity s0 = 0.95 and rank k0 = 3 in the training set. ... Training took around 8 hours (90 epochs). ... we empirically determined λ in order to reach the same rank. In particular, with λ = 0.56/√n for the synthetic dataset and λ = 0.64/√n for the real dataset, we approximately obtain a rank of 3 for the matrices L."
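Checking that a recovered matrix L "approximately" has rank 3 requires a numerical rank measure, since floating-point eigenvalues are never exactly zero. The paper does not state how rank was measured; the relative eigenvalue threshold below is one common choice, sketched here as an assumption.

```python
import numpy as np

def numerical_rank(L, tol=1e-6):
    """Numerical rank of a symmetric PSD matrix: count the eigenvalues
    exceeding tol times the largest eigenvalue. The tolerance is an
    assumption, not a value from the paper."""
    ev = np.linalg.eigvalsh(L)      # eigenvalues in ascending order
    return int(np.sum(ev > tol * ev[-1]))

# A rank-3 PSD matrix recovers rank 3 under this measure
rng = np.random.default_rng(2)
U = rng.standard_normal((20, 3))
print(numerical_rank(U @ U.T))  # -> 3
```

Under such a measure, λ can be tuned (here to 0.56/√n and 0.64/√n) until the PCP baselines return matrices L of the same numerical rank as Denise, making the decomposition-quality comparison fair.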