Online Nonnegative CP-dictionary Learning for Markovian Data

Authors: Hanbaek Lyu, Christopher Strohmeier, Deanna Needell

JMLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimentally, we show that our algorithm converges much faster than standard algorithms for nonnegative tensor factorization tasks on both synthetic and real-world data. Also, we demonstrate the utility of our algorithm on a diverse set of examples from image, video, and time-series data, illustrating how one may learn qualitatively different CP-dictionaries from the same tensor data by exploiting the tensor structure in multiple ways.
Researcher Affiliation | Academia | Hanbaek Lyu (EMAIL), Department of Mathematics, University of Wisconsin, Madison, WI 53709, USA; Christopher Strohmeier (EMAIL), Department of Mathematics, University of California, Los Angeles, CA 90095, USA; Deanna Needell (EMAIL), Department of Mathematics, University of California, Los Angeles, CA 90025, USA
Pseudocode | Yes | Algorithm 1: Online CP-Dictionary Learning (online CPDL)... Algorithm 2: Online CP-Dictionary Learning (Bounded Memory Implementation)... Algorithm 3: Intermediate Aggregation... Algorithm 4: Coding... Algorithm 5: Loading Matrix Update... Algorithm 6: Alternating Least Squares for NCPD... Algorithm 7: Multiplicative Update for NCPD
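As a concrete reference point for the baselines named above, the multiplicative-update scheme of Algorithm 7 can be sketched for a 3-way nonnegative CP decomposition as follows. This is a minimal NumPy sketch of the standard Lee–Seung-style update, not the authors' implementation; the function names and initialization are our assumptions.

```python
import numpy as np

def unfold(X, mode):
    # Mode-n unfolding: move the chosen axis to the front, flatten the rest.
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def khatri_rao(A, B):
    # Column-wise Khatri-Rao product; rows indexed by (row of A, row of B),
    # with B's row index varying fastest, matching `unfold`'s column order.
    return (A[:, None, :] * B[None, :, :]).reshape(-1, A.shape[1])

def ncpd_mu(X, R, n_iter=300, eps=1e-12, seed=0):
    # Nonnegative CP decomposition of a 3-way tensor by multiplicative updates.
    rng = np.random.default_rng(seed)
    factors = [rng.random((n, R)) + 0.1 for n in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            A, B = [factors[m] for m in range(3) if m != mode]
            KR = khatri_rao(A, B)
            Xn = unfold(X, mode)
            num = Xn @ KR
            den = factors[mode] @ (KR.T @ KR) + eps
            factors[mode] *= num / den  # entrywise; preserves nonnegativity
    return factors
```

Because each update multiplies by a nonnegative ratio, nonnegative initial factors stay nonnegative throughout, which is the defining property of this family of NCPD solvers.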
Open Source Code | Yes | All codes are available at https://github.com/HanbaekLyu/OnlineCPDL
Open Datasets | Yes | X_20News ∈ R_{≥0}^{40×5000×26} (41.6 MB) is a tensor representing semi-synthetic text data based on the 20 Newsgroups dataset (Rennie, 2008)... X_Twitter ∈ R_{≥0}^{90×5000×1000} (3.6 GB) is an anonymized Twitter text data set related to the COVID-19 pandemic from Feb. 1 to May 1 of 2020; the original data was collected in Kassab et al. (2021)... X_Headlines ∈ R_{≥0}^{203×7000×700} (8.0 GB) is a tensor derived in Kassab et al. (2021) from news headlines published over a period of 17 years, sourced from the Australian news source ABC (Kulkarni, 2018)... The original video is due to Barson et al. (2020)... we apply our algorithm to a weather dataset obtained from Beniaguev (2017).
Dataset Splits | Yes | From the given data tensor X, we obtain a sequence of tensors X_1, ..., X_T by subsampling 1/5 of the coordinates from the last mode... To this end, we sample 50-frame (2 sec. long) clips uniformly at random 200 times... From this large data tensor, we sequentially extracted smaller (36 × 24 × 4) = (cities × time × measurements) tensors by dividing time into overlapping segments of length 24 hours, with overlap size 4 hours.
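The overlapping time-segment extraction described for the weather tensor can be sketched as follows. Segment length 24 and overlap 4 come from the quote; the helper name, the time axis convention, and the stride computation are our assumptions.

```python
import numpy as np

def overlapping_segments(X, seg_len=24, overlap=4, axis=1):
    """Slice a data tensor into overlapping windows along `axis`.

    With seg_len=24 and overlap=4, consecutive window starts are 20 steps
    apart, so adjacent segments share 4 entries along the time axis.
    """
    step = seg_len - overlap
    n = X.shape[axis]
    segments = []
    for start in range(0, n - seg_len + 1, step):
        idx = [slice(None)] * X.ndim
        idx[axis] = slice(start, start + seg_len)
        segments.append(X[tuple(idx)])
    return segments

# Example: a (cities x hours x measurements) tensor
X = np.random.rand(36, 100, 4)
segs = overlapping_segments(X)
```

Each element of `segs` then has shape (36, 24, 4), matching the (cities × time × measurements) tensors fed to the online algorithm.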
Hardware Specification | No | In order to make a fair comparison, we compare the reconstruction error against CPU times with the same machine, not against iteration counts, since a single iteration may have different computational costs across different algorithms. (No specific hardware model or details are provided for the "same machine".)
Software Dependencies | No | The coding step in (15) is a convex problem and can be easily solved by a number of known algorithms (e.g., LARS (Efron et al., 2004), LASSO (Tibshirani, 1996), and feature-sign search (Lee et al., 2007)). (This lists algorithms, not specific software libraries with version numbers.)
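To illustrate why the coding step is easy to solve, here is a minimal projected-gradient sketch for the nonnegative, l1-regularized least-squares subproblem. This is our own generic solver under assumed problem form min_{h ≥ 0} 0.5·||x − Dh||² + λ·||h||_1, not one of the cited algorithms (LARS, LASSO solvers, or feature-sign search).

```python
import numpy as np

def nonnegative_lasso(D, x, lam=1.0, n_iter=500, tol=1e-8):
    """Solve min_{h >= 0} 0.5*||x - D h||^2 + lam*||h||_1 by projected gradient.

    On the nonnegative orthant the l1 penalty is just lam * sum(h), so its
    gradient is the constant vector lam.
    """
    L = np.linalg.norm(D, 2) ** 2  # Lipschitz constant of the smooth part
    h = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ h - x) + lam
        h_new = np.maximum(h - grad / L, 0.0)  # gradient step, then project
        if np.linalg.norm(h_new - h) < tol:
            h = h_new
            break
        h = h_new
    return h
```

With step size 1/L this is a standard forward-backward scheme, so the objective decreases monotonically from the all-zeros start.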
Experiment Setup Yes We applied our online CPDL algorithm (Algorithm 1 for 400 iterations with λ = 1) to various reshapings of such patches to learn three separate dictionaries, each consisting of 24 atoms... In Figure 2, OCPDL (β) for for β {0.75, 1} denotes Algorithm 1 with weights wt = c t β/ log t (with w1 = c )... In all cases, the weights satisfy (A3) so the algorithm is guaranteed to converge to the stationary points of the objective function almost surely by Theorem 1. The constant c is chosen from {1, 10, 100, 1000} for β {0.75, 1}. Initial loading matrices for Algorithm 1 are chosen with i.i.d. entries drawn from the uniform distribution on [0, 1]... sequentially fed into the online CPDL algorithm with wt = c /t, λ = 2, and c = 105... to learn a single CP-dictionary atom (R = 1).
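Reading the weight formula quoted above as w_t = c′·t^(−β)/log t for t ≥ 2, with w_1 = c′ set separately (our reconstruction of the garbled text; the separate t = 1 case avoids division by log 1 = 0), the schedule can be computed as:

```python
import numpy as np

def ocpdl_weights(T, beta=0.75, c=1.0):
    """Weight sequence w_t = c * t**(-beta) / log(t) for t >= 2, with w_1 = c.

    `c` plays the role of the constant c' tuned over {1, 10, 100, 1000}
    in the quoted setup; beta is taken from {0.75, 1}.
    """
    t = np.arange(1, T + 1, dtype=float)
    w = np.empty(T)
    w[0] = c          # t = 1 handled separately, since log(1) = 0
    w[1:] = c * t[1:] ** (-beta) / np.log(t[1:])
    return w
```

Under this reading the sequence is strictly decreasing, consistent with the diminishing-weight assumption (A3) invoked for almost-sure convergence.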