reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Online Tensor Methods for Learning Latent Variable Models

Authors: Furong Huang, U. N. Niranjan, Mohammad Umar Hakeem, Animashree Anandkumar

JMLR 2015 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct optimization of multilinear operations in SGD and avoid directly forming the tensors, to save computational and storage costs. We present optimized algorithm in two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to allow for maximum speed-up by a careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse data sets. For the community detection problem, we demonstrate accuracy and computational efficiency on Facebook, Yelp and DBLP data sets, and for the topic modeling problem, we also demonstrate good performance on the New York Times data set. We compare our results to the state-of-the-art algorithms such as the variational method, and report a gain of accuracy and a gain of several orders of magnitude in the execution time.
Researcher Affiliation	Academia	Furong Huang EMAIL U. N. Niranjan EMAIL Mohammad Umar Hakeem EMAIL Animashree Anandkumar EMAIL Electrical Engineering and Computer Science Dept. University of California, Irvine Irvine, USA 92697, USA
Pseudocode	Yes	Algorithm 1 Overall approach for learning latent variable models via a moment-based approach. Input: Observed data: social network graph or document samples. Output: Learned latent variable model and infer hidden attributes. (...) Algorithm 2 Randomized Tall-thin SVD Input: Second moment matrix M2. Output: Whitening matrix W. (...) Algorithm 3 Randomized Pseudoinverse Input: Pairs matrix Pairs (B, C). Output: Pseudoinverse of the pairs matrix (Pairs (B, C))†.
Open Source Code	Yes	The code is available at http://github.com/FurongHuang/Fast-Detection-of-Overlapping-Communities-via-Online-Tensor-Methods
Open Datasets	Yes	We learn interesting hidden topics in New York Times corpus from UCI bag-of-words data set (footnote 1) with around 100,000 words and 300,000 documents in about two minutes. Footnote 1: https://archive.ics.uci.edu/ml/datasets/Bag+of+Words (...) The DBLP data contains bibliographic records (footnote 7) with various publication venues, such as journals and conferences, which we model as communities. Footnote 7: http://dblp.uni-trier.de/xml/Dblp.xml (...) Facebook Dataset: A snapshot of the Facebook network of UNC (Traud et al., 2010) is provided with user attributes.
Dataset Splits	No	The paper focuses on learning latent variable models from observed data and evaluates performance against ground truth or other methods. It does not describe specific training, validation, or testing dataset splits, percentages, or sample counts. The evaluation metrics like recovery ratio and error function are applied to the learned models from the full datasets rather than held-out splits.
Hardware Specification	Yes	Table 3: System specifications (hardware / software: version). CPU: Dual 8-core Xeon @ 2.0GHz; Memory: 64GB DDR3; GPU: Nvidia Quadro K5000; CUDA Cores: 1536; Global memory: 4GB GDDR5; CentOS: Release 6.4 (Final); GCC: 4.4.7; CUDA: Release 5.0; CULA-Dense: R16a
Software Dependencies	Yes	Table 3: System specifications (hardware / software: version). CPU: Dual 8-core Xeon @ 2.0GHz; Memory: 64GB DDR3; GPU: Nvidia Quadro K5000; CUDA Cores: 1536; Global memory: 4GB GDDR5; CentOS: Release 6.4 (Final); GCC: 4.4.7; CUDA: Release 5.0; CULA-Dense: R16a
Experiment Setup	Yes	We choose θ = 1 in our experiments to ensure that there is sufficient penalty for non-orthogonality, which prevents us from obtaining degenerate solutions. (...) For the mixed membership model, we set the concentration parameter α0 = 1. (...) Table 7: Yelp, Facebook and DBLP main quantitative evaluation of the tensor method versus the variational method: k̂ is the community number specified to our algorithm, Thre is the threshold for picking significant estimated membership entries.
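
The threshold mentioned in the Table 7 caption can be illustrated with a minimal sketch. It assumes that an estimated membership entry counts as significant when it is strictly greater than the threshold; the function name `significant_memberships`, the example matrix, and the threshold value 0.2 are hypothetical and are not taken from the paper or its released code.

```python
def significant_memberships(membership, threshold):
    """For each node, list the community indices whose estimated
    membership weight is strictly greater than `threshold`.

    Assumption: "significant" means weight > threshold; the paper's
    exact rule may differ.
    """
    return [
        [k for k, weight in enumerate(row) if weight > threshold]
        for row in membership
    ]


# Hypothetical estimated membership vectors: 3 nodes, 3 communities.
estimated = [
    [0.70, 0.25, 0.05],
    [0.10, 0.45, 0.45],
    [0.02, 0.03, 0.95],
]

communities = significant_memberships(estimated, threshold=0.2)
print(communities)  # [[0, 1], [1, 2], [2]]
```

Raising the threshold keeps fewer, stronger memberships per node, while lowering it allows more overlap between communities.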