Provable Convex Co-clustering of Tensors
Authors: Eric C. Chi, Brian J. Gaines, Will Wei Sun, Hua Zhou, Jian Yang
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. |
| Researcher Affiliation | Collaboration | Eric C. Chi, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA; Brian J. Gaines, Advanced Analytics R&D, SAS Institute Inc., Cary, NC 27513, USA; Will Wei Sun, Krannert School of Management, Purdue University, West Lafayette, IN 47907, USA; Hua Zhou, Department of Biostatistics, University of California, Los Angeles, CA 90095, USA; Jian Yang, Advertising Sciences, Yahoo Research, Sunnyvale, CA 94089, USA |
| Pseudocode | Yes | Algorithm 1: Convex Co-Clustering (CoCo) Estimation Algorithm. Initialize λ(0); for m = 0, 1, ... |
| Open Source Code | No | The paper mentions using 'Matlab using the Tensor Toolbox (Bader et al., 2015)' and 'the open source R package ggplot2 (Wickham, 2009)' for simulations and plotting. These are third-party tools used by the authors, not a released implementation of the CoCo estimator itself. |
| Open Datasets | No | The paper applies the CoCo estimator to 'advertisement click tensor data from a major online company' and explicitly states that it is a 'proprietary data set'. For the simulation studies, data was generated synthetically, but no public access information for these generated datasets is provided. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations and evaluating performance over '100 replicates' or '200 simulated replicates' with different noise levels and cluster sizes. However, it does not specify any training/test/validation splits for the models being evaluated, which is crucial for reproducibility. |
| Hardware Specification | Yes | Timing comparisons were performed on a 3.2 GHz quad-core Intel Core i5 processor and 8 GB of RAM. |
| Software Dependencies | No | The paper mentions 'Matlab using the Tensor Toolbox (Bader et al., 2015)', 'the open source R package ggplot2 (Wickham, 2009)', 'FASTA (Goldstein et al., 2014, 2015)', and the 'Tensorlab Matlab toolbox (Vervliet et al., 2016)'. While these tools are named, specific version numbers for Matlab, R, and the libraries (Tensor Toolbox, ggplot2, FASTA, Tensorlab) are not provided, which is necessary for a reproducible setup. |
| Experiment Setup | Yes | For CoCo, the smoothing parameter γ is chosen with the data-driven extended BIC method detailed in Section 7.1: eBIC(γ) = n log(RSS_γ) + 2 df_γ log(n), where RSS_γ = ‖X − Û_γ‖²_F is the residual sum of squares and df_γ is the degrees of freedom for a particular value of γ. The optimal γ is selected from a grid S = {γ1, γ2, . . . , γs}. Section 6 describes the weight specification strategy, including setting τ_d to the median Euclidean distance between k-nearest neighbors and using k-nearest neighbors to induce sparsity. Appendix F details two methods for selecting the rank of the Tucker decomposition (the SCORE algorithm with ρ ∈ [10^-4, 10^-2] and a heuristic based on n_d/2). |
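The extended BIC criterion from Section 7.1 can be sketched as a simple grid search; this is a minimal illustration, not the authors' implementation, and the helper names (`ebic`, `select_gamma`) and the toy RSS/df values in the usage example are hypothetical. In practice RSS_γ and df_γ would come from fitting the CoCo estimator at each grid point.

```python
import numpy as np

def ebic(rss, df, n):
    """Extended BIC as described in Section 7.1:
    eBIC(gamma) = n * log(RSS_gamma) + 2 * df_gamma * log(n),
    where rss is the residual sum of squares ||X - U_hat_gamma||_F^2
    and df is the degrees of freedom at this gamma."""
    return n * np.log(rss) + 2 * df * np.log(n)

def select_gamma(gammas, rss_values, df_values, n):
    """Pick the gamma on the grid S = {gamma_1, ..., gamma_s}
    that minimizes eBIC; returns the winner and all scores."""
    scores = [ebic(r, d, n) for r, d in zip(rss_values, df_values)]
    return gammas[int(np.argmin(scores))], scores

# Toy usage with made-up fit summaries: larger gamma shrinks the fit
# (higher RSS) but lowers the degrees of freedom.
n = 100                              # number of tensor entries
gammas = [0.1, 1.0, 10.0]            # candidate grid S
rss = [50.0, 60.0, 120.0]            # hypothetical RSS_gamma values
df = [40, 10, 2]                     # hypothetical df_gamma values
best_gamma, scores = select_gamma(gammas, rss, df, n)
```

The criterion trades off fit (the n log RSS term) against model complexity (the 2 df log n penalty), so a heavily penalized fit with few effective parameters can win despite a larger residual.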