Provable Convex Co-clustering of Tensors
Authors: Eric C. Chi, Brian J. Gaines, Will Wei Sun, Hua Zhou, Jian Yang
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our theoretical findings are supported by extensive simulated studies. Finally, we apply the CoCo estimator to the cluster analysis of advertisement click tensor data from a major online company. |
| Researcher Affiliation | Collaboration | Eric C. Chi, Department of Statistics, North Carolina State University, Raleigh, NC 27695, USA; Brian J. Gaines, Advanced Analytics R&D, SAS Institute Inc., Cary, NC 27513, USA; Will Wei Sun, Krannert School of Management, Purdue University, West Lafayette, IN 47907, USA; Hua Zhou, Department of Biostatistics, University of California, Los Angeles, CA 90095, USA; Jian Yang, Advertising Sciences, Yahoo Research, Sunnyvale, CA 94089, USA |
| Pseudocode | Yes | Algorithm 1: Convex Co-Clustering (CoCo) Estimation Algorithm. Initialize λ(0); for m = 0, 1, ... |
| Open Source Code | No | The paper mentions using 'Matlab using the Tensor Toolbox (Bader et al., 2015)' and 'the open source R package ggplot2 (Wickham, 2009)' for simulations and plotting. These are third-party tools used by the authors, not a released implementation of the CoCo estimator itself. |
| Open Datasets | No | The paper applies the CoCo estimator to 'advertisement click tensor data from a major online company' and explicitly states that it is a 'proprietary data set'. For the simulation studies, data was generated synthetically, but no public access information for these generated datasets is provided. |
| Dataset Splits | No | The paper describes generating synthetic data for simulations and evaluating performance over '100 replicates' or '200 simulated replicates' with different noise levels and cluster sizes. However, it does not specify any training/test/validation splits for the models being evaluated, which is crucial for reproducibility. |
| Hardware Specification | Yes | Timing comparisons were performed on a 3.2 GHz quad-core Intel Core i5 processor and 8 GB of RAM. |
| Software Dependencies | No | The paper mentions 'Matlab using the Tensor Toolbox (Bader et al., 2015)', 'the open source R package ggplot2 (Wickham, 2009)', 'FASTA (Goldstein et al., 2014, 2015)', and the 'Tensorlab Matlab toolbox (Vervliet et al., 2016)'. While these tools are named, specific version numbers for Matlab, R, and the libraries (Tensor Toolbox, ggplot2, FASTA, Tensorlab) are not provided, which is necessary for a reproducible setup. |
| Experiment Setup | Yes | For CoCo, the smoothing parameter γ is chosen with the data-driven extended BIC method detailed in Section 7.1: eBIC(γ) = n log(RSS_γ) + 2 df_γ log(n), where RSS_γ = ‖X − Û_γ‖²_F is the residual sum of squares and df_γ is the degrees of freedom for a particular value of γ. The optimal γ is selected from a grid S = {γ1, γ2, . . . , γs}. Section 6 describes the weight specification strategy, including setting τ_d to the median Euclidean distance between k-nearest neighbors and using k-nearest neighbors to induce sparsity. Appendix F details two methods for selecting the rank of the Tucker decomposition (the SCORE algorithm with ρ ∈ [10^-4, 10^-2] and a heuristic based on n_d/2). |
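The extended BIC criterion from Section 7.1 can be sketched as a simple grid search; this is a minimal illustration, not the authors' implementation, and the helper names (`ebic`, `select_gamma`) and the toy RSS/df values in the usage example are hypothetical. In practice RSS_γ and df_γ would come from fitting the CoCo estimator at each grid point.

```python
import numpy as np

def ebic(rss, df, n):
    """Extended BIC as described in Section 7.1:
    eBIC(gamma) = n * log(RSS_gamma) + 2 * df_gamma * log(n),
    where rss is the residual sum of squares ||X - U_hat_gamma||_F^2
    and df is the degrees of freedom at this gamma."""
    return n * np.log(rss) + 2 * df * np.log(n)

def select_gamma(gammas, rss_values, df_values, n):
    """Pick the gamma on the grid S = {gamma_1, ..., gamma_s}
    that minimizes eBIC; returns the winner and all scores."""
    scores = [ebic(r, d, n) for r, d in zip(rss_values, df_values)]
    return gammas[int(np.argmin(scores))], scores

# Toy usage with made-up fit summaries: larger gamma shrinks the fit
# (higher RSS) but lowers the degrees of freedom.
n = 100                              # number of tensor entries
gammas = [0.1, 1.0, 10.0]            # candidate grid S
rss = [50.0, 60.0, 120.0]            # hypothetical RSS_gamma values
df = [40, 10, 2]                     # hypothetical df_gamma values
best_gamma, scores = select_gamma(gammas, rss, df, n)
```

The criterion trades off fit (the n log RSS term) against model complexity (the 2 df log n penalty), so a heavily penalized fit with few effective parameters can win despite a larger residual.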