reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Graph Cuts with Arbitrary Size Constraints Through Optimal Transport

Authors: Chakib Fettal, lazhar labiod, Mohamed Nadif

TMLR 2024 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	5 Experiments We evaluated the clustering performance of our two variants OT-ncut and OT-rcut algorithms against the spectral clustering algorithm and state-of-the-art OT-based graph clustering approaches. 5.1 Datasets We perform experiments on graphs constructed from image datasets, namely, MNIST (Deng, 2012), Fashion MNIST (Xiao et al., 2017) and KMNIST (Clanuwat et al., 2018). ... Table 2: Average ( sd) clustering performance and running times on the graph built from images. ... Table 3: Average ( sd) clustering performance and running times on the graph datasets.
Researcher Affiliation	Academia	Chakib Fettal EMAIL CDC Informatique Centre Borelli, UMR 9010, Université Paris Cité Lazhar Labiod EMAIL Centre Borelli, UMR 9010, Université Paris Cité Mohamed Nadif EMAIL Centre Borelli, UMR 9010, Université Paris Cité
Pseudocode	Yes	Algorithm 1: Nonconvex Accelerated PGD for OT-cut
Open Source Code	Yes	For reproducibility purposes, we release our code1. 1https://github.com/chakib401/OT-cut
Open Datasets	Yes	We perform experiments on graphs constructed from image datasets, namely, MNIST (Deng, 2012), Fashion MNIST (Xiao et al., 2017) and KMNIST (Clanuwat et al., 2018). We also consider four graph datasets: DBLP, a co-term citation network; and ACM, a coauthor citation networks (Fan et al., 2020). EU-Email an email network from a large European research institution (Leskovec & Krevl, 2014). Indian-Village describes interactions among villagers in Indian villages (Banerjee et al., 2013). ... Results on long-tailed versions of CIFAR-10 are reported in table 5.
Dataset Splits	No	The paper uses various datasets (MNIST, Fashion-MNIST, KMNIST, DBLP, ACM, Village, EU-Email, CIFAR-10) for clustering experiments. However, it does not explicitly provide details about specific training, test, or validation dataset splits, percentages, or sample counts for these experiments, which are typically found in supervised learning contexts. For clustering, the entire dataset is generally used for the algorithm.
Hardware Specification	Yes	All experiments were run five times and were performed on a 64gb RAM machine with a 12th Gen Intel(R) Core(TM) i9-12950HX (24 CPUs) processor with a frequency of 2.3GHz.
Software Dependencies	No	Our two variants, OT-ncut and OT-rcut are implemented via the Python optimal transport package (POT) (Flamary et al., 2021). ... For the baselines, we use the Scikit-Learn (Pedregosa et al., 2011) implementation of spectral clustering. ... We use the implementations of Louvain, Fluid, and Greedy provided in the networkx package (Hagberg et al., 2008). We use the implementation of Infomap provided in Edler et al. (2024). The paper mentions software packages used but does not provide specific version numbers for them.
Experiment Setup	Yes	We use random initialization and use uniform target distributions unless explicitly stated otherwise. We also set α = 1/2 and the number of iterations to 30 for the image graphs and 20 for the other graphs. We also use normalized laplacian matrices.