Estimating class separability of text embeddings with persistent homology.

Authors: Kostis Gourgoulias, Najah Ghalyan, Maxime Labonne, Yash Satsangi, Sean Moran, Joseph Sabelja

TMLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Our results, validated across binary and multi-class text classification tasks, show that the proposed method's estimates of class separability align with those obtained from supervised methods. This approach offers a novel perspective on monitoring and improving the fine-tuning of sentence transformers for classification tasks, particularly in scenarios where labeled data is scarce. We also discuss how tracking these quantities can provide additional insights into the properties of the trained classifier. Section 4 is titled "Experiments" and describes experimental validation using datasets and models.
Researcher Affiliation Industry Kostis Gourgoulias (EMAIL, JPMorgan Chase & Co.); Najah Ghalyan (EMAIL, JPMorgan Chase & Co.); Maxime Labonne (JPMorgan Chase & Co.); Yash Satsangi (JPMorgan Chase & Co.); Sean Moran (JPMorgan Chase & Co.); Joseph Sabelja (JPMorgan Chase & Co.)
Pseudocode No The paper describes methods and procedures in narrative text, but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code No The paper mentions using external libraries like the 'ripser Python library' and 'sentence-transformers Python library' but does not provide a link or explicit statement about releasing the authors' own implementation code for the methodology described.
Open Datasets Yes Datasets: We use the train splits of SetFit/amazon_counterfactual and the SetFit/sst2 datasets from Hugging Face. ... Datasets: We use the train split of two datasets: SetFit/emotion and financial_phrasebank, both of which can be found on Hugging Face.
Dataset Splits Yes We split each set into a training set with 1000 examples and a tracking set with 1000 examples. At the end of every epoch, we embed the examples in the tracking set with each sentence transformer. ... To reduce the dependence on a particular train-tracking-validation split, we compute the scores over seven randomly-picked train-tracking-validation splits.
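The quoted procedure partitions each dataset into a 1000-example training set and a 1000-example tracking set, repeated over seven random splits. A minimal numpy sketch of one such partition (the function name, seed handling, and toy data are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def make_splits(examples, n_train=1000, n_track=1000, seed=0):
    """Randomly partition examples into disjoint training and tracking sets.

    Illustrative sketch of the paper's train/tracking split; names and
    seeding are assumptions, not the authors' code.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(examples))
    train = [examples[i] for i in idx[:n_train]]
    track = [examples[i] for i in idx[n_train:n_train + n_track]]
    return train, track

# Seven randomly-seeded splits, mirroring the averaging over seven
# train-tracking-validation splits described above
data = [f"sentence {i}" for i in range(2500)]
splits = [make_splits(data, seed=s) for s in range(7)]
```

Averaging scores over several such splits reduces the variance introduced by any single random partition.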
Hardware Specification No The paper does not provide specific details about the hardware used, such as GPU/CPU models, memory, or cloud instance types.
Software Dependencies Yes In this work, we only concern ourselves with the persistence times of the connected components (i.e., corresponding to H0), which we get from the ripser Python library (Tralie et al., 2018). ... We consider three pre-trained sentence transformers available on Hugging Face and through the sentence-transformers Python library (Reimers and Gurevych, 2019). The models are sentence-transformers/all-MiniLM-L6-v2 (MiniLM), sentence-transformers/paraphrase-TinyBERT-L6-v2 (TinyBERT), and sentence-transformers/paraphrase-albert-small-v2 (Albert). ... We use sentence-transformers version 2.4.0 to load the models and encode text into sentence embedding vectors.
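The paper obtains H0 persistence times from `ripser`. For a dependency-light sketch of the same quantity: in a Vietoris–Rips filtration every connected component is born at scale 0 and dies when a minimum-spanning-tree edge merges it into another component, so the finite H0 persistence times are exactly the MST edge lengths of the embedded point cloud. The function name and toy data below are assumptions for illustration:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def h0_persistence_times(points):
    """Finite H0 persistence (death) times of a Vietoris-Rips filtration.

    Components are born at scale 0 and die at MST edge lengths, so this
    matches the finite bars ripser(X, maxdim=0) would return (excluding
    the single infinite bar for the last surviving component).
    """
    dist = squareform(pdist(points))   # pairwise Euclidean distances
    mst = minimum_spanning_tree(dist)  # sparse MST over the distance graph
    return np.sort(mst.data)           # n - 1 finite death times

# Two well-separated clusters: one long-lived component stands out
rng = np.random.default_rng(0)
cloud = np.vstack([rng.normal(0, 0.1, (20, 2)),
                   rng.normal(5, 0.1, (20, 2))])
times = h0_persistence_times(cloud)
```

On separable data like this, the largest persistence time (the inter-cluster merge) dominates the within-cluster times, which is the kind of gap the paper's separability estimate relies on.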
Experiment Setup Yes Finetuning Procedure: We attach a randomly-initialized one-layer sigmoid head to construct a binary text classifier that outputs the probability of one of the classes. ... We use the Prodigy optimizer (Mishchenko and Defazio, 2023) with d_coef = 1e-1, cross-entropy loss, and batch size 32 for all models. Each training run lasts 7 epochs. As we fine-tune on relatively small datasets, we only train the last layer of each ST and the classifier head.