reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Fine-grained Prediction of Political Leaning on Social Media with Unsupervised Deep Learning

Authors: Tiziano Fagni, Stefano Cresci

JAIR 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluated our technique in two challenging classification tasks and we compared it to baselines and other state-of-the-art approaches. Our technique obtains the best results among all unsupervised techniques, with micro F1 = 0.426 in the 8-class task and micro F1 = 0.772 in the 3-class task.
Researcher Affiliation	Academia	Tiziano Fagni EMAIL Stefano Cresci EMAIL Institute of Informatics and Telematics (IIT) National Research Council (CNR) via G. Moruzzi 1, 56124 Pisa, Italy
Pseudocode	No	The paper describes the methodology using high-level overview diagrams (Figure 1, Figure 3) and lists steps textually (e.g., Section 5, steps for clustering). It does not contain structured pseudocode or algorithm blocks with code-like formatting.
Open Source Code	No	The paper states that its data are publicly available, but it contains no explicit statement about open-source code for the methodology and no link to a code repository. 'Our data are publicly available for scientific purposes [footnote 5: https://doi.org/10.5281/zenodo.5793346]'
Open Datasets	Yes	Our data are publicly available for scientific purposes [footnote 5: https://doi.org/10.5281/zenodo.5793346]
Dataset Splits	Yes	Finally, we performed a stratified sampling to split our dataset into a training (90%, 18,169 users), a validation (3%, 604 users) and a test (7%, 1,426 users) partition.
Hardware Specification	No	The paper does not provide any specific hardware details such as GPU/CPU models, memory, or processor types used for running the experiments.
Software Dependencies	No	The paper mentions the 'sklearn Python software package', the 'gensim library', and 'UMAP with default parameters', but does not specify version numbers for any of these components. Footnotes: '12. https://radimrehurek.com/gensim/' '13. https://scikit-learn.org/stable/'
Experiment Setup	Yes	In this work, we fixed k = 5 in Equation (3)... Th = 0.5 is a reasonable value... We leveraged UMAP with default parameters... we assume that we know the number of clusters we want to obtain at the end of clustering process (i.e., 8 clusters for the party prediction task and 3 clusters for pole prediction task)... Parties + clustering: ... step 2 with a feature reduction to 64 features, and step 4 using Gaussian Mixture with default parameters... Parties enriched + clustering: ... clustering process for the party prediction task is performed by applying only step 3 and step 4 using KMeans as the clustering algorithm. For the pole prediction task, we used instead step 1, step 2 with a feature reduction to 64 features, and step 4 using the KMeans algorithm.
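
The Dataset Splits row describes a stratified 90% / 3% / 7% split over users. The paper's exact procedure and tooling are not quoted here, so the following is only a minimal sketch of how such a split might be reproduced, using the Python standard library. The function name `stratified_split`, the toy labels, the seed, and the rounding rule are all assumptions made for illustration; they are not taken from the paper, and the resulting counts will not necessarily match the reported partition sizes.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.90, 0.03, 0.07), seed=0):
    """Split item indices into train/validation/test lists while keeping
    each label's share roughly equal across the three partitions.

    Hypothetical helper: the fractions mirror the 90/3/7 split quoted above,
    but the per-label rounding rule is an assumption.
    """
    rng = random.Random(seed)

    # Group item indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        n = len(members)
        n_val = round(n * fractions[1])
        n_test = round(n * fractions[2])
        n_train = n - n_val - n_test  # remainder goes to training
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])
    return train, val, test


# Toy, made-up labels standing in for per-user classes.
labels = ["left"] * 500 + ["centre"] * 300 + ["right"] * 200
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Splitting within each label group, rather than over the whole dataset at once, is what keeps the class proportions comparable across partitions; a fixed seed makes the split repeatable.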