reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Mixture of Experts Based Multi-Task Supervise Learning from Crowds

Authors: Tao Han, Huaixuan Shi, Xinyi Ding, Xi-Ao Ma, Huamao Gu, Yili Fang

AAAI 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that MMLC-owf outperforms state-of-the-art methods and MMLC-df enhances the quality of existing learning-from-crowds methods. Experiments: We verify the effectiveness of our method through experiments. We compare our learning-from-crowds method MMLC-owf with the following baselines:...
Researcher Affiliation	Academia	Tao Han, Huaixuan Shi, Xinyi Ding, Xi-Ao Ma, Huamao Gu, Yili Fang* School of Computer Science and Technology, Zhejiang Gongshang University, Hangzhou 310018, China EMAIL, EMAIL
Pseudocode	No	The paper describes the proposed methodology in prose and mathematical formulations, but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	Our code is available at https://github.com/Crowds24/MMLC.
Open Datasets	Yes	We identified three representative datasets that exemplify different types of crowdsourcing scenarios and data characteristics. LabelMe (Rodrigues and Pereira 2018; Russell et al. 2008): This dataset consists of 1000 images categorized into 8 classes... Text (Dumitrache, Aroyo, and Welty 2018): This dataset comprises 1594 sentences... Music (Rodrigues, Pereira, and Ribeiro 2014): This dataset consists of 700 music compositions...
Dataset Splits	No	The paper mentions running 'five rounds' of experiments and varying data density for analysis, but does not provide specific training/validation/test splits (e.g., percentages, sample counts, or references to predefined standard splits) for reproducibility.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions using tools and models like VGG16, BERT, Marsyas, Google's MMoE, and Crowd AR for feature extraction, but it does not specify any version numbers for these software components or for the programming languages/frameworks used.
Experiment Setup	Yes	To accommodate the feature scales of the three experimental datasets, our model's architecture varies accordingly. For the LabelMe dataset, our model employs 16 expert modules, each comprising 3 fully connected ReLU layers, with a final layer output dimension of 32. For the Text and Music datasets, we utilize 10 expert modules. Each module consists of 3 and 2 fully connected ReLU layers, with output dimensions of 32 and 16, respectively.
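The per-dataset expert settings quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free sketch of a generic MMoE-style layer (shared experts, one softmax gate per task). This is not the authors' MMLC implementation, which lives in the linked repository. Only the expert counts, layer counts, and output dimensions come from the row above; the hidden width, input size, number of tasks, weight initialization, and every function name here are hypothetical choices made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertConfig:
    num_experts: int  # number of expert modules
    num_layers: int   # fully connected ReLU layers per expert
    output_dim: int   # width of each expert's final layer


# Values as reported in the Experiment Setup row.
CONFIGS = {
    "LabelMe": ExpertConfig(num_experts=16, num_layers=3, output_dim=32),
    "Text": ExpertConfig(num_experts=10, num_layers=3, output_dim=32),
    "Music": ExpertConfig(num_experts=10, num_layers=2, output_dim=16),
}


def dense_relu(rng, in_dim, out_dim):
    """One fully connected ReLU layer with random weights (hypothetical init)."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
         for _ in range(out_dim)]
    b = [0.0] * out_dim

    def layer(x):
        return [max(0.0, sum(wi * xi for wi, xi in zip(row, x)) + bj)
                for row, bj in zip(w, b)]
    return layer


def make_expert(rng, input_dim, hidden_dim, cfg):
    """Stack cfg.num_layers ReLU layers; hidden_dim is an assumed width."""
    dims = [input_dim] + [hidden_dim] * (cfg.num_layers - 1) + [cfg.output_dim]
    layers = [dense_relu(rng, a, b) for a, b in zip(dims, dims[1:])]

    def expert(x):
        for layer in layers:
            x = layer(x)
        return x
    return expert


def softmax(z):
    m = max(z)  # subtract max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def make_gate(rng, input_dim, num_experts):
    """Linear map to expert logits, then softmax (one gate per task)."""
    w = [[rng.gauss(0.0, 0.1) for _ in range(input_dim)]
         for _ in range(num_experts)]

    def gate(x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) for row in w])
    return gate


def mmoe_forward(x, experts, gates):
    """Return one gate-weighted mixture of the shared expert outputs per task."""
    outs = [f(x) for f in experts]
    dim = len(outs[0])
    mixtures = []
    for gate in gates:
        g = gate(x)
        mixtures.append([sum(g[i] * outs[i][d] for i in range(len(outs)))
                         for d in range(dim)])
    return mixtures


# Usage with the LabelMe settings; sizes below are hypothetical.
rng = random.Random(0)
cfg = CONFIGS["LabelMe"]
input_dim, hidden_dim, num_tasks = 64, 64, 2
experts = [make_expert(rng, input_dim, hidden_dim, cfg) for _ in range(cfg.num_experts)]
gates = [make_gate(rng, input_dim, cfg.num_experts) for _ in range(num_tasks)]
x = [rng.random() for _ in range(input_dim)]
task_outputs = mmoe_forward(x, experts, gates)
print(len(task_outputs), len(task_outputs[0]))  # 2 32
```

Because each gate is a softmax, every task output is a convex combination of the shared expert outputs, so the per-task output width always equals the expert output dimension (32 for LabelMe and Text, 16 for Music).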