Knowledge-Enhanced Hierarchical Heterogeneous Graph for Personality Identification with Limited Training Data

Authors: Yuxuan Song, Qiudan Li, Yilin Wu, David Jingjun Xu, Daniel Dajun Zeng

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on three widely used datasets demonstrate that the model outperforms state-of-the-art methods when training with only 100 samples (approximately 1% of the total dataset).
Researcher Affiliation Academia Yuxuan Song1, 2, Qiudan Li1*, Yilin Wu2, 1, David Jingjun Xu3, Daniel Dajun Zeng1, 2 1The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China 2The School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 3Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, China songyuxuan2023, qiudan.li, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes the methodology using textual explanations, mathematical formulas (Equations 1-10), and an architectural diagram (Figure 1), but does not include any explicit pseudocode or algorithm blocks.
Open Source Code No The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available.
Open Datasets Yes We conducted experiments on three commonly used datasets: PANDORA, PAN2015, and My Personality with Big Five personality labels. PANDORA (Gjurković et al. 2021): This dataset is a large-scale collection of user-generated content sourced from the Reddit platform. PAN2015 (Rangel Pardo et al. 2015): This dataset comes from the data science competition PAN2015... My Personality (Xue et al. 2018): This dataset comes from an open-source project on Facebook...
Dataset Splits Yes A training set containing 100 posts and a validation set containing 100 posts were formed by a random algorithm based on depth-first search. The test set comprised the remaining data, and posts from the same user were ensured to appear in only one of the sets. The basic information of each dataset is shown in Table 2.
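The quoted split procedure (100 training posts, 100 validation posts, remainder as test, with all of a user's posts confined to a single split) can be approximated by the sketch below. This is not the paper's depth-first-search algorithm, only a minimal user-level greedy split under the same constraint; the `posts` schema of `(user_id, text)` pairs is an assumption.

```python
import random
from collections import defaultdict

def user_level_split(posts, train_size=100, val_size=100, seed=0):
    """Assign whole users to train, then val, until each reaches its
    post budget; everything else becomes test. Because users are moved
    as a unit, no user's posts ever span two splits.
    `posts`: list of (user_id, text) pairs (hypothetical schema)."""
    by_user = defaultdict(list)
    for user, text in posts:
        by_user[user].append(text)

    users = list(by_user)
    random.Random(seed).shuffle(users)  # the paper uses a DFS-based random walk instead

    train, val, test = [], [], []
    for user in users:
        user_posts = [(user, t) for t in by_user[user]]
        if len(train) < train_size:
            train.extend(user_posts)
        elif len(val) < val_size:
            val.extend(user_posts)
        else:
            test.extend(user_posts)
    return train, val, test
```

Splits may slightly overshoot the budgets, since a user's posts are never divided.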
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments. It only mentions the software framework (PyTorch) and training parameters.
Software Dependencies No The paper mentions that the model is implemented using PyTorch and uses BERT-base-uncased, but it does not specify version numbers for PyTorch or any other software libraries.
Experiment Setup Yes The model is implemented using PyTorch and trained by the AdamW optimizer with an initial learning rate of 10^-4 and a dropout ratio of 0.5. BERT-base-uncased is used to extract global text representations and word vectors. The hidden layer dimensions of all GCNs are set to 400, the batch size is 100, and the contrastive learning temperature θ is 10. All hyperparameters are tuned over the validation set to obtain the optimized results.
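The reported hyperparameters map onto a PyTorch setup roughly as follows. This is a hedged configuration sketch, not the authors' code: the toy model, the 768-dimensional BERT feature size, and all variable names are assumptions; only the numeric values come from the paper.

```python
import torch
from torch import nn

# Hyperparameters as reported in the paper
LEARNING_RATE = 1e-4   # initial learning rate 10^-4
DROPOUT = 0.5          # dropout ratio
HIDDEN_DIM = 400       # hidden dimension of all GCN layers
BATCH_SIZE = 100
TEMPERATURE = 10       # contrastive learning temperature θ

# Placeholder model standing in for the paper's hierarchical graph network;
# 768 is the BERT-base hidden size (an assumption about the input features).
model = nn.Sequential(
    nn.Linear(768, HIDDEN_DIM),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
```

The temperature would divide the similarity logits inside the contrastive loss; it is listed here only as a constant because the paper does not give the loss implementation.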