Knowledge-Enhanced Hierarchical Heterogeneous Graph for Personality Identification with Limited Training Data

Authors: Yuxuan Song, Qiudan Li, Yilin Wu, David Jingjun Xu, Daniel Dajun Zeng

AAAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on three widely used datasets demonstrate that the model outperforms state-of-the-art methods when training with only 100 samples (approximately 1% of the total dataset).
Researcher Affiliation Academia Yuxuan Song1, 2, Qiudan Li1*, Yilin Wu2, 1, David Jingjun Xu3, Daniel Dajun Zeng1, 2 1The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China 2The School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 3Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, China songyuxuan2023, qiudan.li, EMAIL, EMAIL, EMAIL
Pseudocode No The paper describes the methodology using textual explanations, mathematical formulas (Equations 1-10), and an architectural diagram (Figure 1), but does not include any explicit pseudocode or algorithm blocks.
Open Source Code No The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available.
Open Datasets Yes We conducted experiments on three commonly used datasets: PANDORA, PAN2015, and My Personality with Big Five personality labels. PANDORA (Gjurković et al. 2021): This dataset is a large-scale collection of user-generated content sourced from the Reddit platform. PAN2015 (Rangel Pardo et al. 2015): This dataset comes from the data science competition PAN2015... My Personality (Xue et al. 2018): This dataset comes from an open-source project on Facebook...
Dataset Splits Yes A training set containing 100 posts and a validation set containing 100 posts were formed by a random algorithm based on depth-first search. The test set comprised the remaining data, and posts from the same user were ensured to appear in only one of the sets. The basic information of each dataset is shown in Table 2.
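The quoted split procedure (100 training posts, 100 validation posts, remainder as test, with all of a user's posts confined to a single split) can be approximated by the sketch below. This is not the paper's depth-first-search algorithm, only a minimal user-level greedy split under the same constraint; the `posts` schema of `(user_id, text)` pairs is an assumption.

```python
import random
from collections import defaultdict

def user_level_split(posts, train_size=100, val_size=100, seed=0):
    """Assign whole users to train, then val, until each reaches its
    post budget; everything else becomes test. Because users are moved
    as a unit, no user's posts ever span two splits.
    `posts`: list of (user_id, text) pairs (hypothetical schema)."""
    by_user = defaultdict(list)
    for user, text in posts:
        by_user[user].append(text)

    users = list(by_user)
    random.Random(seed).shuffle(users)  # the paper uses a DFS-based random walk instead

    train, val, test = [], [], []
    for user in users:
        user_posts = [(user, t) for t in by_user[user]]
        if len(train) < train_size:
            train.extend(user_posts)
        elif len(val) < val_size:
            val.extend(user_posts)
        else:
            test.extend(user_posts)
    return train, val, test
```

Splits may slightly overshoot the budgets, since a user's posts are never divided.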
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments. It only mentions the software framework (PyTorch) and training parameters.
Software Dependencies No The paper mentions that the model is implemented using PyTorch and uses BERT-base-uncased, but it does not specify version numbers for PyTorch or any other software libraries.
Experiment Setup Yes The model is implemented using PyTorch and trained by the AdamW optimizer with an initial learning rate of 10^-4 and a dropout ratio of 0.5. BERT-base-uncased is used to extract global text representations and word vectors. The hidden layer dimensions of all GCNs are set to 400, the batch size is 100, and the contrastive learning temperature θ is 10. All hyperparameters are tuned over the validation set to obtain the optimized results.
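The reported hyperparameters map onto a PyTorch setup roughly as follows. This is a hedged configuration sketch, not the authors' code: the toy model, the 768-dimensional BERT feature size, and all variable names are assumptions; only the numeric values come from the paper.

```python
import torch
from torch import nn

# Hyperparameters as reported in the paper
LEARNING_RATE = 1e-4   # initial learning rate 10^-4
DROPOUT = 0.5          # dropout ratio
HIDDEN_DIM = 400       # hidden dimension of all GCN layers
BATCH_SIZE = 100
TEMPERATURE = 10       # contrastive learning temperature θ

# Placeholder model standing in for the paper's hierarchical graph network;
# 768 is the BERT-base hidden size (an assumption about the input features).
model = nn.Sequential(
    nn.Linear(768, HIDDEN_DIM),
    nn.ReLU(),
    nn.Dropout(DROPOUT),
)

optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
```

The temperature would divide the similarity logits inside the contrastive loss; it is listed here only as a constant because the paper does not give the loss implementation.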