Knowledge-Enhanced Hierarchical Heterogeneous Graph for Personality Identification with Limited Training Data
Authors: Yuxuan Song, Qiudan Li, Yilin Wu, David Jingjun Xu, Daniel Dajun Zeng
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three widely used datasets demonstrate that the model outperforms state-of-the-art methods when training with only 100 samples (approximately 1% of the total dataset). |
| Researcher Affiliation | Academia | Yuxuan Song¹﹐², Qiudan Li¹*, Yilin Wu²﹐¹, David Jingjun Xu³, Daniel Dajun Zeng¹﹐² — ¹The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; ²The School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China; ³Department of Information Systems, College of Business, City University of Hong Kong, Hong Kong, China. songyuxuan2023, qiudan.li, EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes the methodology using textual explanations, mathematical formulas (Equations 1-10), and an architectural diagram (Figure 1), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology described is publicly available. |
| Open Datasets | Yes | We conducted experiments on three commonly used datasets: PANDORA, PAN2015, and My Personality with Big Five personality labels. PANDORA (Gjurković et al. 2021): This dataset is a large-scale collection of user-generated content sourced from the Reddit platform. PAN2015 (Rangel Pardo et al. 2015): This dataset comes from the data science competition PAN2015... My Personality (Xue et al. 2018): This dataset comes from an open-source project on Facebook... |
| Dataset Splits | Yes | A training set containing 100 posts and a validation set containing 100 posts were formed by a random algorithm based on depth-first search. The test set comprised the remaining data, and posts from the same user were ensured to appear in only one of the sets. The basic information of each dataset is shown in Table 2. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types) used for running its experiments. It only mentions the software framework (PyTorch) and training parameters. |
| Software Dependencies | No | The paper mentions that the model is implemented using PyTorch and uses BERTbase-uncased, but it does not specify version numbers for PyTorch or any other software libraries. |
| Experiment Setup | Yes | The model is implemented using PyTorch and trained by the AdamW optimizer with an initial learning rate of 10⁻⁴ and a dropout ratio of 0.5. BERT-base-uncased is used to extract global text representations and word vectors. The hidden layer dimensions of all GCNs are set to 400, the batch size is 100, and the contrastive learning temperature θ is 10. All hyperparameters are tuned over the validation set to obtain the optimized results. |
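The user-disjoint split described under "Dataset Splits" (100 training posts, 100 validation posts, with all of a user's posts confined to a single set) can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: the function name `user_level_split`, the greedy user-by-user assignment, and the `(user_id, text)` input format are all assumptions, and the paper's DFS-based random selection is approximated here by a seeded shuffle.

```python
import random
from collections import defaultdict

def user_level_split(posts, train_size=100, val_size=100, seed=0):
    """Split posts so every user's posts land in exactly one set.

    `posts` is a list of (user_id, text) tuples. Whole users are assigned
    greedily until each quota is met, so set sizes are approximate when
    users have multiple posts. Hypothetical sketch of the paper's split,
    not the authors' implementation.
    """
    by_user = defaultdict(list)
    for user, text in posts:
        by_user[user].append(text)

    # Shuffle users (the paper uses a random DFS-based selection instead).
    users = list(by_user)
    random.Random(seed).shuffle(users)

    train, val, test = [], [], []
    for user in users:
        items = [(user, t) for t in by_user[user]]
        if len(train) < train_size:
            train.extend(items)      # fill the 100-post training set first
        elif len(val) < val_size:
            val.extend(items)        # then the 100-post validation set
        else:
            test.extend(items)       # remaining data becomes the test set
    return train, val, test
```

Because assignment happens per user rather than per post, no user's writing can leak across the train/validation/test boundary, which is the property the paper's split guarantees.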