WenyanGPT: A Large Language Model for Classical Chinese Tasks
Authors: Xinyu Yao, Mengdi Wang, Bo Chen, Xiaobing Zhao
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimental results on WenyanBench demonstrate that WenyanGPT significantly outperforms current advanced LLMs in various Classical Chinese tasks." |
| Researcher Affiliation | Academia | ¹School of Information Engineering, Minzu University of China; ²National Language Resource Monitoring and Research Center of Minority Languages |
| Pseudocode | No | The paper describes methods and processes in paragraph text and through diagrams (Figure 2: Overall Training Framework of WenyanGPT; Figure 3: Instruction Fine-Tuning Data Construction Process), but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The abstract states: "We make the model's training data, instruction fine-tuning data, and evaluation benchmark dataset publicly available..." but does not explicitly state that the source code for WenyanGPT itself is publicly available. A GitHub link is provided in a footnote for the baseline model Xunzi, not for WenyanGPT. |
| Open Datasets | Yes | "We make the model's training data, instruction fine-tuning data, and evaluation benchmark dataset publicly available to promote further research and development in the field of Classical Chinese processing." |
| Dataset Splits | Yes | "We make the model's training data, instruction fine-tuning data, and evaluation benchmark dataset publicly available to promote further research and development in the field of Classical Chinese processing. In order to evaluate the model's performance on Classical Chinese tasks, we devise a benchmark known as WenyanBench." |
| Hardware Specification | No | The paper mentions using the LLaMA3-8B-Chinese model and training efficiency with bfloat16 data format, but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for the experiments. |
| Software Dependencies | No | The paper mentions using LLaMA3-8B-Chinese as the base model but does not specify any software dependencies with version numbers (e.g., specific Python, PyTorch, or CUDA versions). |
| Experiment Setup | Yes | The hyper-parameter settings for pre-training are shown in Table 2: per_device_train_batch_size 16, gradient_accumulation_steps 1, learning_rate 1.0e-4, num_train_epochs 1, lr_scheduler_type cosine, warmup_ratio 0.1. The hyper-parameter settings for fine-tuning are shown in Table 4: per_device_train_batch_size 8, gradient_accumulation_steps 2, learning_rate 1.0e-4, num_train_epochs 1, lr_scheduler_type cosine, warmup_ratio 0.1. |
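The reported hyper-parameter names follow the common HuggingFace `TrainingArguments` naming convention, so the two tables can be transcribed into configuration dictionaries. This is a sketch for reference only: the paper does not release its training scripts, so the surrounding framework is an assumption; only the values themselves come from Tables 2 and 4.

```python
# Hyper-parameters transcribed from the paper's Table 2 (pre-training)
# and Table 4 (fine-tuning). Key names follow the HuggingFace
# TrainingArguments convention, which is an assumption: the paper's
# training code is not publicly released.
PRETRAIN_ARGS = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 1,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}

# Fine-tuning halves the per-device batch size and doubles gradient
# accumulation, keeping the effective batch size (16) unchanged.
FINETUNE_ARGS = {
    **PRETRAIN_ARGS,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
}
```

Note that batch size and accumulation steps trade off against each other: both stages train with an effective batch of 16 examples per device per optimizer step.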