WenyanGPT: A Large Language Model for Classical Chinese Tasks

Authors: Xinyu Yao, Mengdi Wang, Bo Chen, Xiaobing Zhao

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on WenyanBench demonstrate that WenyanGPT significantly outperforms current advanced LLMs in various Classical Chinese tasks."
Researcher Affiliation | Academia | ¹School of Information Engineering, Minzu University of China; ²National Language Resource Monitoring and Research Center of Minority Languages.
Pseudocode | No | The paper describes its methods and processes in paragraph text and through diagrams (Figure 2: Overall Training Framework of WenyanGPT; Figure 3: Instruction Fine-Tuning Data Construction Process), but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The abstract states: "We make the model's training data, instruction fine-tuning data, and evaluation benchmark dataset publicly available..." but does not explicitly state that the source code for WenyanGPT itself is publicly available. A GitHub link is provided in a footnote for the baseline model Xunzi, not for WenyanGPT.
Open Datasets | Yes | "We make the model's training data, instruction fine-tuning data, and evaluation benchmark dataset publicly available to promote further research and development in the field of Classical Chinese processing."
Dataset Splits | Yes | "We make the model's training data, instruction fine-tuning data, and evaluation benchmark dataset publicly available to promote further research and development in the field of Classical Chinese processing. In order to evaluate the model's performance on Classical Chinese tasks, we devise a benchmark known as WenyanBench."
Hardware Specification | No | The paper mentions using the LLaMA3-8B-Chinese model and training with the bfloat16 data format, but does not provide specific details about the hardware (e.g., GPU models, CPU types) used for the experiments.
Software Dependencies | No | The paper mentions using LLaMA3-8B-Chinese as the base model but does not specify any software dependencies with version numbers (e.g., specific Python, PyTorch, or CUDA versions).
Experiment Setup | Yes | Hyper-parameter settings are reported in Table 2 (pre-training) and Table 4 (fine-tuning).
Pre-training (Table 2): per_device_train_batch_size 16; gradient_accumulation_steps 1; learning_rate 1.0e-4; num_train_epochs 1; lr_scheduler_type cosine; warmup_ratio 0.1.
Fine-tuning (Table 4): per_device_train_batch_size 8; gradient_accumulation_steps 2; learning_rate 1.0e-4; num_train_epochs 1; lr_scheduler_type cosine; warmup_ratio 0.1.
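Since the paper reports these hyper-parameters but does not release training code, the following sketch simply restates the reported values as plain Python dicts (the key names follow common trainer conventions and are an assumption, not the paper's actual configuration format):

```python
# Hyper-parameters as reported in Table 2 (pre-training) of the paper.
pretrain_hparams = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 1,
    "learning_rate": 1.0e-4,
    "num_train_epochs": 1,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
}

# Table 4 (fine-tuning) halves the per-device batch size and doubles
# gradient accumulation; all other settings are unchanged.
finetune_hparams = {
    **pretrain_hparams,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
}

def effective_batch_size(h):
    """Per-device effective batch = batch size x accumulation steps."""
    return h["per_device_train_batch_size"] * h["gradient_accumulation_steps"]
```

Note that both stages end up with the same per-device effective batch size (16 × 1 = 8 × 2 = 16), so the fine-tuning settings trade memory for extra accumulation steps without changing the effective batch.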