MIETT: Multi-Instance Encrypted Traffic Transformer for Encrypted Traffic Classification

Authors: Xu-Yang Chen, Lu Han, De-Chuan Zhan, Han-Jia Ye

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | After fine-tuning, MIETT achieves state-of-the-art (SOTA) performance across five datasets, demonstrating its effectiveness in classifying encrypted traffic and understanding complex network behaviors. We provide extensive empirical validation of the proposed MIETT model on multiple datasets, showing that our approach outperforms existing methods in terms of accuracy and F1-score.
Researcher Affiliation | Academia | School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, China
Pseudocode | No | The paper describes the methodology using textual explanations and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | For the encrypted traffic classification task, we evaluate our method on five datasets: ISCXVPN 2016 (Draper-Gil et al. 2016), ISCXTor 2016 (Lashkari et al. 2017), and the Cross Platform (Van Ede et al. 2020) dataset, which includes two subsets (Android and iOS), as well as the CIC IoT Dataset 2023 (Neto et al. 2023).
Dataset Splits | Yes | The data is split into training, validation, and test sets with a ratio of 8:1:1.
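The reported 8:1:1 train/validation/test split can be sketched as a simple shuffled index partition. This is a minimal illustration, not the authors' code; the function name and seed are our own assumptions.

```python
import random


def split_8_1_1(items, seed=42):
    """Partition items into train/val/test sets with an 8:1:1 ratio.

    Illustrative sketch only: the paper states the ratio but not the
    shuffling or seeding procedure, which we assume here.
    """
    idx = list(range(len(items)))
    random.Random(seed).shuffle(idx)
    n = len(items)
    n_train = int(n * 0.8)  # 80% for training
    n_val = int(n * 0.1)    # 10% for validation; the remainder is the test set
    train = [items[i] for i in idx[:n_train]]
    val = [items[i] for i in idx[n_train:n_train + n_val]]
    test = [items[i] for i in idx[n_train + n_val:]]
    return train, val, test


train, val, test = split_8_1_1(list(range(1000)))
```

With 1,000 flows this yields 800/100/100 examples; every item lands in exactly one partition.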
Hardware Specification | Yes | All experiments are conducted on a server with two NVIDIA RTX A6000 GPUs.
Software Dependencies | No | The paper mentions the use of the AdamW optimizer but does not specify version numbers for any software, libraries, or programming languages used in the experiments.
Experiment Setup | Yes | During the pre-training stage, we set the training steps to 150,000 and randomly select five of the first ten packets for training. The masking ratio for the Masked Flow Prediction (MFP) task is set to 15%. The weights for the Packet Relative Position Prediction (PRPP) and MFP tasks are both set to 0.2. In the fine-tuning stage, we train for 30 epochs using the first five packets. For both stages, the packet length (L) is set to 128, the number of packets (N) is set to 5, the embedding dimension (d) is set to 768, and the number of Two-Level Attention (TLA) layers is set to 12. The learning rate is set to 2e-5, and the AdamW optimizer is used.
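The reported hyperparameters can be collected into a single configuration object for a reimplementation attempt. The values below are as stated in the paper; the field names and the dataclass itself are our own and do not come from any released code.

```python
from dataclasses import dataclass


@dataclass
class MIETTConfig:
    """Hyperparameters reported for MIETT; field names are our assumption."""
    packet_length: int = 128        # L: bytes kept per packet
    num_packets: int = 5            # N: packets per flow used in training
    embed_dim: int = 768            # d: token embedding dimension
    num_tla_layers: int = 12        # Two-Level Attention (TLA) layers
    mfp_mask_ratio: float = 0.15    # Masked Flow Prediction masking ratio
    prpp_loss_weight: float = 0.2   # Packet Relative Position Prediction weight
    mfp_loss_weight: float = 0.2    # MFP task weight
    pretrain_steps: int = 150_000   # pre-training steps
    finetune_epochs: int = 30       # fine-tuning epochs
    learning_rate: float = 2e-5     # used with the AdamW optimizer


cfg = MIETTConfig()
```

A reproduction would pass `cfg.learning_rate` to an AdamW optimizer and build the encoder from `cfg.embed_dim` and `cfg.num_tla_layers`; the paper does not specify further optimizer settings such as weight decay or warmup.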