Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Hierarchical Vector Quantized Transformer for Multi-class Unsupervised Anomaly Detection
Authors: Ruiying Lu, YuJie Wu, Long Tian, Dongsheng Wang, Bo Chen, Xiyang Liu, Ruimin Hu
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | By evaluating on MVTec-AD and VisA datasets, our model surpasses the state-of-the-art alternatives and possesses good interpretability. The code is available at https://github.com/RuiyingLu/HVQ-Trans. |
| Researcher Affiliation | Academia | Ruiying Lu1, YuJie Wu2, Long Tian2*, Dongsheng Wang3, Bo Chen3, Xiyang Liu2, Ruimin Hu1 — School of Cyber Engineering1, Software Engineering Institute2, National Key Laboratory of Radar Signal Processing3, Xidian University |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/RuiyingLu/HVQ-Trans. |
| Open Datasets | Yes | MVTec-AD [2] is a widely-used industrial anomaly detection dataset with 15 classes... VisA [45] is a recently published large dataset... CIFAR-10 [45] is a classical image classification dataset of 10 categories. |
| Dataset Splits | Yes | For each class, the training samples are normal while the test samples can be either normal or anomalous. In order to implement many-versus-many anomaly detection, we select 5 normal classes while the rest classes are viewed as anomalies. |
| Hardware Specification | Yes | Our model is trained for 1000 epochs on 2 GPUs (NVIDIA GeForce RTX 3080 10GB) with batch size 16. |
| Software Dependencies | No | The paper mentions software like EfficientNet and AdamW but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | The input image size of MVTec-AD is 224×224×3... The feature maps become 14×14×272, namely, the patch size is 16. Then we reduce the channel dimension of each patch into 256, followed by feeding them into a 4-layer vanilla Trans-enc and the corresponding 4-layer VQ-Trans-dec. We use AdamW [53] with weight decay 0.0001 for optimization. Our model is trained for 1000 epochs... with batch size 16. The learning rate is initialized as 1×10⁻⁴ and dropped by 0.1 after 800 epochs. |
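The training schedule quoted above (learning rate 1×10⁻⁴, multiplied by 0.1 after 800 of 1000 epochs) can be sketched as a simple step function. This is a minimal illustration of the described schedule, not the authors' code; the function name and defaults are assumptions.

```python
def learning_rate(epoch, base_lr=1e-4, drop_epoch=800, gamma=0.1):
    """Step schedule as described in the paper's setup:
    base_lr until drop_epoch, then base_lr * gamma afterward.
    (Illustrative sketch; not taken from the HVQ-Trans repository.)"""
    return base_lr * (gamma if epoch >= drop_epoch else 1.0)

# Per-epoch learning rates over the reported 1000 training epochs.
schedule = [learning_rate(e) for e in range(1000)]
```

In PyTorch terms this corresponds to `MultiStepLR` with `milestones=[800]` and `gamma=0.1` on top of AdamW with weight decay 0.0001.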