Efficient Visual Representation Learning with Heat Conduction Equation

Authors: Zhemin Zhang, Xun Gong

IJCAI 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate that HcNet achieves competitive performance across various visual tasks, offering new insights for the development of physics-inspired model architecture design. To demonstrate the effectiveness of the HcNet, we conduct experiments on ImageNet-1K [Deng et al., 2009]. To further evaluate the generalization and robustness of our backbone, we also conduct experiments on ADE20K [Zhou et al., 2017] for semantic segmentation, and COCO [Lin et al., 2014] for object detection. Finally, we perform comprehensive ablation studies to analyze each component of the HcNet.
Researcher Affiliation Academia Zhemin Zhang1, Xun Gong1,2 1School of Computing and Artificial Intelligence, Southwest Jiaotong University 2Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education EMAIL, EMAIL
Pseudocode No The paper includes figures illustrating the Heat Conduction Layer and Refinement Approximation Layer with mathematical equations and component diagrams, but it does not contain a dedicated pseudocode block or algorithm section with structured, step-by-step instructions for an algorithm.
Open Source Code Yes The code is publicly available at: https://github.com/ZheminZhang1/HcNet.
Open Datasets Yes To demonstrate the effectiveness of the HcNet, we conduct experiments on ImageNet-1K [Deng et al., 2009]. To further evaluate the generalization and robustness of our backbone, we also conduct experiments on ADE20K [Zhou et al., 2017] for semantic segmentation, and COCO [Lin et al., 2014] for object detection.
Dataset Splits Yes Implementation details. This setting mostly follows Swin [Liu et al., 2021]. We pretrain the backbones on the ImageNet-1K dataset and apply the fine-tuning strategy used in Swin Transformer [Liu et al., 2021] on the COCO training set. Here we employ the widely-used UperNet [Xiao et al., 2018] as the basic framework and followed Swin's [Liu et al., 2021] experimental settings. In Table 4, we report both the single-scale (SS) and multi-scale (MS) mIoU for better comparison. The default input resolution is 512 × 512.
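The mIoU metric quoted above is the standard mean intersection-over-union across semantic classes. As an illustrative sketch only (not the authors' evaluation code; the function name `mean_iou` is hypothetical), it can be computed from flattened prediction and ground-truth label arrays like this:

```python
from collections import defaultdict

def mean_iou(preds, gts, num_classes):
    """Mean intersection-over-union over classes that appear in preds or gts."""
    inter = defaultdict(int)   # per-class intersection (correct pixels)
    union = defaultdict(int)   # per-class union (predicted or true pixels)
    for p, g in zip(preds, gts):
        if p == g:
            inter[g] += 1
            union[g] += 1
        else:
            union[p] += 1
            union[g] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious)

# Tiny example: class 0 has IoU 1/2, class 1 has IoU 2/3, so mIoU = 7/12.
score = mean_iou([0, 1, 1, 1], [0, 0, 1, 1], num_classes=2)
```

Single-scale (SS) evaluation runs this once at the default 512 × 512 resolution; multi-scale (MS) averages predictions over several resized inputs before computing the same metric.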
Hardware Specification No The paper does not explicitly describe the specific hardware used for running experiments, such as GPU models, CPU models, or memory details.
Software Dependencies No We use the PyTorch toolbox [Paszke et al., 2019] to implement all our experiments. However, no specific version number for PyTorch or other software dependencies is provided.
Experiment Setup Yes We employ an AdamW [Kingma and Ba, 2014] optimizer for 300 epochs using a cosine decay learning rate scheduler and 20 epochs of linear warm-up. A batch size of 256, an initial learning rate of 0.001, and a weight decay of 0.05 are used. ViT-B/16 uses an image size of 384 × 384 and others use 224 × 224. We include most of the augmentation and regularization strategies of Swin Transformer [Liu et al., 2021] in training.
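The schedule described above (20 epochs of linear warm-up followed by cosine decay over 300 epochs, base learning rate 0.001) can be sketched in plain Python. This is a minimal reconstruction from the stated hyperparameters, not the authors' training code; the helper name `lr_at` is hypothetical:

```python
import math

# Hyperparameters quoted from the experiment setup above.
TOTAL_EPOCHS = 300
WARMUP_EPOCHS = 20
BASE_LR = 1e-3

def lr_at(epoch):
    """Learning rate for a given 0-indexed epoch."""
    if epoch < WARMUP_EPOCHS:
        # Linear warm-up from 0 to the base learning rate.
        return BASE_LR * epoch / WARMUP_EPOCHS
    # Cosine decay from the base rate toward 0 over the remaining epochs.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In PyTorch this would typically be wired up via `torch.optim.AdamW(params, lr=1e-3, weight_decay=0.05)` together with a warm-up-plus-cosine scheduler, with the per-epoch values matching the function above.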