Efficient Visual Representation Learning with Heat Conduction Equation
Authors: Zhemin Zhang, Xun Gong
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that HcNet achieves competitive performance across various visual tasks, offering new insights for the development of physics-inspired model architecture design. To demonstrate the effectiveness of the HcNet, we conduct experiments on ImageNet-1K [Deng et al., 2009]. To further evaluate the generalization and robustness of our backbone, we also conduct experiments on ADE20K [Zhou et al., 2017] for semantic segmentation, and COCO [Lin et al., 2014] for object detection. Finally, we perform comprehensive ablation studies to analyze each component of the HcNet. |
| Researcher Affiliation | Academia | Zhemin Zhang (1), Xun Gong (1,2) — (1) School of Computing and Artificial Intelligence, Southwest Jiaotong University; (2) Engineering Research Center of Sustainable Urban Intelligent Transportation, Ministry of Education. EMAIL, EMAIL |
| Pseudocode | No | The paper includes figures illustrating the Heat Conduction Layer and Refinement Approximation Layer with mathematical equations and component diagrams, but it does not contain a dedicated pseudocode block or algorithm section with structured, step-by-step instructions for an algorithm. |
| Open Source Code | Yes | The code is publicly available at: https://github.com/ZheminZhang1/HcNet. |
| Open Datasets | Yes | To demonstrate the effectiveness of the HcNet, we conduct experiments on ImageNet-1K [Deng et al., 2009]. To further evaluate the generalization and robustness of our backbone, we also conduct experiments on ADE20K [Zhou et al., 2017] for semantic segmentation, and COCO [Lin et al., 2014] for object detection. |
| Dataset Splits | Yes | Implementation details. This setting mostly follows Swin [Liu et al., 2021]. We pretrain the backbones on the ImageNet-1K dataset and apply the fine-tuning strategy used in Swin Transformer [Liu et al., 2021] on the COCO training set. Here we employ the widely-used UperNet [Xiao et al., 2018] as the basic framework and follow Swin's [Liu et al., 2021] experimental settings. In Table 4, we report both the single-scale (SS) and multi-scale (MS) mIoU for better comparison. The default input resolution is 512 × 512. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware used for running experiments, such as GPU models, CPU models, or memory details. |
| Software Dependencies | No | We use the PyTorch toolbox [Paszke et al., 2019] to implement all our experiments. However, no specific version number for PyTorch or other software dependencies is provided. |
| Experiment Setup | Yes | We employ an AdamW [Kingma and Ba, 2014] optimizer for 300 epochs using a cosine decay learning rate scheduler and 20 epochs of linear warm-up. A batch size of 256, an initial learning rate of 0.001, and a weight decay of 0.05 are used. ViT-B/16 uses an image size of 384 × 384 and the others use 224 × 224. We include most of the augmentation and regularization strategies of the Swin Transformer [Liu et al., 2021] in training. |
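The reported schedule (300 epochs, 20 epochs of linear warm-up, then cosine decay from an initial learning rate of 0.001) can be sketched as a per-epoch learning-rate function. This is a hedged reconstruction from the quoted setup, not the authors' code; the epoch indexing and the decay-to-zero floor are assumptions.

```python
import math

# Hyperparameters quoted in the Experiment Setup row.
BASE_LR = 1e-3
WARMUP_EPOCHS = 20
TOTAL_EPOCHS = 300

def lr_at_epoch(epoch: int) -> float:
    """Learning rate for a given 0-indexed epoch (indexing is an assumption)."""
    if epoch < WARMUP_EPOCHS:
        # Linear warm-up from BASE_LR / WARMUP_EPOCHS up to BASE_LR.
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # Cosine decay over the remaining epochs, toward zero.
    progress = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - WARMUP_EPOCHS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_epoch(19))   # end of warm-up: full base LR
print(lr_at_epoch(160))  # midpoint of the cosine phase: half the base LR
print(lr_at_epoch(299))  # final epoch: near zero
```

In PyTorch this same function could be handed to `torch.optim.lr_scheduler.LambdaLR` (divided by `BASE_LR`, since `LambdaLR` expects a multiplicative factor) on top of `torch.optim.AdamW` with `weight_decay=0.05`.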