Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images
Authors: Zhengrui Guo, Qichen Sun, Jiabo Ma, Lishuang Feng, Jinzhuo Wang, Hao Chen
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive experiments on biomarker prediction, gene mutation prediction, cancer subtyping, and survival analysis across over 10 WSI datasets, our method demonstrates superior performance compared to the state-of-the-art approaches. Tab. 1 shows the evaluation results on classification tasks, including biomarker prediction (BCNB-ER), gene mutation prediction (TCGA-LUAD TP53), and cancer subtyping (UBC-OCEAN). Moreover, Tab. 2 demonstrates the survival prediction results across eight TCGA cancer types. |
| Researcher Affiliation | Academia | 1Hong Kong University of Science and Technology, HK, China 2Beijing Institute of Collaborative Innovation, Beijing, China 3Peking University, Beijing, China. Correspondence to: Zhengrui Guo <EMAIL>, Hao Chen <EMAIL>. |
| Pseudocode | Yes | A. Algorithm Pseudo Code of Querent Algorithm 1 Querent: Query-Aware Dynamic Long Sequence Modeling for Gigapixel WSI Analysis |
| Open Source Code | Yes | The paper states that the code is publicly available ("Codes are here."). |
| Open Datasets | Yes | BCNB (Xu et al., 2021) for biomarker prediction (https://bcnb.grand-challenge.org/). TCGA-LUAD (Tomczak et al., 2015) for TP53 gene mutation prediction (https://portal.gdc.cancer.gov/). UBC-OCEAN for ovarian cancer subtyping (https://www.kaggle.com/competitions/UBC-OCEAN). TCGA subsets (Tomczak et al., 2015) for survival analysis (https://portal.gdc.cancer.gov/). |
| Dataset Splits | Yes | We use 5-fold cross-validation for model training and evaluation and report the mean and standard deviation of the results. |
| Hardware Specification | No | The paper mentions "batch size of 1 WSI per GPU" but does not provide specific hardware models like NVIDIA A100 or CPU details. |
| Software Dependencies | No | The paper mentions "PyTorch" but does not specify its version number or any other software dependencies with version numbers. |
| Experiment Setup | Yes | We implement Querent using PyTorch. For the feature extraction backbone, we utilize the CPath pre-trained vision-language foundation model PLIP (Huang et al., 2023) to obtain 512-dimensional patch features. Each region contains 16/24/28 patches (depending on the datasets used), which provides a good balance between computational efficiency and contextual coverage. The model consists of 8 attention heads, with the hidden dimension set to 512. For the region-level metadata networks (fmin and fmax), we use single-layer perceptrons with GELU activation. The query projection network fq shares the same architecture. During the region importance estimation, we select the top-16 most relevant regions for each query patch, which empirically provides sufficient contextual information while maintaining computational efficiency. The attention pooling network fa consists of a two-layer MLP with a hidden dimension of 512 and GELU activation. Dropout with a rate of 0.1 is applied throughout the network to prevent overfitting. We train the model using the AdamW optimizer with a learning rate of 1e-4 for classification tasks and 2e-4 for survival analysis, with a weight decay of 1e-5. The model is trained for 50 epochs with a batch size of 1 WSI per GPU. We employ gradient clipping with a maximum norm of 1.0 to ensure stable training. |
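The evaluation protocol quoted above (5-fold cross-validation, results reported as mean and standard deviation) can be sketched in plain Python. This is a minimal illustrative sketch, not the authors' code: the `kfold_splits` helper, the stand-in slide IDs, and the per-fold AUC values are all hypothetical.

```python
# Minimal sketch of a 5-fold cross-validation protocol with
# mean ± std reporting. Slide IDs and per-fold AUCs are hypothetical.
from statistics import mean, stdev

def kfold_splits(items, k=5):
    """Partition items into k folds; yield (train, val) lists per fold."""
    folds = [items[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, val

slides = list(range(20))  # stand-in for WSI identifiers

# Each slide lands in exactly one validation fold.
fold_sizes = [len(val) for _, val in kfold_splits(slides, k=5)]
assert sorted(x for _, val in kfold_splits(slides) for x in val) == slides

# Results are then reported as mean ± standard deviation across folds,
# e.g. for hypothetical per-fold AUCs:
aucs = [0.81, 0.79, 0.83, 0.80, 0.82]
print(f"{mean(aucs):.3f} ± {stdev(aucs):.3f}")  # → 0.810 ± 0.016
```

Each of the 20 stand-in slides appears in exactly one of the five validation folds, mirroring how every WSI contributes to evaluation exactly once under the paper's stated protocol.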