Context Matters: Query-aware Dynamic Long Sequence Modeling of Gigapixel Images

Authors: Zhengrui Guo, Qichen Sun, Jiabo Ma, Lishuang Feng, Jinzhuo Wang, Hao Chen

ICML 2025

Reproducibility Variable | Result | Supporting excerpt / LLM response
Research Type | Experimental | "Through comprehensive experiments on biomarker prediction, gene mutation prediction, cancer subtyping, and survival analysis across over 10 WSI datasets, our method demonstrates superior performance compared to state-of-the-art approaches. Tab. 1 shows the evaluation results on classification tasks, including biomarker prediction (BCNB-ER), gene mutation prediction (TCGA-LUAD TP53), and cancer subtyping (UBC-OCEAN). Moreover, Tab. 2 demonstrates the survival prediction results across eight TCGA cancer types."
Researcher Affiliation | Academia | 1 Hong Kong University of Science and Technology, HK, China; 2 Beijing Institute of Collaborative Innovation, Beijing, China; 3 Peking University, Beijing, China. Correspondence to: Zhengrui Guo <EMAIL>, Hao Chen <EMAIL>.
Pseudocode | Yes | Appendix A ("Algorithm Pseudo Code of Querent") provides Algorithm 1 — Querent: Query-Aware Dynamic Long Sequence Modeling for Gigapixel WSI Analysis.
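The experiment setup below notes that Querent selects the top-16 most relevant regions for each query patch during region importance estimation. A minimal sketch of such a query-aware selection step, assuming a simple dot-product relevance score (the paper's actual scoring uses learned region-level metadata networks, so the scoring function here is a hypothetical stand-in):

```python
import heapq


def top_relevant_regions(query, region_summaries, k=16):
    """Hypothetical sketch: score each region summary against a query patch
    by dot product and keep the k most relevant regions (the paper uses
    the top 16). The real model scores regions via learned metadata
    networks, not a raw dot product."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    scored = [(dot(query, r), i) for i, r in enumerate(region_summaries)]
    # Indices of the k highest-scoring regions, best first.
    return [i for _, i in heapq.nlargest(k, scored)]
```

Restricting attention to a fixed number of top regions per query is what keeps the context dynamic yet bounded in cost, regardless of how many regions the WSI contains.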
Open Source Code | Yes | The paper states that the code is publicly available ("Codes are here").
Open Datasets | Yes | BCNB (Xu et al., 2021) for biomarker prediction (https://bcnb.grand-challenge.org/); TCGA-LUAD (Tomczak et al., 2015) for TP53 gene mutation prediction (https://portal.gdc.cancer.gov/); UBC-OCEAN for ovarian cancer subtyping (https://www.kaggle.com/competitions/UBC-OCEAN); TCGA subsets (Tomczak et al., 2015) for survival analysis (https://portal.gdc.cancer.gov/).
Dataset Splits | Yes | "We use 5-fold cross-validation for model training and evaluation and report the mean and standard deviation of the results."
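The splitting protocol above can be sketched in a few lines of stdlib Python. This is a minimal illustration, not the authors' code; the fold scores at the end are hypothetical placeholders for per-fold metrics:

```python
import statistics


def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_indices, val_indices) for contiguous k-fold
    cross-validation; the last fold absorbs any remainder. A minimal
    sketch -- the paper does not publish its exact splitting code."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        start = k * fold_size
        end = start + fold_size if k < n_folds - 1 else n_samples
        val = indices[start:end]
        train = indices[:start] + indices[end:]
        yield train, val


# Report mean and standard deviation over folds (scores are hypothetical):
fold_scores = [0.91, 0.89, 0.93, 0.90, 0.92]
summary = (statistics.mean(fold_scores), statistics.stdev(fold_scores))
```

Each sample appears in exactly one validation fold, so the reported mean/std reflects performance over the whole cohort.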
Hardware Specification | No | The paper mentions a "batch size of 1 WSI per GPU" but does not name specific hardware models (e.g., NVIDIA A100) or CPU details.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number or any other software dependencies with version numbers.
Experiment Setup | Yes | "We implement Querent using PyTorch. For the feature extraction backbone, we utilize the CPath pre-trained vision-language foundation model PLIP (Huang et al., 2023) to obtain 512-dimensional patch features. Each region contains 16/24/28 patches (depending on the dataset used), which provides a good balance between computational efficiency and contextual coverage. The model consists of 8 attention heads, with the hidden dimension set to 512. For the region-level metadata networks (fmin and fmax), we use single-layer perceptrons with GELU activation. The query projection network fq shares the same architecture. During region importance estimation, we select the top-16 most relevant regions for each query patch, which empirically provides sufficient contextual information while maintaining computational efficiency. The attention pooling network fa consists of a two-layer MLP with a hidden dimension of 512 and GELU activation. Dropout with a rate of 0.1 is applied throughout the network to prevent overfitting. We train the model using the AdamW optimizer with a learning rate of 1e-4 for classification tasks and 2e-4 for survival analysis, with a weight decay of 1e-5. The model is trained for 50 epochs with a batch size of 1 WSI per GPU. We employ gradient clipping with a maximum norm of 1.0 to ensure stable training."
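The training recipe above ends with gradient clipping at a maximum norm of 1.0. A framework-free sketch of that clipping step, assuming clipping by global L2 norm (in PyTorch this corresponds to `torch.nn.utils.clip_grad_norm_`):

```python
import math


def clip_by_global_norm(grads, max_norm=1.0):
    """Rescale a flat list of gradient values so their L2 norm is at
    most max_norm, mirroring the max-norm-1.0 clipping used in training.
    Returns the (possibly rescaled) gradients and the pre-clip norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm
```

With a batch size of 1 WSI, per-step gradients can vary sharply between slides; capping the global norm keeps any single slide from producing a destabilizing update.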