Diversifying Query: Region-Guided Transformer for Temporal Sentence Grounding
Authors: Xiaolong Sun, Liushuai Shi, Le Wang, Sanping Zhou, Kun Xia, Yabing Wang, Gang Hua
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of RGTR, outperforming state-of-the-art methods on three public benchmarks and exhibiting good generalization and robustness on out-of-distribution splits. |
| Researcher Affiliation | Collaboration | 1) National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, National Engineering Research Center for Visual Information and Applications, and Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; 2) Multimodal Experiences Research Lab, Dolby Laboratories |
| Pseudocode | No | The paper describes methods and processes in paragraph form and through architectural diagrams but does not contain a clearly labeled 'Pseudocode' or 'Algorithm' block with structured steps. |
| Open Source Code | Yes | Code https://github.com/TensorsSun/RGTR |
| Open Datasets | Yes | We evaluate the proposed method on three temporal sentence grounding benchmarks, including the QVHighlights (Lei, Berg, and Bansal 2021), Charades-STA (Gao et al. 2017), and TACoS (Regneri et al. 2013). |
| Dataset Splits | Yes | We adopt the Recall@1 (R1) under the IoU thresholds of 0.3, 0.5, and 0.7. Since QVHighlights contains multiple ground-truth moments per sentence, we also report the mean average precision (mAP) with IoU thresholds of 0.5, 0.75, and the average mAP over a set of IoU thresholds [0.5: 0.05: 0.95]. For Charades-STA and TACoS, we compute the mean IoU of top-1 predictions. |
| Hardware Specification | No | The paper mentions using SlowFast and CLIP for feature extraction and refers to training models, but it does not specify any particular GPU models, CPU models, or detailed hardware specifications used for experiments. |
| Software Dependencies | No | The paper mentions using pre-trained CLIP and SlowFast models, and the AdamW optimizer, but does not provide specific version numbers for these or other software libraries/frameworks. |
| Experiment Setup | Yes | We set the embedding dimension D to 256. The number of anchor pairs K is set to 20 for QVHighlights, 10 for Charades-STA and TACoS. The NMS threshold is set to 0.8. The balancing parameters are set as: λalign = 0.3, λiou = 1, and λsal is set as 1 for QVHighlights, 4 for Charades-STA and TACoS. We train all models with batch size 32 for 200 epochs using the AdamW optimizer with weight decay 1e-4. The learning rate is set to 1e-4. |
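The evaluation metrics quoted in the table (Recall@1 at temporal IoU thresholds) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code; the function names and the example moments are hypothetical.

```python
# Hedged sketch: temporal IoU between [start, end] moments (in seconds)
# and Recall@1 at a given IoU threshold, as described in the table above.
def temporal_iou(pred, gt):
    """IoU of two 1-D temporal segments given as (start, end)."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(top1_preds, gts, threshold):
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(top1_preds, gts))
    return hits / len(gts)

# Illustrative moments (not from the paper):
preds = [(10.0, 25.0), (0.0, 5.0)]
gts = [(12.0, 24.0), (6.0, 12.0)]
print(recall_at_1(preds, gts, 0.5))  # 0.5: only the first prediction passes IoU 0.5
```

The paper additionally reports mAP over thresholds [0.5:0.05:0.95] for QVHighlights, which generalizes this by averaging precision over multiple IoU cutoffs and ground-truth moments per query.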
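The hyperparameters reported in the Experiment Setup row can be collected into per-dataset configurations. The values below come from the quoted text; the dictionary layout and key names are illustrative, not taken from the released code.

```python
# Hedged sketch: per-dataset training configs from the reported setup.
# Shared values: D=256, NMS threshold 0.8, lambda_align=0.3, lambda_iou=1,
# batch size 32, 200 epochs, AdamW with lr=1e-4 and weight decay 1e-4.
SHARED = dict(embed_dim=256, nms_threshold=0.8, lambda_align=0.3,
              lambda_iou=1.0, batch_size=32, epochs=200,
              lr=1e-4, weight_decay=1e-4)

CONFIGS = {
    "qvhighlights": {**SHARED, "num_anchor_pairs": 20, "lambda_sal": 1.0},
    "charades_sta": {**SHARED, "num_anchor_pairs": 10, "lambda_sal": 4.0},
    "tacos":        {**SHARED, "num_anchor_pairs": 10, "lambda_sal": 4.0},
}

print(CONFIGS["qvhighlights"]["num_anchor_pairs"])  # 20
```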