AnyTouch: Learning Unified Static-Dynamic Representation across Multiple Visuo-tactile Sensors

Authors: Ruoxuan Feng, Jiangyu Hu, Wenke Xia, Tianci Gao, Ao Shen, Yuhao Sun, Bin Fang, Di Hu

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct both quantitative and qualitative experiments to analyze the transferability of multi-sensor data and assess the impact of our framework on the multi-sensor representation space. Building on this, we comprehensively evaluate the static and dynamic tactile perception capabilities of AnyTouch across various tactile datasets and through a real-world experiment: fine-grained pouring. The experimental results demonstrate the static and dynamic perception abilities and cross-sensor transferability of AnyTouch.
Researcher Affiliation | Academia | 1 Renmin University of China, 2 Wuhan University of Science and Technology, 3 Beijing University of Posts and Telecommunications
Pseudocode | No | The paper describes the framework components and training paradigm in text and diagrams (e.g., Figure 2) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | The code, TacQuad dataset and AnyTouch model are fully available at gewu-lab.github.io/AnyTouch/.
Open Datasets | Yes | The code, TacQuad dataset and AnyTouch model are fully available at gewu-lab.github.io/AnyTouch/. We use 9 different tactile datasets for training, including: Touch and Go (TAG) (Yang et al., 2022), VisGel (Li et al., 2019), Cloth (Yuan et al., 2018), ObjectFolder Real (OF Real) (Gao et al., 2023), TVL (Fu et al., 2024), YCB-Slide (Suresh et al., 2023), SSVTP (Kerr et al., 2022), Octopi (Yu et al., 2024), and the coarse-grained subset of our TacQuad.
Dataset Splits | Yes | We follow the data split in (Yang et al., 2024; Cheng et al., 2024) for Feel. ObjectFolder 1.0 and ObjectFolder 2.0 are two simulated object datasets using TACTO (Wang et al., 2022a) and Taxim (Si & Yuan, 2022). We use them as unseen datasets from unseen sensors, and follow the data split in Yang et al. (2024).
Hardware Specification | Yes | We train the first stage for 20 epochs and the second stage for 12 epochs on 4 NVIDIA A800 GPUs.
Software Dependencies | No | We base our encoders on OpenCLIP-Large (Cherti et al., 2023). No explicit version numbers are provided for software components such as Python, PyTorch, CUDA, or OpenCLIP-Large.
Experiment Setup | Yes | We use the AdamW (Loshchilov, 2017) optimizer with a learning rate of 2e-4. After a warm-up period of 1 epoch, we implement linear learning rate decay. For each tactile video clip, we use T = 3 frames. We train the first stage for 20 epochs and the second stage for 12 epochs... We use a mask ratio ρ = 0.75. During the alignment, we use the text modality as the anchor, freezing the text encoder while performing LoRA fine-tuning on the vision encoder. We set the alignment strength αTV = αTL = 1.0 and αVL = 0.2, and set the weight of cross-sensor matching λ = 0.1. Following (Yang et al., 2024), we use L = 5 sensor tokens for each type of sensor. In both stages, we set the probability of using universal sensor tokens pu to increase linearly from 0 to 0.75.
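The learning-rate schedule quoted above (AdamW at 2e-4, a 1-epoch linear warm-up, then linear decay) can be sketched as a small helper. This is a minimal illustration, not the authors' code: the function name `lr_at_epoch`, the per-epoch granularity, and decaying all the way to zero are our assumptions; the base rate, warm-up length, and 20-epoch first stage come from the paper.

```python
# Hedged sketch of the reported schedule: AdamW base lr 2e-4,
# 1 warm-up epoch, then linear decay. Constants from the paper's
# first training stage; everything else is an illustrative assumption.

BASE_LR = 2e-4
WARMUP_EPOCHS = 1
TOTAL_EPOCHS = 20  # first-stage epoch count reported in the paper


def lr_at_epoch(epoch: int) -> float:
    """Return the learning rate used during the given (0-indexed) epoch."""
    if epoch < WARMUP_EPOCHS:
        # linear ramp from 0 up to BASE_LR across the warm-up period
        return BASE_LR * (epoch + 1) / WARMUP_EPOCHS
    # linear decay toward 0 over the remaining epochs
    remaining = TOTAL_EPOCHS - epoch
    return BASE_LR * max(remaining, 0) / (TOTAL_EPOCHS - WARMUP_EPOCHS)


if __name__ == "__main__":
    for e in range(TOTAL_EPOCHS):
        print(f"epoch {e:2d}: lr = {lr_at_epoch(e):.2e}")
```

In a PyTorch setup, the same per-epoch multiplier would typically be handed to `torch.optim.lr_scheduler.LambdaLR` wrapped around an `AdamW` optimizer.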