TextToucher: Fine-Grained Text-to-Touch Generation
Authors: Jiahang Tu, Hao Fu, Fengyu Yang, Hanbin Zhao, Chao Zhang, Hui Qian
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of our TextToucher method. [...] We conduct experiments on two representative datasets. [...] Quantitative Evaluation. The quantitative results are presented in Tab. 1. [...] Qualitative Evaluation. Fig. 4 shows the qualitative comparisons with alternative methods. [...] Ablation Studies Text Condition Types. In Tab. 2, we explore various combinations of text condition types in the text-to-touch generation task. |
| Researcher Affiliation | Academia | 1. College of Computer Science and Technology, Zhejiang University; 2. College of Computer Science and Technology, Yale University; 3. UniXAI |
| Pseudocode | No | The paper describes the methodology, including multimodal large language model annotation and dual-grain text conditioning design, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/TtuHamg/TextToucher |
| Open Datasets | Yes | Datasets We conduct experiments on two representative datasets. HCT (Fu et al. 2024) comprises visual-tactile data collected using a handheld 3D-printed data collection device. [...] Another dataset, SSVTP (Kerr et al. 2022), utilizes a UR5 robotic arm equipped with an RGB camera and a tactile sensor to collect data from various deformable surface environments, such as clothing seams, buttons, and zippers. |
| Dataset Splits | No | The paper mentions using the HCT and SSVTP datasets but does not specify how these datasets were split into training, validation, and test sets, nor does it refer to predefined standard splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions several models and frameworks such as 'Diffusion Transformer (DiT) (Peebles and Xie 2023)', 'LLaVA (Liu et al. 2024)', and 'T5 language model (Raffel et al. 2020)', but it does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Timestep threshold θt. We study the effect of varying tactile text conditions at different sampling timesteps for tactile generation. We set c = {csen, cobj} (three types of tactile information) for early timesteps before θt, and use tactile texture and shape description c afterwards. The results in Tab. 6 show that this approach with time-varying tactile text conditions improves model performance, achieving the best results at θt of 600 or 800 timesteps. [...] Token number ngs. We study the impact of varying the number of tokens ngs in gel status embeddings. As shown in Tab. 5, we find that using ngs = 4 tokens effectively represents different gel status and surpasses other settings across various metrics by a margin. |
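The time-varying conditioning described in the experiment-setup row can be sketched as a condition switch inside the reverse-diffusion sampling loop: coarse conditions (full sentence plus object description) for the early, noisy timesteps at or above the threshold θt, and the fine-grained texture/shape description below it. This is a minimal illustrative sketch, not the TextToucher implementation; `select_condition`, `sample`, and the string condition labels are all hypothetical names, and the real model consumes T5 text embeddings rather than raw strings.

```python
def select_condition(t, theta_t, coarse_cond, fine_cond):
    """Pick the text condition for sampling timestep t.

    Timesteps count down from T-1 to 0, so t >= theta_t corresponds to the
    early (high-noise) portion of sampling.
    """
    return coarse_cond if t >= theta_t else fine_cond


def sample(model, x, total_steps=1000, theta_t=600,
           coarse_cond="sentence+object", fine_cond="texture+shape"):
    """Toy reverse-diffusion loop that swaps text conditions at theta_t.

    `model(x, t, cond)` stands in for one denoising step of a DiT-style
    network under text condition `cond`. Returns the final sample and the
    sequence of conditions used, for inspection.
    """
    trace = []
    for t in reversed(range(total_steps)):
        cond = select_condition(t, theta_t, coarse_cond, fine_cond)
        x = model(x, t, cond)  # one denoising step under `cond`
        trace.append(cond)
    return x, trace
```

With `total_steps=1000` and `theta_t=600`, the loop uses the coarse condition set for timesteps 999 down to 600 and the fine-grained description for 599 down to 0, matching the ablation values (θt of 600 or 800) reported in the table.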