Multimodal Inference with Incremental Tabular Attributes
Authors: Xinda Chen, Zhen Xing, Zixian Zhang, Weimin Tan, Bo Yan
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated the MIITA framework on six tasks across five widely used datasets spanning entirely different domains, each containing both tabular and visual modalities. As shown in Table 1, MIITA consistently outperformed state-of-the-art (SOTA) single-modal and multimodal methods, demonstrating that MIITA is a general framework applicable to all scenarios, including medical, advertising, and gaming. Visualization results for two difficult samples are shown in Figure 3. All results were averaged over four runs to mitigate randomness, with the corresponding variances provided in the appendix. |
| Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, Computation and Artificial Intelligence Innovative College, Fudan University, Shanghai, China |
| Pseudocode | No | The paper describes the MIITA framework and its components using natural language and mathematical equations (e.g., $\mathcal{L}_{rec} = \mathbb{E}_{q_\phi(z_{inc} \mid x_{inc})} \| x_{inc} - \hat{x}_{inc} \|^2$), but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We use five public datasets that are commonly used in multimodal learning with tabular and visual modalities: ADNI (AD) [Jr et al., 2008], Data Visual Marketing (DV) [Huang et al., 2023], Pokemon Primary Type (PK), Hearthstone card category (HS), and CS:GO skin quality (CG) [Lu et al., 2023]. |
| Dataset Splits | No | In each, we remove specific columns from the train and validation sets while keeping them in the inference set to simulate real-world incremental tabular inference. The deleted columns were chosen logically, reflecting their historical appearance. Details are in the appendix. The main text mentions train, validation, and inference sets but does not provide specific percentages or sample counts for these splits. |
| Hardware Specification | No | The computations in this research were performed using the CFFF platform of Fudan University. This mentions a platform but does not specify any particular hardware components like CPU/GPU models or memory. |
| Software Dependencies | No | The paper mentions various models and frameworks used as baselines or components (e.g., XGBoost, SCARF, FT-Transformer, Sim CLR, etc.) and refers to general methods (e.g., 'backward difference encoder'). However, it does not provide specific version numbers for any software dependencies (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | Key settings of MIITA are: β (VAE) ∈ {2, 5, 10}, λ1 (KL) and λ2 (cov-cross) ∈ {0.5, 1.0, 2.0}, and pseudo-label threshold = 0.7. The loss weights in Eq. 13 were adjusted based on early gradient magnitudes; the default setting is {1, 0.5, 0.8, 0.4, 1}. |
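The reconstruction loss quoted above, $\mathcal{L}_{rec} = \mathbb{E}_{q_\phi(z_{inc} \mid x_{inc})} \| x_{inc} - \hat{x}_{inc} \|^2$, together with the β-weighted VAE objective implied by the β ∈ {2, 5, 10} setting, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the closed-form Gaussian KL term and its combination with the reconstruction term are standard β-VAE assumptions, and the function name `beta_vae_loss` is hypothetical.

```python
import numpy as np

def beta_vae_loss(x, x_hat, mu, logvar, beta=2.0):
    """Sketch of a beta-VAE objective: squared-error reconstruction
    plus beta-weighted KL(q(z|x) || N(0, I)).

    x, x_hat : (batch, features) inputs and reconstructions
    mu, logvar : (batch, latent) parameters of q(z|x) = N(mu, exp(logvar))
    beta : KL weight (the paper searches over {2, 5, 10})
    """
    # Monte Carlo estimate (single sample) of E_q ||x - x_hat||^2
    rec = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    # Closed-form KL between a diagonal Gaussian and the standard normal
    kl = -0.5 * np.mean(np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=-1))
    return rec + beta * kl
```

With a perfect reconstruction and a posterior equal to the prior (mu = 0, logvar = 0), both terms vanish and the loss is zero, which is a quick sanity check for an implementation.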