Multimodal Inference with Incremental Tabular Attributes

Authors: Xinda Chen, Zhen Xing, Zixian Zhang, Weimin Tan, Bo Yan

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated the MIITA framework on six tasks across five widely used datasets spanning entirely different domains, each containing both tabular and visual modalities. As shown in Table 1, MIITA consistently outperformed state-of-the-art (SOTA) single-modal and multimodal methods, indicating that MIITA is a general framework applicable to all scenarios, including medical, advertisement, and game domains. Visualization results for two difficult samples are shown in Figure 3. All results were averaged over four runs to mitigate randomness, with the corresponding variances provided in the appendix.
Researcher Affiliation | Academia | Shanghai Key Laboratory of Intelligent Information Processing, Computation and Artificial Intelligence Innovative College, Fudan University, Shanghai, China
Pseudocode | No | The paper describes the MIITA framework and its components using natural language and mathematical equations (e.g., the VAE reconstruction loss $\mathcal{L}_{\mathrm{rec}} = \mathbb{E}_{q_\phi(z_{\mathrm{inc}} \mid x_{\mathrm{inc}})} \left\| x_{\mathrm{inc}} - \hat{x}_{\mathrm{inc}} \right\|^2$), but does not include any clearly labeled pseudocode or algorithm blocks.
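The quoted loss is a standard VAE reconstruction term, an expected squared error under the encoder's posterior. A minimal sketch of how such a term is typically estimated by Monte Carlo sampling; `encode_sample` and `decode` are hypothetical stand-ins, since the paper's model is not released:

```python
def reconstruction_loss(x_inc, encode_sample, decode, n_samples=100):
    """Monte Carlo estimate of E_{q_phi(z|x)} ||x - x_hat||^2.

    encode_sample draws z ~ q_phi(z | x); decode maps z back to x_hat.
    Both are hypothetical stand-ins, not the paper's implementation.
    """
    total = 0.0
    for _ in range(n_samples):
        z = encode_sample(x_inc)   # z ~ q_phi(z | x_inc)
        x_hat = decode(z)          # reconstruction of the incremental attributes
        total += sum((a - b) ** 2 for a, b in zip(x_inc, x_hat))
    return total / n_samples
```

With a deterministic encoder/decoder pair the estimate reduces to a plain squared error, which makes the sketch easy to sanity-check.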
Open Source Code No The paper does not contain any explicit statement about releasing source code, nor does it provide a link to a code repository.
Open Datasets | Yes | We use five public datasets that are commonly used in multimodal learning with tabular and visual modalities: ADNI (AD) [Jr et al., 2008], Data Visual Marketing (DV) [Huang et al., 2023], Pokemon Primary Type (PK), Hearthstone card category (HS), and CS:GO skin quality (CG) [Lu et al., 2023].
Dataset Splits | No | In each dataset, we remove specific columns from the train and validation sets while keeping them in the inference set to simulate real-world incremental tabular inference. The deleted columns were chosen logically, reflecting their historical appearance; details are in the appendix. The main text mentions train, validation, and inference sets but does not provide specific percentages or sample counts for these splits.
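The split protocol described above (attributes present at inference time but hidden from train/validation) can be sketched as follows; the table layout and column names are illustrative, not taken from the paper:

```python
def drop_columns(rows, columns):
    """Remove the given columns from a table stored as a list of dicts."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]

# Illustrative table: one tabular attribute ("biomarker") arrives only at inference.
train = [{"age": 70, "biomarker": 1.2}, {"age": 65, "biomarker": 0.9}]
inference = [{"age": 72, "biomarker": 1.5}]

# Hide the incremental column during training; the inference set keeps it.
train_visible = drop_columns(train, {"biomarker"})
```

Keeping the column only at inference time is what makes the setting "incremental": the model never sees the attribute during training, yet must exploit it when it appears.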
Hardware Specification | No | The computations in this research were performed using the CFFF platform of Fudan University. This mentions a platform but does not specify particular hardware components such as CPU/GPU models or memory.
Software Dependencies | No | The paper mentions various models and frameworks used as baselines or components (e.g., XGBoost, SCARF, FT-Transformer, SimCLR) and refers to general methods (e.g., a 'backward difference encoder'). However, it does not provide specific version numbers for any software dependencies (e.g., 'PyTorch 1.9', 'Python 3.8').
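The "backward difference encoder" mentioned above is a standard contrast coding for ordinal categoricals, in which each level is compared with the one before it. A minimal sketch of the usual contrast matrix, not the paper's implementation:

```python
def backward_difference_matrix(k):
    """Contrast matrix for k ordered levels: k rows, k - 1 columns.

    Column i (1-indexed) compares level i + 1 with level i: the first i
    rows get -(k - i) / k and the remaining rows get i / k.
    """
    return [
        [-(k - i) / k if row < i else i / k for i in range(1, k)]
        for row in range(k)
    ]

def encode_level(level_index, k):
    """Encode the level at position level_index (0-indexed) as its contrast row."""
    return backward_difference_matrix(k)[level_index]
```

For k = 4 this yields the familiar rows [-0.75, -0.5, -0.25] for the first level and [0.25, 0.5, 0.75] for the last, matching the standard backward-difference coding tables.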
Experiment Setup | Yes | Key settings of MIITA are: β (VAE) ∈ {2, 5, 10}, λ1 (KL) and λ2 (cov-cross) ∈ {0.5, 1.0, 2.0}, and pseudo-label threshold = 0.7. The loss weights in Eq. 13 were adjusted based on early gradient magnitudes; the default setting is {1, 0.5, 0.8, 0.4, 1}.
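A pseudo-label threshold of 0.7 conventionally means an unlabeled sample is kept only when the model's top class probability exceeds 0.7. A minimal sketch of that filtering step, with hypothetical function and variable names (the paper does not publish its implementation):

```python
def select_pseudo_labels(probs, threshold=0.7):
    """Keep (sample index, argmax class) pairs whose top probability
    exceeds the threshold.

    probs: list of per-sample class-probability lists.
    """
    selected = []
    for i, p in enumerate(probs):
        confidence = max(p)
        if confidence > threshold:
            selected.append((i, p.index(confidence)))
    return selected

# Example: only the first sample is confident enough to receive a pseudo-label.
probs = [[0.1, 0.9], [0.6, 0.4]]
```

Samples below the threshold are simply discarded for that round, which is the usual way a confidence cutoff controls pseudo-label noise.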