MuHBoost: Multi-Label Boosting For Practical Longitudinal Human Behavior Modeling
Authors: Nguyen Thach, Patrick Habecker, Anika Eisenbraun, W. Alex Mason, Kimberly Tyler, Bilal Khan, Hau Chan
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to evaluate MuHBoost and its variants on 13 health and well-being prediction tasks defined from four realistic ubiquitous health datasets. Our results show that our three developed methods outperform all considered baselines across three standard MLC metrics, demonstrating their effectiveness while ensuring resource efficiency. |
| Researcher Affiliation | Academia | Correspondence to EMAIL. University of Nebraska-Lincoln. Lehigh University. |
| Pseudocode | Yes | Algorithm 1 Cluster Sampling (Extended) ... Algorithm 2 MuHBoost[CC] |
| Open Source Code | Yes | We refer readers to Sections 4.1 and 4.2 as well as Appendices C.1 and C.2 for complete details on reproducing our results, including the link to our anonymous GitHub repository. ... https://github.com/AnonyMouse3005/MuHBoost |
| Open Datasets | Yes | We consider four ubiquitous health datasets that contain both time-series and auxiliary data as well as sufficient information to form multiple labels for each: LifeSnaps (Yfantidou et al., 2022), GLOBEM (Xu et al., 2022), college students (CoSt) (Hayat & Hasan, 2023), and PWUD (Tyler et al., 2024). The first two are publicly available and have previously been considered by Englhardt et al. (2024); Kim et al. (2024) (as mentioned in Section 2), whereas the latter two are novel and require submitting an IRB protocol and ethical research plan to their authors. |
| Dataset Splits | Yes | For each considered set of experimental configurations, we used the split ratio of 50/10/40 (for train/validation/test set), which follows SummaryBoost's evaluation, for a total of 10 different splits. For partitioning multi-label datasets into training (train+validation) and test sets, we adopt the iterative stratification algorithm (Sechidis et al., 2011). |
| Hardware Specification | Yes | All experiments were conducted under Ubuntu 20.04 on a Linux virtual machine equipped with NVIDIA GeForce RTX 3050 Ti GPU and 12th Gen Intel(R) Core(TM) i7-12700H CPU @ 2.3GHz. |
| Software Dependencies | Yes | We used PyTorch 1.13, CUDA 11.7, OpenAI 1.23, and scikit-learn 1.3. |
| Experiment Setup | Yes | The number of boosting rounds T is set to a large value of 100 for training till convergence (typically within 10–20 rounds), and the size of the representative subset s is set as large as possible (without exceeding the maximum context length), to 10. We set µ, the nonzero hyperparameter for raising the bar for each weak learner, to 0 since we notice no significant increases in predictive performance otherwise. The stopping threshold at each round for both MuHBoost and MuHBoost[LP+], defined by 1 − 1/K − µ, is hence 1 − 1/min{N, 2Q} (below which is considered satisfactory and no further resampling is needed, in accordance with SummaryBoost). For MuHBoost[CC] (K = 2), we introduce the discount factor γ = 0.95 into this threshold, which becomes 1 − (1/2)γ^q for each label q ∈ [0, Q−1], to relax the training further down the chain (i.e., subject to error propagation). |
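The per-label stopping threshold quoted in the Experiment Setup row is easy to reproduce. A minimal sketch (the function name is ours, not from the paper):

```python
def cc_stopping_threshold(q: int, gamma: float = 0.95) -> float:
    """Per-label error bar for MuHBoost[CC] with K = 2:
    1 - (1/2) * gamma**q for labels q = 0, ..., Q-1.
    The bar rises with q, tolerating more weak-learner error
    further down the chain, where error propagation accumulates."""
    return 1.0 - 0.5 * gamma ** q
```

At q = 0 this reduces to the undiscounted threshold 1 − 1/K = 0.5 for K = 2 used by MuHBoost and MuHBoost[LP+] when µ = 0.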
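The partitioning step from the Dataset Splits row (iterative stratification, Sechidis et al., 2011) can be sketched in plain NumPy. This is an illustrative simplification, not the authors' implementation, and all names are ours:

```python
import numpy as np

def iterative_stratify(Y, ratios, seed=0):
    """Sketch of iterative stratification for multi-label data:
    split examples into folds (e.g. ratios=(0.6, 0.4) for a
    train+validation vs. test split) while balancing each label's
    positives across folds. Y is an (n, Q) binary label matrix."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y)
    n = Y.shape[0]
    remaining = set(range(n))
    fold_cap = np.array([r * n for r in ratios])          # desired fold sizes
    label_cap = np.array([r * Y.sum(0) for r in ratios])  # desired positives per fold/label
    folds = [[] for _ in ratios]
    while remaining:
        idx = np.array(sorted(remaining))
        counts = Y[idx].sum(0).astype(float)
        counts[counts == 0] = np.inf  # ignore labels with no remaining positives
        if np.isinf(counts).all():
            lab, cand = None, idx     # only all-negative rows left
        else:
            lab = int(np.argmin(counts))        # rarest remaining label first
            cand = idx[Y[idx, lab] == 1]
        for i in cand:
            scores = fold_cap if lab is None else label_cap[:, lab]
            best = np.flatnonzero(scores == scores.max())
            f = int(rng.choice(best))  # fold needing this example most; random ties
            folds[f].append(int(i))
            remaining.discard(int(i))
            fold_cap[f] -= 1
            label_cap[f] -= Y[i]
    return folds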