Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective

Authors: Yujin Oh, Pengfei Jin, Sangjoon Park, Sekeun Kim, Siyeop Yoon, Jin Sung Kim, Kyungsang Kim, Xiang Li, Quanzheng Li

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the effectiveness of dMoE based on the segmentation task on multiple clinical datasets with diverse segmentation masks for diagnosis and treatment planning tasks. Experimental results demonstrate that dMoE not only advances state-of-the-art (SOTA) fairness learning approaches but also presents a way to incorporate distributional attributes to provide robust and equitable diagnosis and clinical decision-making across diverse demographic and clinical attributes. Extensive medical image segmentation experiments for diagnosis and treatment planning demonstrate dMoE's robustness and its effectiveness in mitigating biases from imbalanced medical data distributions. We conduct extensive experiments on two benchmark datasets and an in-house dataset.
Researcher Affiliation | Academia | 1Center for Advanced Medical Computing and Analysis (CAMCA), Department of Radiology, Massachusetts General Hospital (MGH) and Harvard Medical School, MA 02114, USA 2Department of Radiation Oncology, Yonsei University College of Medicine, Yonsei University, Seoul 03772, Republic of Korea 3Institute for Innovation in Digital Healthcare, Yonsei University, Seoul 03772, Republic of Korea 4Oncosoft Inc, Seoul 03776, Republic of Korea. Correspondence to: Xiang Li <EMAIL>, Quanzheng Li <EMAIL>.
Pseudocode | No | The paper includes equations for the MoE layer, router network, and optimization, and detailed network architectures in tables in Appendix A.1, but no section or figure explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | The source code is available at https://github.com/tvseg/dMoE.
Open Datasets | Yes | For the 2D segmentation experiments, we utilize two datasets: 1) Harvard-FairSeg (Tian et al., 2024) and 2) HAM10000 (Tschandl et al., 2018). Harvard-FairSeg is a scanning laser ophthalmoscopy (SLO) fundus image dataset comprising 10,000 samples with pixel-wise optic cup and outer neuroretinal rim segmentation masks for diagnosing glaucoma. HAM10000 is a dermatology image dataset comprising 10,015 2D RGB samples with binary segmentation masks for diagnosing skin lesions.
Dataset Splits | Yes | Fairness and segmentation performance are evaluated on the test benchmark, which consists of 2,000 samples. For age categorization, patients are divided into four groups at 20-year intervals, with the test benchmark consisting of 1,061 samples. To address biases arising from the imbalanced T-stage distribution when training deep neural networks for radiotherapy target segmentation, we utilize a training dataset comprising 721 primary prostate cancer patients from Yonsei Cancer Center, Seoul, South Korea, and validate network performance using an independent test set, composed of 132 primary prostate cancer patients from Yongin Severance Hospital, Yongin, South Korea and 143 test samples from Gangnam Severance Hospital, Seoul, South Korea. We further provide the trainset and testset for each dataset, along with the attribute subgroup-wise data distribution and percentiles for the trainset, in Table 6.
Hardware Specification | Yes | For 2D segmentation tasks, we use a single NVIDIA A100 80GB GPU, while for the 3D segmentation task, we use a single NVIDIA RTX A6000 48GB GPU.
Software Dependencies | Yes | We implement the networks using PyTorch (Paszke et al., 2019) in Python with CUDA 11.8. The AdamW (Loshchilov & Hutter, 2017) optimizer with exponential learning rate decay is used for all experiments.
Experiment Setup | Yes | For all experiments, we set the dMoE module hyperparameters with Top-k as 2 and the number of experts n as 8. Each expert layer consists of a standard MLP with two linear layers, a ReLU activation, and a dropout layer. For 2D segmentation tasks, we use TransUNet (Chen et al., 2021) as the backbone with the standard ViT-B architecture. The input images are center-cropped and resized into 2D patches of size 224×224 pixels with a batch size of 42. The network is trained with a learning rate of 0.01 for 300 epochs on the Harvard-FairSeg dataset and 100 epochs on HAM10000, following the benchmark setting. For the 3D radiotherapy target segmentation task, we adopt the 3D Residual U-Net (Çiçek et al., 2016) as the backbone architecture. The network is trained using randomly cropped 3D patches of size 384×384×128 voxels and a batch size of 4. The training is conducted with a learning rate of 5×10⁻⁵ over 100 epochs and early stopping based on the validation dataset.
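The reported dMoE hyperparameters (Top-k = 2, n = 8 experts, each expert an MLP with two linear layers, a ReLU, and dropout) can be sketched as a generic Top-k mixture-of-experts layer. This is an illustrative reconstruction only: the routing below is plain softmax Top-k gating, whereas the paper's dMoE router additionally incorporates distributional attributes, which is not reproduced here; the `dim`/`hidden` sizes and class names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Expert(nn.Module):
    """Expert layer as described in the report: two linear layers,
    a ReLU activation, and a dropout layer."""
    def __init__(self, dim, hidden, p_drop=0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Dropout(p_drop), nn.Linear(hidden, dim),
        )

    def forward(self, x):
        return self.net(x)

class TopKMoE(nn.Module):
    """Generic Top-k MoE layer with the reported settings (k=2, n=8).
    Plain softmax gating; NOT the paper's distribution-aware router."""
    def __init__(self, dim, hidden, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(Expert(dim, hidden) for _ in range(n_experts))
        self.router = nn.Linear(dim, n_experts)
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, dim)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)           # renormalize over Top-k
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

moe = TopKMoE(dim=32, hidden=64)
y = moe(torch.randn(5, 32))  # output keeps the input shape
```

The explicit per-expert loop favors readability over throughput; production MoE implementations typically batch tokens per expert instead.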
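The reported optimizer configuration (AdamW with exponential learning-rate decay, learning rate 5×10⁻⁵ for the 3D task) can be reproduced with standard PyTorch utilities. A minimal sketch, assuming a per-epoch decay with `gamma=0.95` (the decay factor is not stated in the report) and a placeholder model:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)  # placeholder model, not the paper's backbone
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.95)

for epoch in range(3):  # stand-in training loop
    optimizer.zero_grad()
    loss = model(torch.randn(4, 16)).sum()
    loss.backward()
    optimizer.step()
    scheduler.step()  # lr is multiplied by gamma once per epoch
```

After three epochs the learning rate is 5e-5 × 0.95³; with early stopping on a validation set, as the report describes, the loop would additionally track validation loss and halt when it stops improving.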