Constructing Fair Latent Space for Intersection of Fairness and Explainability

Authors: Hyungjun Joo, Hyeonggeun Han, Sehwan Kim, Sangwoo Hong, Jungwoo Lee

AAAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the fair latent space with various fairness metrics and demonstrate that our approach can effectively provide explanations for biased decisions and assurances of fairness. We conducted experiments with the Diffusion Autoencoder (DiffAE) (Preechakul et al. 2022), a recent encoder-decoder generative model, to evaluate our proposed approach and demonstrate its practical applicability. We evaluated the fairness of the latent space using fairness metrics. The classification results are shown in Tab. 1.
Researcher Affiliation | Academia | 1) Department of Electrical and Computer Engineering, Seoul National University; 2) Next Quantum, Seoul National University; 3) Hodoo AI Labs
Pseudocode | No | The paper describes its methods through mathematical equations and textual explanations, but does not include structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not explicitly state that source code for the methodology is provided, nor does it include a link to a code repository.
Open Datasets | Yes | We conduct experiments on DiffAE using three datasets. In CelebA (Liu et al. 2015), which features 40 binary attribute labels... In CelebA-HQ (Karras et al. 2018), a high-resolution face image dataset... For UTKFace (Zhang, Song, and Qi 2017), which includes annotations for gender, age, and ethnicity...
Dataset Splits | No | Let $B = \{X, Y, S\} = \{(x_i, y_i, s_i)\}_{i=1}^{n}$ represent the training batch, comprising images $x_i$, labels $y_i$, and sensitive attributes $s_i$. When we transformed the representations obtained from the CelebA-HQ test dataset according to the unit vector of the attractiveness classifier $\hat{h}$, the original model exhibited a clear correlation, as shown in Fig. 4(L).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | We conducted experiments with the Diffusion Autoencoder (DiffAE) (Preechakul et al. 2022), a recent encoder-decoder generative model, to evaluate our proposed approach and demonstrate its practical applicability. Additionally, we employed Glow (Kingma and Dhariwal 2018) as the invertible neural network. We conducted gender classification on the generated images using the CLIP model (ViT-B/32) with two classifier prompts, "photo of a male, man, or boy" and "photo of a female, woman, or girl", following previous works (Cho, Zala, and Bansal 2023; Shrestha et al. 2024).
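The CLIP-based evaluation quoted above amounts to zero-shot classification: embed the image and the two prompts, then pick the prompt whose embedding is most cosine-similar to the image's. A minimal NumPy sketch of that decision rule follows; the toy embedding vectors are made up for illustration, whereas the real pipeline would use the ViT-B/32 image and text encoders to produce them.

```python
import numpy as np

def zero_shot_classify(image_emb, prompt_embs):
    """Return the index of the prompt most cosine-similar to the image."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    prompt_embs = prompt_embs / np.linalg.norm(prompt_embs, axis=1, keepdims=True)
    sims = prompt_embs @ image_emb  # cosine similarity per prompt
    return int(np.argmax(sims))

# Toy 3-d embeddings standing in for CLIP ViT-B/32 outputs (hypothetical values).
male_prompt = np.array([1.0, 0.1, 0.0])    # "photo of a male, man, or boy"
female_prompt = np.array([0.0, 0.1, 1.0])  # "photo of a female, woman, or girl"
image = np.array([0.2, 0.0, 0.9])          # closer to the second prompt

label = zero_shot_classify(image, np.stack([male_prompt, female_prompt]))
print(label)  # 1
```

Cosine similarity is the standard CLIP matching score; normalizing both sides makes the dot product equal to it.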
Experiment Setup | No | The loss function for constructing a fair latent space in training the invertible network is derived by integrating these losses. Drawing inspiration from the technique of segregating information into distinct dimensions (Esser, Rombach, and Ommer 2020), we decompose the latent dimensions in the representation from the invertible network as $Z = [Z_Y, Z_S] \in \mathbb{R}^d$, where $Z_Y \in \mathbb{R}^{d_y}$ and $Z_S \in \mathbb{R}^{d_s}$. The loss function, based on theoretical justification and designed to construct a fair latent space, is defined as $\mathcal{L}_{\text{fair}}(Z_Y) = \lambda_{dg} \mathcal{L}_{dg}(Z_Y) + \lambda_{eq} \mathcal{L}_{eq}(Z_Y) + \lambda_{di} \mathcal{L}_{di}(Z_Y)$.
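The quoted setup boils down to two mechanical steps: split the invertible network's output into target-related and sensitive-related blocks, and take a weighted sum of three losses on the target block. A minimal sketch of that wiring is below; the individual terms $\mathcal{L}_{dg}$, $\mathcal{L}_{eq}$, and $\mathcal{L}_{di}$ are defined in the paper and are replaced here by placeholder callables, and the dimension split and weights are arbitrary example values.

```python
import numpy as np

def fair_latent_loss(z, d_y, lambdas, losses):
    """Split z = [z_y, z_s] along the feature axis and apply the
    weighted fairness losses to the target block z_y only."""
    z_y, z_s = z[:, :d_y], z[:, d_y:]  # decompose the latent dimensions
    total = sum(lambdas[k] * losses[k](z_y) for k in ("dg", "eq", "di"))
    return total, z_y, z_s

# Placeholder loss terms (illustrative stand-ins, not the paper's definitions).
losses = {
    "dg": lambda z: float(np.mean(z ** 2)),
    "eq": lambda z: float(np.var(z)),
    "di": lambda z: float(np.mean(np.abs(z))),
}
lambdas = {"dg": 1.0, "eq": 0.5, "di": 0.1}  # example weights

z = np.random.default_rng(0).normal(size=(8, 16))  # batch of 16-dim latents
loss, z_y, z_s = fair_latent_loss(z, d_y=12, lambdas=lambdas, losses=losses)
assert z_y.shape == (8, 12) and z_s.shape == (8, 4)
```

Keeping the split index `d_y` and the weights as explicit arguments mirrors how $d_y$, $d_s$, and the $\lambda$ coefficients appear as hyperparameters in the quoted formulation.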