AudioGenX: Explainability on Text-to-Audio Generative Models

Authors: Hyunju Kang, Geonhee Han, Yoonjae Jeong, Hogun Park

AAAI 2025 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type Experimental Extensive experiments demonstrate the effectiveness of AudioGenX in producing faithful explanations, benchmarked against existing methods using novel evaluation metrics specifically designed for audio generation tasks.
Researcher Affiliation Collaboration Hyunju Kang¹*, Geonhee Han¹*, Yoonjae Jeong², Hogun Park¹ — ¹Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea; ²Audio AI Lab, NCSOFT, Seongnam, Republic of Korea. EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1: AudioGenX
Open Source Code Yes Our code is available at https://github.com/hjkng/audiogenX
Open Datasets Yes We use AudioCaps (Kim et al. 2019) as the source of textual prompts.
Dataset Splits Yes For hyperparameter tuning, we select 100 validation captions, while the test dataset consists of 1,000 randomly selected captions.
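The split described above (100 validation captions for tuning, 1,000 randomly selected test captions) can be sketched in plain Python. This is a generic illustration, not the authors' code; the function name, the seed, and the use of caption strings as the sampling unit are assumptions.

```python
import random

def split_captions(captions, n_val=100, n_test=1000, seed=0):
    """Randomly sample disjoint validation and test caption sets.

    captions: list of textual prompts (e.g., AudioCaps captions).
    Returns (validation_set, test_set) with no overlap.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    sampled = rng.sample(captions, n_val + n_test)  # draw without replacement
    return sampled[:n_val], sampled[n_val:]
```

Sampling both subsets in one `rng.sample` call guarantees the validation and test captions are disjoint.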
Hardware Specification No The paper mentions memory usage in MB for efficiency analysis, but no specific hardware details (e.g., GPU/CPU models, RAM size) used for running the experiments are provided.
Software Dependencies No The paper mentions using the Adam optimizer, but no specific version numbers for software components like Python, PyTorch, or CUDA are provided.
Experiment Setup Yes The Explainer is trained for 50 epochs with a learning rate of 10⁻³ using the Adam optimizer. Hyperparameters are set as α = 1×10⁻³ and β = 1×10⁻¹, the coefficients of the explanation objective function.
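For reference, the Adam optimizer named in the setup above performs the following per-parameter update; this is the standard Adam rule (Kingma & Ba) written as a minimal scalar sketch with the paper's learning rate of 10⁻³ as the default, not a reproduction of the authors' training code.

```python
import math

def adam_step(param, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One scalar Adam update.

    m, v: running first and second moment estimates (start at 0.0).
    t:    1-based step count, used for bias correction.
    Returns the updated (param, m, v).
    """
    m = b1 * m + (1 - b1) * grad          # first-moment EMA of the gradient
    v = b2 * v + (1 - b2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - b1 ** t)             # bias-corrected moments
    v_hat = v / (1 - b2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v
```

On the first step the bias correction makes `m_hat` and `v_hat` equal the raw gradient statistics, so the initial update is approximately `-lr * sign(grad)`.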