AudioGenX: Explainability on Text-to-Audio Generative Models
Authors: Hyunju Kang, Geonhee Han, Yoonjae Jeong, Hogun Park
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of AudioGenX in producing faithful explanations, benchmarked against existing methods using novel evaluation metrics specifically designed for audio generation tasks. |
| Researcher Affiliation | Collaboration | Hyunju Kang¹*, Geonhee Han¹*, Yoonjae Jeong², Hogun Park¹ (¹Department of Artificial Intelligence, Sungkyunkwan University, Suwon, Republic of Korea; ²Audio AI Lab, NCSOFT, Seongnam, Republic of Korea) |
| Pseudocode | Yes | Algorithm 1: AudioGenX |
| Open Source Code | Yes | Our code is available at https://github.com/hjkng/audiogenX |
| Open Datasets | Yes | We use AudioCaps (Kim et al. 2019) as the source of textual prompts. |
| Dataset Splits | Yes | For hyperparameter tuning, we select 100 validation captions, while the test dataset consists of 1,000 randomly selected captions. |
| Hardware Specification | No | The paper mentions memory usage in MB for efficiency analysis, but no specific hardware details (e.g., GPU/CPU models, RAM size) used for running the experiments are provided. |
| Software Dependencies | No | The paper mentions using the Adam optimizer, but no specific version numbers for software components like Python, PyTorch, or CUDA are provided. |
| Experiment Setup | Yes | The Explainer is trained for 50 epochs with a learning rate of 10⁻³ using the Adam optimizer. Hyperparameters are set as α = 1×10⁻³ and β = 1×10⁻¹, the coefficients of the explanation objective function. |
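The reported setup can be sketched as follows. Only the optimizer choice (Adam), the epoch count, the learning rate, and the α/β values come from the table above; the decomposition of the objective into named loss terms is an illustrative assumption, not the paper's exact formulation.

```python
# Hedged sketch of the reported AudioGenX training configuration.
# Values below are taken from the reproducibility table; the loss-term
# names (factual/counterfactual/regularization) are placeholders.
EPOCHS = 50
LEARNING_RATE = 1e-3   # used with the Adam optimizer, per the paper
ALPHA = 1e-3           # coefficient in the explanation objective
BETA = 1e-1            # coefficient in the explanation objective


def explanation_objective(base_loss: float,
                          aux_loss_a: float,
                          aux_loss_b: float) -> float:
    """Combine loss terms with the reported coefficients.

    How the terms map onto the paper's actual objective is assumed
    here; only the coefficient values ALPHA and BETA are sourced.
    """
    return base_loss + ALPHA * aux_loss_a + BETA * aux_loss_b
```

A training loop would then call `explanation_objective` once per batch for `EPOCHS` epochs, stepping an Adam optimizer at `LEARNING_RATE`.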