Target Scanpath-Guided 360-Degree Image Enhancement
Authors: Yujia Wang, Fang-Lue Zhang, Neil A. Dodgson
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results have demonstrated the effectiveness of our approach for scanpath-guided 360-degree image enhancement. ... Quantitative Evaluation of TASSC As shown in Table 1, the Silhouette Coefficient of 0.47 indicates good clustering, with data points well-grouped within clusters and separated from others. ... Quantitative Results of Visual Guidance Table 2 shows that the 200 enhanced images exhibit significant improvements across all metrics: LEV decreased by an average of 10.98%, DTW decreased by 6.78%, and REC increased by 17.84%. |
| Researcher Affiliation | Academia | Victoria University of Wellington, New Zealand |
| Pseudocode | Yes | Algorithm 1: Temporal Alignment and Spatial Similarity Clustering (TASSC) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. It discusses limitations and future work without mentioning code availability. |
| Open Datasets | No | Currently, there is no existing dataset for the scanpath-guided 360° image enhancement task. Inspired by SalG-GAN (Jiang et al. 2021), we constructed a dataset of 1000 360° image pairs to train our model. ... To evaluate the performance and generalizability of our model, we constructed a test set of 200 360° images, which includes: (i) 200 source images captured by us, covering various scenes. |
| Dataset Splits | Yes | we constructed a dataset of 1000 360° image pairs to train our model. ... To evaluate the performance and generalizability of our model, we constructed a test set of 200 360° images |
| Hardware Specification | Yes | The entire training process was carried out on two NVIDIA A100 GPUs. |
| Software Dependencies | Yes | We employed the Low-Rank Adaptation (LoRA) technique to fine-tune the Stable Diffusion v1-5 model for our 360° image enhancement task. Specifically, we applied LoRA to all attention modules within the U-Net... Finally, the selected points P and image features F are input into the SAM decoder to generate the final segmentation mask: M_editing. |
| Experiment Setup | Yes | Training Details The second and third stages are separately trained. ... When training the second stage, all parameters of SAM were frozen, and we only updated the parameters of the DSDE and Attention modules. We used the Adam optimizer with a gradually decreasing learning rate from 1e-4 to 1e-6. The training lasted for 100 epochs with a batch size of 32. During the fine-tuning of the SD v1-5 model, we only updated the parameters of matrices A and B, keeping the original model weights W unchanged. We used the Adam optimizer with a global learning rate (η) set to 1e-4, and designed specific learning rate scaling factors (β_i) for each layer. Max-norm regularization was applied, with max-norm set to 1, to enhance training stability. This training lasted for 50 epochs with a batch size of 16. |
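The Silhouette Coefficient of 0.47 reported for TASSC can be sanity-checked with a self-contained implementation of the metric. Below is a minimal sketch assuming plain 2-D point features and Euclidean distance; the paper's actual TASSC feature space and distance are not specified here, and the toy data is illustrative only.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (range [-1, 1]).

    For each point: a = mean distance to its own cluster, b = mean
    distance to the nearest other cluster, score = (b - a) / max(a, b).
    """
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = [q for q in clusters[l] if q is not p]
        if not own:  # singleton clusters contribute no score
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(sum(math.dist(p, q) for q in members) / len(members)
                for other, members in clusters.items() if other != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Two hypothetical, well-separated clusters of fixation features.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
lbl = [0, 0, 0, 1, 1, 1]
print(round(silhouette(pts, lbl), 2))  # → 0.92
```

Scores near 1 indicate tight, well-separated clusters; the toy data above scores far higher than the paper's 0.47 because it is deliberately well separated.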
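The visual-guidance metrics quoted in the table, LEV (Levenshtein edit distance) and DTW (dynamic time warping), are standard sequence distances. A minimal sketch of both follows, assuming scanpaths are given as region-quantized symbol strings (for LEV) or raw fixation coordinates (for DTW); the paper's exact quantization scheme and pointwise distance are not reproduced here.

```python
def levenshtein(a, b):
    """Edit distance between two symbol sequences (e.g. region-quantized fixations)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def dtw(a, b, dist=lambda p, q: abs(p[0] - q[0]) + abs(p[1] - q[1])):
    """Dynamic time warping cost between two fixation-point sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

print(levenshtein("ABCD", "ABD"))                       # → 1 (one deletion)
print(dtw([(0, 0), (1, 1)], [(0, 0), (1, 1), (1, 1)]))  # → 0.0 (repeats warp away)
```

Lower LEV/DTW means a closer match between observed and target scanpaths, which is why the reported 10.98% and 6.78% decreases indicate improvement.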
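The LoRA fine-tuning described above (frozen pretrained weights W, trainable low-rank matrices A and B, max-norm regularization with max-norm 1) can be sketched in plain numpy. The rank, the toy shapes, and the per-row max-norm convention below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 8, 8, 2   # toy sizes; the paper does not state its LoRA rank

W = rng.normal(size=(d_out, d_in))        # frozen pretrained weight (never updated)
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))               # trainable up-projection (zero-init: no change at start)

def forward(x, scale=1.0):
    """LoRA forward pass: W stays fixed, scale * B @ A is the learned delta."""
    return (W + scale * B @ A) @ x

def max_norm_clip(M, max_norm=1.0):
    """Max-norm regularization (assumed per-row): rescale rows whose L2 norm exceeds max_norm."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))

x = rng.normal(size=d_in)
assert np.allclose(forward(x), W @ x)  # zero-init B: the base model is initially unchanged

B += rng.normal(size=B.shape)          # stand-in for a gradient step on B only
B = max_norm_clip(B, max_norm=1.0)     # keep each row's norm at or below 1
```

Only A and B change during training, so the full model can be recovered by discarding the delta; the zero initialization of B is what makes the first forward pass identical to the frozen model.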