An Intelligent Agentic System for Complex Image Restoration Problems

Authors: Kaiwen Zhu, Jinjin Gu, Zhiyuan You, Yu Qiao, Chao Dong

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate Agentic IR's potential in handling complex IR tasks, representing a promising path toward achieving general intelligence in visual processing. Our experiments demonstrate the potential of Agentic IR in solving real-world problems. In the experimental section, we first build the test dataset. We designed 16 combinations of mixed degradations involving 2 or 3 types of degradation and divided them into three groups: A, B, and C. Group A contains 8 combinations, while groups B and C each contain 4 combinations. The degradation combinations in groups A and B consist of 2 degradations, whereas those in group C consist of 3 degradations to simulate more complex situations. During the exploration phase, the agent is exposed only to group A; that is, the agent is familiar with the degradations present in group A but is unaware of those in groups B and C. This setup helps us investigate the system's generalization ability. We applied each of the 16 degradation combinations to every one of the 100 images in the MiO100 (Kong et al., 2024a;b) dataset. For each combination in group A, we allocated 20 images for exploration, totaling 160 images. The remaining 1,440 images are used for testing. More detailed information can be found in Appendix A.3.
Researcher Affiliation | Academia | Kaiwen Zhu1,2, Jinjin Gu3, Zhiyuan You4,5, Yu Qiao2, Chao Dong5,2. 1Shanghai Jiao Tong University; 2Shanghai Artificial Intelligence Laboratory; 3The University of Sydney; 4The Chinese University of Hong Kong; 5Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. EMAIL, EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode | Yes | A.1 INFERENCE WORKFLOW. Algorithms 1 and 2 describe the entire inference workflow of Agentic IR, where the functions EVALUATE, SCHEDULE, REFLECT, RESCHEDULE, and PICKBEST are implemented by LLMs.

Algorithm 1: Inference workflow
Input: Low-quality image I
Output: Restored high-quality image
    agenda ← EVALUATE(I)
    plan ← SCHEDULE(agenda)
    while plan is not empty do
        I, success ← DFS(I, plan)
        if success then
            output I
        else
            plan ← the remaining plan for I
    output I

Algorithm 2: DFS
Input: Image I, list of subtasks plan
Output: Restored image, successful or not
    if plan is empty then
        output I, true
    attempts ← ∅
    inferiors ← ∅
    while true do
        subtask ← the first subtask of plan
        I′ ← result of subtask on I
        pass ← REFLECT(I′, subtask)
        if pass then
            remove subtask from plan
            I′, success ← DFS(I′, plan)
            if success then
                output I′, true
        else
            add subtask to attempts
            add I′ to inferiors
            if size of attempts = size of plan then
                plan ← RESCHEDULE(plan, attempts)
                I ← PICKBEST(inferiors)
                output I, false
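One plausible Python reading of the rollback DFS (Algorithm 2) is sketched below. The `tools` object and its methods `restore`, `reflect`, `reschedule`, and `pick_best` are hypothetical stand-ins for the paper's LLM-implemented functions and its restoration toolbox, and the control flow is a loose paraphrase of the algorithm rather than a line-by-line transcription of the authors' code.

```python
def dfs(image, plan, tools):
    """Rollback depth-first search over restoration subtasks.

    A sketch of Algorithm 2: try the first subtask of the plan, let an
    LLM judge the intermediate result, recurse on the rest of the plan
    on success, and reorder the plan on failure. `tools` is a
    hypothetical bundle of the LLM-backed functions; these names are
    placeholders, not the paper's actual API.
    Returns (restored_image, success_flag).
    """
    if not plan:                      # nothing left to restore
        return image, True
    attempts, inferiors = [], []
    while len(attempts) < len(plan):  # until every subtask has failed as the next step
        subtask = plan[0]
        candidate = tools.restore(image, subtask)   # run a tool for this subtask
        if tools.reflect(candidate, subtask):       # LLM accepts the intermediate result
            result, success = dfs(candidate, plan[1:], tools)
            if success:
                return result, True
        attempts.append(subtask)
        inferiors.append(candidate)                 # keep failed results as fallbacks
        plan = tools.reschedule(plan, attempts)     # try a different subtask first
    return tools.pick_best(inferiors), False        # least-bad image, marked as failure
```

When every ordering fails, the caller (Algorithm 1) receives the best inferior image together with a failure flag and can retry with the remaining plan, which is what makes the search a rollback rather than a one-shot pipeline.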
Open Source Code | Yes | The code is available at https://github.com/Kaiwen-Zhu/AgenticIR.
Open Datasets | Yes | We applied each of the 16 degradation combinations to every one of the 100 images in the MiO100 (Kong et al., 2024a;b) dataset. For each combination in group A, we allocated 20 images for exploration, totaling 160 images. The remaining 1,440 images are used for testing. More detailed information can be found in Appendix A.3. DIV2K (Agustsson & Timofte, 2017) and Flickr2K (Timofte et al., 2017) datasets are used for fine-tuning DepictQA (You et al., 2024a) on degradation evaluation.
Dataset Splits | Yes | We applied each of the 16 degradation combinations to every one of the 100 images in the MiO100 (Kong et al., 2024a;b) dataset. For each combination in group A, we allocated 20 images for exploration, totaling 160 images. The remaining 1,440 images are used for testing. More detailed information can be found in Appendix A.3.
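The quoted split arithmetic can be checked directly. This is a small sanity-check sketch; the group sizes and per-combination counts are taken from the passage above.

```python
combos = {"A": 8, "B": 4, "C": 4}   # degradation combinations per group
images_per_combo = 100              # each combination is applied to all 100 MiO100 images

total = sum(combos.values()) * images_per_combo
exploration = combos["A"] * 20      # 20 exploration images per group-A combination
test = total - exploration

print(total, exploration, test)     # 1600 160 1440
```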
Hardware Specification | Yes | We fine-tune DepictQA for one epoch with batch size 64 on 4 NVIDIA Tesla V100 GPUs, using learning rate 0.0005, weight decay 0.001, and Adam optimizer (β1 = 0.9, β2 = 0.95).
Software Dependencies | No | The paper mentions using GPT-4 and Adam optimizer parameters (β1 = 0.9, β2 = 0.95), but does not provide specific software dependencies with version numbers such as Python 3.8 or PyTorch 1.9.
Experiment Setup | Yes | We fine-tune DepictQA for one epoch with batch size 64 on 4 NVIDIA Tesla V100 GPUs, using learning rate 0.0005, weight decay 0.001, and Adam optimizer (β1 = 0.9, β2 = 0.95).
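For reference, the quoted hyperparameters can be plugged into a minimal scalar Adam update step. This is an illustrative sketch, not the authors' training code; in particular, the excerpt does not say whether weight decay is applied L2-style (as assumed here) or decoupled, AdamW-style.

```python
import math

def adam_step(w, grad, m, v, t, lr=5e-4, beta1=0.9, beta2=0.95,
              weight_decay=1e-3, eps=1e-8):
    """One Adam update on a scalar weight, using the paper's quoted
    hyperparameters (lr=0.0005, beta1=0.9, beta2=0.95, wd=0.001).
    L2-style weight decay is folded into the gradient here, which is
    an assumption; eps is a standard stabilizer, not quoted."""
    grad = grad + weight_decay * w                 # L2 penalty contribution
    m = beta1 * m + (1 - beta1) * grad             # first-moment EMA
    v = beta2 * v + (1 - beta2) * grad ** 2        # second-moment EMA
    m_hat = m / (1 - beta1 ** t)                   # bias correction at step t
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)  # parameter update
    return w, m, v
```

Note that on the very first step the bias-corrected ratio m̂/√v̂ is approximately sign(grad), so the weight moves by roughly the learning rate, 0.0005.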