Repositioning the Subject within Image
Authors: Yikai Wang, Chenjie Cao, Ke Fan, Qiaole Dong, Yifan Li, Xiangyang Xue, Yanwei Fu
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To assess SEELE's effectiveness in subject repositioning, we assemble a real-world subject repositioning dataset called ReS. Results of SEELE on ReS demonstrate its efficacy. Code and ReS dataset are available at https://yikai-wang.github.io/seele/. |
| Researcher Affiliation | Collaboration | 1Fudan University; 2DAMO Academy, Alibaba Group; 3Hupan Lab |
| Pseudocode | No | The paper describes the SEELE framework and its components using textual descriptions and flow diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and ReS dataset are available at https://yikai-wang.github.io/seele/. |
| Open Datasets | Yes | Code and ReS dataset are available at https://yikai-wang.github.io/seele/. We release the ReS dataset at https://yikai-wang.github.io/seele/ to encourage research in subject repositioning. When addressing subject moving and completion, we employ the MSCOCO dataset (Lin et al., 2014), which provides object masks. For image harmonization, the iHarmony4 dataset (Cong et al., 2020) is utilized, offering unharmonized-harmonized image pairs along with subject-to-harmonize masks. |
| Dataset Splits | Yes | MSCOCO comprises 80k training images and 40k testing images, while iHarmony4 includes 65k training images and 7k testing images. |
| Hardware Specification | Yes | Training is conducted on two A6000 GPUs over 9,000 steps |
| Software Dependencies | No | The paper mentions using the AdamW optimizer (Loshchilov & Hutter, 2017) but does not specify version numbers for any software dependencies like programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | For each task, we utilize the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 8.0e-5, weight decay of 0.01, and a batch size of 32. Training is conducted on two A6000 GPUs over 9,000 steps, selecting the best checkpoints based on the held-out validation set. |
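The reported experiment setup can be summarized as a configuration sketch. This is not the authors' code; the `TrainConfig` name is hypothetical, and only the hyperparameter values (optimizer, learning rate, weight decay, batch size, steps, GPU count) are taken from the paper:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters as reported in the paper's experiment setup."""
    optimizer: str = "AdamW"       # Loshchilov & Hutter, 2017
    learning_rate: float = 8.0e-5
    weight_decay: float = 0.01
    batch_size: int = 32
    total_steps: int = 9_000
    num_gpus: int = 2              # two A6000 GPUs

cfg = TrainConfig()
# Total images seen during training, implied by the reported step count
total_samples = cfg.batch_size * cfg.total_steps  # 288,000
```

Checkpoint selection on a held-out validation set is stated in the paper but not reflected here, since the validation split itself is not specified.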