Highly Compressed Tokenizer Can Generate Without Training

Authors: Lukas Lao Beyer, Tianhong Li, Xinlei Chen, Sertac Karaman, Kaiming He

ICML 2025

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. "Through a series of experiments, we demonstrate that simple latent space manipulations of tokens can result in image editing capabilities typically associated with generative models. For quantitative evaluation of editing and generation quality, we will consider a class-conditional generation pipeline based on a small seed image dataset subsampled from the ImageNet training data, along with a set of CLIP text prompts used to guide generation towards target classes."
Researcher Affiliation: Collaboration. "¹MIT LIDS, ²MIT CSAIL, ³Meta FAIR. Correspondence to: Lukas Lao Beyer <EMAIL>."
Pseudocode: Yes. "Algorithm A1: Test-Time Optimization for CLIP-Guided Latent Editing. Input: img, the seed image, and prompt, a text prompt. Output: recons, the optimized image." "Algorithm A2: Test-Time Optimization with Optional Tweaks. Input: img, the seed image, and ℓ, an objective function taking an image. Output: recons, the optimized image."
Open Source Code: Yes. "Code is available at https://github.com/lukaslaobeyer/token-opt."
Open Datasets: Yes. "For quantitative evaluation of editing and generation quality, we will consider a class-conditional generation pipeline based on a small seed image dataset subsampled from the ImageNet training data, along with a set of CLIP text prompts used to guide generation towards target classes. A fixed number of ImageNet ILSVRC2012 (Deng et al., 2009) training set images are randomly selected."
Dataset Splits: Yes. "A fixed number of ImageNet ILSVRC2012 (Deng et al., 2009) training set images are randomly selected. For each ImageNet class, an equal number of images is sampled at random without replacement. The 1D tokens for images from this small seed image dataset are used to initialize the test-time token optimization. We generate 50k prompts, distributed according to the ImageNet validation set class statistics."
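The sampling procedure quoted above can be sketched with the standard library. This is a hedged illustration: the class names, image paths, and prompt templates below are hypothetical, and the paper does not specify how prompts are phrased, only that their class distribution follows the ImageNet validation set statistics.

```python
import random
from collections import Counter

def sample_seed_images(paths_by_class, per_class, rng):
    """Sample an equal number of images per class, without replacement."""
    seed = []
    for cls, paths in paths_by_class.items():
        seed.extend(rng.sample(paths, per_class))  # no replacement within a class
    return seed

def sample_prompts(val_labels, templates, n_prompts, rng):
    """Draw prompts whose class frequencies match the validation-set labels."""
    counts = Counter(val_labels)
    classes = list(counts)
    weights = [counts[c] for c in classes]
    drawn = rng.choices(classes, weights=weights, k=n_prompts)
    return [templates[c] for c in drawn]

rng = random.Random(0)
# Hypothetical two-class stand-in for the 1000 ImageNet classes.
images = {"tabby cat": [f"cat_{i}.jpg" for i in range(10)],
          "golden retriever": [f"dog_{i}.jpg" for i in range(10)]}
seed = sample_seed_images(images, per_class=2, rng=rng)

templates = {c: f"a photo of a {c}" for c in images}  # assumed prompt format
val_labels = ["tabby cat"] * 50 + ["golden retriever"] * 50
prompts = sample_prompts(val_labels, templates, n_prompts=8, rng=rng)
```

In the paper's setting, `n_prompts` would be 50k and the seed images' 1D tokens would initialize the test-time optimization.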
Hardware Specification: Yes. "Running 300 iterations of the text-guided image editing optimization (with CLIP loss smoothed over 8 random crops) in half precision using the VQ-LL-32 tokenizer takes 7 seconds per image on an NVIDIA A100."
Software Dependencies: No. The paper mentions a "PyTorch implementation" but does not specify version numbers for PyTorch or other key software dependencies.
Experiment Setup: Yes. "In practice, we use the Adam optimizer with a learning rate of 0.1, β₁ = 0.9 and β₂ = 0.999. We use a cosine schedule to ramp the noise from σ₁² = 0.3 to σ₂₀₀² = 0. Token regularization is highlighted in green. We obtain best results with λ = 0.02. Token EMA (not shown) uses a decay factor of 0.98. In our experiments, we set σ_init = 0.3 as chosen for best performance from a sweep including {0.05, 0.3, 1.0}."
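The schedule and regularization hyperparameters quoted above can be sketched as follows. Assumptions are labeled in the comments: the quote only says the noise variance ramps along a cosine from σ₁² = 0.3 at step 1 to σ₂₀₀² = 0 at step 200, so the exact interpolation formula is an assumption, and the L2 form of the token regularizer is a stand-in since its exact form is not quoted here.

```python
import math

def cosine_noise_schedule(step, total=200, sigma2_start=0.3, sigma2_end=0.0):
    """Cosine ramp of the noise variance over `total` steps.

    Assumed form: endpoints match the quoted sigma^2_1 = 0.3 and
    sigma^2_200 = 0, with a half-cosine in between.
    """
    frac = (step - 1) / (total - 1)           # 0 at step 1, 1 at step `total`
    w = 0.5 * (1 + math.cos(math.pi * frac))  # decays from 1 to 0
    return sigma2_end + (sigma2_start - sigma2_end) * w

class TokenEMA:
    """Exponential moving average of the tokens (quoted decay: 0.98)."""
    def __init__(self, decay=0.98):
        self.decay = decay
        self.value = None
    def update(self, tokens):
        if self.value is None:
            self.value = list(tokens)
        else:
            self.value = [self.decay * v + (1 - self.decay) * t
                          for v, t in zip(self.value, tokens)]
        return self.value

LAMBDA = 0.02  # quoted weight of the token regularization term

def regularized_loss(clip_loss, tokens):
    """Total objective: CLIP loss plus a lambda-weighted L2 token penalty
    (the L2 form is an assumption, not quoted from the paper)."""
    return clip_loss + LAMBDA * sum(t * t for t in tokens)

ema = TokenEMA()
ema.update([1.0, 2.0])
smoothed = ema.update([0.0, 0.0])
```

Under this sketch, the noise injected into the tokens starts at variance 0.3 and is fully annealed away by step 200 of the optimization.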