Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Stealix: Model Stealing via Prompt Evolution

Authors: Zhixiong Zhuang, Hui-Po Wang, Maria-Irina Nicolae, Mario Fritz

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results demonstrate that Stealix significantly outperforms other methods, even those with access to class names or fine-grained prompts, while operating under the same query budget.
Researcher Affiliation Collaboration (1) Saarland University, Saarbrücken, Germany; (2) Bosch Center for Artificial Intelligence, Renningen, Germany; (3) CISPA Helmholtz Center for Information Security, Saarbrücken, Germany.
Pseudocode Yes Algorithm 1: Stealix; Algorithm 2: Prompt Refinement; Algorithm 3: Prompt Reproduction.
Open Source Code No The project page is at https://zhixiongzh.github.io/stealix/. This is a high-level project overview page, not a direct link to a code repository.
Open Datasets Yes Dataset. We train the victim model on four datasets: EuroSAT (Helber et al., 2019), PASCAL VOC (Everingham et al., 2010), DomainNet (Peng et al., 2019), and CIFAR10 (Alex, 2009).
Dataset Splits Yes For CIFAR-10, we utilize the standard training and test splits provided by PyTorch, which consist of 50,000 training images and 10,000 test images at a resolution of 32×32 pixels. In the case of PASCAL, we follow the preprocess from DA-Fusion (Trabucco et al., 2024) to assign classification labels based on the largest object present in each image, resulting in 1,464 training images and 1,449 validation images with an image size of 256×256 pixels. The EuroSAT dataset is split into training and validation sets using an 80/20 ratio while maintaining class distribution through stratified sampling, yielding 21,600 training images and 5,400 validation images at a resolution of 64×64 pixels. For DomainNet, we select the first 10 classes in alphabetical order across six diverse domains: clipart, infograph, paintings, quickdraw, real images, and sketches. We apply the same 80/20 stratified split as used for EuroSAT, resulting in 11,449 training images and 2,863 validation images, each resized to 64×64 pixels.
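As a rough illustration of the 80/20 stratified split described in the quoted response (not the authors' code; the function name and seed handling are assumptions), a stdlib-only sketch that reproduces the EuroSAT split sizes:

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=0):
    """Split sample indices into train/val while keeping per-class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = int(len(idxs) * train_frac)  # 80% of each class to train
        train.extend(idxs[:cut])
        val.extend(idxs[cut:])
    return train, val

# EuroSAT has 27,000 images over 10 classes (2,700 each); an 80/20
# stratified split yields 21,600 train / 5,400 val, matching the paper.
labels = [c for c in range(10) for _ in range(2700)]
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 21600 5400
```

In practice the same split can be obtained with `sklearn.model_selection.train_test_split(..., stratify=labels)`; the sketch above just makes the per-class bookkeeping explicit.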
Hardware Specification Yes The experiments are run on an NVIDIA V100 GPU and an AMD EPYC 7543 32-Core CPU.
Software Dependencies No We use OpenCLIP-ViT/H as the vision-language model (Cherti et al., 2023) for prompt refinement... We employ Stable Diffusion-v2 (Rombach et al., 2022) as the generative model... While 'Stable Diffusion-v2' is a specific version, only one such software dependency is explicitly mentioned with a version number. The prompt requires 'multiple key software components with their versions'.
Experiment Setup Yes Victim model. All models use ResNet-34... Victim models are trained with SGD, Nesterov with momentum 0.9, a 0.01 learning rate, 5×10⁻⁴ weight decay, and cosine annealing for 50 epochs. Stealix. We use OpenCLIP-ViT/H as the vision-language model (Cherti et al., 2023) for prompt refinement, with a learning rate of 0.1 and 500 optimization steps using the AdamW optimizer. We employ Stable Diffusion-v2 (Rombach et al., 2022) as the generative model, with a guidance scale of 9 and 25 inference steps. PC evaluation uses M = 10 images. ... we set the population size to N = 10, with Np = 5 parents selected via tournament selection with a tournament size of k = 5, and retain one elite per generation. The mutation probability is set to pm = 0.6. Following prior work (Truong et al., 2021), we use ResNet-18 as the attacker model. To focus on the impact of query data quality and ensure a fair comparison across methods, we train the attacker model using the same hyperparameters as the victim model without tuning: 50 epochs with SGD.
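The evolutionary hyperparameters quoted above (N = 10, Np = 5, k = 5, pm = 0.6, one elite) can be sketched as a generic generation step. This is a minimal stdlib illustration of tournament selection with elitism, not the Stealix prompt-evolution code; the `mutate` and `crossover` callables are hypothetical placeholders:

```python
import random

def select_parents(population, fitness, num_parents=5, k=5, seed=0):
    """Tournament selection: each parent is the fittest of k random candidates."""
    rng = random.Random(seed)
    parents = []
    for _ in range(num_parents):
        contenders = rng.sample(range(len(population)), k)
        winner = max(contenders, key=lambda i: fitness[i])
        parents.append(population[winner])
    return parents

def next_generation(population, fitness, mutate, crossover,
                    pop_size=10, num_parents=5, k=5, pm=0.6, seed=0):
    """One generation: keep the single best individual (elite), then fill the
    rest of the population from parents, mutating each child with prob. pm."""
    rng = random.Random(seed)
    elite = population[max(range(len(population)), key=lambda i: fitness[i])]
    parents = select_parents(population, fitness, num_parents, k, seed)
    children = [elite]
    while len(children) < pop_size:
        child = crossover(rng.choice(parents), rng.choice(parents))
        if rng.random() < pm:
            child = mutate(child)
        children.append(child)
    return children
```

In the paper's setting the individuals would be prompts and the fitness a penetration-score-style measure from querying the victim; here the loop only shows how the quoted population parameters fit together.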