Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

An Efficient Adversarial Attack for Tree Ensembles

Authors: Chong Zhang, Huan Zhang, Cho-Jui Hsieh

NeurIPS 2020 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "Experimental results on several large GBDT and RF models with up to hundreds of trees demonstrate that our method can be thousands of times faster than the previous mixed-integer linear programming (MILP) based approach, while also providing smaller (better) adversarial examples than decision-based black-box attacks on general ℓp (p = 1, 2, ∞) norm perturbations. Our code is available at https://github.com/chong-z/tree-ensemble-attack."
Researcher Affiliation | Academia | "Chong Zhang, Huan Zhang, Cho-Jui Hsieh, Department of Computer Science, UCLA (EMAIL, EMAIL, EMAIL)"
Pseudocode | Yes | "Algorithm 1: Our proposed LT-Attack for constructing adversarial examples."
Open Source Code | Yes | "Our code is available at https://github.com/chong-z/tree-ensemble-attack."
Open Datasets | Yes | "We evaluate the proposed algorithm on 9 public datasets (Smith et al., 1988; Lecun et al., 1998; Chang, Lin, 2011; Wang et al., 2012; Baldi et al., 2014; Xiao et al., 2017; Dua, Graff, 2017) with both the standard (natural) GBDT and RF models, and on an additional 10th dataset (Bosch, 2016) with the robustly trained GBDT."
Dataset Splits | No | The paper refers to "training data size" and "test examples" but does not explicitly provide specific training/validation/test splits (e.g., percentages or sample counts) needed for reproducibility.
Hardware Specification | No | The paper mentions "20 threads per task" and discusses the runtime of various methods, but it does not provide specific hardware details such as CPU/GPU models or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions the XGBoost framework and the Gurobi solver but does not specify their version numbers, which reproducibility requires.
Experiment Setup | Yes | "For the initial point, we draw 20 random initial adversarial examples from a Gaussian distribution, and optimize with a fine-grained binary search before feeding to the proposed algorithm. We return the best adversarial example found among them (see Appendix D.1 for details). ... To escape converged local minima we change each coordinate to a nearby value from a Gaussian distribution with 0.1 probability, and continue the iteration if a better adversarial example is found within 300 trials."
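The initialization procedure quoted in the Experiment Setup row (draw random Gaussian starting points, keep only those that are adversarial, shrink each toward the original input with a binary search, and return the smallest perturbation found) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the `predict` callback, the Gaussian scale `sigma`, the step counts, and the ℓ∞ comparison used to rank candidates are all assumptions for demonstration.

```python
import numpy as np

def binary_search_refine(predict, x, x_adv, steps=20):
    """Pull an adversarial point x_adv toward the clean input x while
    it still changes the predicted label. `predict` maps a point to a
    class label; `x_adv` must already be adversarial."""
    y = predict(x)
    lo, hi = 0.0, 1.0  # fraction of the way from x toward x_adv
    for _ in range(steps):
        mid = (lo + hi) / 2
        cand = x + mid * (x_adv - x)
        if predict(cand) != y:
            hi = mid   # still adversarial: move closer to x
        else:
            lo = mid   # label unchanged: back off toward x_adv
    return x + hi * (x_adv - x)  # hi always stays adversarial

def random_restart_attack(predict, x, n_init=20, sigma=1.0, seed=0):
    """Draw Gaussian initial points around x, refine each adversarial
    one by binary search, and return the candidate with the smallest
    l-infinity perturbation (the perturbation norm is an assumption;
    the paper also handles l1 and l2)."""
    rng = np.random.default_rng(seed)
    y, best = predict(x), None
    for _ in range(n_init):
        init = x + sigma * rng.standard_normal(x.shape)
        if predict(init) == y:
            continue  # initialization is not adversarial; discard it
        cand = binary_search_refine(predict, x, init)
        if best is None or np.max(np.abs(cand - x)) < np.max(np.abs(best - x)):
            best = cand
    return best
```

With a toy one-dimensional classifier whose boundary sits at 0.5, the returned point lands just past the boundary, i.e. the binary search tightens the perturbation to nearly the minimum along each sampled direction.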