Robust Learning against Relational Adversaries
Authors: Yizhen Wang, Mohannad Alhanahnah, Xiaozhu Meng, Ke Wang, Mihai Christodorescu, Somesh Jha
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results of both tasks show our learning framework significantly improves the robustness of models against relational adversaries. In the process, it outperforms adversarial training, the most noteworthy defense mechanism, by a wide margin. We now evaluate the effectiveness of N&P against relational attacks for real-world attacks. Our empirical evaluation shows that input normalization can significantly enhance model robustness. |
| Researcher Affiliation | Collaboration | Yizhen Wang (Visa Research); Mohannad Alhanahnah (University of Wisconsin-Madison); Xiaozhu Meng (Rice University); Ke Wang (Visa Research); Mihai Christodorescu (Visa Research); Somesh Jha (University of Wisconsin-Madison) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The normalizer in our paper is now open sourced on https://github.com/Mohannadcse/Normalizer-authorship. |
| Open Datasets | Yes | We use the dataset provided by Quiring et al. [2019], which is collected from Google Code Jam, https://github.com/EQuiw/code-imitator/tree/master/data/dataset_2017. |
| Dataset Splits | Yes | We sample 19,000 benign PEs and 19,000 malicious PEs to construct the training (60%), validation (20%), and test (20%) sets. |
| Hardware Specification | Yes | The standard adversarial training is too computationally expensive for the attack on source code level. We make a number of adaptations that reduce the number of MCTS roll-outs and generate adversarial examples in batch for better parallelism so that the process finishes within a month on a 72-core CPU server. |
| Software Dependencies | No | The paper mentions software like LIEF and Clang, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use the same network architecture as Al-Dujaili et al. [2018], a fully-connected neural net with three hidden layers, each with 300 ReLU nodes, to set up a fair comparison. We train each model to minimize the negative log-likelihood loss for 20 epochs, and pick the version with the lowest validation loss. |
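The architecture quoted in the Experiment Setup row (three hidden layers of 300 ReLU units, trained against a negative log-likelihood loss) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the input dimension, number of classes, batch size, and He initialization are illustrative assumptions not stated in the excerpt.

```python
import numpy as np

rng = np.random.default_rng(0)
IN_DIM, HIDDEN, N_CLASSES = 1024, 300, 2  # IN_DIM and N_CLASSES are placeholders

def init_layer(fan_in, fan_out):
    # He initialization, a common (assumed) choice for ReLU layers.
    w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
    return w, np.zeros(fan_out)

# Three hidden layers of 300 units each, plus an output layer,
# matching the fully-connected architecture described in the paper.
layers = [init_layer(IN_DIM, HIDDEN),
          init_layer(HIDDEN, HIDDEN),
          init_layer(HIDDEN, HIDDEN),
          init_layer(HIDDEN, N_CLASSES)]

def forward(x):
    h = x
    for w, b in layers[:-1]:
        h = np.maximum(h @ w + b, 0.0)  # ReLU activation on hidden layers
    w, b = layers[-1]
    logits = h @ w + b
    # Log-softmax, computed with the max-subtraction trick for stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def nll_loss(log_probs, y):
    # Negative log-likelihood of the true class, averaged over the batch.
    return -log_probs[np.arange(len(y)), y].mean()

x = rng.normal(size=(8, IN_DIM))
y = rng.integers(0, N_CLASSES, size=8)
loss = nll_loss(forward(x), y)
```

Training (20 epochs, selecting the checkpoint with the lowest validation loss, as the paper describes) would wrap this forward/loss pair in a gradient-descent loop, which is omitted here for brevity.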