Generalization Bounds for Adversarial Contrastive Learning
Authors: Xin Zou, Weiwei Liu
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results validate our theory. ... In this section, we conduct several experiments to support our theory. We emphasize that we are not proposing a method to obtain better robustness on the downstream tasks; rather, we conduct the experiments to verify two claims from our theoretical results: (1) as shown in Remark 11, using blocks may improve the robust performance; (2) as shown in Remark 28, using the norms of the layers of the neural networks as a regularizer may help improve the robust performance. Data sets: we use two data sets (Krizhevsky and Hinton, 2009) in our experiments, (1) the CIFAR-10 data set and (2) the CIFAR-100 data set. |
| Researcher Affiliation | Academia | Xin Zou EMAIL School of Computer Science Wuhan University Wuhan, Hubei, China; Weiwei Liu EMAIL School of Computer Science Wuhan University Wuhan, Hubei, China |
| Pseudocode | Yes | Algorithm 1: The AERM algorithm for adversarial contrastive learning |
| Open Source Code | No | The paper does not provide explicit statements about code release, nor does it include a link to a code repository. The text mainly focuses on theoretical analysis and experimental validation of their theory without offering the implementation details as open-source code. |
| Open Datasets | Yes | We use two data sets (Krizhevsky and Hinton, 2009) in our experiments: (1) the CIFAR-10 data set and (2) the CIFAR-100 data set. CIFAR-10 contains 50000/10000 train/test images of size 32×32, which are categorized into 10 classes. CIFAR-100 contains 50000/10000 train/test images of size 32×32, which are categorized into 100 classes. |
| Dataset Splits | Yes | CIFAR-10 contains 50000/10000 train/test images of size 32×32, which are categorized into 10 classes. CIFAR-100 contains 50000/10000 train/test images of size 32×32, which are categorized into 100 classes. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions "Pytorch scales the images to tensors with entries within the range [0, 1]" and using the "Stochastic Gradient Descent (SGD) optimizer", but it does not specify any version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | Model. We use a neural network with two convolutional layers and one fully connected layer. Following He et al. (2020), we use the Stochastic Gradient Descent (SGD) optimizer with momentum 0.9, but set the weight decay to 5×10⁻⁴ and the learning rate to 0.001. ... We use a hyper-parameter λ to balance the trade-off between the contrastive upstream pre-train risk and the Frobenius norm of the parameters of the model. We choose to minimize the following regularized empirical risk: L(f) = L̂_un(f) + λN(f) (Eq. 14), where N(f) is a regularizer that constrains the Frobenius norm of the parameters of the model f; here we choose N(f) = Σ_{l=1}^{d} \|W_l\|_F. ... The results are shown in Table 1. From Table 1, we can see that the F-norm regularizer can improve the adversarial accuracy (the prediction performance of a model on adversarial examples generated by an attacker) of the mean classifier, which is in line with our Theorem 24. ... Figure 1 presents the results for clean accuracy and adversarial accuracy, respectively. From Figure 1, we can see that a larger block size yields better adversarial accuracy. |
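The regularized objective quoted above, L(f) = L̂_un(f) + λ·Σ_l ‖W_l‖_F, can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the weight matrices, the λ value, and the scalar `empirical_risk` below are hypothetical placeholders standing in for the pre-train risk and the network's layer parameters.

```python
import math

def frobenius_norm(W):
    # ||W||_F = square root of the sum of squared entries of the matrix W
    return math.sqrt(sum(w * w for row in W for w in row))

def regularized_risk(empirical_risk, weights, lam):
    # L(f) = L_un(f) + lambda * sum over layers l of ||W_l||_F
    return empirical_risk + lam * sum(frobenius_norm(W) for W in weights)

# Toy example with two "layers" (hypothetical values):
W1 = [[3.0, 4.0]]        # ||W1||_F = sqrt(9 + 16) = 5
W2 = [[0.0], [0.0]]      # ||W2||_F = 0
risk = regularized_risk(1.0, [W1, W2], lam=0.1)  # 1.0 + 0.1 * 5 = 1.5
```

In a PyTorch training loop, the same idea amounts to adding `lam * sum of matrix Frobenius norms` to the contrastive loss before calling `backward()`; note this penalizes the norm itself, unlike SGD weight decay, which penalizes its square.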