Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Optimization with Access to Auxiliary Information

Authors: El Mahdi Chayti, Sai Praneeth Karimireddy

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 6 Experiments. Baselines. We will consider fine-tuning and the naive approach as baselines. Fine-tuning is equivalent to using the gradients of the helper all at the beginning and then only using the gradients of the main objective f. We note that in our experiments, K = 1 corresponds to SGD with momentum; this means we are also comparing with SGDm. 6.1 Toy example. We consider a simple problem that consists in optimizing a function f(x) = 1/2 x^2 by enlisting the help of the function h(x) = 1/2 (1 + δ)(x − ζ/(1 + δ))^2 for x ∈ R. [...] Figure 1: Effect of the bias ζ (zeta in the figure) on the naive approach (Naive), Aux MOM and Fine Tuning (FT) for K = 10, δ = 1, and η = min(1/2, 1/(δK)). We can see that the naive approach fails to converge for large bias values, whereas Aux MOM converges all the time, no matter the value of the bias. Fine Tuning converges much slower for small values of δ, but beats Aux MOM for δ = 10. [...] 6.2 Leveraging noisy or mislabeled data [...] 6.3 Training with Coresets [...] 6.4 Semi-supervised logistic regression
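To make the bias in the quoted toy example concrete, here is a minimal sketch (not from the paper) of the naive approach, assuming exact rather than stochastic gradients: following ∇h instead of ∇f drives the iterate to h's minimizer ζ/(1 + δ), not f's minimizer 0, which is the failure mode described for Figure 1.

```python
def naive_descent(x0, eta, steps, delta, zeta):
    """Gradient descent that naively follows grad h instead of grad f.
    grad h(x) = (1 + delta) * x - zeta, so the iterate converges to h's
    minimizer zeta / (1 + delta) rather than f's minimizer 0."""
    x = x0
    for _ in range(steps):
        x -= eta * ((1 + delta) * x - zeta)
    return x

# For delta = 1, zeta = 10 the naive iterate settles near 5.0, far from 0.
x_naive = naive_descent(x0=0.0, eta=0.1, steps=500, delta=1.0, zeta=10.0)
```

Larger ζ pushes the naive limit point further from 0, matching the Figure 1 observation that the naive approach fails for large bias values.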
Researcher Affiliation | Academia | El Mahdi Chayti, EMAIL, EPFL; Sai Praneeth Karimireddy, EMAIL, UC Berkeley
Pseudocode | Yes | Algorithm 1: stochastic optimization of f with access to the auxiliary h
Require: x_0, η, T, K
for t = 1 to T do
    sample g_{f−h}(x_{t−1}, ξ^t_{f−h}) ≈ ∇f(x_{t−1}) − ∇h(x_{t−1})
    update m_t ≈ ∇f(x_{t−1}) − ∇h(x_{t−1})   ▷ momentum
    define y^t_0 = x_{t−1}
    for k = 1 to K do
        sample g_h(y^t_{k−1}, ξ^{t,k}_h) ≈ ∇h(y^t_{k−1}); use it and m_t to form d^t_k ≈ ∇f(y^t_{k−1})
        y^t_k = y^t_{k−1} − η d^t_k
    end for
    update x_t
end for
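An executable sketch of Algorithm 1 on the Section 6.1 toy problem may clarify the structure. Two details are assumptions not spelled out in the excerpt: the momentum m_t is taken as an exponential moving average with parameter a (the parameter named in the experiment setup), and the outer update is taken as x_t = y^t_K; deterministic gradients stand in for the stochastic samples g_{f−h} and g_h.

```python
def aux_mom(x0, eta, T, K, a, delta, zeta):
    """Sketch of Algorithm 1 (Aux MOM) on the toy problem of Section 6.1:
    f(x) = x^2 / 2,  h(x) = (1 + delta)/2 * (x - zeta/(1 + delta))^2.
    Exact gradients replace the stochastic samples g_{f-h} and g_h."""
    grad_f = lambda x: x
    grad_h = lambda x: (1 + delta) * x - zeta
    x, m = x0, 0.0
    for t in range(T):
        # Momentum m_t tracks grad f - grad h (EMA with parameter a is an assumption).
        m = (1 - a) * m + a * (grad_f(x) - grad_h(x))
        y = x  # y^t_0 = x_{t-1}
        for k in range(K):
            d = grad_h(y) + m   # d^t_k approximates grad f(y^t_{k-1})
            y = y - eta * d     # y^t_k = y^t_{k-1} - eta * d^t_k
        x = y                   # update x_t (assumed x_t = y^t_K)
    return x

# Even with a large bias zeta, the iterate approaches f's minimizer 0,
# illustrating the bias correction described for Figure 1.
x_final = aux_mom(x0=1.0, eta=0.1, T=300, K=10, a=0.1, delta=1.0, zeta=10.0)
```

The momentum term cancels the helper's bias: at convergence m ≈ ∇f(x) − ∇h(x), so ∇h(y) + m recovers an estimate of ∇f(y) while still reusing K cheap helper gradients per outer step.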
Open Source Code | Yes | The code for our experiments is available at https://github.com/elmahdichayti/Opt Aux Inf.
Open Datasets | Yes | Rotated features. We consider a simple feed-forward neural network [...] to classify the MNIST dataset (Le Cun & Cortes, 2010), which is the main task f. [...] CIFAR10/100 experiments. We performed the same experiments on the CIFAR10 and CIFAR100 datasets [...] Semi-supervised logistic regression. We consider a semi-supervised logistic regression task on the Mushrooms dataset from the libsvmtools repository (Chang & Lin, 2011), which has 8124 samples, each with 112 features.
Dataset Splits | Yes | We consider a semi-supervised logistic regression task on the Mushrooms dataset from the libsvmtools repository (Chang & Lin, 2011), which has 8124 samples, each with 112 features. We divide this dataset into three equal parts: one for training, one for testing, and the third one is unlabeled.
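A minimal sketch of the described three-way split (the shuffling and the seed are assumptions; the paper only states three equal parts):

```python
import random

def three_way_split(n_samples, seed=0):
    """Shuffle sample indices and cut them into three equal parts:
    train, test, and unlabeled. Shuffle/seed are assumptions."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    third = n_samples // 3
    return idx[:third], idx[third:2 * third], idx[2 * third:3 * third]

# Mushrooms has 8124 samples, which divides evenly: 2708 indices per part.
train_idx, test_idx, unlabeled_idx = three_way_split(8124)
```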
Hardware Specification | No | The paper does not explicitly describe the hardware used, such as specific GPU/CPU models or other hardware specifications. It mentions neural networks but does not specify the computational resources.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). It mentions optimization methods like SGD and Adam but not the software used for implementation with specific versions.
Experiment Setup | Yes | Effect of the bias ζ. Figure 1 shows that indeed our algorithm Aux MOM does correct for the bias. We note that in this simple example, having a big value of ζ means that the gradients of h point opposite to those of f, and hence, it's better to not use them in a naive way. However, our approach can correct for this automatically and hence does not suffer from increasing values of ζ. In real-world data, it is very difficult to quantify ζ, which is why we can still benefit a little bit (in non-extreme cases) from using the naive way. [...] Rotated features. [...] We used a 256 batch size for f and a 64 batch size for h; in our experiments, changing the batch size of h to 256 or 512 led to similar results. [...] Figure 4: Test accuracy obtained using different angles as helpers, for K = 10, step size η = 0.01 and momentum parameter a = 0.1.