Learning Optimal Representations with the Decodable Information Bottleneck

Authors: Yann Dubois, Douwe Kiela, David J. Schwab, Ramakrishna Vedantam

NeurIPS 2020

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate our framework in practical settings, focusing on: (i) the relation between V-sufficiency and Alice's best achievable performance; (ii) the relation between V-minimality and generalization; (iii) the consequence of a mismatch between V_Alice and the functional family V_Bob w.r.t. which Z is sufficient or minimal, especially in IB's setting V_Bob = U; (iv) the use of our framework to predict generalization of trained networks. Many of our experiments involve sweeping over the complexity of families V− ⊆ V ⊆ V+, which we do by varying the widths of MLPs, with V → U in the infinite-width limit [40, 41].
Researcher Affiliation Collaboration Yann Dubois (Facebook AI Research, EMAIL); Douwe Kiela (Facebook AI Research, EMAIL); David J. Schwab (Facebook AI Research & CUNY Graduate Center, EMAIL); Ramakrishna Vedantam (Facebook AI Research, EMAIL)
Pseudocode Yes Figure 2: Practical DIB. (a) Pseudo-code for L̂_DIB(D)
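The paper's Figure 2(a) gives the actual pseudo-code for L̂_DIB(D); as a loose illustration only, the sketch below shows the general shape of such an objective: a cross-entropy "sufficiency" term for the task head minus a β-weighted "minimality" term from an auxiliary head. The function and argument names are hypothetical and this is not the paper's algorithm.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the true label under predicted probabilities."""
    return -math.log(probs[label])

def dib_style_loss(task_probs, y, aux_probs, aux_label, beta=1.0):
    """Schematic sufficiency-minus-beta-minimality objective (hypothetical):
    minimise task cross-entropy while penalising information that an
    auxiliary head can decode from the representation."""
    sufficiency = cross_entropy(task_probs, y)        # task head, minimised
    minimality = cross_entropy(aux_probs, aux_label)  # auxiliary head term
    return sufficiency - beta * minimality

# Toy numbers: a confident task head and an uninformative auxiliary head.
loss = dib_style_loss([0.9, 0.1], 0, [0.5, 0.5], 1, beta=0.1)
```

The sign structure (plus on sufficiency, minus on minimality) is what makes the encoder trade task performance against decodable excess information; the exact estimator of the minimality term is what Figure 2(a) in the paper specifies.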
Open Source Code No The paper does not contain an explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets Yes Right two plots: 2D representations encoded by a multi-layer perceptron (MLP) for odd-even classification of 200 MNIST [22] examples.
Dataset Splits No The paper discusses 'train and test performance' and 'train-test gap' but does not explicitly mention validation data splits or their proportions.
Hardware Specification No The paper describes model architectures such as a 'ResNet18 encoder' and a '3-MLP encoder', but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for the experiments.
Software Dependencies No The paper cites 'PyTorch' as a reference but does not specify version numbers for PyTorch or any other software dependencies used in the experiments.
Experiment Setup Yes We use a 3-MLP encoder with around 21M parameters and a 1024 dimensional Z. Since we want to investigate the generalization of ERMs resulting from Bob's criterion, we do not use (possibly implicit) regularizers such as large learning rate [44]. For more experimental details see Appx. D.1.
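The quoted setup names a parameter budget (~21M) and a representation size (1024) but not the hidden width or input dimension. A minimal sketch of the bookkeeping, assuming a hypothetical 3-layer fully connected encoder with a 3072-dimensional input and a hidden width chosen to land near the quoted total (the actual widths are given in the paper's Appx. D.1):

```python
def mlp_param_count(widths):
    """Total parameters of a fully connected MLP: weight matrix plus bias
    vector for each consecutive pair of layer widths."""
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths, widths[1:]))

# Hypothetical 3-layer encoder: 3072-dim input, hidden width h, 1024-dim Z.
# h = 3000 is chosen purely so the count lands near the quoted ~21M.
h = 3000
count = mlp_param_count([3072, h, h, 1024])  # roughly 21.3M parameters
```

This kind of check is useful when reproducing a paper that reports a parameter count rather than exact layer widths: one can solve for the hidden width that matches the stated budget.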