Stabilizing the Kumaraswamy Distribution

Authors: Max Wasserman, Gonzalo Mateos

TMLR 2025

Reproducibility assessment: for each variable, the assessed result followed by the supporting LLM response.
Research Type: Experimental
Evidence: "Our experiments in Sections 4.2-4.3 provide evidence toward the benefits of adopting the KS in timely application domains enabled by our stable parameterization without making claims on improvements over state-of-the-art models, which future work may investigate." ... "Across the experimental domains, our stable KS is performant and often easier to use than alternative variational distributions supported on bounded intervals."
Researcher Affiliation: Academia
Evidence: Max Wasserman (EMAIL), Department of Computer Science, University of Rochester; Gonzalo Mateos (EMAIL), Department of Electrical and Computer Engineering, University of Rochester.
Pseudocode: Yes
Evidence: Algorithm 1 (Variational Bandit Encoder):

Require: {x_k}_{k=1}^K, {µ_k}_{k=1}^K, η, β_KL
 1: Variational posterior q ← KS
 2: Replay buffer D ← ∅
 3: for t = 1 ... T do
 4:     Encode: (a_k, b_k) = e_ϕ(x_k)
 5:     Sample: z_k ∼ q(z_k; a_k, b_k)
 6:     TS: a = argmax_k {z_k}
 7:     Reward: r ∼ Bernoulli(µ_a)
 8:     D ← D ∪ {(x_a, a, r)}
 9:     Construct L̂_{β_KL} as in (16)
10:     ϕ ← ϕ + η ∇_ϕ L̂_{β_KL}
11: end for
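The sampling and Thompson-sampling steps of Algorithm 1 can be sketched in NumPy. This is a minimal, illustrative reconstruction, not the authors' implementation: it takes fixed per-arm Kumaraswamy parameters (a_k, b_k) as given in place of the encoder e_ϕ, and omits the objective L̂_{β_KL} and its gradient update (lines 9-10). Sampling uses the Kumaraswamy inverse CDF, F⁻¹(u) = (1 − (1 − u)^(1/b))^(1/a), evaluated in log space; all function and variable names are illustrative.

```python
import numpy as np

def sample_kumaraswamy(a, b, rng):
    """Draw z ~ Kumaraswamy(a, b) via the inverse CDF:
    F^-1(u) = (1 - (1 - u)^(1/b))^(1/a).
    Evaluated with log1p in log space to reduce underflow for extreme a, b."""
    u = rng.uniform(size=np.shape(a))
    return np.exp(np.log1p(-np.exp(np.log1p(-u) / b)) / a)

def thompson_step(a, b, mu, rng):
    """One pass over lines 5-8 of Algorithm 1, with the encoder
    outputs (a_k, b_k) taken as fixed inputs."""
    z = sample_kumaraswamy(a, b, rng)   # line 5: sample per-arm beliefs
    arm = int(np.argmax(z))             # line 6: Thompson sampling pick
    reward = rng.uniform() < mu[arm]    # line 7: Bernoulli(mu_a) reward
    return arm, int(reward)             # line 8 would append (x_a, a, r) to D

# Illustrative parameters for a 3-armed bandit.
rng = np.random.default_rng(0)
a = np.array([2.0, 5.0, 1.5])   # Kumaraswamy shape parameters per arm
b = np.array([3.0, 1.0, 4.0])
mu = np.array([0.2, 0.8, 0.5])  # true (unknown) Bernoulli arm means
arm, r = thompson_step(a, b, mu, rng)
```

With a = b = 1 the Kumaraswamy reduces to the uniform distribution, which gives a quick sanity check on the inverse-CDF sampler.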
Open Source Code: No
Evidence: The paper mentions "Reviewed on OpenReview: https://openreview.net/forum?id=XXXX", which is a review platform, not a code repository. It refers to other authors' code and implementations, but does not explicitly state that the code for the methodology described in this paper is released or available.
Open Datasets: Yes
Evidence: "Using the well established VAE framework on MNIST and CIFAR-10 datasets..." ... "Figure 6 presents results across three standard citation networks: Cora, Citeseer, and Pubmed."
Dataset Splits: Yes
Evidence: "Using the well established VAE framework on MNIST and CIFAR-10 datasets..." ... "In a typical link prediction setup, the GNN has access to the features X ∈ ℝ^{N×d} of all N nodes, but only a subset of positive edges in the training D_tr and validation D_val sets."
Hardware Specification: Yes
Evidence: "We repeat experiments 5 times on an Apple M2 CPU and report the mean and standard deviation across these runs in Figure 4." ... "on the largest dataset (Pubmed), the average time (ms) per epoch for VEE-KS, VEE-tanhN, and VEE-Beta was 381 ± 61, 301 ± 26, and 447 ± 86 respectively, on an Apple M2 CPU."
Software Dependencies: No
Evidence: The paper mentions the use of PyTorch and TensorFlow but does not specify version numbers for these or any other software libraries or frameworks. It only refers to them generally as implementations or libraries.
Experiment Setup: Yes
Evidence: "For both experiments (MNIST and CIFAR-10) we use a learning rate of 0.001, batch size of 500, and optimize with Adam for 200 epochs." ... "we use a latent dimension of D = 20, an encoder with two hidden layers with 500 units each, with leaky-ReLU non-linearities, followed by a dropout layer (with parameter 0.9)." ... "The learning rate is set to η = 10^-2." ... "All models use an MLP with 3 hidden layers of width 32." ... "All models use a 2-layer GNN with Graph Convolutional Network (GCN) layers and a hidden/output nodal dimension of 32." ... "We train for 300 epochs, with a learning rate of 0.01, averaging results over 5 runs with different seeds."
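For quick reference, the quoted hyperparameters can be collected into plain config dicts. The key names below are illustrative (they do not appear in the paper); only the values are taken from the quoted setup.

```python
# VAE experiments (MNIST / CIFAR-10), values as quoted from the paper.
vae_config = {
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "batch_size": 500,
    "epochs": 200,
    "latent_dim": 20,               # D = 20
    "encoder_hidden": [500, 500],   # two hidden layers, 500 units each
    "activation": "leaky_relu",
    "dropout": 0.9,
}

# Link-prediction experiments on the citation networks, values as quoted.
gnn_config = {
    "layers": 2,                    # 2-layer GCN
    "hidden_dim": 32,               # hidden/output nodal dimension
    "learning_rate": 1e-2,
    "epochs": 300,
    "runs": 5,                      # averaged over 5 seeds
}
```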