Small Transformers Compute Universal Metric Embeddings

Authors: Anastasis Kratsios, Valentin Debarnot, Ivan Dokmanić

JMLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section we complement the theoretical embedding guarantees from Section 4 with preliminary computer experiments on synthetic data. We show that the proposed feature maps can indeed be trained in a standard deep learning framework, that the theoretical advantages of PT mixture-Wasserstein embeddings over Euclidean and hyperbolic embeddings carry over to practice, and that the PT-based feature maps generalize beyond X_n.
Researcher Affiliation | Academia | Anastasis Kratsios (EMAIL), McMaster University, Department of Mathematics, 1280 Main Street West, Hamilton, Ontario, L8S 4K1, Canada; Valentin Debarnot (EMAIL), Universität Basel, Department of Computer Science, Basel, 4051, Switzerland; Ivan Dokmanić (EMAIL), Universität Basel, Department of Computer Science, Basel, 4051, Switzerland
Pseudocode | Yes | Algorithm 1 (Initialize Bias):

    Require: set of N vectors X := {x^(1), ..., x^(N)} in R^K; b_1 := 0
    for n = 1, ..., N do                        # initialize first shift
        x̃^(n)_1 := x^(n)_1 + b_1               # dummy vectors
    end for
    for k = 2, ..., K do                        # iteratively build bias components
        b_k := max_{n ≤ N} ReLU(x̃^(n)_{k-1} - x^(n)_k)
        for n = 1, ..., N do
            x̃^(n)_k := x^(n)_k + b_k           # dummy vectors
        end for
    end for
    return b := (b_1, ..., b_K)                 # return bias
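The extracted pseudocode is partly garbled, so the following NumPy sketch shows only one plausible reading of Algorithm 1: each bias component b_k is the smallest shift that makes the "dummy vectors" coordinate-wise non-decreasing. The function and variable names are ours, and the exact argument of the ReLU is an assumption, not the authors' verified formula.

```python
import numpy as np

def initialize_bias(X):
    """Sketch of Algorithm 1 ("Initialize Bias") under our reading.

    X: (N, K) array of N vectors in R^K.
    Returns b in R^K such that every shifted vector x + b is
    coordinate-wise non-decreasing (the ReLU step is our assumption).
    """
    N, K = X.shape
    b = np.zeros(K)                    # b_1 := 0 (index 0 in Python)
    X_tilde = np.empty_like(X)         # the "dummy vectors"
    X_tilde[:, 0] = X[:, 0] + b[0]     # first shift
    for k in range(1, K):              # iteratively build bias components
        # Smallest shift making coordinate k at least as large as the
        # shifted coordinate k-1, across all N vectors.
        b[k] = np.max(np.maximum(X_tilde[:, k - 1] - X[:, k], 0.0))
        X_tilde[:, k] = X[:, k] + b[k]
    return b
```

Under this reading, adding the returned bias sorts each input vector's coordinates into non-decreasing order without changing coordinate gaps that are already non-negative.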
Open Source Code | Yes | The Python code used to produce the results of this section is available at https://github.com/swing-research/Universal-Embeddings.
Open Datasets | No | The paper uses synthetic data only. For example: "We consider a regular binary tree X = (V, E) (Figure 7a) of depth six with a total of |V| = 127 vertices.", and "We randomly sample data points {x_i}_{i=1}^n from the uniform probability measure on S^N." No concrete access information for a publicly available dataset is provided.
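The uniform samples on the sphere S^N quoted above can be drawn by normalizing i.i.d. Gaussian vectors, a standard construction; the function name and seed below are ours, not the paper's:

```python
import numpy as np

def sample_sphere(n, dim, seed=0):
    """Draw n points uniformly on the unit sphere in R^dim by
    normalizing i.i.d. standard-Gaussian vectors (standard trick;
    naming and seed are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, dim))
    return x / np.linalg.norm(x, axis=1, keepdims=True)
```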
Dataset Splits | Yes | We partition the vertices V into training and testing sets, V_train ∪ V_test = V, with |V_train| = 111 and |V_test| = 16. The test vertices (colored white in Figure 7a) are used to evaluate the quality of out-of-sample representations (that is to say, the generalization) computed by the different representation maps.
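A 111/16 partition of the 127 tree vertices could be reproduced along the following lines; the random selection and the seed are our assumptions, since the paper only reports the split sizes, not the selection procedure:

```python
import random

def split_vertices(n_vertices=127, n_test=16, seed=0):
    """Partition vertex ids 0..n_vertices-1 into disjoint train/test
    sets (111 train, 16 test for the depth-six binary tree). The
    uniform random choice and seed are illustrative assumptions."""
    rng = random.Random(seed)
    test = set(rng.sample(range(n_vertices), n_test))
    train = [v for v in range(n_vertices) if v not in test]
    return train, sorted(test)
```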
Hardware Specification | No | The paper mentions training in PyTorch with the Adam optimizer but does not specify any particular hardware (GPU/CPU models, memory, etc.).
Software Dependencies | No | "All networks are trained by the Adam optimizer in PyTorch, with weight decay parameter 10^-6, initial learning rate 10^-4 and final learning rate 10^-6." The paper mentions PyTorch and the Adam optimizer but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | All networks are trained by the Adam optimizer in PyTorch, with weight decay parameter 10^-6, initial learning rate 10^-4 and final learning rate 10^-6. In practice we set α = 1 as it does not have a strong influence on empirical performance. We use K = 5 mixture components and d = 15 for the dimension of the hyperbolic space to ensure a fair comparison with the probabilistic transformer's effective dimension. We train a PT for 160 iterations with the Adam optimizer (Kingma and Ba, 2015). In each iteration, we use a random batch of 32 points among the 10,000 fixed training points chosen from a uniform measure on the sphere.
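The reported learning-rate endpoints (10^-4 initial, 10^-6 final, over 160 Adam iterations) are consistent with, for instance, an exponential decay; in the sketch below only the endpoints and step count come from the paper, while the exponential shape and the function name are our assumptions:

```python
def lr_schedule(step, total_steps=160, lr_init=1e-4, lr_final=1e-6):
    """Exponentially interpolate the learning rate from lr_init to
    lr_final over total_steps iterations. Endpoints and step count
    match the reported setup; the decay shape is an assumption."""
    t = step / (total_steps - 1)               # progress in [0, 1]
    return lr_init * (lr_final / lr_init) ** t
```

In PyTorch, an equivalent effect could be obtained by attaching a multiplicative scheduler (e.g. `torch.optim.lr_scheduler.ExponentialLR`) to an `Adam` optimizer created with `weight_decay=1e-6`.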