Multimodal Distributional Semantics

Authors: E. Bruni, N. K. Tran, M. Baroni

JAIR 2014

Each reproducibility variable is listed below with its result and the supporting evidence drawn from the paper.
Research Type: Experimental
Evidence: "We report the results of a systematic comparison of the network of semantic relations entertained by a set of concrete nouns in the traditional text-based and novel image-based distributional spaces, confirming that image-based features are, indeed, semantically meaningful. Moreover, as expected, they provide somewhat complementary information with respect to text-based features. Having thus found a practical and effective way to extract perceptual information, we must consider next how to combine text- and image-derived features to build a multimodal distributional semantic model. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter."
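The passage above describes combining text- and image-derived features into a single multimodal representation. A minimal sketch of weighted-concatenation fusion in that spirit (the function names, toy vectors, and alpha value are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors pass through)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def fuse(text_vec, image_vec, alpha=0.5):
    """Concatenate normalized text- and image-based vectors.

    alpha weights the textual channel; (1 - alpha) weights the visual one.
    """
    t = alpha * l2_normalize(np.asarray(text_vec, dtype=float))
    v = (1.0 - alpha) * l2_normalize(np.asarray(image_vec, dtype=float))
    return np.concatenate([t, v])

def cosine(a, b):
    """Cosine similarity between two fused representations."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy text and image vectors for two concrete nouns (made-up numbers)
dog_text, dog_img = [1.0, 0.2, 0.0], [0.8, 0.1]
cat_text, cat_img = [0.9, 0.3, 0.1], [0.7, 0.2]

sim = cosine(fuse(dog_text, dog_img), fuse(cat_text, cat_img))
```

Normalizing each channel before weighting keeps one modality's larger raw magnitudes from dominating the fused similarity.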
Researcher Affiliation: Academia
Evidence: Elia Bruni, Center for Mind/Brain Sciences, University of Trento, Italy; Nam Khanh Tran, L3S Research Center, Hannover, Germany; Marco Baroni, Center for Mind/Brain Sciences, University of Trento, Italy; Department of Information Engineering and Computer Science, University of Trento, Italy.
Pseudocode: No
Evidence: The paper describes its methods in prose and mathematical equations, for example in Section 3 ("A Framework for Multimodal Distributional Semantics") and Section 4 ("Implementation Details"), but provides no explicit pseudocode or algorithm blocks.
Open Source Code: Yes
Evidence: "Both our implementation of the multimodal framework and of the visual feature extraction procedure are publicly available and open source." (Footnote 8: https://github.com/s2m/FUSE/ and https://github.com/vsem/, respectively.) The visual feature extraction procedure is presented by Bruni, Bordignon, Liska, Uijlings, and Sergienya (2013).
Open Datasets: Yes
Evidence: "We adopt as our source corpus the ESP-Game data set that contains 100K images..." (footnote 11: http://www.cs.cmu.edu/~biglou/resources/). MEN is publicly available and can be downloaded from http://clic.cimec.unitn.it/~elia.bruni/MEN. "We collect co-occurrence counts from the concatenation of two corpora, ukWaC and Wackypedia... they are freely and publicly available, and they are widely used in linguistic research." (Footnote 10: http://wacky.sslmit.unibo.it/.)
Dataset Splits: Yes
Evidence: "We use indeed 2,000 MEN pairs (development set) for model tuning and 1,000 pairs for evaluation (test set)."
Hardware Specification: No
Evidence: The paper describes the computational methods and software libraries used (e.g., SVDLIBC, VLFeat, Gensim, CLUTO) but gives no details about the hardware on which the experiments were run, such as CPU or GPU models.
Software Dependencies: No
Evidence: The paper names specific tools and libraries (TreeTagger, SVDLIBC, the VLFeat toolbox, Gensim, the CLUTO toolkit) but gives no version numbers for them. For example, "For their extraction we use the vl_phow command included in the VLFeat toolbox (Vedaldi & Fulkerson, 2010)" identifies the toolbox but not its version.
Experiment Setup: Yes
Evidence: "We performed two separate parameter optimizations, one specifically for the semantic relatedness task (using MEN development, see Section 5.2.1) and the other specifically for the clustering task (using Battig, see Section 5.3.1). We determined the best model by performing an exhaustive search across SVD k (from 2^4 to 2^12 in powers of 2), FL and SL with α varying from 0 to 1 (inclusive) in steps of 0.1 and similarly for β. In total, 198 models were explored and the one with the highest performance on the development data was chosen. To tune the parameter k we used the MEN development set (see Section 5.2.1). By varying k between 500 and 5000 in steps of 500, we found the optimal k being 5000. The Latent Dirichlet Allocation (Mix LDA) model is trained on this matrix and tuned on the MEN development set by varying the number of topics Kt. The optimal value we find is Kt = 128."
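The quoted tuning procedure amounts to an exhaustive grid search over the stated parameter ranges. A minimal sketch, assuming a hypothetical `evaluate(k, alpha, beta)` callback that scores a configuration on the development set; note that the full Cartesian grid below has more cells than the 198 configurations the paper reports, so their search evidently constrained which combinations were tried:

```python
from itertools import product

def grid_search(evaluate):
    """Exhaustively score every (k, alpha, beta) configuration and keep the best.

    evaluate(k, alpha, beta) -> development-set score (hypothetical scorer).
    """
    ks = [2 ** p for p in range(4, 13)]               # SVD k: 2^4 .. 2^12, powers of 2
    weights = [round(0.1 * i, 1) for i in range(11)]  # 0.0 .. 1.0 in steps of 0.1
    return max(product(ks, weights, weights),
               key=lambda cfg: evaluate(*cfg))

# Toy scorer standing in for development-set performance;
# it peaks at k=512, alpha=0.5, beta=0.3 by construction.
best_k, best_a, best_b = grid_search(
    lambda k, a, b: -abs(k - 512) - abs(a - 0.5) - abs(b - 0.3))
```

In practice each call to `evaluate` would train and score one model on the development pairs, so the grid's size directly determines tuning cost.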