Multimodal Learning with Deep Boltzmann Machines

Authors: Nitish Srivastava, Ruslan Salakhutdinov

JMLR 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on bi-modal image-text and audio-video data. The fused representation achieves good classification results on the MIR-Flickr data set, matching or outperforming other deep models as well as SVM based models that use Multiple Kernel Learning. We further demonstrate that this multimodal model helps classification and retrieval even when only unimodal data is available at test time.
Researcher Affiliation | Academia | Nitish Srivastava, EMAIL, Department of Computer Science, University of Toronto, 10 Kings College Road, Rm 3302, Toronto, Ontario, M5S 3G4, Canada. Ruslan Salakhutdinov, EMAIL, Department of Statistics and Computer Science, University of Toronto, 10 Kings College Road, Rm 3302, Toronto, Ontario, M5S 3G4, Canada.
Pseudocode | Yes | Algorithm 1: Learning Procedure for a Multimodal Deep Boltzmann Machine.
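The learning procedure referenced above begins with greedy layer-wise pretraining of each pathway. The following is an illustrative sketch only, not the paper's implementation: it uses a minimal binary RBM trained with CD-1 (the paper anneals CD-n from n = 1 to 20 and uses Gaussian and Replicated Softmax visible units), and all layer sizes, learning rates, and data below are placeholder assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary-binary RBM; a stand-in for the pathway layers."""
    def __init__(self, n_vis, n_hid, rng):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b_vis = np.zeros(n_vis)
        self.b_hid = np.zeros(n_hid)
        self.rng = rng

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_hid)

    def cd1_update(self, v0, lr=0.05):
        # Positive phase statistics.
        h0 = self.hidden_probs(v0)
        # Negative phase: a single Gibbs step (contrastive divergence, CD-1).
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_vis)
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W += lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_vis += lr * (v0 - v1).mean(axis=0)
        self.b_hid += lr * (h0 - h1).mean(axis=0)

def pretrain_pathway(data, layer_sizes, epochs=5, rng=None):
    """Greedy layer-wise pretraining: train an RBM per layer, then feed its
    hidden probabilities upward as data for the next layer."""
    rng = rng or np.random.default_rng(0)
    rbms, x = [], data
    for n_hid in layer_sizes:
        rbm = RBM(x.shape[1], n_hid, rng)
        for _ in range(epochs):
            rbm.cd1_update(x)
        rbms.append(rbm)
        x = rbm.hidden_probs(x)
    return rbms, x
```

In the full procedure, each modality's pathway is pretrained this way, the pathways are then joined at the shared top layer, and the whole model is fine-tuned jointly (the paper uses mean-field variational inference for the joint phase, which this sketch omits).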
Open Source Code | No | The extracted features are publicly available at http://www.cs.toronto.edu/~nitish/multimodal (Footnote 5 in Section 6.1). This link provides extracted features, not the source code for the methodology described in the paper.
Open Datasets | Yes | We used the MIR Flickr Data set (Huiskes and Lew, 2008) in our experiments. ... We combined several data sets in this experiment. CUAVE (Patterson et al., 2002): ... AVLetters (Matthews et al., 2002): ... AVLetters 2 (Cox et al., 2008): ... TIMIT (Fisher et al., 1986):
Dataset Splits | Yes | From the 25,000 annotated images we use 10,000 images for training, 5,000 for validation and 10,000 for testing, following Huiskes et al. (2010).
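The reported 10,000 / 5,000 / 10,000 split of the 25,000 annotated MIR Flickr images can be sketched as follows. The paper follows the split of Huiskes et al. (2010); the random seed and shuffling here are assumptions for illustration only.

```python
import numpy as np

def split_indices(n_total=25000, n_train=10000, n_valid=5000,
                  n_test=10000, seed=0):
    """Partition image indices into disjoint train/validation/test sets."""
    assert n_train + n_valid + n_test == n_total
    idx = np.random.default_rng(seed).permutation(n_total)
    return (idx[:n_train],
            idx[n_train:n_train + n_valid],
            idx[n_train + n_valid:])

train_idx, valid_idx, test_idx = split_indices()
```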
Hardware Specification | No | The paper mentions a "fast GPU implementation" but does not specify any particular GPU model or other hardware components used for the experiments.
Software Dependencies | No | We used publicly available code (Vedaldi and Fulkerson, 2008; Bastan et al., 2010) for extracting these features. This refers to third-party tools used for feature extraction, but no version numbers are given for those tools or for the authors' own implementation dependencies.
Experiment Setup | Yes | The image pathway consists of a Gaussian RBM with 3857 linear visible units and 1024 hidden units. ... The text pathway consists of a Replicated Softmax Model with 2000 visible units and 1024 hidden units. ... The joint layer contains 2048 hidden units. All hidden units are binary. Each Gaussian visible unit was set to have unit variance (σi = 1), which was kept fixed and not learned. Each layer of weights was pretrained using CD-n, where n was gradually increased from 1 to 20. All word count vectors were normalized so that they sum to one. ... we retained each unit with probability p = 0.8. ... we typically used 5 mean-field updates.
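The architectural constants and preprocessing steps quoted above can be collected into a small sketch. The layer sizes, fixed variance, retention probability, and mean-field step count come from the paper; the helper function names and the inverted-dropout scaling are illustrative assumptions.

```python
import numpy as np

# Constants reported in the paper's experiment setup.
IMAGE_VISIBLE, IMAGE_HIDDEN = 3857, 1024   # Gaussian RBM image pathway
TEXT_VISIBLE, TEXT_HIDDEN = 2000, 1024     # Replicated Softmax text pathway
JOINT_HIDDEN = 2048                        # shared multimodal layer
SIGMA = 1.0                                # Gaussian visible variance, fixed
RETAIN_P = 0.8                             # retain each unit with p = 0.8
MEAN_FIELD_STEPS = 5                       # typical number of updates

def normalize_word_counts(counts):
    """Normalize each word-count vector to sum to one, as described."""
    totals = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(totals, 1e-12)

def dropout_mask(shape, p=RETAIN_P, rng=None):
    """Retain each unit with probability p. The 1/p rescaling is the
    'inverted dropout' convention, an assumption here; the original dropout
    formulation instead scales weights at test time."""
    rng = rng or np.random.default_rng(0)
    return (rng.random(shape) < p).astype(float) / p
```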