Hinge-Loss Markov Random Fields and Probabilistic Soft Logic

Authors: Stephen H. Bach, Matthias Broecheler, Bert Huang, Lise Getoor

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type Experimental This paper brings together and expands work on scalable models for structured data that can be either discrete, continuous, or a mixture of both (Broecheler et al., 2010a; Bach et al., 2012, 2013, 2015b). The effectiveness of HL-MRFs and PSL has been demonstrated on many problems, including information extraction (Liu et al., 2016) and automatic knowledge base construction (Pujara et al., 2013), extracting and evaluating natural-language arguments on the Web (Samadi et al., 2016), high-level computer vision (London et al., 2013), drug discovery (Fakhraei et al., 2014) and predicting drug-drug interactions (Sridhar et al., 2016), natural language semantics (Beltagy et al., 2014; Sridhar et al., 2015; Deng and Wiebe, 2015; Ebrahimi et al., 2016), automobile-traffic modeling (Chen et al., 2014), recommender systems (Kouki et al., 2015), information retrieval (Alshukaili et al., 2016), and predicting attributes (Li et al., 2014) and trust (Huang et al., 2013; West et al., 2014) in social networks. ... We evaluate them on core relational learning and structured prediction tasks, such as collective classification and link prediction. We show that HL-MRFs offer predictive accuracy comparable to analogous discrete models while scaling much better to large data sets. ... In this section we evaluate the empirical performance of our MAP inference algorithm. ... To demonstrate the flexibility and effectiveness of learning with HL-MRFs, we test them on four diverse tasks: node labeling, link labeling, link prediction, and image completion.
Researcher Affiliation Collaboration Stephen H. Bach (EMAIL), Computer Science Department, Stanford University, Stanford, CA 94305, USA; Matthias Broecheler (EMAIL), DataStax; Bert Huang (EMAIL), Computer Science Department, Virginia Tech, Blacksburg, VA 24061, USA; Lise Getoor (EMAIL), Computer Science Department, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
Pseudocode Yes
Algorithm 1 MAP Inference for HL-MRFs
Input: HL-MRF P(y|x), ρ > 0
Initialize y_{L,j} as local copies of the variables y_{C,j} that are in φ_j, j = 1, …, m
Initialize y_{L,k+m} as local copies of the variables y_{C,k+m} that are in c_k, k = 1, …, r
Initialize Lagrange multipliers α_î corresponding to copies y_{L,î}, î = 1, …, m + r
while not converged do
  for j = 1, …, m do
    α_j ← α_j + ρ(y_{L,j} − y_{C,j})
    y_{L,j} ← y_{C,j} − (1/ρ) α_j
    if ℓ_j(y_{L,j}, x) > 0 then
      y_{L,j} ← argmin_{y_{L,j}} w_j ℓ_j(y_{L,j}, x)^{p_j} + (ρ/2) ‖y_{L,j} − y_{C,j} + (1/ρ) α_j‖²
      if ℓ_j(y_{L,j}, x) < 0 then
        y_{L,j} ← Proj_{ℓ_j = 0}(y_{C,j} − (1/ρ) α_j)
      end if
    end if
  end for
  for k = 1, …, r do
    α_{k+m} ← α_{k+m} + ρ(y_{L,k+m} − y_{C,k+m})
    y_{L,k+m} ← Proj_{c_k}(y_{C,k+m} − (1/ρ) α_{k+m})
  end for
  for i = 1, …, n do
    y_i ← (1/|copies(y_i)|) Σ_{(y_c, α_c) ∈ copies(y_i)} (y_c + (1/ρ) α_c)
    Clip y_i to [0, 1]
  end for
end while
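The consensus-ADMM loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the PSL implementation: it assumes every potential is a linear hinge w · max(a·y + b, 0) with exponent p_j = 1 and no hard constraints, so each local subproblem has the closed-form minimizer v − (w/ρ)a, with a fallback projection onto the hyperplane a·y + b = 0 when that minimizer leaves the hinge region. The `map_inference` function and the `(w, idx, a, b)` potential encoding are hypothetical names chosen for this sketch.

```python
import numpy as np

def map_inference(n, potentials, rho=1.0, iters=500):
    """Consensus-ADMM sketch for MAP inference in an HL-MRF whose potentials
    are linear hinges w * max(a . y + b, 0) (p_j = 1), with y in [0, 1]^n.
    `potentials` is a list of (w, idx, a, b); illustrative encoding only."""
    y = np.full(n, 0.5)                                      # consensus variables
    local = [y[list(idx)].copy() for (_, idx, _, _) in potentials]
    alpha = [np.zeros(len(idx)) for (_, idx, _, _) in potentials]
    for _ in range(iters):
        for j, (w, idx, a, b) in enumerate(potentials):
            idx = list(idx)
            alpha[j] = alpha[j] + rho * (local[j] - y[idx])  # dual update
            v = y[idx] - alpha[j] / rho
            if a @ v + b > 0:
                # minimizer of w*(a.y + b) + (rho/2)||y - v||^2 in the hinge region
                cand = v - (w / rho) * a
                if a @ cand + b < 0:                         # overshot: project onto a.y + b = 0
                    cand = v - ((a @ v + b) / (a @ a)) * a
                local[j] = cand
            else:
                local[j] = v                                 # hinge inactive; stay at v
        # consensus step: average each variable's copies (plus scaled duals), clip
        num, cnt = np.zeros(n), np.zeros(n)
        for j, (_, idx, _, _) in enumerate(potentials):
            num[list(idx)] += local[j] + alpha[j] / rho
            cnt[list(idx)] += 1
        y = np.clip(num / np.maximum(cnt, 1), 0.0, 1.0)
    return y

# Toy model (hypothetical weights): pin y0 near 0.9, a rule potential
# max(y0 - y1, 0) encouraging y1 >= y0, and a weak prior pulling y1 toward 0.
pots = [
    (10.0, (0,), np.array([-1.0]), 0.9),         # 10 * max(0.9 - y0, 0)
    (10.0, (0,), np.array([1.0]), -0.9),         # 10 * max(y0 - 0.9, 0)
    (1.0, (0, 1), np.array([1.0, -1.0]), 0.0),   # max(y0 - y1, 0)
    (0.1, (1,), np.array([1.0]), 0.0),           # 0.1 * y1
]
y = map_inference(2, pots)    # both coordinates approach 0.9
```

Because every subproblem is convex, the averaged consensus variables converge to the global MAP state; the full algorithm additionally handles squared hinges (p_j = 2) and hard constraint projections Proj_{c_k}, which this sketch omits.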
Open Source Code Yes An open source implementation, tutorials, and data sets are available at http://psl.linqs.org. ... Code is available at https://github.com/stephenbach/bach-jmlr17-code.
Open Datasets Yes We classify documents in citation networks using data from the Cora and Citeseer scientific paper repositories. ... We test this model on the Jester dataset, a repository of ratings from 24,983 users on a set of 100 jokes (Goldberg et al., 2001). ... We test these models using the experimental setup of Poon and Domingos (2011): we reconstruct images from the Olivetti face data set and the Caltech101 face category.
Dataset Splits Yes For each of 20 runs, we split the data sets 50/50 into training and testing partitions, and seed half of each set. ... We perform eight-fold cross-validation. ... We sample a random 2,000 users from the set of those who rated all 100 jokes, which we then split into 1,000 train and 1,000 test users. From each train and test matrix, we sample a random 50% to use as the observed features x; the remaining ratings are treated as the variables y. ... Following their experimental setup, we hold out the last fifty images and predict either the left half of the image or the bottom half. ... We train these HL-MRFs using SP with a 5.0 step size on the first 200 images of each data set and test on the last fifty.
Hardware Specification Yes All experiments are performed on a single machine with a 4-core 3.4 GHz Intel Core i7-3770 processor with 32GB of RAM. ... In our experiments, training each model takes about 45 minutes on a 12-core machine
Software Dependencies Yes We implement ADMM in Java and compare with the IPM in MOSEK (version 6) by encoding the entire MPE problem as a linear program or a second-order cone program as appropriate and passing the encoded problem via the Java native interface wrapper.
Experiment Setup Yes We weight local features with a parameter of 0.5 and choose parameters in [0, 1] for the relationship potentials representing a mix of more and less influential relationships. ... When training with SP and MPLE, we use 100 gradient steps and a step size of 1.0 (unless otherwise noted), and we average the iterates as in voted perceptron. For LME, we set C = 0.1. ... For prediction, we performed 2500 rounds of Gibbs sampling, 500 of which were discarded as burn-in. ... For inference with discrete MRFs, we perform 5000 rounds of Gibbs sampling, of which the first 500 are burn-in. ... we set the rank of the decomposition to 30 and use 100 iterations of burn in and 100 iterations of sampling. ... We train these HL-MRFs using SP with a 5.0 step size on the first 200 images of each data set and test on the last fifty.