Joint Label Inference in Networks

Authors: Deepayan Chakrabarti, Stanislav Funiak, Jonathan Chang, Sofus A. Macskassy

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our primary example, and the focus of this paper, is the joint inference of label types such as hometown, current city, and employers for people connected by a social network; by predicting these user profile fields, the network can provide a better experience to its users. ... We empirically demonstrate their scalability on a large subset of the Facebook social network (Chakrabarti et al., 2014) and a 5 million node sample of the Google+ network (Gong et al., 2012), using publicly available user profiles and friendships. On Facebook, Edge Explain significantly outperforms label propagation for several label types, with lifts of up to 120% for recall@1 and 60% for recall@3. ... Empirical evidence demonstrating the effectiveness of Edge Explain is presented in Section 8.
Researcher Affiliation | Collaboration | Deepayan Chakrabarti (EMAIL), IROM, McCombs School of Business, University of Texas at Austin; Stanislav Funiak (EMAIL), Cerebras Systems; Jonathan Chang (EMAIL), Health Coda; Sofus A. Macskassy (EMAIL), Branch Metrics. This research was done while the authors were employees at Facebook, Inc.
Pseudocode | Yes | Algorithm 1 (HYBRID), reconstructed from the paper:

    function HYBRID(initial label-probs, θ1, θ2)
        repeat
            for all nodes u do
                L_u = number of distinct labels in ∪_{v : u~v} label-probs_t(v)
                nearly-certain-friends = |{v | u~v, max(label-probs_t(v)) > 2θ1/|L_u| ∀t ∈ T}|
                all-friends = |{v | u~v}|
                if nearly-certain-friends > θ2 · all-friends then
                    Update label-probs_t(u) via VAR
                else
                    Update label-probs_t(u) via REL
                end if
            end for
        until convergence or maximum iterations
        return label-probs
    end function
Open Source Code | No | The paper does not contain an explicit statement that the authors' code is open-source, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | On a large subset of the Facebook social network, collected in a previous study (Chakrabarti et al., 2014)... We empirically demonstrate their scalability on a large subset of the Facebook social network (Chakrabarti et al., 2014) and a 5 million node sample of the Google+ network (Gong et al., 2012), using publicly available user profiles and friendships. ... The data is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/.
Dataset Splits | Yes | The set of users is randomly split into five parts and the accuracy is measured via 5-fold cross-validation, with the known profile information from four folds being used to predict labels for all types for users in the fifth fold. ... Figure 14 shows the precision@1 for different fractions of the network being labeled.
Hardware Specification | No | The paper states, 'The entire set of nodes is split among 200 machines,' but does not provide specific details on the CPU, GPU, memory, or any other hardware components of these machines.
Software Dependencies | No | The paper mentions, 'We implemented Edge Explain in Giraph (Ching, 2013),' and 'deviations due to garbage collection stoppages in Java.' However, it does not provide version numbers for Giraph or Java, which are necessary for reproducible software dependency information.
Experiment Setup | Yes | We set N = 1,000 and run 50 repetitions for each simulation setting. ... We implemented two features: (a) for each user u and label type t, the label distribution for each (user, type) pair was clipped to retain only the top 8 entries optimally (Kyrillidis et al., 2013), and (b) the friendship graph is sparsified so as to retain, for each user u, the top K friends whose ages are closest to that of u. ... The optimal value is K = 100 for recall@1, and K = 200 for recall@3. ... Figure 8 shows the lift in recall@1 for various values of the parameter α, with respect to α = 0.1. ... Plot (a) shows, for (θ1 = 0.7, θ2 = 0.95), the lift in accuracy of HYBRID against the others.
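The dispatch rule in Algorithm 1 (HYBRID) can be sketched in Python. This is our own illustrative reading of the pseudocode, not the authors' implementation: the VAR and REL updates themselves are not reproduced, only the routing logic that decides which update a node receives, and the data structures (dicts of label distributions) are assumptions.

```python
def hybrid_choice(label_probs, neighbors, u, theta1, theta2):
    """Return 'VAR' if enough of u's friends are nearly certain, else 'REL'.

    label_probs: dict node -> dict label -> probability (one label type,
                 for simplicity; the paper iterates over all types t in T)
    neighbors:   dict node -> list of friend nodes
    theta1, theta2: the thresholds from Algorithm 1
    """
    friends = neighbors[u]
    # L_u: distinct labels seen across u's friends' label distributions
    distinct_labels = {lab for v in friends for lab in label_probs[v]}
    L_u = max(len(distinct_labels), 1)
    threshold = 2 * theta1 / L_u
    # A friend is "nearly certain" if its top label clears the threshold
    nearly_certain = sum(
        1 for v in friends if max(label_probs[v].values()) > threshold
    )
    return "VAR" if nearly_certain > theta2 * len(friends) else "REL"
```

For example, with two friends of which only one is confidently labeled, the strict setting (θ1 = 0.7, θ2 = 0.95) from the paper's Plot (a) routes the node to REL, while a looser θ2 routes it to VAR.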
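The 5-fold protocol quoted in the Dataset Splits row is standard and easy to make concrete. A minimal sketch, assuming a simple random partition of user IDs (function name and seed handling are ours, not from the paper):

```python
import random

def five_fold_splits(users, seed=0):
    """Randomly partition users into 5 folds; yield (train, test) pairs.

    Mirrors the quoted protocol: each fold serves once as the test set,
    with the known profiles of the other four folds used for prediction.
    """
    users = list(users)
    random.Random(seed).shuffle(users)
    folds = [users[i::5] for i in range(5)]
    for i in range(5):
        test = folds[i]
        train = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield train, test
```

Every user appears in exactly one test fold, so recall@k averaged over the five folds covers the whole labeled population.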
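The two engineering features quoted in the Experiment Setup row (top-8 clipping of label distributions, and top-K sparsification of the friendship graph by age proximity) can be sketched as follows. These helpers are hypothetical illustrations under our own naming; the paper's optimal clipping follows Kyrillidis et al. (2013), whereas this sketch simply truncates and renormalizes.

```python
import heapq

def clip_top(dist, k=8):
    """Keep the k highest-probability labels and renormalize.

    dist: dict label -> probability for one (user, type) pair.
    """
    top = heapq.nlargest(k, dist.items(), key=lambda kv: kv[1])
    total = sum(p for _, p in top)
    return {lab: p / total for lab, p in top}

def sparsify_by_age(user_age, friend_ages, k):
    """Retain the k friends whose ages are closest to the user's.

    friend_ages: dict friend -> age.
    """
    return heapq.nsmallest(
        k, friend_ages, key=lambda f: abs(friend_ages[f] - user_age)
    )
```

Both features trade a small amount of accuracy for memory and runtime, which is why the paper tunes K (100 for recall@1, 200 for recall@3) rather than fixing it.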