Joint Label Inference in Networks

Authors: Deepayan Chakrabarti, Stanislav Funiak, Jonathan Chang, Sofus A. Macskassy

JMLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our primary example, and the focus of this paper, is the joint inference of label types such as hometown, current city, and employers for people connected by a social network; by predicting these user profile fields, the network can provide a better experience to its users. ... We empirically demonstrate their scalability on a large subset of the Facebook social network (Chakrabarti et al., 2014) and a 5 million node sample of the Google+ network (Gong et al., 2012), using publicly available user profiles and friendships. On Facebook, Edge Explain significantly outperforms label propagation for several label types, with lifts of up to 120% for recall@1 and 60% for recall@3. ... Empirical evidence demonstrating the effectiveness of Edge Explain is presented in Section 8.
Researcher Affiliation | Collaboration | Deepayan Chakrabarti (EMAIL), IROM, McCombs School of Business, University of Texas at Austin; Stanislav Funiak (EMAIL), Cerebras Systems; Jonathan Chang (EMAIL), Health Coda; Sofus A. Macskassy (EMAIL), Branch Metrics. This research was done while the authors were employees at Facebook, Inc.
Pseudocode | Yes | Algorithm 1 (HYBRID), reconstructed from the paper:

    function HYBRID(initial label-probs, θ1, θ2)
        repeat
            for all nodes u do
                L_u = number of distinct labels in ∪_{v : u~v} label-probs_t(v)
                nearly-certain-friends = |{v | u~v, max(label-probs_t(v)) > 2θ1/|L_u| ∀t ∈ T}|
                all-friends = |{v | u~v}|
                if nearly-certain-friends > θ2 · all-friends then
                    Update label-probs_t(u) via VAR
                else
                    Update label-probs_t(u) via REL
                end if
            end for
        until convergence or maximum iterations
        return label-probs
    end function
Open Source Code | No | The paper does not contain an explicit statement that the authors' code is open-source, nor does it provide a link to a code repository for the methodology described.
Open Datasets | Yes | On a large subset of the Facebook social network, collected in a previous study (Chakrabarti et al., 2014)... We empirically demonstrate their scalability on a large subset of the Facebook social network (Chakrabarti et al., 2014) and a 5 million node sample of the Google+ network (Gong et al., 2012), using publicly available user profiles and friendships. ... The data is available at ftp://ftp.fu-berlin.de/pub/misc/movies/database/.
Dataset Splits | Yes | The set of users is randomly split into five parts and the accuracy is measured via 5-fold cross-validation, with the known profile information from four folds being used to predict labels for all types for users in the fifth fold. ... Figure 14 shows the precision@1 for different fractions of the network being labeled.
Hardware Specification | No | The paper states, 'The entire set of nodes is split among 200 machines,' but does not provide specific details on the CPU, GPU, memory, or any other hardware components of these machines.
Software Dependencies | No | The paper mentions, 'We implemented Edge Explain in Giraph (Ching, 2013),' and 'deviations due to garbage collection stoppages in Java.' However, it does not provide version numbers for Giraph or Java, which are necessary for reproducible software dependency information.
Experiment Setup | Yes | We set N = 1,000 and run 50 repetitions for each simulation setting. ... We implemented two features: (a) for each user u and label type t, the label distribution for each (user, type) pair was clipped to retain only the top 8 entries optimally (Kyrillidis et al., 2013), and (b) the friendship graph is sparsified so as to retain, for each user u, the top K friends whose ages are closest to that of u. ... The optimal value is K = 100 for recall@1, and K = 200 for recall@3. ... Figure 8 shows the lift in recall@1 for various values of the parameter α, with respect to α = 0.1. ... Plot (a) shows, for (θ1 = 0.7, θ2 = 0.95), the lift in accuracy of HYBRID against the others.
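The dispatch rule in Algorithm 1 (HYBRID) can be sketched in Python. This is our own illustrative reading of the pseudocode, not the authors' implementation: the VAR and REL updates themselves are not reproduced, only the routing logic that decides which update a node receives, and the data structures (dicts of label distributions) are assumptions.

```python
def hybrid_choice(label_probs, neighbors, u, theta1, theta2):
    """Return 'VAR' if enough of u's friends are nearly certain, else 'REL'.

    label_probs: dict node -> dict label -> probability (one label type,
                 for simplicity; the paper iterates over all types t in T)
    neighbors:   dict node -> list of friend nodes
    theta1, theta2: the thresholds from Algorithm 1
    """
    friends = neighbors[u]
    # L_u: distinct labels seen across u's friends' label distributions
    distinct_labels = {lab for v in friends for lab in label_probs[v]}
    L_u = max(len(distinct_labels), 1)
    threshold = 2 * theta1 / L_u
    # A friend is "nearly certain" if its top label clears the threshold
    nearly_certain = sum(
        1 for v in friends if max(label_probs[v].values()) > threshold
    )
    return "VAR" if nearly_certain > theta2 * len(friends) else "REL"
```

For example, with two friends of which only one is confidently labeled, the strict setting (θ1 = 0.7, θ2 = 0.95) from the paper's Plot (a) routes the node to REL, while a looser θ2 routes it to VAR.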
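The 5-fold protocol quoted in the Dataset Splits row is standard and easy to make concrete. A minimal sketch, assuming a simple random partition of user IDs (function name and seed handling are ours, not from the paper):

```python
import random

def five_fold_splits(users, seed=0):
    """Randomly partition users into 5 folds; yield (train, test) pairs.

    Mirrors the quoted protocol: each fold serves once as the test set,
    with the known profiles of the other four folds used for prediction.
    """
    users = list(users)
    random.Random(seed).shuffle(users)
    folds = [users[i::5] for i in range(5)]
    for i in range(5):
        test = folds[i]
        train = [u for j, fold in enumerate(folds) if j != i for u in fold]
        yield train, test
```

Every user appears in exactly one test fold, so recall@k averaged over the five folds covers the whole labeled population.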
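The two engineering features quoted in the Experiment Setup row (top-8 clipping of label distributions, and top-K sparsification of the friendship graph by age proximity) can be sketched as follows. These helpers are hypothetical illustrations under our own naming; the paper's optimal clipping follows Kyrillidis et al. (2013), whereas this sketch simply truncates and renormalizes.

```python
import heapq

def clip_top(dist, k=8):
    """Keep the k highest-probability labels and renormalize.

    dist: dict label -> probability for one (user, type) pair.
    """
    top = heapq.nlargest(k, dist.items(), key=lambda kv: kv[1])
    total = sum(p for _, p in top)
    return {lab: p / total for lab, p in top}

def sparsify_by_age(user_age, friend_ages, k):
    """Retain the k friends whose ages are closest to the user's.

    friend_ages: dict friend -> age.
    """
    return heapq.nsmallest(
        k, friend_ages, key=lambda f: abs(friend_ages[f] - user_age)
    )
```

Both features trade a small amount of accuracy for memory and runtime, which is why the paper tunes K (100 for recall@1, 200 for recall@3) rather than fixing it.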