reproducibilityindex.ai

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Estimating Diffusion Networks: Recovery Conditions, Sample Complexity and Soft-thresholding Algorithm

Authors: Manuel Gomez-Rodriguez, Le Song, Hadi Daneshm, Bernhard Schölkopf

JMLR 2016 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Using this algorithm, we perform various experiments illustrating the consequences of our theoretical results and demonstrating that it typically outperforms other state-of-the-art algorithms.
Researcher Affiliation	Academia	Manuel Gomez-Rodriguez EMAIL MPI for Software Systems Paul-Ehrlich-Strasse, 67663 Kaiserslautern Germany Le Song EMAIL College of Computing Georgia Institute of Technology Atlanta, GA 30332, USA Hadi Daneshmand EMAIL Computer Science Department Universit atstrasse 6, 8092 Z urich Switzerland Bernhard Sch ölkopf EMAIL MPI for Intelligent Systems Spemannstrasse 38, 72076 T ubingen Germany
Pseudocode	Yes	Algorithm 1 ℓ1-regularized network inference Require: Cn, λn, K, L for all i V do k = 0 while k < K do αk+1 i = αk i L αiℓn(αk i ) λn L + k = k + 1 end while ˆαi = αK 1 i end for return {ˆαi}i V
Open Source Code	No	The paper does not explicitly provide any links to source code repositories, nor does it state that the code for this work is available or released.
Open Datasets	No	We focus on synthetic networks that mimic the structure of real-world diﬀusion networks in particular, social networks. We consider two models of directed real-world social networks: the Forest Fire model (Barab asi and Albert, 1999) and the Kronecker Graph model (Leskovec et al., 2010), and use simple pairwise transmission models such as exponential, power-law or Rayleigh. We use networks with 128 nodes and, for each edge, we draw its associated transmission rate from a uniform distribution U(0.5, 1.5). In general, we proceed as follows: we generate a network G and transmission rates A , simulate a set of cascades and, for each cascade, record the node infection times.
Dataset Splits	No	The paper describes generating synthetic data and running experiments on it, but does not specify conventional training, validation, or test splits. It mentions evaluating accuracy using "100 independent cascade sets", which implies multiple runs for testing, but not explicit dataset splits.
Hardware Specification	No	The paper does not explicitly describe the hardware (e.g., specific CPU or GPU models, memory) used to run its experiments.
Software Dependencies	No	The paper does not provide specific details (e.g., names with version numbers) about ancillary software dependencies used for the experiments.
Experiment Setup	Yes	Experimental Setup We focus on synthetic networks that mimic the structure of real-world diﬀusion networks in particular, social networks. We consider two models of directed real-world social networks: the Forest Fire model (Barab asi and Albert, 1999) and the Kronecker Graph model (Leskovec et al., 2010), and use simple pairwise transmission models such as exponential, power-law or Rayleigh. We use networks with 128 nodes and, for each edge, we draw its associated transmission rate from a uniform distribution U(0.5, 1.5). In general, we proceed as follows: we generate a network G and transmission rates A , simulate a set of cascades and, for each cascade, record the node infection times. Then, given the infection times, we infer a network ˆG. Finally, when we illustrate the consequences of Th. 2, we evaluate the accuracy of the inferred neighborhood of a node ˆ N (i) using probability of success P( ˆE = E ), estimated by running our method of 100 independent cascade sets. When we compare our algorithm to Net Rate and First-Edge, we use the F1 score, which is deﬁned as 2PR/(P + R), where precision (P) is the fraction of edges in the inferred network ˆG present in the true network G , and recall (R) is the fraction of edges of the true network G present in the inferred network ˆG. Parameters (n, p, d) According to Th. 2, the number of cascades that are necessary to successfully infer the incoming edges of a node will increase polynomially to the node s neighborhood size di and logarithmically to the super-neighborhood size pi. Here, we ﬁrst infer the incoming links of nodes on the same type of canonical networks as depicted in Fig. 2. We choose nodes the same in-degree but diﬀerent super-neighboorhod set sizes pi and experiment with diﬀerent scalings β of the number of cascades n = 10βd log p. We set the regularization parameter λn as a constant factor of p log(p)/n as suggested by Theorem 2 and, for each node, we used cascades which contained at least one node in the super-neighborhood of the node under study. We used an exponential transmission model and time window T = 10. As predicted by Theorem 2, very diﬀerent p values lead to curves that line up with each other quite well. Next, we infer the incoming links of nodes of a larger hierarchical Kronecker network. Again, we choose nodes with the same in-degree (di = 3) but diﬀerent super-neighboorhod set sizes pi under diﬀerent scalings β of the number of cascades n = 10βd log p. We used an exponential transmission model and T = 5. Fig. 3(a) summarizes the results, where, for each node, we used cascades which contained at least one node in the super-neighborhood of the node under study. Similarly as in the case of the canonical networks, very diﬀerent p values lead to curves that line up with each other quite well.