Estimating Diffusion Networks: Recovery Conditions, Sample Complexity and Soft-thresholding Algorithm

Authors: Manuel Gomez-Rodriguez, Le Song, Hadi Daneshm, Bernhard Schölkopf

JMLR 2016 | Venue PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Using this algorithm, we perform various experiments illustrating the consequences of our theoretical results and demonstrating that it typically outperforms other state-of-the-art algorithms.
Researcher Affiliation Academia Manuel Gomez-Rodriguez EMAIL MPI for Software Systems Paul-Ehrlich-Strasse, 67663 Kaiserslautern Germany Le Song EMAIL College of Computing Georgia Institute of Technology Atlanta, GA 30332, USA Hadi Daneshmand EMAIL Computer Science Department Universit atstrasse 6, 8092 Z urich Switzerland Bernhard Sch ölkopf EMAIL MPI for Intelligent Systems Spemannstrasse 38, 72076 T ubingen Germany
Pseudocode Yes Algorithm 1 ℓ1-regularized network inference Require: Cn, λn, K, L for all i V do k = 0 while k < K do αk+1 i = αk i L αiℓn(αk i ) λn L + k = k + 1 end while ˆαi = αK 1 i end for return {ˆαi}i V
Open Source Code No The paper does not explicitly provide any links to source code repositories, nor does it state that the code for this work is available or released.
Open Datasets No We focus on synthetic networks that mimic the structure of real-world diffusion networks in particular, social networks. We consider two models of directed real-world social networks: the Forest Fire model (Barab asi and Albert, 1999) and the Kronecker Graph model (Leskovec et al., 2010), and use simple pairwise transmission models such as exponential, power-law or Rayleigh. We use networks with 128 nodes and, for each edge, we draw its associated transmission rate from a uniform distribution U(0.5, 1.5). In general, we proceed as follows: we generate a network G and transmission rates A , simulate a set of cascades and, for each cascade, record the node infection times.
Dataset Splits No The paper describes generating synthetic data and running experiments on it, but does not specify conventional training, validation, or test splits. It mentions evaluating accuracy using "100 independent cascade sets", which implies multiple runs for testing, but not explicit dataset splits.
Hardware Specification No The paper does not explicitly describe the hardware (e.g., specific CPU or GPU models, memory) used to run its experiments.
Software Dependencies No The paper does not provide specific details (e.g., names with version numbers) about ancillary software dependencies used for the experiments.
Experiment Setup Yes Experimental Setup We focus on synthetic networks that mimic the structure of real-world diffusion networks in particular, social networks. We consider two models of directed real-world social networks: the Forest Fire model (Barab asi and Albert, 1999) and the Kronecker Graph model (Leskovec et al., 2010), and use simple pairwise transmission models such as exponential, power-law or Rayleigh. We use networks with 128 nodes and, for each edge, we draw its associated transmission rate from a uniform distribution U(0.5, 1.5). In general, we proceed as follows: we generate a network G and transmission rates A , simulate a set of cascades and, for each cascade, record the node infection times. Then, given the infection times, we infer a network ˆG. Finally, when we illustrate the consequences of Th. 2, we evaluate the accuracy of the inferred neighborhood of a node ˆ N (i) using probability of success P( ˆE = E ), estimated by running our method of 100 independent cascade sets. When we compare our algorithm to Net Rate and First-Edge, we use the F1 score, which is defined as 2PR/(P + R), where precision (P) is the fraction of edges in the inferred network ˆG present in the true network G , and recall (R) is the fraction of edges of the true network G present in the inferred network ˆG. Parameters (n, p, d) According to Th. 2, the number of cascades that are necessary to successfully infer the incoming edges of a node will increase polynomially to the node s neighborhood size di and logarithmically to the super-neighborhood size pi. Here, we first infer the incoming links of nodes on the same type of canonical networks as depicted in Fig. 2. We choose nodes the same in-degree but different super-neighboorhod set sizes pi and experiment with different scalings β of the number of cascades n = 10βd log p. We set the regularization parameter λn as a constant factor of p log(p)/n as suggested by Theorem 2 and, for each node, we used cascades which contained at least one node in the super-neighborhood of the node under study. We used an exponential transmission model and time window T = 10. As predicted by Theorem 2, very different p values lead to curves that line up with each other quite well. Next, we infer the incoming links of nodes of a larger hierarchical Kronecker network. Again, we choose nodes with the same in-degree (di = 3) but different super-neighboorhod set sizes pi under different scalings β of the number of cascades n = 10βd log p. We used an exponential transmission model and T = 5. Fig. 3(a) summarizes the results, where, for each node, we used cascades which contained at least one node in the super-neighborhood of the node under study. Similarly as in the case of the canonical networks, very different p values lead to curves that line up with each other quite well.