
Commit 4d9671c

Remove greedy from CELF docs and re-write intro to be more informative.
1 parent d6b878d commit 4d9671c


1 file changed: +19 -6 lines changed


doc/modules/ROOT/pages/algorithms/influence-maximization/celf.adoc

Lines changed: 19 additions & 6 deletions
@@ -12,13 +12,26 @@ include::partial$/operations-reference/beta-note.adoc[]
 
 [[alpha-algorithms-celf-intro]]
 == Introduction
-The CELF algorithm for influence maximization aims to find `k` nodes that maximize the expected spread of influence in the network.
-It simulates the influence spread using the Independent Cascade model, which calculates the expected spread by taking the average spread over the `mc` Monte-Carlo simulations.
-In the propagation process, a node is influenced in case that a uniform random draw is less than the probability `p`.
+The influence maximization problem asks for a set of `k` nodes that maximize the expected spread of influence in the network.
+The set of these initial `k` nodes is called the _seed set_.
+
+The Neo4j GDS Library supports approximate computation of influence maximization under the Independent Cascade propagation model.
+In this propagation model, the nodes in the seed set are initially influenced, and the process works as follows.
+Each newly influenced node influences each of its neighbors with probability `p`, and this repeats until no new nodes become influenced.
+The spread is then the number of nodes that become influenced.
+
+The Neo4j GDS Library supports the CELF algorithm, introduced in 2007 by Leskovec et al. in https://www.cs.cmu.edu/~jure/pubs/detect-kdd07.pdf[Cost-effective Outbreak Detection in Networks], to compute a seed set.
+
+The CELF algorithm is based on the https://www.cs.cornell.edu/home/kleinber/kdd03-inf.pdf[Greedy] algorithm for the problem.
+It works iteratively in `k` steps to create the returned seed set `S`,
+where at each step the node yielding the maximum expected spread gain is added to `S`.
+
+The expected spread gain of a node `u` not in `S` is estimated by running `mc` Monte Carlo simulations of the propagation process and averaging the number of additional nodes that become influenced when `u` is added to `S`.
+
+The CELF algorithm extends Greedy with a _lazy forwarding_ mechanism:
+since a node's expected spread gain can only decrease as `S` grows, a previously computed gain is an upper bound, so most nodes never need to be re-examined and far fewer simulations are run.
+This makes CELF much faster than Greedy on large networks.
 
-Leskovec et al. 2007 introduced the CELF algorithm in their study https://www.cs.cmu.edu/~jure/pubs/detect-kdd07.pdf[Cost-effective Outbreak Detection in Networks] to deal with the NP-hard problem of influence maximization.
-The CELF algorithm is based on a "lazy-forward" optimization.
-The CELF algorithm dramatically improves the efficiency of the xref:algorithms/influence-maximization/greedy.adoc[Greedy] algorithm and should be preferred for large networks.
 
 [[alpha-algorithms-celf-syntax]]
 == Syntax
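The Independent Cascade process that the rewritten intro describes can be sketched in Python. This is a minimal illustration under stated assumptions, not the GDS implementation: the adjacency-dict graph and the function names `simulate_ic` and `expected_spread` are hypothetical. Following the removed intro text, a neighbor is influenced when a uniform random draw is below `p`, and the expected spread is the average spread over `mc` simulations.

```python
import random
from typing import Dict, Iterable, List

# Hypothetical graph representation: node id -> list of out-neighbors.
Graph = Dict[int, List[int]]


def simulate_ic(graph: Graph, seeds: Iterable[int], p: float, rng: random.Random) -> int:
    """Run one Independent Cascade simulation and return its spread."""
    influenced = set(seeds)      # seed nodes start out influenced
    frontier = list(influenced)  # nodes influenced in the previous round
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, []):
                # A neighbor is influenced when a uniform draw falls below p.
                if neighbor not in influenced and rng.random() < p:
                    influenced.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier  # stop once a round influences no new nodes
    return len(influenced)


def expected_spread(graph: Graph, seeds: Iterable[int], p: float, mc: int, seed: int = 42) -> float:
    """Estimate the expected spread as the average over `mc` Monte Carlo simulations."""
    seeds = list(seeds)  # materialize so every simulation sees the same seed set
    rng = random.Random(seed)
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(mc)) / mc


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [3], 2: [3], 3: []}
    print(expected_spread(graph, [0], p=1.0, mc=10))    # 4.0
    print(expected_spread(graph, [0], p=0.5, mc=1000))  # some value between 1.0 and 4.0
```

With `p=1.0` every reachable node is influenced, so the estimate equals the size of the reachable set; for smaller `p` the result is a random average whose variance shrinks as `mc` grows.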

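The lazy forwarding idea can likewise be sketched with a max-heap of possibly stale gains. This is an illustration under assumptions rather than the GDS code: `celf` and `reachable_count` are hypothetical names, and to keep the output reproducible the Monte Carlo estimate is replaced by a deterministic spread function (the number of nodes reachable from the seed set, which is the Independent Cascade spread when `p = 1`).

```python
import heapq
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical interface: maps a seed set to its (expected) spread.
SpreadFn = Callable[[FrozenSet[int]], float]


def celf(nodes: List[int], spread: SpreadFn, k: int) -> Tuple[List[int], int]:
    """Pick up to k seeds with lazy forwarding; also return the number of spread evaluations."""
    seeds: List[int] = []
    current = spread(frozenset())
    evaluations = 1
    # Max-heap via negated gains: (-gain, node, size of S when the gain was computed).
    heap = []
    for u in nodes:
        heapq.heappush(heap, (-(spread(frozenset([u])) - current), u, 0))
        evaluations += 1
    while heap and len(seeds) < k:
        neg_gain, u, computed_for = heapq.heappop(heap)
        if computed_for == len(seeds):
            # Fresh gain that still tops the heap: no stale upper bound can beat it.
            seeds.append(u)
            current -= neg_gain
        else:
            # Stale gain is only an upper bound: recompute it against the current S.
            gain = spread(frozenset(seeds + [u])) - current
            evaluations += 1
            heapq.heappush(heap, (-gain, u, len(seeds)))
    return seeds, evaluations


def reachable_count(graph: Dict[int, List[int]]) -> SpreadFn:
    """Deterministic stand-in for the expected spread: nodes reachable from the seed set (IC with p = 1)."""
    def spread(seed_set: FrozenSet[int]) -> float:
        seen = set(seed_set)
        stack = list(seed_set)
        while stack:
            for neighbor in graph.get(stack.pop(), []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return float(len(seen))
    return spread


if __name__ == "__main__":
    graph = {0: [1, 2, 3], 1: [2], 2: [], 3: [], 4: [5], 5: [], 6: []}
    print(celf(list(graph), reachable_count(graph), k=2))  # ([0, 4], 10)
```

On this graph, picking the second seed re-evaluates only nodes 1 and 4, whereas a plain Greedy step would re-evaluate all six remaining nodes: once node 4's refreshed gain of 2 is back on top of the heap, every other entry's bound is at most 1, so none of them can win.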