Changed file: doc/modules/ROOT/pages/machine-learning/node-embeddings/hashgnn.adoc (83 additions, 20 deletions)
@@ -29,6 +29,7 @@ Moreover, the heterogeneous generalization also gives comparable results when co
The execution does not require GPUs, which GNNs typically use, and parallelizes well across many CPU cores.

=== The algorithm

To clarify how HashGNN works, we will walk through a virtual example <<algorithms-embeddings-hashgnn-virtual-example, below>> of a three-node graph, for the reader who is curious about the details of the feature selection and prefers to learn from examples.
@@ -53,24 +54,46 @@ The number `K` is called `embeddingDensity` in the configuration of the algorith
The algorithm ends with another optional step that maps the binary embeddings to dense vectors.

=== Features

The original HashGNN algorithm assumes that nodes have binary features as input, and produces binary embedding vectors as output (unless output densification is opted for).
Since this is not always the case for real-world graphs, our algorithm also comes with options to binarize node properties, or generate binary features from scratch.

==== Using binary node properties as features

If your node properties have only 0 or 1 values (or arrays of such values), you can use them directly as input to the HashGNN algorithm.
To do that, you provide them as `featureProperties` in the configuration.
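
For illustration, a minimal sketch of such a call is shown below. The graph name `'persons'` is taken from the examples later on this page, while the property names `usesBrowser` and `usesMobile` are hypothetical; depending on your GDS version the procedure may live under the top-level or the `beta` namespace (`gds.hashgnn.stream` vs `gds.beta.hashgnn.stream`).

[source, cypher]
----
CALL gds.hashgnn.stream('persons', {
  featureProperties: ['usesBrowser', 'usesMobile'],  // binary (0/1) properties used directly
  iterations: 3,
  embeddingDensity: 8,
  randomSeed: 42
})
YIELD nodeId, embedding
----
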

==== Feature generation

To use feature generation, specify a map including `dimension` and `densityLevel` for the `generateFeatures` configuration key.
This will generate `dimension` features, where each node has approximately `densityLevel` features switched on.
The active features for each node are selected uniformly at random with replacement.
Although the active features are random, the feature vector for a node acts as an approximately unique signature for that node.
This is akin to one-hot encoding of the node IDs, but approximate in that it has a much lower dimension than the node count of the graph.
Please note that when feature generation is used, it is not supported to supply any `featureProperties`, which is otherwise mandatory.
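
A corresponding sketch with feature generation enabled (values are illustrative; note that `featureProperties` is omitted):

[source, cypher]
----
CALL gds.hashgnn.stream('persons', {
  generateFeatures: { dimension: 64, densityLevel: 2 },  // roughly 2 of 64 random binary features per node
  iterations: 3,
  embeddingDensity: 16,
  randomSeed: 42
})
YIELD nodeId, embedding
----
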

==== Feature binarization

Feature binarization uses hyperplane rounding and is configured via `featureProperties` and a map parameter `binarizeFeatures` containing `threshold` and `dimension`.
The hyperplane rounding uses hyperplanes defined by vectors filled with Gaussian random values.
The `dimension` parameter determines the number of generated binary features that the input features are transformed into.
For each hyperplane (one for each `dimension`) and node, we compute the dot product of the node's input feature vector and the normal vector of the hyperplane.
If this dot product is larger than the given `threshold`, the node gets the feature corresponding to that hyperplane.

Although hyperplane rounding can be applied to a binary input, it is often best to use the already binary input directly.
However, sometimes using binarization with a different `dimension` than the number of input features can be useful, either to act as dimensionality reduction or to introduce redundancy that can be leveraged by HashGNN.
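
As a sketch, binarizing a dense property alongside an already binary one could be configured as follows (property names and values are illustrative):

[source, cypher]
----
CALL gds.hashgnn.stream('persons', {
  featureProperties: ['age', 'usesBrowser'],
  binarizeFeatures: { dimension: 16, threshold: 0.5 },  // 16 hyperplanes; feature active if dot product > 0.5
  iterations: 3,
  embeddingDensity: 8,
  randomSeed: 42
})
YIELD nodeId, embedding
----
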

[NOTE]
====
The hyperplane rounding may not work well if the input features are of different magnitudes, since features of larger magnitude will influence the generated binary features more.
If this is not the intended behavior for your application, we recommend normalizing your node properties (by feature dimension) prior to running HashGNN, using xref:alpha-algorithms/scale-properties.adoc[Scale properties] or a similar method.
====

=== Neighbor influence
@@ -80,11 +103,12 @@ Increasing the value leads to neighbors being selected more often.
The probability of selecting a feature from the neighbors as a function of `neighborInfluence` has a hockey-stick-like shape, somewhat similar to the shape of `y=log(x)` or `y=C - 1/x`.
This implies that the probability is more sensitive for low values of `neighborInfluence`.

=== Heterogeneity support

The GDS implementation of HashGNN provides a new generalization to heterogeneous graphs in that it can distinguish between different relationship types.
To enable heterogeneous support, set `heterogeneous` to true.
The generalization works as the original HashGNN algorithm, but whenever a hash function is applied to a feature of a neighbor node, the algorithm uses a hash function that depends not only on the iteration and on a number `k < embeddingDensity`, but also on the type of the relationship connecting to the neighbor.
Consider an example where HashGNN is run with one iteration, and we have `(a)-[:R]->(x), (b)-[:R]->(x)` and `(c)-[:S]->(x)`.
Assume that a feature `f` of `(x)` is selected for `(a)` and the hash value is very small.
This will make it very likely that the feature is also selected for `(b)`.
@@ -94,6 +118,7 @@ We can conclude that nodes with similar neighborhoods (including node properties
An advantage of running heterogeneous HashGNN over running a homogeneous embedding such as FastRP is that it is not necessary to manually select multiple projections or create meta-path graphs before running FastRP on these multiple graphs.
With the heterogeneous algorithm, the full heterogeneous graph can be used in a single execution.
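
A sketch of enabling the heterogeneous mode on a hypothetical multi-relationship-type graph `'citations'` (again, the exact procedure namespace depends on your GDS version):

[source, cypher]
----
CALL gds.hashgnn.mutate('citations', {
  heterogeneous: true,               // use relationship-type-specific hash functions
  featureProperties: ['topicFlags'], // hypothetical binary array property
  iterations: 2,
  embeddingDensity: 128,
  mutateProperty: 'embedding'
})
YIELD nodePropertiesWritten
----
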

=== Node property schema for heterogeneous graphs

Heterogeneous graphs typically have different node properties for different node labels.
@@ -102,6 +127,7 @@ Use therefore a default value of `0` for in each graph projection.
This works both in the binary input case and when binarization is applied, because having a binary feature with value `0` behaves as if not having the feature.
The `0` values are represented in a sparse format, so storing `0` values for many nodes has a low memory overhead.
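
For example, a projection along the following lines (labels, property, and relationship type are hypothetical) gives all labels the same property schema by falling back to `0`:

[source, cypher]
----
CALL gds.graph.project(
  'persons',
  {
    Person:  { properties: { usesBrowser: { defaultValue: 0 } } },
    Website: { properties: { usesBrowser: { defaultValue: 0 } } }
  },
  'VISITS'
)
----
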

=== Orientation

Choosing the right orientation when creating the graph may have a large impact.
@@ -111,6 +137,7 @@ Using the analogy with GNN's, using a different relationship type for the revers
For HashGNN, this means instead using different min-hash functions for the two relationship types.
For example, in a citation network, a paper citing another paper is very different from the paper being cited.
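
One way to express this when projecting the graph (type names are illustrative) is to project the same underlying relationship type twice, once per orientation:

[source, cypher]
----
CALL gds.graph.project(
  'citations',
  'Paper',
  {
    CITES:    { type: 'CITES', orientation: 'NATURAL' },
    CITED_BY: { type: 'CITES', orientation: 'REVERSE' }
  }
)
----
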

=== Output densification

Since binary embeddings need to be of higher dimension than dense floating point embeddings to encode the same amount of information, binary embeddings require more memory and longer training time for downstream models.
@@ -119,12 +146,26 @@ This behavior is activated by specifying `outputDimension`.
Output densification can improve runtime and memory of downstream tasks at the cost of introducing approximation error due to the random nature of the projection.
The larger the `outputDimension`, the lower the approximation error and the smaller the performance savings.
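
A sketch of enabling densification (all values are illustrative):

[source, cypher]
----
CALL gds.hashgnn.mutate('persons', {
  featureProperties: ['usesBrowser', 'usesMobile'],
  iterations: 3,
  embeddingDensity: 128,
  outputDimension: 64,               // densify the binary embedding into 64 floating point dimensions
  mutateProperty: 'hashgnn-embedding'
})
YIELD nodePropertiesWritten
----
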

=== Usage in machine learning pipelines

It may be useful to generate node embeddings with HashGNN as a node property step in a machine learning pipeline (like xref:machine-learning/linkprediction-pipelines/link-prediction.adoc[] and xref:machine-learning/node-property-prediction/index.adoc[]).
Since HashGNN is a random algorithm, and xref:machine-learning/node-embeddings/index.adoc#node-embeddings-generalization[inductive] only when `featureProperties` and `randomSeed` are given, there are some things to keep in mind.

In order for a machine learning model to be able to make useful predictions, it is important that features produced during prediction are of a similar distribution to the features produced during training of the model.
Moreover, node property steps (whether HashGNN or not) added to a pipeline are executed both during training, and during the prediction by the trained model.
It is therefore problematic when a pipeline contains an embedding step which yields overly dissimilar embeddings during training and prediction.

This has some implications on how to use HashGNN as a node property step.
In general, if a pipeline is trained using HashGNN as a node property step on some graph "g", then the resulting trained model should only be applied to graphs that are not too dissimilar to "g".

If feature generation is used, most of the nodes in the graph that a prediction is being run on must be the same nodes (in the database sense) as in the original graph "g" that was used during training.
The reason for this is that HashGNN generates the node features randomly, and in this case is seeded based on the nodes' ids in the Neo4j database from which the nodes came.

If feature generation is not used (`featureProperties` is given), the random initial node embeddings are derived from node property vectors only, so there is no random seeding based on node ids.

Additionally, in order for the feature propagation of the HashGNN message passing to be consistent between runs (training and prediction calls), a value for the `randomSeed` configuration parameter must be provided when adding the HashGNN node property step to the training pipeline.
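
For instance, adding HashGNN to a link prediction training pipeline with a fixed seed could look roughly like the following sketch; the pipeline name is hypothetical and the exact procedure name string for HashGNN depends on your GDS version:

[source, cypher]
----
CALL gds.beta.pipeline.linkPrediction.addNodeProperty('my-pipeline', 'hashgnn', {
  featureProperties: ['usesBrowser', 'usesMobile'],
  iterations: 3,
  embeddingDensity: 8,
  randomSeed: 42,                    // fixed seed so training and prediction produce consistent embeddings
  mutateProperty: 'embedding'
})
----
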
@@ -133,11 +174,14 @@ In order to improve the embedding quality using HashGNN on one of your graphs, i
This process of finding the best parameters for your specific use case and graph is typically referred to as https://en.wikipedia.org/wiki/Hyperparameter_optimization[hyperparameter tuning].
We will go through each of the configuration parameters and explain how they behave.

=== Iterations
The maximum number of hops between a node and other nodes that affect its embedding is equal to the number of iterations of HashGNN, which is configured with `iterations`.
This is analogous to the number of layers in a GNN or the number of iterations in FastRP.
Often a value of `2` to `4` is sufficient, but sometimes more iterations are useful.

=== Embedding density

The `embeddingDensity` parameter is what the original paper denotes by `k`.
@@ -147,23 +191,41 @@ The higher this parameter is set, the longer it will take to run the algorithm,
To a large extent, higher values give better embeddings.
As a loose guideline, one may try to set `embeddingDensity` to 128, 256, 512, or roughly 25%-50% of the embedding dimension, i.e. the number of binary features.
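
For instance, with generated features of `dimension` 512, an `embeddingDensity` in the suggested 25%-50% range would be roughly 128-256 (a sketch; values are illustrative):

[source, cypher]
----
CALL gds.hashgnn.stream('persons', {
  generateFeatures: { dimension: 512, densityLevel: 2 },
  iterations: 4,
  embeddingDensity: 256,             // roughly 50% of the 512 binary features
  randomSeed: 42
})
YIELD nodeId, embedding
----
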

=== Feature generation

The `dimension` parameter determines the number of binary features when feature generation is applied.
A high dimension increases expressiveness, but requires more data to be useful and can lead to the curse of high dimensionality for downstream machine learning tasks.
Additionally, more computational resources will be required.
Some values to consider trying for `densityLevel` are very low values such as `1` or `2`, increasing as appropriate.

=== Feature binarization

The `dimension` parameter determines the number of binary features when binarization is applied.
A high dimension increases expressiveness, but also the sparsity of features.
Therefore, a higher dimension should also be coupled with a higher `embeddingDensity` and/or a lower `threshold`.
Higher dimension also leads to longer training times of downstream models and a higher memory footprint.
Increasing the threshold leads to sparser feature vectors.

The default threshold of `0` leads to fairly many features being active for each node.
Often sparse feature vectors are better, and it may therefore be useful to increase the threshold beyond the default.
One heuristic for choosing a good threshold is based on the average and standard deviation of the dot products between the hyperplanes and the node feature vectors.
For example, one can set the threshold to the average plus two times the standard deviation.
To obtain these values, run HashGNN and read them off the database logs.
Then you can use those values to reconfigure the threshold accordingly.
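
For example, if the logs report an average dot product of `0.1` with a standard deviation of `0.3` (illustrative numbers), the heuristic suggests a threshold of `0.1 + 2 * 0.3 = 0.7`:

[source, cypher]
----
CALL gds.hashgnn.stream('persons', {
  featureProperties: ['age', 'usesBrowser'],
  binarizeFeatures: { dimension: 256, threshold: 0.7 },  // threshold = average + 2 * standard deviation
  iterations: 3,
  embeddingDensity: 128,
  randomSeed: 42
})
YIELD nodeId, embedding
----
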

=== Neighbor influence

As explained above, the default value is a reasonable starting point.
If using a hyperparameter tuning library, this parameter may favorably be transformed by a function with increasing derivative such as the exponential function, or a function of the type `a/(b - x)`.
The probability of selecting (and keeping throughout the iterations) a feature from different nodes depends on `neighborInfluence` and the number of hops to the node.
Therefore, `neighborInfluence` should be re-tuned when `iterations` is changed.

=== Heterogeneous

In general, there is a large amount of information to store about paths containing multiple relationship types in a heterogeneous graph, so with many iterations and relationship types, a very high embedding dimension may be necessary.
This is especially true for unsupervised embedding algorithms such as HashGNN.
Therefore, caution should be taken when using many iterations in the heterogeneous mode.
@@ -545,6 +607,7 @@ YIELD nodePropertiesWritten
The graph 'persons' now has a node property `hashgnn-embedding` which stores the node embedding for each node.
To find out how to inspect the new schema of the in-memory graph, see xref:graph-list.adoc[Listing graphs].
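
For instance, a quick way to check the updated schema is a sketch like:

[source, cypher]
----
CALL gds.graph.list('persons')
YIELD graphName, schema
----
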

[[algorithms-embeddings-hashgnn-virtual-example]]
=== Virtual example
@@ -570,7 +633,7 @@ We use a third hash function "three" for this purpose and `f3` gets the smaller
We now compute a hash of `f3` using "two" and it becomes `6`.
Since `5` is smaller than `6`, `f1` is the "winning" neighbor feature for `(b)`, and since `5` is also smaller than `8`, it is the overall "winning" feature.
Therefore, we add `f1` to the embedding of `(b)`.
We proceed similarly with `k=1` and `f1` is selected again.
Since the embeddings consist of binary features, this second addition has no effect.
We omit the details of computing the embedding of `(c)`.