doc/modules/ROOT/pages/machine-learning/node-embeddings/hashgnn.adoc
7 additions & 0 deletions
@@ -197,6 +197,9 @@ As a loose guideline, one may try to set `embeddingDensity` to 128, 256, 512, or
 The `dimension` parameter determines the number of binary features when feature generation is applied.
 A high dimension increases expressiveness but requires more data in order to be useful, and it can lead to the curse of dimensionality for downstream machine learning tasks.
 Additionally, more computation resources will be required.
+However, binary embeddings only have a single bit of information per dimension.
+In contrast, dense `Float` embeddings have 64 bits of information per dimension.
+Consequently, in order to obtain similarly good embeddings with HashGNN as with an algorithm that produces dense embeddings (e.g. FastRP or GraphSAGE), one typically needs a significantly higher dimension.
 Some values to consider trying for `densityLevel` are very low values such as `1` or `2`, increasing as appropriate.
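To make the feature-generation parameters discussed above concrete, here is a minimal sketch of a stream-mode call. The graph name `myGraph`, the concrete parameter values, and the use of `gds.hashgnn.stream` (called `gds.beta.hashgnn.stream` in older GDS releases) are illustrative assumptions rather than part of this change:

[source, cypher]
----
// No featureProperties are given, so HashGNN generates binary input
// features itself. A generous `dimension` compensates for the single
// bit of information per dimension; `densityLevel` is kept low, as the
// surrounding text suggests.
CALL gds.hashgnn.stream('myGraph', {
  iterations: 3,
  embeddingDensity: 256,
  generateFeatures: { dimension: 512, densityLevel: 2 },
  randomSeed: 42
})
YIELD nodeId, embedding
----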
@@ -208,6 +211,10 @@ Therefore, a higher dimension should also be coupled with higher `embeddingDensity`.
 Higher dimension also leads to longer training times of downstream models and higher memory footprint.
 Increasing the threshold leads to sparser feature vectors.

+However, binary embeddings only have a single bit of information per dimension.
+In contrast, dense `Float` embeddings have 64 bits of information per dimension.
+Consequently, in order to obtain similarly good embeddings with HashGNN as with an algorithm that produces dense embeddings (e.g. FastRP or GraphSAGE), one typically needs a significantly higher dimension.
+
 The default threshold of `0` leads to fairly many features being active for each node.
 Often sparse feature vectors are better, and it may therefore be useful to increase the threshold beyond the default.
 One heuristic for choosing a good threshold is based on the average and standard deviation of the hyperplane dot products with the node feature vectors.
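For comparison with the feature-generation sketch above, here is a hedged sketch of the binarization path this hunk discusses. The graph name, the feature property names, and the concrete `threshold` (standing in for a value derived from the average/standard-deviation heuristic) are illustrative assumptions:

[source, cypher]
----
// Existing dense node properties are turned into binary features via
// hyperplane rounding. A `threshold` above the default of 0 makes the
// resulting binary feature vectors sparser.
CALL gds.hashgnn.stream('myGraph', {
  iterations: 3,
  embeddingDensity: 256,
  featureProperties: ['age', 'score'],
  binarizeFeatures: { dimension: 128, threshold: 0.5 },
  randomSeed: 42
})
YIELD nodeId, embedding
----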