
Commit cd57650

breakanalysis, adamnsch, and Mats-SX committed

Minor rewording of docs

Co-Authored-By: Adam Schill Collberg <adam.schill.collberg@protonmail.com>
Co-Authored-By: Mats Rydberg <mats.rydberg@neotechnology.com>

1 parent 32f8d5e commit cd57650

File tree

1 file changed: +6 -4 lines changed

  • doc/modules/ROOT/pages/machine-learning/node-embeddings


doc/modules/ROOT/pages/machine-learning/node-embeddings/hashgnn.adoc

Lines changed: 6 additions & 4 deletions

@@ -197,8 +197,9 @@ As a loose guideline, one may try to set `embeddingDensity` to 128, 256, 512, or
 The `dimension` parameter determines the number of binary features when feature generation is applied.
 A high dimension increases expressiveness but requires more data in order to be useful and can lead to the curse of high dimensionality for downstream machine learning tasks.
 Additionally, more computation resources will be required.
-However, there is only one bit of information per dimension with binary embeddings, whereas dense `Float` embeddings have 64 bits of information per dimension.
-Consequently, one typically needs a significantly higher dimension for HashGNN compared to algorithms producing dense embeddings like FastRP or GraphSAGE, in order to get as good embeddings.
+However, binary embeddings only have a single bit of information per dimension.
+In contrast, dense `Float` embeddings have 64 bits of information per dimension.
+Consequently, in order to obtain similarly good embeddings with HashGNN as with an algorithm that produces dense embeddings (e.g. FastRP or GraphSAGE) one typically needs a significantly higher dimension.
 Some values to consider trying for `densityLevel` are very low values such as `1` or `2`, or increase as appropriate.


@@ -210,8 +211,9 @@ Therefore, a higher dimension should also be coupled with higher `embeddingDensity`.
 Higher dimension also leads to longer training times of downstream models and higher memory footprint.
 Increasing the threshold leads to sparser feature vectors.

-There is only one bit of information per dimension with binary embeddings, whereas dense `Float` embeddings have 64 bits of information per dimension.
-Consequently, one typically needs a significantly higher dimension for HashGNN compared to algorithms producing dense embeddings like FastRP or GraphSAGE, in order to get as good embeddings.
+However, binary embeddings only have a single bit of information per dimension.
+In contrast, dense `Float` embeddings have 64 bits of information per dimension.
+Consequently, in order to obtain similarly good embeddings with HashGNN as with an algorithm that produces dense embeddings (e.g. FastRP or GraphSAGE) one typically needs a significantly higher dimension.

 The default threshold of `0` leads to fairly many features being active for each node.
 Often sparse feature vectors are better, and it may therefore be useful to increase the threshold beyond the default.

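To put the reworded guidance in perspective: a dense `Float` embedding of dimension 256 carries 256 × 64 = 16,384 bits, while a binary embedding of the same dimension carries only 256 bits, which is why HashGNN typically needs a much higher `dimension`. The following is a minimal sketch of how the parameters discussed in the diff fit together; the graph name, property names, and exact procedure name (`gds.hashgnn.stream` in recent GDS releases, `gds.beta.hashgnn.stream` in older ones) are assumptions for illustration, not part of this commit.

// Sketch only: 'myGraph', the property names, and the procedure name
// are assumptions, not taken from the commit.

// Variant 1: no usable node properties -- generate binary features.
// Uses a higher `dimension` than a dense method would need, and a very
// low `densityLevel` such as 1 or 2, as the reworded docs suggest.
CALL gds.hashgnn.stream('myGraph', {
  iterations: 3,
  embeddingDensity: 128,
  generateFeatures: { dimension: 512, densityLevel: 2 },
  randomSeed: 42
})
YIELD nodeId, embedding
RETURN nodeId, embedding
LIMIT 5;

// Variant 2: existing numeric properties -- binarize them instead.
// Raising `threshold` above the default 0 yields sparser feature vectors.
CALL gds.hashgnn.stream('myGraph', {
  iterations: 3,
  embeddingDensity: 128,
  featureProperties: ['age', 'score'],  // hypothetical property names
  binarizeFeatures: { dimension: 1024, threshold: 32 },
  randomSeed: 42
})
YIELD nodeId, embedding
RETURN nodeId, embedding
LIMIT 5;

The two variants serve different situations: `generateFeatures` is for graphs without usable node properties, while `binarizeFeatures` turns existing dense properties into binary features, matching the feature-generation and threshold discussion in the two hunks above.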