|
2 | 2 | This repository contains reference implementations of three self-supervised learning |
3 | 3 | techniques explored during the Vector Institute's Self-Supervised Learning (SSL) Bootcamp. |
4 | 4 |
|
5 | | -# Installing dependencies |
6 | | -``` |
7 | | -python3 -m venv /path/to/new/virtual/environment/ssl_env |
8 | | -source /path/to/new/virtual/environment/ssl_env/bin/activate |
9 | | -pip install --upgrade pip |
10 | | -pip install -r requirements.txt |
11 | | -``` |
| 5 | +# Summary of Reference Implementations |
12 | 6 |
|
13 | | -If you are on the Vector Institute's Vaughan cluster, the environment is already set up and can be activated with |
| 7 | +| Name | Description | Reference Implementation | |
| 8 | +|------|-------------|-------| |
| 9 | +| Internal Contrastive Learning (ICL) + Latent Outlier Exposure (LOE) | ICL learns to maximize the mutual information between two complementary subsets of features, based on the assumption that the relation between a subset of features and the rest of the features is class-dependent. LOE extends ICL to work with contaminated datasets. | [Anomaly Detection in Tabular Data with ICL](src/contrastive_learning/ICL/ICL.ipynb), [Latent Outlier Exposure for Anomaly Detection with Contaminated Data](src/contrastive_learning/LatentOE/LatentOE_Notebook.ipynb) | |
| 10 | +| SimMTM | Reconstructs a time series from multiple randomly masked versions of it. Uses series-wise representation similarity to compute a weighted aggregation of point-wise representations before reconstruction. | [Beijing PM2.5 Air Quality Forecasting](src/masked_modelling/simmtm/simmtm-BeijingPM25Quality-forecasting.ipynb) | |
| 11 | +| TabRet | TabRet is a pre-trainable Transformer-based model for tabular data, designed for downstream tasks that contain columns not seen during pre-training. Unlike other methods, TabRet adds an extra learning step before fine-tuning, called retokenizing, which calibrates feature embeddings based on the masked autoencoding loss. | [Stroke Prediction with the BRFSS dataset](src/masked_modelling/tabret/TabRet.ipynb) | |
| 12 | +| Data2Vec | Combines masked prediction with self-distillation to predict contextualized latent representations (produced by the teacher network) from a partial/masked view of the input (given to the student network). | [Image Classification with the STL-10 dataset](src/self_distillation/data2vec_vision.ipynb) | |
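As a rough illustration of the ICL idea in the table, the Python sketch below builds the complementary feature subsets for one sample and scores them with an InfoNCE-style loss. It assumes, as we understand the ICL paper, that each subset is a contiguous window of `k` features paired with the remaining features. The embedding vectors are taken as given inputs here, whereas in practice two learned networks would produce them. The names `complementary_subsets` and `icl_info_nce` are hypothetical and do not come from the repository's notebooks.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def complementary_subsets(x: Sequence[float], k: int) -> List[Tuple[list, list]]:
    """Split one sample into (window, complement) feature pairs.

    Assumption: each window is a contiguous run of k features, which
    yields len(x) - k + 1 pairs per sample.
    """
    d = len(x)
    if not 0 < k < d:
        raise ValueError("k must satisfy 0 < k < len(x)")
    x = list(x)
    return [(x[j:j + k], x[:j] + x[j + k:]) for j in range(d - k + 1)]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def icl_info_nce(window_embs: List[Vector], complement_embs: List[Vector],
                 tau: float = 0.1) -> float:
    """Mean InfoNCE-style loss over the pairs of one sample.

    For complement j, window j is the positive and every other window
    of the same sample is a negative. A low loss means the windows and
    complements agree; the same quantity can serve as an anomaly score
    at test time (assumption).
    """
    losses = []
    for j, c in enumerate(complement_embs):
        logits = [dot(w, c) / tau for w in window_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_sum - logits[j])
    return sum(losses) / len(losses)
```

For example, `complementary_subsets([1, 2, 3, 4, 5], k=2)` yields four pairs, the first being `([1, 2], [3, 4, 5])`.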
14 | 13 |
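The LOE part of the first table row can be pictured as follows: the samples whose losses look most anomalous receive latent anomaly labels and are trained with an "anomalous" objective instead of the "normal" one. The sketch below assumes the per-sample losses under both objectives are already computed (for example, the ICL loss as the normal objective) and that the contamination ratio is known. Ranking by the gap between the two losses, and using a label of 0.5 for the "soft" variant, reflect our reading of the LOE paper rather than this repository's code; `loe_objective` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def loe_objective(
    normal_losses: Sequence[float],
    anomaly_losses: Sequence[float],
    contamination: float,
    soft: bool = False,
) -> Tuple[float, List[float]]:
    """Assign latent anomaly labels, then return (mean loss, labels).

    normal_losses[i]:  loss of sample i if it is treated as normal.
    anomaly_losses[i]: loss of sample i if it is treated as an anomaly.
    contamination:     assumed fraction of anomalies in the training set.
    """
    n = len(normal_losses)
    num_anomalies = int(contamination * n)
    # Labelling sample i as anomalous changes the total loss by
    # anomaly_losses[i] - normal_losses[i], so the largest reductions come
    # from the samples with the largest normal_loss - anomaly_loss gap.
    gaps = [ln - la for ln, la in zip(normal_losses, anomaly_losses)]
    ranked = sorted(range(n), key=lambda i: gaps[i], reverse=True)
    labels = [0.0] * n
    for i in ranked[:num_anomalies]:
        labels[i] = 0.5 if soft else 1.0  # soft variant hedges with 0.5
    total = sum(
        (1.0 - y) * ln + y * la
        for y, ln, la in zip(labels, normal_losses, anomaly_losses)
    )
    return total / n, labels
```

During training, the labels would be re-estimated as the model's losses change, alternating with parameter updates; this schedule is our reading of the method, not a description of the notebook.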
|
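The series-wise weighted aggregation mentioned in the SimMTM row can be sketched in plain Python. Each element of a batch (an original series or one of its masked copies) is assumed to already have a series-wise representation and a point-wise representation of shape `[time][channel]`. The point-wise representations of all other elements are combined with softmax weights computed from cosine similarity between series-wise representations. The temperature `tau`, the use of cosine similarity, and the name `aggregate_pointwise` are assumptions for illustration; the repository's notebook may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def aggregate_pointwise(
    series_reps: List[Sequence[float]],
    point_reps: List[Matrix],
    target: int,
    tau: float = 0.2,
) -> Tuple[Matrix, Dict[int, float]]:
    """Weighted sum of the other elements' point-wise representations.

    Weights are a softmax over cosine similarities between the target's
    series-wise representation and every other element's.
    """
    others = [j for j in range(len(series_reps)) if j != target]
    logits = [cosine(series_reps[target], series_reps[j]) / tau for j in others]
    m = max(logits)  # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    weights = {j: e / z for j, e in zip(others, exps)}

    steps, channels = len(point_reps[target]), len(point_reps[target][0])
    out = [[0.0] * channels for _ in range(steps)]
    for j, w in weights.items():
        for t in range(steps):
            for c in range(channels):
                out[t][c] += w * point_reps[j][t][c]
    return out, weights
```

The aggregated point-wise representation would then be passed to a decoder that reconstructs the original series; elements with similar series-wise representations dominate the weighted sum.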
| 14 | + |
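For the Data2Vec row, the sketch below shows two pieces of the teacher side under stated assumptions: the teacher's parameters track the student's through an exponential moving average (EMA), and the regression target for a masked position is the average of the teacher's top `top_k` layer outputs at that position, each normalized first. Here a parameter-free layer normalization over the feature dimension is applied; the Data2Vec paper uses different normalizations for different modalities, so treat this choice, the `decay` value, and the function names as assumptions.

```python
import math
from typing import List, Sequence


def ema_update(teacher: Sequence[float], student: Sequence[float],
               decay: float) -> List[float]:
    """Return decay * teacher + (1 - decay) * student, parameter by parameter."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def layer_norm(x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Parameter-free normalization to zero mean and (roughly) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def data2vec_target(teacher_layer_outputs: List[Sequence[float]],
                    top_k: int) -> List[float]:
    """Average the normalized outputs of the last top_k teacher layers.

    teacher_layer_outputs[l] is layer l's representation of one position
    of the unmasked input; the student, which sees the masked input,
    is trained to regress this vector.
    """
    normed = [layer_norm(layer) for layer in teacher_layer_outputs[-top_k:]]
    dim = len(normed[0])
    return [sum(layer[d] for layer in normed) / top_k for d in range(dim)]
```

In a training loop, `ema_update` would run after each student optimizer step, so the teacher never receives gradients directly.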
| 15 | +# Setting up the environment |
| 16 | +Before installing the dependencies for this project, we recommend installing |
| 17 | +[uv](https://github.com/astral-sh/uv?tab=readme-ov-file#installation) and creating |
| 18 | +a virtual environment. You may use any virtual environment management tool |
| 19 | +you like, such as uv, conda, or virtualenv. |
| 20 | + |
| 21 | +With uv, you can create a virtual environment with the following command: |
| 22 | + |
| 23 | +```bash |
| 24 | +uv venv -n --seed --python 3.9 /path/to/new/virtual/environment/ssl_env |
15 | 25 | ``` |
16 | | -source /ssd003/projects/aieng/public/ssl_bootcamp_resources/venv/bin/activate |
17 | | -``` |
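| 26 | +This creates a new virtual environment at the specified path using Python 3.9. The `--seed` flag installs seed packages such as pip into the environment, and `-n` (short for `--no-cache`) runs the command without reading from or writing to uv's cache. |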
| 27 | + |
| 28 | +**Note**: If you are using the Vector Institute's Vaughan cluster, a virtual |
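| 29 | +environment has already been created for you at `/ssd003/projects/aieng/public/ssl_bootcamp_resources/venv`, so you can skip this step and activate it with `source /ssd003/projects/aieng/public/ssl_bootcamp_resources/venv/bin/activate`. |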
| 30 | +
|
| 31 | +Once you have created a virtual environment, you can activate it with the command: |
18 | 32 |
|
19 | | -# Using pre-commit hooks |
20 | | -To check your code at commit time |
21 | 33 | ``` |
22 | | -pre-commit install |
| 34 | +source /path/to/new/virtual/environment/ssl_env/bin/activate |
23 | 35 | ``` |
24 | 36 |
|
25 | | -You can also get pre-commit to fix your code |
| 37 | +Then, install the dependencies for this project with the following commands: |
| 38 | +
|
| 39 | +```bash |
| 40 | +git clone https://github.com/VectorInstitute/SSL-Bootcamp.git |
| 41 | +cd SSL-Bootcamp |
| 42 | +uv sync --no-cache --active --dev |
| 43 | +``` |
| 44 | +**Note**: The `--active` flag in the above command assumes that you have already |
| 45 | +activated your virtual environment. If you prefer not to create a new virtual |
| 46 | +environment yourself, you can omit the `--active` flag and uv will create a new virtual environment |
| 47 | +for you in the `.venv` directory inside the project root. |
| 48 | +
|
| 49 | +## Using pre-commit hooks |
| 50 | +To ensure that your code adheres to the project's style and formatting guidelines, |
| 51 | +you can use pre-commit hooks to check for common issues, such as code formatting, |
| 52 | +linting, and security vulnerabilities. Run the following command before pushing |
| 53 | +your code to the repository: |
| 54 | + |
26 | 55 | ``` |
27 | 56 | pre-commit run --all-files |
28 | 57 | ``` |