
End-to-End Sparse Point Cloud Jet Classification


A deep learning framework for Quark-Gluon jet classification using sparse point cloud representations. This project implements multiple neural network architectures optimized for high-energy physics jet tagging tasks.

Project Overview

This repository contains the implementation of end-to-end deep learning models for classifying particle jets as quarks or gluons. The framework processes jet data as point clouds, leveraging both convolutional and attention-based architectures to capture geometric and topological features.

Project Reference: ML4Sci GSoC 2025 - End-to-End Sparse Deep Learning

Architecture

Supported Models

1. Sparse CNN (ResNet-based)

  • 1D Convolutional ResNet architecture optimized for point clouds
  • Variants: Small (S), Medium (M), Large (L)
  • Supports multiple resolutions: 256, 512, 768, 1024 points
  • Features residual connections and batch normalization (a minimal sketch of such a block follows)
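
As a rough illustration of this style of block (not the repository's actual model.py; the class name, channel sizes, and kernel choices are assumptions), a 1D-convolutional residual block in PyTorch might look like:

import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """Residual block over point clouds shaped (batch, channels, points).
    Hypothetical illustration, not the repo's model.py."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.bn1 = nn.BatchNorm1d(out_ch)
        self.conv2 = nn.Conv1d(out_ch, out_ch, kernel_size=1)
        self.bn2 = nn.BatchNorm1d(out_ch)
        # 1x1 projection so the skip connection matches the output width
        self.skip = (nn.Conv1d(in_ch, out_ch, kernel_size=1)
                     if in_ch != out_ch else nn.Identity())
        self.act = nn.ReLU()

    def forward(self, x):
        h = self.act(self.bn1(self.conv1(x)))
        h = self.bn2(self.conv2(h))
        return self.act(h + self.skip(x))  # residual connection

# e.g. x = torch.randn(8, 4, 1024)  # 8 jets, 4 features, 1024 points
# ResBlock1D(4, 64)(x).shape -> torch.Size([8, 64, 1024])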

2. Aggregation Transformer

  • Self-attention mechanism for point cloud processing
  • Global and local feature aggregation
  • Positional encoding for spatial relationships
  • Variants optimized for different point cloud sizes (see the aggregation sketch below)
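
As a minimal sketch of the global aggregation idea (an illustrative attention-pooling layer, not the repository's model.py):

import torch
import torch.nn as nn

class AttnAggregate(nn.Module):
    """Attention-weighted global pooling over points. Hypothetical illustration."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # per-point attention logit

    def forward(self, x):                        # x: (batch, points, dim)
        w = torch.softmax(self.score(x), dim=1)  # weights over the point axis
        return (w * x).sum(dim=1)                # (batch, dim) global feature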

Model Variants

Model              Resolution    Parameter size
ResNet_PC_256_S    256 points    Small
ResNet_PC_512_S    512 points    Small
ResNet_PC_768_S    768 points    Small
ResNet_PC_768_M    768 points    Medium
ResNet_PC_1024_S   1024 points   Small
ResNet_PC_1024_M   1024 points   Medium
ResNet_PC_1024_L   1024 points   Large
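
One common way to wire such variants is a small registry mapping names to hyperparameters. The sketch below is hypothetical: the point counts follow the table, while the widths and depths are placeholder values, not the repository's actual configuration.

# Hypothetical variant registry; point counts mirror the table above,
# channel widths and block counts are invented placeholders.
MODEL_VARIANTS = {
    "ResNet_PC_256_S":  {"points": 256,  "width": 64,  "blocks": 4},
    "ResNet_PC_512_S":  {"points": 512,  "width": 64,  "blocks": 4},
    "ResNet_PC_768_S":  {"points": 768,  "width": 64,  "blocks": 4},
    "ResNet_PC_768_M":  {"points": 768,  "width": 128, "blocks": 6},
    "ResNet_PC_1024_S": {"points": 1024, "width": 64,  "blocks": 4},
    "ResNet_PC_1024_M": {"points": 1024, "width": 128, "blocks": 6},
    "ResNet_PC_1024_L": {"points": 1024, "width": 256, "blocks": 8},
}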

Project Structure

e2e_sparse/
├── DataGeneration/
│   └── QuarkGluon/
│       ├── ToPointCloudForm.py    # Convert raw data to point cloud format
│       └── 4Resolutions.sh        # Generate datasets at 4 resolutions
│
├── Supervised/
│   ├── CNN/
│   │   └── model.py               # ResNet-based CNN architectures
│   ├── AggregationTransformer/
│   │   └── model.py               # Transformer-based models
│   ├── trainer.py                 # Single-GPU training script
│   ├── trainer4Node.py            # Multi-node distributed training
│   └── Experiments/
│       └── Scripts/
│           ├── bash/              # Local training scripts
│           ├── slurm-no-resume/   # SLURM scripts without resume
│           └── slurm-preempt-chain/  # SLURM with checkpoint resume
│
└── README.md

Data Preparation

  1. Generate Point Cloud Datasets:
cd DataGeneration/QuarkGluon
bash 4Resolutions.sh

This will create datasets at 4 different resolutions:

  • QG256.h5 (256 points)
  • QG512.h5 (512 points)
  • QG768.h5 (768 points)
  • QG1024.h5 (1024 points)
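
To sanity-check a generated file, it can be opened with h5py. The dataset key names inside the files are not documented here, so the snippet lists them rather than assuming any:

import h5py

# Inspect a generated file; run from the directory holding the .h5 outputs.
with h5py.File("QG1024.h5", "r") as f:
    print(list(f.keys()))  # discover the actual dataset names
    for key in f.keys():
        print(key, f[key].shape, f[key].dtype)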

Training

Single GPU Training

python Supervised/trainer.py \
  --datapath=/path/to/QG1024.h5 \
  --Nepochs=100 \
  --lr=1e-3 \
  --model_variant=ResNet_PC_1024_S \
  --UseWandb=True \
  --wandb_project=quark-gluon \
  --wandb_entity=your-entity \
  --wandb_run_name=resnet_1024_experiment \
  --wandb_key=your-api-key \
  --Checkpoint_dir=/path/to/checkpoints \
  --NAccumSteps=1
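
--NAccumSteps sets the number of gradient-accumulation steps. As a minimal sketch of that technique (a toy loop with stand-in components, not trainer.py's actual implementation):

import torch
import torch.nn as nn

# Stand-in model, data, and optimizer for illustration only.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
loader = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(8)]

N_ACCUM = 4                                  # mirrors --NAccumSteps
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / N_ACCUM  # scale so accumulated grads average
    loss.backward()                          # gradients add up across mini-batches
    if (step + 1) % N_ACCUM == 0:
        optimizer.step()                     # one update per N_ACCUM mini-batches
        optimizer.zero_grad()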

Multi-GPU Training

python Supervised/trainer4Node.py \
  --datapath=/path/to/QG1024.h5 \
  --Nepochs=100 \
  --lr=1e-3 \
  --model_variant=Transformer_PC_1024_S \
  --UseWandb=True \
  --wandb_project=quark-gluon \
  --wandb_entity=your-entity \
  --wandb_run_name=transformer_1024_4gpu \
  --wandb_key=your-api-key \
  --Checkpoint_dir=/path/to/checkpoints
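
trainer4Node.py handles the distributed setup internally; for readers unfamiliar with the pattern, a generic PyTorch DistributedDataParallel initialization (illustrative only, assuming a launcher such as torchrun that sets LOCAL_RANK in the environment) looks like:

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Generic DDP initialization pattern (not trainer4Node.py itself).
dist.init_process_group(backend="nccl")      # one process per GPU
local_rank = int(os.environ["LOCAL_RANK"])   # set by the launcher
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(4, 2).cuda(local_rank)  # stand-in model
model = DDP(model, device_ids=[local_rank])     # sync gradients across processes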

SLURM Cluster Training

For HPC clusters with SLURM:

# Preempt-chain (with automatic resume on preemption)
cd Supervised/Experiments/Scripts/slurm-preempt-chain
sbatch AggregationTransformer1024.sh <run_id>

# Standard SLURM (no resume)
cd Supervised/Experiments/Scripts/slurm-no-resume
sbatch SparseCNNResnet.sh
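
The preempt-chain scripts work by reloading the latest checkpoint when a requeued job restarts. The general pattern is sketched below; the checkpoint path and dictionary keys are assumptions, not the repository's actual format:

import os
import torch
import torch.nn as nn

# Stand-in model and optimizer for illustration only.
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters())

ckpt_path = "/path/to/checkpoints/latest.pt"  # illustrative path
start_epoch = 0
if os.path.exists(ckpt_path):
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model"])          # key names are assumptions
    optimizer.load_state_dict(ckpt["optimizer"])
    start_epoch = ckpt["epoch"] + 1               # resume where preemption stopped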
