Snakemake

Mark Keller edited this page Feb 21, 2019 · 3 revisions

Snakemake is a tool for managing data analysis workflows using rules that describe how to create output files from input files.

This wiki page provides insights into common Snakemake conventions and answers to frequently asked questions. For a more in-depth resource, see the Snakemake documentation at https://snakemake.readthedocs.io/en/stable/

To see some of these ideas in action, see the following LRGR repositories:

Conventions

Data Directories

When processing data from a "raw" data source, it helps to keep the raw files and the processed files in separate directories. A common convention is to create two directories named raw and processed.

File Constants

Rather than retyping the same file path every time a file appears as an input or output in your Snakefile, define file paths as "constants" (Python variables named in all caps) at the top of the Snakefile.
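As a minimal sketch of this convention, the top of a Snakefile might look like the following. The directory names, file names, and the rule shown in the comment are hypothetical and chosen only for illustration.

```python
from os.path import join

# Hypothetical directory names, following the raw/processed convention.
RAW_DIR = join("data", "raw")
PROCESSED_DIR = join("data", "processed")

# Each file path is defined once and reused wherever a rule needs it.
RAW_COUNTS = join(RAW_DIR, "counts.tsv")
NORMALIZED_COUNTS = join(PROCESSED_DIR, "counts.normalized.tsv")

# Further down in the Snakefile, rules would refer to the constants
# instead of repeating the paths, for example (command omitted):
#
# rule normalize_counts:
#     input: RAW_COUNTS
#     output: NORMALIZED_COUNTS
```

If a file is later renamed or moved, only the constant at the top needs to change.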

File Paths

For consistency across operating systems, do not hard-code slashes in file paths in snakefiles. Instead, include from os.path import join at the top of each snakefile and build paths with the join function.
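The short sketch below illustrates why this matters. It uses the standard library modules ntpath and posixpath only to show how the same path components are joined under Windows and POSIX rules; the path components themselves are hypothetical.

```python
import ntpath      # path handling under Windows rules
import posixpath   # path handling under POSIX rules
from os.path import join

# Hypothetical path components, for illustration only.
parts = ("data", "raw", "samples.tsv")

# os.path.join uses the separator of the operating system it runs on.
native_path = join(*parts)

# The same components joined under each platform's rules:
windows_path = ntpath.join(*parts)    # uses backslashes
posix_path = posixpath.join(*parts)   # uses forward slashes

print(native_path)
```

In a snakefile, only the plain join call is needed; the two platform-specific modules appear here purely for comparison.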

Multiple Snakefiles

The following are some tips for projects that grow large enough to need multiple snakefiles, particularly when the snakefiles are arranged hierarchically.

Rule Naming

When a project has multiple snakefiles, use rule names that are unique across all of them so that rule names do not collide when one snakefile is included in another.

Variable Naming

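The same idea applies to variables: use names that are unique across all snakefiles in the project so that a variable defined in one snakefile is not overwritten when another snakefile is included.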
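The following plain-Python analogy sketches the problem. It assumes that included snakefiles are evaluated in the same Python namespace as the snakefile that includes them. The exec calls below only simulate that behavior, and all variable names and values are hypothetical.

```python
# Stand-ins for the contents of two snakefiles (hypothetical).
parent_snakefile = 'OUTPUT_DIR = "parent_out"'
child_snakefile = 'OUTPUT_DIR = "child_out"'

# Simulate both files being evaluated in one shared namespace.
shared = {}
exec(parent_snakefile, shared)
exec(child_snakefile, shared)
clobbered = shared["OUTPUT_DIR"]  # the parent's value has been replaced

# With a per-file prefix on each name, both values survive.
shared_prefixed = {}
exec('PARENT_OUTPUT_DIR = "parent_out"', shared_prefixed)
exec('CHILD_OUTPUT_DIR = "child_out"', shared_prefixed)
```

Prefixing each variable with the name of its snakefile is one possible way to keep names unique; any scheme that avoids duplicates serves the same purpose.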

Subworkflows

Use subworkflows to...

Frequently asked questions

My input filenames are too irregular?

Use an input function, which computes a rule's input files from its wildcards. See https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#snakefiles-input-functions
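An input function of the kind described at the link above can be sketched as follows. The sample names, file names, lookup table, and rule are all hypothetical, and SimpleNamespace is used only as a stand-in for the wildcards object that Snakemake would pass to the function.

```python
from os.path import join
from types import SimpleNamespace

# Hypothetical lookup table from sample name to an irregular raw file name.
RAW_FILES = {
    "sampleA": "A_final_v2.txt",
    "sampleB": "run3-B.tsv",
}

def get_raw_file(wildcards):
    """Return the raw input path for the sample named in the wildcards."""
    return join("raw", RAW_FILES[wildcards.sample])

# In a Snakefile, the function itself would be given as the rule input,
# for example (command omitted):
#
# rule process_sample:
#     input: get_raw_file
#     output: join("processed", "{sample}.tsv")

# Outside Snakemake, a stand-in object lets us try the function directly.
example = get_raw_file(SimpleNamespace(sample="sampleB"))
```

The output file names stay regular (one per sample), while the irregular input names are confined to the lookup table.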
