Snakemake
Snakemake is a tool for managing data analysis workflows using rules that describe how to create output files from input files.
This wiki page provides some insight into common Snakemake conventions and answers frequently asked questions. A more in-depth resource is the Snakemake documentation, available at https://snakemake.readthedocs.io/en/stable/
To see some of these ideas in action, see the following LRGR repositories:
- https://github.com/lrgr/mutation-signatures-data
- (feel free to add to this list)
When processing data from a "raw" data source, it is helpful to keep the raw input files and the processed output files in their own directories.
A common convention is to create two separate directories called raw and processed.
Rather than re-typing a file path every time the same file appears as an input or output in your Snakefile, define file paths as "constants" (just Python variables named in all caps) at the top of the Snakefile.
For consistency across operating systems, do not hard-code slashes in file paths in snakefiles. Instead, include from os.path import join at the top of each snakefile and build paths with the join function.
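As a minimal sketch of the two conventions above, the top of a Snakefile might look like the following. The directory name data, the file name counts.tsv, and the rule name process_counts are hypothetical placeholders, not taken from any LRGR repository.

```python
from os.path import join

# Directory constants, following the raw/processed convention.
DATA_DIR = "data"  # hypothetical top-level data directory
RAW_DIR = join(DATA_DIR, "raw")
PROCESSED_DIR = join(DATA_DIR, "processed")

# File constants, defined once and reused wherever the file is an input or output.
RAW_COUNTS = join(RAW_DIR, "counts.tsv")              # hypothetical file name
PROCESSED_COUNTS = join(PROCESSED_DIR, "counts.tsv")  # hypothetical file name

# A rule further down the Snakefile could then refer to the constants
# instead of repeating the paths (shown as a comment because rule syntax
# is only valid inside a Snakefile):
#
# rule process_counts:
#     input: RAW_COUNTS
#     output: PROCESSED_COUNTS
#     shell: "cp {input} {output}"
```

Because every path is built with join, renaming a directory only requires changing one constant.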
The following are some tips if dealing with a project that grows large enough to need multiple snakefiles, particularly if arranged in a hierarchical way.
When dealing with multiple snakefiles in a project, use rule names that are unique across all snakefiles in the project, so that names do not collide when one snakefile is included in another.
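One way to check for name collisions before including snakefiles in one another is a small script like the sketch below. The helper find_duplicate_rules is hypothetical (it is not part of Snakemake), and it assumes each rule is declared on its own line in the form rule name: with no leading indentation. The snakefile contents are inlined so the example is self-contained.

```python
import re
from collections import defaultdict

# Matches unindented declarations such as "rule download_raw:".
RULE_PATTERN = re.compile(r"^rule\s+(\w+)\s*:", re.MULTILINE)


def find_duplicate_rules(snakefiles):
    """Map each rule name defined more than once to the files defining it.

    `snakefiles` maps a file name to that file's text.
    """
    seen = defaultdict(list)
    for filename, text in snakefiles.items():
        for rule_name in RULE_PATTERN.findall(text):
            seen[rule_name].append(filename)
    return {name: files for name, files in seen.items() if len(files) > 1}


# Hypothetical snakefile contents: both files define a rule named "download".
example = {
    "Snakefile": "rule all:\n    input: 'a.tsv'\n\nrule download:\n    output: 'a.tsv'\n",
    "counts.smk": "rule download:\n    output: 'b.tsv'\n\nrule counts_process:\n    input: 'b.tsv'\n",
}
duplicates = find_duplicate_rules(example)
```

Here duplicates reports that download is defined in both files. Prefixing rule names with the name of their snakefile (for example counts_download) is one simple way to avoid such collisions.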
Same idea as the above Rule Naming.
Use subworkflows to...
https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#snakefiles-input-functions