Revision as of 05:09, 22 February 2018

Main repository:

Start

This tutorial is assuming you start out on a machine with a conda-based python distribution (Anaconda or Miniconda or other).

Installing Conda with Pyenv

If you don't have a version of conda, it is recommended you use Pyenv to manage versions of python.

A very simple pyenv installation script:

install_pyenv.py

#!/usr/bin/python3
import getpass
import subprocess


def install_pyenv():
    user = getpass.getuser()
    if(user=="root"):
        raise Exception("You are root - you should run this script as a normal user.")
    else:
        # Install pyenv 
        pyenvcmd = ["curl","-L","https://raw.githubusercontent.com/pyenv/pyenv-installer/master/bin/pyenv-installer","|","/bin/bash"]
        subprocess.call(pyenvcmd, shell=True)

        # We don't need to add ~/.pyenv/bin to $PATH,
        # it is already done.

if __name__=="__main__":
    install_pyenv()

As noted in the script, you will need to add ~/.pyenv/bin to the path, as that is where the Pyenv versions of Python live.

Next, we have a python script to install snakemake:

#!/usr/bin/python3
import getpass
import tempfile
import subprocess


def install_pyenv():
    user = getpass.getuser()
    if(user=="root"):
        raise Exception("You are root - you should run this script as a normal user.")
    else:
        # Install snakemake
        conda_version = "miniconda3-4.3.30"

        installcmd = ["pyenv","install",conda_version]
        subprocess.call(installcmd)
        
        globalcmd = ["pyenv","global",conda_version]
        subprocess.call(globalcmd)

        # ---------------------------
        # Install snakemake

        pyenvbin = os.environ['HOME']
        condabin = pyenvbin+"/.pyenv/shims/conda"

        subprocess.call([condabin,"update"])

        subprocess.call([condabin,"config","--all","channels","r"])
        subprocess.call([condabin,"config","--all","channels","default"])
        subprocess.call([condabin,"config","--all","channels","conda-forge"])
        subprocess.call([condabin,"config","--all","channels","bioconda"])

        subprocess.call([condabin,"install","--yes","-c","bioconda","snakemake"])

        # ---------------------------
        # Install osf cli client
        
        pyenvbin = os.environ['HOME']
        pipbin = pyenvbin+"/.pyenv/shims/pip"

        subprocess.call([pipbin,"install","--upgrade","pip"])
        subprocess.call([pipbin,"install","--user","osfclient"])


if __name__=="__main__":
    install_pyenv()

Get Tutorial Files

Start by getting the files needed for the tutorial:

wget https://bitbucket.org/snakemake/snakemake-tutorial/get/v3.11.0.tar.bz2
tar -xf v3.11.0.tar.bz2 --strip 1

Create the conda environment:

conda env create --name snakemake-tutorial --file environment.yaml

Now activate the conda environment:

source activate snakemake-tutorial

First Snakefile

Create a Snakefile, and add the first rule:

rule bwa_map:
    input:
        "data/genome.fa",
        "data/samples/{sample}.fastq"
    output:
        "mapped_reads/{sample}.bam"
    shell:
        "bwa mem {input} | samtools view -Sb - > {output}"

This creates a folder with data in it, and an environment.yaml file for conda.

Executing the Rule

Unlike with Makefiles, Snakefile rules are executed based on their output files. So, if we want to execute the rule bwa_map, we look at the output file:

rule bwa_map:
    ...
    output:
        "mapped_reads/{sample}.bam"

That means we have to run snakefile and ask for mapped_reads/{sample}.bam, and this requires the input file data/samples/{sample}.fastq to be in place.

The shell portion is what stitches the input and output files together. So basically, you say "I want this output file" and Snakemake back-calculates the task graph needed to obtain that output file.

To do a dry run of the workflow:

$ snakemake -np mapped_reads/A.bam mapped_reads/B.bam

rule bwa_map:
    input: data/genome.fa, data/samples/B.fastq
    output: mapped_reads/B.bam
    jobid: 0
    wildcards: sample=B

bwa mem data/genome.fa data/samples/B.fastq | samtools view -Sb - > mapped_reads/B.bam

rule bwa_map:
    input: data/genome.fa, data/samples/A.fastq
    output: mapped_reads/A.bam
    jobid: 1
    wildcards: sample=A

bwa mem data/genome.fa data/samples/A.fastq | samtools view -Sb - > mapped_reads/A.bam
Job counts:
	count	jobs
	2	bwa_map
	2

This explains the tasks that are going to be executed: two input files leads to two separate bwa_map tasks. We can see the commands that Snakemake is going to execute just below information about each rule.

Flags

Snakemake/DahakFlot: Difference between revisions

From charlesreid1