GVL logo Melbourne Bioinformatics logo

Galaxy Workflows

Written and maintained by Simon Gladman - Melbourne Bioinformatics (formerly VLSCI)

Background

This workshop/tutorial will familiarise you with the Galaxy workflow engine. It will cover the following topics:

  • Logging in to the server
  • How to construct and use a workflow by various methods
  • How to share a workflow

Section 1: Preparation.

The purpose of this section is to get you to log in to the server.. For this workshop, you can use a Galaxy server that you have created by following the steps outlined in: Launch a GVL Galaxy Instance or you can use the public Galaxy-tut

Go to Galaxy URL of your server in Firefox or Chrome (your choice, please don't use IE or Safari.)

  • If you have previously registered on this server just log in:
    • On the top menu select: User -> Login
    • Enter your password
    • Click the Submit button
  • If you haven’t registered on this server, you’ll need to now.
    • On the top menu select: User -> Register
    • Enter your email, choose a password, repeat it and add a (all lower case) one word name
    • Click on the Submit button.

Section 2: Create and run a workflow.

This section will show you two different methods to create a workflow and then how to run one.

Import the workflow history

This step will show you how to import a history from a remote source into your own workspace. We will be using this history to build a workflow.

  • From the histories menu , click Import from File.
  • Enter https://swift.rc.nectar.org.au:8888/v1/AUTH_377/public/Workflow-finished-history.tar.gz in the URL box.
  • Click Submit

You might have to wait for a bit, then:

  • From the history menu , click Saved Histories
  • Select the history: Imported: Workflow-finished

Workflow creation: Method 1

We will create a workflow from an existing history. You can use this method to make a re-useable analysis from one you’ve already done. i.e. You can perform the analysis once and then create a workflow out of it to re-use it on more/new data. We will create a workflow from the history you imported in step 1 above. The footnote below explains the steps that created this history. These are the steps we will mimic in the workflow.

Make sure your current history is the one you imported in Section 2 - Step 1 above (imported: Workflow_finished.) If not, switch to it.

Now we will create the workflow.

  • Click on the histories menu button at the top of the history pane.
  • Click Extract Workflow

You will now be shown a page which contains the steps used to create the history you are extracting from. We use this page to say what to include in the workflow. We want everything here so we’ll just accept the defaults and:

  • Change the Workflow name to something sensible like “Basic Variant Calling Workflow”
  • Click Create Workflow

The workflow is now accessible via the bottom of the tool pane by clicking on All Workflows.

Some discussion:

Have a look at your workflow. Click on its button in the workflow list. It’s tool interface will appear. You can now run this workflow any time you like with different input datasets. NOTE: The input data sets must be of the same types as the original ones. i.e. Two fastq reads files and one fasta reference sequence.

More interesting though is to:

  • Click on the Workflows link in the top menu
  • Click on the down arrow on your workflow’s button.
  • Click Edit

A visualisation of your workflow will appear. Note the connections and the steps.

Next we’ll go through how to create this workflow using the editor..

Workflow Creation: Method 2

We will now create the same read mapping/variant calling workflow using the editor directly. We need to get some reads and a reference, map the reads to the reference using BWA, run Freebayes on the BAM to call the variants, and finally filter the resulting vcf file.

Step 1: Create a workflow name and edit space.

  • Click on Workflow in Galaxy’s menu.
  • Click on the Create New Workflow button.
  • In the "Workflow Name" text box type: Variants from scratch
  • Click the Create button.

You’ll now see a list of workflows. Your’s will be in that list.

Step 2: Open the editor and place component tools

  • Click on the down arrow on your workflow’s button.
  • Select Edit

You’ll be presented with a blank workflow grid.

Add three input datafiles.

  • In the Workflow control section of the tool pane, click on Inputs -> Input dataset three times.
  • Spread them out towards the left hand side of the workflow grid by clicking and dragging them around.
  • For each one, change their name.
    • Click on each input box in turn
    • In the right hand pane (where the history usually is), change the name to:
      • Reference data
      • Reads 1
      • Reads 2 - respectively.

Add in the BWA mapping step.

  • Click on NGS: Mapping -> Map with BWA in the tool pane. BWA will be added to the workflow grid.
  • In the right hand pane (where the history usually is), change the following parameters.
    • Change “Will you select a reference genome from your history or use a built-in index?:” to Use a genome from history.
    • Note that the BWA box on the grid changes to match these settings.

Connect the tools together.

  • Click and drag on the output of one of the input dataset tools to each of the input spots in the BWA tool. (Make connections.)
    • "Reference data" output to reference to "Use the following dataset as the reference sequence"
    • "Reads 1" output to "Select first set of reads"
    • "Reads 2" output to "Select second set of reads"

Add in the Freebayes (Variant Calling step.)

  • Click on NGS: Variant Calling -> Freebayes
  • In the right hand pane, change the following:
    • "Choose the source for the reference list:" to History
    • Connect "Map with BWA" bam output to "Freebayes’" bam input.
    • Connect the "Reference data" output to "Freebayes’" Use the following dataset as the reference sequence input.

If you’re keen - Note: this is optional. Also change the following parameters the right hand pane for freebayes to make it a bit more sensible for variant calling in bacterial genomes.

  • "Choose parameter selection level": Complete list of all options
  • "Set population model": Yes
  • "Set ploidy for the analysis": 1
  • "Set input filters": Yes
  • "Exclude alignments from analysis if they have a mapping quality less than": 20
  • "Exclude alleles from analysis if their supporting base quality is less than": 20
  • "Require at least this fraction of observations … to evaluate the position": 0.9
  • "Require at least this count of observations .. to evaluate the position": 10
  • "Set population and mappability priors": Yes
  • "Disable incorporation of prior expectations about observations": Yes

Add in the Filter step.

  • Click on Filter and Sort - > Filter
  • Connect "Freebayes’" output_vcf to the "filter" input.
  • In the right hand pane, change the following:
    • "With the following condition": c6 > 500
    • "Number of header lines to skip": 56

Phew! We’re nearly done! The only thing left is to select which workflow outputs we want to keep in our history. Next to each output for every tool is a star. Clicking on the stars will select those files as workflow outputs, everything else will be hidden in the history. In this case we only really want the BAM file and the final variants (vcf file.) Therefore:

Select workflow outputs.

  • Click on the star next to "Map with BWA’s" bam file output.
  • Click on the star next to "Filter’s" output vcf.

Step 3: Save it!

Click on the at the top of the workflow grid and select Save.

Congratulations. You’ve just created a Galaxy workflow.

Now to run it!

Running the workflow

We will now make a new history called "Test" and run the workflow on it’s data.

Create the new history

  • From the Histories Menu, select Copy Datasets
  • Select the 2 x fastq files and the Ecoli .fna file.
  • Under destination history, enter Test into "New History Named:"
  • Click Copy History Items button
  • Click on the link to the new history in the green bar at the top of the screen

Run the workflow

  • On the tools pane, click All Workflows
  • Select the Variants from scratch workflow (or whatever you called it.)
  • Give it the correct files.
  • Ecoli ... .fna for Reference data
  • bacterial_std_err_1.fastq for Reads 1
  • bacterial_std_err_2.fastq for Reads 2
  • Click Run Workflow

Your workflow will now run. It will send the right files to the right tools at the right time to the cluster (compute engine on your machine) and wait for them to finish. Watch as they turn yellow then green in turn.

What now?

Where to start? There’s so much you can do with Workflows. You can even run them on multiple file sets from the one setup.