Mon 8 Jul - Thu 11 Jul 2019
09:30 - 16:30

Venue: Bioinformatics Training Room, Craik-Marshall Building

Provided by: Bioinformatics


Booking

Bookings cannot be made on this event (Event is completed).



Variant Discovery with GATK4

Description

This workshop will focus on the core steps involved in calling germline short variants, somatic short variants, and copy number alterations with the Broad’s Genome Analysis Toolkit (GATK), using “Best Practices” developed by the GATK methods development team. A team of methods developers and instructors from the Data Sciences Platform at Broad will give talks explaining the rationale, theory, and real-world applications of the GATK Best Practices. You will learn why each step is essential to the variant-calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset. If you are an experienced GATK user, you will gain a deeper understanding of how the GATK works under the hood and how to improve your results further, especially with respect to the latest innovations.

  • Day 1: Introduction and Overview. The first day of the workshop gives a high-level overview of various topics in the morning, and in the afternoon we show how these concepts apply to a case study. The case study is tailored to the audience, based on their answers to our pre-workshop survey.
  • Day 2: Germline Short Variant Discovery. Today we dive deep into the tools that make up the GATK Best Practices Pipeline. In the morning we discuss variant discovery, and in the afternoon we look at refinement and filtering. You will have the opportunity both in the morning and in the afternoon to get hands-on with these tools and run them yourself.
  • Day 3: Somatic Variant Discovery. Today we will cover Somatic Variant Discovery in more depth. In the morning we primarily focus on calling short variants with Mutect2, and in the afternoon we look at copy number alterations. Both sections have a paired hands-on activity.
  • Day 4: Pipelining. Over the first three days, you will have learned a lot about the different pipelines and tools that you can use in GATK. Today we will learn how those pipelines are written in a language called WDL. In the afternoon we cover other topics that are useful for working on the cloud, including Docker and BigQuery.

Please note that this workshop is focused on human data analysis. Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed for analysis of non-human data, but we will not go into much detail on those points.

The hands-on GATK tutorials in this workshop will be conducted on Terra, a new platform developed at Broad in collaboration with Verily Life Sciences for accessing data, running analysis tools and collaborating securely and seamlessly.

The training room is located on the first floor, and there is currently no wheelchair or level access to this floor.

Please note that if you are not eligible for a University of Cambridge Raven account, you will need to book or register your interest by linking here.

Target audience
  • The course is aimed primarily at mid-career scientists – especially those whose formal education likely included statistics, but who have perhaps not put this into practice since.
  • Graduate students, Postdocs and Staff members from the University of Cambridge, Affiliated Institutions and other external Institutions or individuals
  • Please be aware that these courses are only free for registered University of Cambridge students. All other participants will be charged a registration fee in some form. Registration fees and further details regarding the charging policy are available here.
  • Further details regarding eligibility criteria are available here
Prerequisites
  • Familiarity with the basic terms and concepts of genetics and genomics.
  • Basic familiarity with the command line environment is required.
  • Sufficient UNIX experience might be obtained from one of the many UNIX tutorials available online.
Sessions

Number of sessions: 4

# | Date | Time | Venue | Trainers
1 | Mon 8 Jul 2019 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Tom Lyons, Adelaide Rhodes, Mark Fleharty, Lee Lichtenstein
2 | Tue 9 Jul 2019 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Tom Lyons, Adelaide Rhodes, Mark Fleharty, Lee Lichtenstein
3 | Wed 10 Jul 2019 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Tom Lyons, Adelaide Rhodes, Mark Fleharty, Lee Lichtenstein
4 | Thu 11 Jul 2019 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Tom Lyons, Adelaide Rhodes, Mark Fleharty, Lee Lichtenstein
Topics covered

Bioinformatics, Data handling, Data mining, Data visualisation, Genomics, Sequence variations

Objectives

After this course you should be able to:

  • Understand the overall variant discovery workflow rationale and requirements
  • Understand key methods and functionalities in light of the latest research
  • Understand key differences between germline and somatic variant discovery approaches
  • Apply analysis tools and Best Practices workflows to a real data set
  • Interpret analysis results and troubleshoot common problems
  • Write and execute WDL analysis pipelines
Aims

During this course you will learn about:

  • Pre-processing of high-throughput sequencing data
  • Variant discovery (germline and somatic short variants, somatic CNV)
  • Germline variant filtering and evaluation
  • Pipelining strategies
Format

Presentations, demonstrations and practicals

Timetable

Day 1 Topics
9:30 - 12:30
  • Opening Remarks
  • Introduction to Sequencing Data
  • Introduction to Data Preprocessing
  • Introduction to Variant Discovery
  • Introduction to Pipelining Platforms

12:30 - 13:30 Lunch (not provided)
13:30 - 16:30
  • Terra Orientation
  • Case Study (hands-on)
Day 2 Topics
9:30 - 12:30
  • Introduction to Germline Variant Discovery
  • HaplotypeCaller
  • Joint Calling
  • Germline Variant Discovery Tutorial (hands-on)
12:30 - 13:30 Lunch (not provided)
13:30 - 16:30
  • Variant Filtering
  • Genotype Refinement
  • Callset Evaluation
  • Germline Hard Filtering Tutorial (hands-on)
Day 3 Topics
9:30 - 12:30
  • Introduction to Somatic Variant Discovery
  • Somatic SNVs and Indels
  • GATK4 Mutect2 Tutorial (hands-on)
12:30 - 13:30 Lunch (not provided)
13:30 - 16:30
  • Somatic CNAs
  • GATK4 Somatic CNA Tutorial (hands-on)
  • GATK Best Practices for SNP/Indel Variant Calling in Mitochondria (demo)
Day 4 Topics
9:30 - 12:30
  • The Basics of WDL and Cromwell
  • Hello World WDL Tutorial
  • WDL Puzzles
12:30 - 13:30 Lunch (not provided)
13:30 - 16:30
  • Docker
  • BigQuery

Tea/Coffee breaks each day: mid-morning and mid-afternoon
Registration Fees
  • Free for registered University of Cambridge students
  • £50/day for all University of Cambridge staff, including postdocs, temporary visitors (students and researchers) and participants from Affiliated Institutions. Please note that these charges are recovered by us at the Institutional level.
  • It remains the participant's responsibility to obtain prior approval from the relevant group leader, line manager or budget holder to attend the course. Please book only with the agreement of the relevant party, as costs will be charged back to your Lab Head or Group Supervisor.
  • £50/day for all other academic participants from external Institutions and charitable organizations. These charges must be paid at registration.
  • £100/day for all Industry participants. These charges must be paid at registration.
  • Further details regarding the charging policy are available here
Duration

4 days

Frequency

Once a year

Theme
Bioinformatics
