Variant Discovery with GATK4
This workshop will focus on the core steps involved in calling variants with the Broad Institute’s Genome Analysis Toolkit (GATK), using the “Best Practices” developed by the GATK team. You will learn why each step is essential to the variant discovery process, what operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset.
In the course of this workshop, we highlight key functionalities such as the germline GVCF workflow for joint variant discovery in cohorts, somatic variant discovery using Mutect2, and copy number variation discovery using GATK CNV. All analyses are demonstrated using GATK version 4. Finally, we demonstrate the use of pipelining tools to assemble and execute GATK workflows.
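As a rough illustration of the germline GVCF workflow mentioned above, the sketch below assembles the three command lines usually involved: per-sample calling with HaplotypeCaller in GVCF mode, consolidation with GenomicsDBImport, and joint genotyping with GenotypeGVCFs. It is not taken from the workshop materials. The helper name `gvcf_workflow_commands` and all file names are hypothetical, and the function only builds the commands without running them.

```python
from typing import List


def gvcf_workflow_commands(
    reference: str,
    sample_bams: List[str],
    interval: str,
    workspace: str,
    output_vcf: str,
) -> List[List[str]]:
    """Build (but do not run) GATK4 commands for a minimal GVCF joint-calling run.

    All arguments are placeholders chosen for illustration.
    """
    commands: List[List[str]] = []
    gvcfs: List[str] = []

    # Step 1: call each sample separately in GVCF mode.
    for bam in sample_bams:
        gvcf = bam.rsplit(".", 1)[0] + ".g.vcf.gz"
        gvcfs.append(gvcf)
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", reference, "-I", bam, "-O", gvcf,
            "-ERC", "GVCF", "-L", interval,
        ])

    # Step 2: consolidate the per-sample GVCFs into a GenomicsDB workspace.
    import_cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace, "-L", interval,
    ]
    for gvcf in gvcfs:
        import_cmd += ["-V", gvcf]
    commands.append(import_cmd)

    # Step 3: joint genotyping across the whole cohort.
    commands.append([
        "gatk", "GenotypeGVCFs",
        "-R", reference, "-V", "gendb://" + workspace, "-O", output_vcf,
    ])
    return commands


if __name__ == "__main__":
    # Hypothetical inputs; print the commands instead of executing them.
    for cmd in gvcf_workflow_commands(
        "ref.fasta", ["mother.bam", "father.bam"], "chr20",
        "cohort_db", "cohort.vcf.gz",
    ):
        print(" ".join(cmd))
```

Building the commands as argument lists keeps the sketch free of any dependency on a GATK installation. In practice such steps would more likely be wired together in a WDL workflow, as covered on Day 4.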
The workshop covers basic genomics and all currently supported Best Practices pipelines, as well as pipelining with WDL, Cromwell and FireCloud. This includes the logic of the major pipelines, the file formats and data transformations involved, and hands-on operation of the tools using goal-oriented exercises.
- Day 1: Introduction to Genomics, GATK Best Practices and Pipelining
- Day 2: Germline short variant discovery (SNPs + Indels)
- Day 3: Somatic variant discovery (SNVs + Indels + CNVs)
- Day 4: Writing pipelines with WDL and running them in FireCloud
Please note that this workshop is focused on human data analysis. Most of the material presented applies equally to non-human data, and we will address some questions about the adaptations needed for analysis of non-human data, but we will not go into much detail on those points.
Please note that if you are not eligible for a University of Cambridge Raven account, you will need to book or register interest via the link here.
- The course is aimed primarily at mid-career scientists – especially those whose formal education likely included statistics, but who have perhaps not put it into practice since.
- Graduate students, Postdocs and Staff members from the University of Cambridge, Affiliated Institutions and other external Institutions or individuals
- Please be aware that these courses are only free for University of Cambridge students. All other participants will be charged a registration fee in some form. Registration fees and further details regarding the charging policy are available here.
- Further details regarding eligibility criteria are available here
- Familiarity with the basic terms and concepts of genetics and genomics.
- Basic familiarity with the command line environment is required.
- Sufficient UNIX experience might be obtained from one of the many UNIX tutorials available online.
Number of sessions: 4
# | Date | Time | Venue | Trainers |
---|---|---|---|---|
1 | Mon 16 Jul 2018 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Geraldine Van der Auwera, Eric Banks, Kate Voss, Takuto Sato, Soo Hee Lee |
2 | Tue 17 Jul 2018 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Geraldine Van der Auwera, Eric Banks, Kate Voss, Takuto Sato, Soo Hee Lee |
3 | Wed 18 Jul 2018 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Geraldine Van der Auwera, Eric Banks, Kate Voss, Takuto Sato, Soo Hee Lee |
4 | Thu 19 Jul 2018 | 09:30 - 16:30 | Bioinformatics Training Room, Craik-Marshall Building | Geraldine Van der Auwera, Eric Banks, Kate Voss, Takuto Sato, Soo Hee Lee |
Bioinformatics, Data handling, Data mining, Data visualisation, Genomics, Sequence variations
After this course you should be able to:
- Understand the overall variant discovery workflow rationale and requirements
- Understand key methods and functionalities in light of the latest research
- Understand key differences between germline and somatic variant discovery approaches
- Apply analysis tools and Best Practices workflows to a real data set
- Interpret analysis results and troubleshoot common problems
- Write and execute WDL analysis pipelines
During this course you will learn about:
- Pre-processing of high-throughput sequencing data
- Variant discovery (germline and somatic short variants, somatic CNV)
- Germline variant filtering and evaluation
- Pipelining strategies
Presentations, demonstrations and practicals
Day 1 | Topics |
9:30-9:45 | Opening remarks |
9:45-10:15 | Introduction to Sequence data / pre-processing workflow |
10:15-10:45 | Introduction to Germline variant discovery Best Practices workflows |
10:45-11:15 | Tea/coffee break |
11:15-11:45 | Introduction to Somatic variant discovery Best Practices workflows |
11:45-12:15 | Introduction to pipelining with WDL & Cromwell & FireCloud |
12:15-12:30 | Closing question time |
12:30-13:30 | Lunch (not provided) |
13:30-13:55 | Mapping |
13:55-14:20 | Marking duplicates |
14:20-14:45 | Base recalibration (BQSR) |
14:45-15:15 | Tea/coffee break |
15:15-16:30 | Hands-on IGV + GATK4 basics |
Day 2 | Topics |
9:30-9:45 | Recap of germline variant discovery Best Practices |
9:45-10:15 | HaplotypeCaller |
10:15-10:45 | Joint-calling with GenomicsDB + GenotypeGVCFs |
10:45-11:15 | Tea/coffee break |
11:15-12:30 | Hands-on joint-calling |
12:30-13:30 | Lunch (not provided) |
13:30-14:00 | Filtering with VQSR |
14:00-14:30 | Genotype Refinement |
14:30-15:00 | Callset Evaluation |
15:00-15:30 | Tea/coffee break |
15:30-16:30 | Hands-on filtering approaches |
Day 3 | Topics |
9:30-9:45 | Recap of somatic variant discovery Best Practices |
9:45-10:30 | Somatic SNVs and indels with Mutect2 |
10:30-11:00 | Tea/coffee break |
11:00-12:30 | Hands-on Mutect2 |
12:30-13:30 | Lunch (not provided) |
13:30-14:00 | Somatic CNVs with GATK CNV |
14:00-15:15 | Hands-on GATK CNV |
15:15-15:45 | Tea/coffee break |
15:45-16:15 | Preview of upcoming methods: germline CNV and SV |
16:15-16:30 | Open question time |
Day 4 | Topics |
9:30-9:45 | WDL/Cromwell 101 |
9:45-10:45 | Hands-on WDL/Cromwell |
10:45-11:15 | Tea/coffee break |
11:15-12:30 | Self-paced WDL exercises |
12:30-13:30 | Lunch (not provided) |
13:30-13:45 | FireCloud 101 |
13:45-14:45 | Hands-on FireCloud Part 1 |
14:45-15:15 | Tea/coffee break |
15:15-16:30 | Hands-on FireCloud Part 2 |
- Free for University of Cambridge students
- £50/day for all University of Cambridge staff, including postdocs, and participants from Affiliated Institutions. Please note that these charges are recovered by us at the institutional level.
- It remains the participant's responsibility to obtain prior approval from the relevant group leader, line manager or budget holder to attend the course. Please book only with the agreement of the relevant party, as costs will be charged back to your Lab Head or Group Supervisor.
- £50/day for all other academic participants from external Institutions and charitable organizations. These charges must be paid at registration.
- £100/day for all Industry participants. These charges must be paid at registration.
- Further details regarding the charging policy are available here
Duration: 4 days
Frequency: Once a year
- Introduction to genome variation analysis using NGS
- Introduction to high-throughput sequencing data analysis
- EMBL-EBI: European Variation Archive