BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//132.216.98.100//NONSGML kigkonsult.se iCalcreator 2.20.4//
BEGIN:VEVENT
UID:20250724T191406EDT-6015ulM1MU@132.216.98.100
DTSTAMP:20250724T231406Z
DESCRIPTION:Andreas Ziegler\, Dr. rer. nat\n\nScientific Director and CEO | Research Group Cardio-CARE\n\nWhere: Hybrid Event | 2001 McGill College\, Room 1140\; Zoom\n\nAbstract\n\nRapid advances in high-throughput DNA sequencing technologies have enabled large-scale whole genome sequencing (WGS) studies. In this presentation\, we describe the pre-processing pipeline and quality control framework we selected for the GENEtic SequencIng Study Hamburg-Davos (GENESIS-HD)\, a study of more than 9000 human whole genomes. All samples were sequenced on a single Illumina NovaSeq 6000 at an average coverage of 35x using a PCR-free protocol and unique dual indices (UDI). For quality control\, one Genome in a Bottle (GIAB) trio was sequenced in triplicate\, and one GIAB sample was sequenced 70 times across different runs. First\, we explain the sequencing approach using illustrations. We then describe key quality control metrics for the raw data (FASTQ files)\, after mapping and alignment (BAM files)\, after variant calling (gVCF files)\, and after multi-sample calling (msVCF files). We provide empirical data on efficient sample storage using Original Read Archive (ORA) compression of FASTQ files. Finally\, we sketch methods tailored to downstream association analysis and how they are incorporated into our analysis pipeline. The most important quality metrics for sample filtering were ancestry\, sample cross-contamination\, deviation from the expected Het/Hom ratio\, relatedness\, and insufficient coverage. We detected patterns of sample cross-contamination that indicate cross-contamination via a multichannel pipette. ORA compression reduced FASTQ files to approximately one-fifth of their original size\, and compression time scaled linearly with the number of mismatched bases. In summary\, the pre-processing\, joint calling\, and QC of large WGS studies are now feasible in reasonable time\, and efficient quality control procedures are readily available.\n\nSpeaker Bio\n\nAndreas is Scientific Director and CEO of the non-profit research group Cardio-CARE in Davos\, Switzerland\, which has been a wholly owned subsidiary of the Kühne Foundation since 2020. Previously\, he was director of the Institute of Medical Biometry and Statistics at the University of Lübeck\, Germany. He was president of the German Region of the International Biometric Society and of the International Genetic Epidemiology Society. His research spans machine learning\, clinical trials for medical devices\, and genetic epidemiology. He has authored or co-authored more than 500 research articles and 8 books\, including the textbook 'A Statistical Approach to Genetic Epidemiology'. Over the past three years\, Andreas’ main focus has been the whole genome sequencing study described in this presentation.\n
DTSTART:20230118T203000Z
DTEND:20230118T213000Z
SUMMARY:Pre-processing and quality control of whole genome sequencing data: a case study using 9000 whole genomes from the GENESIS-HD study
URL:/epi-biostat-occh/channels/event/pre-processing-and-quality-control-whole-genome-sequencing-data-case-study-using-9000-whole-genomes-344373
END:VEVENT
END:VCALENDAR