BDSI Interactive Seminar Series - Seminar 14 October

Dr Adrià Closa (JCSMR) presents 'From Big Data to Translational Research in Paediatric Acute Leukaemia' and Dr Hawlader Al-Mamun (CSIRO, Data61) discusses 'Finding Treasure (Epistatic Interactions) In A Dark Random Forest'.

Speakers

Dr Adrià Closa
Dr Hawlader Al-Mamun

Description

Dr Adrià Closa (Postdoctoral Fellow, JCSMR)

Title: From Big Data to Translational Research in Paediatric Acute Leukaemia

Abstract: The explosion of next-generation sequencing (NGS) techniques over the last ten years has improved the reliability, and reduced the time and cost, of genotyping individual patients. This has led to the construction of large-scale, nested online public databases.

These databases give researchers access to data produced at high velocity, in great volume and from a vast variety of sources: the three "V"s usually used to define Big Data. The analysis of genomic data is a prime example of how Big Data collection and analysis can lead to novel approaches in health care and personalised medicine. It opens a new window for translational medicine to find new therapeutic targets and to improve diagnosis and prognosis using genomic data from large cohorts of patients across multiple centres.

In this work, we present a practical example of how to apply big data analysis to genomic information from patients in multiple studies of paediatric acute lymphoblastic leukaemia in order to find new therapeutic targets.

Dr Hawlader Al-Mamun (Research Scientist, CSIRO, Data61)

Title: Finding Treasure (Epistatic Interactions) In A Dark Random Forest

Abstract: Many phenotypes and disease traits in humans, animals and plants are complex in nature and involve many genes and their interactions. Random Forest (RF) is a popular machine learning tool that is regularly reported to identify epistatic interactions. To date, however, RF-based approaches for identifying interactions have relied on variable importance measures. These measures cannot distinguish a true interaction from two variables that simply have strong marginal effects. Moreover, if the interacting variables have small marginal effects, they will not appear near the top of the variable importance list. Detecting interactions from variable importance alone is therefore problematic.

A two-step approach was designed and implemented to identify epistatic interactions whose variables have small or no marginal effects. First, pairs of variables that occur as parent-child pairs in the forest are tested against the null hypothesis. Pairs identified as potentially interacting are then tested in a second step for statistical evidence of a true interaction. The approach was evaluated on multiple simulated datasets and two real datasets. Simulation results demonstrated that the method was able to identify true interactions. The real-data analyses found a small number of interactions, which were subsequently shown to improve prediction accuracy when included as interaction variables in the input data. Although the method has so far been used only to identify marker interactions, the approach is equally applicable to detecting genotype-by-environment (GxE) interactions.
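To make the two-step idea concrete, here is a minimal Python sketch. It is not the speaker's implementation. Several parts are assumptions made for illustration:

- the step 1 null model (parent and child split variables chosen independently, checked with a one-sided binomial test);
- the step 2 test (an ordinary least-squares model with a product term and a Wald test, assuming a quantitative trait and SNPs coded 0/1/2);
- all helper names (`parent_child_pairs`, `screen_pairs`, `interaction_pvalue`, and the hypothetical `toy_tree`).

Trees are passed as parallel arrays in the layout scikit-learn uses for a fitted tree's `tree_` attribute.

```python
import math
from collections import Counter

import numpy as np


def parent_child_pairs(feature, children_left, children_right):
    """Yield (parent_var, child_var) for each split node whose child is also a split.

    Trees are parallel arrays in scikit-learn's `tree_` layout: node i splits on
    feature[i], and a leaf has children_left[i] == -1.
    """
    for node, (left, right) in enumerate(zip(children_left, children_right)):
        if left == -1:  # leaf node: nothing below it
            continue
        for child in (left, right):
            if children_left[child] != -1:  # child is itself a split node
                yield feature[node], feature[child]


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space to avoid overflow."""
    if k <= 0 or p >= 1.0:
        return 1.0
    if p <= 0.0:
        return 0.0
    logs = [math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p) for i in range(k, n + 1)]
    top = max(logs)
    return min(1.0, math.exp(top) * sum(math.exp(v - top) for v in logs))


def screen_pairs(trees, alpha=0.05):
    """Step 1 (assumed null): do two variables occur as parent-child more often
    than expected if parent and child split variables were chosen independently?"""
    edges = [(a, b) for t in trees for a, b in parent_child_pairs(*t) if a != b]
    n = len(edges)
    par = Counter(a for a, _ in edges)  # how often each variable is a parent
    chi = Counter(b for _, b in edges)  # how often each variable is a child
    flagged = []
    for pair, k in Counter(frozenset(e) for e in edges).items():
        a, b = tuple(pair)
        p = (par[a] * chi[b] + par[b] * chi[a]) / n ** 2  # expected pair rate
        if binom_upper_tail(k, n, p) < alpha:
            flagged.append(tuple(sorted(pair)))
    return sorted(flagged)


def interaction_pvalue(x1, x2, y):
    """Step 2 (assumed model): OLS fit of y ~ 1 + x1 + x2 + x1*x2, then a two-sided
    Wald test on the product term (normal approximation to the t distribution)."""
    X = np.column_stack([np.ones(len(y)), x1, x2, x1 * x2]).astype(float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(X.T @ X)[3, 3])
    return math.erfc(abs(beta[3] / se) / math.sqrt(2))


def toy_tree(root, left_var, right_var):
    """Hypothetical helper: a depth-2 tree whose root splits on `root` and whose
    two children split on `left_var` and `right_var` (leaf nodes use -2 / -1)."""
    return ([root, left_var, right_var, -2, -2, -2, -2],
            [1, 3, 5, -1, -1, -1, -1],
            [2, 4, 6, -1, -1, -1, -1])


if __name__ == "__main__":
    # Background trees over variables 0-9, plus five trees enriched for 0 -> 1.
    forest = [toy_tree(i, (i + 1) % 10, (i + 2) % 10) for i in range(10)]
    forest += [toy_tree(0, 1, 1)] * 5
    print("step 1 candidates:", screen_pairs(forest))

    # Simulated SNPs coded 0/1/2: x1 and x2 interact but have no marginal effect.
    rng = np.random.default_rng(0)
    x1, x2, x3 = (rng.integers(0, 3, 500) for _ in range(3))
    y = 1.5 * (x1 - 1) * (x2 - 1) + 0.5 * x3 + rng.normal(0, 1, 500)
    print("x1*x2 p-value:", interaction_pvalue(x1, x2, y))
    print("x1*x3 p-value:", interaction_pvalue(x1, x3, y))
```

With a fitted scikit-learn `RandomForestClassifier` or `RandomForestRegressor`, each element of `forest.estimators_` exposes these arrays as `tree_.feature`, `tree_.children_left` and `tree_.children_right`. A real analysis would also correct the step 1 p-values for the number of pairs tested; the sketch uses a fixed `alpha` for brevity.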

Seminars are held fortnightly on Mondays from 2 to 3pm at various locations, followed by light refreshments.

All welcome to attend.

Location

RSB, Eucalyptus Room S205