Incorporating Principles of Genetic Structure into the Practice of Genome-wide Association Studies

General Overview

Correlation is not causation, but correlation is generally what we can measure. This project aims to infer causation from correlation under very specific conditions. We consider a large number of features measured on a smaller number of individuals, with the goal of understanding a particular response. Assuming that one of the features causes the response, we aim to prioritise its discovery by accounting for interrelationships among the features.

Domain Overview

The project investigates the deep connections between genetic structure (population genetic processes, linkage disequilibrium, population structure) and the ability to computationally detect genetic variants responsible for variation in traits. The purpose is to develop principles and a guiding framework for a particular class of genome-wide association studies.

Technical Overview

This project addresses an instance of supervised learning in which the features are related by a known covariance structure that results from a well-studied generating process. The idea is to modify the criterion to be optimised so that, under certain assumptions, features are selected according to their propensity to be causal (as opposed to merely predictive).
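
To make the idea concrete, here is a minimal sketch of one way a selection criterion could use a known feature covariance. It is an illustration under stated assumptions, not the project's actual method. The assumptions are: an AR(1) covariance stands in for the known structure (for example, linkage disequilibrium), exactly one feature has a linear effect on the response, and each candidate is scored by how well its covariance column explains the whole vector of marginal associations. All function names are hypothetical.

```python
import numpy as np


def ar1_covariance(p, rho):
    """Known feature covariance (assumed AR(1) form): Sigma[i, j] = rho**|i - j|."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def simulate(n, sigma, causal, beta, seed=0):
    """Draw n individuals with covariance sigma; a single feature drives the response."""
    rng = np.random.default_rng(seed)
    p = sigma.shape[0]
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    y = beta * X[:, causal] + rng.normal(size=n)
    return X, y


def marginal_scores(X, y):
    """Sample covariance of each feature with the response (purely predictive view)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return Xc.T @ yc / len(y)


def structure_aware_scores(z, sigma):
    """Score candidate k by the squared projection of z onto column k of sigma.

    If feature c alone is causal, the population marginal covariances are
    z = beta * sigma[:, c], so the column that best explains z is column c.
    """
    numerator = (sigma.T @ z) ** 2
    denominator = np.sum(sigma ** 2, axis=0)
    return numerator / denominator


if __name__ == "__main__":
    sigma = ar1_covariance(p=20, rho=0.9)
    X, y = simulate(n=200, sigma=sigma, causal=7, beta=0.5, seed=1)
    z = marginal_scores(X, y)
    print("top marginal feature:       ", int(np.argmax(np.abs(z))))
    print("top structure-aware feature:", int(np.argmax(structure_aware_scores(z, sigma))))
```

With exact (population) marginal covariances, the structure-aware score is maximised at the causal feature by the Cauchy-Schwarz inequality, since no other column of this covariance is proportional to the causal column. With finite samples both rankings are noisy, and how much the structure-aware score helps depends on the sample size and the strength of the correlations.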