Externally Funded Research Projects

Predictive Methods for Analyzing High-throughput and Spatial-temporal Data, NSERC Individual Discovery Grant, 2019–2024, PI.

Predictive Analysis for High-throughput Data
The accelerated development of high-throughput sequencing biotechnologies has made it affordable to collect high-dimensional molecular-level profiles, such as gene expression, which are generally called features. It is of great interest to identify relevant features associated with a phenotype (e.g., cancer status or a health disorder). Many researchers have advocated applying statistical learning methods to perform predictive analysis for high-throughput data. Predictive analysis results can be used in many ways: to diagnose human diseases, to predict response to a medicine (personalized medicine), or by plant/animal breeders to choose an optimal gene subset for further experiments; in addition, the subset of features extracted from good predictive models can facilitate uncovering the biological mechanism underlying a phenotype. Unfortunately, high-dimensionality causes severe overfitting in predictive analysis even with very simple models: the chance of finding falsely predictive features/patterns is extremely high. It is therefore challenging to guard against false discovery in predictive analysis. My research in this theme aims to develop new tools for honestly measuring the predictivity (e.g., error rate, AUC) of selected features, for identifying truly predictive features, and for building sharper predictive models for phenotypes. I also practice predictive analysis with specific high-throughput datasets in a variety of scientific problems related to human health.
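The false-discovery risk described above can be made concrete with a small simulation. The sketch below is purely illustrative (the difference-of-means classifier, sample sizes, and NumPy implementation are my own assumptions, not taken from any funded project): selecting features on the full data and then evaluating on the same data reports high accuracy even when all features are pure noise, whereas redoing the selection inside each cross-validation fold gives an honest, chance-level estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 50, 1000, 10           # samples, noise features, features kept

X = rng.standard_normal((n, p))  # pure-noise "expression" matrix
y = rng.integers(0, 2, n)        # random binary phenotype: no true signal

# Biased protocol: select features on ALL data, evaluate on the same data.
d_full = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
top = np.argsort(np.abs(d_full))[-k:]          # k most separating features
score = X[:, top] @ np.sign(d_full[top])       # sign-weighted sum score
acc_biased = np.mean((score > np.median(score)).astype(int) == y)

# Honest protocol: redo the selection inside each leave-one-out fold.
correct = 0
for i in range(n):
    tr = np.arange(n) != i
    Xt, yt = X[tr], y[tr]
    d = Xt[yt == 1].mean(axis=0) - Xt[yt == 0].mean(axis=0)
    sel = np.argsort(np.abs(d))[-k:]
    w = np.sign(d[sel])
    thr = np.median(Xt[:, sel] @ w)            # threshold from training data only
    correct += int((X[i, sel] @ w > thr) == y[i])
acc_honest = correct / n

print(f"biased accuracy: {acc_biased:.2f}   honest accuracy: {acc_honest:.2f}")
```

The biased estimate is far above chance only because the selection step has already seen the test labels; nesting the selection inside the resampling loop removes that leak, which is the essence of honestly measuring predictivity.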

Predictive Model Evaluation Methods for Spatial-temporal Data
In science, a theory is tested by making predictions for future observations. Significant discrepancies between observations and predictions suggest that the theory is incorrect or flawed. Similarly, examining out-of-sample predictions is a straightforward method for comparing statistical models and checking their goodness-of-fit (GOF). Today, increasingly complex models are being proposed for a variety of correlated data, such as temporal, spatial, and repeated-measurements data. More widely applicable predictive methods for comparing and checking such complex models are in demand. My research in this theme aims to develop new tools for evaluating complex Bayesian/non-Bayesian models with correlated random effects, with applications in many areas such as epidemiology, ecology, and environmental sciences.
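The out-of-sample principle above can be illustrated with a toy temporal example (the AR(1) setup, the 200/100 train/test split, and the least-squares coefficient estimate below are illustrative assumptions, not any specific method from these projects): a model that accounts for serial correlation gives clearly better one-step-ahead forecasts than one that ignores it, and the held-out prediction error exposes the difference.

```python
import numpy as np

rng = np.random.default_rng(1)
T, phi = 300, 0.8
y = np.zeros(T)
for t in range(1, T):                  # simulate an AR(1) series (toy data)
    y[t] = phi * y[t - 1] + rng.standard_normal()

split = 200                            # fit on the past, predict the future
train, test = y[:split], y[split:]

# Model A ignores the correlation: always predict the training mean.
pred_a = np.full(T - split, train.mean())

# Model B models the correlation: one-step-ahead AR(1) forecasts, with the
# autoregressive coefficient estimated by least squares on the training part.
phi_hat = (train[1:] @ train[:-1]) / (train[:-1] @ train[:-1])
pred_b = phi_hat * y[split - 1:T - 1]  # forecast y[t] from the observed y[t-1]

mse_a = np.mean((test - pred_a) ** 2)  # out-of-sample mean squared error
mse_b = np.mean((test - pred_b) ** 2)
print(f"out-of-sample MSE: mean model {mse_a:.2f}, AR(1) model {mse_b:.2f}")
```

Both models fit the training data reasonably, yet the out-of-sample errors separate them sharply; the same logic, with predictive scores in place of squared error, underlies predictive checks for far more complex Bayesian models with correlated random effects.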

Developing a Web-based Geospatial Artificial Intelligence Framework to Track, Visualize, Analyze, Model, and Predict Infectious Disease Spread in Real Time, MITACS Accelerate Grant, 2020–2021, PI.

Genotype & Environment to Phenotype, subproject of the Canada First Research Excellence Fund (CFREF) project "Designing Crops for Global Food Security", $756,918, 2016–2019, Co-Investigator (PI: Prof. Kusalik).

Applications of Neural Network Curve-Fitting Methods for Least-squares Monte Carlo Simulations in Financial Risk Management, MITACS Accelerate Internship Fund, 2016, PI.

Bayesian Methods for High-dimensional and Correlated Data, NSERC Individual Discovery Grant, 2014–2019, PI.

Efficient Bayesian Analysis for Complex Models, NSERC Individual Discovery Grant, 2009–2014, PI.

A Computer Cluster for Research on Efficient Bayesian Statistical Methods, CFI Leaders Opportunity Fund, 2009, PI.

Clustering Analysis for Detecting the Types of Vehicles, MITACS Accelerate Internship Fund, 2008, Co-PI with Prof. Laverty.