Sonika Tyagi Lab

Welcome to the Sonika Tyagi Lab

Welcome to the home of our bioinformatics and digital health research laboratory. We develop new machine learning methods and pipelines, and apply them to biological and clinical research questions.

We are passionate about science and solving scientific problems. Our group is dedicated to providing an inclusive and safe working environment for everybody. We expect all members, existing and new, to uphold the core values of research integrity.

EHR-QC

A complete end-to-end pipeline to standardise and preprocess Electronic Health Records (EHR) for downstream integrative machine learning applications. The utility provides domain-specific tools for common standardisation and preprocessing tasks on healthcare data. Available as both a command-line and a web interface, it aims to fully automate data wrangling so that researchers can focus on the core modelling process.

EHR-ML

A generalisable automated pipeline for predicting clinical outcomes using electronic health records. Current machine learning research in healthcare often suffers from poor reproducibility due to a lack of standardised methods. EHR-ML allows researchers to perform machine learning analysis on any clinical outcome of interest using EHR data. It combines a domain-specific modelling technique with a comprehensive analytical suite, available through a user-friendly command-line and web interface. This allows researchers to build well-tuned, accurate, and context-specific models with minimal effort. Our goal is for EHR-ML to become the standard for clinical outcome prediction efforts.

GenomicBERT

A foundational genome language model to process DNA, RNA, or protein data. While natural language processing (NLP) has been effective at preprocessing and extracting “meaning” from human language, its use in biology has largely focused on literature and electronic health records. However, genomic sequence data shares notable similarities with human languages, making it well-suited for NLP: (A) DNA can be written as text strings over the letters A, C, G, and T, with its own semantics and grammar, (B) vast amounts of biological data are publicly available and growing exponentially, and (C) recent machine learning advances improve the scalability of deep learning for genomic data analysis.
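To make the "DNA as text" idea concrete, here is a minimal sketch of one common way to turn a sequence into word-like tokens: overlapping k-mers mapped to integer ids. This is an illustration only and is not taken from GenomicBERT; the function names `kmer_tokenize` and `build_vocab`, the choice of k = 3, and the example sequence are all assumptions.

```python
from collections import Counter


def kmer_tokenize(sequence: str, k: int = 3, stride: int = 1) -> list[str]:
    """Split a sequence into overlapping k-mers (hypothetical helper)."""
    seq = sequence.upper()
    # Slide a window of width k along the sequence, moving `stride` letters each step.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    """Map each distinct token to an integer id (hypothetical helper)."""
    counts = Counter(tokens)
    # Most frequent tokens get the smallest ids; ties are broken alphabetically.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(ordered)}


tokens = kmer_tokenize("ACGTACGT", k=3)
vocab = build_vocab(tokens)
ids = [vocab[tok] for tok in tokens]
print(tokens)
print(ids)
```

The resulting id sequence plays the same role as a tokenised sentence in a text language model. Other tokenisation schemes are possible, and a real model may use a different one.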

CRM Finder

A novel pipeline for predicting co-binding between transcription factors (TFs) and generating cis-regulatory clusters from DNA sequences. It implements two approaches for TF binding prediction: a feature-based random forest classifier (RFC) and a deep learning approach using a convolutional neural network (CNN). After co-binding prediction, clusters of TFs are iteratively generated for each gene. The utility is accessible to everyone through a web interface, where users can enter a TF of interest to find the clusters it belongs to and the associated genes, along with a score. The work also includes methods for generating gene regulatory networks (GRNs), with a focus on cardiac data.
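As a rough illustration of how a DNA sequence can be presented to a CNN, the sketch below one-hot encodes each base into a four-element row. This is a widely used input encoding, shown here under assumptions; it is not confirmed to be the encoding CRM Finder uses, and the name `one_hot_encode` and the base order `ACGT` are hypothetical choices.

```python
BASES = "ACGT"  # assumed channel order for the encoding


def one_hot_encode(sequence: str) -> list[list[int]]:
    """Encode a DNA string as a list of 4-element rows (hypothetical helper)."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in BASES:
            # Set the channel that corresponds to this base.
            row[BASES.index(base)] = 1
        # Any other symbol (for example N) is left as an all-zero row.
        rows.append(row)
    return rows


encoded = one_hot_encode("ACGN")
for row in encoded:
    print(row)
```

Each sequence becomes a length-by-4 matrix, which a convolutional layer can scan for short motif-like patterns.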

Key research areas

We use cutting-edge AI and genomics technologies to discover new treatments and improve healthcare, with significant outcomes for the academic and clinical communities.

Biomedical data standardisation

Multimodal data integration for personalised medicine

Integrative Genomics

Natural language processing of unstructured data
