EHR-QC
A complete end-to-end pipeline to standardise and preprocess Electronic Health Records (EHR) for downstream integrative machine learning applications. This utility focuses on providing domain-specific tools for common standardisation and preprocessing tasks on healthcare data. Available as both a command-line and a web interface, it aims to provide a fully automated solution for data wrangling, allowing researchers to focus on the core modelling process.
EHR-ML
A generalisable automated pipeline for predicting clinical outcomes using electronic health records. Current machine learning research in healthcare often suffers from poor reproducibility due to a lack of standardised methods. EHR-ML allows researchers to perform machine learning analysis on any clinical outcome of interest using EHR data. It combines a domain-specific modelling technique with a comprehensive analytical suite, available through a user-friendly command-line and web interface. This allows researchers to build well-tuned, accurate, and context-specific models with little effort. Our goal is for EHR-ML to become the standard for clinical outcome prediction efforts.
GenomicBERT
A foundational genome language model for processing DNA, RNA, or protein data. While natural language processing (NLP) has been effective at preprocessing and extracting “meaning” from human language, its use in biology has largely focused on literature and electronic health records. However, genomic sequence data shares notable similarities with human languages, making it well-suited to NLP: (1) DNA is composed of text strings over the alphabet A, C, T, G, with its own semantics and grammar; (2) vast amounts of biological data are publicly available and growing exponentially; and (3) recent machine learning advances enhance the scalability of deep learning for genomic data analysis.
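To make the "DNA as text" analogy concrete, here is a minimal Python sketch of one common way to turn a nucleotide string into word-like tokens: overlapping k-mers mapped to integer ids. This is an illustration only; the function names `kmer_tokenize` and `build_vocab` are hypothetical, and the description above does not state which tokenisation scheme GenomicBERT itself uses.

```python
def kmer_tokenize(sequence, k=3, stride=1):
    """Split a nucleotide sequence into k-mer 'words' (overlapping when stride < k)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokens):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab


# Hypothetical usage: tokenise a short sequence and convert it to ids,
# the form a language model would typically consume.
tokens = kmer_tokenize("ACGTAC", k=3)
vocab = build_vocab(tokens)
ids = [vocab[t] for t in tokens]
```

With `stride=1` neighbouring tokens overlap by `k - 1` bases, which preserves local context at the cost of a longer token sequence; setting `stride=k` gives non-overlapping tokens instead.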
CRM Finder
A novel pipeline for predicting co-binding between transcription factors (TFs) and generating cis-regulatory clusters from DNA sequences. It implements two approaches to TF binding prediction: a feature-based random forest classifier (RFC) and a deep learning approach using a convolutional neural network (CNN). After co-binding prediction, clusters of TFs are iteratively generated for each gene. The utility is accessible to everyone through a web interface, where users can enter a TF of interest to find the clusters it belongs to and the associated genes, along with a score. The work also includes methods for generating gene regulatory networks (GRNs), focusing on cardiac data.
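As a rough illustration of how a DNA sequence is usually prepared for a CNN-based binding predictor, the Python sketch below one-hot encodes each base into a four-element row. It is an assumption-laden example and not CRM Finder's actual preprocessing: the name `one_hot_encode`, the A, C, G, T column order, and the all-zero encoding for ambiguous bases such as N are all choices made here for illustration.

```python
BASES = "ACGT"  # assumed column order for the encoding


def one_hot_encode(sequence):
    """Return one row per base: a 4-element 0/1 list in A, C, G, T order.

    Bases outside A/C/G/T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


# Hypothetical usage: a length-L sequence becomes an L x 4 matrix,
# the typical input shape for a 1D convolution over sequence positions.
matrix = one_hot_encode("ACGTN")
```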