Stephen Chen L&S Math & Physical Sciences
Predicting Plant Gene Expression Levels Using Deep Neural Networks
Proteins carry out most of the work in living cells, so how a cell operates depends heavily on the relative abundances of its proteins at any given time. These abundances are largely determined by gene expression levels: the extent to which each gene is “activated.” Predicting gene expression levels from noncoding DNA sequence remains a major unsolved problem in computational biology. Recently, machine learning has emerged as one of the most effective tools for this task. While significant work has been done on human genomes, much remains to be done on plant genomes. This project aims to use modern machine learning models, particularly convolutional and transformer-based neural networks, to accurately predict the expression levels of plant genes from DNA sequence alone. An effective tool for predicting gene expression in plants could have applications in agriculture, renewable energy, and carbon capture.