In this talk, I will first give an overview of recent research in the Structural and Functional Bioinformatics (SFB) group at KAUST. My group develops efficient algorithms and machine learning techniques to understand the path from protein sequences to structures to functions.
I will then focus on our efforts on protein 3D structure determination based on nuclear magnetic resonance (NMR) data. NMR is one of the two main methods for protein structure determination. Currently, it takes weeks to months of human labor to determine a protein structure after the NMR experiments are done. This labor covers three steps: peak picking, resonance assignment, and structure calculation. Fully automating this multi-step process would significantly speed up structural biology research. I will identify the key obstacles in NMR data processing and propose computational solutions to them. I will discuss our efforts on developing signal processing techniques for the peak picking and peak selection problems, optimization techniques for the resonance assignment problem, and machine learning techniques for the structure calculation problem. Each of these methods is designed to tolerate the noise and imperfections in the output of the others, and each significantly outperforms the state-of-the-art approaches. As a proof of concept, we combined the proposed methods into a single system, which has succeeded in determining high-resolution protein structures from a small set of NMR spectra within a day.