Artificial Intelligence Helps Identify Molecular Structures

University of California San Diego researchers developed a deep learning-based method to identify the molecular structures of natural products derived from sources such as soil microorganisms, terrestrial plants, and marine life forms.

According to the researchers, SMART (Small Molecule Accurate Recognition Technology) “has the potential to accelerate the molecular structure identification process ten-fold. This development could represent a paradigm shift in the chemical analysis, pharmaceutical and drug discovery fields since 70 percent of all Food and Drug Administration (FDA)-approved drugs are based on natural products.”

“The structure of a molecule is the enabling information,” said Bill Gerwick, professor of oceanography and pharmaceutical sciences at UC San Diego’s Scripps Institution of Oceanography. “You have to have the structure for any FDA approval. If you want to have intellectual property, you have to patent that structure. If you want to make analogs of that molecule, you need to know what the starting molecule is. It’s a critical piece of information.”

Using TITAN X and GTX 1080 GPUs with the cuDNN-accelerated Lasagne (Theano) deep learning framework, the team of interdisciplinary researchers trained their convolutional neural network on thousands of heteronuclear single quantum coherence nuclear magnetic resonance (HSQC NMR) spectra. The network takes a 2D image of the HSQC NMR spectrum of an unknown molecule and maps it to a point in a 10-dimensional space, where it lands near similar molecules, making it easier for researchers to elucidate the unknown molecule’s structure.
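To make the idea of "landing near similar molecules" concrete, here is a minimal sketch of a nearest-neighbor lookup in a 10-dimensional embedding space. It is not the researchers' implementation: the function name, the compound names, the toy vectors, and the choice of Euclidean distance are all assumptions made for illustration, and the trained network that would produce real embeddings is not shown.

```python
import math

def nearest_neighbors(query, library, k=3):
    """Return the k library entries closest to `query`.

    `query` is an embedding vector (hypothetically, the network's output
    for an unknown molecule's HSQC spectrum). `library` maps compound
    names to embedding vectors of the same length. Euclidean distance is
    an assumption here, not necessarily the metric used by SMART.
    """
    scored = []
    for name, vector in library.items():
        if len(vector) != len(query):
            raise ValueError(f"dimension mismatch for {name}")
        scored.append((math.dist(query, vector), name))
    scored.sort()  # smallest distance first
    return [(name, dist) for dist, name in scored[:k]]

# Toy 10-dimensional embeddings (made-up values, hypothetical names).
library = {
    "compound_A": [0.0] * 10,
    "compound_B": [1.0] * 10,
    "compound_C": [0.1] * 10,
}
unknown = [0.08] * 10

for name, dist in nearest_neighbors(unknown, library, k=2):
    print(f"{name}: distance {dist:.3f}")
```

In this toy example the unknown point sits closest to compound_C, so a researcher would look first at that compound's structural family as a starting hypothesis.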

The SMART cluster map, based on the training results from 2,054 HSQC spectra over 83,000 iterations, with inset boxes representing different compound classes discussed in the text.

Chen Zhang, a nanoengineering Ph.D. student at UC San Diego who collaborates with Gerwick and is the first author of the paper, said that determining a molecule’s structure can be a bottleneck in natural product research, taking experts months or even years to establish the correct and complete structure. While each molecule and its identification timeline are different, the SMART approach gives researchers an early clue about which family a new molecule belongs to, drastically reducing the time it takes to characterize a new natural product.

The researchers also note that SMART has immediate value in natural product drug discovery because it provides fast, automatic compound dereplication and assignment to molecular structure families.
