With MIT colleagues in the Machine Learning for Pharmaceutical Discovery and Synthesis (MLPDS) consortium, we continue to develop techniques for synthesis planning and for analyzing data from high-throughput experimentation (HTE), with particular emphasis on process chemistry applications. For example, we have explored using reinforcement learning to include green chemistry considerations in computer-aided synthesis planning (CASP). We have employed a dynamic tree-structured long short-term memory model that has learned strategic relationships among single-step reactions to prioritize particular pathways among the thousands of suggestions typically proposed by retrosynthesis tools such as ASKCOS. Combining optimization and CASP techniques offers the opportunity to design overlapping synthesis plans that minimize the overall number of starting materials, catalysts, solvents, and reagents needed while maximizing the likelihood of synthesis success. Current efforts aim to integrate systems engineering and CASP tools to prioritize reaction pathways according to process chemistry metrics such as the number of steps, cost, solvent choice, and sustainability. Impurity prediction is a long-term research challenge closely tied to reaction prediction efforts with MLPDS.
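The idea of overlapping synthesis plans can be illustrated with a minimal sketch. It assumes each target molecule has a few candidate routes, each described by a set of required materials and an estimated success likelihood, and it picks one route per target to trade off the joint likelihood against the number of distinct materials. The route data, the `material_penalty` weight, and the function name `select_routes` are hypothetical, and exhaustive enumeration stands in here for a proper optimization formulation over a reaction network.

```python
from itertools import product

# Hypothetical candidate routes: target -> list of (required materials, success likelihood).
CANDIDATE_ROUTES = {
    "target_A": [({"sm1", "sm2", "cat1"}, 0.90), ({"sm3", "sm4", "cat2"}, 0.95)],
    "target_B": [({"sm1", "sm5", "cat1"}, 0.80), ({"sm6", "sm7", "cat3"}, 0.85)],
    "target_C": [({"sm2", "sm5", "cat1"}, 0.70), ({"sm8", "cat2"}, 0.75)],
}


def select_routes(candidates, material_penalty=0.1):
    """Pick one route per target by brute force.

    The score is the product of route likelihoods minus a penalty per
    distinct material (an assumed, illustrative objective).
    Returns (score, {target: route index}, materials, joint likelihood).
    """
    targets = sorted(candidates)
    best = None
    # Enumerate every combination of one route index per target.
    for choice in product(*(range(len(candidates[t])) for t in targets)):
        materials = set()
        likelihood = 1.0
        for target, index in zip(targets, choice):
            route_materials, route_likelihood = candidates[target][index]
            materials |= route_materials      # shared materials count once
            likelihood *= route_likelihood
        score = likelihood - material_penalty * len(materials)
        if best is None or score > best[0]:
            best = (score, dict(zip(targets, choice)), materials, likelihood)
    return best


if __name__ == "__main__":
    score, chosen, materials, likelihood = select_routes(CANDIDATE_ROUTES)
    print(chosen, sorted(materials), round(likelihood, 3))
```

With the penalty set to zero the selection reduces to choosing the most likely route for each target independently; a positive penalty favors routes that share starting materials and catalysts. Enumeration grows exponentially with the number of targets, so it is only suitable for toy cases like this one.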
To facilitate the adoption of enzymatic pathways, we have developed a single-step retrosynthesis search algorithm to enable enzyme-based synthesis of natural product analogs. This work demonstrates that molecular similarity is an effective metric for proposing retrosynthetic disconnections by analogy to precedent enzymatic reactions. A second model discriminates between reaction pairs belonging to homologous enzymes and those belonging to evolutionarily distant enzymes, and thus provides an estimate of the likelihood that experimental enzyme evolution will succeed. Recursive use of the similarity-based single-step retrosynthesis and evolution prediction workflow has produced enzymatic synthesis routes for both active pharmaceutical ingredients and commodity chemicals. Current efforts focus on methods to identify chemical steps in a retrosynthesis pathway that enzymatic transformations can replace.
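The similarity-based lookup of precedent reactions can be sketched roughly as follows. The sketch assumes molecules are represented as sets of substructure labels (a stand-in for real molecular fingerprints) and ranks precedent enzymatic reactions by the Tanimoto similarity between the target and each precedent's product. The precedent entries, feature labels, and the function name `rank_precedents` are hypothetical, and a full workflow would also apply the precedent's transformation to the target to generate reactants, which is omitted here.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical precedent reactions, each described only by the
# features of its product and the enzyme class that catalyzes it.
PRECEDENTS = [
    {"enzyme": "ketoreductase",
     "product_features": frozenset({"aryl", "sec_alcohol", "methyl"})},
    {"enzyme": "transaminase",
     "product_features": frozenset({"aryl", "primary_amine", "methyl"})},
    {"enzyme": "lipase",
     "product_features": frozenset({"ester", "alkyl"})},
]


def rank_precedents(target_features, precedents, top_k=2):
    """Return the top_k (similarity, enzyme) pairs, most similar first."""
    scored = [
        (tanimoto(target_features, p["product_features"]), p["enzyme"])
        for p in precedents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Hypothetical target: an aryl secondary alcohol with an ethyl group.
    target = frozenset({"aryl", "sec_alcohol", "ethyl"})
    for similarity, enzyme in rank_precedents(target, PRECEDENTS):
        print(f"{enzyme}: {similarity:.2f}")
```

Applying such a ranking recursively to the proposed reactants is what turns a single-step search into a multi-step route, with the search stopping once every leaf is an available starting material.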
- X. Wang, Y. Qian, H. Gao, C.W. Coley, Y. Mo, R. Barzilay, and K.F. Jensen, Towards efficient discovery of green synthetic pathways with Monte Carlo tree search and reinforcement learning, Chem. Sci. 11, 10959-10972 (2020).
- Y. Mo, Y. Guan, P. Verma, J. Guo, M.E. Fortunato, Z. Lu, C.W. Coley, and K.F. Jensen, Evaluating and clustering retrosynthesis pathways with learned strategy, Chem. Sci. 12, 1461-1478 (2021).
- H. Gao, J. Pauphilet, T.J. Struble, C.W. Coley, and K.F. Jensen, Direct optimization across computer-generated reaction networks balances materials use and feasibility of synthesis plans for molecule libraries, J. Chem. Inf. Model. 61, 493-504 (2021).
- K. Sankaranarayanan, E. Heid, C.W. Coley, D. Verma, W.H. Green, and K.F. Jensen, Similarity based enzymatic retrosynthesis, Chem. Sci. 13, 6039-6053 (2022).