Dmitri Kireev, Bridging Physics and Machine Learning in Drug Discovery: A Hybrid Approach to Bias, Scale, and Scarcity
S3 351
Sciences 3
Despite recent advances, applying machine learning to drug discovery remains challenging due to limited and biased datasets, complex structure–activity relationships, and the sheer scale of chemical space. In this talk, I will describe our hybrid discovery framework (FRASE-bot) that strategically combines machine learning (ML) with physics-based methods such as docking and alchemical binding free energy calculations. I will discuss challenges in generalization and data representation, introduce our Hit-Triage Pretrained Transformer (Hit-TPT) trained as a binary classifier, and explain how ML is used opportunistically – where data are sufficient – while physics-based models are employed to ensure robustness and interpretability. The talk will include insights from our participation in community benchmarking efforts such as CACHE.