Overview

Date: Starts 8 May 2025, 15:00; ends 8 May 2025, 16:30
Seats available: 40
Location: EDIT Analysen or online (Zoom)
Language: English

Zoom password: ai4science

The on-site event will be followed by fika in the Analysen coffee area (fika from 16:00-16:30).

Abstract:

In recent years, machine learning (ML) methods have transformed computational chemistry and materials research. In ML algorithms, we rely on machine-learning representations to serve as a ‘mathematical proxy’ for our underlying chemistry. Molecular featurization – how we transform atoms and molecules into mathematical signals appropriate for machine-learning thermodynamic quantities – has an important role in our ability to learn material properties and observable quantities.

There are many ways to encode raw chemical data, notably string-based representations such as SMILES or SELFIES, and the suitable choice largely depends on the problem at hand and the architecture of our models. However, in thermodynamic contexts, where the chemistry and connectivity remain largely unchanged, such as in molecular simulation, it is more typical to use configuration-dependent features, which transform molecular coordinates into a range of suitable numerical representations.
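As a rough illustration of a configuration-dependent feature, the sketch below maps atomic coordinates to a sorted list of interatomic distances, which does not change when the molecule is rotated, translated, or its atoms are reordered. The geometry and the function names are hypothetical and chosen only for illustration; this is not one of the representations discussed in the talk.

```python
import math
from itertools import combinations


def sorted_pairwise_distances(coords):
    """Return the sorted interatomic distances for one configuration."""
    return sorted(math.dist(a, b) for a, b in combinations(coords, 2))


def rotate_z(coords, angle):
    """Rotate every atom about the z-axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]


# Hypothetical three-atom, water-like geometry (illustrative values only).
water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
features = sorted_pairwise_distances(water)

# Rotate, translate, and reorder the atoms: the features should not change.
moved = [(x + 1.5, y - 2.0, z + 0.3) for x, y, z in rotate_z(water, 0.7)]
moved_features = sorted_pairwise_distances(moved[::-1])
```

Comparing `features` with `moved_features` element by element (for example with `math.isclose`) shows that the two configurations receive the same representation.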

In this talk, I will focus on how we assess and interpret models built on such molecular representations, and on how to do so using shallow, simple machine-learning models.

I’ll start with the idea of thermodynamic fingerprints as order parameters for complex phenomena, contrasting technologies built by the machine-learning potentials community with traditional analyses, and extending these ideas into new methods for bottom-up coarse-graining.

From here, I will focus on how to extract actionable chemical and physical principles from models built on chemical data, a task traditionally achieved through unsupervised analyses such as principal component analysis (PCA) or t-distributed stochastic neighbor embedding (t-SNE). However, these methods only ask, ‘What makes these data points similar?’ not ‘In what ways does my model see these points as similar?’ The latter question, particularly in the context of supervised ML models, is more powerful and informative for structure-property relationships.
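The difference between the two questions can be sketched on synthetic data. In the toy example below (an assumption-laden illustration using NumPy, not the speaker's method), one feature has a large variance but is unrelated to the target, while a second, low-variance feature determines it. PCA picks out the high-variance direction, whereas a supervised linear fit picks out the direction that actually matters for the property.

```python
import numpy as np

# Synthetic data: all values here are made up for illustration.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([
    rng.normal(0.0, 5.0, n),   # feature 0: large spread, irrelevant to y
    rng.normal(0.0, 0.5, n),   # feature 1: small spread, determines y
])
y = 3.0 * X[:, 1] + rng.normal(0.0, 0.01, n)

# Center the features and the target.
Xc = X - X.mean(axis=0)
yc = y - y.mean()

# Unsupervised view: first principal component from the SVD of the data.
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = vt[0]

# Supervised view: direction of the least-squares regression weights.
w, *_ = np.linalg.lstsq(Xc, yc, rcond=None)
w_dir = w / np.linalg.norm(w)

print("first principal component:", np.round(pc1, 3))
print("regression direction:     ", np.round(w_dir, 3))
```

Here `pc1` lies almost entirely along feature 0 and `w_dir` almost entirely along feature 1, so points that look similar to PCA need not look similar to the fitted model.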

Our results show that this multi-objective framing, with its inherent interpretability, reveals underlying trends across many ML tasks, from materials classification to machine-learning potential building to non-linear regression tasks.

About the speaker:

Rose K. Cersonsky is the Michael and Virginia Conway Assistant Professor of Chemical and Biological Engineering at the University of Wisconsin-Madison.

Rose’s research group at UW-Madison, established in 2023, centers on developing and applying data science and machine-learning techniques to unify our understanding of molecular motion and interactions across length scales. She and her group lead the development of scikit-matter, a scikit-learn-affiliated package for quantitative structure-property relations in materials research, and are core developers of chemiscope, an interactive visualizer for data-driven analyses of molecular datasets.

Structured learning

This theme focuses on how to make use of structure in data to build machine learning (ML) and artificial intelligence (AI) systems that are safer, more trustworthy, and generalize better. Structure includes the relationships between data points in time and space, and how predictions change when data is transformed in specific ways, for example rotated or scaled. These topics are abstract and general but have a direct impact on the use of AI and ML in the sciences and in applications such as drug and materials design, or medical imaging.

How data-driven studies translate into chemical and physical principles
