David Pfau

I'm a staff research scientist at Google DeepMind. I'm also a visiting professor at Imperial College London in the Department of Physics, where I supervise work on applications of deep learning to computational quantum mechanics. My own research interests span artificial intelligence, machine learning and scientific computing.

Prior to joining DeepMind, I was a PhD student at the Center for Theoretical Neuroscience at Columbia, where I worked on algorithms for analyzing and understanding high-dimensional data from neural recordings with Liam Paninski and nonparametric Bayesian methods for predicting time series data with Frank Wood. I also had a stint as a research assistant in the group of Mike DeWeese at UC Berkeley, jointly between Physics and the Redwood Center for Theoretical Neuroscience.

Current research interests include applications of machine learning to computational physics and connections between differential geometry and unsupervised learning.



Publications

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2010

2009


Talks

Invited Academic Talks

Public Outreach


Software

I try to contribute to open source as much as I can from within a private corporation. Examples include the code from our Spectral Inference Networks paper, as well as various useful linear algebra operators and gradients in TensorFlow and JAX. In particular, the matrix exponential operator in TensorFlow was used to make a novel discovery in the theory of supergravity.
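
As a small illustration (just a sketch, with placeholder values, not code from any of the projects above), this is roughly how the matrix exponential and its gradient can be used from both libraries:

    # Minimal sketch: the matrix exponential and its gradient in TensorFlow and JAX.
    import tensorflow as tf
    import jax.numpy as jnp
    from jax.scipy.linalg import expm as jax_expm

    # TensorFlow: tf.linalg.expm has a registered gradient, so we can
    # differentiate a scalar function of the matrix exponential.
    a = tf.constant([[0.0, 1.0], [-1.0, 0.0]], dtype=tf.float64)
    with tf.GradientTape() as tape:
        tape.watch(a)
        loss = tf.reduce_sum(tf.linalg.expm(a))
    grad = tape.gradient(loss, a)  # gradient of sum(expm(a)) with respect to a

    # JAX exposes the same operation as jax.scipy.linalg.expm.
    b = jnp.array([[0.0, 1.0], [-1.0, 0.0]])
    expm_b = jax_expm(b)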

Though it hasn't been updated much since I joined DeepMind, you can find my personal GitHub here. Notable projects include a collection of methods for learning state space models for neuroscience data, some of which have been integrated into the pop_spik_dyn package; a Matlab implementation of Learning Recurrent Neural Networks with Hessian-Free Optimization; and the Java implementation of the Probabilistic Deterministic Infinite Automata used in our paper. For those interested in probabilistic programming, I have also provided a PDIA implementation in WebChurch.

I also contributed a C++ implementation of Beam Sampling for the Infinite Hidden Markov Model to the Data Microscopes project. At roughly 40 times faster than the existing Matlab code, it's likely the fastest beam sampler for the iHMM in the world.


Professional Service

Workshop Organizing

  • Learning Disentangled Representations: from Perception to Control
    Neural Information Processing Systems, Long Beach, CA, December 2017.
    [website]

PhD Students

  • Gino Casella,* Imperial College London, October 2020 -
  • Halvard Sutterud,* Imperial College London, January 2021 -

  • *co-advised with Matthew Foulkes and James Spencer

Thesis Committees

  • Georgios Arvanitidis, Technical University of Denmark, April 2019

  • Janith Petangoda, Imperial College London, July 2022

Reviewing

NeurIPS, ICLR, ICML, IJCAI, AISTATS, UAI, JMLR, Nature, Nature Communications

Area Chair

NeurIPS (2021, 2022), AISTATS (2023), ICML (2023)

Other Writing

Not everything makes it into a paper, but that doesn't mean it's not important. Short notes and other writings that don't have a home elsewhere are collected here.

  • A Generalized Bias-Variance Decomposition for Bregman Divergences
    [note]   [tl;dr]  

    A simple result that I hadn't seen published elsewhere. Historically, other research on generalized bias-variance decompositions has focused on 0-1 loss, which is relevant to classification and boosting. In probabilistic modeling, error is measured through log probabilities instead of classification accuracy, often with distributions in the exponential family. Exponential family likelihoods and Bregman divergences are closely related, and it turns out it's straightforward to generalize the bias-variance decomposition for squared error to all Bregman divergences.

    Several years after writing this note, Frank Nielsen pointed me to "Loss Functions for Binary Class Probability Estimation and Classification: Structure and Applications" by Buja, Stuetzle and Shen. Right there in Section 21 is essentially the same derivation. However, it's still a fairly niche result, and I haven't seen a clean, standalone derivation before, so I hope these notes are helpful.
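
    For reference, here is one way to state the result (a sketch only; see the note for the full derivation). Use the convention $D_F(y, \hat{y}) = F(y) - F(\hat{y}) - \langle \nabla F(\hat{y}), y - \hat{y} \rangle$ for a strictly convex $F$, write $\bar{y} = \mathbb{E}[y]$ for the mean of the target, and define the dual mean $\mathring{y}$ of the predictor by $\nabla F(\mathring{y}) = \mathbb{E}[\nabla F(\hat{y})]$. Then, for $y$ independent of $\hat{y}$,

        \mathbb{E}_{y,\hat{y}}\!\left[D_F(y, \hat{y})\right]
          = \underbrace{\mathbb{E}_{y}\!\left[D_F(y, \bar{y})\right]}_{\text{noise}}
          + \underbrace{D_F(\bar{y}, \mathring{y})}_{\text{bias}}
          + \underbrace{\mathbb{E}_{\hat{y}}\!\left[D_F(\mathring{y}, \hat{y})\right]}_{\text{variance}}.

    Taking $F(y) = \|y\|^2$, so that $D_F$ is squared error and the dual mean is the ordinary mean, recovers the familiar noise plus squared bias plus variance decomposition.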

  • On Slow Research
    [story]   [tl;dr]  

    A short essay about the process behind writing the paper "Disentangling by Subspace Diffusion", containing my own thoughts on the research process and giving some insight into just how long and arduous the process of going from idea to paper can be.