Unfolding the Protein Folding Solution

Amidst the tumult of 2020, in which the world often seemed to stand still (and worse, at times to regress), the march of scientific progress pushed ever onward. First and foremost, we saw a new standard set for the development, testing, and validation of vaccines, culminating in the approval of multiple COVID-19 vaccines. At the Large Hadron Collider (LHC), we saw the first evidence of a new type of particle. And SpaceX’s Crew Dragon became the first private vehicle to carry astronauts to the International Space Station (ISS).

In November, we witnessed another remarkable scientific feat, as Google DeepMind’s AlphaFold2 “solved” the protein folding problem – an achievement with far-reaching implications for biology and medicine. This feat absolutely deserves to be celebrated. But it is also important to recognize the potential limitations of their approach, and to retain a healthy dose of skepticism. 

In this blog post, I plan to describe the protein folding “problem”, and then to explain why I believe it is best to exercise caution, rather than to immediately regard AlphaFold’s performance as a “solution” to this problem.

First however, I want to acknowledge that I am a physicist, not a biologist. Make of that what you will. I also want to disclose that last year I interned at Google X, formerly known as The Moonshot Factory. The opinions I espouse here are entirely my own.

What is a Protein?

To most people, proteins mainly connote biology. Many – like myself – remember learning about proteins as biological molecules, or biomolecules, which have distinct biological functions. In reality, proteins sit at a unique intersection of biology, chemistry, and physics. This makes them fascinating objects of study, but also makes them particularly unyielding to established scientific methods. 

At a basic level, proteins are chains made from amino acids. The amino acids serve as the building blocks for proteins, in much the same way as letters in an alphabet can be strung together to form words. Just as the order of the letters in a word affects the meaning of the sequence, so too does the order of amino acids in the chain affect the biology of the resulting protein.

Unlike words however, the protein chains exist natively in the physical world. When we write a word on the page, the space between letters is fixed. The previous letters in the word don’t dictate how much space we should leave before the next letter. 

For proteins, space matters. Chemically, the amino acids are strung together via covalent bonds, in which electron pairs are shared between neighboring atoms. Going a level deeper, the amino acids themselves are organic compounds made up of atoms, and are as a result substantially influenced by chemical and physical forces. These forces constantly push and pull the constituents in different directions, driving a series of twists and turns in three-dimensional space as the protein moves toward a stable configuration, or conformation. This intricate dance is the process of protein folding. Because the protein gradually moves from less stable, higher-energy configurations to more stable, lower-energy states, the folding is called spontaneous.

And here’s the thing: the protein’s function depends directly on this conformation. In other words, identifying a protein’s stable shape is crucial to understanding the roles it plays in biology.

The Protein Folding Problem

One of the most remarkable things about protein folding is that for a given chain, many distinct paths – each with their own twists and turns – can lead to the same final shape. The intermediate configurations can at times seem completely random; and yet the result is somehow predestined. Observations of this kind led Nobel laureate Christian Anfinsen to postulate that a protein’s structure is entirely determined by its sequence of amino acids. This hypothesis, known as Anfinsen’s dogma, essentially defines the protein folding problem: to predict a protein’s shape (and consequently its function) given only the protein’s sequence of amino acids.

Solving this problem has been an outstanding challenge for half a century, evading the tools of biology, of chemistry, and of physics. 

Physically, the problem is typically framed in terms of minimizing the energy of the collection of atoms and molecules in the protein chain. Despite their success in areas such as biophysics and drug design, techniques like molecular dynamics, which are based in classical mechanics, fall spectacularly short. And the proteins, often consisting of hundreds or even thousands of amino acids, are far too large to be treated quantum mechanically. Some physical models for the problem, which treat the protein chain as randomly choosing junctures at which to fold (so long as the chain doesn’t fold in on itself), lead to the conclusion that the problem is NP-hard: a fancy way of saying that solving the general case is VERY HARD.
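To make the lattice framing concrete, here is a toy sketch using the standard hydrophobic-polar (HP) lattice model (a textbook simplification, not a model from this post; the sequence and scoring are purely illustrative). Each fold is a self-avoiding walk on a 2D grid, and the energy counts favorable contacts between hydrophobic residues. Even for a chain of 8 residues, exhaustive search already visits thousands of candidate folds – the search space grows exponentially with chain length, which is the intuition behind the NP-hardness results.

```python
from itertools import product

# Toy 2D HP lattice model: 'H' = hydrophobic, 'P' = polar.
# Energy = -(number of non-bonded H-H contacts); lower is more stable.
MOVES = {'U': (0, 1), 'D': (0, -1), 'L': (-1, 0), 'R': (1, 0)}

def fold_coords(moves):
    """Turn a move sequence into lattice coordinates; None if the chain self-intersects."""
    pos, coords, seen = (0, 0), [(0, 0)], {(0, 0)}
    for m in moves:
        dx, dy = MOVES[m]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in seen:
            return None  # only self-avoiding walks are valid folds
        seen.add(pos)
        coords.append(pos)
    return coords

def energy(seq, coords):
    """Count non-bonded H-H lattice contacts (chain neighbors excluded)."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == seq[j] == 'H':
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    e -= 1
    return e

def best_fold(seq):
    """Exhaustive search over all folds -- cost grows as 4^(length-1)."""
    best = (1, None)
    for moves in product('UDLR', repeat=len(seq) - 1):
        coords = fold_coords(moves)
        if coords is not None:
            e = energy(seq, coords)
            if e < best[0]:
                best = (e, coords)
    return best

e, coords = best_fold('HPPHHPPH')  # hypothetical 8-residue sequence
print(e)  # energy of the minimum-energy fold found
```

Brute force works here only because the chain is tiny; for a realistic protein of hundreds of residues, this enumeration is hopeless, which is why heuristic and learned methods dominate.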

Typically when a problem gets too large in scale to be succinctly stated in the language of one theory, another theory emerges, and with it comes a more suitable language. We need not analyze the quantum mechanical wave function of every proton, neutron and electron to understand that noble gases are stable because their valence shells of electrons are full. And we need not look at every chemical bond in a cell to understand that the mitochondria is the powerhouse. To quote Nobel laureate Phil Anderson in his essay More is Different, “The constructionist hypothesis breaks down when confronted with the twin difficulties of scale and complexity…at each level of complexity entirely new properties appear”.

In the case of protein folding, progress has indeed been made toward finding a more suitable language. In fact, there is a general structural hierarchy within folded proteins. The primary structure is the amino acid sequence itself; in the secondary structure, the amino acids form stable patterns of helices and sheets; in the tertiary structure, these helices and sheets are folded into further formations; and finally, the quaternary structure captures the interplay between multiple chains. Biologists have even identified structural motifs: three-dimensional structures which frequently appear as segments within folded proteins.

The frustrating thing about the protein folding problem is that we are fairly positive such a language should exist. Why? Because nature solves the problem all the time. Typical proteins fold in seconds or minutes; some fold on the scale of microseconds. Yet our theoretical models for protein folding – models based in physics – tell us that it should take proteins astronomically long times to fold. Even under the most lenient assumptions, the predicted timescales are longer than the Universe is old! This apparent discrepancy between the complexity of modeling protein folding on one hand, and the ease with which proteins actually fold on the other, is known as Levinthal’s paradox.
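The astronomical timescale in Levinthal's paradox comes from simple counting. A back-of-envelope sketch, using the commonly quoted illustrative numbers (roughly three conformations per peptide bond, and a generous sampling rate of about 10^13 conformations per second; both figures are assumptions of the standard argument, not measurements):

```python
# Levinthal's paradox as arithmetic: exhaustively searching conformation
# space would take astronomically longer than the age of the Universe.
residues = 100
conformations = 3 ** (residues - 1)          # ~1.7e47 possible configurations
sampling_rate = 1e13                         # conformations sampled per second
search_time = conformations / sampling_rate  # seconds of exhaustive search

age_of_universe = 4.35e17                    # seconds (~13.8 billion years)
print(search_time / age_of_universe)         # ~1e16 ages of the Universe
```

Real proteins, of course, do not search exhaustively; the folding landscape funnels them toward the native state, which is precisely the "more suitable language" we lack a full theory for.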

The Test

Every two years, the worldwide protein folding community comes together to assess the state of progress in the field. More than one hundred research groups from around the globe come armed with their newest and most sophisticated algorithms for predicting the structure of proteins. These algorithms are then evaluated on a set of roughly 100 never-before-measured proteins.

Adding to the challenge, the competitors (the different research groups) are not told anything about the proteins prior to the assessment. In this way, the biennial test, known as the Community Assessment of protein Structure Prediction (CASP), is designed so as to test protein structure prediction solely on the basis of amino acid sequence. In other words, CASP is designed so that ‘solution’ implies solving the protein folding problem.

Given the vast space of possible ‘predictions’ for each protein, CASP evaluates the quality of a prediction, or how closely it approximates the actual measured protein, on a variety of metrics. The primary evaluation metric, the global distance test (GDT), involves comparing the actual and predicted positions of atoms known as alpha-carbons, which tag the approximate locations of the amino acids. In essence, this is a way of quantifying how well the measured and predicted proteins overlap in three-dimensional space, with GDT scores of 0 implying no overlap, and 100 signifying perfect overlap.
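A stripped-down sketch of a GDT-style score makes the idea concrete. This toy version assumes the two structures are already superposed; the real GDT_TS metric also optimizes the superposition before counting, so treat this as illustration rather than the official CASP procedure:

```python
# Simplified GDT_TS-style score: the fraction of alpha-carbons within
# 1, 2, 4, and 8 angstroms of their measured positions, averaged over
# the four cutoffs and scaled to 0-100.

def gdt_ts(predicted, measured, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """predicted, measured: equal-length lists of (x, y, z) alpha-carbon coordinates."""
    assert len(predicted) == len(measured)
    distances = [
        sum((p - m) ** 2 for p, m in zip(pa, ma)) ** 0.5
        for pa, ma in zip(predicted, measured)
    ]
    score = 0.0
    for c in cutoffs:
        score += sum(d <= c for d in distances) / len(distances)
    return 100.0 * score / len(cutoffs)

# Perfect overlap scores 100:
print(gdt_ts([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 0)]))  # 100.0
```

Averaging over multiple cutoffs is what makes the score forgiving of small errors: an alpha-carbon that is 3 angstroms off still earns partial credit at the 4 and 8 angstrom thresholds.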

However, the experimental techniques used to measure the actual proteins are not perfect. This means that after a certain point, it isn’t clear whether the predicted or measured protein is more accurate, i.e. which one is closer to ground truth. As a result, a score above 90 on the GDT is generally regarded as a ‘solution’.

From the inception of CASP in 1994 up through 2016 (CASP12, the 12th competition), there had not been substantial improvement in performance on the GDT. In the intervening years, understanding of proteins and protein folding had absolutely matured. But the results had not materialized in three-dimensional structure prediction. From 2006 to 2016, for instance, the median GDT of the best-performing algorithm on the subset of test proteins in the free-modeling category remained above 30 and below 42 every year. Up through this point, no machine learning based approach had even come close to threatening the state of the art.

The ‘Solution’

Enter DeepMind, on a mission to radically challenge our deepest held beliefs about the power of AI. In 2017, DeepMind shocked the world when its artificial intelligence AlphaGo demonstrated mastery over the game of Go, convincingly beating the reigning world champion.  Fresh off of its game-playing triumph, DeepMind unabashedly set its sights on protein-folding.

In 2018, participating in CASP for the first time, DeepMind’s AI system AlphaFold handily beat the competition, scoring a median GDT of close to 60 on the free-modeling category, regarded as the most challenging category. This was properly recognized as a tremendous leap forward on the protein folding problem, albeit far from a solution. Already, AlphaFold had convinced many that machine learning could potentially be useful not just in games, but in pure scientific research. Indeed, AlphaFold was so convincing that about half of the entrants for the 2020 CASP competition used deep learning in their approaches.

Determined to build on this initial success, DeepMind went back to the drawing board and returned to CASP in 2020 with a new and improved AI: AlphaFold2. Once again, DeepMind shocked the world by shattering its own records and achieving a median GDT of 87 on the free-modeling category – and 92.4 GDT overall. On average, AlphaFold2’s predictions were within a single atom’s width of the actual measurements.

Almost immediately, AlphaFold2 was hailed as a ‘solution’ to the protein folding problem. Since AlphaFold2 is an improved version of AlphaFold, from here on we’ll refer to the system simply as AlphaFold, without the ‘2’.

DeepMind’s own blog claimed “AlphaFold: a solution to a 50-year-old grand challenge in biology”. News outlets followed suit, with sources including Science Magazine, Vox, CNBC, and MIT Tech Review using some variant of the word “solved” in their coverage, and sentiment to match.


With AlphaFold, like AlphaGo before it, DeepMind is forcing us to reimagine what artificial intelligence is capable of. This in and of itself is remarkable. That AI will likely play an integral role in the future of medical research and drug discovery is worth further celebrating. DeepMind deserves ample credit for these achievements. 

That being said, it is far too early to claim they have ‘solved’ the protein folding problem. I believe we should remain skeptical because of the relationship between machine learning on the one hand, and generalizability and interpretability on the other. These problems are not unique to AlphaFold. Rather, they are philosophical qualms with using machine learning to ‘solve’ scientific problems in the way that AlphaFold attempts to do.

The application of machine learning to scientific research is not new. At the Large Hadron Collider (LHC) at CERN for instance, machine learning was used to find the Higgs boson back in 2012. The difference lies in how machine learning is employed. 

At CERN, machine learning was used to facilitate the comparison of our scientific theories – in this case the standard model of particle physics – and experimental data. Physicists already had a theory for the ways in which elementary particles interact with each other; they set out to test that theory by colliding fast-moving particles together, and comparing post-collision measurements with the outputs of their theoretical models. The problem was that even on powerful computers, their model took a long time to generate predictions. Machine learning helped them to more quickly generate synthetic data to compare with experiments. Machine learning did not replace the physics-based model; it helped test the model.

With AlphaFold, DeepMind is effectively attempting to replace physical and biological models of protein folding with a machine learning model. Yes, AlphaFold performed far better than any previous models. But to what extent can we actually trust AlphaFold’s predictions on new proteins? In other words, how well does AlphaFold generalize?

Well, we don’t really know. Of the millions of proteins we have already found, AlphaFold was trained on the tiny fraction whose structures have been measured. The test set was even smaller still. Even if AlphaFold had perfectly predicted every test protein – which it didn’t – I’d still bet on nature’s ingenuity. 

Of course, any theory, when faced with new observations, must wrestle with the same questions. If theory and observations disagree, the theory must be modified or replaced entirely. But with machine learning models, where the assumptions are hidden, it often isn’t clear where the model is breaking down.


By playing against AlphaGo time after time, researchers have begun to gain insights into how the AI “thinks”. And human Go players have taken inspiration from AlphaGo in their own strategies. In just a few years, the AI has already given us tremendous insights into the game of Go, strategy more generally, and what it means to be creative.

“Solving” a scientific problem is a far higher bar than besting the best human at a game.

It’s quite possible that artificial intelligence helps us to achieve this goal; to find the right language. But we need to work on extracting insights from our machine learning models, and interpreting the models we build.

Even if AlphaFold never improves beyond its current state, it will still prove useful in medical research; at the bare minimum, it will allow biologists to take coarser measurements in the lab (reducing time and money spent), and use AlphaFold to iron out the fine structure. More optimistically, we can envision a future in which humans work with AlphaFold to discover the rules of protein folding.

AlphaFold is not a solution to the protein folding problem, but it is absolutely a breakthrough. Any machine learning based approach to science will need to address practical and philosophical challenges. For now, we should appreciate DeepMind’s colossal step forward, and we should prepare for unprecedented progress in the near future. This is only the beginning.

Symmetry Redux: Conservation Laws


One of the first topics I ever posted about here on Silhouette of Science is Symmetry. I was fascinated by the manifold ways, both obvious and subtle, that Symmetry pervades our world. In that post I touched upon the presence of Symmetry in processes biological, chemical and physical, and the striking absence of Symmetry in a select few instances. I appealed to cellular automata as an example of simplicity spawning complexity. And lastly I yearned for some deeper symmetry, or a more fundamental rule from which the apparent breaking of symmetries could be explained.


The Physics-Metaphysics Gap

“the difference between physics and metaphysics is not that the practitioners of one are smarter than the practitioners of the other. The difference is that the metaphysicist has no laboratory.”

– Carl Sagan

Physics and philosophy are like cosmic yin and yang, complementary and complete, our universe their union. They exist in distinct spaces; physics being the realm of empirical inquiry, metaphysics being that which is not. Metaphysics takes over where physics leaves off. Or so it would seem. Nature, it turns out, might not be so black and white. 

The grey twilight of reality, which I will call the physics-metaphysics gap, might remain murky, forever evading our understanding. What exactly is this gap, and how does it come about?

Putting the Meta in Metaphysics

Language is fickle; richly expressive and inherently beleaguered with misunderstanding and miscommunication. The mere use of language changes sentiments; context and history, homophone and homonym, imbue words with new meaning. Accidental connotations accrue, and with them our appreciation for the original usages erodes. The linguistic goal of clarity is hopelessly Sisyphean. The word ‘metaphysics’ is susceptible to the same unstoppable forces. And as its meaning changes, so too does its domain of discourse.

Yet this property of language is just as much a feature as a bug. By leveraging linguistic developments, we can dig more deeply into ideas that the original meaning only faintly captured.

“Physics”, deriving from the Greek ta physika, translates to “the nature of things”. The use of “meta”, meaning “beyond” or “after”, is delightfully deceptive in its etymological origin! Indeed, “metaphysics” first appeared as the title of a sequence of Aristotle’s works published after his treatise on physics. Ironically, at least from a modern point of view, the “meta” in metaphysics was meant literally!

The same modern perch, however, implores a more contextual interpretation. Considering only the title’s explicit semantics does a severe disservice to Aristotle, to Andronicus of Rhodes – the editor of Metaphysics credited with its naming – and to the influence Aristotle’s works have had on shaping our society. Aristotle’s works on metaphysics were not haphazardly placed after his works on physics. Aristotle regarded physics as the study of change, and “first philosophy”, the subject matter of his metaphysics, as the study of that which persists through change.

Quite deliberately, it seems, he believed that one must understand change before beginning to grasp the constant. In this sense at least, metaphysics at its core was meant to represent something beyond physics.

Remarkably, even as the mechanics of our physical theories have evolved, and as the meaning of the word physics itself has shifted, much of Aristotle’s original distinction remains. “Physics” still refers to the study of the natural world, but it has also come to represent our investigation of the physical world through the scientific method. Prediction and probe, hypothesis and revision are the central tenets of modern science.

This science is intimately connected to change. Newton’s laws relate the changes in motion of objects to the influence of forces. Schrödinger’s equation describes the evolution of quantum states in time. Even Einstein’s famous equivalence between mass and energy is about the conversion of matter into energy. The equations of physics do not make philosophical claims about what objects, quantum states, or matter and energy are. Of course, it’s not always easy to separate the physics from the philosophy. Metaphysical assertions about the nature of space, time and existence can affect the way physicists test hypotheses. In turn, mathematical structures found to underlie physical constructs can force new philosophical interpretations.

Leaning into the linguistic conflation of physics and the science of physics, perhaps a modern metaphysics should concern itself with what is truly beyond physics as a science. What are the limitations on what we can learn about our universe through physics?

This new line of thought aligns quite nicely with twentieth century developments in the usage of  “meta”, where metalanguage arose as the study of language, and metamathematics as the mathematical study of mathematics itself. For instance, whereas logic considers deductions from a logical theory, metalogic studies the truths derivable about logic systems themselves.

In the early twentieth century, mathematician David Hilbert set forth the goal of putting mathematics on a firm foundation by finding a finite set of axioms from which all known mathematical results could be proven. This movement, known as Hilbert’s program, was met with considerable optimism as the math community came together to achieve this grand goal.

The result was a crowning achievement indeed, but for metalogic rather than mathematics. In his eponymous incompleteness theorems, Kurt Gödel showed that any consistent, effectively axiomatized theory capable of expressing basic arithmetic is incapable of proving all arithmetic truths about the natural numbers – and even more drastically, incapable of proving its own consistency. In so doing, he demonstrated once and for all the limitations of logic.

The Collider and the Computer

A mathematician and friend once told me in jest that “mathematics is about the math, but physics is about the physicist”. His sentiment rings astoundingly true. In the popular imagination, the physicist is an eccentric genius. Stories surround these physics folk heroes: Newton poking needles in his own eyes; crazy-haired Einstein using never-to-be-cashed checks as bookmarks; Feynman playing the bongos and picking locks. Yet there’s always a strand, implicit in this lore, connecting the quirky and unconventional with inspired creativity.

We’re made to believe that eventually the Theory of Everything will come to light, illuminated by another idiosyncratic individual who saw the world just a little bit differently. Creativity is the answer; mathematics the language.

But is creativity really the answer? If the truths derivable via logic are limited, then surely our ability to ascertain truths about the physical world is as well. As clever as humans are, our ability to explore the physics of our universe is still bound by the physics of our universe! 

The culprit is the empiricism baked into the scientific method – the same empiricism that delineates physical and metaphysical inquiry. Falsifiability, first espoused by philosopher of science Karl Popper, is one of the guiding principles of modern science. It says that any scientific hypothesis that cannot be falsified is not scientific, plain and simple. 

This brings us to the physics-metaphysics gap: physical phenomena governing our universe that are not falsifiable by the science that we call physics. In our universe, the small fall within this gap.

According to our current understanding of physics, all matter is made out of particles or particle-like objects which pop in and out of existence. The more fundamental a particle, the smaller it is and the more energy required to create it. We directly test particle physics theories via collisions predicted to create the hypothesized particles. 

The history of particle physics has been one of scattering and collisions with ever-increasing energies. “Demonstrating” the existence of the Higgs boson, for instance, required the construction of a 17-mile-long collider (the Large Hadron Collider), and perhaps the most tremendous global scientific collaboration to date.

But this strategy is inherently limited by the energy-length scale relation: if we wanted to directly probe potential ‘stringy’ behavior of quantum gravity, we’d need a collider the size of our galaxy! If we somehow managed to build such a collider and the results led us to hypothesize that strings were made of even smaller objects, then the corresponding collider would potentially need to be larger than our universe itself!
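The energy-length relation behind this argument is roughly E ~ ħc/L: resolving structure at length scale L requires collision energies of at least that order. A quick back-of-envelope check, using standard reference values (the constants are textbook figures, not numbers from this post):

```python
# Energy needed to probe a given length scale: E ~ hbar*c / L.
hbar_c = 0.1973e-15   # GeV * meters (hbar*c ~ 197.3 MeV*fm)

def energy_to_probe(length_m):
    """Rough collision energy (GeV) needed to resolve structure at this length."""
    return hbar_c / length_m

proton_radius = 0.8e-15    # meters
planck_length = 1.6e-35    # meters, where 'stringy' quantum gravity effects live
lhc_energy = 1.3e4         # GeV (LHC's ~13 TeV collision energy)

print(energy_to_probe(proton_radius))                # ~0.25 GeV: nuclear scale
print(energy_to_probe(planck_length) / lhc_energy)   # ~1e15 times the LHC
```

Since a ring collider's radius grows roughly linearly with beam energy at fixed magnetic field strength, a fifteen-order-of-magnitude energy gap translates into the absurd, astronomically sized machines described above.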

Just as astoundingly, we are also fundamentally limited in our ability to even ascertain predictions of our physical theories. In quantum chromodynamics (QCD), the quantum field theory of the strong nuclear force, predictions generally cannot be made analytically. The issue turns out to be eerily similar to that which plagues our experimental probing capabilities – a problem of large energies. In this case, representing space or time as a continuum leads to infinite energies, and the theory is not mathematically well defined. Instead, spacetime is approximated as a discretized lattice.

This approach, known as Lattice QCD, is well defined. However, to have bearing on the physical world, the lattice spacing must be taken to zero, approaching the continuum limit. As the lattice spacing becomes smaller, the requisite compute quickly increases. Even determining the precise predictions of QCD entails extrapolation and often multiple approximations, combined with world-class supercomputers.
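The extrapolation step can be sketched with fake data (a toy fit, not real lattice QCD output). Lattice results typically carry discretization errors proportional to the lattice spacing squared, so one computes an observable at several spacings a and extrapolates the fit O(a) = O0 + c·a² down to a = 0:

```python
# Toy continuum extrapolation: fit O(a) = O0 + c * a^2 and read off O0 at a = 0.
spacings = [0.12, 0.09, 0.06]                    # lattice spacings in fm (hypothetical)
values = [1.0 + 0.5 * a**2 for a in spacings]    # fake data; true continuum value is 1.0

# Least-squares fit of O against x = a^2 (two unknowns, via the normal equations):
xs = [a**2 for a in spacings]
n = len(xs)
sx, sy = sum(xs), sum(values)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, values))
c = (n * sxy - sx * sy) / (n * sxx - sx * sx)    # slope: discretization coefficient
O0 = (sy - c * sx) / n                           # intercept: continuum-limit estimate

print(O0)  # ~1.0, recovering the continuum value
```

In practice every "measured" value also carries statistical error from Monte Carlo sampling, and halving the spacing multiplies the cost many times over (each dimension of the lattice must be refined), which is why the continuum limit is reached by extrapolation rather than brute force.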

QCD doesn’t even account for gravity. Classically computing predictions from any theory of quantum gravity would likely be even more intensive, and would quickly become intractable!

Whether there is some ultimate Theory of Everything or it is ‘turtles all the way down’, there’s a point beyond which we will not be able to directly probe the physics of our universe, and beyond which we will not even be able to determine the predictions of our physical theories about our universe.

One natural corollary is in order: we fault string theory because its predictions are not falsifiable. However, what we fail to acknowledge – at least explicitly – is that this problem is not inherent to string theory. Rather, it is a feature of all theories at those length scales.

As crazy as this seems, it’s maybe even crazier to think that it didn’t have to be this way. Philosophically, there’s a distinction between necessary truths, which are true in every possible world, and contingent truths, which just happen to be true in our world. Whereas the incompleteness of logic is necessary, the existence of a physics-metaphysics gap is contingent on the particular physics of our universe.

If the basic laws governing our physics were different, it is quite possible that we could directly probe the fundamental elements of our universe through the scientific method. There’s no reason a priori that energy had to be related to length scale, or that the length scales of the ‘smallest’ physical objects had to be so damn small. By similar logic, our physical theories could have given analytic – or at least computationally tractable – predictions. More generally, there’s no reason energy, space or time had to be relevant concepts in describing the natural world.

The Path Forward

So does all of this mean that our efforts are in vain? Should we abandon all hope?

Maybe we just need to go back to the drawing board. Metaphysically, maybe by slightly modifying our conceptions of space and time or by moving away from a particle-centric view of the universe, we can narrow or close the physics-metaphysics gap. Physically, perhaps we can find other, indirect methods of probing.

At a Quantum Gravity in the Lab workshop that I attended at Google X a few months ago, theorists and experimentalists from around the world came together to initiate the practice of a new sub-discipline in physics. The central thesis was that rather than directly probe the small-scale phenomena of quantum gravity, we can use the quantum nature of quantum computers to simulate emergent quantum gravity-like behavior.

Matter in our universe must obey the physical laws of our universe. By manipulating matter in such a way that it is explainable in terms of emergent space-time dimensions, we can indirectly learn about the emergent space-time structure of our own universe. Such a strategy blurs the boundary between predicting and probing. And this might be exactly what we need.

The workshop’s principal paper (and the impetus for the workshop itself), Quantum Gravity in the Lab: Teleportation by Size and Traversable Wormholes, proposes a table-top experiment to ‘test’ teleportation. If we use a quantum computer to chaotically scramble an input message containing quantum entanglement, the initially surprising result is that at some point in time after the message decoheres, it actually comes back into focus.

As the authors argue, this can be understood in the context of quantum gravity as if the initial state consists of two entangled black holes connected by a wormhole. In this picture, the decoherence and re-coherence of the message is due to its ‘teleportation’ through the wormhole!

What justifies this picture? And what do we gain by taking this perspective? It turns out that this engages essentially with one of the premier conjectures of theories of quantum gravity:

ER = EPR
This pithy proposed equivalence states in broad strokes that quantum entanglement is intimately related to the structure of space-time. ER stands for Einstein-Rosen bridges, colloquially known as wormholes, and EPR (Einstein-Podolsky-Rosen) refers to a pair of entangled particles. The conjecture then states that entangled particles are connected via wormholes.

By testing the refocusing of an entangled message, this experiment proposes to indirectly probe conjectures about theories of quantum gravity. If successful, such an experiment would use (emergent) Einstein-Rosen Bridges to help bridge the physics-metaphysics gap!

Whether or not this particular experiment proves useful, the larger lesson stands: we are going to have to be more creative in how we test physical theories. Indirect probing provides less convincing evidence and is beset by many philosophical difficulties, prime among them the ‘theory-ladenness of observation’. It might be our only hope. Is it enough?

What’s in a Universe

Our Universe is vast – almost unfathomably so. Temporally, we believe the age of the Universe to be roughly 13.8 billion years. Spatially, the observable universe is a sphere 93 billion light-years across. What’s beyond this sphere is a mystery: the observable universe could be all there is, or it could be a tiny fraction of something that extends infinitely in all directions. Materially, the observable universe is thought to have the mass-equivalent of roughly 10^80 Hydrogen atoms. And this regular matter pales in proportion to both dark matter and dark energy.

With all of that space and time and stuff to fill them, the possibilities appear functionally endless. Suffice it to say there’s a lot for nature to work with, and we can at least to some extent wrap our heads around the fact that from such possibilities could arise chemical and biological complexity. Once the organism is born, Evolution admits a teeming diversity of life.

This picture is complicated by the relative paucity of physical building blocks. To our knowledge there are only a handful of elementary particles, organized into what physicists call the Standard Model. This model, which describes three of the four fundamental forces (Electromagnetism, Strong and Weak Nuclear), boils everything down to twelve Fermions (massive particles that obey the Pauli Exclusion Principle), a few gauge Bosons (massless particles that mediate the fundamental forces), and the Higgs Boson. Each of these particles has its own set of distinguishing properties.

Emergence is the idea that the whole may behave differently than the collection of its parts. As the ever wise physicist P.W. Anderson put it, “More is different”. We learn to accept this premise early on in our education. In Chemistry, the primitive objects are physical: particles like electrons, protons and neutrons, the latter two of which are each composed of three quarks. From these particles derive a panoply of elements, each with its own chemical properties. In Biology the basic object of study is DNA, which contains the instructions for life and is responsible for its diversity therein. DNA is constructed from chains of nucleotides – themselves chemical compounds. Perhaps the most astounding thing about life is that all its variants are generated by different sequences of the four (!) nucleotides that comprise DNA. 

Examples like these make us comfortable with the fact that great complexity can arise from immense simplicity. But they implicitly give us the impression that complexity is a hierarchy, in which primitives at one level result in complex behavior at the next level. By this logic, Chemistry is infinitely more complicated than Physics, and biological phenomena infinitely more so than chemical. On this view, the handful of elementary particles gives us little in the way of additional physical behavior; 10^80 Hydrogen atoms collectively behave just like 10^80 individual Hydrogen atoms, and all we have to work with are the defining properties of particles in the Standard Model.

This perspective is patently false: in reality, our Universe can host quite exotic physical phenomena. To understand the sense in which these exotic phenomena exist, we need to define physical diversity. And more pressingly, we need to elaborate on the world that entertains this existence, i.e. the Universe.

Deriving from the Latin unus (one) and versus (turned), the word universe literally translates as “turned into one”. In the broadest sense, universe refers to “the totality of existing things”. The concept naturally arises in physics, where the physical Universe consists of all space, time, energy and matter. Thus, there is only one physical universe. Distinct from this physical Universe, the concept of a universe exists in mathematics, where it refers to “the collection of all objects one wishes to consider”. Unlike physics, mathematics is constructive. This means that infinitely many mathematical universes can exist.

This constructive freedom is part of what makes mathematics so useful in the description of physics. In our attempts to make sense of the world, physicists devise physical theories to describe certain aspects of the Universe. For our purposes, such physical theories consist of:

1. An ontology (the objects, i.e. what exists) 
2. A universe (where they exist) 
3. Rules for manipulating the objects (physical laws)

In so doing, they associate to the physical Universe a mathematical universe. 

A quick clarification is in order: First of all, not all sets constitute valid mathematical universes. Take for example the set in Russell’s paradox – “the set of all sets that do not contain themselves” – which quickly leads to logical contradiction. Furthermore, many mathematical universes bear no relation to the physical world. One might imagine, naïvely, that while some mathematical universe would surely suffice to represent our physical Universe, it would have to be highly complex. Yet remarkably, much of physical reality has been faithfully represented by tremendously simple mathematical structures.

This simplicity of correspondence is simultaneously deeply profound and mysterious. To conclude his article The Unreasonable Effectiveness of Mathematics in the Natural Sciences, Nobel Laureate Eugene Wigner writes, 

“The miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics is a wonderful gift which we neither understand nor deserve. We should be grateful for it and hope that it… will extend, for better or for worse, to our pleasure.”

– Eugene Wigner

Some have taken this connection so far as to speculate that our external physical reality is a mathematical structure – a conjecture referred to as the Mathematical Universe Hypothesis (MUH). Seeing as we are interested in characterizing physical diversity, we will only consider the mathematical structures of classical physics and Quantum Mechanics. By understanding what is possible in the mathematical universes of these theories (which are both of course approximations), we will lower bound the physical diversity of our Universe. 

In Classical Mechanics, most physical properties can be satisfactorily represented by a single number at each moment in time. For instance, we live in three dimensions, so the position of a single object is given by three real numbers – one for each dimension. The same is true of velocity, momentum, and acceleration. These values can take on any real value, and are allowed to change continuously. When multiple rigid bodies (classical objects) are considered, they can interact with each other, as they do via gravitational attraction or electromagnetic Coulomb-repulsion. The specific values attained for certain properties of the objects can now be inter-dependent in time, but importantly the complexity of the mathematical structure only grows linearly in the number of objects. This means that the mathematical structure is just as constrained in the case of N completely disjoint, non-interacting particles as in the case of N strongly-interacting particles.

Quantum Mechanics provides a more flexible mathematical framework by abstracting the state of a system away from the particular measured values of observable quantities. Rather, the state of a system is a vector in a complex-valued space called Hilbert space. This freedom allows for representing possibilities like superposition, in which a quantum system is partially occupying multiple abstract states at once. Contrasting with the classical case, the mathematical structure grows exponentially with the number of inter-dependent (entangled) particles.

When comparing the mathematical universes of classical and quantum physics, we see that the quantum framework provides much greater flexibility and freedom – especially in the case of many strongly interacting particles.
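
To make the scaling difference concrete, here is a back-of-the-envelope sketch in Python (the particle and level counts are illustrative choices, not physical claims):

```python
# Compare how much mathematical structure is needed to describe
# N particles classically versus quantum mechanically.

def classical_state_size(n_particles, dims=3):
    # Position and momentum, one real number per spatial dimension each:
    # the structure grows linearly with the number of particles.
    return 2 * dims * n_particles

def quantum_state_size(n_particles, levels=2):
    # A general entangled state of n two-level systems requires one
    # complex amplitude per joint configuration: exponential growth.
    return levels ** n_particles

for n in (1, 10, 30):
    print(n, classical_state_size(n), quantum_state_size(n))
```

Already at 30 two-level particles, the quantum description requires over a billion amplitudes, while the classical one needs just 180 numbers.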

Naturally, the following concern arises: It is clear that the mathematical universe of classical physics does not capture all of the physics we observe at small scales, and that the universe of quantum mechanics is sufficient for this purpose. However, it is not immediately obvious that all of the added freedom is necessary. In other words, just because we posit a mathematical universe doesn’t mean that real-world physics can reach all possible points in that universe. 

So how much of the universe can quantum states fill? In theory the rules for manipulating quantum states (our allowed operations) give us the power to densely fill the mathematical universe. And as Nobel Laureate Frank Wilczek puts it,

“The spontaneous activity of quantum systems explores all consistent possibilities… Nature, in her abundance, provides materials to embody all theoretically consistent possibilities.”

– Frank Wilczek

In practice there are many complications, including the fragility of quantum states, short coherence times, and challenges of performing many-body operations. That being said, even just the ground states of quantum systems, which tend to lie in a minuscule ‘corner’ of Hilbert space, contain interesting physics that goes beyond classical possibilities.

But what exactly does that additional mathematical structure provide in the way of physical diversity? What is physically possible? Even attempting here to give a complete account of the exotic physics would be a fool’s errand. So let’s take for granted that the values of the fundamental constants are fixed, and that space-time is flat. One reasonable way of characterizing the complexity of potential physical behavior is through classifying the ways in which particles can act collectively. In particular, it is instructive to look at the different phases of matter, or the ways systems can order, in the sense that particles in the same chunk of material exhibit correlations with other particles in the same chunk of material. These correlations can be in spin, charge, position, or something else entirely.

Classically, the only phases of matter are solid, liquid, gas, and plasma. The first three are distinguished by positional correlations among their constituents; the last is an ionized gas. Classical models of ferromagnetic and paramagnetic ordering exist, although these are really quantum phenomena.

It is also informative to look at what happens when matter in these phases is given a slight nudge, or perturbed. When a solid is pushed, the particles on the surface are displaced inward, bringing them closer to their neighbors in the lattice. The extra force felt on the second plane of particles effectively pushes them further in that direction. This in turn brings them closer to the next plane of particles. In this manner, the perturbation makes its way through the chunk of material until it reaches the opposite side, where it reflects and propagates in the reverse direction. The particles vibrate in synchrony, and the excitation resulting from slightly perturbing the solid has a collective wave-like nature. In liquids and gases, the relative lack of positional order leads to more complicated perturbation dynamics, including chaotic phenomena like turbulent flow. Yet even still, the relative scope of classical possibilities is quite limited.

Quantum mechanics makes things much more interesting: strong correlations between elementary particles can lead to exotic new phases of matter with drastically different behavior. 

In superconductors for example, electrons (elementary fermions) pair up and form bosons called Cooper pairs, which no longer obey the Pauli Exclusion Principle, subsequently allowing the electrons (still paired) to flow without resistance. 

In one-dimensional electron gases, the phenomenon of spin-charge separation occurs: while the elementary particles (electrons) have both spin and charge, the low-energy excitations are found to carry either spin or charge, but not both. 

Perhaps even stranger is the Fractional Quantum Hall Effect (FQHE), in which the Hall conductance plateaus at fractional multiples of its fundamental quantum. Another way of phrasing the difference between bosons and fermions is in terms of their exchange statistics. All particles of the same type, e.g. all electrons, are identical. However, when two electrons are interchanged, their combined quantum state changes, and it only reverts to its original state when they are swapped back. For bosons, the combined quantum state is oblivious to swaps. In other words, it takes two fermion swaps but only a single boson swap to restore the combined quantum state. The FQHE gives rise to effective particles that carry fractional charge in units of the fundamental electron charge, and exhibit non-fermionic, non-bosonic exchange statistics: the number of swaps that leaves the quantum state invariant can in principle be anything! For this reason, the new particles are called anyons.
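
A toy calculation makes the swap-counting concrete. Here the statistical angle theta is an input we choose for illustration – it is not derived from any microscopic model:

```python
import cmath
import math

# Toy picture of exchange statistics: swapping two identical particles
# multiplies their joint quantum state by a phase factor exp(i*theta).

def phase_after_swaps(theta, n_swaps):
    return cmath.exp(1j * theta * n_swaps)

def swaps_to_restore(theta):
    # Smallest positive number of swaps that returns the state to itself.
    n = 1
    while abs(phase_after_swaps(theta, n) - 1) > 1e-9:
        n += 1
    return n

print(swaps_to_restore(0.0))              # boson: 1 swap
print(swaps_to_restore(math.pi))          # fermion: 2 swaps
print(swaps_to_restore(2 * math.pi / 5))  # an anyon: 5 swaps
```

Any angle between the bosonic and fermionic values gives a valid anyonic answer – hence the name.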

These examples typify the concept of the quasiparticle in condensed matter – an effective particle that describes the emergent behavior of microscopically complicated systems. Not all quasiparticles provide tremendous insight beyond classical physics: According to the wave-particle duality of quantum theory, the wave-like collective vibrations of a lattice also take on characteristics of a particle – a quasiparticle known as a phonon. However, as collective modes, quasiparticles can exhibit physical phenomena distinct from that of the elementary particles. 

Practically, quasiparticles might be incredibly difficult to harness as building blocks, and ontologically they are not as fundamental as the elementary particles. But within the hierarchy of the sciences, elementary and quasi-particles sit on the same level. More is different indeed. 

Just how different can the allowed phenomena be? Classical and quantum theory are both approximations to physical reality. The mathematical universe required to capture the physics of our Universe must be larger than that of quantum theory, whether that freedom lies in six hidden spatial dimensions or somewhere else entirely. Surely that extra ‘space’ will give additional physics. Yet in one deep sense, the possibilities are quite limited.

Universality posits that certain qualities are shared by all entities – a concept which is ubiquitous in philosophy, religion, and science. A universal computer, for example, is capable of simulating any other computer efficiently. In physics, universality manifests in the study of phase transitions. Second order phase transitions entail systems continuously transitioning from order to disorder as one parameter or degree of freedom is varied. 

The classical 2d Ising model for instance – a grid of ‘spins’ (each of which can be either up or down) interacting with nearest neighbors – exhibits a ferromagnet-to-paramagnet transition as a function of temperature. At zero temperature, the state with all spins aligned (all up, or all down) is infinitely more likely than all other possible states. As we increase temperature, states with all but a few spins aligned become more likely. This means that the probability of fluctuations, or deviations from the ground state, increases. Increasing the temperature further, fluctuations begin to occur on larger and larger length-scales – a block of k consecutive overturned spins becomes likely for larger and larger k. In the disordered (paramagnetic) phase, all configurations are equally favorable. Approaching the phase transition, we observe that fluctuations on all length-scales become equally likely, and the system becomes scale-invariant.
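
This story is easy to simulate. Below is a minimal Metropolis sketch of the 2d Ising model; the lattice size, temperatures, and step count are arbitrary illustrative choices:

```python
import math
import random

def metropolis(L=8, T=2.5, steps=20000, seed=0):
    rng = random.Random(seed)
    spins = [[1] * L for _ in range(L)]          # start fully ordered
    for _ in range(steps):
        i, j = rng.randrange(L), rng.randrange(L)
        # Sum of the four nearest neighbours, with periodic boundaries.
        nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
              + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
        dE = 2 * spins[i][j] * nb                # energy cost of a flip
        if dE <= 0 or rng.random() < math.exp(-dE / T):
            spins[i][j] *= -1                    # accept the flip
    # Magnetization per spin: near +/-1 when ordered, near 0 when not.
    return sum(sum(row) for row in spins) / L**2
```

Run well below the critical temperature (around 2.27 in these units) the magnetization stays pinned near 1; run well above it, fluctuations wash the order out and it drifts toward 0.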

This phenomenon is in no way unique to the Ising model. In fact, the scale-invariance of fluctuations at a phase transition point is universal! Moreover, regardless of the underlying order or the variable parameter, the behavior of these fluctuations is characterized by one of a few discrete classes, called universality classes.

One might think that the emergence of such restrictive behavior results from the simplicity of the mathematical universe. I emphasize that universality of these phase transitions is physical in origin, and that the meager number of universality classes is a direct consequence of scale invariance, which is a physical symmetry constraint. The set of universality classes does not grow when we expand from the classical to the quantum; in fact, the universality class of any d-dimensional quantum system is precisely that of the corresponding (d+1)-dimensional classical system. String theory or additional physical dimensions would not expand this set.

What do we learn from all of this? By considering phases of matter as a form of physical diversity, we see that the mathematical universe plays a substantial role in enabling complexity, and more mathematically flexible physical theories might inform us of even broader diversity. Universal qualities of the physical world, on the other hand, fundamentally limit the potential for variety in physical behavior.

I conclude by revisiting Anderson:

“the ability to reduce everything to simple fundamental laws does not imply the ability to start from those laws and reconstruct the universe.”

– P.W. Anderson

Phases of matter are not the only form of physical diversity. Who knows where else it may appear.

Diagrams and Deep Neural Nets: Abstraction in Science

Famed Abstract Expressionist Arshile Gorky once wrote, “Abstraction allows man to see with his mind what he cannot physically see with his eyes… Abstract art enables the artist to perceive beyond the tangible, to extract the infinite out of the finite. It is the emancipation of the mind. It is an explosion into unknown areas.”

In science as in art, abstraction has always been vital to progress. It is responsible for our mathematics, for many of our scientific discoveries, and for unearthing overlooked connections in old theories, thereby changing our understanding of the world. Historically, the development of new tools for abstraction has led to novel insights, and counterintuitively, to more concrete and quantitatively accurate predictions. Now, deep learning methods are taking this to an entirely new level, uncovering patterns invisible to not only the naked eye but even the machinery of mathematics.

In society, the abstract is all around us in our signs, symbols, and gestures. We use abstractions to organize our knowledge and to express ourselves. As civilization has developed, our communal trove of knowledge has grown exponentially. We develop abstractions for our abstractions, and place an ever higher premium on the ability to think abstractly.


Mathematics provides the perfect showcase for this idea: It’s basically the science of abstraction. The very process of learning mathematics highlights how abstract representations get layered one upon the other until they form a universe of connections.

The first layer — counting — is so simple we might not even think of it as abstraction. But to say “there are five apples” means that we can abstract away the different shapes and sizes that make them distinct objects, and categorize them as the same. We learn that four apples is different than five apples. We eat one and are forced to develop the concepts of addition and subtraction.

We develop numerals, written symbols for the numbers they represent. We create notation for addition (+) and subtraction (-). We introduce the concept of a variable, something that can change. The layers are already stacking up: The variable is an abstraction for a changing numeral, which is an abstraction for a number, which we originally manufactured to count our physical objects.

As our mathematics becomes more sophisticated, we develop abstractions for owing (having negative of something), zero (the idea of nothingness), and infinity to represent something larger than any counting number – larger than any quantity of apples or anything else we could ever possibly have. The unbounded vastness of infinity perhaps epitomizes the limitlessness of abstraction itself. We define sets of numbers, like the irrationals, which are impossible to make contact with without thinking in abstractions. This process goes on and on.

Abstraction enters science

Through mathematics, abstraction made its way into science itself. At its core, the scientific method is a cycle of hypothesis, testing, and revision, and in service of that cycle scientists have always sought patterns and laws to describe natural phenomena. At the height of the Scientific Revolution, Sir Isaac Newton published Principia (1687), laying the ideological framework for a science rooted in abstraction.

While Newton’s eponymous laws of motion and law of universal gravitation were quite accurate at the time (and to this day remarkably describe macroscopic non-relativistic matter), the laws were even more powerful in their statement that the state of a physical object can be represented by mathematical variables. For Newton’s laws, the state of an object was fully captured by its position, velocity and acceleration, all of which are easily measured quantities. However, in different theories the state has since taken on various properties. Furthermore, the abstraction to a state allowed for properties that are not directly measurable – like the phase of a quantum state (only the relative phases between quantum states are measurable) – but which nonetheless have observable consequences. This represented a paradigmatic philosophical shift in the practice of science.

Diagrams are emblematic of abstraction in science. Ubiquitous in science – and especially prominent in physics – they go far beyond pictures or illustrative figures. In nearly every branch of physics, diagrams facilitate the solution of problems by making computations tractable. But even more importantly, they do so by abstracting away physically unimportant details of the system under study and emphasizing one particular feature.

In Classical Mechanics, which describes how macroscopic objects like blocks and balls and trains behave, Newton’s Laws formulate the dynamics of such objects in terms of forces, which act on objects and set them in motion. Free Body Diagrams (FBDs) arise as a visual tool for keeping track of the forces acting on an object. In an FBD, forces are represented as lines emanating from (the center of mass of) an object.

As a simple example, consider the setup in figure 1 below: two electrically charged balls, A and B, are hanging (at rest) from strings attached to a rafter. Suppose we want to find the tension in the string attached to ball A. From this picture alone, it is not clear what details are relevant or even if we have all of the necessary information.

Fig. 1: Physical setup: two electrically charged balls of uniform density, A and B, are hanging statically from ropes attached to a rafter. The ropes have the same length, and each makes an angle theta with the vertical.

The Free Body Diagram for ball A gives a much cleaner picture, indicating the relevant aspects of the physics. We can see immediately that there are only three forces acting on A – gravity, a Coulomb repulsion from B, and the tension from the rope. These forces are very different in nature, but they are all treated on equal footing. We can conclude immediately that we do not need to know the length of the rope, or the length or width of the rafter. We don’t even need the angle theta.

Fig. 2: Free body diagram for setup in Fig. 1. Only forces acting on ball A are shown – the tension, electromagnetic, and gravitational. Because the ball is in equilibrium, both the horizontal (x) and vertical (y) components of the net force must cancel.

Furthermore, we need the mass of A but not the mass of B (because this is an FBD for A, not B); however, we do need the electrical charge of both balls and the distance between them. Ball A is in static equilibrium, so by Newton’s Laws, the net force acting on it must be zero.
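
With the free body diagram in hand, the computation is a few lines. The numerical values below for the mass, charges, and separation are made up for illustration:

```python
import math

K = 8.99e9   # Coulomb constant (N m^2 / C^2)
g = 9.81     # gravitational acceleration (m / s^2)

def rope_tension(mass_a, q_a, q_b, separation):
    weight = mass_a * g                        # gravity, pointing down
    coulomb = K * q_a * q_b / separation**2    # horizontal repulsion from B
    # Net force on A is zero, so the tension balances both components:
    # T*cos(theta) = weight and T*sin(theta) = coulomb.
    return math.hypot(weight, coulomb)

print(rope_tension(0.01, 1e-6, 1e-6, 0.1))
```

Notice that the rope length and the rafter never enter the calculation – exactly what the free body diagram told us.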

By abstracting away the nature of the forces, the details of the physical setup, and the other objects present, Free Body Diagrams isolate the ingredients responsible for determining motion, making a seemingly complicated problem feasible.

The history of physics is littered with similar examples of the power of diagrammatic abstractions, such as Minkowski Diagrams in Special Relativity, and Penrose Diagrams in General Relativity, which illuminate the causal structure of spacetime. Perhaps the most prevalent diagram in all of physics is the Feynman Diagram. Feynman Diagrams are such powerful tools that Julian Schwinger, who shared the 1965 Nobel Prize in Physics with Richard Feynman, said they “brought quantum field theory to the masses.” Feynman Diagrams are so popular they have even pervaded pop culture, finding their way into movies and onto shirts and mugs.

The central object of study in quantum electrodynamics (QED) – the study of the interactions between light and matter – is the scattering matrix. The fundamental processes in quantum field theory are called scattering events – one particle breaks up into multiple particles (decay), two particles collide and annihilate each other (pair annihilation), etc. The scattering matrix provides the relationships between the initial and final states of such a system when particles scatter. It is given by an integral that is often quite difficult or even impossible to calculate directly.

Feynman Diagrams are useful tools for “book-keeping” when calculating the scattering matrix. Richard Feynman recognized that even though the scattering matrix might be hard to calculate directly, the integral could be written as a (possibly infinite) series, where each term in the series could be viewed as a set of particles interacting, representing a different pathway or “channel” for the scattering to occur. Furthermore, each term can be represented by a diagram.

These diagrams are read temporally from left to right, with initial particles entering at the far left (some initial time) and final particles exiting at the far right (after scattering). The diagrams do not contain spatial information. Every line represents a particle, and every vertex an interaction. Implicitly, momentum and charge are conserved at every vertex. Terms that contribute more strongly to the path integral correspond to simpler – and thus more probable – particle interactions. Feynman rules provide a prescription for manipulating these diagrams, and for calculating their contributions to the scattering matrix, thus expediting the computation of the previously intractable quantity.
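
As a cartoon of this bookkeeping, one can treat each order in the series as “the sum of all diagrams with that many vertices”. The coefficients below are invented placeholders, not real QED values:

```python
# Cartoon of a perturbative expansion: the amplitude as a power series
# in a small coupling g, truncated at a chosen order.

def partial_amplitude(g, diagram_sums, max_order):
    # diagram_sums[n] stands in for "sum of all n-vertex diagrams".
    return sum(c * g**n for n, c in enumerate(diagram_sums[:max_order + 1]))

diagram_sums = [0.0, 1.0, -0.5, 0.25]   # made-up orders g^0 .. g^3
for order in range(len(diagram_sums)):
    print(order, partial_amplitude(0.1, diagram_sums, order))
```

Each extra vertex costs a power of g, so complicated diagrams contribute less – which is why truncating the series after a few simple diagrams is already accurate.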

Fig. 3: Feynman diagram for electron-positron annihilation. p1 and p2 are the momenta of the electron and positron respectively. The product of the scattering event is a photon (the wavy line). Copied from Schwartz QFT.

Moreover, these diagrams paved the way for new theoretical developments. First, they shed light on the fundamental nature of symmetry. Taking the diagrams at face value, Feynman concluded in 1941 that a particle moving forward in time was indistinguishable from its anti-particle moving backward in time. This became known as the Feynman-Stückelberg interpretation.

Second, they provided insight into the role of locality. Just looking at the terms in the scattering matrix as a series, it is not clear which terms will contribute and which will get cancelled out by other terms. Viewing the series diagrammatically, it becomes obvious that there are two types of terms: connected diagrams, in which you can trace a path from any initial particle to any final particle, and disconnected diagrams, in which you cannot. The disconnected diagrams can be decomposed into connected components, and simple manipulations show that these cannot contribute to the final scattering amplitude. This leads to cluster decomposition – a statement of locality that says that experiments well-separated in space cannot influence each other.

Fig. 4: Example of disconnected and connected Feynman diagrams. The disconnected diagrams cannot interfere with the connected diagrams. Copied from Schwartz QFT.

Diagrams will always have a place in science. And the prevalence of these tools speaks to the human capacity for creativity and ingenuity. Each diagram reflects a revelation in which one particular set of features was discovered to be vital and others immaterial. As our understanding of the world develops, however, our theories grow ever more intricate. What if the essential elements of these theories become too subtle to isolate by stroke of genius alone?

Computational Abstraction

To put it bluntly, humans aren’t essential for abstraction. Humans are bound to their physical nature, but the act of abstracting means leaving the physical realm behind. Indeed, many of the technological advances of the past few years have been spurred on by computational abstraction, a process in which computers learn abstract representations of data. At the core of this renaissance is the deep neural network – an algorithm originally conceived to mimic the process of learning in the human brain.

A simplified model of the human brain consists of many connected neurons (a network) that talk (pass information) to each other. Each neuron takes some information in, transforms it, and then transmits an electrical signal via synapse to another neuron. The synapse either fires or doesn’t fire, depending on the magnitude of the transformed value.

A neural network functions on the same principles: A set of neurons take input data, transform it, and then pass the new values to another set of neurons, which in turn transform and communicate the modified values. Each set of neurons is called a layer, and the number of layers is the depth of the network. One slight modification from the model of the human brain is the prescription for transmitting electrical signal, known as the activation function. Rather than the binary fire or not fire of genuine synapses, more complicated functions are used. 

Such an algorithm learns through a training process, in which it is given input data which it is asked to transform, and the estimated output is then compared to the true output (the final representation you would like it to learn). Every time the estimated output differs from the desired output, the network updates itself by changing the way it transforms inputs.
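
The loop described above can be written out in a few lines. Here a single neuron (the smallest possible “network”) learns logical OR; the learning rate and epoch count are arbitrary illustrative choices:

```python
import math

def sigmoid(x):
    # A smooth stand-in for the brain's binary fire / don't-fire synapse.
    return 1.0 / (1.0 + math.exp(-x))

data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]

w, b, lr = [0.0, 0.0], 0.0, 1.0
for _ in range(2000):
    for x, target in data:
        out = sigmoid(w[0] * x[0] + w[1] * x[1] + b)  # estimated output
        err = out - target       # compare estimate to the true output
        # Update the transformation (gradient of the cross-entropy loss):
        w[0] -= lr * err * x[0]
        w[1] -= lr * err * x[1]
        b -= lr * err

for x, target in data:
    print(x, round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)))
```

Each pass compares the estimated output to the true output and nudges the weights – exactly the cycle described above, just at the smallest possible scale.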

Just as the human brain performs abstraction when learning new mathematical concepts or drawing FBDs or Feynman Diagrams, a neural network abstracts away irrelevant details from the training examples when it modifies the transformation it applies to the data. However, whereas in these diagrams the relevant features were hand-picked, neural networks learn which features are relevant.

On its face, there is no clear advantage to having multiple layers of neurons, but in practice increased depth often leads to improved performance. One distinct advantage of deep neural networks is that abstraction occurs at each layer. Throughout the training process, the transformations at each layer are tuned so that the network learns intermediate representations (one for every layer), in addition to a final representation. The deeper the layer, the more abstract the features.

Take one type of neural network used to process images, called a convolutional neural net (CNN). At the highest layers, the filters look like distorted images. In the middle layers, patterns start to emerge. In the lowest layers, the CNN picks out specific textures and then edges. The CNN itself isn’t thinking, but through the process of abstraction it uncovers low-level visual features. 

For instance, let’s say you want to teach a CNN what a human face is. To train the network, you assemble a large, diverse collection of images of human faces, and feed those images through the network one by one. After each step, the CNN adjusts its understanding of faces through a process called backpropagation. If the input face differs from the network’s current understanding of a face, the network changes the way the neurons communicate with each other to try to account for these differences. As more images are passed through the network, its definition of a face becomes increasingly robust. 

Fig. 5: Example of feature representations at different layers in a convolutional neural network (CNN). The input layer takes in images of faces, and deeper layers decompose the faces into more and more abstract elements. Copied from Nathan Lintz’s Indico blog post.

By the end of the training process, the deep neural network has “learned” what a human face is by deconstructing it layer by layer, with deeper layers discovering more fundamental patterns in the data. Then the network can recombine these features in new ways, painting pictures of what it thinks a human face actually is.

Deep learning facilitates scientific progress

Deep learning has already found applications in many areas of science. It is being used to model dark matter and galaxy shapes, to identify new physics in collision events at the Large Hadron Collider (LHC), and to advance drug discovery. And these data-oriented approaches have already met with tremendous success in identifying features that people could not find through intuition or genius alone. 

Higgs Detection

One of the first applications of deep learning in physics was in the discovery of the Higgs boson at CERN. The Standard Model of Particle Physics provides a unified description of three of the four fundamental forces: electromagnetic, and weak and strong nuclear interactions. It stipulates the existence of the Higgs boson – a particle that gives mass to the other particles. The Higgs was theorized to have such a high mass that, when it is a possible product of a scattering event, its diagrams contribute very minimally to the scattering matrix – it is produced with very low probability.

In order to verify the existence of the Higgs boson, physicists conducted trillions of scattering events in the LHC and set out to demonstrate that the measured and theorized Higgs contributions matched. This required distinguishing events in which Higgs bosons were produced from background events, some of which gave quite similar signatures. 

The primary challenge lay in the quantity of data required to determine the Higgs’ contribution to within acceptable margin of error. At the LHC, particles are collided together at near the speed of light, resulting in billions of scattering events each second. The detectors take millions of measurements for each collision, resulting in the creation of roughly a petabyte of data per second. 

It was infeasible, under hardware constraints, to store the massive amount of data resulting from all the collisions necessary for the theorized number of Higgs bosons to be produced. Decisions about which collisions to store – the ones likely to have produced Higgs particles – therefore had to be made on the spot, and the traditional machinery of quantum field theory was far too unwieldy for this task. Instead, deep neural networks were trained to take the measurements from the detector as input and classify events as potentially interesting or not. In other words, the networks took in physical attributes of the collision and abstracted away what makes a collision likely to produce Higgs bosons. This allowed for essentially instantaneous classification.
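
This kind of on-the-spot keep-or-discard decision can be sketched in a few lines. Everything here is invented for illustration – the feature count, the fixed weights (which in reality would be learned from simulated Standard Model events), and the threshold – and a real LHC trigger is vastly more sophisticated:

```python
import math
import random

random.seed(1)

# Pretend each event is summarized by a few detector measurements
# (energies, momenta, angles); here they are random stand-ins.
def read_event():
    return [random.gauss(0, 1) for _ in range(4)]

# A "trained" classifier with fixed made-up weights; in practice these
# would come from training on simulated collision events.
WEIGHTS = [0.8, -0.5, 1.2, 0.3]
BIAS = -0.1

def interesting(event, threshold=0.5):
    """Score an event and decide instantly whether to store it."""
    z = sum(w * x for w, x in zip(WEIGHTS, event)) + BIAS
    prob = 1.0 / (1.0 + math.exp(-z))   # squash score into (0, 1)
    return prob >= threshold

kept = [e for e in (read_event() for _ in range(10_000)) if interesting(e)]
print(f"kept {len(kept)} of 10000 events")
```

The classification is a handful of multiplications and additions per event – cheap enough to run as the data streams in, which is exactly what the storage constraint demands.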

Drug Discovery

More recently, deep learning has shown great promise in the quest for novel classes of molecules and materials. Throughout history, entire eras have been defined by the discovery and exploitation of new types of materials, from the bronze age to the iron age to our current silicon age and the blossoming of the semiconductor industry. Since at least 1942, when penicillin was derived from the Penicillium fungus and used as an antibiotic, pharmaceuticals have had a similarly society-altering effect on public health. This has fueled the quest for compounds that exhibit particular properties of interest, be they medicinal, electronic, or otherwise.

The difficulty here is two-fold: first, the space of possible materials (or of possible drugs) is vast – far too expansive to be searched systematically. Second, the synthesis of a compound from scratch is expensive and time-consuming.

In order to find a drug that satisfies a particular property, it is necessary to greatly reduce the number of compounds that need to be synthesized. This process of reducing the search space is known as high-throughput screening. Machine learning has been a part of this process for decades, but a computational sieve of the quality needed to pick out good candidates lay out of reach – until the increased abstraction and representational power of deep neural networks made many problems in drug discovery tractable.
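
The sieve itself is conceptually simple: score every candidate with a cheap predictive model and pass only the most promising on to expensive synthesis. In this toy sketch the scoring function is a made-up stand-in for a trained model, and the candidate strings are merely SMILES-flavored placeholders:

```python
# High-throughput screening in miniature: rank all candidates by a
# cheap surrogate score, keep only the top fraction for synthesis.
def predicted_affinity(compound):
    # Toy surrogate: pretend longer "compounds" score better.
    # A real model would predict binding affinity, toxicity, etc.
    return len(compound)

candidates = ["CCO", "CCCCO", "C1CCCCC1", "CC(=O)O", "CCN"]

def screen(compounds, keep_fraction=0.4):
    ranked = sorted(compounds, key=predicted_affinity, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]

shortlist = screen(candidates)
print(shortlist)   # the few compounds worth actually synthesizing
```

The economics live entirely in the quality of `predicted_affinity`: a poor model wastes synthesis budget on duds, which is why the representational power of deep networks mattered so much here.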

The road ahead

While abstraction itself does not require a human element, science does. As a tool for abstraction, deep learning relies heavily on practitioners and scientists. Humans must tune the hyperparameters of the network, such as the learning rate, which controls how much the transformations at each neuron are updated at each step of the training process. Humans also specify the depth of the network and the number of neurons in each layer. These choices can be far from obvious.
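
To make the learning rate concrete, here is a toy gradient-descent loop on a single parameter. The loss function and starting point are invented for illustration; the mechanic – the learning rate scaling each update – is the general one:

```python
# Minimize the toy loss (w - 3)^2 by gradient descent. The learning
# rate scales each step; its gradient is 2 * (w - 3).
def train(w, learning_rate, steps):
    for _ in range(steps):
        grad = 2 * (w - 3.0)
        w = w - learning_rate * grad
    return w

slow = train(w=0.0, learning_rate=0.01, steps=100)  # creeps toward 3
fast = train(w=0.0, learning_rate=0.5, steps=100)   # lands on 3 at once
# A rate much above 1.0 would overshoot further each step and diverge.
print(round(slow, 3), round(fast, 3))
```

Even in this one-parameter toy, the "right" learning rate is not obvious in advance – too small wastes compute, too large never converges – which is the flavor of judgment practitioners exercise across all hyperparameters.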

Perhaps even more importantly, deep neural networks do not replace previous scientific methods and results, but instead build upon them. At CERN, the neural networks were trained using the results from simulated collision events based upon the physics of the Standard Model, viewed by many as the crowning achievement of theoretical physics thus far. In drug discovery, one of the essential factors impacting performance is the input representation. A priori it is not clear how best to present a molecule as data to a computer, be it a list of constituents and relative positions of atoms, a graph with atoms as vertices and bonds as edges, or something else entirely. It turns out that if scientists use domain knowledge (pertaining to the desired properties), they can generate chemically inspired input encodings that far outperform naïve encodings. 
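
The representation question can be made concrete with a single molecule. Below, water is encoded two ways, as described above – as a flat list of atoms with (approximate, illustrative) 3D positions, and as a graph with atoms as vertices and bonds as edges:

```python
# Two encodings of the same molecule (water):
# 1) a list of atoms with 3D coordinates (geometry is explicit),
# 2) a graph of atoms and bonds (bonding is explicit).
# The coordinates are approximate and purely illustrative.
water_as_list = [
    ("O", (0.000, 0.000, 0.000)),
    ("H", (0.757, 0.586, 0.000)),
    ("H", (-0.757, 0.586, 0.000)),
]

water_as_graph = {
    "atoms": ["O", "H", "H"],      # vertices
    "bonds": [(0, 1), (0, 2)],     # edges: the O bonded to each H
}

print(len(water_as_list), "atoms;", len(water_as_graph["bonds"]), "bonds")
```

Both encodings describe the same molecule, but they hand the network different inductive biases – and choosing between them (or engineering something better) is exactly where the chemists' domain knowledge enters.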

Deep learning is not a panacea for the problems of science. It will not reveal to us the true nature of our universe, nor will it replace the role of humans in science. Time and again revolutionary thinkers have shifted the paradigm and changed the way we view the world, and the human spirit has strength to prevail against all odds. But by utilizing deep learning as a tool, we can shift the odds in our favor, and in so doing expedite scientific progress.