Diagrams and Deep Neural Nets: Abstraction in Science

Famed Abstract Expressionist Arshile Gorky once wrote, “Abstraction allows man to see with his mind what he cannot physically see with his eyes… Abstract art enables the artist to perceive beyond the tangible, to extract the infinite out of the finite. It is the emancipation of the mind. It is an explosion into unknown areas.”

In science as in art, abstraction has always been vital to progress. It is responsible for our mathematics, for many of our scientific discoveries, and for unearthing overlooked connections in old theories, thereby changing our understanding of the world. Historically, the development of new tools for abstraction has led to novel insights and, counterintuitively, to more concrete and quantitatively accurate predictions. Now, deep learning methods are taking this to an entirely new level, uncovering patterns invisible not only to the naked eye but even to the machinery of mathematics.

In society, the abstract is all around us in our signs, symbols, and gestures. We use abstractions to organize our knowledge and to express ourselves. As civilization has developed, our communal trove of knowledge has grown exponentially. We develop abstractions for our abstractions, and place an ever higher premium on the ability to think abstractly.

Mathematics

Mathematics provides the perfect showcase for this idea: It’s basically the science of abstraction. The very process of learning mathematics highlights how abstract representations get layered one upon the other until they form a universe of connections.

The first layer — counting — is so simple we might not even think of it as abstraction. But to say “there are five apples” means that we can abstract away the different shapes and sizes that make them distinct objects, and categorize them as the same. We learn that four apples is different from five apples. We eat one and are forced to develop the concepts of addition and subtraction.

We develop numerals, written symbols for the numbers they represent. We create notation for addition (+) and subtraction (-). We introduce the concept of a variable, something that can change. The layers are already stacking up: The variable is an abstraction for a changing numeral, which is itself a symbol for a number, which we originally manufactured to count our physical objects.

As our mathematics becomes more sophisticated, we develop abstractions for owing (having a negative amount of something), for zero (the idea of nothingness), and for infinity, which represents something larger than any counting number – larger than any quantity of apples or anything else we could ever possibly have. The unbounded vastness of infinity perhaps epitomizes the limitlessness of abstraction itself. We define sets of numbers, like the irrationals, that are impossible to make contact with without thinking in abstractions. This process goes on and on.

Abstraction enters science

Through mathematics, abstraction made its way into science itself. At its core, the scientific method is a cycle of hypothesis, testing, and revision, and scientists have always sought patterns and laws to describe natural phenomena. At the height of the Scientific Revolution, Sir Isaac Newton published the Principia (1687), laying the ideological framework for a science rooted in abstraction.

While Newton’s eponymous laws of motion and law of universal gravitation were quite accurate at the time (and to this day remarkably describe macroscopic non-relativistic matter), the laws were even more powerful in their statement that the state of a physical object can be represented by mathematical variables. For Newton’s laws, the state of an object was fully captured by its position, velocity, and acceleration, all of which are easily measured quantities. In later theories, however, the state has taken on different properties. Furthermore, the abstraction to a state allowed for properties that are not directly measurable – like the phase of a quantum state (only the relative phases between quantum states are measurable) – but which nonetheless have observable consequences. This represented a paradigmatic philosophical shift in the practice of science.

Diagrams are emblematic of abstraction in science. Ubiquitous in science – and especially prominent in physics – they go far beyond pictures or illustrative figures. In nearly every branch of physics, diagrams facilitate the solution of problems by making computations tractable. But even more importantly, they do so by abstracting away physically unimportant details of the system under study and emphasizing one particular feature.

In Classical Mechanics, which describes how macroscopic objects like blocks and balls and trains behave, Newton’s Laws formulate the dynamics of such objects in terms of forces, which act on objects and set them in motion. Free Body Diagrams (FBDs) arise as a visual tool for keeping track of the forces acting on an object. In an FBD, forces are represented as lines emanating from (the center of mass of) an object.

As a simple example, consider the setup in figure 1 below: two electrically charged balls, A and B, are hanging (at rest) from strings attached to a rafter. Suppose we want to find the tension in the string attached to ball A. From this picture alone, it is not clear what details are relevant or even if we have all of the necessary information.

Fig. 1: Physical setup: two electrically charged balls of uniform density, A and B, are hanging statically from ropes attached to a rafter. The ropes have the same length, and each makes an angle theta with the vertical.

The Free Body Diagram for ball A gives a much cleaner picture, indicating the relevant aspects of the physics. We can see immediately that there are only three forces acting on A – gravity, a Coulomb repulsion from B, and the tension from the rope. These forces are very different in nature, but they are all treated on equal footing. We can conclude immediately that we do not need to know the length of the rope, or the length or width of the rafter. We don’t even need the angle theta.

Fig. 2: Free body diagram for setup in Fig. 1. Only forces acting on ball A are shown – the tension, electromagnetic, and gravitational. Because the ball is in equilibrium, both the horizontal (x) and vertical (y) components of the net force must cancel.

Furthermore, we need the mass of A but do not need to know the mass of B (because this is an FBD for A, not B); however, we do need the electrical charge of both balls and the distance between them. Ball A is in static equilibrium, so by Newton’s Laws, the net force acting on it must be zero.
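To make this concrete, here is a minimal sketch of the force balance in Python, with made-up values for the mass, charges, and separation. In equilibrium, the rope must cancel gravity vertically and the Coulomb repulsion horizontally, so the tension follows from those two forces alone:

```python
# Minimal sketch of the FBD force balance for ball A (all values hypothetical).
# Equilibrium: T*cos(theta) = m_A*g (vertical) and T*sin(theta) = F_coulomb
# (horizontal), so T = sqrt((m_A*g)**2 + F_coulomb**2).
import math

k = 8.99e9    # Coulomb constant, N*m^2/C^2
g = 9.81      # gravitational acceleration, m/s^2

m_A = 0.010   # mass of ball A, kg (hypothetical)
q_A = 2e-7    # charge of ball A, C (hypothetical)
q_B = 3e-7    # charge of ball B, C (hypothetical)
r = 0.05      # distance between the balls, m (hypothetical)

F_coulomb = k * q_A * q_B / r**2            # horizontal Coulomb repulsion
F_gravity = m_A * g                         # vertical weight of ball A
tension = math.hypot(F_gravity, F_coulomb)  # magnitude of the rope tension
print(f"Tension on A: {tension:.4f} N")
```

Note that the rope length, the angle theta, and the mass of B never enter the calculation, exactly as the Free Body Diagram predicted.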

By abstracting away the nature of the forces, the details of the physical setup, and the other objects present, Free Body Diagrams isolate the ingredients responsible for determining motion, making a seemingly complicated problem feasible.

The history of physics is littered with similar examples of the power of diagrammatic abstractions, such as Minkowski Diagrams in Special Relativity, and Penrose Diagrams in General Relativity, which illuminate the causal structure of spacetime. Perhaps the most prevalent diagram in all of physics is the Feynman Diagram. Feynman Diagrams are such powerful tools that Julian Schwinger, who shared the 1965 Nobel Prize in Physics with Richard Feynman, said they “brought quantum field theory to the masses.” Feynman Diagrams are so popular they have even pervaded pop culture, finding their way into movies and onto shirts and mugs.

The central object of study in quantum electrodynamics (QED) – the study of the interactions between light and matter – is the scattering matrix. The fundamental processes in quantum field theory are called scattering events – one particle breaks up into several (decay), two particles collide and annihilate each other (pair annihilation), etc. The scattering matrix provides the relationships between the initial and final states of such a system when particles scatter. It is given by an integral that is often quite difficult or even impossible to calculate directly.

Feynman Diagrams are useful tools for “book-keeping” when calculating the scattering matrix. Richard Feynman recognized that even though the scattering matrix might be hard to calculate directly, the integral could be written as a (possibly infinite) series, where each term in the series could be viewed as a set of particles interacting, representing a different pathway or “channel” for the scattering to occur. Furthermore, each term can be represented by a diagram.

These diagrams are read temporally from left to right, with initial particles entering at the far left (some initial time) and final particles exiting at the far right (after scattering). The diagrams do not contain spatial information. Every line represents a particle, and every vertex an interaction. Implicitly, momentum and charge are conserved at every vertex. Terms that contribute more strongly to the scattering matrix correspond to simpler – and thus more probable – particle interactions. Feynman rules provide a prescription for manipulating these diagrams, and for calculating their contributions to the scattering matrix, thus expediting the computation of the previously intractable quantity.
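As a rough numerical cartoon (not the actual Feynman rules), the following sketch shows why the simplest diagrams dominate: in QED, each additional vertex suppresses a diagram's contribution by roughly a power of the small coupling constant.

```python
# Toy cartoon of a perturbative series (not real Feynman rules): pretend the
# diagrams with n vertices contribute ~ alpha**n in magnitude.
alpha = 1 / 137.0  # fine-structure constant, QED's small expansion parameter

running_total = 0.0
for n in range(1, 5):
    term = alpha**n
    running_total += term
    print(f"order {n}: term ~ {term:.3e}, partial sum ~ {running_total:.6e}")

# Each order is ~137x smaller than the last, so truncating the series after
# the first few (simplest) diagrams already gives an accurate answer.
```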

Fig. 3: Feynman diagram for electron-positron annihilation. p1 and p2 are the momenta of the electron and positron respectively. The product of the scattering event is a photon (the wavy line). Copied from Schwartz, Quantum Field Theory and the Standard Model.

Moreover, these diagrams paved the way for new theoretical developments. First, they shed light on the fundamental nature of symmetry. Taking the diagrams at face value, one concludes that a particle moving forward in time is indistinguishable from its anti-particle moving backward in time – an idea proposed by Ernst Stückelberg in 1941 and later developed by Feynman, now known as the Feynman-Stückelberg interpretation.

Second, they provided insight into the role of locality. Just looking at the terms in the scattering matrix as a series, it is not clear which terms will contribute and which will get cancelled out by other terms. Viewing the series diagrammatically, it becomes obvious that there are two types of terms: connected diagrams, in which you can trace a path from any initial particle to any final particle, and disconnected diagrams, in which you cannot. The disconnected diagrams can be decomposed into connected components, and simple manipulations show that these cannot contribute to the final scattering amplitude. This leads to cluster decomposition – a statement of locality that says that experiments well-separated in space cannot influence each other.
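The connected/disconnected distinction is ordinary graph connectivity. As a toy sketch (this bare-bones encoding of a diagram is made up for illustration), one can represent a diagram as vertices and edges and test connectivity with a breadth-first search:

```python
# Toy sketch: a "diagram" as an undirected graph (vertex -> neighbors) plus a
# breadth-first search to decide connectivity. Real Feynman diagrams carry far
# more structure (particle types, momenta); this keeps only the topology.
from collections import deque

def is_connected(graph):
    """Return True if every vertex is reachable from the first one."""
    start = next(iter(graph))
    seen, queue = {start}, deque([start])
    while queue:
        for neighbor in graph[queue.popleft()]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return len(seen) == len(graph)

# Every external particle connects through the internal vertices 3 and 4.
connected = {1: [3], 2: [3], 3: [1, 2, 4], 4: [3, 5, 6], 5: [4], 6: [4]}
# Two separate sub-processes that never exchange a particle.
disconnected = {1: [2], 2: [1], 3: [4], 4: [3]}

print(is_connected(connected))     # True
print(is_connected(disconnected))  # False
```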

Fig. 4: Example of disconnected and connected Feynman diagrams. The disconnected diagrams cannot interfere with the connected diagrams. Copied from Schwartz, Quantum Field Theory and the Standard Model.

Diagrams will always have a place in science. And the prevalence of these tools speaks to the human capacity for creativity and ingenuity. Each diagram reflects a revelation in which one particular set of features was discovered to be vital and others immaterial. As our understanding of the world develops, however, our theories grow ever more intricate. What if the essential elements of these theories become too subtle to isolate by strokes of genius alone?

Computational Abstraction

To put it bluntly, humans aren’t essential for abstraction. Humans are bound to their physical nature, but the act of abstracting means leaving the physical realm behind. Indeed, many of the technological advances of the past few years have been spurred on by computational abstraction, a process in which computers learn abstract representations of data. At the core of this renaissance is the deep neural network – an algorithm originally conceived to mimic the process of learning in the human brain.

A simplified model of the human brain consists of many connected neurons (a network) that talk (pass information) to each other. Each neuron takes some information in, transforms it, and then transmits an electrical signal via synapse to another neuron. The synapse either fires or doesn’t fire, depending on the magnitude of the transformed value.

A neural network functions on the same principles: A set of neurons takes input data, transforms it, and then passes the new values to another set of neurons, which in turn transform and communicate the modified values. Each set of neurons is called a layer, and the number of layers is the depth of the network. One slight modification from the model of the human brain is the prescription for transmitting the electrical signal, known as the activation function. Rather than the binary fire-or-don’t-fire of genuine synapses, more complicated functions are used.
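Here is a sketch in plain Python/NumPy of what a single artificial neuron does, contrasting the brain-inspired binary step with the smoother sigmoid commonly used in practice (the weights and inputs are made up):

```python
import numpy as np

def step(z):
    """Brain-like synapse: fire (1) or don't (0), based on the signal size."""
    return float(z > 0)

def sigmoid(z):
    """Smoother activation used in practice: output varies between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])  # incoming values from the previous layer
w = np.array([0.8, 0.1, 0.4])   # connection strengths (learned in training)
b = -0.5                        # bias: effectively the firing threshold

z = np.dot(w, x) + b            # the neuron's transformation of its inputs
print(step(z), sigmoid(z))      # what gets transmitted to the next layer
```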

Such an algorithm learns through a training process, in which it is given input data that it is asked to transform, and then the estimated output is compared to the true output (the final representation you would like it to learn). Every time the estimated output differs from the desired output, the network updates itself by changing the way it transforms inputs.
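A minimal sketch of that training loop, for a single sigmoid neuron learning the logical OR function; the update rule here (gradient descent on squared error) stands in for the general idea of “changing the way it transforms inputs”:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy task: learn logical OR. Inputs and the true outputs we want reproduced.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 1], dtype=float)

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0          # start with a random transformation
learning_rate = 1.0

for _ in range(2000):
    pred = sigmoid(X @ w + b)           # estimated output
    err = pred - y                      # difference from the desired output
    grad = err * pred * (1 - pred)      # gradient of squared error per example
    w -= learning_rate * (X.T @ grad) / len(y)  # update the transformation
    b -= learning_rate * grad.mean()

print(np.round(sigmoid(X @ w + b), 2))  # approaches [0, 1, 1, 1]
```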

Just as the human brain performs abstraction when learning new mathematical concepts or drawing FBDs or Feynman Diagrams, a neural network abstracts away irrelevant details from the training examples when it modifies the transformation it applies to the data. However, whereas in these diagrams the relevant features were hand-picked, neural networks learn which features are relevant.

On its face, there is no clear advantage to having multiple layers of neurons, yet in practice increased depth often leads to improved performance. One distinct advantage of deep neural networks is that abstraction occurs at each layer. Throughout the training process, the transformations at each layer are tuned so that the network learns intermediate representations (one for every layer), in addition to a final representation. The deeper the layer, the more abstract the features.

Take one type of neural network used to process images, called a convolutional neural net (CNN). In the lowest layers, the CNN picks out edges and then specific textures. In the middle layers, larger patterns start to emerge. At the highest layers, the filters look like distorted images of whole objects. The CNN itself isn’t thinking, but through the process of abstraction it uncovers a hierarchy of visual features.

For instance, let’s say you want to teach a CNN what a human face is. To train the network, you assemble a large, diverse collection of images of human faces, and feed those images through the network one by one. After each step, the CNN adjusts its understanding of faces through a process called backpropagation. If the input face differs from the network’s current understanding of a face, the network changes the way the neurons communicate with each other to try to account for these differences. As more images are passed through the network, its definition of a face becomes increasingly robust. 
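A minimal sketch of such a network and a single backpropagation step, written with PyTorch; the architecture, sizes, and data here are hypothetical stand-ins, not a real face recognizer:

```python
# Tiny CNN sketch (hypothetical sizes): maps a 64x64 grayscale image to a
# single "is this a face?" score. Real networks are far larger.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),   # lowest layer: edge-like filters
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 64x64 -> 32x32
    nn.Conv2d(8, 16, kernel_size=3, padding=1),  # middle layer: textures, patterns
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 32x32 -> 16x16
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 1),                  # deepest layer: whole-face evidence
)

loss_fn = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# One backpropagation step on a stand-in batch of images with face/not-face labels.
images = torch.randn(4, 1, 64, 64)
labels = torch.tensor([[1.0], [0.0], [1.0], [0.0]])
loss = loss_fn(model(images), labels)   # how far predictions are from the truth
optimizer.zero_grad()
loss.backward()                         # backpropagation: compute the updates
optimizer.step()                        # change how the neurons communicate
```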

Fig. 5: Example of feature representations at different layers in a convolutional neural network (CNN). The input layer takes in images of faces, and deeper layers decompose the faces into more and more abstract elements. Copied from Nathan Lintz’s Indico blog post.

By the end of the training process, the deep neural network has “learned” what a human face is by deconstructing it layer by layer, with deeper layers discovering more fundamental patterns in the data. Then the network can recombine these features in new ways, painting pictures of what it thinks a human face actually is.

Deep learning facilitates scientific progress

Deep learning has already found applications in many areas of science. It is being used to model dark matter and galaxy shapes, to identify new physics in collision events at the Large Hadron Collider (LHC), and to advance drug discovery. And these data-oriented approaches have already met with tremendous success in identifying features that people could not find through intuition or genius alone. 

Higgs Detection

One of the first applications of deep learning in physics was in the discovery of the Higgs boson at CERN. The Standard Model of Particle Physics provides a unified description of three of the four fundamental forces: the electromagnetic interaction and the weak and strong nuclear interactions. It stipulates the existence of the Higgs boson – a particle that gives mass to the other particles. The Higgs was theorized to be so massive that, when it is a possible product of a scattering event, its diagrams contribute very minimally to the scattering matrix – it is produced with very low probability.

In order to verify the existence of the Higgs boson, physicists conducted trillions of scattering events in the LHC and set out to demonstrate that the measured and theorized Higgs contributions matched. This required distinguishing events in which Higgs bosons were produced from background events, some of which gave quite similar signatures. 

The primary challenge lay in the quantity of data required to determine the Higgs’ contribution to within an acceptable margin of error. At the LHC, particles are collided together at nearly the speed of light, resulting in billions of scattering events each second. The detectors take millions of measurements for each collision, resulting in the creation of roughly a petabyte of data per second.

It was infeasible, under hardware constraints, to store the massive amount of data resulting from all the collisions necessary for the theorized number of Higgs bosons to be produced. Thus, decisions about which collisions to store (the ones likely to have produced Higgs particles) had to be made on the spot. The traditional machinery of quantum field theory was far too bulky for this problem. Instead, deep neural nets were trained to take the measurements from the detector as input and classify events as potentially interesting or not. In other words, the networks took in physical attributes of a collision and abstracted away everything except what makes a collision likely to produce Higgs bosons. This allowed for essentially instantaneous classification.
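The following sketch conveys the shape of the idea (the feature count, architecture, and threshold are all invented, and the weights here are untrained stand-ins for a trained network): a small network scores each event's measurements, and only high-scoring events are stored.

```python
# Sketch of an on-the-spot trigger (all sizes and thresholds hypothetical).
import torch
import torch.nn as nn

n_features = 30                       # stand-in for per-event detector summaries
trigger = nn.Sequential(
    nn.Linear(n_features, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),   # score in (0, 1): "Higgs-like?"
)
# A real trigger would load trained weights here; this sketch keeps random ones.

events = torch.randn(1000, n_features)   # stand-in for a burst of collision events
with torch.no_grad():                    # inference only: fast, no training
    scores = trigger(events).squeeze(1)
keep = events[scores > 0.9]              # store only the promising collisions
print(f"stored {len(keep)} of {len(events)} events")
```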

Drug Discovery

More recently, deep learning has shown great promise in the quest for novel classes of molecules and materials. Throughout history, entire eras have been defined by the discovery and exploitation of new types of materials, from the bronze age to the iron age to our current silicon age and the blossoming of the semiconductor industry. Since at least 1942, when penicillin was derived from the penicillium fungus and used as an antibiotic, pharmaceuticals have had a similarly society-altering effect on public health. Hence the quest for compounds that exhibit particular properties of interest, be they medicinal, electronic, or otherwise.

The difficulty here is two-fold: first, the space of possible materials (or of possible drugs) is vast, and far too expansive to be searched systematically. Second, the synthesis of a compound from scratch is expensive and time-consuming.

In order to find a drug that satisfies a particular property, it is necessary to greatly reduce the number of compounds that need to be synthesized. This process of computationally narrowing the search space is known as virtual screening, the in-silico analogue of experimental high-throughput screening. Machine learning has been a part of this process for decades, but the quality of the computational sieve required to pick out good candidates lay out of reach – until the increased abstraction and representational power of deep neural networks made many problems in drug discovery tractable.

The road ahead

While abstraction itself does not require a human element, science does. As a tool for abstraction, deep learning relies heavily on practitioners and scientists. Humans must tune the hyperparameters of the network, such as the learning rate, which controls how much the transformations at each neuron are updated at each step of the training process. Humans also specify the depth of the network and the number of neurons in each layer. These choices can be far from obvious.
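For instance, here is a sketch (all values hypothetical) of exactly which knobs are left to the human: the depth, the width of each layer, and the learning rate are fixed by hand before training ever begins.

```python
# Hyperparameters chosen by a human, not learned by the network (hypothetical).
import torch
import torch.nn as nn

layer_widths = [128, 64, 32]  # depth (3 hidden layers) and per-layer width
learning_rate = 1e-3          # how much transformations update per step

layers, in_features = [], 10  # assume 10 input features
for width in layer_widths:
    layers += [nn.Linear(in_features, width), nn.ReLU()]
    in_features = width
layers.append(nn.Linear(in_features, 1))

model = nn.Sequential(*layers)
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
```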

Perhaps even more importantly, deep neural networks do not replace previous scientific methods and results, but instead build upon them. At CERN, the neural networks were trained using the results from simulated collision events based upon the physics of the Standard Model, viewed by many as the crowning achievement of theoretical physics thus far. In drug discovery, one of the essential factors impacting performance is the input representation. A priori it is not clear how best to present a molecule as data to a computer, be it a list of constituents and relative positions of atoms, a graph with atoms as vertices and bonds as edges, or something else entirely. It turns out that if scientists use domain knowledge (pertaining to the desired properties), they can generate chemically inspired input encodings that far outperform naïve encodings. 
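As a toy illustration of that choice, here are two ways one might hand a single molecule (water) to a network; neither encoding comes from any particular library, and a real pipeline would use far richer descriptors.

```python
# Two toy input encodings for one molecule (water). Coordinates approximate.

# Encoding 1: a list of constituent atoms with 3D positions (in angstroms).
water_positions = [
    ("O", (0.000, 0.000, 0.000)),
    ("H", (0.757, 0.586, 0.000)),
    ("H", (-0.757, 0.586, 0.000)),
]

# Encoding 2: a graph, with atoms as vertices and bonds as edges.
water_graph = {
    "atoms": ["O", "H", "H"],
    "bonds": [(0, 1), (0, 2)],  # the two O-H bonds; no H-H bond
}
```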

Deep learning is not a panacea for the problems of science. It will not reveal to us the true nature of our universe, nor will it replace the role of humans in science. Time and again revolutionary thinkers have shifted the paradigm and changed the way we view the world, and the human spirit has strength to prevail against all odds. But by utilizing deep learning as a tool, we can shift the odds in our favor, and in so doing expedite scientific progress.