Computational Modelling in Structural Biology

This post was contributed by Dr Clare Sansom, senior associate lecturer in the Department of Biological Sciences. Dr Sansom attended Dr Maya Topf’s lecture on Computational Modelling during Birkbeck Science Week 2016.

Brain computer (copyright Marcos Fernandez via Flickr. Image cropped)


The Wednesday of Birkbeck Science Week – 13 April – was set aside to celebrate women in science, and it included a talk by Maya Topf of the Department of Biological Sciences. Maya, who was educated in Israel and Oxford, came to Birkbeck on an MRC fellowship after a post-doc at UCSF and has rapidly worked her way up the academic ladder to the position of reader in computational biology. She will be appointed as a full Professor in October this year.

Maya began by explaining that her research involves making models: specifically, three-dimensional models of biological molecules. Models have enabled scientists to make sense of biological processes since Watson and Crick’s double helical model of DNA showed how this molecule could both replicate itself and act as a template for the synthesis of proteins. This model, celebrated in the film Life Story that was shown earlier in Science Week, would not have been possible without the X-ray photographs of DNA fibres obtained by Rosalind Franklin, then working at King’s College.

The purpose of computational modelling

The main purpose of modelling molecules is the same now as it was in the 1950s: to discover how they function, and specifically how they function in the environment of the cell. We still have no means of observing what protein molecules – the tiny ‘machines’ that drive all cellular processes – look like when they are at work; all we have are models that may be more or less precise. The very first protein structures to be determined were of the oxygen-carrying proteins myoglobin and haemoglobin, and the first of these, published in 1960, were very imprecise: it was possible to see the shape of the chain but no individual atom positions. These, and all early protein structures, were obtained by X-ray crystallography; ten years later the same group used the same technique to determine a structure in which all atoms except hydrogens could be seen.

These two proteins have now also been studied using two other structural biology techniques, nuclear magnetic resonance and, most recently, electron microscopy. This last technique is best suited for studying large proteins and complexes of many protein chains, and is therefore not suitable for studying most forms of haemoglobin, a small, simple protein. Haemoglobin in earthworms, however, functions as a complex of many individual molecules. Electron microscopy gave a low-resolution picture of the overall shape of these molecules, much like those first haemoglobin structures, and a more precise picture was built up by ‘docking’ atomic-resolution X-ray structures of a single haemoglobin molecule into the shape of the fold.

During the last half-century these three techniques have generated structures for a wide range of proteins, leading to insights in many areas of biochemistry: how the body’s catalysts, the enzymes, work; how drugs bind to their receptors; and how a ‘large’ molecular complex, the ribosome, can synthesise all the proteins that a cell needs from RNA templates. The first atomic structures of this ‘molecular machine’ were obtained in the early 2000s and have transformed our view of protein translation since then (see these videos from the Howard Hughes Medical Institute in the US: basic and more advanced versions).

But, as real proteins are too small to be visible with even the best light microscopes, we need to realise that even these experimental structures are models. Each of the three techniques has its own advantages and limitations. X-ray crystallography needs protein crystals, which can be difficult or even impossible to obtain for particular proteins; electron microscopy cannot be used to study small proteins, but NMR works best with these. All three techniques are complex, time-consuming and expensive, and therefore proteins with known structures are greatly outnumbered by those without structures. There are probably about 43,000 known structures of ‘distinctly different’ proteins, compared with over half a million well-characterised protein sequences.

Bridging the sequence-structure gap

Maya explained that much of her group’s work concerns trying to bridge this ‘sequence-structure gap’ by using computers to model unknown protein structures. There are several ways of doing this. If the computers are powerful enough and the molecule is small enough (as the smallest proteins are), it is possible to generate a model structure ‘from first principles’ using physics. These techniques assume that the molecules are likely to occupy conformations in which their energy is low. The best of these methods simulate protein folding to produce model structures that can be very close to the experimentally-determined ones, but they require an enormous amount of computational power. Less expensive computer modelling methods tend to rely more on experimental data; Maya collaborates with Helen Saibil in Biological Sciences to fit atomic structures of individual proteins to lower-resolution maps of protein complexes that were generated by electron microscopy. Proteins studied in this way include GroEL, a ‘molecular chaperone’ that forms a chamber that isolates unstructured proteins so that they can fold.
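The idea that molecules tend to occupy low-energy conformations can be illustrated with a deliberately tiny example. The sketch below is not the method used by Maya's group or by any real modelling package: it assumes a single pair of atoms interacting through a Lennard-Jones potential (a standard textbook energy function, chosen here purely for illustration) and uses plain gradient descent to slide the atoms towards the separation at which their energy is lowest. The function names, the reduced units (epsilon = sigma = 1), the starting distance and the step size are all assumptions made for this example.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two atoms separated by distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative dE/dr of the Lennard-Jones energy."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimise(r_start, step=0.01, n_steps=500):
    """Follow the energy downhill from r_start using fixed-step gradient descent."""
    r = r_start
    for _ in range(n_steps):
        r -= step * lj_gradient(r)
    return r


# Start with the atoms too far apart and let them relax.
r_min = minimise(1.5)
print(f"low-energy separation: {r_min:.4f}")
print(f"energy at that separation: {lj_energy(r_min):.4f}")
```

For this potential the lowest-energy separation is known analytically (2^(1/6) times sigma, roughly 1.12 sigma, with energy equal to minus epsilon), so the result can be checked by hand. A real protein has thousands of atoms and an energy landscape with many competing minima, which is why the physics-based methods described above need so much computational power.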

Dr Maya Topf


Another method of modelling protein structures uses evolution, and relies on the fact that there are remarkably few different basic protein structures – each of the 43,000 known protein structures takes up one of only about 1,000 different folds. Just as all birds have the same basic pattern, with two legs and two wings, all proteins with a particular function will usually have a similar fold. It is therefore possible to model the structure of a protein based on one or more of its evolutionary relatives, in a technique called ‘homology modelling’. In some cases, it is possible to produce a usable model from the structure of a related protein from a very different type of organism. It was more than a decade after the publication of the first bacterial ribosome structures before similar structures could be obtained from mammalian ribosomes, but many useful results were obtained during that time by modelling mammalian ribosome sequences using the bacterial structures and low-resolution electron microscopy data.
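A first practical step in homology modelling is to decide which known structure is the closest evolutionary relative of the protein being modelled. The sketch below shows one simple way to do this, under stated assumptions: it takes sequences that have already been aligned (with '-' marking gaps), scores each candidate by percentage sequence identity, and picks the highest-scoring one as the template. The function names, the template names and the short sequences are all hypothetical, invented for this example; real pipelines use dedicated alignment tools and more sensitive scores.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of gap-free alignment columns in which both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def pick_template(alignments):
    """Return (name, identity) for the candidate with the highest identity.

    `alignments` maps a template name to a pair
    (aligned target sequence, aligned template sequence).
    """
    scores = {
        name: percent_identity(target, template)
        for name, (target, template) in alignments.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical pairwise alignments of one target against two candidate templates.
candidates = {
    "template_A": ("ACDE-F", "ACDQGF"),
    "template_B": ("ACDEF", "GCHEW"),
}
best_name, best_score = pick_template(candidates)
print(best_name, best_score)
```

Sequence identity is only a rough guide: as the ribosome example shows, distantly related proteins can still share a fold, so a low score does not always rule out a usable template.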

Maya ended her talk by stressing that structural biology is a science of model-building. It requires experimental data complemented by physics and by evolution, and, almost above all, it requires powerful computers. Generally, the more sources of information that can be combined into a model, the nearer to the ‘correct’ structure that model will be: and to quote the statistician George Box, ‘all models are wrong, but some are useful’.
