Protein structure prediction
Protein structure prediction, that is, the computational determination of a protein's three-dimensional shape from its primary sequence, is a challenge that structural bioinformatics has been grappling with since the 1960s. Progress since then has been steady, and state-of-the-art tools have been benchmarked in successive editions of the CASP (Critical Assessment of protein Structure Prediction) competition. Various approaches have been developed, including comparative (template-based) modeling and de novo modeling. In the early 2000s, methods using machine learning and artificial intelligence began to have significant success. An undeniable breakthrough came in the CASP14 edition, where the novel AlphaFold method was the top-ranked predictor and delivered predictions of previously unattainable quality. If you are curious about the details of this achievement, I refer you to an informative and truly enjoyable analysis published by the Oxford Protein Informatics Group.
AlphaFold by DeepMind
AlphaFold is an artificial intelligence (AI) program developed by Alphabet’s/Google’s DeepMind that predicts protein structure. The program is designed as a deep learning system.
A team that used AlphaFold 2 (2020) […] scored above 90 for around two-thirds of the proteins in CASP’s global distance test (GDT), a test that measures how closely a computationally predicted structure matches the experimentally determined structure, with 100 being a complete match.
Some researchers noted that the accuracy is not high enough for a third of its predictions, and that it does not reveal the mechanism or rules of protein folding for the protein folding problem to be considered solved. Nevertheless, there has been widespread respect for the technical achievement.
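To make the GDT score mentioned above more concrete, here is a minimal sketch of the commonly used GDT_TS variant: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of C-alpha atoms in the prediction that lie within the cutoff of their position in the reference structure. This is a simplification and not the official CASP evaluation code: it assumes the two structures have already been superimposed, whereas the real procedure searches over many superpositions to maximize each percentage. The function name `gdt_ts` and the toy coordinates are illustrative assumptions.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two already-superimposed C-alpha traces.

    Both arguments are equal-length sequences of (x, y, z) coordinates in
    angstroms. No superposition search is performed here (an assumption of
    this sketch), so the result is a lower bound on the true GDT_TS.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    # Per-residue distance between predicted and reference positions.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]

    # Fraction of residues within each cutoff, then averaged over the cutoffs.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues with deviations of 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.5, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

score = gdt_ts(predicted, reference)
print(f"GDT_TS = {score:.2f}")  # prints "GDT_TS = 56.25"
```

In the toy example, 1, 2, 3 and 3 of the 4 residues fall within the 1, 2, 4 and 8 Å cutoffs respectively, which gives (25 + 50 + 75 + 75) / 4 = 56.25. A perfect prediction scores 100.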
Explore other resources
- DeepMind: AlphaFold: a solution to a 50-year-old grand challenge in biology
- Oxford Protein Informatics Group: CASP14: what Google DeepMind’s AlphaFold 2 really achieved, and what it means for protein folding, biology and bioinformatics
- AlphaFold Database: AlphaFold DB provides open access to protein structure predictions for the human proteome and other organisms
Cite AlphaFold
- Jumper J, et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021).
- Mirdita M, Schütze K, Moriwaki Y, Heo L, Ovchinnikov S, Steinegger M. ColabFold - Making protein folding accessible to all. bioRxiv (2021).
AlphaFold Computing Solutions
AlphaFold2 installation on your local machine
If you plan to run AlphaFold directly on your local machine (whether a workstation or a laptop), be warned that you will need close to 2TB of disk space for the AlphaFold-related databases alone. This can be a serious limitation; if it is, go straight to the tutorial sections on using AlphaFold via HPC infrastructure (ISU HPC, SCINet HPC) or via cloud computing with Google Colab.
If you do have sufficient resources, you can get AlphaFold running locally by following the instructions provided by github.com/kalininalab. You can also find a Singularity container for AlphaFold2 at github.com/hyoo.
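Before starting a download of that size, it can be worth checking how much free space the target disk actually has. The sketch below uses Python's standard `shutil.disk_usage`. The 2 TB threshold is taken from the estimate above, and the path `"."` is only a placeholder for wherever you intend to store the databases. The helper name `has_room_for_databases` is hypothetical and not part of AlphaFold.

```python
import shutil

# Roughly 2 TB, the estimate quoted in this tutorial (an approximation,
# not an exact requirement published by the AlphaFold authors).
REQUIRED_BYTES = 2 * 10**12


def has_room_for_databases(path=".", required_bytes=REQUIRED_BYTES):
    """Return (enough_space, free_bytes) for the filesystem containing `path`."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes, free


# Replace "." with the directory where the databases would be stored.
ok, free = has_room_for_databases(".")
print(f"free: {free / 10**12:.2f} TB; enough for the databases: {ok}")
```

If the check fails, the HPC or Google Colab options are likely the more practical route.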
AlphaFold2 pipeline in Cloud via Google Colab
Colab (Colaboratory) is a freely accessible online platform from Google Research, where you can easily run the AlphaFold pipeline using the ColabFold notebook. More detailed instructions are provided in a separate tutorial, AlphaFold Cloud Computing Solutions.
AlphaFold2 standalone preinstalled on HPC infrastructure
General instructions for using AlphaFold2 preinstalled as a standalone module on various High Performance Computing infrastructures are provided in the AlphaFold High Performance Computing section. In addition, detailed descriptions of how AlphaFold2 is configured on Iowa State University’s HPC infrastructure and on SCINet HPC are available in the AlphaFold via the ISU HPC infrastructure and AlphaFold via the SCINet HPC infrastructure sections, respectively.