
AlphaFold

Introduction

AlphaFold is a groundbreaking AI system that is accelerating research in bioinformatics. It takes a sequence of amino acids as input and predicts the three-dimensional structure of the corresponding protein.

Read more on the AlphaFold official website.

This section guides you through running AlphaFold on Elja.


Getting started

Note: Due to NVIDIA compatibility issues, Elja now requires you to run AlphaFold in a Conda environment.

Setting up the Conda environment

Start by initializing Conda, following the same steps as described in the Conda section:

$ module use /hpcapps/lib-mimir/modules/all 
$ module load Anaconda3/2022.05
$ conda config --add channels defaults
$ conda config --add channels bioconda
$ conda config --add channels conda-forge
$ conda config --set auto_activate_base false
$ conda init
$ bash # You can also log out and log in again.

Load AlphaFold

Once conda is initialized and ready to use, we can load the AlphaFold module.

$ ml use /hpcapps/libbio-gpu/modules/all
$ ml load AlphaFold/2.3.1

Run AlphaFold on Elja

Note: AlphaFold will only run efficiently on GPU compute nodes. Be sure to specify a GPU partition when running your jobs.

To run AlphaFold on Elja, you can either use an interactive session or submit a batch job.

Starting an interactive session

Use the srun command to start an interactive session on a GPU node. If you start it inside screen or tmux, the session keeps running in the background and you can reattach to it later, for example after your SSH connection drops.

$ srun --job-name "AlphaFold" --partition gpu-1xA100 --time 01:00:00 --pty bash
$ conda activate $env_path
$ run_alphafold.sh -d /AlphaFoldData/AlphaFold/data -o /hpcapps/source/alphafold_non_docker/dummy_test/ -f /hpcapps/source/alphafold_non_docker/example/query.fasta -t 2020-05-14

Running AlphaFold with SBATCH

$ cat submit.slurm
#!/bin/bash
#SBATCH --mail-type=ALL
#SBATCH --mail-user=<MAIL> # for example uname@hi.is
#SBATCH --nodes=1 # number of nodes
#SBATCH --partition=gpu-1xA100
#SBATCH --time=1-00:00:00 # run for 1 day maximum
#SBATCH --output=slurm_job_output.log
#SBATCH --error=slurm_job_errors.log # standard error, including messages if the job crashes

module use /hpcapps/libbio-gpu/modules/all
module load AlphaFold/2.3.1
conda activate $env_path

# Run the command
run_alphafold.sh -d /AlphaFoldData/AlphaFold/data -o /hpcapps/source/alphafold_non_docker/dummy_test/ -f /hpcapps/source/alphafold_non_docker/example/query.fasta -t 2020-05-14
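
Once the script is saved, it can be submitted to the scheduler. The following is a minimal sketch, assuming the script is named submit.slurm as above. The function submit_and_show is a hypothetical helper, not something provided on Elja; it uses only the standard Slurm commands sbatch (with --parsable) and squeue (with -j).

```shell
# Hypothetical helper (not part of Elja or AlphaFold): submit a Slurm script
# and show the resulting job in the queue.
submit_and_show() {
    local script="${1:-submit.slurm}"
    local job_id

    # --parsable makes sbatch print only the job ID
    # (followed by ";cluster" on multi-cluster setups).
    job_id=$(sbatch --parsable "$script") || return 1
    job_id=${job_id%%;*}   # strip the optional ";cluster" suffix

    echo "Submitted job $job_id"

    # Show the state of this job only (PENDING, RUNNING, ...).
    squeue -j "$job_id"
}
```

Calling submit_and_show submit.slurm prints the job ID and the job's queue entry. While the job runs, tail -f slurm_job_output.log follows the output file named in the script's --output option.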

Additional Information

AlphaFold Parameters

When running AlphaFold using the run_alphafold.sh script, several parameters are available:

  • -d /AlphaFoldData/AlphaFold/data: Specifies the location of the AlphaFold database (required)
  • -o <output_dir>: Directory where results will be saved
  • -f <fasta_file>: Path to the FASTA file containing the protein sequence
  • -t <max_template_date>: Maximum template release date (YYYY-MM-DD)

For a complete list of parameters, refer to the AlphaFold documentation.
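
A mistyped date or FASTA path is cheaper to catch before a GPU job starts. The sketch below checks the -f and -t values and then prints the resulting command line. The function build_alphafold_cmd is a hypothetical helper introduced here for illustration; only the four flags listed above are taken from this page.

```shell
# Hypothetical helper (not part of AlphaFold): validate the inputs and print
# the run_alphafold.sh command line instead of executing it.
# Usage: build_alphafold_cmd <data_dir> <output_dir> <fasta_file> <max_template_date>
build_alphafold_cmd() {
    local data_dir="$1"
    local out_dir="$2"
    local fasta="$3"
    local max_date="$4"

    # -t expects a date of the form YYYY-MM-DD.
    case "$max_date" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *)
            echo "error: max template date must be YYYY-MM-DD, got '$max_date'" >&2
            return 1
            ;;
    esac

    # -f must point to a readable FASTA file.
    if [ ! -r "$fasta" ]; then
        echo "error: cannot read FASTA file '$fasta'" >&2
        return 1
    fi

    echo "run_alphafold.sh -d $data_dir -o $out_dir -f $fasta -t $max_date"
}
```

Printing the command rather than running it lets you inspect it before pasting it into an interactive session or a batch script.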

Interpreting Results

AlphaFold generates several files for each prediction:

  • PDB files containing the predicted structures
  • JSON files with confidence metrics
  • Visualization files for examining the quality of predictions

The primary metric for evaluating prediction quality is the pLDDT score (predicted Local Distance Difference Test), which ranges from 0 to 100, with higher values indicating higher confidence.
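
AlphaFold stores the per-residue pLDDT in the B-factor column of the PDB files it writes, so a rough overall confidence can be read directly from a predicted structure. The sketch below averages that column over the C-alpha atoms. The function mean_plddt is a hypothetical helper, and it assumes a standard fixed-column PDB file such as ranked_0.pdb in the output directory.

```shell
# Hypothetical helper: print the mean pLDDT of a predicted structure.
# Usage: mean_plddt <pdb_file>
mean_plddt() {
    awk '
        # ATOM records only; the atom name is in columns 13-16 and the
        # B-factor (pLDDT in AlphaFold output) is in columns 61-66.
        substr($0, 1, 4) == "ATOM" && substr($0, 13, 4) ~ /^ *CA *$/ {
            sum += substr($0, 61, 6)
            n++
        }
        END {
            if (n == 0) {
                print "no C-alpha atoms found" > "/dev/stderr"
                exit 1
            }
            printf "%.2f\n", sum / n
        }
    ' "$1"
}
```

Using one atom per residue keeps the average weighted by residue rather than by atom count, since every atom of a residue carries the same pLDDT value.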

Troubleshooting

Common issues when running AlphaFold on Elja:

  1. CUDA errors: Ensure you're using the correct GPU partition
  2. Memory limitations: Large proteins may require more GPU memory; adjust batch sizes if needed
  3. Environment errors: Verify that the Conda environment is properly activated
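
The checks above can be scripted and run inside the job before run_alphafold.sh. This is only a sketch: preflight is a hypothetical function, and it assumes that nvidia-smi is available on GPU nodes and that loading the AlphaFold module puts run_alphafold.sh on your PATH. CONDA_DEFAULT_ENV is the variable Conda sets when an environment is activated.

```shell
# Hypothetical pre-flight check; returns non-zero if any check fails.
preflight() {
    local rc=0

    # CUDA errors: a GPU should be visible when running on a GPU partition.
    # The reported memory.total also hints at whether a large protein will fit.
    if command -v nvidia-smi >/dev/null 2>&1; then
        nvidia-smi --query-gpu=name,memory.total --format=csv,noheader || rc=1
    else
        echo "WARN: nvidia-smi not found; is this a GPU node?"
        rc=1
    fi

    # Environment errors: Conda sets CONDA_DEFAULT_ENV on activation.
    if [ -z "${CONDA_DEFAULT_ENV:-}" ]; then
        echo "WARN: no Conda environment is active"
        rc=1
    else
        echo "OK: Conda environment '$CONDA_DEFAULT_ENV' is active"
    fi

    # The run script should be reachable once the AlphaFold module is loaded.
    if ! command -v run_alphafold.sh >/dev/null 2>&1; then
        echo "WARN: run_alphafold.sh not on PATH; load the AlphaFold module"
        rc=1
    fi

    return $rc
}
```

Calling preflight || exit 1 near the top of a batch script stops the job early instead of failing partway through a prediction.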

For additional help, contact the Elja support team.