Thursday, January 23, 2025


BIOENGINEERING : Databases for Obtaining Nucleotide Sequences for Expression Vectors.


 

Introduction

In a previous tutorial, we looked at the rules that dictate whether a protein is amenable to expression based on size, location and solubility. This information can be accessed from protein databases such as UniProt, including its manually curated Swiss-Prot section. Once you have determined that a protein is likely to be expressed, and identified a suitable expression system, you will need the nucleotide sequence that encodes the protein. Protein expression systems include E. coli (the most widely used), Bacillus subtilis, yeast (Saccharomyces cerevisiae), insect cells (via baculovirus) and mammalian cells. Yeast, insect or mammalian expression systems are needed where the protein of interest requires complex post-translational modifications for functionality.


In this tutorial, we're going to look at where you can get the nucleotide sequences required for expressing your protein of interest in an expression system.

Outline

  1. Databases for Nucleotide Sequences

  2. Downloading the Sequences

  3. Steps After Downloading the Nucleotide Sequences

Databases for Nucleotide Sequences

When seeking to express a protein in an expression system such as E. coli, yeast, or mammalian cells, you need the coding sequence (CDS) for that protein. The CDS is the nucleotide sequence that encodes the protein of interest, running from the start codon to the stop codon. It excludes the regions of a gene known as the 5’ and 3’ untranslated regions (UTRs). Introns are likewise excluded from the CDS.
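To make the definition of a CDS concrete, here is a minimal Python sketch that checks whether a downloaded sequence looks like a complete CDS. The function name `check_cds` is hypothetical, and the checks assume the standard genetic code with an ATG start codon; some organisms and genes use alternative start codons, so treat a warning as a prompt to look closer rather than as proof of an error.

```python
# Stop codons in the standard genetic code (an assumption of this sketch).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(seq: str) -> list[str]:
    """Return a list of problems; an empty list means the sequence looks like a complete CDS."""
    seq = "".join(seq.split()).upper()  # drop whitespace and line breaks
    problems = []
    if set(seq) - set("ACGT"):
        problems.append("contains characters other than A, C, G, T")
    if len(seq) % 3 != 0:
        problems.append("length is not a multiple of 3")
    if not seq.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        problems.append("contains an internal stop codon")
    return problems


if __name__ == "__main__":
    print(check_cds("ATGGCTTAA"))     # a toy CDS: start codon, one codon, stop codon
    print(check_cds("ATGTAAGCTTAA"))  # a toy sequence with a premature stop
```

An internal stop codon or a length that is not a multiple of three often means that a genomic or mRNA sequence (with introns or UTRs) was downloaded instead of the CDS.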






1. NCBI (National Center for Biotechnology Information)
Website: https://www.ncbi.nlm.nih.gov

Steps:

  1. Search for the gene of interest in the NCBI Gene database.

  2. Look for the "mRNA and Protein" section.

  3. Download the CDS (coding sequence) in FASTA format.
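If you already know the accession of the mRNA record, the same download can be scripted. The sketch below builds a request to NCBI's E-utilities `efetch` endpoint and asks for the CDS in FASTA format. The helper names `ncbi_cds_url` and `fetch_text` are hypothetical, and the accession in the example is only a placeholder; substitute the accession of your own gene.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL of the NCBI E-utilities efetch service.
EUTILS_EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def ncbi_cds_url(accession: str) -> str:
    """Build an efetch URL that requests the CDS of a nucleotide record as FASTA."""
    params = {
        "db": "nuccore",            # the nucleotide database
        "id": accession,            # e.g. a RefSeq mRNA accession
        "rettype": "fasta_cds_na",  # coding sequence(s) as nucleotide FASTA
        "retmode": "text",
    }
    return f"{EUTILS_EFETCH}?{urlencode(params)}"


def fetch_text(url: str) -> str:
    """Download a URL and return its body as text (needs a network connection)."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder accession for illustration only.
    print(ncbi_cds_url("NM_000546"))
```

Calling `fetch_text(ncbi_cds_url(...))` should return the FASTA text, which you can save to a file. NCBI asks heavy users to identify themselves and limit request rates, so check the E-utilities usage guidelines before running this in a loop.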

2. Ensembl
Website: https://www.ensembl.org

Steps:

  1. Search for your gene of interest in Ensembl.

  2. Navigate to the "Gene" page and select the transcript of interest.

  3. Export the CDS as a FASTA file.
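Ensembl also offers a REST service, so the export step can be scripted once you have a transcript ID. This is a minimal sketch that uses the `sequence/id` endpoint with `type=cds`; the helper names are hypothetical and the transcript ID in the example is a placeholder to be replaced with your own.

```python
from urllib.request import Request, urlopen

# Base URL of the Ensembl REST service.
ENSEMBL_REST = "https://rest.ensembl.org"


def ensembl_cds_request(transcript_id: str) -> Request:
    """Prepare a request for the CDS of one transcript, returned as FASTA text."""
    url = f"{ENSEMBL_REST}/sequence/id/{transcript_id}?type=cds"
    return Request(url, headers={"Content-Type": "text/x-fasta"})


def fetch_cds_fasta(transcript_id: str) -> str:
    """Send the request and return the FASTA text (needs a network connection)."""
    with urlopen(ensembl_cds_request(transcript_id), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder transcript ID for illustration only.
    request = ensembl_cds_request("ENST00000269305")
    print(request.full_url)
```

Note that the ID must be a transcript ID (ENST...) rather than a gene ID, because a gene can have several transcripts, each with its own CDS.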

3. UniProt

Website: https://www.uniprot.org


Steps:

  1. Search for your protein of interest.

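  2. Check the "Sequence" section for the protein sequence, and follow the cross-references to nucleotide sequence databases (such as EMBL/GenBank/DDBJ) to find the CDS.

  3. Export the CDS from the linked nucleotide record in FASTA format. Exporting directly from UniProt gives the protein sequence, not the nucleotide sequence.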


4. JGI Genome Portal
Website: https://genome.jgi.doe.gov

Steps:

  1. Search for genes in specific microbial, fungal, or plant genomes.

  2. Download annotated CDS sequences.

5. PlasmoDB
Website: https://plasmodb.org

Steps:

  1. Search for your gene of interest in the search bar using the gene name, ID, or keyword.

  2. Click on the relevant gene from the search results to view its detailed information.

  3. Navigate to the "Sequences" tab on the gene's detail page.

  4. Choose the sequence type you need:

     a. Genomic DNA: Includes exons and introns.

     b. CDS (Coding Sequence): The protein-coding exons, spliced and concatenated. No introns or untranslated regions.

     c. Protein Sequence: The translated amino acid sequence.

  5. Select FASTA format and click "Download" to save the sequence.

There are numerous additional databases for downloading organism-specific sequences, for example, TAIR (The Arabidopsis Information Resource).
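Whichever database you use, the download is a FASTA file: a header line starting with ">" followed by one or more lines of sequence. As a rough sketch, the hypothetical `read_fasta` function below reads FASTA text into a dictionary keyed by the first word of each header. Libraries such as Biopython provide more robust parsers; this version only shows what the format looks like.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {record name: sequence}."""
    records: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Use the first word after ">" as the record name.
            words = line[1:].split()
            name = words[0] if words else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    # Join the wrapped sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = ">gene1 example CDS\nATGGCT\nGCTTAA\n>gene2\natgtga\n"
    for name, seq in read_fasta(example).items():
        print(name, len(seq), seq)
```

To read a downloaded file, pass its contents in, for example `read_fasta(open("my_cds.fasta").read())`, where the file name is whatever you saved the download as.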

Steps After Downloading the Nucleotide Sequences

You may improve the amount of soluble product, or the efficiency with which your protein is translated, by optimizing the codons. This may be essential in heterologous systems, for example, if you are expressing a human protein in E. coli. Codon optimisation involves replacing certain codons in the gene with synonymous codons that are used more frequently in the host organism. The protein sequence stays the same, but the nucleotide sequence now matches the codons that your host uses most often. You can use a tool such as GenScript’s Codon Optimisation Tool.

Website: https://www.genscript.com/tools

Steps:

  1. Input the desired protein or organism-specific gene.

  2. Optimize the CDS for your expression system.

    Use Case: Optimized sequences for custom expression.
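To illustrate the idea behind codon optimisation, here is a simplified Python sketch that swaps each codon for a preferred synonymous codon and confirms that the protein is unchanged. It is not how GenScript's tool works internally; real tools weigh additional factors. The `preferred` table in the example is an illustrative placeholder, not a real codon usage table, so replace it with published usage data for your host.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS into one-letter amino acids ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def optimise_codons(cds: str, preferred: dict[str, str]) -> str:
    """Replace codons with the host's preferred synonymous codon, where one is given."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("CDS length must be a multiple of 3")
    for amino_acid, codon in preferred.items():
        if CODON_TABLE[codon] != amino_acid:
            raise ValueError(f"{codon} does not encode {amino_acid}")
    optimised = "".join(
        preferred.get(CODON_TABLE[cds[i:i + 3]], cds[i:i + 3])
        for i in range(0, len(cds), 3)
    )
    # Synonymous changes must leave the protein sequence untouched.
    assert translate(optimised) == translate(cds)
    return optimised


if __name__ == "__main__":
    # Placeholder preferences for illustration; use a real usage table for your host.
    preferred = {"L": "CTG", "R": "CGT"}
    print(optimise_codons("ATGCTAAGATAA", preferred))
```

Amino acids that are missing from the `preferred` table keep their original codons, so you can optimise only the residues you care about.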

Once you have the sequence, you will need to design primers to amplify the gene from its native source. To do this, total RNA is often isolated, enriched for messenger RNA (mRNA), and reverse transcribed into complementary DNA (cDNA). Once the cDNA is obtained, PCR can be performed to amplify the gene of interest.
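As a starting point for primer design, the sketch below takes the first bases of the CDS as the forward primer and the reverse complement of the last bases as the reverse primer, then estimates a melting temperature with the Wallace rule (2 °C per A or T, 4 °C per G or C). The function names are hypothetical and the Wallace rule is only a rough estimate for short primers. Dedicated primer design software also checks for hairpins and primer dimers, and you will usually add restriction sites or other overhangs for cloning.

```python
# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in °C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def naive_primers(cds: str, length: int = 20) -> tuple[str, str]:
    """Return (forward, reverse) primers covering the two ends of the CDS."""
    cds = cds.upper()
    if len(cds) < 2 * length:
        raise ValueError("CDS is too short for primers of this length")
    forward = cds[:length]                       # matches the start of the CDS
    reverse = reverse_complement(cds[-length:])  # anneals to the end of the CDS
    return forward, reverse


if __name__ == "__main__":
    # Toy sequence for illustration only.
    forward, reverse = naive_primers("ATGAAACCCGGGTTTTAA", length=6)
    print(forward, wallace_tm(forward))
    print(reverse, wallace_tm(reverse))
```

In practice you would adjust the length of each primer until the two melting temperatures are close to each other.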

The most common reason for failing to amplify a gene from a cDNA prep is that the gene is poorly represented in the cDNA. Adding more cDNA to the PCR may work. Alternatively, you may prepare cDNA from a different tissue, where applicable. If all else fails, or where it is not possible to amplify a gene from cDNA, there are companies that can synthesise the gene for a fee. The gene sequence is designed and chemically synthesized in vitro.



About Me

Adwoa Agyapomaa has a BSc from RMIT, Australia and an MPH from Monash University, Australia. Adwoa is the founder of Adwoa Biotech. She is currently a Senior Research Assistant. Enjoyed the tutorial? Connect with me on YouTube [Adwoa Biotech] where we talk biotech techniques, and lab workflows.