WOLFRAM COMMUNITY

GROUPS: Staff Picks, Biological Sciences, Chemistry, Curated Data, Graphics and Visualization, Wolfram Language, Wolfram Function Repository

Converting DNA strands to amino acid chains

Samikshaa Natarajan

Posted 2 years ago

POSTED BY: Samikshaa Natarajan

19 Replies

Zach Shelton, Wolfram Research

Posted 1 year ago

The function is available in the Wolfram Function Repository and can be called from there: https://resources.wolframcloud.com/FunctionRepository/resources/DNAtoAminoAcid

Feel free to check it out!

POSTED BY: Zach Shelton

Todd Allen, Harrisburg Area Community College

Posted 1 year ago

Your code is now producing correct output for the central dogma, DNA --> RNA --> Protein. If you are interested in learning more about bioinformatics topics, let me recommend this book, which is advanced, but there is nothing wrong with challenging yourself: "Bioinformatics Algorithms".

POSTED BY: Todd Allen

Todd Allen, Harrisburg Area Community College

Posted 1 year ago

You are mistaken about the meaning of "coding strand." In the bioinformatics community, the coding strand is the DNA strand whose sequence matches the mRNA (except for having Ts instead of Us). The mRNA produced by transcription therefore carries the coding-strand sequence, and it is that sequence (in RNA form) that the ribosome reads to make a polypeptide. See the Wikipedia article on the coding strand.

In essence, the coding strand contains the actual codons, so when the ribosome "sees" TAC from the coding strand (UAC in the mRNA), it inserts a tyrosine. The cell does not take the TAC from the coding strand and turn it into AUG, as you suggest, because the mRNA already carries the coding-strand sequence. Think it over and see whether you can work out why your code is producing incorrect output.

POSTED BY: Todd Allen
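
To make the coding-strand convention concrete, here is a minimal sketch that reuses the `BioSequenceTranslate` call pattern shown elsewhere in this thread. It assumes the input string is the coding strand written 5' to 3'; `translateCoding` is a hypothetical helper name introduced only for illustration and is not part of the original function.

```wolfram
(* translateCoding is a hypothetical helper name, not part of the original post. *)
(* Assumption: the input string is the coding strand, written 5' -> 3'. *)
translateCoding[dna_String] :=
  BioSequenceTranslate[BioSequence["DNA", dna]]["SequenceString"]

translateCoding["TAC"]  (* "Y", tyrosine in the standard genetic code *)
translateCoding["ATG"]  (* "M", methionine in the standard genetic code *)
```

Under this convention no complementing step is needed: the codons are read straight off the coding strand.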

Samikshaa Natarajan

Posted 1 year ago

I see. I think I've confused the coding strand and template strand. My code takes the template strand, creates the complementary mRNA strand, then matches that mRNA strand with the appropriate anticodon. In that case, a "TAC" on the template strand would code for an "AUG"/methionine on the mRNA/ribosome. Is that correct?

POSTED BY: Samikshaa Natarajan

Todd Allen, Harrisburg Area Community College

Posted 1 year ago

Let's take your second-to-last sentence:

In that case, a "TAC" on the template strand would code for an "AUG"/methionine on the mRNA/ribosome.

You have to remember that DNA has chemical directionality. So TAC on the template strand is really

5' - TAC - 3'

which means that in the mRNA you would have:

3' - AUG - 5'

Since ribosomes always scan mRNA 5' to 3', the ribosome would actually "see" 5' - GUA - 3', so it would insert valine in the polypeptide.

Don't get frustrated. This is not intuitive stuff that naturally flows out of most textbooks. You are learning as you push through this.

POSTED BY: Todd Allen

Samikshaa Natarajan

Posted 1 year ago

So does that mean the sequence gets reversed before it's translated in mRNA? So then a CAT on the template strand would be 5' - CAT - 3', so the mRNA is 3' - GUA - 5', which then is read by the ribosome as 5' - AUG - 3' and will thus insert a methionine, right?

POSTED BY: Samikshaa Natarajan

Todd Allen, Harrisburg Area Community College

Posted 1 year ago

That's correct, Samikshaa! You've got it! Do you see a path forward to update your code to produce correct output?

POSTED BY: Todd Allen
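
The directionality argument in the exchange above can be checked with a few lines of string manipulation. This is only a sketch: it assumes the template strand is given as a plain string written 5' to 3', and `templateToMRNA` is a hypothetical helper name, not a built-in function.

```wolfram
(* templateToMRNA is a hypothetical helper name. *)
(* Assumption: the template strand is a plain string written 5' -> 3'. *)
(* Complement each base into RNA letters, then reverse so the result also reads 5' -> 3'. *)
templateToMRNA[template_String] :=
  StringReverse[
    StringReplace[template, {"A" -> "U", "T" -> "A", "G" -> "C", "C" -> "G"}]]

templateToMRNA["TAC"]  (* "GUA", a valine codon *)
templateToMRNA["CAT"]  (* "AUG", the methionine start codon *)
```

The reversal is the step that is easy to miss: complementing alone yields the mRNA written 3' to 5', which is not the direction in which the ribosome reads it.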

Samikshaa Natarajan

Posted 1 year ago

Thank you for your help! I've updated the post and code to reflect what you said. Is it correct now?

POSTED BY: Samikshaa Natarajan

Todd Allen, Harrisburg Area Community College

Posted 1 year ago

Are you adopting a non-standard genetic code? In the "standard" genetic code, "TAC" codes for tyrosine, whereas "ATG" codes for methionine.

POSTED BY: Todd Allen

Samikshaa Natarajan

Posted 1 year ago

Right, but assuming the input DNA strand is the coding strand, it is first transcribed into mRNA and then translated into the amino acid chain. Given an mRNA strand, "AUG" codes for methionine; therefore, on the DNA strand, "TAC" corresponds to "AUG", which becomes methionine. That's why the function only starts reading at "TAC" on the input DNA strand, since that's where the methionine will be.

POSTED BY: Samikshaa Natarajan

Todd Allen, Harrisburg Area Community College

Posted 1 year ago

Samikshaa,

I always enjoy seeing Mathematica used for biology topics. Thank you for posting your work.

I do believe that the output of your code is, however, biologically incorrect.

Compare the list of amino acids from your first input cell to the list of amino acids returned by the built-in command BioSequenceTranslate:

ResourceFunction[
ResourceObject[<|"Name" -> "DNAtoAminoAcid", 
"ShortName" -> "DNAtoAminoAcid", 
"UUID" -> "67954e72-53c2-4527-a7c0-68dd2ba1497e", 
"ResourceType" -> "Function", "Version" -> "1.0.0", 
"Description" -> "Convert a given strand of DNA to a list of \
amino acids", 
"RepositoryLocation" -> URL[
"https://www.wolframcloud.com/obj/resourcesystem/api/1.0"], 
"SymbolName" -> "FunctionRepository`$\
093e0005691b471995f708959efa4269`DNAtoAminoAcid", 
"FunctionLocation" -> CloudObject[
"https://www.wolframcloud.com/obj/e7ed59a3-2a4e-4af8-b48c-\
a1d06c90942e"]|>, \
{ResourceSystemBase -> "https://www.wolframcloud.com/obj/\
resourcesystem/api/1.0"}]][\
"GTATACTGGTCATAGCATTGACTGGTCCATGTACTTACCGCT"]

Out[10]= {Entity["Chemical", "LMethionine"], 
Entity["Chemical", "LThreonine"], Entity["Chemical", "LSerine"], 
Entity["Chemical", "LIsoleucine"], Entity["Chemical", "LValine"], 
Entity["Chemical", "LThreonine"], 
Entity["Chemical", "LAsparticAcid"], 
Entity["Chemical", "LGlutamine"], Entity["Chemical", "LValine"], 
Entity["Chemical", "LHistidine"], 
Entity["Chemical", "LGlutamicAcid"], 
Entity["Chemical", "LTryptophan"], Entity["Chemical", "LArginine"]}

In[11]:= BioSequenceTranslate[
BioSequence["DNA", 
"GTATACTGGTCATAGCATTGACTGGTCCATGTACTTACCGCT"]]["SequenceString"]

Out[11]= "VYWS.H.LVHVLTA"

Compare Out[10] to Out[11]: they are different.

The codon "GTA" codes for valine, not methionine.

Is this what you intended?

POSTED BY: Todd Allen

Samikshaa Natarajan

Posted 1 year ago

My code looks for the first instance of the "TAC" sequence since that becomes the starting methionine codon needed in translation. In the input string "GTATACTGGTCATAGCATTGACTGGTCCATGTACTTACCGCT", the "TAC" first appears after the "GTA", so the "GTA" is ignored and the translation begins at "TAC", meaning the start codon is methionine. "GTA" does indeed code for valine, but my code ignores it and starts at methionine.

POSTED BY: Samikshaa Natarajan
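
The behaviour described in the post above can be sketched as follows. This is a hypothetical reconstruction based only on that description, not the code of the repository function: `codonTable` and `templateReading` are names introduced here for illustration, the input is assumed to be read left to right as a template, and stop codons are simply reported as "." rather than ending the chain.

```wolfram
(* Standard genetic code in RNA letters; "." marks a stop, as in BioSequence. *)
codonTable = AssociationThread[
   StringJoin /@ Tuples[{"U", "C", "A", "G"}, 3],
   Characters["FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"]];

(* Hypothetical sketch, not the repository implementation: *)
(* skip to the first "TAC", complement base by base, read codons in that frame. *)
templateReading[dna_String] := Module[{start, mrna},
  start = StringPosition[dna, "TAC", 1];
  If[start === {},
   "",  (* no "TAC" found, so nothing is translated *)
   mrna = StringReplace[StringDrop[dna, start[[1, 1]] - 1],
     {"A" -> "U", "T" -> "A", "G" -> "C", "C" -> "G"}];
   StringJoin[Lookup[codonTable, StringPartition[mrna, 3], "?"]]]]

templateReading["GTATACTGGTCATAGCATTGACTGGTCCATGTACTTACCGCT"]
(* "MTSIVTDQVHEWR" *)
```

In one-letter codes this matches the thirteen entities in Out[10], and it differs from Out[11] because `BioSequenceTranslate` reads the same string as a coding strand, starting from its first base.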

J. M.

Posted 2 years ago

This is very interesting work, Samikshaa. Since you already use `BioSequence[]`, are you already aware of the functions `BioSequenceComplement[]` and `BioSequenceTranslate[]`? For example,

BioSequenceTranslate[BioSequenceComplement[BioSequence["DNA", "AGTCGTAGTACGGAT"]]]

BioSequence["Peptide", "SASCL", {}]

BioSequenceTranslate[BioSequenceComplement[BioSequence["DNA", "TACTTTTCGTCCGGTATAATT"]]]

BioSequence["Peptide", "MKSRPY.", {}]

where a stop is represented by a period in `BioSequence[]`.

POSTED BY: J. M.

Samikshaa Natarajan

Posted 1 year ago

It's similar to those functions, except this is one function and it returns a list of amino acid entities rather than a BioSequence. But those functions are certainly useful!

POSTED BY: Samikshaa Natarajan

J. M.

Posted 1 year ago

Well, once you have the `BioSequence[]`, it isn't overly difficult to get the same result as your function:

Lookup[EntityValue[Entity["BioSequenceType", "Peptide"], 
                   EntityProperty["BioSequenceType", "AlphabetRules"]], 
       Characters[BioSequenceTranslate[BioSequenceComplement[
                  BioSequence["DNA", "AGTCGTAGTACGGAT"]]] @ "SequenceString"],
        Nothing]

I invite you to study the documentation for `BioSequence[]` and functions related to it in more detail.

POSTED BY: J. M.

Samikshaa Natarajan

Posted 1 year ago

I will explore it more, thank you!

POSTED BY: Samikshaa Natarajan

Moderation Team, WOLFRAM

Posted 2 years ago

You have earned a *Featured Contributor Badge*. Your exceptional post has been selected for our editorial column *Staff Picks* (http://wolfr.am/StaffPicks), and your profile is now distinguished by a *Featured Contributor Badge* and displayed on the Featured Contributor Board. Thank you!