Group Abstract

Message Boards

WOLFRAM COMMUNITY

13.3K Views

1 Reply

10 Total Likes

View groups...

Follow this post

Share this post:

GROUPS:

Staff Picks Biological Sciences Data Science Wolfram Language Modeling Statistics and Probability Machine Learning Wolfram Summer School Neural Networks Artificial Intelligence

[WSSA16] Emulation of the 'RNA to Amino Acid Sequence' pathway

Elen Vardanyan

Elen Vardanyan, Luys Foundation

Posted 9 years ago

ABSTRACT A living cell has a vast amount of enzymes and molecules that have influence on the derivation of a protein from a gene. Protein synthesis is one of the most complex processes in the cell and it involves a lot of signals, enzymes, and other molecular complexes. As the central dogma of molecular biology says, our DNA undergoes processes that make it into an RNA sequence (Transcription), and the latter to an amino acid sequence (Translation); amino acids then fold into proteins. In this pipeline, two essential enzymes are the spliceosome and the ribosome. Spliceosome cuts off the unnecessary parts (Introns) from the primary RNA transcript and joins the parts (Exons) which then have to be translated into an amino acid sequence. This produces a mature mRNA. The ribosome, in its turn, takes the mature mRNA and translates it into its corresponding amino acid sequence. The aim of this project is to model the primary RNA transcript ?mRNA?Amino Acid Sequence pipeline. We used Machine Learning techniques to design and train an Artificial Neural Network to recognize all the important sites that take part in these processes and imitate some of the algorithms that the cell does during protein synthesis. I. Exon-Intron Splicing Data Acquisition In the first step, we import all of the data about the special sites that will be needed. Data on Intron and Exon splice site regions were acquired from a Machine Learning Database. Firstly, we import around 3160 sequences from the database. The sequences are labeled "EI" for an Exon-Intron splice site, "IE" for an Intron-Exon splice site and "N" for Neither. Throughout the project, we converted the sequences to 2-dimensional vectors by converting the nucleotide letters A, C, G and T to {1,0,0,0}, {0,1,0,0}, {0,0,1,0} and {0,0,0,1}, respectively. In this way it is easier for the classifier to understand the data. Classifier We make an association of sequences and their labels and Classify this data. Artificial Neural Network Design and Training (Optional) We also propose an ANN that is trained on the acquired data as an alternative for the classification of splice sites. The user needs to format the data into grayscale images for dimensions {60, 4} to test the network. Formatting of the data is also presented below. (Important: The whole dataset has been used for training, so the user needs another one.) Sliding Window To analyze the sequence and to find splice sites in it, we construct a 60 nucleotides long "window" that slides along the primary RNA transcript and looks for splice sites. takeIdxSeq[seq_, idx_] := If[idx <= Length@seq, Take[seq, {idx, 59 + idx}]] scoreEI[str_String] := Module[ {p, k, l, seq, splicesites, out}, seq = Characters[str] /. {"A" -> {1, 0, 0, 0}, "C" -> {0, 1, 0, 0}, "G" -> {0, 0, 1, 0}, "T" -> {0, 0, 0, 1}}; splicesites = ssList[str]; k = Take[seq, {#, # + 59}] & /@Flatten@Position[splicesites, "EI"]; l = takeEIRange[#] & /@ k; p = findPositions[#] & /@ l; Thread[{score[pwmDon, Part[p, #]] & /@ Range[Length@p], "EI", Flatten@Position[splicesites, "EI"]}] ] scoreIE[str_String] := Module[ {p, k, l, seq, splicesites, out}, seq = Characters[str] /. {"A" -> {1, 0, 0, 0}, "C" -> {0, 1, 0, 0}, "G" -> {0, 0, 1, 0}, "T" -> {0, 0, 0, 1}}; splicesites = ssList[seq]; k = Take[seq, {#, # + 59}] & /@ Flatten@Position[splicesites, "IE"]; l = takeIERange[#] & /@ k; p = findPositions[#] & /@ l; Thread[{score[pwmAcc,Part[p, #]] & /@ Range[Length@p],"IE", Flatten@Position[splicesites,"IE"]}] ] Donor Site and Acceptor Site PWMs The Position Weight Matrices (PWMs) are commonly used to represent motifs (patterns) in biological sequences. These two are acquired from the same sequences from the database. Donor Site (Exon-Intron) PWM Acceptor Site (Intron-Exon) PWM Removing the wrong Splice Sites This is one of the most crucial steps in the whole processes. An mRNA has to start and end with Exons, so if there is a IE splice site near the beginning or a EI splice site near the end of the sequence, this means that the splice sites need to be further filtered. findMaxpos[list_, idx_] := Flatten@Position[list[[All, 1]], Max[list[[All, 1]]]] + idx - 1; FindWrongSS[seq_String] := Module[ {listEI, listIE, list, eipos, iepos, m1, m2, k1, k2, j1, j2, f1, f2, v1, v2, out}, listEI = scoreEI[seq]; listIE = scoreIE[seq]; list = SortBy[Partition[Flatten@Append[listEI, listIE], 3], Last]; eipos = Flatten@Position[list[[All, 2]], "EI"]; iepos = Flatten@Position[list[[All, 2]], "IE"]; m1 = {Min[#], Max[#]} & /@ Split[eipos, #2 - #1 == 1 &]; m2 = {Min[#], Max[#]} & /@ Split[iepos, #2 - #1 == 1 &]; k1 = DeleteCases[m1, a_ /; First[a] == Last[a]]; k2 = DeleteCases[m2, a_ /; First[a] == Last[a]]; j1 = Flatten[findMaxpos[Part[list, First@# ;; Last@#], First@#] & /@ k1]; j2 = Flatten[findMaxpos[Part[list, First@# ;; Last@#], First@#] & /@ k2]; f1 = Flatten[Range[First@#, Last@#] & /@ k1]; f2 = Flatten[Range[First@#, Last@#] & /@ k2]; v1 = list[[#, 3]] & /@ Complement[f1, j1]; v2 = list[[#, 3]] & /@ Complement[f2, j2]; out = Flatten[Append[v1, v2]]; out ] removeUnnecessary[seq_String] := Module[ {list, firstEI, firstIE, lastEI, lastIE, splicesites}, list = FindWrongSS[seq]; splicesites = ssList[seq]; splicesites[[list]] = "N"; firstIE = Part[First@Position[splicesites, "IE"], 1]; firstEI = Part[First@Position[splicesites, "EI"], 1]; lastIE = Part[Last@Position[splicesites, "IE"], 1]; lastEI = Part[Last@Position[splicesites, "EI"], 1]; If[firstIE < firstEI, splicesites[[firstEI]] = "N"]; If[lastIE < lastEI, splicesites[[lastEI]] = "N"]; splicesites ] Final Step: Splicing Introns and Joining Exons Spliceosome[seq_String] := Module[ {a, b, k, m, positions, str}, a = Flatten@Position[removeUnnecessary[seq], "EI"]; b = Flatten@Position[removeUnnecessary[seq], "IE"]; k = Partition[Sort@Flatten@Append[a, b], 2] + 30; m = Flatten[Range[First@#, Last@#] & /@ k]; positions = Partition[Complement[Range[Length@seq], m], 1]; str = Delete[Characters[seq], Partition[m, 1]]; StringJoin[str] ] II. mRNA?Amino Acid Translation Nucleotides (A,C,G,U) make triplets called codons. 61 of the 64 possible triplets code for amino acids (one codes for Met, which is the encoded by the codon AUG and in most of the cases that is the start codon), while the other 3 are stop codons, which stop the translation. Ribosome[seq_String] := Module[ {findStartCodon, concatinateSeq, translate, findStopCodon, sp, codonlist, prot, out}, findStartCodon[seq2_String] :=First@Flatten@StringPosition[seq2, "AUG"]; concatinateSeq[n_Integer, seq1_String] := StringPartition[StringTake[seq1, {n, -1}], 3]; translate[codons_List] := CodontoAminoAcid[#] & /@ codons; findStopCodon[prot_List] := Take[prot, {1, First[Flatten@Position[prot, "*"]]}]; sp = findStartCodon[seq]; codonlist = concatinateSeq[sp, seq]; prot = translate[codonlist]; out = findStopCodon[prot]; out ] Final Step: Primary RNA ? Amino - Acid Wrapping up all of the processes ? RNAtoAA[seq_String] := Module[ {spliced}, spliced = Spliceosome[seq]; Ribosome[spliced] ]

POSTED BY: Elen Vardanyan

1 Reply

Sort By:

EDITORIAL BOARD

EDITORIAL BOARD, WOLFRAM

Posted 9 years ago

- you earned "Featured Contributor" badge, congratulations ! This is a great post and it has been selected for the curated Staff Picks group. Your profile is now distinguished by a "Featured Contributor" badge and displayed on the "Featured Contributor" board.

POSTED BY: EDITORIAL BOARD

Reply to this discussion

Reply Preview

Attachments

Remove Add a file to this post

Follow this discussion

or Discard

Feedback

[WSSA16] Emulation of the 'RNA to Amino Acid Sequence' pathway

ABSTRACT

I. Exon-Intron Splicing

Data Acquisition

Classifier

Artificial Neural Network Design and Training (Optional)

Sliding Window

Donor Site and Acceptor Site PWMs

Removing the wrong Splice Sites

Final Step: Splicing Introns and Joining Exons

II. mRNA?Amino Acid Translation

Final Step: Primary RNA ? Amino - Acid