|
RNA Parse Introduction
Secondary Structure vs. Linear Nucleotide Composition
Consider the following functional non-coding (NC) RNA, where:
n = any nucleotide {a, u, g, c}
c = Watson-Crick (W/C) or wobble pair complement {a-u, u-a, g-c, c-g, g-u, u-g}
Assuming structure equals, or usually equals, function, the actual linear composition of the molecule is unimportant, provided that loops remain loops and stems remain complement pairs. Viewed as a structure rather than as a linear string of a, u, g, and c, it is easy to see that this 162-nucleotide molecule may be represented in several hundred million different ways. A linear sequence search for "similarly matching" strings therefore becomes useless; hence the need to match structure rather than sequence.
Testing all possible nucleotide sequences that could form the molecule above is infeasible, whereas matching structure via grammatical predicates is quite easy.
RNA is treated as if it were a language. Languages can be classified by order of complexity and of interdependency.
English sentences are more than simple strings of words; they are interwoven expressions constructed from nouns, pronouns, verbs, adverbs, et cetera, that impart or convey meaning, often tied contextually to other expressions or conditions. Likewise, while all biosequences are linear, one character following the next, each individual component may bond with another to form more complex structures, from simple loops through extremely complex tertiary shapes such as globular proteins and pseudoknots.
RNA structure is matched via grammatical rules of bonding, independent of nucleotide sequence, provided the paired nucleotides are complements. These methods do not use MSA (multiple sequence alignment), lowest-energy configurations, and so forth, except as an adjunct when researching particular structures. The grammatical methods used here are not intended for new-motif discovery, but they can possibly discover variations in known RNA motifs, including multiply crossing or pseudoknotted structures, simple tandem repeats, and tandem-complement repeats.
Since we do not try every possible combination of nucleotides (an enumeration whose cost grows exponentially with sequence length) but instead match a given structure itself, the method is linear and is able to scan entire small genomes for matches in seconds. Thus, a simple structure, ((((::::::)))), is parsed in about the same O(n) time and space as a pseudoknot, ((((:::[[[))))::::::::::]]]. Without delving into the grammars and methods themselves, the rules these grammars use are set with the minimum possible complexity needed to match the desired structure, where:
A-U (e.g., "A" being a complement of "U")
U-A
C-G
G-C
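As a rough illustration of how such a structure string can be read in a single pass, here is a minimal Python sketch. It is not the grammar machinery this page refers to; the names parse_structure and fits_structure are hypothetical, and the sketch assumes that ":" marks an unpaired position and that each bracket type, "(" and "[", closes against its own kind. Keeping one stack per bracket type is what lets the crossing stems of the pseudoknot be paired without any backtracking. The wobble pairs g-u and u-g are optional, since the short rule list above names only the four Watson-Crick pairs.

```python
# Illustrative sketch only; function names are hypothetical, not part of any tool.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def parse_structure(structure):
    """Return sorted (i, j) index pairs for a string such as '((((::::::))))'.

    '(' / ')' and '[' / ']' use separate stacks, so crossing (pseudoknotted)
    stems are resolved in one left-to-right pass over the string.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol in stacks:
            stacks[symbol].append(pos)
        elif symbol in closers:
            stack = stacks[closers[symbol]]
            if not stack:
                raise ValueError(f"unmatched {symbol!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ":":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def fits_structure(sequence, structure, allow_wobble=True):
    """True if every paired position in `structure` holds complementary bases."""
    if len(sequence) != len(structure):
        return False
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    seq = sequence.upper()
    return all((seq[i], seq[j]) in allowed for i, j in parse_structure(structure))
```

For example, fits_structure("AUGCAUGCAUGCAU", "((((::::::))))") checks only the four stem pairs and ignores the six loop positions, so any loop content is accepted.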
The stem-loop, the basic structural unit of all RNA molecules, is used here as an example of a canonical palindrome with "filler," i.e., loops and bulges:
[The sequence 5' AUGCAUGCAUGCAU 3']
Suppose we wish to find a similar structure. There are several ways to go about it, provided the original sequence doesn't vary too much. (Note that a BLAST or MSA search will fail if the target sequence varies by more than a few nucleotides.) Ideally, search methods ignore the specific linear sequence composition and look instead for the possible stem-loop structure itself, where C = complements (as above) and X = A, U, G, or C. Thus, the grammar says "match a stem of four nucleotides and a loop of six nucleotides," eliminating the need to try all possible combinations of the four nucleotides until they "fit" the desired secondary structure. This method is linear and fast, and with predicates that allow both the stem and the loop to vary, it can just as easily cover an exponentially increasing number of possible sequences that match the target structure.
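The "stem of four, loop of six" rule can be sketched as a sliding-window scan. This is a simplified stand-in written in Python, not the grammar-based matcher itself; find_stem_loops and count_matching_sequences are hypothetical names, and the pair table assumes the Watson-Crick and wobble complements defined earlier. For a fixed stem length, each window costs a constant number of comparisons, so the whole scan is linear in the length of the input.

```python
# Illustrative sketch only; names and defaults are assumptions.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_stem_loops(genome, stem=4, loop=6):
    """Yield start offsets of windows shaped stem + loop + stem.

    Base k of the left arm must pair with base k (counted from the end)
    of the right arm; the loop bases in between are never inspected.
    """
    seq = genome.upper()
    width = 2 * stem + loop
    for start in range(len(seq) - width + 1):
        end = start + width - 1
        if all((seq[start + k], seq[end - k]) in PAIRS for k in range(stem)):
            yield start


def count_matching_sequences(stem=4, loop=6):
    """Number of distinct sequences that satisfy the pattern.

    Each stem pair can be any entry of PAIRS; each loop base can be any of 4.
    """
    return len(PAIRS) ** stem * 4 ** loop
```

With these assumptions, list(find_stem_loops("AUGCAUGCAUGCAU")) reports a single hit at offset 0, and count_matching_sequences() gives 6^4 * 4^6 = 5,308,416 sequences covered by one pattern, none of which had to be enumerated during the scan.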
[The loop sequence 5' CCCCXXXXXXCCCC 3']
Most functional RNA structures are not so simple in reality. The structure below is matched using grammatical methods independently of nucleotide makeup. Of note, it is probably a moot point to further verify matched structures with lowest-energy methodology, given the low probability that any random nucleotide sequence would possess the same stem/loop characteristics. (Nature is conservative and uses what works.)
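To get a feel for how quickly a chance match becomes unlikely, the following Python sketch computes the probability that one fixed window of random sequence satisfies a given number of required base pairs. It rests on a strong simplifying assumption that is not taken from the text: nucleotides are independent and equally likely. The function name random_match_probability is hypothetical. The result illustrates the trend only and is not a substitute for an energy-based check.

```python
from fractions import Fraction


def random_match_probability(n_pairs, allow_wobble=True):
    """Chance that n_pairs fixed positions all pair up in random sequence.

    Assumes independent, uniformly distributed nucleotides: of the 16 ordered
    base combinations, 4 are Watson-Crick pairs and 2 more are wobble pairs.
    """
    allowed = 6 if allow_wobble else 4
    return Fraction(allowed, 16) ** n_pairs
```

Under these assumptions a four-pair stem matches by chance with probability 81/4096 (about 2%), so a single short stem-loop is not rare at a given position, but the probability shrinks geometrically with every additional required pair and is on the order of 1e-9 for twenty pairs.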
|