Stemloc-AMA
Stemloc-AMA is a multiple alignment mode of stemloc. Stemloc-AMA uses sequence annealing to search the space of multiple alignments. Schwartz and Pachter introduced this technique in the AMAP program for protein multiple alignment. Sequence annealing avoids many of the well-known shortcomings of naive progressive alignment, including its inability to correct early mistakes and its poor handling of indels.
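Below is a minimal Python sketch of the annealing idea. It was written from the description above and is not taken from stemloc or AMAP.

- Every residue starts in its own column.
- Candidate merges are applied greedily, from highest to lowest weight.
- A merge is rejected if it would put two residues from one sequence in the same column.
- A merge is also rejected if it would make the column order cyclic, which would break sequence order.

The weights, the fixed `threshold` stopping rule and the function name `anneal` are hypothetical. The real method scores merges from posterior probabilities and the gap factor, and it may order and update them differently.

```python
from itertools import count


def anneal(lengths, candidates, threshold=0.5):
    """Greedy sequence-annealing sketch (hypothetical, not stemloc's code).

    lengths:    length of each sequence.
    candidates: (weight, (a, i), (b, j)) tuples proposing that residue i
                of sequence a share a column with residue j of sequence b.
    threshold:  hypothetical stopping weight; weaker merges are skipped.
    Returns the columns in a valid left-to-right order, each column a
    sorted list of (sequence, position) pairs.
    """
    col_of, members = {}, {}
    ids = count()
    # Start fully unaligned: every residue sits in its own column.
    for s, n in enumerate(lengths):
        for i in range(n):
            c = next(ids)
            col_of[(s, i)] = c
            members[c] = {(s, i)}
    # Order edges: residue i of a sequence must stay left of residue i+1.
    succ = {c: set() for c in members}
    pred = {c: set() for c in members}
    for s, n in enumerate(lengths):
        for i in range(n - 1):
            a, b = col_of[(s, i)], col_of[(s, i + 1)]
            succ[a].add(b)
            pred[b].add(a)

    def reaches(src, dst):
        # Depth-first search along order edges.
        stack, seen = [src], {src}
        while stack:
            c = stack.pop()
            if c == dst:
                return True
            for nxt in succ[c] - seen:
                seen.add(nxt)
                stack.append(nxt)
        return False

    # Consider merges from most to least confident.
    for w, x, y in sorted(candidates, key=lambda t: t[0], reverse=True):
        if w < threshold:
            break
        u, v = col_of[x], col_of[y]
        if u == v:
            continue
        # A column may hold at most one residue per sequence.
        if {s for s, _ in members[u]} & {s for s, _ in members[v]}:
            continue
        # If one column must precede the other, merging creates a cycle.
        if reaches(u, v) or reaches(v, u):
            continue
        # Merge column v into column u and rewire its order edges.
        for r in members[v]:
            col_of[r] = u
        members[u] |= members.pop(v)
        for nxt in succ.pop(v):
            pred[nxt].discard(v)
            pred[nxt].add(u)
            succ[u].add(nxt)
        for prv in pred.pop(v):
            succ[prv].discard(v)
            succ[prv].add(u)
            pred[u].add(prv)

    # Read columns off in topological order (Kahn's algorithm).
    indeg = {c: len(pred[c]) for c in members}
    ready = sorted(c for c, d in indeg.items() if d == 0)
    order = []
    while ready:
        c = ready.pop(0)
        order.append(sorted(members[c]))
        for nxt in succ[c]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
        ready.sort()
    return order
```

Consider two length-2 sequences, with (0,0)~(1,1) proposed at weight 0.9 and (0,1)~(1,0) at weight 0.8. The sketch keeps only the first merge, because the second would cross it.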
Stemloc-AMA uses the pairwise structural alignment implemented in stemloc to obtain the posterior probabilities that pairs of characters are aligned (for all pairs of sequences). The sequence annealing technique then efficiently searches the space of multiple alignments to find an alignment which (locally) maximizes the expected alignment metric accuracy (Schwartz, Myers and Pachter), an objective function which can be tuned to the desired sensitivity/specificity tradeoff.
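Here is a concrete form of the objective for two sequences x and y. A residue counts as correct if it is aligned to its true partner or correctly left unaligned. The expected AMA of an alignment can therefore be computed from two kinds of posterior:

- the match posteriors P(x_i ~ y_j);
- the gap posteriors P(x_i ~ -) and P(y_j ~ -).

The Python sketch below adds a gap factor `gamma` that scales the gap terms:

- gamma = 1 gives plain expected AMA;
- gamma = 0 rewards only aligned pairs;
- larger gamma favours leaving residues unaligned.

We assume, without having checked the stemloc source, that this is the role of the --gapfactor option on this page. The function name and the toy posterior values are invented for illustration.

```python
def expected_ama(alignment, p_match, p_gap_x, p_gap_y, gamma=1.0):
    """Gap-factor-weighted expected accuracy of a pairwise alignment.

    alignment: set of (i, j) pairs meaning x[i] is aligned to y[j].
    p_match[i][j]: posterior probability that x[i] ~ y[j].
    p_gap_x[i], p_gap_y[j]: posterior probability that a residue is unaligned.
    gamma: hypothetical gap factor; gamma = 1 gives expected AMA.
    """
    n, m = len(p_gap_x), len(p_gap_y)
    matched_x = {i for i, _ in alignment}
    matched_y = {j for _, j in alignment}
    # A correctly aligned pair makes both of its residues correct.
    matches = sum(2 * p_match[i][j] for i, j in alignment)
    # An unaligned residue is correct if it truly has no partner.
    gaps = sum(p_gap_x[i] for i in range(n) if i not in matched_x)
    gaps += sum(p_gap_y[j] for j in range(m) if j not in matched_y)
    return (matches + gamma * gaps) / (n + m)


# Toy posteriors for two 2-residue sequences (invented values).
p_match = [[0.9, 0.0], [0.0, 0.6]]
p_gap_x = [0.1, 0.4]
p_gap_y = [0.1, 0.4]

for gamma in (0, 1, 5):
    full = expected_ama({(0, 0), (1, 1)}, p_match, p_gap_x, p_gap_y, gamma)
    sparse = expected_ama({(0, 0)}, p_match, p_gap_x, p_gap_y, gamma)
    print(f"gamma={gamma}: full={full:.2f} sparse={sparse:.2f}")
```

With these toy numbers, the fully aligned pair wins for gamma 0 and 1, while the sparser alignment wins for gamma 5. For gamma other than 1, the value is a weighted objective rather than an accuracy and can exceed 1.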
Benchmarking on BRAliBase II shows that Stemloc-AMA has sensitivity comparable to that of the best competing methods. It is also more specific for all choices of the sensitivity/specificity tradeoff.
Most alignment programs attempt to maximize sensitivity, even at the expense of specificity. A simple test makes this easy to see: pick several non-homologous sequences and ask your alignment program of choice to align them. Most programs will align everything, despite the lack of homology.
Stemloc-AMA attempts to avoid this over-alignment problem by aligning only characters with clear homology. To test this, we used it to align two tRNA sequences and two Group II intron sequences. It correctly inferred that the tRNAs and the Group II introns are not homologous. It aligned the two tRNAs with each other and the two Group II introns with each other, and left them as two distinct alignments.
Downloading Stemloc-AMA
Stemloc-AMA is implemented as a multiple alignment mode in stemloc, our RNA structural alignment program. stemloc is freely available under the GNU GPL as part of the DART package.
Quick user's guide
Stemloc-AMA is invoked with the '--ama' command-line option to stemloc. It accepts FASTA format input files and outputs Stockholm format alignments.
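Because the output is in Stockholm format, downstream scripts will usually want to read it back. Here is a minimal Python reader. It assumes ordinary Stockholm layout:

- `name sequence` rows, possibly split over several blocks;
- `#=GC` per-column annotation, such as a consensus structure line;
- other `#` markup, which the reader simply skips;
- `//` closing each alignment.

It returns one (sequences, column annotations) pair per alignment, which should help if the output contains more than one alignment. The name `parse_stockholm` is ours. This is not a complete Stockholm parser: it does not handle `#=GR`/`#=GS` and does not check row lengths.

```python
def parse_stockholm(text):
    """Split Stockholm text into a list of (sequences, gc_annotations) pairs."""
    alignments, seqs, gc = [], {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            # End of one alignment; start collecting the next.
            alignments.append((seqs, gc))
            seqs, gc = {}, {}
        elif line.startswith("#=GC"):
            # Per-column annotation, e.g. "#=GC SS_cons <<..>>".
            _, tag, value = line.split(None, 2)
            gc[tag] = gc.get(tag, "") + value
        elif line.startswith("#"):
            continue  # header and other markup are ignored in this sketch
        else:
            # Sequence row; blocks are concatenated per sequence name.
            name, chunk = line.split(None, 1)
            seqs[name] = seqs.get(name, "") + chunk
    if seqs or gc:
        alignments.append((seqs, gc))  # tolerate a missing final "//"
    return alignments
```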
Recommended usage for lazy users:
stemloc --ama --nfold 1000 --nalign 100 --noshow-intermediates -log 6 myfile.fasta
If you run into memory errors, try reducing nfold and nalign.
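The advice about reducing nfold and nalign can be automated. The Python sketch below reruns the recommended command and halves both values after each failed run. It stops when a run succeeds or both values reach a floor.

It assumes that `stemloc` is on the PATH and that a memory failure shows up as a non-zero exit status; we have not verified either. The helper names, the halving schedule and the floor of 10 are hypothetical choices. The `runner` argument lets you exercise the retry logic without stemloc installed.

```python
import subprocess


def stemloc_ama_cmd(fasta, nfold, nalign, log_level=6):
    """The recommended Stemloc-AMA command line with adjustable nfold/nalign."""
    return ["stemloc", "--ama", "--nfold", str(nfold), "--nalign", str(nalign),
            "--noshow-intermediates", "-log", str(log_level), fasta]


def run_with_backoff(fasta, nfold=1000, nalign=100, floor=10,
                     runner=subprocess.run):
    """Halve --nfold and --nalign after each failed run, down to `floor`.

    Assumes (unverified) that running out of memory makes stemloc exit
    with a non-zero status. Returns (nfold, nalign, stdout) of the first
    successful run, or raises RuntimeError once both values hit the floor.
    """
    while True:
        result = runner(stemloc_ama_cmd(fasta, nfold, nalign),
                        capture_output=True, text=True)
        if result.returncode == 0:
            return nfold, nalign, result.stdout
        if nfold <= floor and nalign <= floor:
            raise RuntimeError(
                f"stemloc still failing at --nfold {nfold} --nalign {nalign}:\n"
                f"{result.stderr}")
        nfold = max(floor, nfold // 2)
        nalign = max(floor, nalign // 2)
```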
Basic usage, showing all intermediate alignments produced by sequence annealing:
stemloc --ama myfile.fasta
Basic usage, suppressing intermediate alignments:
stemloc --ama --noshow-intermediates myfile.fasta
Constrained usage:
stemloc --ama --nfold 1000 --nalign 100 myfile.fasta
Constrained usage, more sensitive:
stemloc --ama --gapfactor 0 --nfold 1000 --nalign 100 myfile.fasta
Constrained usage, more specific:
stemloc --ama --gapfactor 5 --nfold 1000 --nalign 100 myfile.fasta
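To compare the sensitive and specific settings on your own data, a small Python sketch can run the constrained command once per gap factor and save each result.

Only options shown above are used. Several details are our own assumptions:

- the helper names and the output file names;
- the default of testing gap factors 0 and 5;
- that stemloc writes its alignment to standard output.

```python
import subprocess
from pathlib import Path


def gapfactor_cmds(fasta, gapfactors=(0, 5), nfold=1000, nalign=100):
    """Build one constrained Stemloc-AMA command per gap factor."""
    return {g: ["stemloc", "--ama", "--gapfactor", str(g),
                "--nfold", str(nfold), "--nalign", str(nalign),
                "--noshow-intermediates", fasta]
            for g in gapfactors}


def run_gapfactor_sweep(fasta, outdir="gapfactor_sweep", **kwargs):
    """Run each command, saving its stdout to <outdir>/gapfactor_<g>.stk."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = {}
    for g, cmd in gapfactor_cmds(fasta, **kwargs).items():
        path = out / f"gapfactor_{g}.stk"
        # Assumes stemloc writes the alignment to standard output.
        with open(path, "w") as fh:
            subprocess.run(cmd, stdout=fh, check=True)
        paths[g] = path
    return paths
```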
Try adding the flag '-log 4' when running stemloc to see progress reports, details of the algorithms, and so on. Any number from 0 to 10 is an acceptable log level.
There are many more options, including local structural alignment and parameter training on curated alignments. More information about these options and the underlying stemloc program can be found in the Stemloc Tutorial.
Please contact us with questions.
Citation
- Bradley et al.: Specific alignment of structured RNA: stochastic grammars and sequence annealing. Bioinformatics 2008;24:2677-83.
- StemlocAMA preprint
-- Robert Bradley - 09 May 2008