Xrate Tests
From Biowiki
You can run these tests by downloading dart (see Downloading Dart) and typing the following
  cd dart
  make xrate
  cd src/ecfg
  make test
The tests follow the Dart Test Format and are handled by this Perl script. See also the README file describing the test file formats.
---
Simulated data tests
Test: Reversible ACGT cycler
  # STOCKHOLM 1.0
  seq1 ACGTATGC
  seq2 CGTATGCA
  seq3 GTACGCAT
  seq4 TACGCATG
  seq5 ACGTATGC
  #=GF NH (seq1:.01,(seq2:.01,(seq3:.01,(seq4:.01,seq5:1.01):1):1):1);
  //
P(i@t=0) | i | Rate(i to A) | Rate(i to C) | Rate(i to G) | Rate(i to T) |
.25 | A | -1 | .5 | 0 | .5 |
.25 | C | .5 | -1 | .5 | 0 |
.25 | G | 0 | .5 | -1 | .5 |
.25 | T | .5 | 0 | .5 | -1 |
What are the eigenvalues of this?
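One way to answer this: every row of the matrix is a cyclic shift of the row above it, so the matrix is circulant and its eigenvalues follow from a discrete Fourier transform of the first row. The sketch below is not part of the dart test suite; the function name `circulant_eigenvalues` is made up for illustration.

```python
import cmath

def circulant_eigenvalues(first_row):
    """Eigenvalues of the circulant matrix Q[i][j] = first_row[(j - i) % n].

    Hypothetical helper for illustration only; not part of dart or xrate.
    """
    n = len(first_row)
    omega = cmath.exp(2j * cmath.pi / n)  # primitive n-th root of unity
    return [
        sum(c * omega ** (m * k) for m, c in enumerate(first_row))
        for k in range(n)
    ]

# Row A of the reversible ACGT cycler; rows C, G and T are cyclic shifts of it.
reversible_row = [-1.0, 0.5, 0.0, 0.5]

eigenvalues = circulant_eigenvalues(reversible_row)
for k, ev in enumerate(eigenvalues):
    print(k, round(ev.real, 9), round(ev.imag, 9))
```

Up to floating-point rounding, the eigenvalues are 0, -1, -2 and -1 (for k = 0, 1, 2, 3). They are all real, as expected for a reversible rate matrix.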
Test: No substitution information
  # STOCKHOLM 1.0
  seq-1 A
  seq-2 A
  seq-3 A
  seq-4 A
  #=GF NH ((seq-4:0.20872,seq-2:0.20199):0.37854,seq-3:0.76145,seq-1:1.23173);
  //
P(i@t=0) | i | Rate(i to A) | Rate(i to C) | Rate(i to G) | Rate(i to T) |
1 | A | 0 | 0 | 0 | 0 |
0 | C | 0 | 0 | 0 | 0 |
0 | G | 0 | 0 | 0 | 0 |
0 | T | 0 | 0 | 0 | 0 |
Note: this test currently fails
- Optional alternative is to add pseudocounts (amounting to a simple prior) yielding something like
  P0     i   Rate(i->j)
  [.97]  A   (-.03   .01   .01   .01)
  [.01]  C   ( .01  -.03   .01   .01)
  [.01]  G   ( .01   .01  -.03   .01)
  [.01]  T   ( .01   .01   .01  -.03)
- Actual error message:
  There has been an exception with no handler - exiting
  An exception has been thrown
  Runtime error:- detected by Newmat: process fails to converge
  MatrixType = Sym # Rows = 20; # Cols = 20
  Trace: Jacobi.
- Presumably Newmat is choking on a singular counts matrix?
Where in code is this happening? - Ian Holmes
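The pseudocount alternative mentioned above can be sketched as follows. This is an illustration under stated assumptions, not xrate's code: it assumes each rate is estimated as a substitution count divided by the time spent in the source state, and the function name, the pseudocount values and the dwell-time vector are all hypothetical. It does not attempt to reproduce the Newmat failure, whose location in the code is still an open question.

```python
def estimate_rates(sub_counts, dwell_times, pseudo_count=0.0, pseudo_time=0.0):
    """Estimate Rate(i to j) as (count + pseudo_count) / (dwell + pseudo_time).

    Hypothetical helper, not xrate's implementation. Returns None when some
    state has zero total dwell time, because its rates are then undefined.
    """
    n = len(dwell_times)
    rates = [[0.0] * n for _ in range(n)]
    for i in range(n):
        denominator = dwell_times[i] + pseudo_time
        if denominator == 0:
            return None
        for j in range(n):
            if i != j:
                rates[i][j] = (sub_counts[i][j] + pseudo_count) / denominator
        # Each row of a rate matrix sums to zero.
        rates[i][i] = -sum(rates[i][j] for j in range(n) if j != i)
    return rates

# Assumed statistics for the all-A alignment: no substitutions are observed,
# and all time (here, the sum of the tree's branch lengths) is spent in A.
zero_counts = [[0.0] * 4 for _ in range(4)]
dwell = [0.20872 + 0.20199 + 0.37854 + 0.76145 + 1.23173, 0.0, 0.0, 0.0]

print(estimate_rates(zero_counts, dwell))  # None: rows C, G and T are 0/0
smoothed = estimate_rates(zero_counts, dwell, pseudo_count=0.01, pseudo_time=1.0)
for state, row in zip("ACGT", smoothed):
    print(state, [round(r, 4) for r in row])
```

With any positive pseudocounts, every off-diagonal rate is small but nonzero and every row is well defined, which is the qualitative shape of the alternative output shown above.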
Test: Unidirectional ACGT cycler
- Origin: Ian Holmes
- Input: DartSrc:ecfg/t/Unidirectional_acgt_cycler.stockholm
  # STOCKHOLM 1.0
  seq1 A
  seq2 C
  seq3 G
  seq4 T
  seq5 A
  #=GF NH (seq1:.01,(seq2:.01,(seq3:.01,(seq4:.01,seq5:1.01):1):1):1);
  //
P(i@t=0) | i | Rate(i to A) | Rate(i to C) | Rate(i to G) | Rate(i to T) |
1 | A | -1 | 1 | 0 | 0 |
0 | C | 0 | -1 | 1 | 0 |
0 | G | 0 | 0 | -1 | 1 |
0 | T | 1 | 0 | 0 | -1 |
Note: this test currently fails
What are the eigenvalues of the rate matrix, out of curiosity?
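A short derivation answers this. Write the rate matrix as Q = P - I, where P is the cyclic permutation matrix sending A to C, C to G, G to T and T back to A (the symbols P, I and \lambda_k are introduced here for the derivation only). Because P^4 = I, the eigenvalues of P are the fourth roots of unity, and those of Q follow by subtracting 1:

```latex
Q = P - I, \qquad
P = \begin{pmatrix}
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
1 & 0 & 0 & 0
\end{pmatrix}, \qquad
P^4 = I

\lambda_k = i^k - 1, \quad k = 0, 1, 2, 3
\quad \Longrightarrow \quad
\lambda \in \{\, 0,\; -1 + i,\; -2,\; -1 - i \,\}
```

Two of the eigenvalues form a complex-conjugate pair. A reversible rate matrix always has real eigenvalues, so no reversible model can reproduce this matrix exactly.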
- The reversible version of xrate should give something like this:
  P0     i   Rate(i->j)
  [.25]  A   (-1   .5    0   .5)
  [.25]  C   (.5   -1   .5    0)
  [.25]  G   ( 0   .5   -1   .5)
  [.25]  T   (.5    0   .5   -1)
...since every branch is forcibly reversed during EM.
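One way to see where these numbers come from: if every branch is counted in both directions and the equilibrium distribution is uniform, each rate i->j is in effect averaged with the reverse rate j->i. The sketch below assumes that reading of "forcibly reversed"; it is not taken from the xrate source, and the function name is hypothetical.

```python
# Unidirectional cycler rate matrix, rows and columns in the order A, C, G, T.
unidirectional = [
    [-1.0,  1.0,  0.0,  0.0],
    [ 0.0, -1.0,  1.0,  0.0],
    [ 0.0,  0.0, -1.0,  1.0],
    [ 1.0,  0.0,  0.0, -1.0],
]

def average_with_reverse(rates):
    """Average each rate i->j with the reverse rate j->i.

    Assumption (not from the xrate source): under a uniform equilibrium
    distribution, reversing every branch amounts to averaging the rate
    matrix with its transpose.
    """
    n = len(rates)
    return [[(rates[i][j] + rates[j][i]) / 2 for j in range(n)] for i in range(n)]

symmetrized = average_with_reverse(unidirectional)
for state, row in zip("ACGT", symmetrized):
    print(state, row)
```

The averaged matrix has -1 on the diagonal and .5 for each neighbouring pair in the cycle, matching the reversible ACGT cycler shown above.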
---
Real data tests
Test: Nucleotide substitution in fruitflies
- Origin: Ian Holmes
- Input: fruitfly genomes
- Eight-way MAVID alignments from Pachter group
- Median Ks tree from Dan Pollard, Eisen group
- Need to get these together in a single huge Stockholm file
- Output: ?
- Comment: not expected to be quick; may overload xrate; perhaps add another test where we look at alignments individually?
- Xrate has been exercised using a portion of the Pachter group's eight-way MAVID alignments. The tests were run on a PowerMac G5 with 8G of memory and a Debian box with a 2.4GHz Intel Pentium 4 CPU and 512K of memory. Xrate is memory intensive; all of the substantial DNA tests ran much faster on the Mac with 8G of memory. On the Mac, xrate processed an alignment of 8 DNA sequences, each 500,000 characters long, together with an initial tree, in 11 minutes at 98% CPU usage. On the same Mac, xrate was unable to process an eight-species alignment with 1,320,000 characters per sequence, due to lack of memory.
- On the Debian box, xrate successfully processed an eight-species alignment with 200,000 characters per sequence; this took 2 hours 5 minutes of real time, and average CPU usage was only 13% due to extensive swapping. On the Mac, the same alignment was processed in 3 minutes. The xrate output on the Mac differed from that on Debian in the fifth decimal place (maximum difference 3.6*10^-5).
- Ian Holmes, 12/18/2005: setting up GNU Makefile patterns for Pachter group's most recent web-published 9-way alignments
Test: Nucleotide substitution in primate globin pseudogenes
- Origin: Ian Holmes
- Input:
- Globin pseudogene alignments from Lars Arvestad and Bill Bruno
- Let xrate estimate the tree
- Output: ?