Xrate Tests

You can run these tests by downloading dart (see Downloading Dart) and typing the following:

cd dart
make xrate
cd src/ecfg
make test

The tests follow the Dart Test Format and are handled by this Perl script. See also the README file describing the test file formats.

---

Simulated data tests

Test: Reversible ACGT cycler

# STOCKHOLM 1.0
seq1	 ACGTATGC
seq2	 CGTATGCA
seq3	 GTACGCAT
seq4	 TACGCATG
seq5	 ACGTATGC
#=GF NH (seq1:.01,(seq2:.01,(seq3:.01,(seq4:.01,seq5:1.01):1):1):1);
//
P(i@t=0) i Rate(i to j)
.25 A -1 .5 0 .5
.25 C .5 -1 .5 0
.25 G 0 .5 -1 .5
.25 T .5 0 .5 -1

What are the eigenvalues of this rate matrix?
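
As a rough answer to the eigenvalue question: the rate matrix is circulant (each row is the previous row shifted right by one), so its eigenvalues can be read off from the discrete Fourier transform of its first row. The sketch below uses only the Python standard library; `circulant_eigenvalues` is an illustrative helper name, not part of dart or xrate.

```python
import cmath

def circulant_eigenvalues(first_row):
    """Eigenvalues of the circulant matrix whose first row is first_row.

    For a circulant matrix with first row (c_0, ..., c_{n-1}),
    eigenvalue k is sum_j c_j * w**(j*k), where w = exp(2*pi*i/n).
    """
    n = len(first_row)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(c * w ** (j * k) for j, c in enumerate(first_row))
            for k in range(n)]

# First row of the reversible ACGT cycler rate matrix (rates from A to A, C, G, T).
reversible_row = [-1.0, 0.5, 0.0, 0.5]

for lam in circulant_eigenvalues(reversible_row):
    # Round away floating-point noise before printing.
    print(round(lam.real, 6), round(lam.imag, 6))
```

Up to rounding, the eigenvalues are 0, -1, -2 and -1, all real, as expected for a reversible model.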

Test: No substitution information

# STOCKHOLM 1.0
seq-1	 A
seq-2	 A
seq-3	 A
seq-4	 A
#=GF NH ((seq-4:0.20872,seq-2:0.20199):0.37854,seq-3:0.76145,seq-1:1.23173);
//
P(i@t=0) i Rate(i to j)
1 A 0 0 0 0
0 C 0 0 0 0
0 G 0 0 0 0
0 T 0 0 0 0

Note: this test currently fails

  • An optional alternative is to add pseudocounts (amounting to a simple prior), yielding something like:
P0  i  Rate(i->j)

[.97] A (-.03  .01  .01  .01)
[.01] C ( .01 -.03  .01  .01)
[.01] G ( .01  .01 -.03  .01)
[.01] T ( .01  .01  .01 -.03)
  • Actual error message:
There has been an exception with no handler - exiting

An exception has been thrown

Runtime error:- detected by Newmat: process fails to converge
MatrixType = Sym	 # Rows = 20; # Cols = 20
Trace: Jacobi.
  • Presumably Newmat is choking on a singular counts matrix?
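
The pseudocount idea in the first bullet can be sketched as follows. This is a minimal illustration, not xrate's code: the helper name `smoothed_frequencies` and the pseudocount value `0.05` are assumptions chosen for the example, so the numbers will not match the table above exactly.

```python
def smoothed_frequencies(counts, pseudocount):
    """Add the same pseudocount to every symbol count, then normalize.

    This is the posterior mean under a symmetric Dirichlet prior.
    """
    total = sum(counts.values()) + pseudocount * len(counts)
    return {sym: (n + pseudocount) / total for sym, n in counts.items()}

# Observed counts in the "No substitution information" alignment: four A's.
counts = {"A": 4, "C": 0, "G": 0, "T": 0}

# The pseudocount 0.05 is an arbitrary illustrative choice.
for sym, p in smoothed_frequencies(counts, 0.05).items():
    print(sym, round(p, 4))
```

With any positive pseudocount, every symbol receives a nonzero probability even though only A was observed.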

Where in the code is this happening? - Ian Holmes

Test: Unidirectional ACGT cycler

# STOCKHOLM 1.0
seq1	 A
seq2	 C
seq3	 G
seq4	 T
seq5	 A
#=GF NH (seq1:.01,(seq2:.01,(seq3:.01,(seq4:.01,seq5:1.01):1):1):1);
//
P(i@t=0) i Rate(i to j)
1 A -1 1 0 0
0 C 0 -1 1 0
0 G 0 0 -1 1
0 T 1 0 0 -1

Note: this test currently fails

What are the eigenvalues of the rate matrix, out of curiosity?
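
For what it is worth, the unidirectional rate matrix is circulant with first row $(-1, 1, 0, 0)$, so its eigenvalues follow from the standard circulant formula. This is a worked sketch; $c_j$, $\omega$ and $\lambda_k$ are notation introduced here, not taken from the test files.

```latex
\lambda_k = \sum_{j=0}^{3} c_j\,\omega^{jk} = -1 + \omega^{k},
\qquad \omega = e^{2\pi i/4} = i,\quad k = 0,1,2,3
\;\Longrightarrow\;
\lambda \in \{\,0,\; -1+i,\; -2,\; -1-i\,\}.
```

Two of the eigenvalues are complex, which cannot happen for a time-reversible rate matrix, since those always have real eigenvalues.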

  • The reversible version of xrate should give something like this:
P0	 i  Rate(i->j)

[.25] A (-1 .5  0 .5)
[.25] C (.5 -1 .5  0)
[.25] G ( 0 .5 -1 .5)
[.25] T (.5  0 .5 -1)

...since every branch is forcibly reversed during EM.
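
One way to see where this matrix comes from, offered as a sketch rather than a description of xrate's EM code: with a uniform equilibrium distribution, the time-reversed process has the transposed rate matrix, and averaging the forward and reversed matrices gives the reversible cycler. The helper `reverse_and_average` is an illustrative name, and the uniform equilibrium is an assumption.

```python
def reverse_and_average(rates, equilibrium):
    """Average a rate matrix with its time reversal.

    The time-reversed rate from i to j is pi[j] * rates[j][i] / pi[i],
    where pi is the (assumed) equilibrium distribution.
    """
    n = len(rates)
    return [[0.5 * (rates[i][j]
                    + equilibrium[j] * rates[j][i] / equilibrium[i])
             for j in range(n)]
            for i in range(n)]

# Unidirectional ACGT cycler: A -> C -> G -> T -> A, each at rate 1.
unidirectional = [
    [-1, 1, 0, 0],
    [0, -1, 1, 0],
    [0, 0, -1, 1],
    [1, 0, 0, -1],
]
uniform = [0.25] * 4  # assumed equilibrium distribution

for row in reverse_and_average(unidirectional, uniform):
    print(row)
```

The rows that come out are (-1, .5, 0, .5), (.5, -1, .5, 0), (0, .5, -1, .5) and (.5, 0, .5, -1), the same rates as the reversible cycler.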

---

Real data tests

Test: Nucleotide substitution in fruitflies

  • Origin: Ian Holmes
  • Input: fruitfly genomes
  • Output: ?
  • Comment: not expected to be quick; may overload xrate; perhaps add another test where we look at alignments individually?
  • Xrate has been exercised using a portion of the Pachter group's eight-way MAVID alignments. The tests were run on a PowerMac G5 with 8G of memory and on a Debian box with a 2.4GHz Intel Pentium 4 CPU and 512K of memory. Xrate is memory-intensive: all of the substantial DNA tests ran much faster on the Mac. On the Mac, xrate processed an alignment of 8 DNA sequences, each 500,000 characters long, together with an initial tree, in 11 minutes at 98% CPU usage. On the same Mac, xrate was unable to process an eight-species alignment of 1,320,000 characters per sequence because it ran out of memory.
  • On the Debian box, xrate successfully processed an eight-species alignment of 200,000 characters per sequence; this took 2 hours 5 minutes of real time, with average CPU usage of only 13% because of extensive swapping. On the Mac, the same alignment was processed in 3 minutes. The xrate output on the Mac differed from that on Debian in the fifth decimal place (maximum difference 3.6 × 10^-5).
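
A cross-platform comparison like the Mac/Debian one could be automated with a small numeric diff. The following is a minimal sketch, not part of the dart test harness: `max_abs_difference` is a hypothetical helper, the tolerance is an assumed threshold, and the two sample strings are made-up stand-ins for xrate output, not real results.

```python
import re

# Matches integers, decimals, and scientific notation such as -1.5e-3.
# Caveat: it also picks up digits embedded in names (e.g. the 1 in "seq1").
NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def max_abs_difference(text_a, text_b):
    """Largest absolute difference between corresponding numbers in two texts."""
    nums_a = [float(m) for m in NUMBER.findall(text_a)]
    nums_b = [float(m) for m in NUMBER.findall(text_b)]
    if len(nums_a) != len(nums_b):
        raise ValueError("outputs contain different numbers of values")
    return max((abs(a - b) for a, b in zip(nums_a, nums_b)), default=0.0)

# Made-up example values, for illustration only.
mac_output = "rate 0.123456 0.500000"
debian_output = "rate 0.123420 0.500000"

tolerance = 1e-4  # assumed acceptance threshold
print(max_abs_difference(mac_output, debian_output) <= tolerance)
```

Comparing parsed numbers rather than raw text keeps differences in the fifth decimal place from being reported as failures.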

Test: Nucleotide substitution in primate globin pseudogenes

Test: Amino acid substitution in globular proteins

Test: Amino acid substitution in transmembrane proteins