xrate.cc: first after int main(): // initialise the options parser // INIT_OPTS_LIST (opts, argc, argv, 1, "[options] ", "estimate a substitution rate matrix from an alignment database using Expectation Maximisation\n") see util/opts.h: action lines are: #define INIT_CONSTRUCTED_OPTS_LIST(OPTS,ARGS,SYNTAX,SHORTDESC) OPTS.short_description = (SHORTDESC); OPTS.syntax = (SYNTAX); OPTS.expect_args = (ARGS); SET_VERSION_INFO(OPTS); Rnd::add_opts(OPTS); OPTS.newline(); Log_stream::add_opts(OPTS); #define INIT_TYPED_OPTS_LIST(OPTS_TYPE,OPTS,ARGC,ARGV,ARGS,SYNTAX,SHORTDESC) OPTS_TYPE OPTS(ARGC,ARGV); INIT_CONSTRUCTED_OPTS_LIST(OPTS,ARGS,SYNTAX,SHORTDESC) #define INIT_OPTS_LIST(OPTS,ARGC,ARGV,ARGS,SYNTAX,SHORTDESC) INIT_TYPED_OPTS_LIST(Opts_list,OPTS,ARGC,ARGV,ARGS,SYNTAX,SHORTDESC) decode: Opts_list opts(argc,argv); Opts_list.short_description = (...); Opts_list in opts_list.h: some stuff appears straightforward - not familiar with: // builder methods to add options, with comments in help text void add (const char* opt, bool& var, const char* desc = 0, bool show_default = 1, const char* negopt = 0, const char* negdesc = 0); see opts.add () calls in xrate.cc - don't worry about it. in -E (preprocessed) output: see Rnd::add_opts(opts); in util/rnd.h: class Rnd in util/rnd.cc: void Rnd::add_opts (Opts_list& ol) xrate.cc: first substantial operation try { if (!opts.parse()) { cerr << opts.short_help(); exit(1); } } in opts_list.cc: bool Opts_list::parse() { const Parse_status stat = parse_opt (opt); mysterious. near the end: for_contents (vector, post_parse_callback, callback) parsed_ok &= (*callback) (this); for_contents a macro in macros.h: a generic for loop. Can we set a break point on Opts_list::parse()? yes, hits it after sstring opt = next_string(); next is if (opt[0] != '-') gdb: print opt[0] gives a seg fault. give up on examining opt - would be nice to see. in opts_list.cc: 367 for_contents (vector, post_parse_callback, callback) parsed_ok &= (*callback) (this); s goes to: /usr/local/lib/gcc/i686-pc-linux-gnu/3.4.3/../../../../include/c++/3.4.3/bits/stl_vector.h:330 then to: Rnd::force_seed(Opts_list*) (ol=0xbffff664) at /home/pete/work/dart_cvs/dart/src/util/rnd.cc:41 Opts_list::parse() (this=0xbffff664) at /home/pete/work/dart_cvs/dart/src/util/opts_list.cc:369 didn't see where actually parsed - maybe doesn't if no '-' found. Back to xrate.cc - 105 ifstream align_db_in (alignment_db_filename); 107 stock_db.read (align_db_in, seq_db); goes to /home/pete/work/dart_cvs/dart/src/seq/stockholm.cc:599 to 603 stock.read_Stockholm (in, seq_db); to /home/pete/work/dart_cvs/dart/src/seq/stockholm.cc:29 to 46 if (re_sep.Match(input.c_str())) ... 115 else if (re_stock.Match (input.c_str())) // ignore format-version identifiers the first line of rrm.sto is # STOCKHOLM 1.0 yes this matches: 116 found_format_version_id = TRUE; next line: blank first subsequent non-blank line: 68 const sstring& name = re_row[1]; what's an sstring? in util/sstring.h. to 90 gs_annot[name] = Annotation(); in stockholm.h: typedef map Annotation; // annotation, keyed by tag (e.g. "SS") 95 gapped_seq.push_back (Biosequence()); 98 path.append_row (vector()); what's that? In seq/alignment.h (called by stockholm.h) see: struct Alignment : Named_rows, Stream_saver ... Alignment_path path; Alignment_path also defined in alignment.h. 110 g->append (row_data); line 74: Biosequence* g; Biosequence is an sstring: investigate these - in sstring.h: class sstring : public string_with_streambuf, public ostream guess means sstring is derived from string_with_streambuf - see the top lines of sstring.h. how see what string is there? p re_row[2].sbuf gives lots of interesting information: $7 = { >> = {_vptr.basic_streambuf = 0x81b0ec8, _M_in_beg = 0x0, _M_in_cur = 0x0, _M_in_end = 0x0, _M_out_beg = 0x0, _M_out_cur = 0x0, _M_out_end = 0x0, _M_buf_locale = { static none = 0, static ctype = 1, static numeric = 2, static collate = 4, static time = 8, static monetary = 16, static messages = 32, static all = 63, _M_impl = 0x4010eb88, static _S_classic = 0x4010eb88, static _S_global = 0x4010eb88, static _S_categories = Cannot access memory at address 0x0 (gdb) x 0x81b0ec8 0x81b0ec8 <_ZTVN21string_with_streambuf21DART_string_streambufE+8>: 0x0804f3d0 (gdb) x 0x0804f3d0 0x804f3d0 <~DART_string_streambuf>: 0x83e58955 (gdb) x 0x83e58955 0x83e58955: Cannot access memory at address 0x83e58955 dead end. 111 remove_copy_if (row_data.begin(), row_data.end(), back_inserter(np_cur->seq), Alignment::is_gap_char); remove_copy_if described in Stroustroup p. 536 - call back_inserter if is_gap_char true. 112 transform (row_data.begin(), row_data.end(), back_inserter(*p), not1(ptr_fun(Alignment::is_gap_char))); transform: Stroustroup p. 531 - back_inserter contains the result of calling not1(ptr_fun(Alignment::is_gap_char))); (duh - negates). loop over input lines - (gdb) break stockholm.cc:122 122 path.make_flush(); comment in seq/alignment.h: void make_flush(); // pad out rows with 0's until all rows same length to (stockholm.cc): 604 if (stock.rows()) first time true - called push_rows second time false - to 612 update_index(); break stockholm.cc:667 - not interesting Summary of data handling in stockholm.cc lines 68, 69 split name, data line 79 and surrounding saves name, sequence at the end of the Sequence_database passed as an argument to void Stockholm::read_Stockholm: separates name, seq. done with the stockholm.cc stuff - back to main (argc=2, argv=0xbffffc14) at /home/pete/work/dart_cvs/dart/src/ecfg/xrate.cc:108 108 Tree_alignment_database align_db (seq_db); /home/pete/work/dart_cvs/dart/src/tree/tree_alignment.cc:484 looks like just initializes (allocates memory) 109 align_db.initialise_from_Stockholm_database (stock_db, FALSE); /home/pete/work/dart_cvs/dart/src/tree/tree_alignment.cc:499 499 clear(); 447 tree_align.clear(); 448 name.clear(); guess: tree_align, name part of struct Tree_alignment_database (tree_alignment.h) - the above initializing the object. 500 for_const_contents (list, stock.align, align) for_const_contents a macro (util/macros.h) this line preprocessed into: for ( list ::const_iterator align = ( stock.align ).begin(), DART_END_ITERalign = ( stock.align ).end(); align != DART_END_ITERalign; ++ align ) 503 tree_align.push_back (Tree_alignment()); 504 Tree_alignment& ta = tree_align.back(); comment above line 506: // copy alignment 506 ta.set_alignment (*align); /home/pete/work/dart_cvs/dart/src/tree/../tree/tree_alignment.h:57 57 void set_alignment (const Alignment& new_align) { align = new_align; reset_maps(); align_changed(); } at /home/pete/work/dart_cvs/dart/src/ecfg/../tree/tree_alignment.h:43 43 void reset_maps() { row2node = Phylogeny::Node_vector (align.rows(), -1); node2row = vector (tree.nodes(), -1); } 0x08084492 in Phylogeny::nodes() const (this=0x824dd8c) at /home/pete/work/dart_cvs/dart/src/ecfg/../tree/phylogeny.h:278 278 int nodes() const { return _sorted_nodes.size(); } 0x08115470 in Tree_alignment::set_alignment(Alignment const&) (this=0x824dd88, new_align=@0x82452a0) at /home/pete/work/dart_cvs/dart/src/tree/../tree/tree_alignment.h:57 57 void set_alignment (const Alignment& new_align) { align = new_align; reset_maps(); align_changed(); } at /home/pete/work/dart_cvs/dart/src/tree/tree_alignment.cc:35 35 void Tree_alignment::align_changed() { } - does it do anything? Tree_alignment_database::initialise_from_Stockholm_database(Stockholm_database const&, bool) (this=0xbffff1b4, stock=@0xbffff2e4, die_if_trees_missing=false) at /home/pete/work/dart_cvs/dart/src/tree/tree_alignment.cc:508 // check if the Stockholm alignment has a "#=GF NH" tag (comment before line 508) 508 Stockholm::Tag_index_map::const_iterator nh_index_set = align->gf_index.find (sstring (Stockholm_New_Hampshire_tag)); not taken. 522 if (die_if_trees_missing) 525 sstring align_name; 530 align_name << ALIGN_NAME_PREFIX << (int) (name.size() + 1); 500 for_const_contents (list, stock.align, align) 533 } not clear if anything really done here - no NH tag - main (argc=2, argv=0xbffffc14) at /home/pete/work/dart_cvs/dart/src/ecfg/xrate.cc:112 // create dummy alphabet 112 Alphabet dummy_alph ("dummy_alphabet", alph_chars.size()); 113 dummy_alph.init_chars (alph_chars.c_str()); where is Alphabet defined? check *.h in src/ecfg, util, hsm, tree; also /usr/include, /usr/include/bits, /sys, check *.cc in ecfg, util, hsm, tree - didn't try newmat nope, seq seq/biosequence.h bingo code in seq/biosequence.cc. 116 const Alphabet& init_alph = dummy_alph.size() ? dummy_alph : seq_db.detect_alphabet(); believe alphabet a user option - not defined here. at /home/pete/work/dart_cvs/dart/src/ecfg/../seq/biosequence.h:171 171 int size() const { return alphabet_size; } (gdb) p alphabet_size $12 = 0 main (argc=2, argv=0xbffffc14) at /home/pete/work/dart_cvs/dart/src/ecfg/xrate.cc:120 // initialise EM_matrix size, alphabet & number of rounds // (the effort of initialising the size & alphabet will be wasted if we later read an initial matrix from a file, but never mind) 120 EM_matrix em_matrix (C, init_alph.size(), max_fork); note: // no. of processes to fork (set to 1 to disable this feature, as pipes seem to be broken under gcc 3.3) #define max_fork 1 opts.add ("c classes -classes", C = 1, "number of dynamic hidden site classes"); (line 66) EM_matrix defined in hsm/em_matrix.h - from em_matrix.cc: EM_matrix::EM_matrix (int C, int A, int max_fork, const Tree_alignment_database* align_db, double timepoint_res) : Piper (max_fork), hidden_alphabet ("", 0) { what's a Piper? util/piper.h: a vector of file descriptors (i, o for each child), pids for children. 121 em_matrix.init_alphabet (init_alph); at /home/pete/work/dart_cvs/dart/src/hsm/em_matrix.cc:85 85 const char* nd_chars = base_alphabet.nondegenerate_chars(); after executing: nd_chars = "arndcqeghilkmfpstwyv" - where did that come from? 86 const char* d_chars = base_alphabet.degenerate_chars(); after executing: d_chars = "x*bz" 87 if ((int) base_alphabet.size() != A || (int) strlen (nd_chars) != A) the value of A is 20. 89 hidden_alphabet.reset (base_alphabet.name.c_str(), C*A); C is 1. see void Alphabet::reset (const char* new_name, int new_size) in seq/biosequence.cc - nothing unusual. 90 hidden_alphabet.case_sensitive = base_alphabet.case_sensitive; 91 int_string C_string (C); what's an int_string? in util/nstring.h: - looks like (sprintf (buf, "%d", n);) makes a string containing :the ascii value of C 92 hidden_alphabet.name << C_string; 93 sstring chars; 94 chars << nd_chars << d_chars; 95 hidden_alphabet.init_degen_chars (chars.c_str()); should be able to look inside these sstring guys: try p chars.c_str Segmentation fault restart, break xrate.cc:120 then to hsm/em_matrix.cc:95 - look at sstring chars (gdb) p chars: $1 = { = {,std::allocator >> = {static npos = Cannot access memory at address 0x916a174 (gdb) info address chars Symbol "chars" is a variable at offset -408 from register ebp. (gdb) p &chars $2 = (sstring *) 0xbfffe4d0 (gdb) info symbol 0xbfffe4d0 No symbol matches 0xbfffe4d0. (gdb) whatis chars type = sstring (gdb) whatis chars.basic_string Type ios_base has no component named basic_string. (gdb) whatis chars->basic_string Type ios_base has no component named basic_string. (gdb) ptype chars type = class sstring : public string_with_streambuf, public basic_ostream > { public: sstring(int, const void **); sstring(int, const void **, size_t, char); etc. (gdb) info scope EM_matrix::init_alphabet Scope for EM_matrix::init_alphabet: Symbol nd_chars is a variable at offset -12 from register $ebp, length 4. Symbol d_chars is a variable at offset -16 from register $ebp, length 4. etc. f nice info where you are. (gdb) set print pretty on maybe helps. (gdb) p chars $7 = { = { ,std::allocator >> = { static npos = Cannot access memory at address 0x916a174 (gdb) p chars.char Segmentation fault (gdb) break em_matrix.cc:83 continue to 95: (gdb) p chars.sbuf $1 = { >> = {_vptr.basic_streambuf = 0x81b0cc8, _M_in_beg = 0x0, _M_in_cur = 0x0, _M_in_end = 0x0, _M_out_beg = 0x0, _M_out_cur = 0x0, _M_out_end = 0x0, _M_buf_locale = { static none = Internal: global symbol `_ZNSt6locale4noneE' found in /usr/local/src/gcc-3.4.3/libstdc++-v3/src/locale.cc psymtab but not in symtab. _ZNSt6locale4noneE may be an inlined function, or may be a template function (if a template, try specifying an instantiation: _ZNSt6locale4noneE). (gdb) x 0x81b0cc8 0x81b0cc8 <_ZTVN21string_with_streambuf21DART_string_streambufE+8>: 0x0804f3d0 (gdb) x 0x0804f3d0 0x804f3d0 <~DART_string_streambuf>: 0x83e58955 (gdb) x 0x83e58955 0x83e58955: Cannot access memory at address 0x83e58955 (gdb) x chars.sbuf.basic_streambuf cannot resolve overloaded method `basic_streambuf': no arguments supplied (gdb) p chars.s Segmentation fault (gdb) set demangle-style gnu - prev auto - no apparent benefit. (gdb) set print object on - prev off - no apparent benefit. (gdb) p chars.sbuf.basic_streambuf cannot resolve overloaded method `basic_streambuf': no arguments supplied give up for now - believe can do get, put from streams, appears not trivial to see what's available to get. continue stepping: Breakpoint 1, EM_matrix::init_alphabet(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x8210280) at /home/pete/dart/src/hsm/em_matrix.cc:95 95 hidden_alphabet.init_degen_chars (chars.c_str()); (gdb) s std::string::c_str() const (this=0xbfffe4d4) at /usr/local/gcc-3.4.3/i686-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:261 261 { return _M_dataplus._M_p; } (gdb) p _M_dataplus._M_p $1 = 0x826887c "arndcqeghilkmfpstwyvx*bz" looks like chars.c_str() (gdb) s 1456 c_str() const { return _M_data(); } (gdb) s std::string::c_str() const (this=0xbfffe4d4) at /usr/local/gcc-3.4.3/i686-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:261 261 { return _M_dataplus._M_p; } Alphabet::init_degen_chars(char const*) (this=0xbfffea08, chars=0x826887c "arndcqeghilkmfpstwyvx*bz") at /home/pete/work/dart_cvs/dart/src/seq/biosequence.cc:353 retry: from 92 hidden_alphabet.name << C_string; ... 95 hidden_alphabet.init_degen_chars (chars.c_str()); Alphabet::init_degen_chars(char const*) (this=0xbfffea08, chars=0x82671bc "arndcqeghilkmfpstwyvx*bz") at /home/pete/dart/src/seq/biosequence.cc:353 353 const int len = strlen (chars); ... 360 degen_lc.to_lower(); sstring::to_lower() (this=0xbfffeb98) at /home/pete/dart/src/util/sstring.cc:14 14 for_contents (sstring, *this, c) pretty straightforward: back to biosequence.cc 361 for (int i = 0; i < len; ++i) 362 char2int_table[degen_lc[i]] = char2int_table[toupper(degen_lc[i])] = -i-1; where is char2int_table? yes, there is a helpful comment in seq/biosequence.h - part of the Alphabet object - degenerate elements have negative numbers. (gdb) p char2int_table $8 = { <_Vector_base >> = { _M_impl = { > = { > = {}, }, members of _Vector_impl: _M_start = 0x825c568, _M_finish = 0x825c968, _M_end_of_storage = 0x825c968 } }, } x works (e.g. 0x825c5a0:) see a bunch of 0x14 think default (unknown) alphabet-size Restart, with breakpoint biosequence.cc:351 - init_degen_chars called with chars "nx*urymkswhbvd" - don't understand why all these are degenerate - after looping through seting the desired char2int_table values, 364 Symbol_score_map wildcard_score_map = flat_score_map (0); function flat_score_map at biosequence.cc:607 - what's a Symbol_score_map? biosequence.h: typedef map Symbol_score_map; Score? see file util/score.h: typedef int Score; where's the constructor for a Symbol_score_map? How big is it? couldn't step inside the one-liner for (int s = 0; s < size(); s++) m.insert (Symbol_score (s, score)); - could break up into more lines - note there's a size method in biosequence: int size() const { return alphabet_size; } saw alphabet_size of 4 (?) 365 Symbol_weight_map wildcard_weight_map = flat_weight_map (1.0); in biosequence.h: typedef map Symbol_weight_map; also in util/score.h: typedef double Prob; Alphabet::flat_weight_map(double) const (this=0x820e400, weight=1) at /home/pete/dart/src/seq/biosequence.cc:516 516 Symbol_weight_map m; flat_weight_map just like flat_score_map - same mysterious size of 4. Maybe it grows (min for nucleotides). 366 degen_weight = vector (len, wildcard_weight_map); remember len is 14 367 degen_score = vector (len, wildcard_score_map); degen_weight, degen_score are part of the Alphabet object (biosequence.h). Guess (Stroustroup p. 448) the default vector entry (len=14 of them) contains wildcard_weight_map in the case of degen_weight. DNA_alphabet_type (this=0x820e400) at /home/pete/dart/src/seq/biosequence.cc:712 712 set_unknown ('n'); (gdb) bt #0 DNA_alphabet_type (this=0x820e400) at /home/pete/dart/src/seq/biosequence.cc:712 #1 0x08138aef in __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/pete/dart/src/seq/biosequence.cc:773 #2 0x08138d16 in _GLOBAL__I__ZNK21Digitized_biosequence10null_scoreERKSt6vectorIiSaIiEE () at /home/pete/dart/src/seq/biosequence.cc:558 #3 0x081999d8 in __do_global_ctors_aux () #4 0x0804bba2 in _init () look at the source for DNA_alphabet_type - clears up some stuff - looks like this initialized at beginning - look for init_degen_chars calls - in DNA_alphabet_type::DNA_alphabet_type() : Alphabet ("DNA", 4): init_degen_chars ("nx*urymkswhbvd"); also in RNA_alphabet_type::RNA_alphabet_type() : Alphabet ("RNA", 4) init_degen_chars ("nx*trymkswhbvd"); in Protein_alphabet_type::Protein_alphabet_type() : Alphabet ("Protein", 20) init_degen_chars ("x*bz"); in Text_alphabet_type::Text_alphabet_type() : Alphabet ("Non-biological text", 52) init_chars ("abcdefghijklmnopqrstuvwxyz,.:;+-/'!?_$&()@0123456789", ""); init_degen_chars ("~*\""); Run quickly through the Protein_alphabet_type constructor: break Protein_alphabet_type::Protein_alphabet_type - didn't hit it. start again - b _init - hit quickly - n takes to __libc_start_main - from there n goes a while restart - break _init when hits it break biosequence.cc:313 (first line of init_chars) see calls for DNA, RNA, protein, text. break Protein_alphabet_type::Protein_alphabet_type /home/pete/dart/src/seq/biosequence.cc:753 not hit - try break biosequence.cc:756 no break biosequence.cc:313 (init_chars) hit it - concentrate on protein see DNA (acgt) RNA (acgu), protein (arndcqeghilkmfpstwyv) - len 20 CHAR2INT_TABLE_SIZE 256 - init all entries to 20. char2int_table entries in the protein alphabet set to their offset in string arndcqeghilkmfpstwyv. 344 nondegen_comp.clear(); proteins - no comp string Protein_alphabet_type (this=0x820ebc0) at /home/pete/dart/src/seq/biosequence.cc:757 757 init_degen_chars ("x*bz"); at /home/pete/dart/src/seq/biosequence.cc:353 353 const int len = strlen (chars); since not case sensitive: for (int i = 0; i < len; ++i) char2int_table[degen_lc[i]] = char2int_table[toupper(degen_lc[i])] = -i-1; char2int_table entries negative, offset into x*bz string - 1 (both upper, lower case). 364 Symbol_score_map wildcard_score_map = flat_score_map (0); see above notes - here 20 entries made 365 Symbol_weight_map wildcard_weight_map = flat_weight_map (1.0); 366 degen_weight = vector (len, wildcard_weight_map); beleive is creating a vector of Symbol_weight_map entries of length 4, each set to a weight map (20 entries) containing Prob 1.0. 367 degen_score = vector (len, wildcard_score_map); likewise a vector of Symbol_score_map entries, length 4 Protein_alphabet_type (this=0x820ebc0) at /home/pete/dart/src/seq/biosequence.cc:758 758 set_unknown ('x'); Alphabet::set_unknown(char) (this=0x820ebc0, unknown_char=120 'x') at /home/pete/dart/src/seq/biosequence.cc:414 414 Symbol_weight_map p = flat_weight_map (1.0 / (double) alphabet_size); 415 set_degen_char (unknown_char, p); Alphabet::set_degen_char(char, std::map, std::allocator > > const&) (this=0x820ebc0, degen_char=120 'x', p=@0xbffffb30) at /home/pete/dart/src/seq/biosequence.cc:386 386 const int i = char2int_deg (degen_char); char2int_deg an inline function from biosequence.h:216 - the function gives the contents in the char2int_table for the input char (here 'x') returns -1 stored in i; 388 degen_weight[-i-1] = p; saves the (input) Symbol_weight_map (entries all init. to 1/alphabet size) in degen_weight, part of the Alphabet object. 389 Symbol_score_map sc; 390 for_const_contents (Symbol_weight_map, p, sw) 391 sc[sw->first] = Prob2Score (sw->second); preprocessed into: for ( Symbol_weight_map ::const_iterator sw = ( p ).begin(), DART_END_ITERsw = ( p ).end(); sw != DART_END_ITERsw; ++ sw ) sc[sw->first] = Score_fns::prob2score(sw->second); prob2score in util/score.h - basically the score is the log of the probability (here 1/20), with some overflow/underflow protection, multiplied by DartScore2NatsRatio which is (DartScore2BitsRatio / Loge2) : DartScore2BitsRatio is 1000, Loge2 the natural (base e) log of 2. see above: typedef map Symbol_score_map; looks like the real initialization - double prob stored as int score. 392 degen_score[-i-1] = sc; remember i is -1 - store away this score map. // check to see if this is the unknown char 394 if (unknown_int >= alphabet_size && sc == flat_score_map (-Prob2Score(alphabet_size))) yes the tests were passed 395 unknown_int = i; Protein_alphabet_type (this=0x820ebc0) at /home/pete/dart/src/seq/biosequence.cc:759 759 set_wildcard ('*'); Alphabet::set_wildcard(char) (this=0x820ebc0, wildcard_char=42 '*') at /home/pete/dart/src/seq/biosequence.cc:408 408 Symbol_weight_map p = flat_weight_map (1.0); 409 set_degen_char (wildcard_char, p); Here the '*' or wildcard character is index -2 - same processing as for 'x'. Protein_alphabet_type (this=0x820ebc0) at /home/pete/dart/src/seq/biosequence.cc:760 760 set_degen_char ('b', "nd"); Alphabet::set_degen_char(char, char const*) (this=0x820ebc0, degen_char=98 'b', possibilities=0x81a3293 "nd") at /home/pete/dart/src/seq/biosequence.cc:400 400 Symbol_weight_map p; 401 const int n_poss = strlen (possibilities); 402 for (int i = 0; i < n_poss; ++i) p[char2int_strict(possibilities[i])] = 1.0 / (double) n_poss; 'b' either 'n' or 'd' asn or asp prob is 1/2; 403 set_degen_char (degen_char, p); Alphabet::set_degen_char(char, std::map, std::allocator > > const&) (this=0x820ebc0, degen_char=98 'b', p=@0xbffffb30) at /home/pete/dart/src/seq/biosequence.cc:386 386 const int i = char2int_deg (degen_char); now i is -3 388 degen_weight[-i-1] = p; save this map also. 390 for_const_contents (Symbol_weight_map, p, sw) 391 sc[sw->first] = Prob2Score (sw->second); ... 761 set_degen_char ('z', "qe"); 762 } (gdb) s __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535) at /home/pete/dart/src/seq/biosequence.cc:776 776 Text_alphabet_type Text_alphabet; this is where these alphabets are set up. Back to xrate main - start new debugging session to 109 align_db.initialise_from_Stockholm_database (stock_db, FALSE); 112 Alphabet dummy_alph ("dummy_alphabet", alph_chars.size()); Alphabet (this=0xbfffedf4, name=0x819a10c "dummy_alphabet", size=0) at /home/pete/dart/src/seq/biosequence.cc:288 288 reset (name, size); Alphabet::reset(char const*, int) (this=0xbfffedf4, new_name=0x819a10c "dummy_alphabet", new_size=0) at /home/pete/dart/src/seq/biosequence.cc:293 293 name = new_name; 294 alphabet_size = new_size; note this is zero. main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:113 113 dummy_alph.init_chars (alph_chars.c_str()); Alphabet::init_chars(char const*, char const*) (this=0xbfffedf4, chars=0x4010f634 "", comp=0x8199b2b "") at /home/pete/dart/src/seq/biosequence.cc:313 313 const int len = strlen (chars); 317 char2int_table[c] = alphabet_size; sets all to zero ... main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:116 116 const Alphabet& init_alph = dummy_alph.size() ? dummy_alph : seq_db.detect_alphabet(); (gdb) s 0x0804ebc7 in Alphabet::size() const (this=0xbfffedf4) at /home/pete/dart/src/ecfg/../seq/biosequence.h:171 171 int size() const { return alphabet_size; } alphabet_size still 0 Sequence_database::detect_alphabet() const (this=0xbffff334) at /home/pete/dart/src/seq/biosequence.cc:1224 1224 vector alph; ... 1236 for (int i = 0; i < (int) alph.size(); ++i) 1238 const double sim = alph_weight[i] * alphabet_similarity (*alph[i]); Sequence_database::alphabet_similarity(Alphabet const&) const (this=0xbffff334, alphabet=@0x820e400) at /home/pete/dart/src/seq/biosequence.cc:1214 1214 double score = 0; 1215 for_const_contents (Sequence_database, *this, np) preprocessed into for ( Sequence_database ::const_iterator np = ( *this ).begin(), DART_END_ITERnp = ( *this ).end(); np != DART_END_ITERnp; ++ np ) 1216 for_const_contents (Biosequence, (*np).seq, c) for ( Biosequence ::const_iterator c = ( (*np).seq ).begin(), DART_END_ITERc = ( (*np).seq ).end(); c != DART_END_ITERc; ++ c ) 261 { return _M_dataplus._M_p; } (gdb) p _M_dataplus._M_p $12 = 0x8264c1c "IYIGNLNRELTEGDILTVFSEYGVPVDVILSRDENTGESQGFAYLKYEDQRSTILAVDNLNGFKIGGRALKI" through VILSRD is the first line of the alignment with the '.' chars removed, continuing (multiple rows for YIS5_YEAST/33-104 since continues) Sequence_database::alphabet_similarity(Alphabet const&) const (this=0xbffff334, alphabet=@0x820e400) at /home/pete/dart/src/seq/biosequence.cc:1217 1217 if (alphabet.contains_strict (*c)) score += 1.0; *c the first char 'I' Alphabet::contains_strict(char) const (this=0x820e400, c=73 'I') at /home/pete/dart/src/ecfg/../seq/biosequence.h:182182 const int i = char2int_table[c]; 183 return i >= 0 && i < alphabet_size; here i is 4, alphabet_size is 4. 1218 else if (alphabet.contains_deg (*c)) score += 0.5; 0x081397ac in Alphabet::contains_deg(char) const (this=0x820e400, c=73 'I') Alphabet::contains_deg(char) const (this=0x820e400, c=73 'I') at /home/pete/dart/src/seq/../seq/biosequence.h:189 189 return i > - (int) degen_lc.size() && i < alphabet_size; again i is 4, alphabet_size is 4. fun with c++ - score string based on belonging to the given alphabet. first time through (guess DNA) score is 2522 (line 1219 of biosequence.cc); RNA score is 2357, protein score is 6396, sim is 6332; text alphabet score is also 6396, sim is 3837 best_alph is 2 main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:120 120 EM_matrix em_matrix (C, init_alph.size(), max_fork); EM_matrix (this=0xbfffe894, C=1, A=20, max_fork=1, align_db=0x0, timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:18 18 { (gdb) s 0x080f74f0 in Diagonalised_matrix_factory (this=0xbfffe894) at /home/pete/dart/src/tree/diagonalised_matrix_factory.cc:8 8 Diagonalised_matrix_factory::Diagonalised_matrix_factory() : min_prob (1e-10) { } 0x0804eb4b in Substitution_matrix_factory (this=0xbfffe894) at /home/pete/dart/src/ecfg/../tree/hasegawa.h:29 29 Hasegawa_matrix_factory() : transition_rate(0), transversion_rate(0), base_frequency (4, 0.0), do_ivratio_correction(1) { } google on Hasegawa likelihood - lots of hits #1: http://www.ingentaconnect.com/content/tandf/usyb/2000/00000049/00000004/art00003 Goldman N. (Ian's friend?) Anderson, Rodrigo Systematic Biology (2000) " Likelihood-based statistical tests of competing evolutionary hypotheses (tree topologies) have been available for approximately a decade. By far the most commonly used is the Kishino-Hasegawa test. However, the assumptions that have to be made to ensure the validity of the Kishino-Hasegawa test place important restrictions on its applicability. In particular, it is only valid when the topologies being compared are specified a priori." 0x080e46e2 in EM_matrix_params (this=0xbfffe8b8) at /home/pete/dart/src/hsm/em_matrix.cc:18 Piper (this=0xbfffe8e4, max_fork=1) at /home/pete/dart/src/util/piper.cc:38 0x080e46c8 in Matrix (this=0xbfffe90c) at /home/pete/dart/src/hsm/../newmat/newmat.h:539 0x0817a19c in GeneralMatrix (this=0xbfffe90c) at /home/pete/dart/src/newmat/newmat4.cc:34 34 { store=0; storage=0; nrows=0; ncols=0; tag=-1; } 0x0817ce86 in BaseMatrix (this=0xbfffe90c) at /home/pete/dart/src/newmat/../newmat/newmat.h:1015 1015 GenericMatrix() : gm(0) {} Janitor (this=0xbfffe90c) at /home/pete/dart/src/newmat/myexcept.cc:127 127 if (do_not_link) (currently false psk) 136 OnStack = true; 140 NextJanitor = JumpBase::jl->janitor; JumpBase::jl->janitor=this; ... 0x080e46ac in SymmetricMatrix (this=0xbfffe92c) at /home/pete/dart/src/hsm/../newmat/newmat.h:601 601 SymmetricMatrix() {} ... 0x080e4690 in DiagonalMatrix (this=0xbfffe94c) at /home/pete/dart/src/hsm/../newmat/newmat.h:702 702 DiagonalMatrix() {} ... EM_matrix (this=0xbfffe894, C=1, A=20, max_fork=1, align_db=0x0, timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:19 19 init_matrix (C, A, align_db, timepoint_res); finally finished constructors which do nothing (except maybe exception guys). EM_matrix::init_matrix(int, int, Tree_alignment_database const*, double) (this=0xbfffe894, new_C=1, new_A=20, new_align_db=0x0, new_timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:32 32 C = new_C; 33 A = new_A; 35 pi = vector (A*C); 36 X = vector > (C, array2d (A, A)); note array2d.h - interesting comment "operator<< and operator>> output the transpose of the array" array2d (this=0xbfffe5d0, xsize=20, ysize=20) at /home/pete/dart/src/hmm/../util/array2d.h:299 299 {} 0x08056fcc in std::vector, std::allocator > >::size() const (this=0xbfffe600) EM_matrix::init_matrix(int, int, Tree_alignment_database const*, double) (this=0xbfffe894, new_C=1, new_A=20, new_align_db=0x0, new_timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:37 37 Y = vector > (A, array2d (C, C)); R = Matrix (A*C, A*C); S = SymmetricMatrix (A*C); mu = DiagonalMatrix (A*C); U = Matrix (A*C, A*C); sqrt_pi = vector (A*C); log_pi = vector (A*C); 47 timepoint_cache.clear(); // set up the cache 50 init_cache (new_align_db, new_timepoint_res); EM_matrix::init_cache(Tree_alignment_database const*, double) (this=0xbfffe894, new_align_db=0x0, new_timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:56 // update the cache parameters 56 align_db = new_align_db; timepoint_res = new_timepoint_res; // update the cache timepoint_cache.clear(); 60 if (align_db != 0) not taken 67 } EM_matrix (this=0xbfffe894, C=1, A=20, max_fork=1, align_db=0x0, timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:23 // hardwire the tolerance & max iterations for NR & EM. this is HACKY believe NR stands for Newton-Raphson 23 nr_tol = .1 * (double) n_params(); 0x080e44ba in EM_matrix::n_params() const (this=0xbfffe894) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:241 241 inline int n_params() const { return kappa_idx(C,0); } 0x080e44e0 in EM_matrix::kappa_idx(int, int) const (this=0xbfffe894, c=1, a=0) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:240 240 inline int kappa_idx (int c, int a) const { return beta_idx() + 1 + ca(c,a); } 0x080e4530 in EM_matrix::beta_idx() const (this=0xbfffe894) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:239 239 inline int beta_idx() const { return Y_idx(0,1,A); } 0x080e4560 in EM_matrix::Y_idx(int, int, int) const (this=0xbfffe894, ci=0, cj=1, a=20) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:238 238 inline int Y_idx (int ci, int cj, int a) const { return X_idx(C,0,1) + a*C*(C-1)/2 + ci*C - (ci*(ci+3))/2 + cj - 1; } 0x080e45ea in EM_matrix::X_idx(int, int, int) const (this=0xbfffe894, c=1, ai=0, aj=1) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:237 237 inline int X_idx (int c, int ai, int aj) const { return pi_idx(C,0) + c*A*(A-1)/2 + ai*A - (ai*(ai+3))/2 + aj - 1; } 0x080e466e in EM_matrix::pi_idx(int, int) const (this=0xbfffe894, c=1, a=0) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:236 236 inline int pi_idx (int c, int a) const { return ca(c,a); } 0x080e4517 in EM_matrix::ca(int, int) const (this=0xbfffe894, c=1, a=0) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:93 93 inline int ca (int c, int a) const { return c * A + a; } // index of state for class c, residue a google on matrix kappa index - www.chestx-ray.com/Statistics - Kappa, is widely used to measure interobserver variability, that is, how often 2 or more observers agree in their interpretations. ... EM_matrix (this=0xbfffe894, C=1, A=20, max_fork=1, align_db=0x0, timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:24 24 nr_max_iter = n_params() * 5000; (gdb) p nr_tol $35 = 23.100000000000001 (gdb) p nr_max_iter $36 = 1155000 so n_params() returns 231. 26 em_min_inc = .001; 27 em_max_iter = m() * 20; em_max_iter is 400. main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:121 121 em_matrix.init_alphabet (init_alph); EM_matrix::init_alphabet(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x820ebc0) at /home/pete/dart/src/hsm/em_matrix.cc:85 85 const char* nd_chars = base_alphabet.nondegenerate_chars(); "arndcqeghilkmfpstwyv" 86 const char* d_chars = base_alphabet.degenerate_chars(); "x*bz" 87 if ((int) base_alphabet.size() != A || (int) strlen (nd_chars) != A) not taken 89 hidden_alphabet.reset (base_alphabet.name.c_str(), C*A); "Protein" EM_matrix::init_alphabet(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x820ebc0) at /home/pete/dart/src/hsm/em_matrix.cc:90 90 hidden_alphabet.case_sensitive = base_alphabet.case_sensitive; int_string (this=0xbfffe590, n=1) at /home/pete/dart/src/hsm/../util/nstring.h:17 17 sprintf (buf, "%d", n); 92 hidden_alphabet.name << C_string; guess: something like "Protein1" 94 chars << nd_chars << d_chars; 95 hidden_alphabet.init_degen_chars (chars.c_str()); "arndcqeghilkmfpstwyvx*bz" for (int a = 0; a < A; ++a) { Symbol_weight_map p; for (int c = 0; c < C; ++c) p.insert (Symbol_weight (ca(c,a), 1)); 0x080e4517 in EM_matrix::ca(int, int) const (this=0xbfffe894, c=0, a=0) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:93 93 inline int ca (int c, int a) const { return c * A + a; } // index of state for class c, residue a EM_matrix::init_alphabet(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x820ebc0) at /home/pete/dart/src/hsm/em_matrix.cc:99 99 for (int c = 0; c < C; ++c) 100 p.insert (Symbol_weight (ca(c,a), 1)) first time - didn't look like did anything except allocation 101 hidden_alphabet.set_degen_char (chars[a], p); ... Alphabet::set_degen_char(char, std::map, std::allocator > > const&) (this=0xbfffea08, degen_char=97 'a', p=@0xbfffe4a0) at /home/pete/dart/src/seq/biosequence.cc:386 386 const int i = char2int_deg (degen_char); note char is 'a': i now set to -1. second time through char 'r' i now set to -2. 390 for_const_contents (Symbol_weight_map, p, sw) same processing as above - believe default probability is 1. ... 103 for (int i = 0; i < (int) strlen (d_chars); ++i) (in em_matrix.cc) "x*bz" 105 const Symbol_weight_map deg_w = base_alphabet.char2weight (d_chars[i]); Alphabet::char2weight(char) const (this=0x820ebc0, c=120 'x') at /home/pete/dart/src/seq/biosequence.cc:445 445 return int2weight (char2int_deg (c)); Alphabet::int2weight(int) const (this=0x820ebc0, i=-1) at /home/pete/dart/src/seq/biosequence.cc:434 434 if (i >= 0) note here i=-1 - returning already set up symbol weight maps. 113 for (int c = 0; c < C; ++c) display_chars << nd_chars; 114 hidden_alphabet.init_display (display_chars); Alphabet::init_display(sstring const&) (this=0xbfffea08, chars=@0xbfffe360) at /home/pete/dart/src/seq/biosequence.cc:379 379 if ((int) chars.size() != 0 && (int) chars.size() != alphabet_size) 381 display_lc = chars; EM_matrix::init_alphabet(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x820ebc0) at /home/pete/dart/src/hsm/em_matrix.cc:116 116 init_param_labels (base_alphabet); EM_matrix::init_param_labels(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x820ebc0) at /home/pete/dart/src/hsm/em_matrix.cc:1265 1265 param_label = vector (n_params(), sstring()); for (int ci = 0; ci < C; ++ci) { const char ci_char = ci < 10 ? ('0' + ci) : ('a' + ci - 10); for (int ai = 0; ai < A; ++ai) { const char ai_char = toupper (base_alphabet.int2char(ai)); param_label[pi_idx(ci,ai)] << "Pi" << ai_char << ci_char; first time through (ci == ai == 0) ci_char == '0', ai_char == 'A' so pi_idx label "PiA0" param_label[kappa_idx(ci,ai)] << "kappa" << ai_char << ci_char; for (int aj = ai + 1; aj < A; ++aj) { const char aj_char = toupper (base_alphabet.int2char(aj)); first time through aj == 1, aj_char == 'R' (second in alphabet) param_label[X_idx(ci,ai,aj)] << "X" << ai_char << aj_char << ci_char; "XAR0" for (int cj = ci + 1; cj < C; ++cj) { const char cj_char = cj < 10 ? ('0' + cj) : ('a' + cj - 10); param_label[Y_idx(ci,cj,ai)] << "Y" << ai_char << ci_char << cj_char; } nothing done since C == 1. ... 1286 param_label[beta_idx()] = "beta"; 0x080e4530 in EM_matrix::beta_idx() const (this=0xbfffe894) at /home/pete/dart/src/hsm/../hsm/em_matrix.h:239 239 inline int beta_idx() const { return Y_idx(0,1,A); ... main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:122 122 if (max_rounds >= 0) currently max_rounds == -1 ... 137 const double prior_dev = rndinit ? (.5 / (double) (C*init_alph.size())) : 0.; currently rndinit == false so prior_dev == 0 138 const double inter_intra_ratio = .1; 139 const double intra_min = 1. / (double) (inter_intra_ratio * (double) (C-1) + (init_alph.size()-1)); intra_min == 0.0526315789473 == 1/19. 140 const double intra_dev = rndinit ? (intra_min / 5.) : 0.; 0 141 const double inter_min = inter_intra_ratio * intra_min; 0.0052631578947 142 const double inter_dev = rndinit ? (inter_intra_ratio * intra_dev) : 0.; 0 143 em_matrix.randomise (prior_dev, intra_min, intra_dev, inter_min, inter_dev); EM_matrix::randomise(double, double, double, double, double) (this=0xbfffe894, prior_dev=0, intra_min=0.052631578947368418, intra_dev=0, inter_min=0.005263157894736842, inter_dev=0) at /home/pete/dart/src/hsm/em_matrix.cc:1291 1291 const double prior_min = 1; const double prior_min = 1; double beta = 0; for (int ci = 0; ci < C; ++ci) for (int ai = 0; ai < A; ++ai) { const int i = ca(ci,ai); i starts at 0 duh pi[i] = prior_min + Rnd::prob() * prior_dev; pi[0] == 1 since prior_dev == 0 beta += pi[i]; double& X_ii = X[ci](ai,ai); 0 for (int aj = 0; aj < A; ++aj) if (aj != ai) skip first time aj == ai == 0. 1304 X[ci](ai,aj) = aj < ai ? X[ci](aj,ai) : (intra_min + Rnd::prob() * intra_dev); recall intra_dev == 0, intra_min == 0.05263157894736841 X_ii -= X[ci](ai,aj); } after finish this loop X_ii == -0.99999999999999956 must mean something. double& Y_ii = Y[ai](ci,ci); 0 C == 0 so the end does nothing - back to 1294 for (int ai = 0; ai < A; ++ai) second time through after line 1300 X_ii again 0 think X matrix init to 0. After this routine, looks like all X entries init to intra_min == 0.0526315789473684 second time hit breakpoint at 1308 double& Y_ii = Y[ai](ci,ci); X_ii == -0.99999999999999956 again. for (int i = 0; i < m(); ++i) pi[i] /= beta; pi[0] == pi[1] == pi[19] == 1 beta == 20 Breakpoint 9, EM_matrix::randomise(double, double, double, double, double) (this=0xbfffe894, prior_dev=0, intra_min=0.052631578947368418, intra_dev=0, inter_min=0.005263157894736842, inter_dev=0) at /home/pete/dart/src/hsm/em_matrix.cc:1318 1318 update(); EM_matrix::update() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:341 341 update_R(); EM_matrix::update_R() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:140 140 for (int i = 0; i < m(); ++i) for (int j = i; j < m(); ++j) R(i+1,j+1) = 0; Matrix::operator()(int, int) (this=0xbfffe90c, m=1, n=2) at /home/pete/dart/src/newmat/newmat6.cc:32 32 if (m<=0 || m>nrows || n<=0 || n>ncols) here nrows == ncols == 20 - code is Real& Matrix::operator()(int m, int n) { if (m<=0 || m>nrows || n<=0 || n>ncols) Throw(IndexException(m,n,*this)); return store[(m-1)*ncols+n-1]; } remember R is a A*C x A*C matrix - see em_matrix.cc:37; nice comments em_matrix.h:47 // secondary data Matrix R; // full rate matrix SymmetricMatrix S; // symmetrised version of R DiagonalMatrix mu; // on-diagonal elements of D are eigenvalues of S (and R) Matrix U; // columns of U are eigenvectors of S vector sqrt_pi; // elements are square roots of equilibrium frequencies vector log_pi; // logs of eqm freqs array2d log_UL; // log_UL(k,i) = log (U(i+1,k+1) / sqrt_pi[i]) array2d log_UR; // log_UR(i,k) = log (U(i+1,k+1) * sqrt_pi[i]) struct EM_matrix_params { int C; // site classes int A; // alphabet size vector pi; // equilibrium frequencies vector > X; // intra-class rate matrices; outer vector indexed by class vector > Y; // inter-class rate matrices; outer vector indexed by residue }; Can't figure out how to display R entries using gdb - maybe add something to the log facility to get a dump. Newmat documentation: http://www.robertnz.net/nm10.htm#scalar1 Real* s = A.Store(); // pointer to array of elements described as the way to get "right inside the black box" Dart - what's CTAG(7,XRATE) << "Final log-likelihood = " << Nats2Bits(log_likelihood) << " bits\n"; preprocessed to clog_stream.request_logging(7,"/home/pete/work/dart_cvs/dart/src/ecfg/xrate.cc" " " "XRATE" " ",211,0).file_tag("/home/pete/work/dart_cvs/dart/src/ecfg/xrate.cc",211) << "Final log-likelihood = " << Score_fns::loglike2bits(log_likelihood) << " bits\n"; (xrate.cc:211) util/logfile.h: #define CLOG(n) CLOGREQUEST(n,0).CLOG_FILE_LINE #define CTAG(n,TAGS) CTAGREQUEST(n,TAGS,0).CLOG_FILE_LINE #define CLOGERR CLOGREQUEST(10,0).CLOG_FILE_LINE #define CLOGOUT CLOGREQUEST(10,1).CLOG_FILE_LINE #define CL CLOGSTREAM file logtags.h: note comment /* This file automatically generated by make-logtags.pl. DO NOT EDIT */ yes make-logtags.pl is in src/util. Maybe simpler to write my own code - see this in logfile.cc: } // log the character if (log_to_stdout) cout.put(c); if (log_to_stderr) cerr.put(c); if (log_to_file && logfile_open) logfile_stream.put(c); // set (or unset) bare_newline flag if (c == '\n') { bare_newline = TRUE; if (log_to_stdout) cout.flush(); if (log_to_stderr) cerr.flush(); if (log_to_file && logfile_open) logfile_stream.flush(); ./xrate -loghelp - interesting stuff. look at logtags - see level 6 in biosequence.cc ./xrate -log 6 generates biosequence.cc to stdout. try including a dummy message sstring pk_log_string; pk_log_string << "pk log message\n"; CLOG(6) << pk_log_string; in em_matrix.c update_R() works: CTAG(3,RATE_EM) << "PK RATE_EM ctagging\n"; execute -log RATE_EM back to exploring xrate.cc: note that intra_min == 0.0526315789473 == 1/19 so the row sum and column sum of the initial X and R matrices are both 0 (both: the diagonal entries are -1). Back to the beginning: xrate.cc:120, em_matrix.cc:18 Sets up matrices, no (non-zero at least) initial values. main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:121 121 em_matrix.init_alphabet (init_alph); EM_matrix::init_alphabet(Alphabet const&) (this=0xbfffe894, base_alphabet=@0x820ebc0) at /home/pete/dart/src/hsm/em_matrix.cc:85 85 const char* nd_chars = base_alphabet.nondegenerate_chars(); to main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:122 122 if (max_rounds >= 0) to 137 const double prior_dev = rndinit ? (.5 / (double) (C*init_alph.size())) : 0.; see notes above to 143 em_matrix.randomise (prior_dev, intra_min, intra_dev, inter_min, inter_dev); EM_matrix::randomise(double, double, double, double, double) (this=0xbfffe894, prior_dev=0, intra_min=0.052631578947368418, intra_dev=0, inter_min=0.005263157894736842, inter_dev=0) at /home/pete/dart/src/hsm/em_matrix.cc:1291 1291 const double prior_min = 1; again see above notes - to 1318 update(); EM_matrix::update() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:341 341 update_R(); EM_matrix::update_R() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:140 140 for (int i = 0; i < m(); ++i) first part - copy R from X, Y matrices - to 158 CTAG(3,RATE_EM RATE_EM_NEWMAT) << "Solving for equilibrium probabilities by QR decomposition\n"; 159 Matrix Q = R.t(); transpose 160 UpperTriangularMatrix upper; 161 QRZ (Q, upper); QRZ(Matrix&, UpperTriangularMatrix&) (X=@0xbfffe5b0, U=@0xbfffe590) at /home/pete/dart/src/newmat/hholder.cc:95 95 Tracer et("QRZ(1)"); see http://www.robertnz.net/nm10.htm - an extension of the QR decomposition 163 pi[m()-1] = 1; 164 double pi_norm = pi[m()-1]; ... (see em_matrix.cc source) already have logging set up - pi (equilibrium frequencies) are all 0.05 (duh), pi*R all entries 0.00000000. ... EM_matrix::update() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:342 342 update_S(); EM_matrix::update_S() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:191 191 double sym_sq = 0; see source - symmetrizes R making S, writes out deviation from the symmetric form - to 222 Jacobi (S, mu, U); 0x08176062 in Jacobi(SymmetricMatrix const&, DiagonalMatrix&, Matrix&) (X=@0xbfffe92c, D=@0xbfffe94c, V=@0xbfffe96c) at /home/pete/dart/src/newmat/jacobi.cc:100 100 { SymmetricMatrix A; Jacobi(X,D,A,V,true); } see robertnz.net - decomposition of a symmetric matrix A A = V * D * V.t() where V is an orthogonal matrix (type Matrix in Newmat) and D is a DiagonalMatrix. as one would expect, all but one of the eigenvalues (entries in the diagonal matrix) are equal, == -1.0 - 1/19.0 - the remaining eigenvalue is zero EM_matrix::update_S() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:225 225 genvector decomposition restart; b xrate.cc:143 to update_R to update_S // round down positive eigenvalues to (max_neg_eval/100), or 0 for max eigenvalue see source - checks for max negative, max eigenvalues, resets to slightly_negative = max_neg_eval / 100. (calculated here). around line 260 - re-evaluate R, S matrices following: // recalculate S & R following equilibration for (int i = 0; i < m(); ++i) for (int j = i; j < m(); ++j) { double S_sum = 0; for (int k = 0; k < m(); ++k) S_sum += mu(k+1) * U(i+1,k+1) * U(j+1,k+1); S(i+1,j+1) = S_sum; R(i+1,j+1) = S_sum * sqrt_pi[j] / sqrt_pi[i]; } ./xrate.debug -log 1 -log RATE_EM -log RATE_EM_EIGEN -log RATE_EM_EQ >& xrate.RATE_EM_EQ_EIGEN.out rrm.sto gives you a whole bunch of stuff - the first time through, equilibration doesn't change R at all (not surprising) here R == S. 343 update_timepoint_cache(); EM_matrix::update_timepoint_cache() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:283 283 CTAG(3,RATE_EM RATE_EM_CACHE) << "Updating cached M() & J()\n"; doesn't do anything - believe none set. EM_matrix::update() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:344 344 update_eigenvectors(); 329 eigenvalue = vector (m()); just stores mu vector as eigenvalue, U as eigenvector in struct Diagonalised_matrix_factory : Substitution_matrix_factory main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:144 144 initially_randomised = TRUE; ... 153 if (&init_alph == &Protein_alphabet) 154 submat_factory = new PAM_factory; PAM_factory (this=0x82671b0) at /home/pete/dart/src/tree/pam.cc:13 13 const int alph_sz = 20; a hard coded table eval one row, evec rows 0 to 19 - set up vectors eigenvalue = vector (eval, eval + alph_sz); eigenvector.push_back (vector (evec0, evec0 + alph_sz)); etc. // secondly, estimate the trees 177 seq_db.seqs2scores (submat_factory->alphabet()); // convert sequences to Score_profile's Sequence_database::seqs2scores(Alphabet const&) (this=0xbffff334, a=@0x820ebc0) at /home/pete/dart/src/seq/biosequence.cc:986 986 CTAG(6,BIOSEQ) << "Sequence_database: converting ASCII sequences to score profiles\n"; 987 int count = 0; 988 for_contents (list, *this, prof) preprocessed to: for ( list ::iterator prof = ( *this ).begin(), DART_END_ITERprof = ( *this ).end(); prof != DART_END_ITERprof; ++ prof ) 990 (*prof).seq2score (a); 0x0813940c in Named_profile::seq2score(Alphabet const&) (this=0x82398d0, a=@0x820ebc0) at /home/pete/dart/src/seq/../seq/biosequence.h:433 433 void seq2score (const Alphabet& a) { a.seq2score (seq, prof_sc); } Alphabet::seq2score(sstring const&, Score_profile&) const (this=0x820ebc0, seq=@0x8239a50, prof=@0x82398d0) at /home/pete/dart/src/seq/biosequence.cc:556 556 Weight_profile wprof; 557 return prof = Score_profile (seq2weight (seq, wprof)); Alphabet::seq2weight(sstring const&, Weight_profile&) const (this=0x820ebc0, seq=@0x8239a50, prof=@0xbfffe600) at /home/pete/dart/src/seq/biosequence.cc:540 540 prof.clear(); 541 542 for_const_contents (Biosequence, seq, c) prof.push_back (char2weight(*c)); 261 { return _M_dataplus._M_p; } (gdb) p _M_dataplus._M_p $2 = 0x8264c1c "IYIGNLNRELTEGDILTVFSEYGVPVDVILSRDENTGESQGFAYLKYEDQRSTILAVDNLNGFKIGGRALKI" YIS5_YEAST/33-104 entire sequence Alphabet::char2weight(char) const (this=0x820ebc0, c=73 'I') at /home/pete/dart/src/seq/biosequence.cc:445 445 return int2weight (char2int_deg (c)); Alphabet::int2weight(int) const (this=0x820ebc0, i=9) at /home/pete/dart/src/seq/biosequence.cc:434 434 if (i >= 0) believe i == 9 is the offset into the nd_chars string "arndcqeghilkmfpstwyv" 437 w.insert (Symbol_weight (i, 1)); 438 return w; note probability 1 - comments from biosequence.h // a Weight_profile is an ungapped profile in probability space // it should probably have all the same methods as Score_profile (see below), but it is more rarely used // typedef pair Symbol_weight; ... 624 return *this; (gdb) p *this $3 = { _M_current = 0x8264c1d "YIGNLNRELTEGDILTVFSEYGVPVDVILSRDENTGESQGFAYLKYEDQRSTILAVDNLNGFKIGGRALKI" } Alphabet::seq2weight(sstring const&, Weight_profile&) const (this=0x820ebc0, seq=@0x8239a50, prof=@0xbfffe600) at /home/pete/dart/src/seq/biosequence.cc:543 543 return prof; ... 991 ++count; going through ungapped sequence (maybe the first row only) and inserting a weight entry (nd vs. degen). ... // secondly, estimate the trees main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:178 178 Subst_dist_func_factory dist_func_factory (*submat_factory); Subst_dist_func_factory (this=0xbfffe714, submat_factory=@0x82671b0, tres=0.01, tmax=10) at /home/pete/dart/src/tree/subdistmat.cc:49 49 { } main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:179 179 align_db.estimate_missing_trees_by_nj (dist_func_factory); Tree_alignment_database::estimate_missing_trees_by_nj(Dist_func_factory&) (this=0xbffff1d4, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:537 537 for_contents (vector, tree_align, ta) 538 if (ta->tree.nodes() == 0) 539 ta->estimate_tree_by_nj (dist_func_factory); Tree_alignment::estimate_tree_by_nj(Dist_func_factory&) (this=0x823a070, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:385 385 CTAG(5,TREE_ALIGN) << "Estimating tree for the following sequences: " << align.row_name << "\n"; 389 if (tree.nodes() != 0) 392 Dist_func* dist_func = dist_func_factory.create_dist_func (align); Subst_dist_func_factory::create_dist_func(Alignment const&) (this=0xbfffe714, align=@0x823a3ac) at /home/pete/dart/src/tree/subdistmat.cc:42 42 return new Subst_dist_func (*this, align); ... Tree_alignment::estimate_tree_by_nj(Dist_func_factory&) (this=0x823a070, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:393 393 Dist_matrix dist_mat (align.rows(), *dist_func); Dist_matrix (this=0xbfffe610, size=90, f=@0x82635c0) at /home/pete/dart/src/seq/distmat.cc:19 19 { rrm.ed - only three rows; Dist_matrix (this=0xbfffe610, size=3, f=@0x823edc8) at /home/pete/dart/src/seq/distmat.cc:19 19 { 20 for (int i = 0; i < size; ++i) 22 (*this)(i,i) = 0; array2d::operator()(int, int) (this=0xbfffe610, x=0, y=0) at /home/pete/dart/src/ecfg/../util/array2d.h:55 55 if (x < 0 || x >= _xsize || y < 0 || y >= _ysize) _xsize, _ysize both 3 58 return _data [x + y * _xsize]; (gdb) p _data [x + y * _xsize] $3 = (const double &) @0x8249f58: 0 Dist_matrix (this=0xbfffe610, size=3, f=@0x823edc8) at /home/pete/dart/src/seq/distmat.cc:23 23 for (int j = i + 1; j < size; ++j) 26 (*this)(i,j) = (*this)(j,i) = f(i,j); Subst_dist_func_factory::Subst_dist_func::operator()(int, int) (this=0x823edc8, i=0, j=2) at /home/pete/dart/src/tree/subdistmat.cc:54 54 Substitution_counts subst_counts (align, i, j); Substitution_counts (this=0xbfffe220, align=@0x823ab64, row1=0, row2=2) at /home/pete/dart/src/tree/subdistmat.cc:6 6 for (int col = 0; col < align.columns(); ++col) 7 if (!align.is_gap (row1, col) && !align.is_gap (row2, col)) 8 for_const_contents (Symbol_score_map, align.get_ssm (row1, col), ss1) Alignment::get_ssm(int, int) const (this=0x823ab64, row=0, col=0) at /home/pete/dart/src/seq/alignment.cc:946 946 if (is_gap(row,col)) THROW Standard_exception ("Attempt to retrieve gap character"); 947 return (*prof[row]) [path.get_seq_pos (row, col)]; 0x08126caf in Alignment_path::get_seq_pos(int, int) const (this=0x823ab70, r=0, c=0) at /home/pete/dart/src/seq/../seq/alignment.h:396 396 { return accumulate (row(r).begin(), row(r).begin() + c, 0); } 0x080b7a44 in Alignment_path::row(int) const (this=0x823ab70, row=0) at /home/pete/dart/src/hmm/../seq/alignment.h:381 381 { return path_data[row]; } Alignment::get_ssm(int, int) const (this=0x823ab64, row=0, col=0) at /home/pete/dart/src/seq/alignment.cc:948 948 } Substitution_counts (this=0xbfffe220, align=@0x823ab64, row1=0, row2=1) at /home/pete/dart/src/tree/subdistmat.cc:9 9 for_const_contents (Symbol_score_map, align.get_ssm (row2, col), ss2) Substitution_counts (this=0xbfffe220, align=@0x823ab64, row1=0, row2=1) at /home/pete/dart/src/tree/subdistmat.cc:10 10 res_pair_count[Residue_pair (ss1->first, ss2->first)] += Score2Prob (ScorePMul (ss1->second, ss2->second)); Score_fns::score_pr_product(int, int) (a=0, b=0) at /home/pete/dart/src/ecfg/../util/score.h:216 216 if (a <= -InfinityScore) ... 228 return a + b; Score_fns::score2prob(int) (sc=0) at /home/pete/dart/src/ecfg/../util/score.h:151 151 if (sc <= 0) 155 return score2prob_lookup[-sc]; 159 } 9 for_const_contents (Symbol_score_map, align.get_ssm (row2, col), ss2) (gdb) b subdistmat.cc:11 Breakpoint 2 at 0x810a0f6: file /home/pete/dart/src/tree/subdistmat.cc, line 11. - never hit so can step through - make rrm.ed1 three rows, each four characters. restart: Dist_matrix (this=0xbfffe610, size=3, f=@0x8231948) at /home/pete/dart/src/seq/distmat.cc:20 20 for (int i = 0; i < size; ++i) 26 (*this)(i,j) = (*this)(j,i) = f(i,j); ... 0x0806322b in array2d::operator()(int, int) (this=0xbfffe610, x=0, y=1) at /home/pete/dart/src/ecfg/../util/array2d.h:58 58 return _data [x + y * _xsize]; Subst_dist_func_factory::Subst_dist_func::operator()(int, int) (this=0x8231948, i=0, j=1) at /home/pete/dart/src/tree/subdistmat.cc:54 54 Substitution_counts subst_counts (align, i, j); 55 Subst_log_like f (factory.submat_factory, subst_counts); 56 Subst_log_like_dt dfdt (factory.submat_factory, subst_counts); 59 Function_cache f_cached (f, factory.tres); s - no function called - see util/maximise.h 60 Function_cache dfdt_cached (dfdt, factory.tres); // do Brent maximisation 64 bracket_maximum (f_cached, t1, t2, t3, 0., factory.tmax); bool bracket_maximum >(Function_cache&, double&, double&, double&, double, double) (f=@0xbfffe1e0, x1=@0xbfffe1a8, x2=@0xbfffe1a0, x3=@0xbfffe198, xmin=0, xmax=10) at /home/pete/dart/src/tree/../util/maximise.h:56 56 const double default_mag = 1.618034; 59 CTAG(2,MAXIMISE MAXIMISE_BRACKET) << "Scanning for bracketing interval in range [" << xmin << " , " << xmax << "]\n"; 63 double f1 = f(x1 = xmin + 0.48 * (xmax-xmin)); (gdb) p f1 $3 = -21.687404375607514 64 double f2 = f(x2 = xmin + 0.52 * (xmax-xmin)); (gdb) p f2 $4 = -21.743799246284006 65 if (f2 < f1) { swap(x1,x2); swap(f1,f2); } // ensure that x1->x2 is uphill 67 x3 = x2 + default_mag * (x2-x1); // first guess for x3 ... at /home/pete/dart/src/tree/../util/maximise.h:106 106 return max(f1,f3) < f2; here f1 == -21.59, f3 == -98.4, f2 == -21.45. at /home/pete/dart/src/tree/subdistmat.cc:65 65 brent_deriv (f_cached, dfdt_cached, t1, t2, t3, factory.tres, tbest, fbest); bool brent_deriv, Function_cache >(Function_cache&, Function_cache&, double, double, double, double, double&, double&) (f=@0xbfffe1e0, df=@0xbfffe1b0, x1=4.1527863999999992, x2=3.1055727899375984, x3=0, res=0.01, xmax=@0xbfffe190, fmax=@0xbfffe188) at /home/pete/dart/src/tree/../util/maximise.h:120 120 const int max_iter = 100; b maximise.h:231 - here xmax == 2.76, fmax == -21.44, iter == 8 Subst_dist_func_factory::Subst_dist_func::operator()(int, int) (this=0x8231948, i=0, j=1) at /home/pete/dart/src/tree/subdistmat.cc:68 // return the ML distance estimate 68 return tbest; (gdb) p tbest $28 = 2.759 69 } Dist_matrix (this=0xbfffe610, size=3, f=@0x8231948) at /home/pete/dart/src/seq/distmat.cc:23 23 for (int j = i + 1; j < size; ++j) 26 (*this)(i,j) = (*this)(j,i) = f(i,j); hit breakpoint in maximise.h:104 again - now x1 == 3.10, f1 == -21.47, x2 == 1.41, f2 == -21.21, x3 == 0, f3 == -42.4, steps == 3 then hit breakpoint maximise.h:231 again - now xmax == 1.62, fmax == -21.20, iter == 6 68 return tbest; tbest == -1.62. now get 1<->2 distance: tbest == 10. 29 CTAG(2,DISTMATRIX) << "Distance matrix:\n" << *this; Tree_alignment::estimate_tree_by_nj(Dist_func_factory&) (this=0x823a828, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:394 394 delete dist_func; 397 nj.build (align.row_name, dist_mat); NJ_tree::build(std::vector > const&, array2d const&) (this=0xbfffe2d0, nn=@0x823ab64, distance_matrix=@0xbfffe610) at /home/pete/dart/src/tree/nj.cc:10 10 if (nn.size() < 2) here nn.size() is 3. ... 53 bool first_pair = TRUE; ... Tree_alignment::estimate_tree_by_nj(Dist_func_factory&) (this=0x823a828, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:399 399 tree = nj; 400 build_maps_from_names(); Tree_alignment::build_maps_from_names() (this=0x823a828) at /home/pete/dart/src/tree/tree_alignment.cc:103 103 reset_maps(); ... 127 } Tree_alignment::estimate_tree_by_nj(Dist_func_factory&) (this=0x823a828, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:401 401 tree_changed(); Tree_alignment_database::estimate_missing_trees_by_nj(Dist_func_factory&) (this=0xbffff1d4, dist_func_factory=@0xbfffe714) at /home/pete/dart/src/tree/tree_alignment.cc:537 537 for_contents (vector, tree_align, ta) 540 } main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:180 180 seq_db.clear_scores(); // clear the Score_profile's to ensure wrong alphabet not used later at /home/pete/dart/src/ecfg/../seq/biosequence.h:522 522 void clear_scores() { for_contents (list, *this, prof) (*prof).prof_sc.clear(); } main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:183 183 if (initially_randomised) // finally, delete the temporary Substitution_matrix_factory if appropriate if (initially_randomised) (pk true) delete submat_factory; // now we can tell the EM_matrix about the Tree_alignment_database 187 em_matrix.init_cache (&align_db, tres); EM_matrix::init_cache(Tree_alignment_database const*, double) (this=0xbfffe894, new_align_db=0xbffff1d4, new_timepoint_res=0.01) at /home/pete/dart/src/hsm/em_matrix.cc:56 56 align_db = new_align_db; have been here before see above. Prev. no align_db - this time there is one 60 if (align_db != 0) 61 for (int n_align = 0; n_align < align_db->size(); ++n_align) note align_db->size() returns 1. ... 67 } main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:188 188 em_matrix.update(); EM_matrix::update() (this=0xbfffe894) at /home/pete/dart/src/hsm/em_matrix.cc:341 341 update_R(); no solution to puzzle about breakpoint not hit. Focus on changes to R, X: review processing to this point. xrate.cc: first time call em_matrix.randomise (prior_dev, intra_min, intra_dev, inter_min, inter_dev); from xrate.cc::143. EM_matrix::randomise(double, double, double, double, double) (this=0xbfffe894, prior_dev=0, intra_min=0.052631578947368418, intra_dev=0, inter_min=0.005263157894736842, inter_dev=0) at /home/pete/dart/src/hsm/em_matrix.cc:1291 - note randomise calls update. then, after loading PAM substitution matrix, building alignment call em_matrix.update() from xrate.cc:188 using the full rrm.sto alignment - set b xrate.cc:188, continue - time around 4'20". Fool around - dump when do update_R - add bool want_dump - modify em_matrix.cc, em_matrix.h, xrate.cc - confirm guess that X matrix is unchanged at this time so R, QR diagonal, pi are all the same as in the first cycle. continue in xrate.cc: 192 const Alphabet& hidden_alphabet = em_matrix.alphabet(); 193 seq_db.seqs2scores (hidden_alphabet); Sequence_database::seqs2scores(Alphabet const&) (this=0xbffff334, a=@0xbfffea08) at /home/pete/dart/src/seq/biosequence.cc:986 986 CTAG(6,BIOSEQ) << "Sequence_database: converting ASCII sequences to score profiles\n"; 990 (*prof).seq2score (a); 0x0813954c in Named_profile::seq2score(Alphabet const&) (this=0x823a910, a=@0xbfffea08) at /home/pete/dart/src/seq/../seq/biosequence.h:433 433 void seq2score (const Alphabet& a) { a.seq2score (seq, prof_sc); } Alphabet::seq2score(sstring const&, Score_profile&) const (this=0xbfffea08, seq=@0x823aa90, prof=@0x823a910) at /home/pete/dart/src/seq/biosequence.cc:556 556 Weight_profile wprof; 557 return prof = Score_profile (seq2weight (seq, wprof)); 542 for_const_contents (Biosequence, seq, c) prof.push_back (char2weight(*c)); ... main (argc=2, argv=0xbffffc34) at /home/pete/dart/src/ecfg/xrate.cc:197 197 if (train) note train an option: "train matrices on indexed alignments using EM" true by default. 199 if (quick || rind) options: quick: "\tuse \"quick\" M-step, i.e. don't use Lagrange multipliers" true by default rind: "use RIND-constrained M-step" false by default. 200 log_likelihood = em_matrix.iterate_quick_EM (rind, intra, inter, forgive, FALSE); EM_matrix::iterate_quick_EM(bool, bool, bool, int, bool) (this=0xbfffe894, rind=false, intra=true, inter=true, forgive=0, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:1603 1603 Loge best_log_likelihood = 0; ... 1608 if (iter >= em_max_iter) (gdb) p em_max_iter $6 = 400 1614 Update_statistics stats = single_quick_EM (rind, intra, inter, infer_class_labels); EM_matrix::single_quick_EM(bool, bool, bool, bool) (this=0xbfffe894, rind=false, intra=true, inter=true, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:1471 1471 CTAG(5,RATE_EM RATE_EM_PROGRESS) << "Computing update statistics (E-phase)\n"; 1472 const Update_statistics stats = get_stats (infer_class_labels); EM_matrix::get_stats(bool) const (this=0xbfffe894, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:487 487 Update_statistics stats = get_stats_unequilibrated (TRUE, infer_class_labels); // symmetrise EM_matrix::get_stats_unequilibrated(bool, bool) const (this=0xbfffe894, symmetrise=true, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:479 479 Update_statistics stats (m()); Update_statistics (this=0xbfffe580, states=20) at /home/pete/dart/src/hsm/em_matrix.cc:360 360 { 361 clear(); EM_matrix::Update_statistics::clear() (this=0xbfffe580) at /home/pete/dart/src/hsm/em_matrix.cc:366 366 log_likelihood = 0.0; { log_likelihood = 0.0; s = vector (states, (double) 0); w = vector (states, (double) 0); u = array2d (states, states, (double) 0); what's states? value is 20 see struct Update_statistics in em_matrix.h: see above em_matrix.cc calls 479 Update_statistics stats (m()); 370 clear_DJU(); { DJU = array2d (states, states, (double) 0); DJU_rev = array2d (states, states, (double) 0); } EM_matrix::get_stats_unequilibrated(bool, bool) const (this=0xbfffe894, symmetrise=true, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:480 480 up_down (stats, 0, symmetrise, infer_class_labels); EM_matrix::up_down(EM_matrix::Update_statistics&, bool, bool, bool) const (this=0xbfffe894, stats=@0xbfffe580, likelihood_only=false, symmetrise=true, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:960 960 if (align_db == 0) THROWEXPR ("No training set!"); 961 const int max_n = align_db->size(); max_n == 1 962 stats.clear_DJU(); again calls { DJU = array2d (states, states, (double) 0); DJU_rev = array2d (states, states, (double) 0); } 966 for (int n_align = 0; n_align < max_n; ++n_align) 968 Alignment_matrix alnmat (*this, n_align, infer_class_labels); Alignment_matrix (this=0xbfffe440, hsm=@0xbfffe894, n_align=0, alloc_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:674 674 { look through code: note ssm stands for a Symbol_score_map see notes above for Symbol_score_map briefly a map default score is 0, weight is 1 ... 680 for (int i = 0; i < hsm.m(); ++i) wildcard_ssm[i] = 0; // loop over columns 682 vector seq_coords = align.path.seq_coords_begin(); Alignment_path::seq_coords_begin() const (this=0x823bbb0) at /home/pete/dart/src/seq/alignment.cc:113 113 return create_seq_coords(); just zero out: alignment.h: Alignment_path::Sequence_coords Alignment_path::create_seq_coords() const { return vector (rows(), (int) 0); } 683 for (int col = 0; col < align.columns(); align.path.inc_seq_coords(seq_coords,col), ++col) 685 Column_matrix& cm = colmat[col]; note colmat (recently) initialized to align.columns(); see em_matrix.h for struct Column_matrix - includes U, D, L tables. 686 cm.alloc (tree.nodes(), states, alloc_class_labels); 0x080844a2 in Phylogeny::nodes() const (this=0x823b86c) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:278 278 int nodes() const { return _sorted_nodes.size(); } // number of nodes (gdb) p _sorted_nodes.size() $14 = 5 tree from log file: 3 leaf nodes, 0 and 2 have common ancestor 3, 1 and 3 common ancestor 4 (root) EM_matrix::Column_matrix::alloc(int, int, bool) (this=0x8248670, _nodes=5, _states=20, alloc_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:529 void EM_matrix::Column_matrix::alloc (int _nodes, int _states, bool alloc_class_labels) { nodes = _nodes; states = _states; // U, D, L tables U = vector > (nodes, vector (states, (Loge) -InfinityLoge)); pk: U a vector of 5 (nodes) vectors of 20 (states) Loge entries each, each initialized to -InfinityLoge in util/score.h: typedef double Loge, #define InfinityLoge 9.87654321e+99 Ian: U(i) = P(subtree | root is in state i) - class, residue D = vector > (nodes, vector (states, (Loge) -InfinityLoge)); L = vector (nodes, (Loge) -InfinityLoge); Ian: L=P(subtree) = sigma or sum over i (pi(i)*U(i). Ian: pi(i)=P(root is in state i) // allowed states for each node; clique roots; posteriors gapped = vector (nodes, (bool) 0); allowed = vector > (nodes, vector()); root = vector (nodes, (int) -1); leaf = vector (nodes, (bool) 1); log_post = vector (nodes, -InfinityLoge); // allocate space for class labels, if mandated if (alloc_class_labels) class_label = sstring (nodes, '-'); pk here alloc_class_labels == false } Alignment_matrix (this=0xbfffe440, hsm=@0xbfffe894, n_align=0, alloc_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:687 687 cm.initialise (tree_align, col, seq_coords, &wildcard_ssm); EM_matrix::Column_matrix::initialise(Tree_alignment const&, int, std::vector > const&, std::map, std::allocator > > const*) (this=0x8248670, tree_align=@0x823b868, column=0, seq_coords=@0xbfffe160, wildcard=0xbfffe180) at /home/pete/dart/src/hsm/em_matrix.cc:550 550 CTAG(2,RATE_EM_INIT) << "Initialising position #" << column+1 << "\n"; CTAG(2,RATE_EM_INIT) << "Initialising position #" << column+1 << "\n"; const PHYLIP_tree& tree = tree_align.tree; const Alignment& align = tree_align.align; vector node_scores (nodes, (const Symbol_score_map*) 0); pk recall nodes == 5. 556 for_rooted_nodes_post (tree, b) 0x08084ee8 in Phylogeny::branches_begin(int, int, bool, bool) const (this=0x823b86c, node=4, parent=-1, top_down=191, visit_parent=255) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:321 321 { return Branch_iter (*this, node, parent, top_down, visit_parent); } pk note that node 4 is the root. Branch_iter (this=0xbfffe0a0, p=@0x823b86c, dad=4, grumpa=-1, top_down=false, visit_grumpa=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:220 220 { 0x08084ee1 in iterator (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:208 // default constructor returns end() iterator // 208 Branch_iter() : phylogeny(0), finished(1) {} 0x08084e98 in Branch (this=0xbfffe0c4) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:73 73 Branch () : Node_pair (), length (0) {} 0x08084eb4 in Node_pair (this=0xbfffe0c4) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:46 46 Node_pair () : pair (-1, -1) {} Branch_iter (this=0xbfffe0a0, p=@0x823b86c, dad=4, grumpa=-1, top_down=false, visit_grumpa=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:221 221 if (!(top_down && visit_grumpa)) 222 if (!find_children()) Phylogeny::Branch_iter::find_children() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:139 139 if (ignore_kids) { ignore_kids = 0; return 0; } (gdb) p ignore_kids $19 = false 142 Node n = most_recent_node(), p = most_recent_parent(); at /home/pete/dart/src/ecfg/../tree/phylogeny.h:134 134 Node most_recent_node() const { int s = path.size(); return s == 0 ? dad : *path[s-1]; } here path.size() == 0, dad == 4. at /home/pete/dart/src/ecfg/../tree/phylogeny.h:135 135 Node most_recent_parent() const { int s = path.size(); return s == 0 ? grumpa : (s == 1 ? dad : *path[s-2]); } here path.size() still == 0, grumpa == -1 Phylogeny::Branch_iter::find_children() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:143 143 Child_iter c = phylogeny->children_begin (n, p); 0x08084b97 in Phylogeny::children_begin(int, int) const (this=0x823b86c, node=4, parent=-1) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:313 313 { return Relative_iter (_branch_length[node].begin(), _branch_length[node].end(), parent); } { while (i != end) if ((*i).first != avoid_node) break; else ++i; } here (*i).first == 1. Phylogeny::Branch_iter::find_children() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:144 after this: c: begin 1, end 0 144 Child_iter e = phylogeny->children_end (n, p); gives e: begin , end, i all 136601676 (-1?) 145 if (c == e) return 0; 146 path.push_back(c); 147 end.push_back(e); 148 if (top_down) return 1; bool find_children() // returns FALSE unless this path should be extended even further (NB always FALSE if doing a bottom-up strategy) { if (ignore_kids) { ignore_kids = 0; return 0; } while (1) { Node n = most_recent_node(), p = most_recent_parent(); Child_iter c = phylogeny->children_begin (n, p); Child_iter e = phylogeny->children_end (n, p); if (c == e) return 0; path.push_back(c); end.push_back(e); if (top_down) return 1; } return 0; } second time through: 142 Node n = most_recent_node(), p = most_recent_parent(); n == 1, p == 4 143 Child_iter c = phylogeny->children_begin (n, p); 144 Child_iter e = phylogeny->children_end (n, p); 145 if (c == e) return 0; now c == e, return. Branch_iter (this=0xbfffe0a0, p=@0x823b86c, dad=4, grumpa=-1, top_down=false, visit_grumpa=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:223 223 finished = !visit_grumpa; 224 setup_branch(); Phylogeny::Branch_iter::setup_branch() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:155 155 if (most_recent_parent() != -1) here most_recent_parent() == 4 156 branch = Branch (most_recent_parent(), most_recent_node(), *phylogeny); at /home/pete/dart/src/ecfg/../tree/phylogeny.h:134 134 Node most_recent_node() const { int s = path.size(); return s == 0 ? dad : *path[s-1]; } now path.size() == 1 *path[0] == 1 98 const Node& operator*() const { return (*i).first; } (*i).first == 1 135 Node most_recent_parent() const { int s = path.size(); return s == 0 ? grumpa : (s == 1 ? dad : *path[s-2]); } here s == 1, dad == 4. 0x0808486c in Branch (this=0xbfffdf70, n1=4, n2=1, p=@0x823b86c) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:75 75 Branch (Node n1, Node n2, const Phylogeny& p) : Node_pair (n1, n2), length (((Phylogeny&)p).branch_length (n1, n2)) {} 0x0808484c in Node_pair (this=0xbfffdf70, n1=4, n2=1) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:47 47 Node_pair (Node n1, Node n2) : pair (n1, n2) {} 0x08084b2d in Phylogeny::branch_length(int, int) (this=0x823b86c, node1=4, node2=1) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:267 267 { return (*(_branch_length[min(node1,node2)].find(max(node1,node2)))).second; } see node with parent == left == right == 1 0x080848a4 in Branch (this=0xbfffdf70, n1=4, n2=1, p=@0x823b86c) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:75 75 Branch (Node n1, Node n2, const Phylogeny& p) : Node_pair (n1, n2), length (((Phylogeny&)p).branch_length (n1, n2)) {} Phylogeny::Branch_iter::setup_branch() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:159 159 } end setup_branch() Branch_iter (this=0xbfffe0a0, p=@0x823b86c, dad=4, grumpa=-1, top_down=false, visit_grumpa=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:225 225 } end Branch_iter 0x08084f27 in Phylogeny::branches_begin(int, int, bool, bool) const (this=0x823b86c, node=4, parent=-1, top_down=false, visit_parent=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:321 321 { return Branch_iter (*this, node, parent, top_down, visit_parent); } end branches_begin 323 Branch_iter branches_end () const { return Branch_iter (); } 208 Branch_iter() : phylogeny(0), finished(1) {} 0x08084e98 in Branch (this=0xbfffe084) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:73 73 Branch () : Node_pair (), length (0) {} 0x08084eb4 in Node_pair (this=0xbfffe084) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:46 46 Node_pair () : pair (-1, -1) {} 0x08084d16 in Phylogeny::Branch_iter::operator!=(Phylogeny::Branch_iter const&) const (this=0xbfffe0a0, i2=@0xbfffe060) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:167 167 bool operator!=(const Branch_iter& i2) const { return !(*this == i2); } think both path == i2.path and finished == i2.finished are false EM_matrix::Column_matrix::initialise(Tree_alignment const&, int, std::vector > const&, std::map, std::allocator > > const*) (this=0x8248670, tree_align=@0x823b868, column=0, seq_coords=@0xbfffe160, wildcard=0xbfffe180) at /home/pete/dart/src/hsm/em_matrix.cc:558 558 const int p = (*b).first; const int p = (*b).first; const int n = (*b).second; const int r = tree_align.node2row[n]; here p == 4 (parent) n == 1 (node) r == 1 (row?) if (r < 0) // row has no data; allocate '*'s generously 571 else if (align.path(r,column)) // allocate whatever's at this row, if it's not a gap 0x0808567e in Alignment_path::operator()(int, int) const (this=0x823bbb0, row=1, col=0) at /home/pete/dart/src/ecfg/../seq/alignment.h:357 bool Alignment_path::operator() (int row, int col) const 357 { return path_data[row][col]; } pk here == true at /home/pete/dart/src/hsm/em_matrix.cc:572 572 if (align.prof[r]) 573 node_scores[n] = &(*align.prof[r])[seq_coords[r]]; at /home/pete/dart/src/hsm/em_matrix.cc:556 // prune long lineages of '*'s for_rooted_nodes_pre (tree, b) 556 for_rooted_nodes_post (tree, b) Phylogeny::Branch_iter::operator++() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:171 171 if (top_down) // top-down 183 if (path.size() == 0) finished = 1; pk we're bottom up size == 1 186 if (++path.back() == end.back()) { path.pop_back(); end.pop_back(); } guess ++path.back() is node 0, end.back(0) is node 1 not == else find_children(); at /home/pete/dart/src/ecfg/../tree/phylogeny.h:101 101 Relative_iter& operator++() { while (i != end) if ((*(++i)).first != avoid_node) break; return *this; } guess (*(++i)).first takes to node 0 254 return *this; pk believe node 0 bool operator== (const Relative_iter& r) const { return i == r.i && avoid_node == r.avoid_node; } pk first comparison false. Phylogeny::Branch_iter::operator++() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:187 187 else find_children(); Phylogeny::Branch_iter::find_children() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:139 139 if (ignore_kids) { ignore_kids = 0; return 0; } have been here before: while (1) { Node n = most_recent_node(), p = most_recent_parent(); now n == 3 p == 4 Child_iter c = phylogeny->children_begin (n, p); node 0 Child_iter e = phylogeny->children_end (n, p); if (c == e) return 0; not taken path.push_back(c); end.push_back(e); if (top_down) return 1; not taken, repeat loop. } { Node n = most_recent_node(), p = most_recent_parent(); now n == 0 p == 3 Child_iter c = phylogeny->children_begin (n, p); node 0 Child_iter e = phylogeny->children_end (n, p); if (c == e) return 0; this time taken Phylogeny::Branch_iter::find_children() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:151 151 } Phylogeny::Branch_iter::operator++() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:188 188 if (path.size() == 0) finished = !visit_grumpa; now path.size() == 2 finished == false 191 setup_branch(); Phylogeny::Branch_iter::setup_branch() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:155 155 if (most_recent_parent() != -1) 135 Node most_recent_parent() const { int s = path.size(); return s == 0 ? grumpa : (s == 1 ? dad : *path[s-2]); } pk most_recent_parent now 3. 156 branch = Branch (most_recent_parent(), most_recent_node(), *phylogeny); most_recent_node is 0 ... 0x0808486c in Branch (this=0xbfffdfd0, n1=3, n2=0, p=@0x823b86c) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:75 75 Branch (Node n1, Node n2, const Phylogeny& p) : Node_pair (n1, n2), length (((Phylogeny&)p).branch_length (n1, n2)) {} at /home/pete/dart/src/ecfg/../tree/phylogeny.h:267 267 { return (*(_branch_length[min(node1,node2)].find(max(node1,node2)))).second; } believe again going to node 1 (see above) 159 } end setup_branch() Phylogeny::Branch_iter::operator++() (this=0xbfffe0a0) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:192 192 return *this; this: dad == 4, grumpa == -1, 193 } end // preincrement: move to the next Branch 0x08084d16 in Phylogeny::Branch_iter::operator!=(Phylogeny::Branch_iter const&) const (this=0xbfffe0a0, i2=@0xbfffe060) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:167 167 bool operator!=(const Branch_iter& i2) const { return !(*this == i2); } at /home/pete/dart/src/hsm/em_matrix.cc:558 558 const int p = (*b).first; for_rooted_nodes_post (tree, b) { const int p = (*b).first; const int n = (*b).second; const int r = tree_align.node2row[n]; also been here before - now p == 3, n == 0, r == 0 prev. p == 4 n == 1 r == 1 562 if (r < 0) // row has no data; allocate '*'s generously 571 else if (align.path(r,column)) // allocate whatever's at this row, if it's not a gap 573 node_scores[n] = &(*align.prof[r])[seq_coords[r]]; 556 for_rooted_nodes_post (tree, b) { const int p = (*b).first; now is 3 const int n = (*b).second; now is 2 const int r = tree_align.node2row[n]; now is 2 loop again const int p = (*b).first; now is 4 const int n = (*b).second; now is 3 const int r = tree_align.node2row[n]; now is -1 562 if (r < 0) // row has no data; allocate '*'s generously 564 for_children (tree, n, p, c) 0x08084b97 in Phylogeny::children_begin(int, int) const (this=0x823b86c, node=3, parent=4) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:313 313 { return Relative_iter (_branch_length[node].begin(), _branch_length[node].end(), parent); } 565 if (node_scores[*c] != 0) (gdb) p node_scores[*c] $152 = (const Symbol_score_map * const&) @0x8242e00: 0x8246068 567 node_scores[n] = wildcard; 568 break; loop again const int p = (*b).first; now is -1 const int n = (*b).second; now is 4 const int r = tree_align.node2row[n]; now is -1 finally done looping 556 for_rooted_nodes_post (tree, b) 577 for_rooted_nodes_pre (tree, b) 0x08084ee8 in Phylogeny::branches_begin(int, int, bool, bool) const (this=0x823b86c, node=4, parent=-1, top_down=191, visit_parent=255) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:321 321 { return Branch_iter (*this, node, parent, top_down, visit_parent); } Branch_iter (this=0xbfffe060, p=@0x823b86c, dad=4, grumpa=-1, top_down=true, visit_grumpa=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:220 220 { pk prev. started with top_down=false now =true 0x08084e98 in Branch (this=0xbfffe084) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:73 73 Branch () : Node_pair (), length (0) {} 46 Node_pair () : pair (-1, -1) {} pk this same as above Branch_iter (this=0xbfffe060, p=@0x823b86c, dad=4, grumpa=-1, top_down=true, visit_grumpa=true) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:221 221 if (!(top_down && visit_grumpa)) prev. time top_down=false 224 setup_branch(); Phylogeny::Branch_iter::setup_branch() (this=0xbfffe060) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:155 155 if (most_recent_parent() != -1) this time most_recent_parent == -1. 158 branch = Branch (most_recent_parent(), most_recent_node(), 0); ... 225 } end Branch_iter 0x08084d7c in Phylogeny::branches_end() const (this=0x823b86c) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:323 323 Branch_iter branches_end () const { return Branch_iter (); } // default constructor returns end() iterator // 208 Branch_iter() : phylogeny(0), finished(1) {} ... 0x08084eb4 in Node_pair (this=0xbfffe0c4) at /home/pete/dart/src/ecfg/../tree/phylogeny.h:46 46 Node_pair () : pair (-1, -1) {} EM_matrix::Column_matrix::initialise(Tree_alignment const&, int, std::vector > const&, std::map, std::allocator > > const*) (this=0x8248670, tree_align=@0x823b868, column=0, seq_coords=@0xbfffe160, wildcard=0xbfffe180) at /home/pete/dart/src/hsm/em_matrix.cc:579 579 const int p = (*b).first; // prune long lineages of '*'s for_rooted_nodes_pre (tree, b) { const int p = (*b).first; p now -1 const int n = (*b).second; n now 4 581 if (p < 0 ? TRUE : node_scores[p] == 0) // only proceed if parent is gapped 582 if (node_scores[n] == wildcard && wildcard != 0) // only proceed if node is a wildcard 584 int n_children = 0; ... 592 if (n_ungapped_children < 2 && n_children > 1) here n_ungapped_children == n_children == 2 577 for_rooted_nodes_pre (tree, b) loop again const int p = (*b).first; p now 4 const int n = (*b).second; n now 1 and again const int p = (*b).first; p now 4 const int n = (*b).second; n now 3 loop until p is 3, n is 2 (guess the end): 597 initialise (tree_align.tree, node_scores); at /home/pete/dart/src/hsm/em_matrix.cc:626 626 for (int n = 0; n < nodes; ++n) remember nodes == 5 for (int n = 0; n < nodes; ++n) { gapped[n] = 0; allowed[n].clear(); root[n] = -1; leaf[n] = 1; } 634 for_rooted_nodes_pre (tree, b) const Phylogeny::Node p = (*b).first; const Phylogeny::Node c = (*b).second; start: p == -1 c == 4 // if child row has a gap, it's not a leaf 640 if (node_scores[c] == 0) not taken 644 if (!gapped[c]) 646 if (p >= 0) leaf[p] = 0; // if child row is ungapped, parent is not a leaf // if parent row is ungapped, child is not a clique root 649 if (p >= 0 ? !gapped[p] : 0) 653 root[c] = c; // start a new clique with this as the root actually the root of the entire tree 654 clique.push_back (c); // get allowed states 658 for_const_contents (Symbol_score_map, *node_scores[c], ss) 660 allowed[c].push_back (ss->first); 661 U[c][ss->first] = Score2Nats (ss->second); -9.8765432100000003e+99 -InfinityLoge loop here a while (maybe 20 times) 665 } end void EM_matrix::Column_matrix::initialise (const PHYLIP_tree& tree, const vector& node_scores) were all children initialized or only the root (4)? at /home/pete/dart/src/hsm/em_matrix.cc:598 598 } end void EM_matrix::Column_matrix::initialise (const Tree_alignment& tree_align, int column, const vector& seq_coords, const Symbol_score_map* wildcard) Alignment_matrix (this=0xbfffe440, hsm=@0xbfffe894, n_align=0, alloc_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:683 683 for (int col = 0; col < align.columns(); align.path.inc_seq_coords(seq_coords,col), ++col) we've been here before - now col == 1 align.columns() == 4 problem with breakpoint - back to Update_statistics (this=0xbfffe580, states=20) at /home/pete/dart/src/hsm/em_matrix.cc:361 361 clear(); EM_matrix::Update_statistics::clear_DJU() (this=0xbfffe580) at /home/pete/dart/src/hsm/em_matrix.cc:377 377 } 362 } 480 up_down (stats, 0, symmetrise, infer_class_labels); ... 966 for (int n_align = 0; n_align < max_n; ++n_align) EM_matrix::Alignment_matrix::Alignment_matrix at /home/pete/dart/src/hsm/em_matrix.cc:674 674 { recently explored in depth - try b 689, b 969 - hit 969, not 689. Breakpoint 4, EM_matrix::up_down(EM_matrix::Update_statistics&, bool, bool, bool) const (this=0xbfffe894, stats=@0xbfffe580, likelihood_only=false, symmetrise=true, infer_class_labels=false) at /home/pete/dart/src/hsm/em_matrix.cc:969 969 alnmat.fill_up(); EM_matrix::Alignment_matrix::fill_up() (this=0xbfffe440) at /home/pete/dart/src/hsm/em_matrix.cc:745 745 for (int col = 0; col < align.columns(); ++col) 746 colmat[col].fill_up (hsm, tree, col); tree=@0x823b86c, column=0) at /home/pete/dart/src/hsm/em_matrix.cc:697 697 for_rooted_nodes_post (tree, b) const Phylogeny::Node p = (*b).first; p == 4 const Phylogeny::Node c = (*b).second; c == 1 702 if (!gapped[c]) 704 if (c == root[c]) here c == 1, root[c] == 4 711 const Timepoint_data& pc = hsm.timepoint_data ((*b).length); 712 for_const_contents (vector, allowed[p], i) preprocessed to for ( vector ::const_iterator i = ( allowed[p] ).begin(), DART_END_ITERi = ( allowed[p] ).end(); i != DART_END_ITERi; ++ i ) 714 Loge csum = -InfinityLoge; 715 for_const_contents (vector, allowed[c], j) preprocessed to for ( vector ::const_iterator j = ( allowed[c] ).begin(), DART_END_ITERj = ( allowed[c] ).end(); j != DART_END_ITERj; ++ j ) 716 NatsPSumAcc (csum, NatsPMul (pc.M(*i,*j), U[c][*j])); what's pc? reference to Timepoint_data csum == 11? array2d::operator()(int, int) const (this=0x824aa2c, x=0, y=11) at /home/pete/dart/src/ecfg/../util/array2d.h:63 63 if (x < 0 || x >= _xsize || y < 0 || y >= _ysize) here _xsize == _ysize == 20 maybe the 11 stands for "k", aa offset 11 in nd_chars - first aa in sequence 1 YIS1_YEAST 66 return _data [x + y * _xsize]; (gdb) p _data [x + y * _xsize] $18 = (const double &) @0x824b138: -2.9985785743847853 looks like a log liklihood - how did it get there? guess coming from M(i,j) - see in log file: Calculating M() & J() for time T=5.57 M(i,j) at T=5.57: followed by lots of numbers near -2.99858 that log message comes from em_matrix.cc in EM_matrix::update_timepoint_cache() note: according to the log file, the branch length joining nodes 1, 3 is 5.57136