click on the Biowiki logo to go to homepage
Edit Raw Print
Links Diffs RSS
About Stats Recent


Research Teaching Blog
Fall10 | Sandbox
Biowiki > Teaching > Bio E 131 > ArraysAndHashesLab

Search

Advanced search...

Topics

PageRank Checker

Arrays and Hashes in Perl

By the end of this lab, you should know:

  • how to use arrays and hashes in Perl
  • one way to write the reverse complement exercise
  • some things to keep in mind when doing error-checking


0. Before we get started...

  • In the following lab, we will use a $ symbol to signify a UNIX prompt. The prompt you see in your terminal window may/may not end in a $, but just remember that you don't actually type in the $, just the stuff after it.

  • This lab assumes that you are now comfortable with using a text editor to write a Perl script, the proper format of a Perl script, and the UNIX commands you have to issue to make your script executable. In addition, you should already know how to read arguments from the command line, read data from user input (i.e. standard input), and read from files. If you need a refresher/reminders along the way, please refer back to Perl Basics.

1. Arrays

By now, you already know the basics of writing and executing Perl scripts, and you even wrote one for reverse complementing sequences in a FASTA file.

As you begin to write more complex programs, data structures like arrays and hashes will become increasingly useful. These list structures are great for storing a set of items and providing easy access to them. At a very simplistic level, the major difference between arrays and hashes is just the way you access the elements within them - in an array, you access elements using integer indices and in a hash, you access elements using keys.

The word "key" is ambiguous on purpose, because nearly anything can be a key. What happens on a very low level when you ask for the element with the key X in a hash is to convert X into some numerical value using some built-in hash function and then use that numerical value to access an internal array. Why then, you might ask, wouldn't I write my own array and my own hash function? Well, you could and in some programming languages, you have to if you want to use such a structure. But, the purpose of hashes (aka hash tables) is to provide very quick access to its elements and this all comes down to a good hash function. In computer science classes, you can spend weeks talking about what's a good hash function, so for most of us, using built-in hashes means not having to worry about any of this.

So with that in mind, let's start with arrays. Arrays are useful for storing a list of similar items, e.g. a list of numbers, a list of strings, and even a list of lists. You've already used arrays in the last lab - the @ARGV you used to retrieve command line arguments is an array of strings that Perl creates for you. Perl has a lot of built-in functions for working with arrays and lists in general and we're definitely not going to go through all of them. See the perldoc at perl.org for a complete listing.

  • Creating/Using Arrays - In lecture, we talked about numerous ways to create arrays and how to access array elements. Let's briefly review:
    # all of the following are valid ways to create an array
    
    @a = (1,2,3,4,5); 
    @b = ('a','c','g','t'); 
    @c = 1..5;
    @d = qw(a c g t);
    
    # to access an element in an array, you use the square brackets []. and since each element
    # is a scalar, you precede it with a $ instead of the @ used for arrays
    
    print "$a[0]\n";    # remember that Perl array indices start at 0
    
    $i = 2;
    print "$a[$i]\n";   # the index you use to access the array element can even be stored in another variable
    

    Hopefully, this all sounds vaguely familiar. OK, let's try making our own array! We'll work again with our favorite file format (FASTA). Let's try to read in all the sequence names from a FASTA file and store them in an array (we'll refer to this script as seqNamesArray.pl):

    #! /usr/bin/perl -w
    
    open (FASTAFILE, $ARGV[0]);    # the user will enter the filename from the command line
    $index = 0;
    while ($line = <FASTAFILE>) {
            if ($line =~ /^>/) {
                    chomp($line);       # what happens if you leave out this chomp?
                    $names[$index++] = $line;    # notice how you can increment the index at the same time.  Also, we never defined @names; what does perl assume we mean?
            }
    }
    print "@names\n";       # check out how easy it is to print out an array in Perl!
    close (FASTAFILE);
    

    Test this script out with hemoglobin.fasta. You'll notice that it'll print out a pretty big mess of sequence names, one write after the other separated by only a space. You'll notice that it saves the > character into the array as well, which is not that nice. How do we get rid of that first character? There are a couple of ways to do it, the simplest of which involves using regular expressions. But since we won't be talking too much about regular expressions until next week's lab, let's do it a different way using the substr function, which extracts a portion of a string, starting at a specified offset:

                    $names[$index++] = substr($line, 1);
    
    Try this out. The first character of the string is at offset 0, so substr will return everything after the > character. We'll go over a better way of printing out this array shortly.

  • List context vs. scalar context - Prof Holmes mentioned this briefly in lecture and there's a whole section on 'Context' in the 'Learning Perl' book (also here, here, or on this newsgroup posting). We recommend reading more about the concept of Perl context if you're still confused after this lab since it's pretty important and can cause seemingly mysterious bugs in your programs otherwise.
    Without going into too much detail, every expression in Perl is evaluated in a particular context; scalar and list being the two major ones. For example, contrast these two statements:
    @array = EXPRESSION;        # EXPRESSION is expected to produce a list (list context)
    $scalar = EXPRESSION;       # EXPRESSION is expected to produce a scalar (scalar context)
    

    Here, the placeholder "EXPRESSION" stands for anything that can be a valid Perl expression. For example, an expression like @names can be evaluated in either list or scalar contexts. Depending on which context it finds itself in, this expression produces a different value. Let's add onto our script and try this out:

    @array = @names;
    $scalar = @names;
    print "list context: @array\n";
    print "scalar context: $scalar\n";
    

    Notice how the expression @names in a scalar context returns the length of the array! This is actually pretty useful, so it's a handy tip to remember. Check out some of the links above for more complicated examples on context.

  • Iterating through arrays - Once you've stored data into an array, you probably want some way to get the data back out. We already saw how to print out the whole array, because the print statement is smart and does this for us automatically. But there will be many other situations when you will need to go through each element of the array one-by-one (aka iterate through an array). Let's say in our FASTA example, we want to check if we have a sequence for 'protease' in the file. We already read in the file and stored the names in an array, so we just need to check each element in the array and see if any one of them is 'protease'. An easy way to iterate through an array is using the foreach loop:
    $found = 0;
    foreach $name (@names) {      # this will keep setting $name to a different element in the array
            if ($name eq "protease") {
                    $found = 1;
            }
    }
    print "found protease in file!\n" if $found; # Note that here we are using Perl's alternative if statement syntax
    

    In fact, this is not a great way to look for something because you have to look through the entire array before you find what you're looking for. Even if we were clever and we added some code to stop going through the array when you've found "protease", you may still have to go through the entire array if "protease" is at the end. There are lots of ways to improve this algorithm but they are mostly outside the scope of this class; some of these have to do how you perform the search and some have to do with how you store the data in the first place. When we talk about hashes next, you'll see one way to improve the performance of this search.

    Now modify seqNamesArray.pl to use foreach to print out the name of each sequence on a separate line.

  • Doing something to the whole array - Sometimes, you want to do the same thing to everything in the array. Obviously, you can iterate through the array and do what you want to each element of the array, but Perl provides an even simpler way to do this, via a function map. Say in our case, we want to turn all of our names to uppercases.
    @uc_names = map(uc($_), @names);
    print "@uc_names\n";
    

    What the map function does is essentially iterate through the @names array for you, setting each element to the default variable $_ and then applying the function you specified, i.e. uc, on it. The map function returns the results in another array, so it doesn't actually modify the original array you passed in.

    Another useful function is grep, which is very similar to map. Instead of applying a function to each element of an array, it checks whether each element satisfies a specified condition and returns only those elements that satisfied it. For example, if we want only those names that are longer than 10 characters,

    @long_names = grep(length($_) > 60, @names);
    print "Long Names: @long_names\n";
    

2. Hashes

Now that we've played around with arrays, let's move on to hashes. As mentioned before, hashes are pretty similar to arrays, except that instead of using integers to access them, you use keys.

  • Creating/Using hashes - There are a couple of ways you can directly assign values into a hash. Specifically, in lecture, we talked about
    %comp = ('Cyp12a5' => 'Mitochondrion', 
             'MRG15' => 'Nucleus', 
             'Cop' => 'Golgi', 
             'bor' => 'Cytoplasm', 
             'Bx42' => 'Nucleus');
    

    Working with the same FASTA file we were using for seqNamesArray.pl, let's now read the data into a hash. This time, we'll read both the names and the sequences and store them into the hash accordingly, with the names used as keys. Remember that elements in a hash are accessed using a set of curly braces { } instead of the square brackets for arrays. (You may also notice that the structure of this script is quite similar to the solution for the reverse complement exercise -- if you don't understand why the program is structured this way, read the next section for an explanation.)

    #! /usr/bin/perl -w
    
    open (INPUT, $ARGV[0]);
    while ($line = <INPUT>) 
    {
            chomp $line;
            if ($line =~ /^>/) 
            {
                    if (defined($seq)) 
                    {
                            $sequences{$name} = $seq;
                            $seq = "";
                    }
                    $name = substr($line, 1);
            } 
            else 
            {
                    $seq .= $line;
            }
    }
    $sequences{$name} = $seq;
    close (INPUT);
    

    Unfortunately, unlike arrays, if you try to put %sequences into a print statement, it doesn't print out the results in a pretty way (just each key and corresponding value mashed into each other). One way to check whether you put in all the data is to use the built-in functions keys and values for hashes, which return arrays that you can just print out nicely:

    @seq_keys = keys %sequences;
    print "@seq_keys\n";
    

  • Iterating through hashes - Like arrays, sometimes you will need to go through each element in your hash one-by-one. The easiest way to do this is via the keys and values functions, which return an array that you can iterate through using foreach. Say we want to see if there's a sequence for 'protease' in the file again:
    $found = 0;
    foreach $name (keys %sequences) {
            if ($name eq "dna polymerase") {
                    $found = 1;
            }
    }
    print "found protease in file!\n" if $found;
    

  • But why iterate when you can... - So now we come to one of the nice things about hashes. In the above example, we wanted to find out if there is a protease in the sequence file. We did this by reading the file into a hash and then going through each element of the hash and seeing if its key (name) matches "protease". But because hash elements are accessed by their keys, we can actually just try to access the value associated with "protease". If there is no element associated with a particular key, we'll get an undef value returned to us.
    print "there is a protease in file\n" if defined($sequences{"protease"});
    print "there is no kinase in file\n" if !defined($sequences{"kinase"});
    
    Nice, huh? Not only is the code you write shorter, but the time it took to find out whether something exists in the hash is much much shorter than it took to look through an entire array. This doesn't matter so much for this short example but when you have massive amounts of data, you'll appreciate this performance difference.

3. Reverse Complement a FASTA file

For the homework, you were given the task to write a Perl script that would read a FASTA file, specified by the user, and print to the screen the reverse complement of every sequence in the file in FASTA format. Now that you've tried your hand at this, let's go through the meat of the problem using hashes. We'll start simple, and build to the whole solution.

First, let's try to write our program assuming there is only one sequence in the FASTA file first. Reverse complementing a sequence that spans multiple lines does not mean reverse complementing each line individually. So we need to store each line of the sequence and then when we've reached the end, do the reverse complement.

So let's describe what we want to do in English: "We want to open the file specified by the user and read lines from it. If we see a line that begins with >, we don't need to process it and can just print it out (remember, this is because our output has to also be in FASTA format). If a line doesn't begin with >, it's part of the sequence and we want to store it. Once we're done with reading the sequence (in this case, when we're done with the file because we're assuming 1 sequence per file), we will do the reverse complement."

Describing what we want to do in English helps us plan out our program before we start worrying about the specifics of Perl syntax. This is a really good habit to get into. You can even go one step further and plan out the structure of your program by writing "pseudo-code":

    open file specified by user
    for each line in the file {
         if starts with > {
              print line to screen w/o processing
         } else {
              save the line as part of the sequence
         }
    }
    reverse complement the sequence
    

Why would you go through the trouble to write this psuedo-code? Well, first it makes you plan out the program as a whole. And second, it breaks down your program into small tasks - if you start by writing this out in your text editor, all you have to do next is to replace each line of the pseudo-code with the set of Perl commands that will accomplish each small task. (Some programmers even like to save the pseudo-code as comments in a program, because pseudo-code is much easier to read than actual code.)

So now let's go through each line of the pseudo-code:

  • open file specified by user : No problem! You did this during the last lab

      open (FASTAFILE, $ARGV[0])

  • for each line in the file : This can be accomplished using a while loop and with the < > operators. You can save each line to either a named variable or the default variable of $_. We'll use the named variable method.

      while ($line = <FASTAFILE>)

  • if starts with > : We can do this by pattern matching the >

      if ($line =~/^>/)

  • print line w/o processing : That's a simple print statement

      print $line;

  • save the line as part of the sequence : This task is a little trickier. Since we don't know how long the sequence will be, we basically need to keep adding onto the sequence until the file ends. Fortunately, Perl provides an easy way to do this with the string concatenation operator ".". Notice that we use the two-for-one operator ".=" which concatenates and assigns the value in the same call. Also, each line we read using the < > brackets will end in a newline character, which we don't want to keep in the sequence, so we use chomp to strip it out.

      chomp ($line);
      $seq .= $line;

    You may have noticed that we introduced a new variable $seq here. Since Perl by default doesn't require that we declare variables ahead of time, it won't mind. Until we use this variable the first time, i.e. the first time we read a sequence line, $seq will simply be undefined.

  • reverse complement the sequence : OK, at this point in the program, we've stored the full sequence in $seq and we can do the actual reverse complement

      $rev = reverse(uc($seq)); # Convert sequence to uppercase and reverse the order
      $rev =~ tr/ACTG/TGAC/; # Transliterate the sequence: exchange each letter in the set {A,C,T,G} by {T,G,A,C} respectively
      print "$rev\n";

So the full script so far is:

    #! /usr/bin/perl
    
    open (FASTAFILE, $ARGV[0]);
    while ($line = <FASTAFILE>) {
            if ($line =~ /^>/) {
                    print $line;
            } else {
                    chomp($line);
                    $seq .= $line;
            }
    }
    $rev = reverse(uc($seq));
    $rev =~ tr/ACTG/TGAC/;
    print "$rev\n";
    

Now that we can take care of one sequence in the file, we can extend the program to handle multiple sequences. What needs to change? Well, in a file with multiple sequences, we can't just reverse complement when we reach the end of the file; we need to do it each time we're done reading in one whole sequence. How do we know when we're done reading in one whole sequence? We know when we either reach the end of the file or encounter another line that starts with >. So we copy the code that processes and outputs the sequence to the if clause:

    #! /usr/bin/perl
    
    open (FASTAFILE, $ARGV[0]);
    while ($line = <FASTAFILE>) {
            if ($line =~ /^>/) {
                    $rev = reverse(uc($seq));
                    $rev =~ tr/ACTG/TGAC/;
                    print "$rev\n";
                    $seq = "";
                    print $line;
            } else {
                    chomp($line);
                    $seq .= $line;
            }
    }
    $rev = reverse(uc($seq));
    $rev =~ tr/ACTG/TGAC/;
    print "$rev\n";
    

One last problem: With the program above, the first time we encounter a line starting with >, the script will try to reverse complement $seq. But we haven't even read a sequence in yet! So, we should put in a check, to make sure we've actually saved a sequence before we try to process it. So the final program is

    #! /usr/bin/perl
    
    open (FASTAFILE, $ARGV[0]);
    while ($line = <FASTAFILE>) {
            if ($line =~ /^>/) {
                    if (defined($seq)) {
                            $rev = reverse(uc($seq));
                            $rev =~ tr/ACTG/TGAC/;
                            print "$rev\n";
                            $seq = "";
                    }
                    print $line;
            } else {
                    chomp($line);
                    $seq .= $line;
            }
    }
    $rev = reverse(uc($seq));
    $rev =~ tr/ACTG/TGAC/;
    print "$rev\n";
    

The program is working now but there's an important change we can make. We can make it more modular by moving the code that does the reverse complement into a subroutine. This way if the code changes, we only have to make the change in one location. This way the code has greater maintainability. Code maintenance becomes a big issue as the code length and complexity increases.

    #! /usr/bin/perl
    
    open (FASTAFILE, $ARGV[0]);
    while ($line = <FASTAFILE>) {
        if ($line =~ /^>/) {
            if (defined($seq)) {
                $rcSeq = revComp($seq);
                print "$rcSeq\n";
                $seq = "";
            }
            print $line;
        } else {
            chomp($line);
            $seq .= $line;
        }
    }
    $rcSeq = revComp($seq);
    print "$rcSeq\n";
    
    sub revComp()
    {
        my $sequence = shift(@_); # @_ contains the subroutine arguments 
        $rev = reverse(uc($sequence));
        $rev =~ tr/ACTG/TGAC/;
        return $rev;
    }
    

Notice that when you run the program with the -w (warning) flag, you get a warning about calling a function too early to check the prototype. This is perl-speak for saying that you're calling a subroutine before it's been declared in the code. The program still functions, because it actually does correctly declare and use the function, but perl has no way of knowing this until actually executing. As per the perldoc:


%s() called too early to check prototype
(W prototype) You've called a function that has a prototype before the parser 
saw a definition or declaration for it, and Perl could not check that the call 
conforms to the prototype. You need to either add an early prototype declaration 
for the subroutine in question, or move the subroutine definition ahead of the 
call to get proper prototype checking. Alternatively, if you are certain that you're 
calling the function correctly, you may put an ampersand before the name to avoid the warning.

For this class, you can just ignore the warning (or use the ampersand). However, it's better programming practice to declare function prototypes if you're working on a larger project.

Also notice how we used the shift function to get the arguments of the subroutine. shift will remove the first element of the array and returns it. It turns out that it was a really good idea to use the subroutine because then we can do add the option for outputting in RNA format easily.

    #! /usr/bin/perl
    
    open (FASTAFILE, $ARGV[0]);
    $rnaFlag = $ARGV[1]; # get the second command line argument
    while ($line = <FASTAFILE>) {
        if ($line =~ /^>/) {
            if (defined($seq)) {
                $rcSeq = revComp($seq);
                print "$rcSeq\n";
                $seq = "";
            }
            print $line;
        } else {
            chomp($line);
            $seq .= $line;
        }
    }
    $rcSeq = revComp($seq);
    print "$rcSeq\n";
    
    sub revComp()
    {
        my $sequence = shift(@_); # The my scope operator makes this variable private to this subroutine.
        $rev = reverse(uc($sequence));
        if (defined($rnaFlag) && (lc($rnaFlag) eq "rna") )
        {
            $rev =~ tr/ACTG/UGAC/; # A->U for RNA
        }
        else
        {
            $rev =~ tr/ACTG/TGAC/; # A->T for DNA
        }
        return $rev;
    }
    

Now we can add in the part about printing out the length of the sequence in base pairs. Since we have to print out the sequence in two places, we can put this code in a subroutine too, and while we're at it, we'll have it print out in columns of 80. It's up to the programmer how much to modularize the code. It's usually a good idea to put code in its own subroutine if it gets called multiple times, can possibly be reused elsewhere, and especially if it's likely to change in the future. Note that we moved the chomp command to operate on all lines that are read from the file. This ensures that when we add text to the sequence name line, we don't have an embedded line feed character.

    #! /usr/bin/perl
    
    open (FASTAFILE, $ARGV[0]);
    $rnaFlag = $ARGV[1];
    while ($line = <FASTAFILE>) {
        chomp $line;
        if ($line =~ /^>/) {
            if (defined($seq)) {
                $rcSeq = revComp($seq);
                printSeq($seqName, $rcSeq);
                $seq = "";
            }
            $seqName = $line;
        } else {
            $seq .= $line;
        }
    }
    $rcSeq = revComp($seq);
    printSeq($seqName, $rcSeq);
    
    sub revComp()
    {
        my $sequence = shift(@_); 
        $rev = reverse(uc($sequence));
        if (defined($rnaFlag) && (lc($rnaFlag) eq "rna") )
        {
            $rev =~ tr/ACTG/UGAC/;
        }
        else
        {
            $rev =~ tr/ACTG/TGAC/;
        }
        return $rev;
    }
    
    sub printSeq()
    {
        my $name = shift(@_);
        my $sequence = shift(@_);
        my $len = length($sequence);
        $name .= ", $len bp";
        print "$name\n";
        while($sequence =~ /(.{80})|(.+)/g)
        {
            print $1 , "\n" if $1;
            print $2 , "\n" if $2;
        }
    }
    

I'll leave you to work out how the text columnation works

4. Checking for errors

When programming, it's good practice to think of a list of errors that can occur when a user runs your program (aka runtime errors) and how you might handle some of these errors. What kinds of runtime errors can occur? How do you think of possible errors? There's no easy way to answer these questions, and learning how to handle errors is a skill that you learn and get better at through experience.

When you first start thinking about the errors that can happen in your program, you should think about the assumptions you make in your program and what a user can do to make these assumptions be wrong and cause the program to behave unexpectedly (e.g. crashing). One obvious way the user can cause errors in your program is through the input s/he passes to your program. When thinking about error-handling, ask yourself lots of 'What if' questions. In the case of our reverse complement script, you might ask yourself:

  • What if the filename the user specifies is not a real file? (The program assumes that it can open the file using the filename from the command line)
  • What if the file the user told me to reverse complement is not a FASTA file? (The program assumes that it'll be processing a FASTA file)
  • What if the file is in FASTA format, but it doesn't contain DNA sequences? (The program assumes that the sequences will only contain ACGTs but a user could pass in protein sequences or random, non-sensical sequences)
  • What if the sequences are in all lowercase, all uppercase, or mixed cases? (In the program we wrote above, this won't be a problem because we convert everything to uppercase before we reverse complement, but this could be an issue if the program we write assumes all lowercase, for example)
  • What if the user calls the program without any arguments? (The program assumes that $ARGV[0] contains the filename)
  • What if the user calls the program with too many arguments? (Usually this case won't break your program, but you might still want to handle it by telling the user what's the proper way to use your program)

The lists of errors can become very long for complex programs and usually, it's pretty hard to think of every single thing the user can do to break your program. In this class, we won't be making you do exhaustive error-handling. But you should handle very obvious errors (e.g. not passing the right number of arguments into the program, file operation errors, file format errors, etc).

Note that runtime errors like the ones described above are different from bugs in the program. Bugs are caused by code that's not written correctly and when a user passes in a valid input, your program freaks out and crashes or does something weird. In this class, by the time a user uses your program, it should be bug-free. In the real world, software companies also try to make bug-free programs but when a program is so large that it's written by hundreds of people, that gets very difficult to do. In large and complex programs, runtime errors and bugs are usually discovered during testing. Software companies employ whole groups of people, "testers" or "QA engineers", to try to find bugs/errors in programs. Any of you who have ever used a product in alpha or beta test has helped out in the testing process (if you reported the bugs you found back to the software developers). Video game companies, too, employ large groups of testers to play video games all day to test out their latest games. Even with all the time and money spent on testing, bugs still get through - surely all of you have had programs on your home computer crash on you before.

But back to our list of errors...Once you have a list of errors, you need to think about how to handle them gracefully. For runtime errors, a very common error-handling strategy is to print out some sort of useful error message. For example, when a user passes incorrect number of arguments to your program, it might be nice to tell her what's a proper way to use your program. If a user gives you a filename that's not the name of a real file, it would be good to let her know so she can check if there was a typo. Obviously, a useful error message should not be something like "The program crashed," which won't help the user figure out what she did wrong.

An example: For the reverse complement program, if the user gives you a filename that's not a real file, you can just print out an error message "The file you specified does not exist." This can be done very simply using the || (or) operator, because of the way these expressions are evaluated. As Prof Holmes mentioned in lecture, in a statement like (A) || (B), if (A) evaluates to true, then Perl will not bother evaluating the second part of the expression (B) since that expression is already guaranteed to be true. So to handle nonexistent file errors:

    open(FASTAFILE, $ARGV[0]) || die "The file you specified does not exist";

So, here is our final version of the reverse complement script with some basic error handling added. Notice that we use the unless operator to print out an error message unless the user specified the filename.

    #! /usr/bin/perl
    
    die "Must enter filename!" unless defined($ARGV[0]);
    open (FASTAFILE, $ARGV[0]) || die "Can't open file $ARGV[0]!\n";
    $rnaFlag = $ARGV[1];
    while ($line = <FASTAFILE>) {
        chomp $line;
        if ($line =~ /^>/) {
            if (defined($seq)) {
                $rcSeq = revComp($seq);
                printSeq($seqName, $rcSeq);
                $seq = "";
            }
            $seqName = $line;
        } else {
            $seq .= $line;
        }
    }
    $rcSeq = revComp($seq);
    printSeq($seqName, $rcSeq);
    
    sub revComp()
    {
        my $sequence = shift(@_); 
        $rev = reverse(uc($sequence));
        if (defined($rnaFlag) && (lc($rnaFlag) eq "rna") )
        {
            $rev =~ tr/ACTG/UGAC/;
        }
        else
        {
            $rev =~ tr/ACTG/TGAC/;
        }
        return $rev;
    }
    
    sub printSeq()
    {
        my $name = shift(@_);
        my $sequence = shift(@_);
        my $len = length($sequence);
        $name .= ", $len bp";
        print "$name\n";
        while($sequence =~ /(.{80})|(.+)/g)
        {
            print $1 , "\n" if $1;
            print $2 , "\n" if $2;
        }
    }
    

Time to do a Perl exercise (Homework)

Perl Sequence Simulator

Administrative Concerns:

  • You will be graded on correctness (90%) and style (10%). Please see the Style Guidelines for expectations about your style.

  • Grading for correctness will be automated, so if your output format does not match the format described above, you will lose points. It's ok if you have a little extra whitespace at the ends of lines or between different classes of sequences.

  • Turn in your program by uploading the .pl file to bSpace. Contact Oscar if you have trouble with this.
  • As mentioned in lecture, you may work with 1 other student if you so choose. (Remember, you can turn in at most 3 other assignments with the same student). If you do work with another student, put "I worked with xxxx" in a comment at the top of your .pl file. Each person should turn in code, even if it's the same.

  • Information about LATE assignments: You lose 20% of the points of the assignment for every day it's late. Contact Josh or Professor Holmes at least 48 hours before the due date if you have extenuating circumstances, if reasonably possible.

-- AngiChau - 25 Sep 2005


  • Add discussion of split and join to array section.

Actions: Edit | Attach | New | Ref-By | Printable view | Raw view | Normal view | See diffs | Help | More...