Python Basics HW
Here your task is to reverse complement a FASTA file. Part of the solution has already been described in the Python lecture notes. Here are the requirements for your script:
- Open a file whose filename is specified by the user as a command-line argument. That is, if the name of your Python script is programname and the name of the file is filename, then the script should be run by typing the following at the Unix command line:
$ programname filename
- Do some basic error handling: verify that the user entered a filename and that the file can be opened. If not, print informative error messages and exit.
- Read the contents of the file, assuming it is a FASTA file of DNA sequences, and as you do so, print the name and reverse complement of every sequence to standard output, in FASTA format. This means that you must output no more than 80 characters per line! (You only have to worry about this for the actual sequence, not the description line.)
- Enable a command-line argument and the program logic to output the reverse complement as an RNA sequence. That is, if you type the following at the command line:
$ programname filename rna
the program should output the reverse complement as RNA instead of DNA (U's instead of T's).
- Add the sequence length L in base pairs to the end of the sequence label line (the line that starts with ">"), using the format ", L bp". For example, if the line was originally "> GFP, mut3" and the DNA sequence is 450 bases long, it should now read "> GFP, mut3, 450 bp".
- You will be graded on correctness (90%) and style (10%). Please see the StyleGuidelines for expectations about your style. You are not expected to use functions for this exercise, though you certainly may, so the "no redundancy" requirement is relaxed.
- Turn in your program by uploading it to your individual wiki page.
- As mentioned in lecture, you may work with 1 other student if you so choose. (Remember, you can turn in at most 3 other assignments with the same student). If you do work with another student, put "I worked with xxxx" in a comment at the top of your .py file. Each person should turn in code, even if it's the same.
- Information about LATE assignments: You lose 20% of the points for the assignment for every day it is late. If you have extenuating circumstances, contact the GSI or Professor Holmes at least 48 hours before the due date, if reasonably possible.
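As a rough illustration of the input-handling and output-formatting requirements above, here is a minimal sketch, assuming Python 3. The helper names (parse_args, open_or_exit, wrap, label_with_length) are hypothetical and are not taken from the lecture notes. It shows one possible way to check the arguments, report an unreadable file, break a sequence into lines of at most 80 characters, and append ", L bp" to a label line. It is not a complete solution.

```python
import sys


def parse_args(argv):
    """Return (filename, as_rna) from a list shaped like sys.argv.

    Raises ValueError with a usage message if no filename was given.
    """
    if len(argv) < 2:
        raise ValueError("usage: programname filename [rna]")
    filename = argv[1]
    # Assumption: the optional second argument is matched case-insensitively.
    as_rna = len(argv) > 2 and argv[2].lower() == "rna"
    return filename, as_rna


def open_or_exit(filename):
    """Open filename for reading, or print an error message and exit."""
    try:
        return open(filename)
    except OSError as err:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit("Error: cannot open '{}': {}".format(filename, err))


def wrap(sequence, width=80):
    """Split a sequence into a list of lines of at most width characters."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


def label_with_length(header, length):
    """Append ', L bp' to a FASTA label line."""
    return "{}, {} bp".format(header.rstrip(), length)
```

In a script, parse_args(sys.argv) would typically be called inside a try/except block that prints the usage message and exits when ValueError is raised.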
Here are some hints/things to think about:
- A general tip for writing programs: always try to write out in English what you want the program to do. Before you start writing a complicated program, write out pseudocode first, that is, a quasi-code representation of what your code will need to look like.
- Create your own test files by visiting the NCBI website to find nucleotide sequences. Search for some protein you know of (e.g., hemoglobin) to get a long listing of results, then select a couple of sequences from the list (avoid the 'whole genome' sequences and stick to the 'mRNA' sequences so you don't end up trying to process ridiculously huge files). Then, using the dropdown boxes near the top, you can show the selected sequences in FASTA format and save them to a file.
- Keep in mind that a valid FASTA file can contain 1 or more sequences. Test your script first with one sequence and then add more to your test file.
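- If a sequence is longer than a line, you cannot just do a line-by-line reverse complement: reversing applies to the whole sequence, so the bases on its last line must come first in the output.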
- Don't get frustrated if your program doesn't work the way you want in the beginning. Even with many years of programming, a program rarely works on the first try!
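To make the hints about multi-sequence files and multi-line sequences concrete, here is a minimal sketch in Python. The names read_fasta, reverse_complement, and COMPLEMENT are hypothetical, and mapping unrecognized characters to N is an assumption rather than a requirement of the assignment. The idea is to join all lines of a record into one string before reversing it.

```python
# Complement table for DNA bases; N (unknown base) maps to itself.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    All sequence lines belonging to one record are joined into one string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(sequence, as_rna=False):
    """Return the reverse complement of a DNA sequence, optionally as RNA."""
    # Assumption: any character not in COMPLEMENT is written as N.
    result = "".join(COMPLEMENT.get(base, "N")
                     for base in reversed(sequence.upper()))
    return result.replace("T", "U") if as_rna else result
```

For example, a record whose sequence is split across the lines "ACG" and "TT" is read as "ACGTT", and reverse_complement("ACGTT") returns "AACGT". Reversing each line separately would give "CGT" followed by "AA", which is a different sequence.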
Copyright © 2008-2013 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.