-- ShicongXie - 08 Dec 2009
Semi-Automated Primer -> GFF for specific regions of interest
Sequencing Primers for specific features
One way to detect viruses (whether to pick out strain variation or for clinical tests) is to sequence the genome or, more likely, the specific genes that encode key viral gene products. This, of course, requires sequencing primers. Primer3 is a popular primer-picking tool, first pointed out as a resource by Ice (Get Possible Primer). However, there was no easy way to give Primer3Plus the sequence of a specific gene together with extra "padding" sequence around it as a buffer where primers can bind. So I wrote Perl scripts to:
- pull out, from a GFF file, the FASTA sequences of features with a given type and source (using BioPerl's Bio::DB::GenBank);
- then generate a "padded" version of those sequences, given a "window". This step also adds the {} (included region) and [] (target region) tags to each sequence automatically.
Input into Primer3Plus
I tried running a local copy of Primer3, but its configuration options were confusing and it was VERY slow. Instead, use Primer3Plus, a better, web-based version.
Paste the FASTA output of parse4primer3plus.pl directly into Primer3Plus and select "Sequencing" at the top left. You should get a page of optimal sequencing primers. Avoid the "upload" feature, since it strips out the pre-generated [] and {} tags.
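Because losing the tags silently changes what Primer3Plus designs against, it can be worth sanity-checking a tagged sequence before pasting it. The Python sketch below (the name parse_tags is hypothetical, not part of the scripts on this page) skips FASTA header lines, strips the {} and [] markers, and reports both regions as 0-based, half-open spans on the untagged sequence. It assumes exactly one included region and one target per record.

```python
def parse_tags(tagged):
    """Strip Primer3Plus-style {} and [] markers from one FASTA record.

    Returns (plain_sequence, included, target), where included and target
    are 0-based, half-open (start, end) spans on plain_sequence.
    Raises ValueError if a marker is missing, repeated or badly nested.
    """
    plain = []
    marks = {}
    for line in tagged.splitlines():
        if line.startswith(">"):
            continue  # FASTA header line, not sequence
        for ch in line:
            if ch in "{}[]":
                if ch in marks:
                    raise ValueError(f"marker {ch!r} appears more than once")
                marks[ch] = len(plain)  # position in the untagged sequence
            elif not ch.isspace():
                plain.append(ch)
    for ch in "{}[]":
        if ch not in marks:
            raise ValueError(f"marker {ch!r} is missing")
    # The target must sit inside the included region: { ... [ ... ] ... }
    if not (marks["{"] <= marks["["] <= marks["]"] <= marks["}"]):
        raise ValueError("expected the [] target nested inside the {} region")
    return "".join(plain), (marks["{"], marks["}"]), (marks["["], marks["]"])
```

For example, parse_tags(">gene1\nAC{GT[TTA]\nCC}G") returns the plain sequence "ACGTTTACCG" with the included region (2, 9) and the target (4, 7).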
Parsing the output of primer3
The output is easy to read in Primer3Plus, but there is no direct way to export the whole results page to a text file. Rather than entering the GFF lines by hand, save your primers to the "Primer Manager": select all the primers that interest you (if you're designing sequencing primers, this should be all of them) and click "Send them to Primer Manager". Then go to the Primer Manager and download all your selected primers as a text file (extension: .fa).
Then convert that file to GFF with primer2GFF.pl, documented below.
Script Documentation
seqfromGFF.pl
This script takes the features in a GFF file and uses BioPerl to pull out their sequences, selected by --type and --source (e.g. --type gene --source genbank). Matching on both fields is case-insensitive.
USAGE: seqfromGFF.pl input.gff --output output.fasta --type A --source B
Default output: output.fasta
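seqfromGFF.pl itself is written in Perl with BioPerl; the Python sketch below only illustrates the selection it describes. The function names are hypothetical, and the sketch assumes a nine-column, tab-separated GFF3 file and a genome already loaded as a dict mapping seqid to sequence. It slices features from the forward strand and does not reverse-complement minus-strand genes, which may differ from what the script does.

```python
def features_from_gff(gff_lines, feature_type, source):
    """Yield (seqid, start, end, name) for GFF rows whose type (column 3)
    and source (column 2) match, compared case-insensitively."""
    want_type, want_source = feature_type.lower(), source.lower()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature rows
        if not line or line.startswith("#"):
            continue  # blank line, directive or comment
        cols = line.split("\t")
        if len(cols) < 9:
            continue  # not a complete nine-column feature row
        if cols[2].lower() != want_type or cols[1].lower() != want_source:
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        # Column 9 holds key=value pairs separated by ';' (no unescaping here)
        attrs = dict(
            kv.strip().split("=", 1) for kv in cols[8].split(";") if "=" in kv
        )
        name = attrs.get("Name") or attrs.get("ID") or f"{seqid}:{start}-{end}"
        yield seqid, start, end, name


def features_to_fasta(features, genome):
    """Return FASTA text for each feature, sliced from the forward strand.
    GFF coordinates are 1-based and inclusive, hence start - 1."""
    return "".join(
        f">{name}\n{genome[seqid][start - 1:end]}\n"
        for seqid, start, end, name in features
    )
```

Usage: with open("input.gff") as fh: text = features_to_fasta(features_from_gff(fh, "gene", "genbank"), genome). A gene row at positions 3 to 6 on a sequence "AACCGGTT" comes out as "CCGG".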
parse4primer3plus.pl
This script takes the output of seqfromGFF.pl (input.fasta) and "pads" each sequence with extra flanking sequence from the original genome (source.fasta). The amount of padding is set by --window. It also adds the [] and {} demarcations that Primer3Plus uses to specify regions.
USAGE: parse4primer3plus.pl --input input.fasta --source source.fasta --output name --window 20
Default output: output.fasta
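The padding step can be sketched in Python as follows. The name pad_and_tag is hypothetical, and the sketch assumes that --window is the number of bases added on each side, that the feature occurs exactly on the forward strand of the source genome (the first match wins), and that the {} included region spans the whole padded sequence while the [] target covers the original feature.

```python
def pad_and_tag(gene_seq, genome_seq, window):
    """Locate gene_seq in genome_seq and return it padded by `window` bases
    on each side, with {} around the padded span and [] around the gene."""
    gene, genome = gene_seq.upper(), genome_seq.upper()
    start = genome.find(gene)  # first exact, forward-strand match only
    if start == -1:
        raise ValueError("feature not found on the forward strand of the source")
    end = start + len(gene)                  # half-open: genome[start:end] == gene
    left = max(0, start - window)            # clip padding at the genome ends
    right = min(len(genome), end + window)
    return "{" + genome[left:start] + "[" + gene + "]" + genome[end:right] + "}"
```

For example, pad_and_tag("GGG", "AAAATTGGGCCAAAA", 2) returns "{TT[GGG]CC}". Near the ends of the genome the padding is simply shorter, so a feature at the very start gets no left flank.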
primer2GFF.pl
This script takes a Primer3Plus Primer Manager .fa file and converts it to a JBrowse-friendly GFF file.
USAGE: primer2GFF.pl takes two inputs: the input file path, and the name you want in the first field of your GFF.
OUTPUT PATH: primers.gff
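The conversion can be pictured with the Python sketch below. It is not the author's script: the function names, the "primer3plus" source value and the "primer" feature type are assumptions. It also assumes the .fa file holds one FASTA record per primer and that every primer matches the reference exactly, either as written (plus strand) or as its reverse complement (minus strand, typical for reverse primers).

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def read_fasta(text):
    """Parse FASTA text into an ordered {header: sequence} dict."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            records[name] += line  # sequences may span several lines
    return records


def primers_to_gff(fasta_text, genome_seq, seqid, source="primer3plus"):
    """Return (gff_text, missing): one GFF3 row per located primer, plus
    the names of primers with no exact match on either strand."""
    genome = genome_seq.upper()
    lines, missing = ["##gff-version 3"], []
    for name, primer in read_fasta(fasta_text).items():
        primer = primer.upper()
        hit, strand = genome.find(primer), "+"
        if hit == -1:
            # A reverse primer matches the reference as its reverse complement
            hit, strand = genome.find(revcomp(primer)), "-"
        if hit == -1:
            missing.append(name)
            continue
        start, end = hit + 1, hit + len(primer)  # GFF is 1-based, inclusive
        lines.append("\t".join([seqid, source, "primer", str(start), str(end),
                                ".", strand, ".", f"ID={name};Name={name}"]))
    return "\n".join(lines) + "\n", missing
```

Writing the first return value to primers.gff mirrors the script's output path. Any names returned in missing (for example, primers that bind with mismatches) would need to be placed by hand.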