Skip to content

Latest commit

 

History

History
24 lines (15 loc) · 1.11 KB

README.md

File metadata and controls

24 lines (15 loc) · 1.11 KB

This is a collection of small python functions that I have used to automatically

  • download Genbank files based on a list with Genbank IDs
  • subsequently extract information from the Genbank files based on the fields such as "source location"

The functions are derived from a number of posts at http://stackexchange.com/ that I have been looking through to solve this task. I have tested the functionality in the Anaconda environment on a Windows PC (http://continuum.io/downloads)

(Python 2.7.6 |Anaconda 1.9.2 (64-bit)| (default, Nov 11 2013, 10:49:15) [MSC v.1500 64 bit (AMD64)] on win32)

I have attached:

  • A demo for running the workflow "script.py"
  • A module with functions for downloading and parsing genbank files "Genbank_module.py"
  • an example file with genbank IDs "ACCESSION_IDs.txt"

Running the "script.py" with these files in the working directory should result in:

  • an output file "source_list.txt"
  • a number of genbank files

A reminder

"In order not to overload the E-utility servers, NCBI recommends that users post no more than three URL requests per second" - http://www.ncbi.nlm.nih.gov/books/NBK25497/