Python script to convert CAGE BED files to BigBed format

FAANG/bedToBigBed

bedToBigBed

This is a Python script to convert BED files to BigBed format, used by the UCSC Genome Browser.

At the moment, this script works only with the CAGE track files, because it relies on their specific format.

To run the script:

python main.py <dir_path>

where dir_path is the path of the directory containing the list of BED files.

The original BED files submitted to us (novel_CAGE and annot_CAGE) do not conform to the UCSC rules for the BED format, so several changes had to be made to the files before they could be converted to BigBed.

After the conversion script has run, two sub-directories are created inside dir_path: updated and bigbed.
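
The directory handling described above can be sketched as follows. This is a minimal illustration, not the code in main.py: the function names `prepare_output_dirs` and `list_bed_files` are hypothetical, and the sketch assumes the input files carry a `.bed` extension.

```python
from pathlib import Path


def prepare_output_dirs(dir_path):
    """Create the 'updated' and 'bigbed' sub-directories inside dir_path."""
    base = Path(dir_path)
    updated = base / "updated"
    bigbed = base / "bigbed"
    for sub in (updated, bigbed):
        # exist_ok=True makes repeated runs on the same directory safe.
        sub.mkdir(exist_ok=True)
    return updated, bigbed


def list_bed_files(dir_path):
    """Return the BED files sitting directly in dir_path, sorted by name."""
    # Assumption: input files end in .bed; sub-directories are ignored.
    return sorted(
        p for p in Path(dir_path).iterdir()
        if p.is_file() and p.suffix == ".bed"
    )
```

Because only files directly inside dir_path are listed, the files written to updated are not picked up again on a second run.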

1) updated directory

This folder contains the modified BED files, which conform to the UCSC rules for the BED format.

The changes made are:

  • adding the name field. "." is used because no name was provided, so the field is treated as empty.
  • moving the width column to the end, because it is a non-standard, user-defined column and must come after all standard BED fields
  • swapping the order of score and strand to match the BED field ordering
  • removing headers
  • changing the chromEnd value from 16617 to 16616, because the bedToBigBed application rejected the original value with an error. The chromEnd value provided by our submitter is 16617, while the size of NC_001941.1 is 16616. See the chrom.sizes file CF_002742125.1_Oar_rambouillet_v1.0.chrom.sizes
  • constraining score to the 0 to 1000 range required by BED. The score was converted to an integer, and where the value is greater than 1000 only the first 3 digits are kept as the score, on the assumption that the decimal point was misplaced by our submitter.
  • using an autoSql file to describe the fields, including the non-standard ones, so that the conversion to bigBed succeeds
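
The row-level changes listed above might look roughly like the sketch below. It rests on several assumptions that the README does not confirm: the input columns are taken to be chrom, start, end, width, strand, score (tab-separated), a header is recognised by a non-numeric start column, and the one-off chromEnd fix is generalised to clamping against a chrom.sizes lookup. All function names are hypothetical.

```python
def normalise_score(raw):
    """Convert a raw score to an integer in the range BED expects."""
    score = int(float(raw))
    if score > 1000:
        # Keep only the first three digits, assuming a misplaced decimal point.
        score = int(str(score)[:3])
    return score


def convert_row(fields, chrom_sizes):
    """Reorder one input row into BED6 plus a trailing width column."""
    # Assumed input order: chrom, start, end, width, strand, score.
    chrom, start, end, width, strand, score = fields
    # Clamp chromEnd to the chromosome size when the chromosome is known.
    end = min(int(end), chrom_sizes.get(chrom, int(end)))
    # "." fills the name field, which the submitted files do not provide.
    return [chrom, start, str(end), ".", str(normalise_score(score)), strand, width]


def convert_lines(lines, chrom_sizes):
    """Drop header and blank lines and convert the remaining rows."""
    converted = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[1].isdigit():
            continue  # header or blank line
        converted.append("\t".join(convert_row(fields, chrom_sizes)))
    return converted
```

For example, with `{"NC_001941.1": 16616}` as the size lookup, a row ending at 16617 with a score of 12345.0 would come out with chromEnd 16616 and score 123.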

2) bigbed directory

This folder contains the successfully generated bigBed files, ready to be uploaded to UCSC.
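
As an illustration of the final step, the sketch below builds a bedToBigBed command for one updated BED file. The `-type` and `-as` options belong to the UCSC bedToBigBed tool; everything else is an assumption made for this example: the autoSql table name and field descriptions, the `uint` type chosen for width, the file names, and the helper names `build_command` and `convert`. It also assumes the bedToBigBed binary is on the PATH and that the BED file is already sorted by chromosome and start position, which the tool requires.

```python
import subprocess
from pathlib import Path

# Hypothetical autoSql description: six standard BED fields plus width.
AUTOSQL = '''table cageBed
"CAGE regions in BED6 layout plus one extra width column"
    (
    string  chrom;       "Chromosome or scaffold name"
    uint    chromStart;  "Start position"
    uint    chromEnd;    "End position"
    string  name;        "Item name, a dot when empty"
    uint    score;       "Score between 0 and 1000"
    char[1] strand;      "+ or -"
    uint    width;       "User-defined width column"
    )
'''


def build_command(bed, chrom_sizes, as_file, out_dir):
    """Return the bedToBigBed argument list for one BED file."""
    out = Path(out_dir) / (Path(bed).stem + ".bb")
    return [
        "bedToBigBed",
        "-type=bed6+1",       # six standard fields plus one extra field
        f"-as={as_file}",     # autoSql file describing all seven fields
        str(bed),
        str(chrom_sizes),
        str(out),
    ]


def convert(bed, chrom_sizes, as_file, out_dir):
    """Write the autoSql file, then run bedToBigBed and fail on error."""
    Path(as_file).write_text(AUTOSQL)
    subprocess.run(build_command(bed, chrom_sizes, as_file, out_dir), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or log the exact invocation without having the UCSC tool installed.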
