Interval basics

Task 1: Implement functions to count overlaps and calculate Jaccard index

Implement the 3 functions in intervals.R. Descriptions of the functions can be found there. Additional instructions for each function follow:

countOverlapsSimple: your function may use a sequential search to identify overlaps (no need to implement binary search or lower complexity algorithm). You should write this function using base R only, without relying on GenomicRanges or other Bioconductor packages. The inputs should be base R objects (a list for the query, and a data.frame for the database, see examples in the function documentation). (20 pts)
calculateJaccardScore: for this function, you may make use of the Bioconductor GenomicRanges package, which includes fast implementations of interval overlap and union. In particular, you may be interested in the intersect and union methods that operate on GenomicRanges::GRanges objects. The inputs to this function should both be GenomicRanges::GRanges objects. (20 pts)
pairwiseJaccard: Input should be a base R list of GRanges objects, and you should return a square matrix with dimension equal to the number of elements in the input list. (10 pts)

Task 2: Calculate and interpret similarity results

There are 4 narrowPeak files stored under /data. These are ChIP-seq datasets, for two cell types (HepG2 and HeLaS3) and two transcription factors (Jun and CTCF). I filtered the data to keep things smaller and faster for this assignment.

Use your functions to calculate the pairwise Jaccard similarity for this set of 4 experiments, then answer the following questions:

Which two interval sets are the most similar? (10 pts)
Which two interval sets are the most different? (10 pts)
Based on these results, which factor, CTCF or Jun, would you predict varies more across cell types? (10 pts)
Based on these results, do the genomic locations found by ChIP-seq experiments depend more on the cell-type, or on the transcription factor being assayed? (20 pts)

Testing:

You can test your work by running within the interval_basics directory:

Rscript tests/testDriver.R

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
src		src
tests		tests
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Interval basics

Task 1: Implement functions to count overlaps and calculate Jaccard index

Task 2: Calculate and interpret similarity results

Testing:

About

Releases

Packages

Languages

uvacobi/interval_basics

Folders and files

Latest commit

History

Repository files navigation

Interval basics

Task 1: Implement functions to count overlaps and calculate Jaccard index

Task 2: Calculate and interpret similarity results

Testing:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages