NEWS

TODOs

warning/error for convert_labels when a chunk has no noPeaks labels.

error in convert_labels for noPeaks label with sample groups up?

test create_track_hub.

negative test cases for denormalize* functions.

use popen in C to pipe bigWigToBedGraph stdout to our coverage
counting and denormalizing functions, which should be much faster than
using an intermediate bedGraph
file. http://www.cs.uleth.ca/~holzmann/C/system/shell_commands.html

use compiled code instead of command line programs for bigWig/bigBed.

2020.2.13 PR#20

Import new PeakSegJoint which uses double instead of int to avoid
overflow errors.

2019.12.12 PR#19

import data.table >= 1.12.4 for list column support.

new problem.models function which reads CSV/RDS models, then saves all
models to RDS, then deletes all CSV files, to help avoid going over
quota on number of files in big data sets.

jobs_create gains target.minutes argument.

test readBigWig for no data to read.

test case for denormalizeBigWig with spaces in path.

problem.PeakSegFPOP writes to temporary disk.

if(verbose)cat(...) for CRAN.

2019.9.27 PR#18

bedGraphToBigWig: bugfix for unsorted bedGraph files.

2019.9.18 PR#17

remove pipeline, use jobs_create_run instead.

remove create_problems_all.

use nc instead of namedCapture.

2019.8.9 PR#16

appveyor CRAN checks -- passing on windows for CRAN submission.

move tests which need ucsc to test-pipeline-* so they are only run on
travis, where ucsc command line programs are available.

2019.8.7 PR#14

new version of PeakSegDisk::PeakSegFPOP_dir.

new version of namedCapture with variable_args_list.

2019.5.24 PR#13

use future.apply::future_lapply for parallelization.

2018.11.16 PR#12

Test pipeline-noinput jobs_ funs without running create_problems_all
before.

verbose flag added to several functions.

never save problem.bed files any more.

jobs_submit_batchtools works with job subsets, test step >= 5.

2018.11.15 PR#11

require data.table >= 1.11.6 to fread empty file as null instead of error.

jobs_* functions to support SLURM via batchtools.

Testing pipeline-noinput on Travis with SLURM.

2018.10.3

Remove PeakSegFPOP C++ solver, import PeakSegDisk instead.

2018.09.17

total.cost -> total.loss, less confusing nomenclature since we usually
refer to the Poisson "loss" function, and the total "cost" function
that we minimize includes the penalty and model complexity term.

2018.09.13

build script only uses CRAN tests.

2018.09.10

status column (feasible/infeasible) changed to n_equality_constraints
(integer >= 0, zero means no equality constraints/feasible, positive
means some equality constraints/infeasible).

when we run out of disk space, error in R -- tested via tmpfs
(testthat/test-out-of-disk-space.R).

Bugfix for large penalties, test at
testthat/test-CRAN-PeakSegFPOP_disk.R.

New problem.sequentialSearch function, test at
testthat/test-problem-sequentialSearch.R.

2018.01.25

New minutes.limit argument for problem.target, which can be used for
early stopping, useful for time limited jobs like testing on travis.

Several modifications to problem.target, see below for examples. The
main idea remains the same: refine the limits of the largest min error
interval, until there is no more searching to do at the edges of that
interval. However now we search a few more penalties in parallel:

1. we search the borders and one penalty in the middle of each min
error interval (but we only use the largest interval, and we don't
care about the middle penalty, for the stopping criterion).

2. We don't stop if there are any places to explore where we see fp
increasing and fn decreasing -- this typically indicates a possibility
for lowering the error, see examples below.

In the case below, we miss the min-error model, but we could get it by
exploring the middle of the target interval. (in parallel before
termination)

     penalty peaks     status fp fn errors
1:       Inf     0   feasible  0  2      2
2: 3169.9938     1   feasible  2  0      2
3: 1581.6023     2   feasible  1  0      1
                 3             0  0      0 
4:  787.4066     4   feasible  1  0      1
5:  527.7266     6   feasible  2  0      2
6:  211.5639    14   feasible  2  0      2
7:   63.6843    26 infeasible  3  0      3
8:    0.0000   238 infeasible  3  0      3
[1] 559.9769
Next = 559.976929694436 mc.cores= 6 

In the case below, we miss the min-error model with 2 peaks, 1 error
because the target interval is the infinite interval that selects 0
peaks -- maybe could get the min if we explored the limits of every
min error interval (not just the biggest one).

        penalty peaks     status fp fn errors
 1:         Inf     0   feasible  0  2      2
 2: 10766.15837     1   feasible  1  2      3
                    2             1  0      1
 3:  1318.09401     3 infeasible  1  1      2
 4:   757.42870     4 infeasible  1  1      2
 5:   716.69571     5 infeasible  2  0      2
 6:   703.89812     7 infeasible  3  0      3
 7:   595.41734     8 infeasible  3  0      3
 8:   340.70804    15 infeasible  4  0      4
 9:    69.72866    32 infeasible  4  0      4
10:     0.00000   638 infeasible  4  0      4


In the case below we need to explore between 6 and 16 peaks, because
both fn and fp are decreasing at that point.
      penalty peaks     status fp fn errors
1:        Inf     0   feasible  0  7      7
2: 4637.52407     1   feasible  0  5      5
3: 1591.01410     6   feasible  1  2      3
4:  859.85170    16   feasible  2  1      3
5:  510.66132    31   feasible  3  1      4
6:  317.52513    48   feasible  3  1      4
7:  194.70909    74 infeasible  5  0      5
8:   64.14391   137 infeasible  8  0      8
9:    0.00000  1205 infeasible  9  0      9

2017.01.22

Fix optimization bug in
PeakSegPipeline::problem.PeakSegFPOP("~/PeakSegPipeline-test/input/samples/kidney/MS002201/problems/chr10:18024675-38818835",
"18299537.4379804") by adding a special case when the functions are
equal on the right, with no crossing points before that, by using the
fact that the Log coefficient can be used to test which function is
greater at log_mean=-Inf.

2017.01.19

---- FPOP target does not find zero error model -- need to explore fp,
we can do that in parallel without losing any time.
This is H3K36me3_AM_immune/McGill0004
      penalty peaks     status fp fn errors
1:        Inf     0   feasible  0  3      3
2: 24195.6886     4   feasible  1  1      2
3: 13706.5254     5   feasible  1  0      1
4: 10354.2183    10   feasible  1  0      1
5:  6219.2421    17   feasible  1  0      1
6:  3580.6824    44   feasible  1  0      1
7:   322.9294   399 infeasible  1  0      1
8:     0.0000  4683 infeasible  1  0      1
--- The corresponding PDPA model has 0 errors for 1 and 2 peaks,
which where not explored by the current FPOP target algo.
     sample.id peaks       loss errors
 1: McGill0101     0  -16289.63      3
 2: McGill0101     1 -700673.64      0
 3: McGill0101     2 -751953.96      0
 4: McGill0101     3 -792692.07      1
 5: McGill0101     4 -831639.35      2
 6: McGill0101     5 -853440.37      1
 7: McGill0101     6 -866254.31      1
 8: McGill0101     7 -880097.83      1
 9: McGill0101     8 -892845.31      1
10: McGill0101     9 -903400.20      1

2017.12.30

remove animint2 dep

use new PeakSegJoint with Faster algo.

2017.12.13

in problem.coverage:
- caching now also checks that start of
  first line of coverage is equal to problem start.
- we compress data after inserting zeros.

in problem.train, remove targets with two infinite limits before
training.

2017.12.03

bugfix for numerical issue in min-more C++ code: when looking for a
minimum, and the function is decreasing on the interval, there is a
new special case. Before we were always using the right of the
interval as a new minimum (and starting to add a constant), now we
test the cost at the left and right, and if it is numerically
constant, then we just add the interval and continue looking for a
minimum. An analogous special case was already implemented for
min-less.

2017.11.30

denormalize rounds to nearest integer, test case for 0.1, 0.2, etc.

2017.11.22

denormalize* functions.

2017.11.21

downloadBigWigs function.

2017.11.08

problem.predict and problem.predict.allSamples now return a data.table
of peaks with sample.id and sample.group columns -- this can now be
passed to create_problems_joint to avoid hitting the file system
again.

New problem.pred.cluster.targets which, for one problem, does separate
peak prediction for all samples, then clusters peaks across all
samples to create joint problems, then computes joint target intervals
for labeled joint problems.

problem.joint.targets now saves problem/problemID/jointTargets.rds
which problem.joint.train now looks for. This is (1) faster than
looking for all problem/problemID/jointProblems/jointID/target.tsv
files, and (2) it gives a file output to problem.joint.targets which
did not have one before.

2017.10.14

In problem.train we got
Error in plot_clone(p) : attempt to apply non-function
with ggplot2_2.2.1 installed, and
Imports: ggplot2Animint, animint2
I think this has something to do with conflicting S3 methods
for the gg class.
For now I fixed it by moving these Imports to Suggests,
and using requireNamespace and animint2:: and ggplot2Animint::

2017.09.01

use animint2 instead of animint.

First green build with three jobs on travis: CRAN, input, noinput.

2017.08.11

add bigwig.R and mclapply.R (removed from PeakSegJoint).

2017.08.09

compiling, installing, passing demo test.

pipeline stops in target interval computation if there is non-integer
data in bigWig files.

import PeakSegJoint >= 2017.08.08 which returns mean.mat, which we use
to derive a log peak height relative to background, in
problem.joint.predict.

2017.06.19

copied code from PeakSegFPOP repo, modified for R interface.