NEWS
TODOs
warning/error for convert_labels when a chunk has no noPeaks labels.
error in convert_labels for noPeaks label with sample groups up?
test create_track_hub.
negative test cases for denormalize* functions.
use popen in C to pipe bigWigToBedGraph stdout to our coverage
counting and denormalizing functions, which should be much faster than
using an intermediate bedGraph
file. http://www.cs.uleth.ca/~holzmann/C/system/shell_commands.html
use compiled code instead of command line programs for bigWig/bigBed.
2020.2.13 PR#20
Import new PeakSegJoint which uses double instead of int to avoid
overflow errors.
2019.12.12 PR#19
import data.table >= 1.12.4 for list column support.
new problem.models function which reads CSV/RDS models, then saves all
models to RDS, then deletes all CSV files, to help avoid going over
quota on number of files in big data sets.
jobs_create gains target.minutes argument.
test readBigWig for no data to read.
test case for denormalizeBigWig with spaces in path.
problem.PeakSegFPOP writes to temporary disk.
if(verbose)cat(...) for CRAN.
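The CSV-to-RDS consolidation described for problem.models above can be sketched in base R as follows. This is only an illustration under stated assumptions: consolidate_models, the models.rds file name, and the one-CSV-per-model layout are hypothetical, and the real problem.models function may organize its files differently.

```r
# Hypothetical helper, not the package's problem.models: merge per-model CSV
# files in `model.dir` into one RDS file, then delete the CSV files.
consolidate_models <- function(model.dir, rds.name = "models.rds") {
  rds.path <- file.path(model.dir, rds.name)
  # Start from previously consolidated models, if any.
  all.models <- if (file.exists(rds.path)) readRDS(rds.path) else list()
  csv.paths <- list.files(model.dir, pattern = "\\.csv$", full.names = TRUE)
  for (csv.path in csv.paths) {
    all.models[[basename(csv.path)]] <- read.csv(csv.path)
  }
  saveRDS(all.models, rds.path)
  # Remove the CSV files only after the RDS file has been written.
  unlink(csv.paths)
  all.models
}

# Usage on a throwaway directory with two made-up model files.
demo.dir <- file.path(tempdir(), "models-demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(penalty = 10, peaks = 2),
          file.path(demo.dir, "model-10.csv"), row.names = FALSE)
write.csv(data.frame(penalty = 100, peaks = 1),
          file.path(demo.dir, "model-100.csv"), row.names = FALSE)
models <- consolidate_models(demo.dir)
```

Keeping one RDS file per problem instead of one CSV per penalty is what reduces the file count on quota-limited file systems.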
2019.9.27 PR#18
bedGraphToBigWig: bugfix for unsorted bedGraph files.
2019.9.18 PR#17
remove pipeline, use jobs_create_run instead.
remove create_problems_all.
use nc instead of namedCapture.
2019.8.9 PR#16
appveyor CRAN checks -- passing on windows for CRAN submission.
move tests which need ucsc to test-pipeline-* so they are only run on
travis, where ucsc command line programs are available.
2019.8.7 PR#14
new version of PeakSegDisk::PeakSegFPOP_dir.
new version of namedCapture with variable_args_list.
2019.5.24 PR#13
use future.apply::future_lapply for parallelization.
2018.11.16 PR#12
Test pipeline-noinput jobs_ funs without running create_problems_all
before.
verbose flag added to several functions.
never save problem.bed files any more.
jobs_submit_batchtools works with job subsets, test step >= 5.
2018.11.15 PR#11
require data.table >= 1.11.6 to fread empty file as null instead of error.
jobs_* functions to support SLURM via batchtools.
Testing pipeline-noinput on Travis with SLURM.
2018.10.3
Remove PeakSegFPOP C++ solver, import PeakSegDisk instead.
2018.09.17
total.cost -> total.loss, less confusing nomenclature since we usually
refer to the Poisson "loss" function, and the total "cost" function
that we minimize includes the penalty and model complexity term.
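To make the loss/cost distinction concrete, here is a minimal sketch. The function poisson_loss and the toy data are hypothetical, and the penalty term is assumed to be penalty times the number of peaks; the solver's exact complexity term may differ.

```r
# Poisson loss of segment means `seg.mean` for counts `count`
# (terms that do not depend on the mean are dropped).
poisson_loss <- function(count, seg.mean) {
  stopifnot(all(seg.mean > 0))
  sum(seg.mean - count * log(seg.mean))
}

count    <- c(1, 2, 10, 12, 1)
seg.mean <- c(1.5, 1.5, 11, 11, 1)  # background, peak, background
penalty  <- 5
peaks    <- 1

total.loss <- poisson_loss(count, seg.mean)   # data-fit term only
total.cost <- total.loss + penalty * peaks    # assumed form of the penalized cost
```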
2018.09.13
build script only uses CRAN tests.
2018.09.10
status column (feasible/infeasible) changed to n_equality_constraints
(integer >= 0, zero means no equality constraints/feasible, positive
means some equality constraints/infeasible).
when we run out of disk space, error in R -- tested via tmpfs
(testthat/test-out-of-disk-space.R).
Bugfix for large penalties, test at
testthat/test-CRAN-PeakSegFPOP_disk.R.
New problem.sequentialSearch function, test at
testthat/test-problem-sequentialSearch.R.
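The relationship between the old status column and n_equality_constraints can be sketched as below. The sketch assumes that an equality constraint shows up as two adjacent segments with identical means; both helper functions are hypothetical and are not part of the package.

```r
# Count adjacent segment pairs whose means are exactly equal
# (assumed definition of an active equality constraint).
count_equality_constraints <- function(seg.means) {
  as.integer(sum(diff(seg.means) == 0))
}

# Map the integer count back to the old feasible/infeasible status.
status_from_count <- function(n) ifelse(n == 0L, "feasible", "infeasible")

n.ok  <- count_equality_constraints(c(1.5, 11, 1))         # 0
n.bad <- count_equality_constraints(c(1.5, 11, 11, 1, 1))  # 2
status_from_count(c(n.ok, n.bad))
```

The integer carries strictly more information than the old two-level status, since the status can always be recovered by testing for zero.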
2018.01.25
New minutes.limit argument for problem.target, which can be used for
early stopping, useful for time limited jobs like testing on travis.
Several modifications to problem.target, see below for examples. The
main idea remains the same: refine the limits of the largest min error
interval, until there is no more searching to do at the edges of that
interval. However now we search a few more penalties in parallel:
1. we search the borders and one penalty in the middle of each min
error interval (but we only use the largest interval, and we don't
care about the middle penalty, for the stopping criterion).
2. We don't stop if there are any places to explore where we see fp
increasing and fn decreasing -- this typically indicates a possibility
for lowering the error, see examples below.
In the case below, we miss the min-error model (3 peaks, 0 errors,
shown as the unnumbered row), but we could get it by exploring the
middle of the target interval, in parallel, before termination.
     penalty peaks     status fp fn errors
1:       Inf     0   feasible  0  2      2
2: 3169.9938     1   feasible  2  0      2
3: 1581.6023     2   feasible  1  0      1
                 3             0  0      0
4:  787.4066     4   feasible  1  0      1
5:  527.7266     6   feasible  2  0      2
6:  211.5639    14   feasible  2  0      2
7:   63.6843    26 infeasible  3  0      3
8:    0.0000   238 infeasible  3  0      3
[1] 559.9769
Next = 559.976929694436 mc.cores= 6
In the case below, we miss the min-error model with 2 peaks, 1 error
(shown as the unnumbered row), because the target interval is the
infinite interval that selects 0 peaks. We could perhaps find the
min-error model if we explored the limits of every min error interval
(not just the biggest one).
        penalty peaks     status fp fn errors
 1:         Inf     0   feasible  0  2      2
 2: 10766.15837     1   feasible  1  2      3
                    2             1  0      1
 3:  1318.09401     3 infeasible  1  1      2
 4:   757.42870     4 infeasible  1  1      2
 5:   716.69571     5 infeasible  2  0      2
 6:   703.89812     7 infeasible  3  0      3
 7:   595.41734     8 infeasible  3  0      3
 8:   340.70804    15 infeasible  4  0      4
 9:    69.72866    32 infeasible  4  0      4
10:     0.00000   638 infeasible  4  0      4
In the case below we need to explore between 6 and 16 peaks, because
fp is increasing (1 to 2) and fn is decreasing (2 to 1) at that point.
      penalty peaks     status fp fn errors
1:        Inf     0   feasible  0  7      7
2: 4637.52407     1   feasible  0  5      5
3: 1591.01410     6   feasible  1  2      3
4:  859.85170    16   feasible  2  1      3
5:  510.66132    31   feasible  3  1      4
6:  317.52513    48   feasible  3  1      4
7:  194.70909    74 infeasible  5  0      5
8:   64.14391   137 infeasible  8  0      8
9:    0.00000  1205 infeasible  9  0      9
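A literal reading of rule 2 above (keep searching wherever fp increases and fn decreases between two computed models) can be sketched as follows, using the peaks/fp/fn values from the 6-to-16-peak case. The function gaps_to_explore is hypothetical and flags every such gap between non-consecutive peak counts; how problem.target prioritizes among the flagged gaps, or turns them into penalties, is not described in this file.

```r
# peaks/fp/fn copied from the case with models at 6 and 16 peaks.
models <- data.frame(
  peaks = c(0, 1, 6, 16, 31, 48, 74, 137, 1205),
  fp    = c(0, 0, 1, 2, 3, 3, 5, 8, 9),
  fn    = c(7, 5, 2, 1, 1, 1, 0, 0, 0))

# Hypothetical helper: return pairs of adjacent computed models between which
# fp goes up, fn goes down, and at least one peak count is still unexplored.
gaps_to_explore <- function(models) {
  models <- models[order(models$peaks), ]
  n  <- nrow(models)
  lo <- models[-n, ]  # model with fewer peaks in each adjacent pair
  hi <- models[-1, ]  # model with more peaks in each adjacent pair
  flagged <- hi$fp > lo$fp & hi$fn < lo$fn & hi$peaks - lo$peaks > 1
  data.frame(peaks.lo = lo$peaks[flagged], peaks.hi = hi$peaks[flagged])
}

gaps <- gaps_to_explore(models)
```

On this data the sketch flags the gaps 1 to 6, 6 to 16, and 48 to 74 peaks; the 6-to-16 gap is the one adjacent to the current min-error models.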
2017.01.22
Fix optimization bug in
PeakSegPipeline::problem.PeakSegFPOP("~/PeakSegPipeline-test/input/samples/kidney/MS002201/problems/chr10:18024675-38818835",
"18299537.4379804") by adding a special case when the functions are
equal on the right, with no crossing points before that, by using the
fact that the Log coefficient can be used to test which function is
greater at log_mean=-Inf.
2017.01.19
---- FPOP target does not find zero error model -- need to explore fp,
we can do that in parallel without losing any time.
This is H3K36me3_AM_immune/McGill0004
      penalty peaks     status fp fn errors
1:        Inf     0   feasible  0  3      3
2: 24195.6886     4   feasible  1  1      2
3: 13706.5254     5   feasible  1  0      1
4: 10354.2183    10   feasible  1  0      1
5:  6219.2421    17   feasible  1  0      1
6:  3580.6824    44   feasible  1  0      1
7:   322.9294   399 infeasible  1  0      1
8:     0.0000  4683 infeasible  1  0      1
--- The corresponding PDPA model has 0 errors for 1 and 2 peaks,
which were not explored by the current FPOP target algo.
     sample.id peaks       loss errors
 1: McGill0101     0  -16289.63      3
 2: McGill0101     1 -700673.64      0
 3: McGill0101     2 -751953.96      0
 4: McGill0101     3 -792692.07      1
 5: McGill0101     4 -831639.35      2
 6: McGill0101     5 -853440.37      1
 7: McGill0101     6 -866254.31      1
 8: McGill0101     7 -880097.83      1
 9: McGill0101     8 -892845.31      1
10: McGill0101     9 -903400.20      1
2017.12.30
remove animint2 dep
use new PeakSegJoint with Faster algo.
2017.12.13
in problem.coverage:
- caching now also checks that start of
first line of coverage is equal to problem start.
- we compress data after inserting zeros.
in problem.train, remove targets with two infinite limits before
training.
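The target filtering in problem.train can be illustrated with a small matrix of target intervals. This is a sketch only: the column names min.log.lambda and max.log.lambda and the toy values are assumptions. An interval with two infinite limits places no constraint on the predicted penalty, so it contributes nothing to training.

```r
# Toy target intervals, one row per problem (values are made up).
targets <- rbind(
  c(-Inf, 2.5),   # only an upper limit
  c(1.0,  Inf),   # only a lower limit
  c(-Inf, Inf),   # no finite limit: uninformative
  c(0.5,  3.0))   # both limits finite
colnames(targets) <- c("min.log.lambda", "max.log.lambda")

# Keep rows with at least one finite limit.
keep <- rowSums(is.finite(targets)) > 0
train.targets <- targets[keep, , drop = FALSE]
```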
2017.12.03
bugfix for numerical issue in min-more C++ code: when looking for a
minimum, and the function is decreasing on the interval, there is a
new special case. Before we were always using the right of the
interval as a new minimum (and starting to add a constant), now we
test the cost at the left and right, and if it is numerically
constant, then we just add the interval and continue looking for a
minimum. An analogous special case was already implemented for
min-less.
2017.11.30
denormalize rounds to nearest integer, test case for 0.1, 0.2, etc.
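Why rounding matters here can be shown with a short sketch. It assumes that denormalizing means dividing each value by the smallest non-zero value; denormalize_sketch is a hypothetical stand-in, not the package's denormalize* code.

```r
# Hypothetical stand-in for denormalization: rescale so that the smallest
# non-zero value becomes 1, then round to the nearest integer.
denormalize_sketch <- function(x) {
  min.nonzero <- min(x[x > 0])
  as.integer(round(x / min.nonzero))
}

normalized <- c(0, 0.1, 0.2, 0.3)
counts <- denormalize_sketch(normalized)

# Without round(), floating point error truncates 0.3/0.1 to 2 instead of 3.
truncated <- as.integer(normalized / 0.1)
```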
2017.11.22
denormalize* functions.
2017.11.21
downloadBigWigs function.
2017.11.08
problem.predict and problem.predict.allSamples now return a data.table
of peaks with sample.id and sample.group columns -- this can now be
passed to create_problems_joint to avoid hitting the file system
again.
New problem.pred.cluster.targets which, for one problem, does separate
peak prediction for all samples, then clusters peaks across all
samples to create joint problems, then computes joint target intervals
for labeled joint problems.
problem.joint.targets now saves problem/problemID/jointTargets.rds
which problem.joint.train now looks for. This is (1) faster than
looking for all problem/problemID/jointProblems/jointID/target.tsv
files, and (2) it gives a file output to problem.joint.targets which
did not have one before.
2017.10.14
In problem.train we got
Error in plot_clone(p) : attempt to apply non-function
with ggplot2_2.2.1 installed, and
Imports: ggplot2Animint, animint2
I think this has something to do with conflicting S3 methods
for the gg class.
For now I fixed it by moving these Imports to Suggests,
and using requireNamespace and animint2:: and ggplot2Animint::
2017.09.01
use animint2 instead of animint.
First green build with three jobs on travis: CRAN, input, noinput.
2017.08.11
add bigwig.R and mclapply.R (removed from PeakSegJoint).
2017.08.09
compiling, installing, passing demo test.
pipeline stops in target interval computation if there is non-integer
data in bigWig files.
import PeakSegJoint >= 2017.08.08 which returns mean.mat, which we use
to derive a log peak height relative to background, in
problem.joint.predict.
2017.06.19
copied code from PeakSegFPOP repo, modified for R interface.