What is 'NaN' in the third column in retrieved txt files #121

Open
jiangshan529 opened this issue Nov 12, 2022 · 18 comments

@jiangshan529

Hello, I have retrieved the information from a .hic file, but I found 'nan' values in the third column. What does this mean? Are these missing values? Why would such values appear? Thanks!

60000 60000 3709.8407084423666
60000 65000 nan
60000 70000 nan
60000 85000 974.9826322352375
85000 85000 6662.110695460637

@sa501428
Member

If a particular column was very sparse and KR/SCALE had to discard the column/row when normalizing the matrix, then all entries in the discarded row/column will be NaN.

It means that these rows (bins) had to be removed during normalization. You should not see any NaN (Not a Number) in the raw (un-normalized) map.
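
To illustrate the explanation above: as far as I understand the .hic convention, a normalized value is the raw count divided by the product of the two bins' normalization factors, so a bin whose factor is NaN (because its row was discarded) turns every contact involving it into NaN. A minimal sketch with made-up counts and factors; `norm_vector` and `normalize` are hypothetical names, not part of straw:

```python
import math

# Hypothetical raw counts keyed by (bin_x, bin_y) start positions.
raw = {(60000, 60000): 5, (60000, 65000): 1, (60000, 85000): 2}

# Hypothetical per-bin normalization factors; NaN marks a bin that the
# normalization discarded because its row was too sparse.
norm_vector = {60000: 0.04, 65000: float("nan"), 85000: 0.05}

def normalize(raw_counts, factors):
    """Divide each raw count by the product of its two bin factors."""
    return {
        (x, y): count / (factors[x] * factors[y])
        for (x, y), count in raw_counts.items()
    }

normalized = normalize(raw, norm_vector)
# Keep only the contacts whose normalized value is a real number.
finite = {k: v for k, v in normalized.items() if not math.isnan(v)}
```

Filtering on `math.isnan` as in the last line is one way to drop the affected rows before further analysis; whether dropping them is appropriate depends on the analysis.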

@jiangshan529
Author

Hi, thanks for your reply. I have run Straw with normalization set to 'NONE', and this time I am not sure why there are floating-point numbers in the third column.

60000 60000 1
60000 65000 1
60000 70000 1
60000 85000 1
85000 85000 26
85000 90000 16
90000 90000 53
90000 95000 12
95000 95000 27
58570000 58605000 67.67821603676141
58575000 58605000 179.20274301772898
58580000 58605000 200.11589450569903
58585000 58605000 100.45844430432754
58590000 58605000 237.57597248105807
58595000 58605000 600.7486450401082
58605000 58605000 7401.773853029534
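
One quick way to see where the unexpected values start is to scan the third column for non-integer entries. A minimal sketch using a few of the rows pasted above; `non_integer_rows` is a hypothetical helper, not part of straw:

```python
def non_integer_rows(lines):
    """Return (bin_x, bin_y, value) for rows whose third column is not a whole number."""
    hits = []
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        value = float(fields[2])
        if not value.is_integer():  # NaN also lands here
            hits.append((int(fields[0]), int(fields[1]), value))
    return hits

sample = [
    "60000 60000 1",
    "85000 85000 26",
    "58570000 58605000 67.67821603676141",
    "58605000 58605000 7401.773853029534",
]
for row in non_integer_rows(sample):
    print(row)
```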

@moshe-olshansky

moshe-olshansky commented Nov 15, 2022 via email

Are you using oe (observed over expected)? If so, float numbers should not surprise you.

@jiangshan529
Author

Hi, the code I am using is 'result = straw.straw('NONE',"./4DNFI2TK7L2F.hic", "19", "19", "BP", 5000, 'chr19_5k.txt')'

@moshe-olshansky

moshe-olshansky commented Nov 15, 2022 via email

Where does this .hic file come from? Is it a (weighted) combination of several maps?

@jiangshan529
Author

It is from the 4D Nucleome project and was generated with the Micro-C method.
https://data.4dnucleome.org/files-processed/4DNFI2TK7L2F/#details

@moshe-olshansky

moshe-olshansky commented Nov 15, 2022 via email

@moshe-olshansky

moshe-olshansky commented Nov 15, 2022 via email

By the way, have you tried using the dump command in juicer tools? Does it produce identical results (to straw)?

@jiangshan529
Author

Sorry, I just downloaded the .hic file. How should I use the dump command?

@sa501428
Member

Please post this question to the forum. We reserve GitHub issues for bugs.

@sa501428
Member

https://groups.google.com/g/3d-genomics
And that way, the community as a whole will also benefit from the answers.
Thanks!

@moshe-olshansky

moshe-olshansky commented Nov 15, 2022 via email

Have you downloaded juicer_tools.jar? If so, run java -jar juicer_tools.jar and/or java -jar juicer_tools.jar dump to see the usage.

@jiangshan529
Author

Hi, I have run it.

When I use 'observed', no float numbers appear.

When I use 'oe', it looks like this:
60000 60000 3.0029617E-4
60000 65000 9.408559E-4
60000 70000 0.0040292614
60000 85000 0.012957309
85000 85000 0.0078077004
85000 90000 0.015053694
90000 90000 0.015915697
90000 95000 0.011290271
95000 95000 0.008107997
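
For context on why 'oe' yields fractional values: each observed count is divided by an expected count for that genomic distance. A minimal sketch with invented expected values; the real expected vector is stored in the .hic file, and the names used here are hypothetical:

```python
def observed_over_expected(records, expected_by_distance, resolution):
    """Divide each observed count by the expected count at its bin distance."""
    result = []
    for bin_x, bin_y, observed in records:
        distance = abs(bin_y - bin_x) // resolution  # separation in bins
        result.append((bin_x, bin_y, observed / expected_by_distance[distance]))
    return result

# Observed counts taken from the thread; the expected values are made up.
observed = [(60000, 60000, 1.0), (60000, 65000, 1.0), (85000, 90000, 16.0)]
expected = [4000.0, 1000.0]  # index = distance in bins (hypothetical numbers)

for row in observed_over_expected(observed, expected, 5000):
    print(row)
```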

@moshe-olshansky

moshe-olshansky commented Nov 15, 2022 via email

sa501428 reopened this Nov 15, 2022

@sa501428
Member

sa501428 commented Nov 15, 2022

Apologies - if there is indeed a bug (different outputs from straw vs dump), then we should indeed discuss it here. Can you share the commands you used for juicer tools dump and straw, and the respective outputs? Can you also confirm what version of straw was used?

@jiangshan529
Author

Hi, I think hicstraw.straw and juicer dump give the same result, but straw.straw gives a different result in the last lines.

The code I am using for straw is:
import straw

result = straw.straw('NONE', "./4DNFI2TK7L2F.hic", "19", "19", "BP", 5000, 'chr19_5k.txt')
# result[0], result[1] and result[2] are indexed in parallel: bin X, bin Y, value
with open('chr19_5k.txt', 'w') as f1:
    for i in range(len(result[0])):
        f1.write("{0}\t{1}\t{2}\n".format(result[0][i], result[1][i], result[2][i]))

And here is the other code, using hicstraw:
import hicstraw

# datatype, Norm, HiCFile, CHR1, CHR2, chr1name, chr2name, resolution and OutFile are defined elsewhere (not shown)
with open(OutFile, mode='w') as fp_out:
    result = hicstraw.straw(datatype, Norm, HiCFile, CHR1, CHR2, 'BP', resolution)
    for i in range(len(result)):
        print("{0}\t{1}\t{2}\t{3}\t{4}".format(chr1name, (result[i].binX + int(resolution / 2)), chr2name, (result[i].binY + int(resolution / 2)), result[i].counts), file=fp_out)

The code I used for juicer dump is:
java -Xmx48000m -Djava.awt.headless=true -jar juicer_tools_1.22.01.jar dump observed NONE 4DNFI2TK7L2F.hic 19 19 BP 5000 > dump.txt

Interestingly, the top lines are the same with all three methods:

60000 60000 1.0
60000 65000 1.0
60000 70000 1.0
60000 85000 1.0
85000 85000 26.0
85000 90000 16.0
90000 90000 53.0
90000 95000 12.0
95000 95000 27.0

However, the last lines differ:

Result of straw.straw:

58570000 58605000 67.67821603676141
58575000 58605000 179.20274301772898
58580000 58605000 200.11589450569903
58585000 58605000 100.45844430432754
58590000 58605000 237.57597248105807
58595000 58605000 600.7486450401082
58605000 58605000 7401.773853029534

Result of hicstraw.straw:

58575000 58605000 14.0
58580000 58605000 19.0
58585000 58605000 4.0
58590000 58605000 2.0
58595000 58605000 2.0
58600000 58605000 1.0
58605000 58605000 52.0

Result of juicer dump (observed):

58575000 58605000 14.0
58580000 58605000 19.0
58585000 58605000 4.0
58590000 58605000 2.0
58595000 58605000 2.0
58600000 58605000 1.0
58605000 58605000 52.0

Result of juicer dump (oe):

60000 60000 3.0029617E-4
60000 65000 9.408559E-4
60000 70000 0.0040292614
60000 85000 0.012957309
85000 85000 0.0078077004
85000 90000 0.015053694
90000 90000 0.015915697
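
To pin down where two extractions disagree, the outputs can be compared bin pair by bin pair. A minimal sketch using a few rows from the results above; `parse_rows` and `diff_contacts` are hypothetical helpers, and in practice the lines would be read from the two output files:

```python
def parse_rows(lines):
    """Map (bin_x, bin_y) to the value in the third column."""
    table = {}
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            table[(int(fields[0]), int(fields[1]))] = float(fields[2])
    return table

def diff_contacts(a, b, tol=1e-9):
    """Return bin pairs missing from one side or differing by more than tol."""
    diffs = {}
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if va is None or vb is None or abs(va - vb) > tol:
            diffs[key] = (va, vb)
    return diffs

straw_rows = parse_rows([
    "60000 60000 1.0",
    "58595000 58605000 600.7486450401082",
    "58605000 58605000 7401.773853029534",
])
dump_rows = parse_rows([
    "60000 60000 1.0",
    "58595000 58605000 2.0",
    "58600000 58605000 1.0",
    "58605000 58605000 52.0",
])
for key, (va, vb) in diff_contacts(straw_rows, dump_rows).items():
    print(key, va, vb)
```

Reporting bin pairs that exist on only one side (shown as `None`) matters here, because the pasted straw.straw and dump outputs do not list the same set of bin pairs near the end of the chromosome.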

@sa501428
Member

Is straw.straw the C++ version and hicstraw.straw the Python one? Or which versions are you using?
