Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Script for extracting and intersecting hashes and their abundances from a sketch or a group of sketches #3505

Open
ccbaumler opened this issue Jan 24, 2025 · 1 comment
Labels
code herein lies code

Comments

@ccbaumler
Copy link
Contributor

@ctb @bettafische

Place holder to find a time to meet to make this script.

@ctb
Copy link
Contributor

ctb commented Jan 25, 2025

code from here should work: https://github.com/ctb/sourmash_plugin_abundhist/blob/4f1b7bfb50adf9f6388a14828618e9f1d2a605c4/src/sourmash_plugin_abundhist.py#L112

extracted and adjusted:

#! /usr/bin/env python 
import sys
import sourmash

def extract_hashes_and_abunds(minhash):
    assert minhash.track_abundance, "track abundance must be set"

    # will return list of (hash, abund) 
    return list(minhash.hashes.items())


KSIZE=31
MOLTYPE='DNA'

db = sourmash.load_file_as_index(sys.argv[1])
db = db.select(ksize=KSIZE, moltype=MOLTYPE)
assert len(db) == 1

sig = list(db.signatures())[0]

print(extract_hashes_and_abunds(sig.minhash))

rough procedure would be:

  • build the intersection you're interested in, using sig intersect ... -A <abundances-from> to make sure you're borrowing the abundances from the sketch (usually the reads or the WGS data set), and/or use sig merg to merge your sketches;
  • run this script :)

@ctb ctb added the code herein lies code label Jan 25, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
code herein lies code
Projects
None yet
Development

No branches or pull requests

2 participants