updated python README
Psy-Fer committed Jun 25, 2021
1 parent be0b3c0 commit 9113d6b
Showing 1 changed file with 182 additions and 44 deletions.
226 changes: 182 additions & 44 deletions python/README.md
@@ -4,50 +4,188 @@ The slow5 python library (pyslow5) allows a user to read slow5 and blow5 files.

## Installation

git clone [email protected]:hasindu2008/slow5lib.git
cd slow5lib
make
make pyslow5
make pyslow5 install
Initial setup and example info for environment
###### slow5lib needs python3.4.2 or higher.
```bash
# If your native python3 meets this requirement, you can use that, or use a
# specific version installed with deadsnakes below. If you install with deadsnakes,
# you will need to call that specific python, such as python3.8 or python3.9,
# in all the following commands until you create a virtual environment with venv.
# Then once activated, you can just use python3.

# To install a specific version of python, the deadsnakes ppa is a good place to start
# This is an example for installing python3.7
# you can then call that specific python version
# > python3.7 -m pip --version
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt install python3.7 python3.7-dev python3.7-venv


# get zlib1g-dev
sudo apt-get update && sudo apt-get install -y zlib1g-dev

# Check your python version with
python3 --version

# You will also need the python headers if you don't already have them installed.

sudo apt-get install python3-dev
```
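The version requirement above can also be checked from inside the interpreter you plan to use. This is a minimal sketch: the helper name `python_is_supported` and the constant `MIN_PYTHON` are illustrative and not part of slow5lib.

```python
import sys

# Minimum version stated in this README (python3.4.2 or higher).
MIN_PYTHON = (3, 4, 2)


def python_is_supported(version_info=None):
    """Return True if the given (or the running) interpreter meets MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version_info[:3]) >= MIN_PYTHON


if __name__ == "__main__":
    print("python", sys.version.split()[0], "supported:", python_is_supported())
```

Run it with the same interpreter you will use to create the virtual environment, for example `python3.7` if you installed that version from deadsnakes.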

Building and installing the python library.

```bash
python3 -m venv /path/to/slow5libvenv
source /path/to/slow5libvenv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install setuptools cython numpy wheel

git clone [email protected]:hasindu2008/slow5lib.git
cd slow5lib
make

# CHOOSE A OR B:
# |=======================================================================|
# |A. Install with pip if wheel is present, otherwise it uses setuptools |
python3 -m pip install . --use-feature=in-tree-build
# |=======================================================================|
# |B. Or build and install manually with setup.py |
# |build the package |
python3 setup.py build
# |If all went well, install the package |
python3 setup.py install
# |=======================================================================|

# This should not require sudo if using a python virtual environment/venv
# confirm installation, and find pyslow5==<version>
python3 -m pip freeze

# Ensure slow5 library is working by running the basic tests
python3 ./python/example.py


# To Remove the library
python3 -m pip uninstall pyslow5
```

## Usage

import pyslow5 as slow5

# open file
s5 = slow5.Open('examples/example.slow5','r')

# read all reads sequentially
for read in s5.seq_reads(pA=True):
    print("read_id:", read['read_id'])
    print("read_group:", read['read_group'])
    print("digitisation:", read['digitisation'])
    print("offset:", read['offset'])
    print("range:", read['range'])
    print("sampling_rate:", read['sampling_rate'])
    print("len_raw_signal:", read['len_raw_signal'])
    print("signal:", read['signal'][:10])

# read one read using readID, returns None if not found
readID = "r4"
read = s5.get_read(readID, pA=True)

# random access reads from list, if read not found, returns None
read_list = ["r1", "r3", "null_read", "r5", "r2", "r1"]
selected_reads = s5.get_read_list(read_list)
for r, read in zip(read_list,selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")

# Get header attributes
attr = "flow_cell_id"
val = s5.get_header_value(attr)
print(f"flow_cell_id: {val}")
attr = "exp_start_time"
val = s5.get_header_value(attr)
print(f"exp_start_time: {val}")
attr = "heatsink_temp"
val = s5.get_header_value(attr)
print(f"heatsink_temp: {val}")
#### `Open(FILE, mode, DEBUG=0)`:

The pyslow5 library has one main class, `pyslow5.Open`, which opens a slow5/blow5 file for reading (both formats are referred to as slow5 below for easy reference).

+ `FILE`: the file or filepath of the slow5 file to open
+ `mode`: mode in which to open the file. Currently, only `r` (read only) is accepted.

This is designed to mimic Python's built-in `open()` to help users remember the syntax.

Example:

```python
import pyslow5

# open file
s5 = pyslow5.Open('examples/example.slow5','r')
```

When opening a slow5 file for the first time, an index will be created and saved in the same directory as the file being read. This index will then be loaded. For files that already have an index, the existing index will be loaded.

#### `seq_reads(pA=False, aux=None)`:

Access all reads sequentially in an opened slow5.
+ If readID is not found, `None` is returned.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` (`'<attr_name>'` or `'all'`) or `list` of names of auxiliary fields to add to the returned dictionary; `None` if `<attr_name>` is not found
+ yields `dict` = dictionary of main fields for each read, with any aux fields added

Example:

```python
# create generator
reads = s5.seq_reads()

# print all readIDs
for read in reads:
    print(read['read_id'])

# or use directly in a for loop
for read in s5.seq_reads(pA=True, aux='all'):
    print("read_id:", read['read_id'])
    print("read_group:", read['read_group'])
    print("digitisation:", read['digitisation'])
    print("offset:", read['offset'])
    print("range:", read['range'])
    print("sampling_rate:", read['sampling_rate'])
    print("len_raw_signal:", read['len_raw_signal'])
    print("signal:", read['signal'][:10])
    print("================================")
```
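To illustrate what the `pA` option relates to, the sketch below converts raw samples to picoamps by hand from the `digitisation`, `offset` and `range` fields. It assumes the scaling conventionally used for nanopore signal data, `(raw + offset) * range / digitisation`; that pyslow5 applies exactly this formula is an assumption, and the function name `raw_to_pa` and the sample values are made up for illustration.

```python
def raw_to_pa(raw_signal, digitisation, offset, signal_range):
    """Convert raw ADC samples to picoamps.

    Assumes the conventional nanopore scaling:
        pA = (raw + offset) * range / digitisation
    """
    scale = signal_range / digitisation
    return [(sample + offset) * scale for sample in raw_signal]


# Hypothetical values shaped like the dictionary returned for a read.
read = {
    "digitisation": 8192.0,
    "offset": 10.0,
    "range": 1024.0,
    "signal": [0, 6, 90],
}

pa_signal = raw_to_pa(
    read["signal"], read["digitisation"], read["offset"], read["range"]
)
print(pa_signal)  # [1.25, 2.0, 12.5]
```

In normal use, passing `pA=True` avoids doing this manually; the sketch is only meant to show how the three fields relate to the signal.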

#### `get_read(readID, pA=False, aux=None)`:

Access a specific read using a unique readID. This is a random access method, using the index.
+ If readID is not found, `None` is returned.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` (`'<attr_name>'` or `'all'`) or `list` of names of auxiliary fields to add to the returned dictionary; `None` if `<attr_name>` is not found
+ returns `dict` = dictionary of main fields for read_id, with any aux fields added

Example:

```python
readID = "r1"
read = s5.get_read(readID, pA=True, aux=["read_number", "start_mux"])
if read is not None:
    print("read_id:", read['read_id'])
    print("len_raw_signal:", read['len_raw_signal'])
```


#### `get_read_list(read_list, pA=False, aux=None)`:

Access a list of specific reads using a list `read_list` of unique readIDs. This is a random access method using the index, so order of readIDs does impact access speed.
+ If readID is not found, `None` is returned.
+ pA = Bool for converting signal to picoamps.
+ aux = `str` (`'<attr_name>'` or `'all'`) or `list` of names of auxiliary fields to add to the returned dictionary; `None` if `<attr_name>` is not found
+ returns one `dict` per requested readID = dictionary of main fields for read_id, with any aux fields added

Example:

```python
read_list = ["r1", "r3", "null_read", "r5", "r2", "r1"]
selected_reads = s5.get_read_list(read_list)
for r, read in zip(read_list,selected_reads):
    if read is not None:
        print(r, read['read_id'])
    else:
        print(r, "read not found")
```
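When many readIDs are requested, it can help to separate the reads that were found from the IDs that were not. The following is a sketch only: `split_found_missing` is a hypothetical helper, and `FakeSlow5` is a stand-in object so the snippet runs without a slow5 file. With pyslow5 installed, you would pass the object returned by `pyslow5.Open` instead.

```python
def split_found_missing(s5, read_ids, **kwargs):
    """Return (found, missing) for a list of readIDs.

    found   -> dict mapping readID to its read dictionary
    missing -> list of readIDs for which None was returned
    Extra keyword arguments (pA, aux) are passed to get_read_list().
    """
    found = {}
    missing = []
    for read_id, read in zip(read_ids, s5.get_read_list(read_ids, **kwargs)):
        if read is None:
            missing.append(read_id)
        else:
            found[read_id] = read
    return found, missing


class FakeSlow5:
    """Stand-in for an opened slow5 file (illustration only)."""

    def __init__(self, reads):
        self._reads = reads

    def get_read_list(self, read_list, pA=False, aux=None):
        # Same contract as described above: None for unknown readIDs.
        return [self._reads.get(read_id) for read_id in read_list]


s5_demo = FakeSlow5({"r1": {"read_id": "r1"}, "r3": {"read_id": "r3"}})
found, missing = split_found_missing(s5_demo, ["r1", "r3", "null_read", "r1"])
print("found:", sorted(found))
print("missing:", missing)
```

Duplicate readIDs in the request collapse to a single entry in `found`, since the dictionary is keyed by readID.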


#### `get_header_names()`:

Returns a list containing the union of header names from all read_groups.

#### `get_header_value(attr, read_group=0)`:

Returns a `str` of the value of a header attribute (`attr`) for a particular read_group.
Returns `None` if value can't be found
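As a sketch of how several attributes might be collected in one step, the helper below calls `get_header_value` once per attribute and builds a dictionary. The helper name `header_subset`, the stand-in class `FakeHeaderFile` and the values in it are hypothetical; only the `get_header_value(attr, read_group=0)` signature comes from this README.

```python
def header_subset(s5, attrs, read_group=0):
    """Return {attr: value} for the requested header attributes.

    Attributes that cannot be found map to None, mirroring
    get_header_value().
    """
    return {
        attr: s5.get_header_value(attr, read_group=read_group)
        for attr in attrs
    }


class FakeHeaderFile:
    """Stand-in for an opened slow5 file (illustration only)."""

    def __init__(self, headers):
        self._headers = headers  # one dict of attributes per read_group

    def get_header_value(self, attr, read_group=0):
        return self._headers[read_group].get(attr)


# Made-up header values for a single read_group.
s5_headers = FakeHeaderFile([{"flow_cell_id": "FLOWCELL_X", "heatsink_temp": "34.0"}])
values = header_subset(
    s5_headers, ["flow_cell_id", "exp_start_time", "heatsink_temp"]
)
for attr, val in values.items():
    print("{}: {}".format(attr, val))
```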

#### `get_all_headers(read_group=0)`:

Returns a dictionary with all header attributes and values for a particular read_group.
If an attribute has a value in one read_group but not in another, it is still returned for the read_group that lacks it, with a value of `None`.
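Because attributes without a value come back as `None`, a quick way to see which ones are unset for a read_group is to filter the returned dictionary. This is a minimal sketch: `unset_header_attrs` and the stand-in class `FakeAllHeaders` are hypothetical names used for illustration, and the header values are made up.

```python
def unset_header_attrs(s5, read_group=0):
    """Return a sorted list of header attributes whose value is None."""
    headers = s5.get_all_headers(read_group=read_group)
    return sorted(attr for attr, value in headers.items() if value is None)


class FakeAllHeaders:
    """Stand-in for an opened slow5 file (illustration only)."""

    def __init__(self, groups):
        self._groups = groups  # one {attr: value} dict per read_group

    def get_all_headers(self, read_group=0):
        return dict(self._groups[read_group])


s5_groups = FakeAllHeaders([
    {"flow_cell_id": "FLOWCELL_X", "exp_start_time": "2021-06-25"},
    {"flow_cell_id": "FLOWCELL_Y", "exp_start_time": None},
])
print(unset_header_attrs(s5_groups, read_group=0))
print(unset_header_attrs(s5_groups, read_group=1))
```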

#### `get_aux_names()`:

Returns an ordered list of auxiliary attribute names. (same order as get_aux_types())

This is used to find out which auxiliary attributes are available within the slow5 file, and to choose values for the `aux` keyword argument in the above functions.

#### `get_aux_types()`:

Returns an ordered list of auxiliary attribute types (same order as get_aux_names())

This can mostly be ignored, but it will be used for error tracing in the future. Auxiliary field requests have multiple types, each with its own calls, and not all are used. If a call for an auxiliary field fails, knowing which type the field is requesting helps identify which C function is being called and could be causing the error.
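Since both lists share the same order, they can be paired with `zip`, and the names can be used to check an `aux` request before making it. The sketch below assumes only the two methods described above; `aux_schema`, `unknown_aux_fields` and the stand-in class `FakeAuxFile` are hypothetical, and the type values are placeholders because the representation returned by `get_aux_types()` is not described here.

```python
def aux_schema(s5):
    """Pair each auxiliary attribute name with its type, in file order."""
    return list(zip(s5.get_aux_names(), s5.get_aux_types()))


def unknown_aux_fields(s5, requested):
    """Return the requested aux names that are not present in the file."""
    available = set(s5.get_aux_names())
    return [name for name in requested if name not in available]


class FakeAuxFile:
    """Stand-in for an opened slow5 file (illustration only)."""

    def get_aux_names(self):
        return ["read_number", "start_mux"]

    def get_aux_types(self):
        # Placeholder values, not the real type representation.
        return ["TYPE_A", "TYPE_B"]


s5_aux = FakeAuxFile()
for name, aux_type in aux_schema(s5_aux):
    print(name, aux_type)

print(unknown_aux_fields(s5_aux, ["read_number", "not_a_field"]))
```

Checking the request first makes it easier to tell a misspelled field name apart from a field that is absent from the file.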

See documentation for full example
