update to v2.0

CovertLab · Mar 22, 2019 · 6834cda
2 parents 3d12a2f + 4cfa844

Showing 43 changed files with 1,310 additions and 126 deletions.
diff --git a/Dockerfile b/Dockerfile
@@ -13,6 +13,10 @@ RUN conda install matplotlib==1.5.1 \
     pypng==0.0.18 mahotas==1.4.1  opencv-python==3.2.0.7 \
     git+https://github.com/jfrelinger/cython-munkres-wrapper \
     jupyter
+RUN pip install numba notebook==5.4.1
+RUN pip install fast-histogram
+RUN pip install keras==2.0.0 tensorflow==1.8.0
+
 
 EXPOSE 8888
 WORKDIR /home

diff --git a/README.md b/README.md
@@ -1,9 +1,11 @@
 # CellTK
 
-Live-cell analysis toolkit.  
-For the active development version with more functionality, please visit [here](https://github.com/braysia/CellTK).
+Live-cell analysis toolkit.
+#### For the active development version, please visit [here](https://github.com/braysia/CellTK).
+#### v0.4 is used in [Nature Protocols](https://www.ncbi.nlm.nih.gov/pubmed/29266096) paper.
 
-Image processing is simply an image conversion/transformation process.  
+
+Image processing is simply an image conversion/transformation process.
 CellTK has the following five major processes which all implement conversion between img and labels.
 
 1. preprocessing:   img -> img
@@ -12,22 +14,21 @@ CellTK has the following five major processes which all implement conversion bet
 4. tracking: labels -> labels*
 5. postprocessing: labels -> labels*
 
-where  
+where
 - img: np.ndarray[np.float32] (e.g. a raw image from a microscope)
-- labels: np.ndarray[np.int16] (e.g. nuclear objects)  
+- labels: np.ndarray[np.int16] (e.g. nuclear objects)
 \* tracked objects have consistent values over frames
 
+For each processes, you can find a module named ___\*\_operation.py___. (e.g. _celltk/preprocess_operations.py_).
 
-For each processes, you can find a module named ___\*\_operation.py___. (e.g. _celltk/preprocess_operations.py_).    
-
-These files are the "repositories" of functions.  
+These files are the "repositories" of functions.
 They simply contain a list of functions which takes an input and convert images. If you need a new function, simply add it to here.
 
-  
+
 When you input a raw image, it should take TIFF or PNG files with various datatypes as well.
 
 ### Command line Example:
-The simplest way to apply a function is to use ___command.py___.  
+The simplest way to apply a function is to use ___command.py___.
 This option is convenient to play with functions and parameters.
 
 
@@ -36,8 +37,8 @@ python celltk/command.py -i data/testimages0/CFP/img* -f constant_thres -p THRES
 python celltk/command.py -i data/testimages0/CFP/img* -l output/c1/img* -f run_lap track_neck_cut -o output/nuc
 ```
 
-___-i___ for images path, ___-l___ for labels path, ___-o___ for an output directory, ___-f___ for a function name from ___*operation.py___ modules, ___-p___ for arguments to the function. 
-   
+___-i___ for images path, ___-l___ for labels path, ___-o___ for an output directory, ___-f___ for a function name from ___*operation.py___ modules, ___-p___ for arguments to the function.
+
 Note that, time-lapse files need to have file names in a sorted order.
 
 
@@ -60,14 +61,14 @@ This configuration file contains operations defined like this:
 You can find how to set up a configuration file [here](doc/CONFIGURE_YML.md).
 
 ### Apply to extract single-cell properties
-After segmenting and tracking cells, we want to extract single-cell properties as a table.  
+After segmenting and tracking cells, we want to extract single-cell properties as a table.
 
 Unlike other five major processes, ___celltk/apply.py___ produces __csv__ and __npz__ file as an output.
 
 ```
 python celltk/apply.py -i data/testimages0/CFP/img* -l output/nuc/img* -o output/array.npz
 ```
-By default, it will use a folder name as a table key.  
+By default, it will use a folder name as a table key.
 To specify table keys, use ___-p___ and ___-s___ in a command line.
 ```
 python celltk/apply.py -i data/testimages0/YFP/img* -l output/nuc/img* -o output/array.npz -p nuc -s YFP
@@ -92,7 +93,7 @@ Or use ___obj\_names___ and ___ch\_names___ in a caller.
 ```
 
 
-The output can be loaded with LabeledArray class.    
+The output can be loaded with LabeledArray class.
 e.g.
 ```
 python -c "from celltk.labeledarray import LabeledArray;arr = LabeledArray().load('output/array.npz');print arr.labels;print arr['CFP', 'nuc', 'x']"
@@ -101,7 +102,29 @@ python -c "from celltk.labeledarray import LabeledArray;arr = LabeledArray().loa
 For visualization and manipulation of these arrays, I recommend to take a loot at [covertrace](https://github.com/braysia/covertrace).
 
 
-## Running Docker Container
+## Install dependencies
+
+
+If you do not need a dev version, simply
+```
+pip install celltk
+```
+This will register `celltk` command, where you can pass input file like `celltk input_file/input_tests1.yml`.
+___________________________
+
+It is compatible with [poetry](https://github.com/sdispater/poetry).
+```
+git clone https://github.com/braysia/CellTK.git & cd CellTK
+pip install poetry
+poetry install
+```
+Install the additional package may speed up computation.
+```
+pip install git+https://github.com/jfrelinger/cython-munkres-wrapper
+```
+_________
+
+The other option is to use Docker container.
 ```
 docker pull braysia/celltk
 docker run -it -v /$FOLDER_TO_MOUNT:/home/ braysia/celltk

diff --git a/setup.py → _setup.py b/setup.py → _setup.py
@@ -2,7 +2,7 @@
 
 setup(
     name="celltk",
-    version="0.4",
+    version="2.0",
     packages=find_packages(),
     author='Takamasa Kudo',
     author_email='[email protected]',

diff --git a/celltk/apply.py b/celltk/apply.py
@@ -22,27 +22,33 @@
 from scipy.ndimage.morphology import binary_fill_holes
 from scipy.ndimage.morphology import binary_dilation
 
+import warnings
+warnings.filterwarnings('ignore', category=np.VisibleDeprecationWarning)
+warnings.filterwarnings('ignore', category=pd.io.pytables.PerformanceWarning)
+
 logger = logging.getLogger(__name__)
 
 
 PROP_SAVE = ['area', 'cell_id', 'convex_area', 'cv_intensity',
              'eccentricity', 'major_axis_length', 'minor_axis_length', 'max_intensity',
              'mean_intensity', 'median_intensity', 'min_intensity', 'orientation',
              'perimeter', 'solidity', 'std_intensity', 'total_intensity', 'x', 'y', 'parent', 'num_seg']
+MAX_NUMCELL = 100000
 
 
 def find_all_children(labels):
-
     mask = binary_fill_holes(labels < 0)
     mask[labels < 0] = False
-    return np.unique(labels[mask]).tolist()
+    clabelnums = np.unique(labels[mask]).tolist()
+    if 0 in clabelnums:
+        clabelnums.remove(0)
+    return clabelnums
 
 
 def find_parent_label(labels, child_label):
     mask = binary_dilation(labels == child_label)
     mask[labels == child_label] = False
-    assert len(np.unique(labels[mask])) == 1
-    return labels[mask][0]
+    return max(set(labels[mask].tolist()), key=labels[mask].tolist().count)
 
 
 def add_parent(cells, labels):
@@ -55,9 +61,6 @@ def add_parent(cells, labels):
     return cells
 
 
-
-# def add_parent_id(labels, img, cells):
-#     return cells
 def apply():
     pass
 
@@ -85,15 +88,22 @@ def df2larr(df):
     return larr
 
 
-def multi_index(cells, obj_name, ch_name):
-    frames = np.unique([i.frame for i in cells])
-    index = pd.MultiIndex.from_product([obj_name, ch_name, PROP_SAVE, frames], names=['object', 'ch', 'prop', 'frame'])
-    column_idx = pd.MultiIndex.from_product([np.unique([i.cell_id for i in cells])])
-    df = pd.DataFrame(index=index, columns=column_idx, dtype=np.float32)
-    for cell in cells:
-        for k in PROP_SAVE:
-            df[cell.cell_id].loc[obj_name, ch_name, k, cell.frame] = np.float32(getattr(cell, k))
-    return df
+def _cells2array(cells):
+    arr = np.zeros((len(cells), len(PROP_SAVE)), np.float32)
+    for cnum, cell in enumerate(cells):
+        arr[cnum, :] = [getattr(cell, k) for k in PROP_SAVE]
+    return arr
+
+
+# def multi_index(cells, obj_name, ch_name):
+#     frames = np.unique([i.frame for i in cells])
+#     index = pd.MultiIndex.from_product([obj_name, ch_name, PROP_SAVE, frames], names=['object', 'ch', 'prop', 'frame'])
+#     column_idx = pd.MultiIndex.from_product([np.unique([i.cell_id for i in cells])])
+#     df = pd.DataFrame(index=index, columns=column_idx, dtype=np.float32)
+#     for cell in cells:
+#         for k in PROP_SAVE:
+#             df[cell.cell_id].loc[obj_name, ch_name, k, cell.frame] = np.float32(getattr(cell, k))
+#     return df
 
 
 def caller(inputs_list, inputs_labels_list, output, primary, secondary):
@@ -105,21 +115,31 @@ def caller(inputs_list, inputs_labels_list, output, primary, secondary):
     obj_names = [basename(dirname(i[0])) for i in inputs_labels_list] if primary is None else primary
     ch_names = [basename(dirname(i[0])) for i in inputs_list] if secondary is None else secondary
 
-    store = []
     for inputs, ch in zip(inputs_list, ch_names):
         for inputs_labels, obj in zip(inputs_labels_list, obj_names):
             logger.info("Channel {0}: {1} applied...".format(ch, obj))
+            arr = np.ones((MAX_NUMCELL, len(PROP_SAVE), len(inputs)), np.float32)  * np.nan
             for frame, (path, pathl) in enumerate(zip(inputs, inputs_labels)):
                 img, labels = imread(path), lbread(pathl, nonneg=False)
                 cells = regionprops(labels, img)
                 if (labels < 0).any():
                     cells = add_parent(cells, labels)
                 [setattr(cell, 'frame', frame) for cell in cells]
                 cells = [Cell(cell) for cell in cells]
-                store.append(cells)
+                tarr = _cells2array(cells)
+                index = tarr[:, 1].astype(np.int32)
+                arr[index, :, frame] = tarr
 
             logger.info("\tmaking dataframe...")
-            df = multi_index([i for ii in store for i in ii], obj, ch)
+            cellids = np.where(~np.isnan(arr[:, 0, :]).all(axis=1))[0]
+            marr = np.zeros((len(cellids), arr.shape[1], arr.shape[2]))
+            for pn, i in enumerate(cellids):
+                marr[pn] = arr[i]
+            sarr = np.swapaxes(marr, 0, 2)
+            narr = sarr.reshape((sarr.shape[0]*sarr.shape[1], sarr.shape[2]), order='F')
+            index = pd.MultiIndex.from_product([obj, ch, PROP_SAVE, range(arr.shape[-1])], names=['object', 'ch', 'prop', 'frame'])
+            df = pd.DataFrame(narr, index=index, columns=cellids)
+
             if exists(join(output, 'df.csv')):
                 ex_df = pd.read_csv(join(output, 'df.csv'), index_col=['object', 'ch', 'prop', 'frame'])
                 ex_df.columns = pd.to_numeric(ex_df.columns)
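
Note on the reshape in `caller()` above: the new code flattens a (cell, prop, frame) array into rows labelled by `(object, ch, prop, frame)`. The sketch below is a minimal check, written for Python 3, that `swapaxes` followed by a Fortran-order `reshape` produces rows in the same order as `pd.MultiIndex.from_product`. The shortened property list, the cell ids, and the one-element lists for the object and channel names are assumptions made for illustration, not values taken from the commit.

```python
import numpy as np
import pandas as pd

# Hypothetical, shortened property list; the real PROP_SAVE has 20 entries.
PROPS = ['area', 'cell_id', 'x']
n_cells, n_frames = 2, 4

# marr[cell, prop, frame], filled with distinct values so positions can be checked.
marr = np.arange(n_cells * len(PROPS) * n_frames, dtype=np.float32)
marr = marr.reshape((n_cells, len(PROPS), n_frames))

# Same steps as in caller(): move frames to axis 0, then flatten (frame, prop)
# into rows in Fortran order, so that frame varies fastest within each prop.
sarr = np.swapaxes(marr, 0, 2)  # shape (frame, prop, cell)
narr = sarr.reshape((n_frames * len(PROPS), n_cells), order='F')

# from_product also varies its last level (frame) fastest, so the rows line up.
# Object and channel names are wrapped in one-element lists (assumption).
index = pd.MultiIndex.from_product(
    [['nuc'], ['CFP'], PROPS, range(n_frames)],
    names=['object', 'ch', 'prop', 'frame'])
cellids = [7, 12]  # hypothetical cell ids
df = pd.DataFrame(narr, index=index, columns=cellids)


def lookup(prop, frame, cell_pos):
    """Read one value back from the table by its labels."""
    return df.loc[('nuc', 'CFP', prop, frame), cellids[cell_pos]]
```

With this layout, `lookup('x', 3, 1)` returns the value stored at `marr[1, 2, 3]`, that is, row `prop * n_frames + frame` and the column of the second cell.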

diff --git a/celltk/caller.py b/celltk/caller.py
@@ -9,8 +9,37 @@
 from utils.file_io import make_dirs
 import sys
 
+
+
 logger = logging.getLogger(__name__)
 
+import os
+from os.path import join, basename, exists, dirname
+import collections
+
+def multi_call(inputs):
+    contents = load_yaml(inputs)
+    pin = contents['PARENT_INPUT']
+    pin = pin[:-1] if pin.endswith('/') or pin.endswith('\\') else pin
+    input_dirs = [join(pin, i) for i in os.listdir(pin) if os.path.isdir(join(pin, i))]
+    contents_list = []
+    for subfolder in input_dirs:
+        conts = eval(str(contents).replace('$INPUT', subfolder))
+        conts['OUTPUT_DIR'] = join(conts['OUTPUT_DIR'], basename(subfolder))
+        contents_list.append(conts)
+    return contents_list
+
+
+def convert(data):
+    if isinstance(data, basestring):
+        return str(data)
+    elif isinstance(data, collections.Mapping):
+        return dict(map(convert, data.iteritems()))
+    elif isinstance(data, collections.Iterable):
+        return type(data)(map(convert, data))
+    else:
+        return data
+
 
 def extract_path(path):
     f = glob(path)
@@ -82,7 +111,7 @@ def run_operation(output_dir, operation):
     functions, params, images, labels, output = parse_operation(operation)
     inputs = prepare_path_list(images, output_dir)
     logger.info(inputs)
-    
+
     inputs_labels = prepare_path_list(labels, output_dir)
     output = join(output_dir, output) if output else output_dir
     caller = _retrieve_caller_based_on_function(functions[0])
@@ -119,6 +148,19 @@ def call_operations(contents):
     logging.getLogger("PIL").setLevel(logging.WARNING)
     run_operations(contents['OUTPUT_DIR'], contents['operations'])
     logger.info("Caller finished.")
+    return
+
+
+def _parallel(args):
+    '''
+    Use this function if you want to multiprocess using PARENT_INPUT argument 
+    (see input_fireworks.yml). 
+    '''
+    contents_list = multi_call(args.input[0])
+    contents_list = [convert(i) for i in contents_list]
+    pool = multiprocessing.Pool(args.cores, maxtasksperchild=1)
+    pool.map(call_operations, contents_list, chunksize=1)
+    pool.close()
 
 
 def parse_args():
@@ -133,7 +175,12 @@ def parse_args():
 def main():
     args = parse_args()
     if len(args.input) == 1:
-        single_call(args.input[0])
+        contents = load_yaml(args.input[0])
+        if "PARENT_INPUT" in contents:
+            _parallel(args)
+        else:
+            call_operations(contents)
+            # single_call(args.input[0])
     if len(args.input) > 1:
         num_cores = args.cores
         print str(num_cores) + ' started parallel'
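
Note on `multi_call()` above: it fills in `$INPUT` by converting the whole configuration to a string, replacing the placeholder, and passing the result to `eval`. One possible alternative, sketched here for Python 3, is to walk the loaded structure and replace the placeholder only inside strings. The helper name `substitute` and the example configuration are hypothetical and not part of CellTK.

```python
def substitute(data, placeholder, value):
    """Return a copy of data with placeholder replaced in every string."""
    if isinstance(data, str):
        return data.replace(placeholder, value)
    if isinstance(data, dict):
        return {substitute(k, placeholder, value): substitute(v, placeholder, value)
                for k, v in data.items()}
    if isinstance(data, (list, tuple)):
        return type(data)(substitute(i, placeholder, value) for i in data)
    # Numbers, booleans, None and other types are returned unchanged.
    return data


# Hypothetical configuration, shaped like a loaded YAML file.
contents = {
    'PARENT_INPUT': '/data/plate1',
    'OUTPUT_DIR': 'output',
    'operations': [{'images': '$INPUT/CFP/img*', 'function': 'constant_thres'}],
}
conts = substitute(contents, '$INPUT', '/data/plate1/Pos1')
```

The original `contents` is left untouched, so the same template can be reused for each subfolder, and no configuration text is ever evaluated as code.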

diff --git a/celltk/command.py b/celltk/command.py
@@ -12,7 +12,7 @@ def main():
     parser.add_argument("-l", "--inputs_labels", help="images", nargs="*", default=None)
     parser.add_argument("-o", "--output", help="output directory", type=str, default='temp')
     parser.add_argument("-f", "--functions", help="functions", nargs="*")
-    parser.add_argument("-p", "--param", nargs="*", help="parameters", default=[])
+    parser.add_argument("-p", "--param", nargs="+", help="parameters", action='append')
     args = parser.parse_args()
 
     params = ParamParser(args.param).run()
@@ -23,7 +23,7 @@ def main():
 
     if len(args.functions) == 1 and args.functions[0] == 'apply':
         pass
-        # ch_names = operation['ch_names'] if 'ch_names' in operation else images
+    # ch_names = operation['ch_names'] if 'ch_names' in operation else images
         # obj_names = operation['obj_names'] if 'obj_names' in operation else labels
         # caller(zip(*inputs), zip(*args.inputs_labels), args.output, obj_names, ch_names)
     elif args.inputs_labels is None:
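
Note on the `-p` change above: with `nargs="+"` and `action='append'`, each occurrence of `-p` contributes its own list, so `args.param` becomes a list of lists, and it is `None` when `-p` is not given (the previous default was `[]`). The sketch below shows only the standard `argparse` behaviour. The parameter names other than `THRES` are hypothetical, and how `ParamParser` consumes the result is not shown in this diff.

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("-p", "--param", nargs="+", help="parameters", action='append')

# Two separate -p groups, e.g. one per function passed with -f.
repeated = parser.parse_args(['-p', 'THRES=2000', '-p', 'RADIUS=3', 'DISPLACEMENT=10'])

# No -p at all: argparse leaves the attribute at its default, None.
absent = parser.parse_args([])
```

Here `repeated.param` is `[['THRES=2000'], ['RADIUS=3', 'DISPLACEMENT=10']]`, which keeps the parameters of each group apart.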

diff --git a/celltk/labeledarray/LICENSE b/celltk/labeledarray/LICENSE
diff --git a/celltk/labeledarray/README.md b/celltk/labeledarray/README.md
diff --git a/celltk/labeledarray/__init__.py b/celltk/labeledarray/__init__.py
diff --git a/celltk/labeledarray/labeledarray/__init__.py b/celltk/labeledarray/labeledarray/__init__.py
diff --git a/celltk/labeledarray/labeledarray/labeledarray.py b/celltk/labeledarray/labeledarray/labeledarray.py
@@ -83,7 +83,7 @@ def _label2idx(self, item):
         if boolarr.all():
             return (slice(None, None, None), ) + (slice(None, None, None), ) * (self.ndim - 1)
         minidx = min(tidx) if min(tidx) > 0 else None
-        maxidx = max(tidx) if max(tidx) < self.shape[0] - 1 else None
+        maxidx = max(tidx)+1 if max(tidx)+1 < self.shape[0] else None
         if boolarr.sum() > 1:
             return (slice(minidx, maxidx, None), ) + (slice(None, None, None), ) * (self.ndim - 1)
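
Note on the `maxidx` change above: a Python slice excludes its stop index, so the stop has to be one past the last matching row. The sketch below reproduces both versions of the bound calculation on a plain list. The function names and the sample data are hypothetical.

```python
def new_bounds(tidx, length):
    """Slice that includes the last matching position (behaviour after the fix)."""
    start = min(tidx) if min(tidx) > 0 else None
    stop = max(tidx) + 1 if max(tidx) + 1 < length else None
    return slice(start, stop)


def old_bounds(tidx, length):
    """Previous calculation: the last matching position falls outside the slice."""
    start = min(tidx) if min(tidx) > 0 else None
    stop = max(tidx) if max(tidx) < length - 1 else None
    return slice(start, stop)


rows = ['a', 'b', 'c', 'd', 'e']
matches = [1, 2, 3]  # positions of the rows that match a label

fixed = rows[new_bounds(matches, len(rows))]
before = rows[old_bounds(matches, len(rows))]
```

With these inputs `fixed` contains three rows and `before` only two. When the matches run to the end of the array, both versions return an open-ended slice and agree.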
 

diff --git a/celltk/labeledarray/labeledarray/utils.py b/celltk/labeledarray/labeledarray/utils.py