LGDV Phoniatric Division

The Human Laser Endoscopic (HLE) Dataset

This repository contains the Human Laser Endoscopic (HLE) dataset introduced in the paper Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy. It comprises 10 high-speed in-vivo endoscopy videos of human vocal folds during phonation, captured with a 4000 FPS camera. During recording, a laser projection unit projected a symmetric 18 by 18 laser grid pattern into the laryngeal area. For further information about the recording system, see the paper Endoscopic laser-based 3d imaging for functional voice diagnostics by Semmler et al.

This is joint work of the Chair of Visual Computing of the Friedrich-Alexander University of Erlangen-Nuremberg and the Phoniatric Division of the University Hospital Erlangen.

Files

  • visualize.py - An example of how to use the data supplied in this dataset.
  • camera_calibration.json - Contains the intrinsic 3x3 camera matrix and the distortion coefficients.
  • laser_calibration.json - Contains the laser's rotation matrix, the inter-laser angle alpha, the laser grid dimensions, and the 3D translation of the laser.
  • [A-Z][A-Z]/[A-Z][A-Z].avi - The actual recording, of size 512x256 (HEIGHT x WIDTH).
  • [A-Z][A-Z]/[A-Z][A-Z].json - Labels extracted from the video. For further details, see the Data section.
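
The naming pattern above can be used to enumerate the recordings programmatically. The following is a minimal sketch, not part of the supplied scripts: the helper name `find_recordings` is hypothetical, and it assumes each recording directory sits directly under the dataset root and holds an `.avi` and a `.json` file named after the directory.

```python
import re
from pathlib import Path

# Two uppercase letters, matching the [A-Z][A-Z] naming used in the file list.
RECORDING_ID = re.compile(r"[A-Z]{2}")


def find_recordings(dataset_path):
    """Return {recording_id: (video_path, label_path)} for complete recordings.

    Hypothetical helper: assumes DATASET_PATH/<ID>/<ID>.avi and
    DATASET_PATH/<ID>/<ID>.json, as described in the file list.
    """
    recordings = {}
    for folder in sorted(Path(dataset_path).iterdir()):
        # Skip files and directories that do not look like a recording ID.
        if not folder.is_dir() or not RECORDING_ID.fullmatch(folder.name):
            continue
        video = folder / f"{folder.name}.avi"
        labels = folder / f"{folder.name}.json"
        # Only report recordings where both the video and its labels exist.
        if video.is_file() and labels.is_file():
            recordings[folder.name] = (video, labels)
    return recordings
```

Calling `find_recordings("DATASET_PATH")` should then yield one entry per two-letter recording directory, which can be passed on to a video reader and a JSON loader of your choice.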

Data

You can extract the data automatically with the supplied extract_dataset.py script:
python extract_dataset.py --dataset_path DATASET_PATH
If you have not moved any files beforehand, you can simply run
python extract_dataset.py
The extracted data will include six directories:

  • glottal_mask: Binary mask of the glottis
  • heatmap: 2D Gaussians depicting the positions of the laser dots
  • mask: Binarized positions of the laser dots
  • png: The recording itself
  • points2d: Numpy array containing sub-pixel accurate 2D point positions
  • vf_mask: A mask of the vocal folds themselves

Each recording's JSON file contains:

  • GlottalSegmentation: The glottal segmentation as one polygon per frame, of shape FRAME_NUM x VERTICES x 2
  • GlottalMidline: The glottal midline per frame, of shape FRAME_NUM x 2 x 2
  • 2DPoints: The 2D points lying on the superior surface of the vocal folds, of shape FRAME_NUM x X x Y x 2. Points are NaN if they are not visible in the frame.
  • 3DPoints: The triangulated 3D points, of shape FRAME_NUM x N x 3
  • Offset: The X and Y offsets that must be added to the X and Y coordinates of the 2DPoints to recover the correct label positions.
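
To illustrate how Offset and 2DPoints fit together, here is a minimal sketch that adds the offsets to the 2D points and marks which laser points are visible. The function names are hypothetical, and the sketch assumes that Offset is stored as a single (x, y) pair and that the last axis of 2DPoints is ordered (x, y); check the actual files and visualize.py before relying on either assumption.

```python
import json

import numpy as np


def load_labels(label_path):
    """Load one recording's label file, e.g. XX/XX.json, into a dict."""
    with open(label_path) as f:
        return json.load(f)


def absolute_points2d(labels):
    """Return 2DPoints with Offset added, shape FRAME_NUM x X x Y x 2.

    Assumption (not confirmed by the README): "Offset" holds one (x, y)
    pair that applies to every frame, so it broadcasts over the last axis.
    """
    points = np.asarray(labels["2DPoints"], dtype=float)
    offset = np.asarray(labels["Offset"], dtype=float)
    # NaN entries (invisible points) stay NaN after the addition.
    return points + offset


def visible_mask(points):
    """Boolean mask of shape FRAME_NUM x X x Y, True where a point is visible."""
    return ~np.isnan(points).any(axis=-1)
```

Because invisible points are NaN, a mask such as `visible_mask(points)` is useful for filtering before any computation that does not tolerate NaN values.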

Examples

[Example images: CF, CM, DD, MK]

Citation

Please cite this paper if this work helps you with your research:

@InProceedings{10.1007/978-3-031-16449-1_1,
  author="Henningson, Jann-Ole and Stamminger, Marc and D{\"o}llinger, Michael and Semmler, Marion",
  title="Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy",
  booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022",
  year="2022",
  pages="3--12",
  isbn="978-3-031-16449-1"
}

You can find a PDF of the paper in the Vocal3D repository or via Springer Link.