The Human Laser Endoscopic (HLE) Dataset

This repository contains the human-laser endoscopic dataset proposed in the Paper Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy. It contains 10 high-speed in-vivo endoscopy videos of human vocal folds during phonation, that were captured using a 4000 FPS camera. While recording, a laser projection unit projected a symmetric 18 by 18 laser grid pattern into the laryngeal area. For further information about the system used to record these videos, please have a look at the paper: Endoscopic laser-based 3d imaging for functional voice diagnostics by Semmler et al.

This is a joint work of the Chair of Visual Computing of the Friedrich-Alexander University of Erlangen-Nuremberg and the Phoniatric Division of the University Hospital Erlangen.

Files

visualize.py - An example on how to use the data supplied in this dataset.
camera_calibration.json - Contains the intrinsic 3x3 camera matrix, as well as the distortion coefficients.
laser_calibration.json - Contains the lasers rotation matrix, the inter-laser angle alpha, the laser grid dimensions and lastly the 3D-translation of the laser.
[A-Z][A-Z]/[A-Z][A-Z].avi the actual recording of size 512x256 (HEIGHT x WIDTH).
[A-Z][A-Z]/[A-Z][A-Z].json labels extracted from the video. For further instructions have a look at Data.

Data

You can extract the data automatically using the supplied extract_dataset.py script, via
python extract_dataset.py --dataset_path DATASET_PATH
If you do not move files around before hand, you can also just use
python extract_dataset.py
The extracted data will include six directories:

glottal_mask: Binary mask of the glottis
heatmap: 2D Gaussians depicting the positions of the laser dots
mask: Binarized positions of the laser dots
png: The recording itself
points2d: Numpy array containing sub-pixel accurate 2D point positions
vf_mask: A mask of the vocal folds itself.

The data in the specific json files contain:

GlottalSegmentation: The Glottal Segmentation as per Frame Polygons of type FRAME_NUM x VERTICES x 2
GlottalMidline: The Glottal Midline per Frame of type FRAME_NUM x 2 x 2
2DPoints: The 2d points lying on the superior surface of the vocal folds of type FRAME_NUM x X x Y x 2. Points are of type NaN, if they are not visible inside the Frame.
3DPoints: The triangulated 3D Points of type FRAME_NUM x N x 3
Offset: The X- and Y-Offsets of the 2D Points that need to be added to the X and Y Coordinates of the 2DPoints to reconstruct the correct labels.

Examples

Citation

Please cite this paper, if this work helps you with your research:

@InProceedings{10.1007/978-3-031-16449-1_1,
  author="Henningson, Jann-Ole and Stamminger, Marc and D{\"o}llinger, Michael and Semmler, Marion",
  title="Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy",
  booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022",
  year="2022",
  pages="3--12",
  isbn="978-3-031-16449-1"
}

You can find a PDF of the Paper in the Vocal3D Repository. Or get it here: Springer Link.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

The Human Laser Endoscopic (HLE) Dataset

Files

Data

Examples

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

The Human Laser Endoscopic (HLE) Dataset

Files

Data

Examples

Citation