Commit

Update prelim results
Richard Zhao committed May 10, 2017
1 parent 0ab0250 commit 72ec939
Showing 1 changed file with 25 additions and 7 deletions.
32 changes: 25 additions & 7 deletions docs/index.md
@@ -15,6 +15,9 @@ and [Richard Zhao](mailto:[email protected]) (richardz)
![frames]({{ site.baseurl }}/public/img/temple_3.gif)
![flow]({{ site.baseurl }}/public/img/flow.gif)

The top GIF shows an input frame sequence; the bottom GIF shows the optical flow we compute from
it, which represents the movement of subjects in the input.

## Summary

We implement super-realtime (>30 fps), high-resolution optical flow on a mobile GPU platform. Fast
@@ -23,15 +26,22 @@ as object detection or image stabilization.

## Challenges

Image pyramids
The main technical challenges associated with this project involve optimizing the algorithm to run
on the NVIDIA Jetson, which has a less powerful CPU and GPU than traditional desktop machines.

Number of patches increases
Since copying memory between the device and host is the main performance bottleneck, we designed
the architecture as a pipeline that copies data only at its beginning and end, keeping all
intermediate results on the device.

Maintaining accuracy of the flow
The most significant computational bottleneck in the original implementation was the construction
of image pyramids (a series of downsampled images, and their gradients). We used CUDA kernels to
significantly improve the performance of this step.
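
To make the pyramid step concrete, here is a minimal CPU reference sketch in C++. It is not the CUDA code from this project: the `Image` and `PyramidLevel` types, the 2x2 box-filter downsampling, and the central-difference gradients are all assumptions chosen for illustration, and the actual implementation may use different filters.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical single-channel float image, stored row-major.
struct Image {
    int width;
    int height;
    std::vector<float> data;

    Image(int w, int h)
        : width(w), height(h), data(static_cast<std::size_t>(w) * h, 0.0f) {}

    float at(int x, int y) const { return data[static_cast<std::size_t>(y) * width + x]; }
    float& at(int x, int y) { return data[static_cast<std::size_t>(y) * width + x]; }
};

// One pyramid level: the image plus its horizontal and vertical gradients.
struct PyramidLevel {
    Image image;
    Image grad_x;
    Image grad_y;
};

// Halve the resolution by averaging each 2x2 block (assumed box filter).
Image downsample(const Image& src) {
    Image dst(src.width / 2, src.height / 2);
    for (int y = 0; y < dst.height; ++y) {
        for (int x = 0; x < dst.width; ++x) {
            dst.at(x, y) = 0.25f * (src.at(2 * x, 2 * y) + src.at(2 * x + 1, 2 * y) +
                                    src.at(2 * x, 2 * y + 1) + src.at(2 * x + 1, 2 * y + 1));
        }
    }
    return dst;
}

// Central-difference gradients, with neighbor coordinates clamped at the border.
PyramidLevel make_level(const Image& img) {
    PyramidLevel level{img, Image(img.width, img.height), Image(img.width, img.height)};
    auto clamp = [](int v, int hi) { return v < 0 ? 0 : (v > hi ? hi : v); };
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            const int xl = clamp(x - 1, img.width - 1);
            const int xr = clamp(x + 1, img.width - 1);
            const int yu = clamp(y - 1, img.height - 1);
            const int yd = clamp(y + 1, img.height - 1);
            level.grad_x.at(x, y) = 0.5f * (img.at(xr, y) - img.at(xl, y));
            level.grad_y.at(x, y) = 0.5f * (img.at(x, yd) - img.at(x, yu));
        }
    }
    return level;
}

// Build up to `levels` levels, stopping early if the image shrinks to nothing.
std::vector<PyramidLevel> build_pyramid(const Image& base, int levels) {
    assert(levels > 0);
    std::vector<PyramidLevel> pyramid;
    Image current = base;
    for (int i = 0; i < levels && current.width > 0 && current.height > 0; ++i) {
        pyramid.push_back(make_level(current));
        current = downsample(current);
    }
    return pyramid;
}
```

In this sketch every output pixel depends only on a small, fixed neighborhood of the input, which is why a step of this shape maps naturally onto one GPU thread per output pixel.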

Memory management
Additionally, during the gradient descent phase of the algorithm, which acts on local patches of
the image, careful management of thread blocks is required to hide the system's memory latency.

Optimizing to Jetson (which has a lackluster CPU)
Finally, all of our optimizations preserve the accuracy of the computed flow. This
makes our approach both fast and accurate enough for realtime use.

## Preliminary Results

@@ -40,8 +50,16 @@ All results are from our code running on an NVIDIA Jetson TX2.
### Optical Flow (total)

Using a hybrid GPU-CPU implementation, we achieve an end-to-end latency of roughly 10ms. This
is a speedup of roughly 10x.
is a speedup of around **10x**.

### Image pyramid construction

90 ms => 3 ms (30x speedup)
The image pyramid construction step now runs in just 3 ms, a speedup of **30x** over our
optimized CPU version, which takes 90 ms.
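
The figures in this section can be cross-checked with simple arithmetic. The sketch below uses two hypothetical helpers, `frames_per_second` and `speedup`, and assumes frames are processed one after another, so throughput is the reciprocal of the per-frame latency.

```cpp
#include <cassert>

// Frames per second if each frame takes `latency_ms` and frames run back to back.
double frames_per_second(double latency_ms) {
    assert(latency_ms > 0.0);
    return 1000.0 / latency_ms;
}

// Ratio of the old running time to the new one.
double speedup(double before_ms, double after_ms) {
    assert(after_ms > 0.0);
    return before_ms / after_ms;
}
```

Under that assumption, `frames_per_second(10.0)` gives 100 fps for the 10 ms end-to-end latency, comfortably above the 30 fps realtime threshold, and `speedup(90.0, 3.0)` gives the 30x quoted for pyramid construction.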

## Remaining Work

Before the deadline, we still have some final tuning to do on the gradient descent algorithm,
and we need to finalize the video processing pipeline (currently, our pipeline operates on two
images at a time). The performance figures should, however, remain roughly the same.
