The goal of this development is to deliver a complete video pipeline for three popular high-resolution camera sensors:
- 2-lane RPiV2.1, based on Sony IMX219, in 1280x720P@60Hz RGB888 (HD)
- 4-lane OneInchEye, based on Sony IMX283, in 1920x1080P@30Hz RGB888 (FHD)
- 2-lane OV2740, in a "Webcam" setup for Lukas Henkel's openLaptop
This targets mature, low-cost Artix7 FPGAs, using their IBUFDS, IDELAY, and ISERDES primitives in the camera front-end. These primitives are available in all IOBs, hence ubiquitous, relatively easy to work with, and already supported by opensource PNR for Xilinx Series7. This also allows future reduction of the total solution cost by migrating to Spartan7, which does not have GTP transceivers (aka "true SerDes").
To be clear: we don't plan on using any of the 3rd party D-PHY front-end silicon devices specified in XAPP894. Moreover, we won't even use the passive signal conditioning networks that Xilinx recommends. Instead, our objective is to achieve robust Signal Integrity (SI) and flawless HD/FHD video flow by relying only on on-chip resources.
This brings about multiple challenges. Compared to StereoNinja's original work, which targeted a LatticeSemi ECP5 FPGA, our target device does not come with a D-PHY receiver circuit (no wonder Artix7 costs less 😇).
On top of that, we are doing it on a meager internal-to-FPGA BRAM budget, without expending much storage. In other words, PSRAM, SDRAM, DDR or similar external chips are NOT called to the rescue. That's a major upgrade to StereoNinja's design, which realized end-to-end, camera-to-monitor streaming for only 640x480 VGA, and only in Grayscale, far below our vibrant RGB888 color palette and HD/FHD quality.
This design is simpler and more affordable, hence uniquely appealing to the makers. It also paves a path to a cost-efficient openLaptop camera subsystem.
It is indeed NLnet's objective for this project to raise StereoNinja's pioneering work to a level of performance and functionality that is an order of magnitude higher.
On the video feed sink side, we plan to showcase real-time streaming to:
- HDMI monitor
- 1Gbps Ethernet, using UDP packets, rendered on a PC with VLC Player
The follow-on "Webcam" project (aka Phase2) is to add UVC (USB2.0 Video Class) to this list. In prep for this future work, we plan to develop a number of add-on functions:
- Enable OV2740 camera chip
- Image Signal Processing (ISP) for Webcam
  - White Balance, Color Correction, Gamma Correction
- Video Compression for Webcam
  - JPEG. Not needed for 1GE, but compression is a must-have for 1024P@30Hz over USB2
While our design is pushing Artix7 to its limits, it's these very silicon limits that stand in the way of getting more from the given imaging sensors. Recall that even StereoNinja's generally faster and better LatticeSemi FPGA cannot generate HDMI at 1920x1080@60Hz.
Using Vivado tool chain, we were able to bring this design to the point where the only remaining factor preventing further resolution increase is the Max Specified Toggle Rate for Artix7 I/O structures and clock distribution networks.
We are thrilled to use openXC7 toolkit, including its web-based CI/CD flow. That's both for the security of images taken, and to help openXC7 attain the level of robustness found in commercial / proprietary CAE tools, Xilinx Vivado in particular. In that sense, OpenEye-CamSI is the continuation of our TetriSaraj, which was the first openXC7 test case for a design more complex than a mere blinky.
Our goal is to bring to light the issues that arise from openXC7's first-ever exposure to demanding, cutting-edge designs like this one.
It is important to note that, in its current state, openXC7 is rather immature, without even basic timing awareness, let alone timing-driven optimizations. It has never been used for designs that push the underlying silicon to the max. The as-is openXC7 is simply not adequate for proper timing closure.
While another project is underway and looking to plug this major opensource STA gap, it won't be ready for our Phase1. We're therefore planning Phase2 and Phase3, hoping to try this new timing closure prowess... Stress-testing, congesting, overwhelming it, all for the sake of making it more amenable to realizing higher Utilization and Fmax metrics with silicon at hand.
The choice of our development platform was governed by the benefit for the greater community. The boards were selected for their opensource CRUVI connectivity spec. Yes, they are hardly used and don't come with support collateral found on the more popular hardware platforms. That's exactly why we opted for them!
We have indeed come across quite a few board problems and idiosyncrasies, spending a fair amount of time chasing issues that simply should not have been there. Still, since those are both opensource and EU products, this extra effort was for a good cause. We are certain that this project will help increase their visibility, and boost their acceptance rate among open makers.
- Familiarize with Trenz hardware platform: Connectivity, clocking, power, etc.
- Bring up Blinky on Trenz
- Standalone HDMI image generation: 1280x720P@60Hz RGB888 (HD)
- Standalone HDMI image generation: 1920x1080P@30Hz RGB888 (FHD@30Hz)
- Standalone HDMI image generation: 1920x1080P@60Hz RGB888 (FHD@60Hz). Established that FHD@60Hz is physically impossible with given silicon
- Experiments with IMX219 configuration and resolution options
- Sniff Raspberry Pi interactions with Camera
- Familiarize with libcamera drivers
- Experiments with LVDS and termination schemes. How to do better than XAPP894, sharing insights with Lukas
- Test opensource 4-lane adapter, sharing feedback with Edmund
- Full redesign, fixing bugs and expanding scope, to now include 2 and 4 lanes
- Clock Detection logic without standard LP I/O
- SYNC Decoding logic and Byte Alignment
- Word Alignment
- Header Decoding and Stripping
- Acquire static image from Camera, transfer it to DualPort BRAM, then HDMI
- Uncovered & debugged crosstalk problems on VHDPlus CRUVI adapter
- Found Trenz signal inversions and inconsistencies, then shared with Antti
- HD video transfer from Camera to HDMI - At first jerky, with visible frame loss
- Found CDC bug in opensource AsyncFIFO, sharing insights with IP owners
- Debayering logic for Color Space Conversion
- Synchronization logic for smooth video streaming, w/o external storage
For this first play, the hardware was used in the following config:
- Trenz Carrier Card (TEB0707-02)
- Trenz 4x5 SoM with Artix-7 FPGA (TE0711-01)
- VHDPlus HDMI + 2-lane CSI Camera adapter
- Raspberry Pi V2.1 camera (Sony IMX219)
Our HDMI image generator is limited by the toggle rate that's realistically possible with Artix7 clock and I/O primitives. The max we can get from it is:
- 720P@60Hz
- 1080P@30Hz
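The toggle-rate ceiling can be made concrete with a short calculation. The sketch below assumes the standard CEA-861 total frame sizes (active area plus blanking) and 10 TMDS bits per pixel on each channel. The table and the `hdmi_rates` helper are ours, for illustration only, and are not taken from the RTL.

```python
# Pixel clock and per-channel TMDS bit rate for the HDMI modes discussed above.
# Assumption: standard CEA-861 totals (active + blanking), not values read from our RTL.
MODES = {
    "720P@60Hz":  (1650, 750, 60),    # (total pixels per line, total lines, frames/s)
    "1080P@30Hz": (2200, 1125, 30),
    "1080P@60Hz": (2200, 1125, 60),
}

def hdmi_rates(h_total, v_total, fps):
    pixel_clk_hz = h_total * v_total * fps
    tmds_bps = pixel_clk_hz * 10      # TMDS encodes each 8-bit colour value as 10 bits
    return pixel_clk_hz, tmds_bps

for name, timing in MODES.items():
    pclk, bps = hdmi_rates(*timing)
    print(f"{name}: pixel clock {pclk / 1e6:.2f} MHz, {bps / 1e6:.1f} Mbps per channel")
```

Under these assumptions, both supported modes land on the same 74.25 MHz pixel clock, while 1080P@60Hz doubles it to 148.5 MHz, i.e. 1485 Mbps on every TMDS channel.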
More about this and silicon limitations in HDMI issue. Here is the test image that our HDMI RTL outputs on its own, w/o camera connected to it:
There are many configurable registers in the IMX219 imaging sensor. Having familiarized ourselves with them, both by sniffing RPi I2C transactions and by running our own experiments, we've settled on 720P@60Hz. The I2C Controller was developed from scratch, and is used to load the camera registers. More on it in I2C issue.
Here is an illustration of I2C waveforms, i.e. our Control Plane protocol.
Sony IMX219 camera sensor is used for image acquisition. It is connected to FPGA with a 15-pin flex cable (FFC), using VHDPlus CRUVI module. We recommend checking our blog for additional detail on this topic.
It turned out that the 4-lane CRUVI connector had a serious design flaw, essentially shorting system power. Having identified its root cause, we had to fully redesign it. We have also run into Trenz I2C wiring and supply complications related to onboard CPLD. Luckily, we managed to find a way around it without having to open the CPLD and change its internals.
The VHDPlus CRUVI carries three 100 Ohm termination resistors, one for clock, plus two for data, as shown below:
Location of these resistors close to data source is a major SI problem. Termination must be placed at the end of Transmission Line (TL), next to sink. Yet, on this system, the termination is not only misplaced, but there are also three connectors in the path of high speed signals:
- Camera to CRUVI (that's where the stock termination is placed)
- CRUVI to Carrier Card
- Carrier to FPGA SOM Card.
Consequently, we had to remove the stock termination and replace it with internal-to-FPGA resistors, so essentially relocating it to the very end of TL.
When Artix7 internal termination is used in connection with the LVDS_25 IOSTANDARD, it is important to set the corresponding bank's VCCIO to 2.5V. Only then will the differential resistance be 100 Ohms.
On Trenz hardware, that is done in the following way:
- switch DIP 2 to ON state, to set the IOV to 2.5V
- use Jumpers J14, J16, and J17 to connect VCCIO to IOV.
Once all these hardware and board problems were behind us, we turned our focus back to RTL design.
Given the goal to minimize overhead and eliminate the need for LP pins (see XAPP894), the RTL had to provide a clever substitute for the LP I/O that a standard Camera Serial Interface has and our design lacks. After some experimentation, we settled on a scheme that detects blanking intervals between frames by using Clock_Lock_FSM with thresholds and timeouts. The output of this Clock_Lock_FSM is then used as the global reset for the camera side of the pipeline. That is to say, the video pipeline is out of reset only when the camera clock is active and stable.
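To illustrate the idea of clock detection with thresholds and timeouts, here is a small behavioural model in Python. It is a hypothetical sketch and not a transcription of our Clock_Lock_FSM: the class name, the window-based edge counting and all three parameter values are assumptions chosen only to show the mechanism.

```python
# Behavioural model of a clock-activity detector (hypothetical, not the project RTL).
# Each call to step() represents one fixed window of the free-running reference clock,
# and `edges` is how many camera-clock edges were counted inside that window.
class ClockLockModel:
    def __init__(self, min_edges=8, lock_after=4, unlock_after=2):
        self.min_edges = min_edges        # threshold: edges needed to call a window active
        self.lock_after = lock_after      # consecutive active windows required to lock
        self.unlock_after = unlock_after  # consecutive idle windows (timeout) to unlock
        self.active_run = 0
        self.idle_run = 0
        self.locked = False

    def step(self, edges):
        if edges >= self.min_edges:
            self.active_run += 1
            self.idle_run = 0
            if self.active_run >= self.lock_after:
                self.locked = True
        else:
            self.idle_run += 1
            self.active_run = 0
            if self.idle_run >= self.unlock_after:
                self.locked = False
        return self.locked   # the camera-side pipeline stays in reset while this is False


fsm = ClockLockModel()
windows = [0, 0, 10, 10, 10, 10, 10, 0, 10, 0, 0, 10]
trace = [fsm.step(e) for e in windows]
print(trace)
```

In this model a single idle window does not drop the lock, whereas two in a row do, which is the kind of hysteresis that thresholds and timeouts are meant to provide.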
To have fluid and seamless video, we need to pass Pixel data, together with Line and Frame synchronization pulses, from the Camera clock domain to the HDMI clock domain. Aiming for a low-cost solution, this Clock Domain Crossing (CDC) and the Timebase Handoffs are accomplished using Line Buffering instead of full Frame Buffering, thus saving storage resources. More on this topic in Line buffering issue.
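The storage argument is easy to quantify. The sketch below compares the size of one RGB888 line with one full RGB888 frame for both target resolutions. The `buffer_bytes` helper is ours, and the number of lines the design actually buffers is not implied by it.

```python
# Storage needed for frame buffering versus line buffering, RGB888 (3 bytes per pixel).
BYTES_PER_PIXEL = 3

def buffer_bytes(width, height):
    line = width * BYTES_PER_PIXEL
    frame = line * height
    return line, frame

for name, (w, h) in {"HD": (1280, 720), "FHD": (1920, 1080)}.items():
    line, frame = buffer_bytes(w, h)
    print(f"{name}: one line = {line} B, one frame = {frame / 2**20:.2f} MiB")
```

A full frame runs into megabytes, while a line is only a few kilobytes, which is what makes an on-chip BRAM-only solution plausible.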
In addition to AsyncFIFO for csi_clock->hdmi_clock CDC, the signals that play crucial role in the Timebase Handoffs process are:
- csi_line_in
- hdmi_line_in
They mark the beginning of each new scan line in incoming video from Camera, as well as outgoing line to HDMI.
Furthermore, the AsyncFIFO is kept in reset when either the Camera or HDMI is Out-Of-Frame. It is through this fifo_areset_n and hdmi_reset_n that we are forcing HDMI to track the Camera. rgb2hdmi is the bridge between the Camera and HDMI+GE worlds. The timing diagram below contains additional detail.
In all honesty, it took us a bit of trial-and-error to get it right. That was to some extent due to CDC bug we found in the fullness count of AsyncFIFO, which is the IP block we pulled from one of the opensource libraries.
In the end, when everything was tuned, and AsyncFIFO CDC problem factored out of the solution, the final result came to be as nice and polished as follows:
The following diagram illustrates design clocking scheme and block structure:
The design operates off a single external 100MHz clock, from which we generate a 200MHz refclk for IDELAY and for detection of camera activity. The camera clock comes in two forms:
- CSI Bit clock, for sampling and capturing incoming DDR data
- CSI Byte clock (= Bit clock / 4) for the rest of CSI video pipeline
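The divide-by-4 follows from DDR sampling: two bits are captured per Bit clock cycle, so one byte takes four cycles. A minimal sketch of that relationship, where the `csi_clocks` helper and the 912 Mbps lane rate are example assumptions rather than values measured on our setup:

```python
# Relationship between CSI lane rate, DDR bit clock and byte clock.
def csi_clocks(lane_rate_bps):
    bit_clk_hz = lane_rate_bps / 2   # DDR: two bits are captured per bit-clock cycle
    byte_clk_hz = bit_clk_hz / 4     # 8 bits per byte / 2 bits per cycle = 4 cycles
    return bit_clk_hz, byte_clk_hz

# Example lane rate only; the real value depends on how the sensor is configured.
bit_clk, byte_clk = csi_clocks(912e6)
print(f"bit clock {bit_clk / 1e6:.0f} MHz, byte clock {byte_clk / 1e6:.0f} MHz")
```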
The frequency of camera Bit and Byte clocks is the function of sensor resolution. A PLL on HDMI side generates two specific clocks from the common 100MHz input:
- Parallel, for RGB888 HDMI data
- Serial, (x5) for OSERDES and transmission of multiplexed video to monitor
The frequency of these two clocks is the function of HDMI resolution. We provide Verilog macros in the central top_pkg.sv for selection of HDMI resolution.
The datapath is a straight video pipeline with option for width scaling via NUM_LANE parameter, which is also located in the central top_pkg.sv.
The pipeline starts with circuits for capturing the serial Double Data Rate (DDR) camera stream, eye-centering it using IDELAY, converting it to parallel format with ISERDES, then searching for Byte boundaries, followed by Word alignment, followed by Packet Header parsing and extraction of video Payload pixels.
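The Byte boundary search and Header parsing steps can be sketched in software. The model below looks for the D-PHY sync byte (0xB8, sent LSB first) in a raw bit stream, then splits a CSI-2 packet header into its fields. It is a single-lane illustration under our own assumptions: the function names and the test vector are hypothetical, the ECC byte is a placeholder that is not checked, and real hardware additionally relies on the preceding run of zeros to avoid false matches.

```python
# Byte alignment on a D-PHY lane: find the sync byte 0xB8 in a raw bit stream.
# Bits are listed in arrival order, and D-PHY sends each byte LSB first.
SYNC = 0xB8

def bits_lsb_first(byte):
    return [(byte >> i) & 1 for i in range(8)]

def byte_from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

def align(stream):
    """Return (offset, payload_bytes) for the first sync byte found, else None."""
    for offset in range(len(stream) - 7):
        if byte_from_bits(stream[offset:offset + 8]) == SYNC:
            body = stream[offset + 8:]
            usable = len(body) - len(body) % 8
            return offset, [byte_from_bits(body[i:i + 8]) for i in range(0, usable, 8)]
    return None

def parse_header(hdr):
    """Split a 4-byte CSI-2 header: Data ID, 16-bit Word Count (LSB first), ECC."""
    data_id, wc_lo, wc_hi, ecc = hdr
    return {"vc": data_id >> 6, "dt": data_id & 0x3F, "wc": wc_lo | (wc_hi << 8), "ecc": ecc}

# Test vector: leading zeros, then sync, then a 4-byte header (illustrative values only).
header = [0x2B, 0x40, 0x06, 0x00]
stream = [0] * 5
for b in [SYNC] + header:
    stream += bits_lsb_first(b)

offset, payload = align(stream)
print(offset, parse_header(payload))
```

Here the sync byte is found 5 bits into the stream, and the header decodes to data type 0x2B with a Word Count of 1600 bytes.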
It is only at this point, where video is unpacked, that we may engage in ISP. The ISP is a set of functions that could be as elaborate as one is willing to invest in them. Here is a good read on it. The extent of ISP for this project is clearly defined. The future Phase2 and Phase3 would add more.
Debayer is the first ISP function we implemented. Without it, the displayed colors would have looked weird. More on it in Debayer issue.
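For readers new to the topic, here is a minimal sketch of bilinear debayering, a generic technique that fills in the two missing colors at each pixel site by averaging the nearest neighbors of that color in a 3x3 window. It assumes an RGGB mosaic and is not a description of our RTL. Both function names are hypothetical.

```python
# Bilinear debayer of interior pixels (generic technique, RGGB layout assumed).
def colour_at(row, col):
    """Colour of the Bayer site at (row, col) for an RGGB mosaic."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

def debayer_pixel(raw, row, col):
    """Return (R, G, B) at an interior site from its 3x3 neighbourhood."""
    sums = {"R": 0, "G": 0, "B": 0}
    counts = {"R": 0, "G": 0, "B": 0}
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            c = colour_at(row + dr, col + dc)
            sums[c] += raw[row + dr][col + dc]
            counts[c] += 1
    own = colour_at(row, col)
    sums[own], counts[own] = raw[row][col], 1   # keep the site's own sample unaveraged
    return tuple(sums[c] // counts[c] for c in "RGB")

# Flat test scene: every red site reads 100, every green 50, every blue 20.
VALUE = {"R": 100, "G": 50, "B": 20}
raw = [[VALUE[colour_at(r, c)] for c in range(4)] for r in range(4)]
print(debayer_pixel(raw, 1, 1), debayer_pixel(raw, 1, 2))
```

On this flat scene every interior pixel reconstructs to (100, 50, 20), regardless of which color its own site carries.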
- Repeat the same for the 4-lane IMX283 camera sensor
- Step-by-step introduce the following 3 ISP elements:
- Debayer
- Dead Pixel Management
- Manual Exposure Control
- Implement another (lower) resolution of our choice
Configuring the IMX283 sensor for different resolutions requires precise adjustment of its registers. For each data entry, the first four digits represent the register address, while the last two digits indicate the value to be written. Below, we describe the configuration process for achieving both 720p and 1080p resolutions at 60 FPS using Mode 3 and Mode 2, respectively.
This sensor requires specific delays after setting some registers. In the .mem files, this is implemented as an additional byte appended to the three bytes representing the address and value of each register. These delays are crucial for ensuring proper operation and stability of the camera. More details can be found in our issue.
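As a reading aid for the register listings in this section, the sketch below parses entries in the "address value" form and rebuilds 16-bit fields that are split across two consecutive byte registers, low byte first. The helper names are ours, and the delay byte of the .mem format is deliberately left out because its encoding is not described here. The sample entries are the Y_OUT_SIZE, VMAX and HMAX values of the 720p configuration.

```python
# Parse "AAAA VV" register entries and rebuild 16-bit fields stored as low/high byte pairs.
def parse_entries(text):
    regs = {}
    for line in text.split("\n"):
        fields = line.split()
        if len(fields) != 2:
            continue                      # skip blank or malformed lines
        addr, value = int(fields[0], 16), int(fields[1], 16)
        regs[addr] = value
    return regs

def field16(regs, low_addr):
    """Combine the byte at low_addr (LSB) with the byte at low_addr + 1 (MSB)."""
    return regs[low_addr] | (regs[low_addr + 1] << 8)

entries = """
302f d2
3030 02
3038 f7
3039 09
3036 d6
3037 01
"""
regs = parse_entries(entries)
print(field16(regs, 0x302F), field16(regs, 0x3038), field16(regs, 0x3036))
```

This prints 722, 2551 and 470, matching the decimal values quoted for the 720p mode.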
For 720p resolution, Mode 3 is used, which applies 3x3 horizontal/vertical binning. This method combines pixel values within a 3x3 grid for each raw color channel, reducing resolution while maintaining image quality.
The following registers enable Mode 3 operation:
3004 1e
3005 18
3006 10
3007 00
- Y_OUT_SIZE: Set the vertical size to 722 (0x2D2) to account for skipping 2 rows for debayer processing:
302f d2
3030 02
- WRITE_VSIZE: Set the vertical output size to 726 (0x2D6):
3031 d6
3032 02
- H_TRIMMING_START: Start the horizontal cropping at position 840 (0x348):
3058 48
3059 03
- H_TRIMMING_END: End the horizontal cropping at position 4584 (0x11E8):
305a e8
305b 11
These settings result in 1920 8-bit data points per line, equivalent to 1280 12-bit raw data points used for color processing.
- VMAX: Set to 2551 (0x9F7) to ensure frame blanking:
3038 f7
3039 09
- HMAX: Set to 470 (0x1D6) to ensure line blanking:
3036 d6
3037 01
These settings ensure a frame rate of 60 FPS.
For 1080p resolution, Mode 2 is used, which applies 2x2 horizontal/vertical binning. This method combines pixel values within a 2x2 grid for each raw color channel, maintaining higher resolution while reducing the overall pixel count.
The following registers enable Mode 2 operation:
3004 0d
3005 11
3006 50
3007 00
- Y_OUT_SIZE: Set the vertical size to 1082 (0x43A):
302f 3a
3030 04
- H_TRIMMING_START: Start the horizontal cropping at position 840 (0x348):
3058 48
3059 03
- H_TRIMMING_END: End the horizontal cropping at position 4584 (0x11E8):
305a e8
305b 11
These settings result in 2880 8-bit data points per line, equivalent to 1920 12-bit raw data points used for color processing.
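The relation between 12-bit raw pixels and 8-bit data points is simple packing arithmetic: two 12-bit pixels occupy three bytes. A short check, using a helper name of our own choosing:

```python
# RAW12 packing: every pixel is 12 bits, so two pixels occupy three bytes on the link.
def raw12_line_bytes(pixels_per_line):
    bits = pixels_per_line * 12
    assert bits % 8 == 0, "a RAW12 line must hold an even number of pixels"
    return bits // 8

print(raw12_line_bytes(1280), raw12_line_bytes(1920))
```

This gives 1920 bytes per line for 720p and 2880 bytes per line for 1080p. Note that 1920 shows up in two roles: as the byte count of a 720p line and as the pixel count of a 1080p line.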
- VMAX: Set to 2234 (0x8BA) to ensure frame blanking:
3038 ba
3039 08
- HMAX: Set to 540 (0x21C) to ensure line blanking:
3036 1c
3037 02
These settings ensure a frame rate of 60 FPS.
Resolution | Mode | Binning | Y_OUT_SIZE | VMAX | HMAX | Frame Rate |
---|---|---|---|---|---|---|
720p | Mode 3 | 3x3 | 722 | 2551 | 470 | 60 FPS |
1080p | Mode 2 | 2x2 | 1082 | 2234 | 540 | 60 FPS |
These configurations leverage the IMX283's advanced binning capabilities to optimize performance while maintaining high image quality.
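The frame rate in the table can be cross-checked from VMAX and HMAX. The sketch below assumes that HMAX counts ticks of a 72 MHz clock. That figure is our assumption and is not stated in this document, although it is consistent with the 60 FPS result. The `frame_rate` helper is likewise hypothetical.

```python
# Frame rate implied by VMAX and HMAX.
# Assumption (not stated in this document): HMAX counts ticks of a 72 MHz clock.
HMAX_CLOCK_HZ = 72_000_000

def frame_rate(vmax, hmax, clock_hz=HMAX_CLOCK_HZ):
    return clock_hz / (vmax * hmax)

for name, vmax, hmax in (("720p", 2551, 470), ("1080p", 2234, 540)):
    print(f"{name}: {frame_rate(vmax, hmax):.2f} FPS")
```

Under that assumption, both modes come out within half a frame per second of the nominal 60 FPS.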
You can watch a demonstration of this setup in action at the following link:
One of the key challenges in working with high-resolution sensors, like the IMX283, is managing dead pixels. These are defective pixels on the sensor that can negatively impact image quality. On this system, several strategies were implemented to effectively eliminate the impact of dead pixels.
Due to the limitations of the Artix-7 FPGA, it is not possible to support the full resolution output of the IMX283 sensor. To address this, the resolution was reduced while still preserving image quality by utilizing all available pixels on the sensor. This was achieved through binning, a preprocessing step offered by the sensor itself.
- For 720p Resolution: A 3x3 binning method is used. From an 8x8 matrix of pixels, a 2x2 matrix is generated. This means that each new pixel is a combination of several neighboring pixels.
- For 1080p Resolution: A 2x2 binning method is used. From a 4x4 matrix of pixels, a 2x2 matrix is generated, combining adjacent pixels into one.
This process is illustrated in the image below:
By combining neighboring pixels, binning effectively acts as a form of filtering. This directly mitigates the effect of dead pixels, as their contribution is averaged out with surrounding pixel data.
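A tiny numeric example shows how much averaging helps. It assumes that binning takes a plain average of the same-color pixels in a group and that a dead pixel reads zero. Both are simplifications for illustration, and `binned_value` is a hypothetical helper.

```python
# Effect of averaging on a dead (stuck-at-zero) pixel.
def binned_value(samples):
    return sum(samples) / len(samples)

healthy = [200] * 9                 # a 3x3 group of same-colour pixels, all working
with_dead = [200] * 8 + [0]         # the same group with one dead pixel

print(binned_value(healthy), round(binned_value(with_dead), 1))
```

Instead of a black dot, the dead pixel now costs the binned result about 11% of its value (177.8 instead of 200), which is far less visible.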
In addition to binning, the debayering algorithm implemented on this system further reduces the impact of dead pixels. The algorithm performs additional averaging across pixels to reconstruct color information. This further minimizes the possibility of dead pixels appearing in the final image.
Manual exposure control in the IMX283 sensor is achieved through the configuration of the SVR (Start Vertical Readout) and SHR (Start Horizontal Readout) registers. These registers determine the timing of the sensor's integration period, which is critical for controlling exposure and image brightness.
In our configuration we use:
- SVR = 0: This value sets the integration start vertical period to the minimum possible value, effectively aligning the start of integration with the beginning of the readout period.
- SHR = 16: This value specifies the integration start horizontal period. A higher SHR value reduces the integration time by shifting the horizontal readout window closer to the end of the frame.
By selecting these values, the integration time is minimized, which is suitable for scenarios with bright lighting conditions or high frame rate requirements.
- Add 1GE as second video sink, then move display to remote PC, via UDP VLC
- Document implementation via block diagram and project repo
- Port final design from Vivado to openXC7
- Simulate it with Verilator and cocoTB, in CI/CD system
- Document scripts and flows used in this process
(Chili.CHIPS*ba team)
- Enable OV2740 camera chip
(Webcam team)
- Add 3 new ISP functions
  - White Balance
  - Color Correction
  - Gamma Correction
- and JPEG video compression
The hardware platform originally selected for this project proved to be a capital miss and source of most of our troubles.
We ended up having to debug board and connectivity issues on PCBAs that no one used before, or combinations thereof that the manufacturer never tested. Be it incorrect termination resistors, straight shorts, cold solder joints and opens on flaky connectors, signal integrity degradation from too much modularity / discontinuity on the path, swaps of differential pairs, or wiring high-speed clocks to the non-Clock Capable (CC) FPGA pins, the share of board issues we had to deal with was overwhelming.
On top of that comes the scarce availability of Trenz boards, with lead times in excess of three months for a simple passive CRUVI Debug Card. All that has cost us dearly in time and effort. Yes, CRUVI is open-source. Yes, Trenz is European. But we cannot afford to keep investing and losing so much in order to support that cause.
Going forward, we are parting ways with the Trenz CRUVI system, and switching to Puzhitech PA-StarLite. This compact card brings everything we need for video projects right off the bat, within the basic package, including a 2-lane MIPI CSI connector, HDMI output and 1Gbps Ethernet. No need for multiple add-on cards and connectors to put together a usable system that's 3x more expensive, more fragile, and not in stock.
This card also comes with solid expansion potential via two 40-pin standard 100mil headers. They are mechanically robust, physically accessible for debugging, and can still carry relatively high-speed signals thanks to short, balanced wiring on the mainboard.
While Puzhitech board already comes with 15-pin 0.5mm FPC for 2-lane MIPI CSI, the 40-pin 100mil headers can be used to hook up additional RPi cameras via its 15-pin 1mm FPC, as well as 22-pin 0.5mm connector for the 4-lane interface. Below is a picture of the adapters we used. We recommend fitting them with DIP connectors that have longer legs, so that they protrude on the bottom side, and also serve as attachment points for oscilloscope. That alleviates the need for the €36 Trenz Debug Connector and its impossibly long lead time of 3+ months.
- The project proposal is under construction
We are grateful to:
- NLnet Foundation, whose sponsorship gave us the opportunity to work on this fresh new take on FPGA video processing.
- StereoNinja, whose project has set the stage for this major upgrade of opensource camera performance and functionality, as well as its expansion from LatticeSemi to Xilinx ecosystem.