Reliable high-bandwidth video capture with a frame grabber
Frame Grabber White Paper (München)
With a PC-based system running Windows®, heavy computational workloads and system traffic often interfere with the activities that are essential for reliable video capture. A video capture application generates numerous system events, which must be handled with minimal latency to ensure that all video images arrive successfully in system memory. Due to its non-deterministic nature, Windows® does not inherently guarantee a timely response to the events associated with video capture. Consequently, image capture reliability can be adversely affected by heavy user-mode activities (i.e., user applications) occurring during video capture.
The video capture throughput (i.e., amount of pixel data that must be acquired in a given amount of time), especially from high frame rate cameras, also has an effect on reliability by increasing the frequency of these events. A higher event frequency increases the likelihood of lost images. More importantly, most systems today are designed to fully occupy the PC's resources (i.e., CPUs, memory and peripheral I/O) with image processing, analysis and other tasks, such as networking, which aggravate the situation. A solution that permits reliable high-throughput video capture and processing irrespective of the load on the system is therefore needed.
Matrox Radient and Matrox Imaging Library (MIL)
The Matrox Radient eCL is a family of high-performance frame grabbers with multiple high-bandwidth video inputs. The Matrox Radient eCL supports all configurations from the Camera Link® interface standard: Base, Medium and Full (including 10-tap). It is capable of simultaneously capturing from up to four Base video cameras (i.e., up to 4 x 255 MB/s or 1,020 MB/s total with Matrox Radient eCL-QB) or two Full video cameras (i.e., up to 2 x 850 MB/s or 1,700 MB/s total with Matrox Radient eCL-DF).
To handle these large input data rates, the Matrox Radient frame grabber is equipped with a sizable amount of on-board memory, typically 2 GB, for buffering direct memory access (DMA) transfers to the host system. A PCIe® x8 point-to-point bi-directional host interface, capable of 2 GB/s peak, transfers video data to the host without discarding images.
In addition, the Matrox Radient offloads repetitive CPU-intensive tasks using a dedicated Altera® Stratix III/IV FPGA device. The FPGA processing functions include, but are not limited to, spatial and temporal filtering, gain and offset correction, dead pixel correction, optical and perspective distortion correction, Bayer color interpolation, color space conversions, and frequency domain transformations. This offload capability frees valuable CPU resources for the rest of the application and also accelerates the overall application.
The Matrox Radient is programmed using Matrox Imaging Library (MIL), a comprehensive collection of software tools for developing industrial imaging applications. MIL offers an extensive list of programming functions for image capture, processing, analysis, annotation, display and archiving. These functions are carefully optimized to address the severe time constraints encountered in many applications.
Specifically designed to maximize image capture performance, MIL can perform image capture control in Windows® kernel mode to ensure greater determinism and faster response time. In addition, MIL’s multi-buffered mechanism supports callback functions for implementing simultaneous capture and processing. This further strengthens application reliability by limiting, if not eliminating, the occurrences of discarded images when the host image processing time occasionally exceeds the image capture time.
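The effect of multi-buffering can be illustrated with a small simulation. The sketch below is a toy model, not MIL code: the function name, the timing values and the assumption that a frame needs a free buffer at the instant it arrives are all hypothetical simplifications. It shows how a deeper ring of buffers absorbs an occasional host processing overrun that would otherwise cause discarded images.

```python
def count_dropped(proc_times, frame_period, n_buffers):
    """Toy model of a grab ring (hypothetical, not the MIL API).

    Frame i arrives at i * frame_period and needs a free buffer.
    The host processes held frames one at a time, in order; a buffer
    is released when its frame has been processed.
    Returns the number of frames discarded for lack of a free buffer.
    """
    release_times = []   # times at which held buffers become free again
    host_free_at = 0.0   # time at which the host finishes its current work
    dropped = 0
    for i, proc in enumerate(proc_times):
        arrival = i * frame_period
        # Reclaim buffers whose processing finished before this frame arrived.
        release_times = [t for t in release_times if t > arrival]
        if len(release_times) >= n_buffers:
            dropped += 1          # every buffer is still in use
            continue
        start = max(arrival, host_free_at)
        host_free_at = start + proc
        release_times.append(host_free_at)
    return dropped


# Processing normally takes half a frame period; frame 2 takes three periods.
times = [0.5, 0.5, 3.0, 0.5, 0.5, 0.5, 0.5, 0.5]
print(count_dropped(times, frame_period=1.0, n_buffers=1))  # 2
print(count_dropped(times, frame_period=1.0, n_buffers=4))  # 0
```

With a single buffer, the two frames that arrive during the long processing step are lost. With four buffers they wait in the ring and the host catches up afterwards, because its average processing time is below the frame period.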
Command queuing: the key
For the study on reliable high-bandwidth capture using the Matrox Radient, standard MIL functions were used to queue a recurring sequence of commands onto the board through the software driver: image capture, image processing (on the FPGA) and image buffer transfer (using DMA to copy from on-board to host memory). The ability to queue commands onto the Matrox Radient is key to its success in delivering reliable high-bandwidth image capture and on-board processing performance with minimum system variability (jitter).
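Why pre-queued commands reduce sensitivity to host latency can be sketched with another toy simulation. This is an illustrative model under stated assumptions, not the Matrox driver's behavior: the board is assumed to consume one queued command per frame, a frame is counted as missed if no command is queued when it starts, and the host is assumed to queue a replacement command some latency after each command completes. All names and values are hypothetical.

```python
import heapq


def frames_missed(host_latencies, frame_period, depth):
    """Toy model of command pre-queuing (hypothetical, not the driver).

    depth: number of commands queued on the board before the first frame.
    host_latencies[i]: delay before the host queues a replacement for the
    command used by frame i, measured from that command's completion.
    Returns the number of frames that found the queue empty.
    """
    pending = depth      # commands currently queued on the board
    requeue_times = []   # min-heap of times at which replacements arrive
    missed = 0
    for i, latency in enumerate(host_latencies):
        t = i * frame_period
        # Collect replacement commands the host has queued by now.
        while requeue_times and requeue_times[0] <= t:
            heapq.heappop(requeue_times)
            pending += 1
        if pending == 0:
            missed += 1
            continue
        pending -= 1
        # The command completes one frame period later; the host then
        # needs `latency` to queue the next one.
        heapq.heappush(requeue_times, t + frame_period + latency)
    return missed


# Host usually responds in a quarter of a frame period, with one long stall.
latencies = [0.25, 0.25, 2.5, 0.25, 0.25, 0.25, 0.25, 0.25]
for depth in (1, 2, 4):
    print(depth, frames_missed(latencies, frame_period=1.0, depth=depth))
# prints: 1 5, then 2 1, then 4 0
```

In this model a queue depth of one makes every frame depend on the host's response time, while a deeper queue lets the board keep working through a host stall of several frame periods.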
Test setup and results
The test equipment used for the study included:
- PC: Dual Intel Xeon E5645 CPUs (12 cores) with 24 GB DDR3 SDRAM running 64-bit Windows Vista.
- Frame Grabber: Matrox Radient eCL-DF (i.e., dual-Full) board in the PC and connected to a Camera Link video simulator.
- Video source: Camera Link video simulator generating two 1K x 1K x 8-bit streams, each sent over 8 taps at 85 MHz and producing 630 fps, or 630 MB/s, of image data (for a total of 1,260 MB/s).
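The data rates quoted for the video source and for the Camera Link configurations can be cross-checked with simple arithmetic. The sketch below assumes one 8-bit pixel per tap per pixel clock at 85 MHz, and treats a 1K x 1K x 8-bit frame as roughly 1 MB (which is what equating 630 fps with 630 MB/s implies); the function and constant names are hypothetical.

```python
PIXEL_CLOCK_MHZ = 85      # pixel clock used in the test setup
FRAME_MB = 1              # assumption: 1K x 1K x 8-bit frame is about 1 MB
PCIE_PEAK_MB_S = 2000     # 2 GB/s peak host interface


def tap_bandwidth_mb_s(taps, clock_mhz=PIXEL_CLOCK_MHZ, bytes_per_tap=1):
    """Peak link rate in MB/s, assuming one byte per tap per clock."""
    return taps * clock_mhz * bytes_per_tap


base_peak = tap_bandwidth_mb_s(3)     # 255 MB/s (Base figure)
full_peak = tap_bandwidth_mb_s(10)    # 850 MB/s (Full, 10-tap figure)
test_peak = tap_bandwidth_mb_s(8)     # 680 MB/s ceiling for the 8-tap streams

stream_rate = 630 * FRAME_MB          # 630 MB/s per stream
total_rate = 2 * stream_rate          # 1,260 MB/s for both streams

print(base_peak, full_peak, test_peak)
print(total_rate, total_rate < PCIE_PEAK_MB_S)
print(round(stream_rate / test_peak, 3))  # fraction of the 8-tap ceiling used
```

Under these assumptions each test stream runs at about 93% of the 8-tap peak rate, and the combined 1,260 MB/s stays below the 2 GB/s peak of the PCIe x8 host interface.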
A MIL-based benchmarking application simultaneously captured from two video streams to the Matrox Radient's memory. To demonstrate the image processing offload capability, the first stream was subjected to a gain-and-offset operation performed by the processing FPGA on the Matrox Radient. Both streams were then transferred to host memory using the Matrox Radient's DMA engine, which operated independently from the host.
The image capture, processing and transfer-to-host activities consumed 8 x 630 MB/s or 5,040 MB/s of the Matrox Radient's on-board memory bandwidth. For the first stream, the on-board memory was accessed once for the initial capture, three times for the gain and offset operation (i.e., image, gain values and offset values), once for writing the result back and once more for transferring the image to the host. For the second stream, the on-board memory was accessed once for the initial capture and once for transferring the image to the host.
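The factor of eight follows from counting the on-board memory accesses per frame for each stream. The tally below restates that accounting; the dictionary layout and names are only an illustration.

```python
STREAM_RATE_MB_S = 630  # image data rate of each stream

# On-board memory accesses per frame, as listed in the text.
accesses = {
    "stream 1": {
        "capture write": 1,
        "gain/offset reads (image, gain, offset)": 3,
        "result write": 1,
        "transfer-to-host read": 1,
    },
    "stream 2": {
        "capture write": 1,
        "transfer-to-host read": 1,
    },
}

per_stream = {name: sum(ops.values()) for name, ops in accesses.items()}
total_accesses = sum(per_stream.values())
total_bandwidth_mb_s = total_accesses * STREAM_RATE_MB_S

print(per_stream)            # {'stream 1': 6, 'stream 2': 2}
print(total_accesses)        # 8
print(total_bandwidth_mb_s)  # 5040
```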
As previously described, to obtain the minimum system jitter, the benchmarking application relied on MIL's ability to pre-queue the capture and on-board processing commands for each image or frame and perform function callbacks in response to various events (i.e., end-of-grab, end-of-processing and end-of-transfer). In addition to feeding and monitoring board operations, the benchmarking application had the CPU cores perform I/O-bound image processing (i.e., frame averaging) upon notification (i.e., callback function invoked).
To obtain the statistics needed to qualify the performance of the Matrox Radient, both the benchmarking application and MIL were instrumented to capture various timestamps at the end of specific operations on each frame:
- End-of-Grab (EoG): Sampled in the device driver's interrupt service routine (ISR) when the capture of an image onto the board's memory is complete.
- End-of-Processing (EoP): Also sampled in the device driver's ISR when the transfer to the host of the image processed by the board is complete.
- Function Callback (FC): Sampled when the callback function is invoked by MIL before the (user-mode) image processing on the host.
The timestamps were then used to determine the variability or jitter (in milliseconds) when responding to these events (see Table 1).
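One common way to turn such timestamps into jitter figures is to look at how far each interval between consecutive events deviates from the average interval. The sketch below assumes that definition, which may differ from the one used for Table 1, and the timestamps in it are synthetic values chosen for illustration, not measurements from the study.

```python
from statistics import mean, pstdev


def interval_jitter(timestamps_ms):
    """Summarize variability of the intervals between consecutive events.

    Jitter is taken here as the deviation of each interval from the mean
    interval (an assumed definition for illustration).
    """
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    avg = mean(intervals)
    return {
        "mean_interval_ms": avg,
        "worst_case_jitter_ms": max(abs(iv - avg) for iv in intervals),
        "std_dev_ms": pstdev(intervals),
    }


# Synthetic end-of-grab timestamps: one frame arrives 0.25 ms late and
# the following interval is correspondingly shorter.
eog_ms = [0.0, 1.5, 3.0, 4.75, 6.0, 7.5]
stats = interval_jitter(eog_ms)
print(stats["mean_interval_ms"])      # 1.5
print(stats["worst_case_jitter_ms"])  # 0.25
print(round(stats["std_dev_ms"], 3))  # 0.158
```

The same function can be applied separately to the EoG, EoP and FC timestamp series to compare board-side and host-side variability.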
Executing concurrently with the benchmarking application were three applications that simulated a heavy overall application workload for the PC:
- Application 1: Performed buffer copy operations in a loop to consume host memory bandwidth (i.e., 10 GB/s).
- Application 2: Occupied the CPU cores. Each instance of the application occupied one CPU core at 99%. For the study, nine instances were started (along with the benchmarking application) to fully load the twelve CPU cores (see Figure 1).
- Application 3: Generated network activity by starting and monitoring the benchmarking and above applications through a Windows® remote desktop session.
Table 1 and Figures 2 through 5 illustrate that although system resources (CPU, memory and networking) were heavily loaded, image capture and on-board processing times were fairly stable at around the average frame time (1/630 fps or 1.53 ms) and can thus be considered deterministic operations with no image loss. The worst-case jitter times (see peaks in Figures 2 and 4) are relatively small when compared to the average frame time (i.e., tenths of a millisecond compared to 1.53 ms). Moreover, whenever the capture time exceeded the average, MIL's command pre-queuing mechanism compensated by capturing and processing the subsequent frame in a shorter time (as seen by the host).
When examining the function callback jitter (see Figures 3 and 5), the resulting high variability is explained by the non-deterministic response times of the user mode applications, which were aggravated by the lack of available system resources. These resources were busy with other tasks. This demonstrates the dependency of the function callback on the host, as opposed to image capture, on-board FPGA processing and transfer to host, which were dependent on the Matrox Radient.
The study results show that the Matrox Radient frame grabber and Matrox Imaging Library (MIL), with its command queuing mechanism, work together to deliver reliable high-bandwidth video capture and effective pre-processing on a PC running Windows irrespective of the user-mode activities occupying host system resources.