For a system to act quickly and intelligently based on what it sees, it needs immediate access to robust data about the 3D size and location of shapes in its environment.
3D sensors produce raw point-cloud data that requires further analysis to be rendered into actionable 3D information. This approach increases application latency by pushing point-cloud calculations into the application software layer. 3D applications need a better approach—an architectural solution that interprets 3D data more quickly and efficiently.
TYZX G2 and G3 Embedded Vision Systems use TYZX ProjectionSpace™ primitives to transform point-cloud data into efficient 2D or 3D geometric representations and rapidly segment a scene into relevant objects. Dedicated TYZX hardware performs ProjectionSpace computations in real time, eliminating latency and processing burden. With TYZX, high-level applications get immediately useful 3D data, so they can work faster and more productively.
How ProjectionSpace Works
Raw 3D-sensor point clouds are most often represented as an image with row and column locations containing metric-distance measurements. To represent an object or obstacle relative to a camera, each point first needs to be transformed into a 3D metric location; that is, each point needs to be assigned X, Y and Z coordinates represented by 3 floating-point numbers. This transformation of every point in a space seen by a camera can be computationally expensive.
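For illustration, the sketch below performs this transformation for a standard pinhole camera model, turning a metric depth image into sensor-frame X, Y, Z points. The intrinsic parameters (fx, fy, cx, cy) and the function itself are hypothetical, not part of any TYZX API.

```python
import numpy as np

# Hypothetical pinhole intrinsics; real values come from sensor calibration.
fx, fy = 500.0, 500.0   # focal lengths in pixels (assumed)
cx, cy = 320.0, 240.0   # principal point (assumed)

def depth_image_to_points(depth):
    """Turn an HxW metric depth image into an (H*W, 3) array of
    X, Y, Z coordinates in the sensor's own coordinate system."""
    h, w = depth.shape
    cols, rows = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (cols - cx) * z / fx
    y = (rows - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```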
Such a transformation produces coordinates that are relative to the sensor’s coordinate system, rather than a useful real-world coordinate system, such as the dimensions of a room. A camera mounted on a vehicle, for example, may be pointed slightly toward the ground or off to the left or right of the vehicle’s axis. A common, useful next step, then, is to apply a rigid transform (a 3x3 rotation combined with a translation) that takes the points from the sensor’s coordinate system and places them in a world coordinate system such as that of a vehicle.
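In code, this step amounts to one rotation and one translation per point. A minimal sketch, assuming the (N, 3) point array produced above and a rotation matrix R and translation vector t obtained from calibration:

```python
import numpy as np

def sensor_to_world(points, R, t):
    """Apply a rigid transform: rotate each sensor-frame point by the
    3x3 rotation matrix R, then shift by the translation vector t.
    points is an (N, 3) array; the result is in world coordinates."""
    return points @ R.T + t
```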
When it comes to applications making sense of a scene and interacting in real time, 3D points positioned in a preferred coordinate system are vastly more useful than a raw point cloud, but a representation of 3 floating-point values per point is still expensive to operate on, especially with limited compute resources. A more efficient representation assigns, or “projects,” the points into cells in a regular 2D or 3D grid – a quantized Euclidean volume. If a point’s 3D floating-point coordinates lie within a particular 2D or 3D cell’s volume, the cell’s count can be incremented. In this way, evidence accumulates that a particular cell is occupied; a high count indicates high confidence that the cell is occupied. Applications can also set a minimum count threshold on cells, thereby eliminating spurious “sensor noise” and further reducing application workload.
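A minimal sketch of this projection step, assuming a top-down 2D grid over hypothetical x (lateral) and y (forward) ranges; the function name and defaults are illustrative:

```python
import numpy as np

def project_to_grid(points, cell=0.1, x_range=(-5.0, 5.0), y_range=(0.0, 10.0)):
    """Accumulate world-frame points into a top-down 2D grid.
    Each point whose coordinates fall inside a cell's volume
    increments that cell's count."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    grid = np.zeros((ny, nx), dtype=np.int32)
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    np.add.at(grid, (iy[ok], ix[ok]), 1)
    return grid
```

Thresholding the returned counts, for example grid >= 5, implements the minimum-count noise filter described above.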
By performing these operations, TYZX ProjectionSpace turns a point cloud represented as a large floating-point data structure into an easily searched and segmented array, in the preferred coordinate system of the application. While these operations are valuable in their own right for most applications, perhaps the best part is that they are computed in parallel at 60 frames per second directly in hardware. This means no additional latency for results, no application CPU or memory-bandwidth burden, and, of course, power savings.
ProjectionSpace in Action
Obstacle Detection and Obstacle Avoidance (ODOA)
Obstacle detection algorithms attempt to identify objects that will block the passage of a vehicle, since these objects may force the vehicle to steer, slow, or stop. 3D images are critical for obstacle detection. To take the appropriate action, a navigation system needs to know an object’s distance from the vehicle, as well as the true size of the object. Real-time, frame-rate updates to this data allow applications to judge trajectories and closing rates. But searching full 3D point clouds for regions containing an obstacle can be time-consuming. In real-time applications such as vehicle control, time is a precious commodity.
ProjectionSpace images greatly assist with the speed and accuracy of obstacle detection. With appropriate parameters, the ProjectionSpace image can be configured to summarize the scene in terms that are important to identify potential obstacles – reducing the size of the image needed for analysis, reducing noise, and reducing clutter not relevant to vehicle navigation. Obstacles with a large surface area perpendicular to the ground will stand out with high pixel counts in a top-down, 2D ProjectionSpace image. Such a high pixel count is a very useful search criterion that is valid even for rough (and non-planar) ground surfaces. It’s much more efficient for an application to search for obstacles in a ProjectionSpace image than it is to search for obstacles in the raw 3D point cloud produced by the sensors. If necessary, an application can perform further computation to verify the obstacles detected, but this computation can be limited to just the portion of the image where the obstacle has been detected. Reducing computation reduces latency, and therefore improves the responsiveness of the application overall.
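Because the scene has already been collapsed into cell counts, the obstacle search itself can be very small. A sketch, assuming the top-down grid from the earlier example and an illustrative count threshold:

```python
import numpy as np

def find_obstacle_cells(grid, min_count=20):
    """Scan a top-down, ProjectionSpace-style image for obstacle cells.
    Cells whose counts reach min_count suggest a large surface
    perpendicular to the ground; returns their (row, col) indices."""
    rows, cols = np.nonzero(grid >= min_count)
    return list(zip(rows.tolist(), cols.tolist()))
```

Any further verification can then be restricted to the footprint of the returned cells in the original image.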
Configuring the ProjectionSpace image for obstacle detection typically takes into account vehicle-specific parameters, such as the mounting location and orientation of the sensor relative to the ground, and the size of the vehicle. Details about the sensor position and orientation are used to provide the 3D rigid transform applied to each 3D point as part of the ProjectionSpace transform.
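As an illustration of how mounting parameters become that transform, the sketch below builds a rotation and translation from an assumed mounting height and downward tilt. The axis conventions (camera: x right, y down, z forward; world: x right, y forward, z up, origin on the ground) are assumptions for the example, not TYZX conventions:

```python
import numpy as np

def mount_transform(height_m, tilt_deg):
    """Rigid transform for a sensor mounted height_m above the ground
    and tilted down by tilt_deg.  Both parameters are illustrative;
    real values come from the vehicle's design or a calibration step."""
    a = np.radians(tilt_deg)
    # Columns are the camera's x, y, z axes expressed in world coordinates.
    R = np.array([[1.0,  0.0,        0.0],
                  [0.0, -np.sin(a),  np.cos(a)],
                  [0.0, -np.cos(a), -np.sin(a)]])
    t = np.array([0.0, 0.0, height_m])
    return R, t
```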
Often, the sensor is mounted off the ground and tilted down toward the ground to obtain a good view of the scene directly in front of the vehicle. But this orientation is not optimal for the application guiding the vehicle. Pixels at the top of a tall object will be closer to the sensor than pixels on that object near the ground. It is simpler for obstacle detection algorithms if the distances to all parts of an object are measured along a vector parallel to the ground. Then, if a person is 3 feet away from the vehicle when measured along the ground, there would be many points measuring 3 feet away in the virtual view, and the distance to the object is easily determined. The size of the vehicle can also be used to set the bounds of the ProjectionSpace data along each axis.
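Continuing the earlier sketches, the snippet below shows why this virtual view simplifies the measurement: once points are in a ground-aligned world frame, distance along the ground is just the forward coordinate, independent of height. The mount values are hypothetical:

```python
# Transform sensor points into the assumed world frame (y forward, z up),
# using the hypothetical helpers sketched earlier.
R, t = mount_transform(1.2, 15.0)        # illustrative mount: 1.2 m up, 15° down
points_world = sensor_to_world(points, R, t)

# Every point on a person ~3 feet (0.9 m) ahead now shares roughly the
# same forward (y) coordinate, whether it came from their head or shoes.
forward = points_world[:, 1]
nearest_obstacle_m = forward[forward > 0.0].min()
```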
Setting clipping planes to monitor only data within a certain distance from the ground can eliminate clutter from the ProjectionSpace image. For instance, a short vehicle does not really need to know if there is an overhanging tree branch 10 feet off the ground.
Programmers can configure the cell size used for aggregating ProjectionSpace data in proportion to the vehicle scale and the accuracy required for the ODOA application. For instance, obstacle location requirements might be on the order of 20 centimeters, so cell sizes could be set to 10 centimeters, accumulating many distance pixels in one cell in closer regions of the image. Making cell size configurable reduces the number of cells in the ProjectionSpace image overall. It also makes it straightforward for an application to automatically eliminate spurious pixels, which will be broadly distributed and produce very small cell counts.
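Pulling the last two paragraphs together, here is a hypothetical configuration sketch with illustrative clipping planes, cell size, and minimum count; real values would follow from the vehicle's dimensions and the application's accuracy requirements:

```python
import numpy as np

CELL_SIZE_M    = 0.10   # half the assumed 20 cm location requirement
MIN_HEIGHT_M   = 0.05   # clipping plane just above the ground
MAX_HEIGHT_M   = 1.80   # clipping plane at vehicle height; ignores branches
MIN_CELL_COUNT = 5      # scattered noise rarely piles up in one cell

def clip_by_height(points_world):
    """Keep only points between the two clipping planes."""
    z = points_world[:, 2]
    return points_world[(z >= MIN_HEIGHT_M) & (z <= MAX_HEIGHT_M)]

# Combined with the earlier grid sketch:
#   grid      = project_to_grid(clip_by_height(points_world), cell=CELL_SIZE_M)
#   obstacles = grid >= MIN_CELL_COUNT
```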
Person-Detection and Person-Tracking
The same properties that make ProjectionSpace so useful for detecting and avoiding obstacles also make it highly useful for detecting and tracking people.
ProjectionSpace can be configured to take into account the orientation and position of the sensor with respect to the floor plane, so that person detection can essentially take place in a virtual top-down view of a space, even when the sensor is mounted obliquely on a wall and configured to look down and out across a room. A virtual top-down view is ideal for person detection, since from this vantage point the data from people standing or walking is well separated, making it straightforward to cluster the data from each person together. People also have a large surface area perpendicular to the floor, creating high cell counts compared to data from unoccupied floor or from objects too small to be a person.
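A sketch of this clustering step, assuming the virtual top-down grid described above and illustrative thresholds; it uses scipy.ndimage.label for connected components:

```python
import numpy as np
from scipy import ndimage

def detect_people(grid, min_count=30, min_cells=4):
    """Cluster high-count cells of a virtual top-down grid into person
    candidates.  The thresholds are illustrative: a standing person's
    large vertical surface piles many points into a few adjacent cells."""
    labels, n = ndimage.label(grid >= min_count)
    people = []
    for i in range(1, n + 1):
        cells = np.argwhere(labels == i)
        if len(cells) >= min_cells:            # ignore blobs too small to be a person
            people.append(cells.mean(axis=0))  # centroid, in cell units
    return people
```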
The Benefit for 3D Application Developers
Most 3D vision systems produce 3D point clouds and pass these clouds along to higher-level applications, which then must process the point clouds to define representations, such as the shape of a person or vehicle, that are meaningful to the application. Leaving higher-level applications to process 3D point clouds creates several problems, though:
- Forcing high-level applications to process raw cloud data increases application latency, making the 3D solution less interactive.
- Forcing high-level applications to process raw cloud data typically increases the size, weight, and cost of on-board computing resources.
TYZX ProjectionSpace, available in the TYZX G2 EVS and the TYZX G3 EVS, processes raw 3D data in hardware, delivering actionable, compact data about 3D shapes in an XYZ coordinate space. ProjectionSpace makes 3D applications faster, more interactive, and more efficient.
In competitive markets building on 3D technology, ProjectionSpace gives application developers a significant head-start.
Thanks to TYZX ProjectionSpace™, applications in areas as diverse as robotic navigation and retail management can begin with high-level data representations that are already meaningful, instead of having to sacrifice CPU cycles and application performance to low-level data processing. Both the TYZX G2 and G3 EVS perform ProjectionSpace calculations in hardware, further accelerating performance and reducing 3D vision latency.