Siggraph 2014
Studio Course Notes

Introduction to Optical Triangulation

3D Printing has become a popular subject these days. More and more low cost desktop 3D printers are introduced, and open source projects let hobbyists build their own. Without models, a 3D printer is not really useful. Professionals have access to complete CAD software or modelers that costs thousands and need extensive training. They can also acquire a scene/object using 3D scanners. This course addresses the problem of creating 3D models for 3D printing by copying and modifying existing objects. As is the case for desktop 3D printers this course teaches the mathematical foundations of the various methods used to build 3D scanners, and includes specific instructions to build several low-cost homemade 3D scanners which can produce models of equal or better quality as many commercial products currently in the market.

These course notes are organized into three primary sections, spanning theoretical concepts, practical construction details, and algorithms for constructing high-quality 3D models. Chapters Introduction and Mathematics of Triangulation survey the field and present the unifying concept of triangulation. Chapters Camera and Projector Calibration, The Laser Slit Scanner, and Structured Light document the construction of projector-camera systems, slit-based 3D scanners, and 3D scanners based on structured lighting. The post-processing processes for generating polygon meshes from point clouds are covered in Chapter Points.

3D Scanning Technology

Metrology is an ancient and diverse field, bridging the gap between mathematics and engineering. Efforts at measurement standardization were first undertaken by the Indus Valley Civilization as early as 2600--1900 BCE. Even with only crude units, such as the length of human appendages, the development of geometry revolutionized the ability to measure distance accurately. Around 240 BCE, Eratosthenes estimated the circumference of the Earth from knowledge of the elevation angle of the Sun during the summer solstice in Alexandria and Syene. Mathematics and standardization efforts continued to mature through the Renaissance (1300--1600 CE) and into the Scientific Revolution (1550--1700 CE). However, it was the Industrial Revolution (1750--1850 CE) which drove metrology to the forefront. As automatized methods of mass production became commonplace, advanced measurement technologies ensured interchangeable parts were just that--accurate copies of the original.

Through these historical developments, measurement tools varied with mathematical knowledge and practical needs. Early methods required direct contact with a surface (e.g., callipers and rulers). The pantograph, invented in 1603 by Christoph Scheiner, uses a special mechanical linkage so movement of a stylus (in contact with the surface) can be precisely duplicated by a drawing pen. The modern coordinate measuring machine (CMM) functions in much the same manner, recording the displacement of a probe tip as it slides across a solid surface (see Figure 1.1). While effective, such contact-based methods can harm fragile objects and require long periods of time to build an accurate 3D model. Non-contact scanners address these limitations by observing, and possibly controlling, the interaction of light with the object.

Figure 1.1 Contact-based shape measurement. (Left) A sketch of Sorenson's engraving pantograph patented in 1867. (Right) A modern coordinate measuring machining (from Flickr user hyperbolation). In both devices, deflection of a probe tip is used to estimate object shape, either for transferring engravings or for recovering 3D models, respectively.

Passive Methods

Non-contact optical scanners can be categorized by the degree to which controlled illumination is required. Passive scanners do not require direct control of any illumination source, instead relying entirely on ambient light. Stereoscopic imaging is one of the most widely used passive 3D imaging systems, both in biology and engineering. Mirroring the human visual system, stereoscopy estimates the position of a 3D scene point by triangulation [LN04]; first, the 2D projection of a given point is identified in each camera. Using known calibration objects, the imaging properties of each camera are estimated, ultimately allowing a single 3D line to be drawn from each camera's center of projection through the 3D point. The intersection of these two lines is then used to recover the depth of the point.

Trinocular [VF92] and multi-view stereo [HZ04] systems have been introduced to improve the accuracy and reliability of conventional stereoscopic systems. However, all such passive triangulation methods require \emph{correspondences} to be found among the various viewpoints. Even for stereo vision, the development of matching algorithms remains an open and challenging problem in the field [SCD\∗06]. Today, real-time stereoscopic and multi-view systems are emerging, however certain challenges continue to limit their widespread adoption [MPL04]. Foremost, flat or periodic textures prevent robust matching. While machine learning methods and prior knowledge are being advanced to solve such problems, multi-view 3D scanning remains somewhat outside the domain of hobbyists primarily concerned with accurate, reliable 3D measurement.

Many alternative passive methods have been proposed to sidestep the correspondence problem, often times relying on more robust computer vision algorithms. Under controlled conditions, such as a known or constant background, the external boundaries of foreground objects can be reliably identified. As a result, numerous shape-from-silhouette algorithms have emerged. Laurentini [Lau94] considers the case of a finite number of cameras observing a scene. The visual hull is defined as the union of the generalized viewing cones defined by each camera's center of projection and the detected silhouette boundaries. Recently, free-viewpoint video [CTMS03] systems have applied this algorithm to allow dynamic adjustment of viewpoint [MBR\∗00][SH03]. Cipolla and Giblin [CG00] consider a differential formulation of the problem, reconstructing depth by observing the visual motion of occluding contours (such as silhouettes) as a camera is perturbed.

Optical imaging systems require a sufficiently large aperture so that enough light is gathered during the available exposure time [Hec01]. Correspondingly, the captured imagery will demonstrate a limited depth of field; only objects close to the plane of focus will appear in sharp contrast, with distant objects blurred together. This effect can be exploited to recover depth, by increasing the aperture diameter to further reduce the depth of field. Nayar and Nakagawa [NN94] estimate shape-from-focus, collecting a focal stack by translating a single element (either the lens, sensor, or object). A focus measure operator [Wik] is then used to identify the plane of best focus, and its corresponding distance from the camera.

Other passive imaging systems further exploit the depth of field by modifying the shape of the aperture. Such modifications are performed so that the point spread function (PSF) becomes invertible and strongly depth-dependent. Levin et al. [LFDF07] and Farid [Far97] use such coded apertures to estimate intensity and depth from defocused images. Greengard et al. [GSP06] modify the aperture to produce a PSF whose rotation is a function of scene depth. In a similar vein, shadow moir\'{e} is produced by placing a high-frequency grating between the scene and the camera. The resulting interference patterns exhibit a series of depth-dependent fringes.

While the preceding discussion focused on optical modifications for 3D reconstruction from 2D images, numerous model-based approaches have also emerged. When shape is known \emph{a priori}, then coarse image measurements can be used to infer object translation, rotation, and deformation. Such methods have been applied to human motion tracking [KM00][OSS\∗00][dAST\∗08], vehicle recognition [Sul95] [FWM98], and human-computer interaction [RWLB01]. Additionally, user-assisted model construction has been demonstrated using manual labeling of geometric primitives [Deb97].

Active Methods

Active optical scanners overcome the correspondence problem using controlled illumination. In comparison to non-contact and passive methods, active illumination is often more sensitive to surface material properties. Strongly reflective or translucent objects often violate assumptions made by active optical scanners, requiring additional measures to acquire such problematic subjects. For a detailed history of active methods, we refer the reader to the survey article by Blais [Bla04]. In this section we discuss some key milestones along the way to the scanners we consider in this course.

Many active systems attempt to solve the correspondence problem by replacing one of the cameras, in a passive stereoscopic system, with a controllable illumination source. During the 1970s, single-point laser scanning emerged. In this scheme, a series of fixed and rotating mirrors are used to raster scan a single laser spot across a surface. A digital camera records the motion of this ``flying spot''. The 2D projection of the spot defines, with appropriate calibration knowledge, a line connecting the spot and the camera's center of projection. The depth is recovered by intersecting this line with the line passing from the laser source to the spot, given by the known deflection of the mirrors. As a result, such single-point scanners can be seen as the optical equivalent of coordinate measuring machines.

Figure 1.2 Active methods for 3D scanning. (Left) Conceptual diagram of a 3D slit scanner, consisting of a mechanically translated laser stripe. (Right) A Cyberware scanner, applying laser striping for whole body scanning (from Flickr user NIOSH).

As with CMMs, single-point scanning is a painstakingly slow process. With the development of low-cost, high-quality CCD arrays in the 1980s, slit scanners emerged as a powerful alternative. In this design, a laser projector creates a single planar sheet of light. This ``slit'' is then mechanically-swept across the surface. As before, the known deflection of the laser source defines a 3D plane. The depth is recovered by the intersection of this plane with the set of lines passing through the 3D stripe on the surface and the camera's center of projection.

Effectively removing one dimension of the raster scan, slit scanners remain a popular solution for rapid shape acquisition. A variety of commercial products use swept-plane laser scanning, including the Polhemus FastSCAN [Pol], the NextEngine [Nex], the SLP 3D laser scanning probes from Laser Design [Las], and the HandyScan line of products [Cre]. While effective, slit scanners remain difficult to use if moving objects are present in the scene. In addition, because of the necessary separation between the light source and camera, certain occluded regions cannot be reconstructed. This limitation, while shared by many 3D scanners, requires multiple scans to be merged---further increasing the data acquisition time.

A digital structured light projector can be used to eliminate the mechanical motion required to translate the laser stripe across the surface. Naively, the projector could be used to display a single column (or row) of white pixels translating against a black background to replicate the performance of a slit scanner. However, a simple swept-plane sequence does not fully exploit the projector, which is typically capable of displaying arbitrary 24-bit color images. Structured lighting sequences have been developed which allow the projector-camera correspondences to be assigned in relatively few frames. In general, the identity of each plane can be encoded spatially (i.e., within a single frame) or temporally (i.e., across multiple frames), or with a combination of both spatial and temporal encodings. There are benefits and drawbacks to each strategy. For instance, purely spatial encodings allow a single static pattern to be used for reconstruction, enabling dynamic scenes to be captured. Alternatively, purely temporal encodings are more likely to benefit from redundancy, reducing reconstruction artifacts. We refer the reader to a comprehensive assessment of such codes by Salvi et al. [SPB04].

Both slit scanners and structured lighting are ill-suited for scanning dynamic scenes. In addition, due to separation of the light source and camera, certain occluded regions will not be recovered. In contrast, time-of-flight rangefinders estimate the distance to a surface from a single center of projection. These devices exploit the finite speed of light. A single pulse of light is emitted. The elapsed time, between emitting and receiving a pulse, is used to recover the object distance (since the speed of light is known). Several economical time-of-flight depth cameras are now commercially available . However, the depth resolution and accuracy of such systems (for static scenes) remain below that of slit scanners and structured lighting.

Active imaging is a broad field; a wide variety of additional schemes have been proposed, typically trading system complexity for shape accuracy. As with model-based approaches in passive imaging, several active systems achieve robust reconstruction by making certain simplifying assumptions about the topological and optical properties of the surface. Woodham [Woo89] introduces photometric stereo, allowing smooth surfaces to be recovered by observing their shading under at least three (spatially disparate) point light sources. Hern\'andez et al. [HVB∗07] further demonstrate a real-time photometric stereo system using three colored light sources. Similarly, the complex digital projector required for structured lighting can be replaced by one or more printed gratings placed next to the projector and camera. Like shadow moir\'{e}, such projection moir\'{e} systems create depth-dependent fringes. However, certain ambiguities remain in the reconstruction unless the surface is assumed to be smooth.

% survey of publication venues Active and passive 3D scanning methods continue to evolve, with recent progress reported annually at various computer graphics and vision conferences, including 3-D Digital Imaging and Modeling (3DIM), SIGGRAPH, Eurographics, CVPR, ECCV, and ICCV. Similar advances are also published in the applied optics communities, typically through various SPIE and OSA journals.

3D Scanners Studied in this Course

This course is grounded in the unifying concept of triangulation. At their core, stereoscopic imaging, slit scanning, and structured lighting all attempt to recover the shape of 3D objects in the same manner. First, the correspondence problem is solved, either by a passive matching algorithm or by an active ``space-labeling'' approach (e.g., projecting known lines, planes, or other patterns). After establishing correspondences across two or more views (e.g., between a pair of cameras or a single projector-camera pair), triangulation recovers the scene depth. In stereoscopic and multi-view systems, a point is reconstructed by intersecting two or more corresponding lines. In slit scanning and structured lighting systems, a point is recovered by intersecting corresponding lines and planes.

Figure 1.3Desktop 3D Scanners based on Laser Plane Triangulation. From left to right: MakerBot Digitizer, Matterform Photon, and NextEngine 3D Scanner HD.

To elucidate the principles of such triangulation-based scanners, this course describes how to construct a classic turntable-based slit scanner, and a structured lighting system. The course also covers methods to register and merge multiple scans, to reconstruct polygon mesh surfaces from multi-scan registered point clouds, and to optimize the reconstructed meshes for various purposes. In all 3D scanner designs, the methods used to calibrate the systems are integral part of the design, since they have to be carefully constructed to produce accurate and precise results.

We first study the slit scanner, where a laser line projector iluminates an abject, and a camera captures an image of some or all the illuminated object points. Figure 1.3 shows some commercial desktop 3D scanners based on this method. Image processing techniques are used to detect the pixels corresponding to illuminated points visible by the camera. Ray-plane triangulation equations are used to reconstruct 3D points belonging to the intersection of the plane of laser light and the object. To recover denser sets of 3D points, the laser projector has to be moved while the camera remains static with respect to the object, and the process has to be repeated until a satisfactory number of points has been reconstructed. Alternatively, the object is placed on a linear stage or a turntable, the laser projector is kept static with respect to the camera. The linear stage or turntable is iteratively moved to a new position where an image is captured by the camera. As in the first case, a large number of images must be captured to generate a dense point cloud. In both cases tracking and estimating the motion with precision is required. Computer-controlled motorized linear stages or turntables are normally used for this purpoose. In Chapter The Laser Slit Scanner we describe how to build a low cost turntable-based slit scanner.

Figure 1.4 Industrial 3D Scanners based on Structured Lighting. From left to right: Breuckmann SmartScan, ATOS CompactScan, and Geomagic Capture.

Since slit-based scanning systems are line scan systems, they require capturing and processing large numbers of images to produce dense area scans. Structured lighting systems can be used to significantly reduce the number of images (typically by two or more orders of magnitude) required to generate dense 3D scans. Figure 1.4 show some examples of commercial 3D scanners based on structured lighting. In Chapter Structured Light we describe how to build a low cost structured lighting system using a single LED pico-projector and one or more digital cameras. Many good HD USB web-cameras exist today which can be used for this purpose, but many other options exist today ranging from high end DSLRs to smartphone cameras.

By providing example data sets, open source software, and detailed implementation notes, we hope to enable beginners and hobbyists to replicate our results. We believe the process of building your own 3D scanner to complement your 3D printer will be enjoyable and instructive. Along the way, you'll likely learn a great deal about the practical use of projector-camera systems, hopefully in a manner that supports your own research.