This chapter discusses a number of mathematical issues that arise when creating 3D graphics on a computer. Of course, we cannot hope to cover the vast subject of computer graphics in any amount of detail in a single chapter. Entire books are written that merely survey the topic. This chapter is to graphics what this entire book is to interactive 3D applications: it presents an extremely brief and high-level overview of the subject matter, focusing on topics for which mathematics plays a critical role. Just like the rest of this book, we try to pay special attention to those topics that, in our experience, are glossed over in other sources or are a source of confusion for beginners.
To be a bit more direct: this chapter alone is not enough to teach you how to get some pretty pictures on the screen. However, it should be used in parallel with (or preceding!) some other course, book, or self-study on graphics, and we hope that it will help you breeze past a few traditional sticky points. Although we present some example snippets in High Level Shading Language (HLSL) at the end of this chapter, you will not find much else to help you figure out which DirectX or OpenGL function calls to make to achieve some desired effect. These issues are certainly of supreme practical importance, but alas, they are also in the category of knowledge that Robert Maynard Hutchins dubbed “rapidly aging facts,” and we have tried to avoid writing a book that requires an update every other year when ATI releases a new card or Microsoft a new version of DirectX. Luckily, up-to-date API references and examples abound on the Internet, which is a much more appropriate place to get that sort of thing. (API stands for application programming interface. In this chapter, API will mean the software that we use to communicate with the rendering subsystem.)
One final caveat is that since this is a book on math for video games, we will have a real-time bias. This is not to say that the book cannot be used if you are interested in learning how to write a raytracer; only that our expertise and focus is in real-time graphics.
This chapter proceeds roughly in order from ivory tower theory to down-and-dirty code snippets.
We begin our discussion of graphics by telling you how things really work, or perhaps more accurately, how they really should work, if we had enough knowledge and processing power to make things work the right way. The beginning student should be warned that much introductory material (especially tutorials on the Internet) and API documentation suffers from a great lack of perspective. You might get the impression from reading these sources that diffuse maps, Blinn-Phong shading, and ambient occlusion are “the way images in the real world work,” when in fact you are probably reading a description of how one particular lighting model was implemented in one particular language on one particular piece of hardware through one particular API. Ultimately, any down-to-the-details tutorial must choose a lighting model, language, platform, color representation, performance goals, etc.—as we will have to do later in this chapter. (This lack of perspective is usually purposeful and warranted.) However, we think it's important to know which are the fundamental and timeless principles, and which are arbitrary choices based on approximations and trade-offs, guided by technological limitations that might be applicable only to real-time rendering, or are likely to change in the near future. So before we get too far into the details of the particular type of rendering most useful for introductory real-time graphics, we want to take our stab at describing how rendering really works.
We also hasten to add that this discussion assumes that the goal is photorealism, simulating how things work in nature. In fact, this is often not the goal, and it certainly is never the only goal. Understanding how nature works is a very important starting place, but artistic and practical factors often dictate a different strategy than just simulating nature.
We begin with the end in mind. The end goal of rendering is a bitmap, or perhaps a sequence of bitmaps if we are producing an animation. You almost certainly already know that a bitmap is a rectangular array of colors, and each grid entry is known as a pixel, which is short for “picture element.” At the time we are producing the image, this bitmap is also known as the frame buffer, and often there is additional post-processing or conversion that happens when we copy the frame buffer to the final bitmap output.
How do we determine the color of each pixel? That is the fundamental question of rendering. Like so many challenges in computer science, a great place to start is by investigating how nature works.
We see light. The image that we perceive is the result of light that bounces around the environment and finally enters the eye. This process is complicated, to say the least. Not only is the physics of the light bouncing around very complicated, but so are the physiology of the sensing equipment in our eyes and the interpreting mechanisms in our minds. Thus, ignoring a great number of details and variations (as any introductory book must do), the basic question that any rendering system must answer for each pixel is “What color of light is approaching the camera from the direction corresponding to this pixel?”
There are basically two cases to consider. Either we are looking directly at a light source and light traveled directly from the light source to our eye, or (more commonly) light departed from a light source in some other direction, bounced one or more times, and then entered our eye. We can decompose the key question asked previously into two tasks, which this book calls the rendering algorithm:

1. Visible surface determination. Find the surface that is closest to the eye along the ray through the center of the pixel.
2. Lighting. Determine what light is emitted or reflected from that surface in the direction of the eye.

These two highly abstracted procedures obviously conceal a great deal of complexity about the actual algorithms used in practice to implement them.

At this point it appears that we have made some gross simplifications, and many of you no doubt are raising your metaphorical hands to ask “What about translucency?” “What about reflections?” “What about refraction?” “What about atmospheric effects?” Please hold all questions until the end of the presentation.
The first step in the rendering algorithm is known as visible surface determination. There are two common solutions to this problem. The first is known as raytracing. Rather than following light rays in the direction that they travel from the emissive surfaces, we trace the rays backward, so that we can deal only with the light rays that matter: the ones that enter our eye from the given direction. We send a ray out from the eye in the direction through the center of each pixel to see the first object in the scene that this ray strikes. Then we compute the color that is being emitted or reflected from that surface back in the direction of the ray. A highly simplified summary of this algorithm is illustrated by Listing 10.1.
for (each x,y screen pixel) {

    // Select a ray for this pixel
    Ray ray = getRayForPixel(x,y);

    // Intersect the ray against the geometry.  This will
    // not just return the point of intersection, but also
    // a surface normal and some other information needed
    // to shade the point, such as an object reference,
    // material information, local S,T coordinates, etc.
    // Don't take this pseudocode too literally.
    Vector3 pos, normal;
    Object *obj;
    Material *mtl;
    if (rayIntersectScene(ray, pos, normal, obj, mtl)) {

        // Shade the intersection point.  (What light is
        // emitted/reflected from this point towards the camera?)
        Color c = shadePoint(ray, pos, normal, obj, mtl);

        // Put it into the frame buffer
        writeFrameBuffer(x,y, c);

    } else {

        // Ray missed the entire scene.  Just use a generic
        // background color at this pixel
        writeFrameBuffer(x,y, backgroundColor);
    }
}
The other major strategy for visible surface determination, the one used for real-time rendering at the time of this writing, is known as depth buffering. The basic plan is that at each pixel we store not only a color value, but also a depth value. This depth buffer value records the distance from the eye to the surface that is reflecting or emitting the light used to determine the color for that pixel. As illustrated in Listing 10.1, the “outer loop” of a raytracer is the screen-space pixels, but in real-time graphics, the “outer loop” is the geometric elements that make up the surface of the scene.
The different methods for describing surfaces are not important here. What is important is that we can project the surfaces onto screen space and map them to screen-space pixels through a process known as rasterization. For each pixel of the surface, known as the source fragment, we compute the depth of the surface at that pixel and compare it to the existing value in the depth buffer, sometimes known as the destination fragment. If the source fragment we are currently rendering is farther away from the camera than the existing value in the buffer, then whatever we rendered before this is obscuring the surface we are now rendering (at least at this one pixel), and we move on to the next pixel. However, if our depth value is closer than the existing value in the depth buffer, then we know this is the closest surface to the eye (at least of those rendered so far), and so we update the depth buffer with this new, closer depth value. At this point we might also proceed to step 2 of the rendering algorithm (at least for this pixel) and update the frame buffer with the color of the light being emitted or reflected from the surface at that point. This is known as forward rendering, and the basic idea is illustrated by Listing 10.2.
// Clear the frame and depth buffers
fillFrameBuffer(backgroundColor);
fillDepthBuffer(infinity);

// Outer loop iterates over all the primitives (usually triangles)
for (each geometric primitive) {

    // Rasterize the primitive
    for (each pixel x,y in the projection of the primitive) {

        // Test the depth buffer, to see if a closer pixel has
        // already been written.
        float primDepth = getDepthOfPrimitiveAtPixel(x,y);
        if (primDepth > readDepthBuffer(x,y)) {

            // Pixel of this primitive is obscured, discard it
            continue;
        }

        // Determine primitive color at this pixel.
        Color c = getColorOfPrimitiveAtPixel(x,y);

        // Update the color and depth buffers
        writeFrameBuffer(x,y, c);
        writeDepthBuffer(x,y, primDepth);
    }
}
Opposed to forward rendering is deferred rendering, an old technique that is becoming popular again due to the current location of bottlenecks in the types of images we are producing and the hardware we are using to produce them. A deferred renderer uses, in addition to the frame buffer and the depth buffer, additional buffers, collectively known as the G-buffer (short for “geometry” buffer), which holds extra information about the surface closest to the eye at that location, such as the 3D location of the surface, the surface normal, and material properties needed for lighting calculations, such as the “color” of the object and how “shiny” it is at that particular location. (Later, we see how those intuitive terms in quotes are a bit too vague for rendering purposes.) Compared to a forward renderer, a deferred renderer follows our two-step rendering algorithm a bit more literally. First we “render” the scene into the G-buffer, essentially performing only visibility determination—fetching the material properties of the point that is “seen” by each pixel but not yet performing lighting calculations. The second pass actually performs the lighting calculations. Listing 10.3 explains deferred rendering in pseudocode.
// Clear the geometry and depth buffers
clearGeometryBuffer();
fillDepthBuffer(infinity);

// Rasterize all primitives into the G-buffer
for (each geometric primitive) {
    for (each pixel x,y in the projection of the primitive) {

        // Test the depth buffer, to see if a closer pixel has
        // already been written.
        float primDepth = getDepthOfPrimitiveAtPixel(x,y);
        if (primDepth > readDepthBuffer(x,y)) {

            // Pixel of this primitive is obscured, discard it
            continue;
        }

        // Fetch information needed for shading in the next pass.
        MaterialInfo mtlInfo;
        Vector3 pos, normal;
        getPrimitiveShadingInfo(mtlInfo, pos, normal);

        // Save it off into the G-buffer and depth buffer
        writeGeometryBuffer(x,y, mtlInfo, pos, normal);
        writeDepthBuffer(x,y, primDepth);
    }
}

// Now perform shading in a 2nd pass, in screen space
for (each x,y screen pixel) {
    if (readDepthBuffer(x,y) == infinity) {

        // No geometry here.  Just write a background color
        writeFrameBuffer(x,y, backgroundColor);

    } else {

        // Fetch shading info back from the geometry buffer
        MaterialInfo mtlInfo;
        Vector3 pos, normal;
        readGeometryBuffer(x,y, mtlInfo, pos, normal);

        // Shade the point
        Color c = shadePoint(pos, normal, mtlInfo);

        // Put it into the frame buffer
        writeFrameBuffer(x,y, c);
    }
}

Pseudocode for deferred rendering using the depth buffer
Before moving on, we must mention one important point about why deferred rendering is popular. When multiple light sources illuminate the same surface point, hardware limitations or performance factors may prevent us from computing the final color of a pixel in a single calculation, as was shown in the pseudocode listings for both forward and deferred rendering. Instead, we must use multiple passes, one pass for each light, and accumulate the reflected light from each light source into the frame buffer. In forward rendering, these extra passes involve rerendering the primitives. Under deferred rendering, however, extra passes are in image space, and thus depend on the 2D size of the light in screen space, not on the complexity of the scene! It is in this situation that deferred rendering really begins to have large performance advantages over forward rendering.
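To make the multipass idea concrete, the following sketch shows how a deferred renderer might accumulate one light at a time. It is pseudocode in the same spirit as the earlier listings; helpers such as getScreenSpaceBoundsOfLight, shadePointForLight, and addToFrameBuffer are illustrative assumptions rather than functions of any particular API.

// Sketch: additive light accumulation in a deferred renderer.
// Assumes the G-buffer and depth buffer were already filled as in
// Listing 10.3, and that the frame buffer currently holds only the
// emissive/background contribution.
for (each light source) {

    // Only pixels covered by the light's screen-space projection need
    // to be touched; the complexity of the scene geometry is irrelevant.
    for (each pixel x,y in getScreenSpaceBoundsOfLight(light)) {

        // Skip pixels with no geometry
        if (readDepthBuffer(x,y) == infinity) continue;

        // Fetch the surface description saved by the geometry pass
        MaterialInfo mtlInfo;
        Vector3 pos, normal;
        readGeometryBuffer(x,y, mtlInfo, pos, normal);

        // Compute only this light's contribution and add it in
        Color c = shadePointForLight(pos, normal, mtlInfo, light);
        addToFrameBuffer(x,y, c);   // additive blend
    }
}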
Now let's talk about the second step in the rendering algorithm: lighting. Once we have located the surface closest to the eye, we must determine the amount of light emitted directly from that surface, or emitted from some other source and reflected off the surface in the direction of the eye. The light directly transmitted from a surface to the eye—for example, when looking directly at a light bulb or the sun—is the simplest case. These emissive surfaces are a small minority in most scenes; most surfaces do not emit their own light, but rather they only reflect light that was emitted from somewhere else. We will focus the bulk of our attention on the nonemissive surfaces.
Although we often speak informally about the “color” of an object, we know that the perceived color of an object is actually the light that is entering our eye, and thus can depend on many different factors. Important questions to ask are: What colors of light are incident on the surface, and from what directions? From which direction are we viewing the surface? How “shiny” is the object? So a description of a surface suitable for use in rendering doesn't answer the question “What color is this surface?” This question is sometimes meaningless—what color is a mirror, for example? Instead, the salient question is a bit more complicated, and it goes something like, “When light of a given color strikes the surface from a given incident direction, how much of that light is reflected in some other particular direction?” The answer to this question is given by the bidirectional reflectance distribution function, or BRDF for short. So rather than “What color is the object?” we ask, “What is the distribution of reflected light?”
Symbolically, we write the BRDF as the function f(x, ω_in, ω_out, λ), where x is the point on the surface being considered, ω_in is the direction from which the light arrives, ω_out is the direction in which the reflected light leaves, and λ is the color (wavelength) of the light. The value of the function is a scalar that describes the relative likelihood that light incident from the direction ω_in will be reflected in the outgoing direction ω_out.
Although we are particularly interested in the incident directions that come from emissive surfaces and the outgoing directions that point towards our eye, in general, the entire distribution is relevant. First of all, lights, eyes, and surfaces can move around, so in the context of creating a surface description (for example, “red leather”), we don't know which directions will be important. But even in a particular scene with all the surfaces, lights, and eyes fixed, light can bounce around multiple times, so we need to measure light reflections for arbitrary pairs of directions.
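As a concrete (and deliberately trivial) example of the idea, here is a sketch of a BRDF in code for a perfectly diffuse surface. The Color and Vector3 types follow the earlier listings, and the albedo member is an illustrative assumption.

// Sketch: the BRDF as a function.  A perfectly diffuse (Lambertian)
// surface is the simplest case: the fraction of light reflected is the
// same for every pair of directions, so both direction arguments are
// ignored, and only the wavelength dependence (here, one value per RGB
// channel) remains.
struct LambertianBrdf {
    Color albedo;   // fraction of each color channel that is reflected

    // f(x, w_in, w_out) evaluated per color channel.  The 1/pi factor
    // keeps the total reflected light from exceeding the incident light
    // (the normalization constraint discussed later in this section).
    Color eval(const Vector3 &pos,
               const Vector3 &wIn, const Vector3 &wOut) const {
        return albedo * (1.0f / 3.14159265f);
    }
};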
Before moving on, it's highly instructive to see how the two intuitive material properties that were earlier disparaged, color and shininess, can be expressed precisely in the framework of a BRDF. Consider a green ball. A green object is green and not blue because it reflects incident light that is green more strongly than incident light of any other color. For example, perhaps green light is almost all reflected, with only a small fraction absorbed, while 95% of the blue and red light is absorbed and only 5% of light at those wavelengths is reflected in various directions. White light actually consists of all the different colors of light, so a green object essentially filters out colors other than green. If a different object responded to green and red light in the same manner as our green ball, but absorbed 50% of the blue light and reflected the other 50%, we might perceive the object as teal. Or if most of the light at all wavelengths was absorbed, except for a small amount of green light, then we would perceive it as a dark shade of green. To summarize, a BRDF accounts for the difference in color between two objects through its dependence on the wavelength λ—that is, by responding differently to light of different colors.
Next, consider the difference between shiny red plastic and diffuse red construction paper. A shiny surface reflects incident light much more strongly in one particular direction compared to others, whereas a diffuse surface scatters light more evenly across all outgoing directions. A perfect reflector, such as a mirror, would reflect all the light from one incoming direction in a single outgoing direction, whereas a perfectly diffuse surface would reflect light equally in all outgoing directions, regardless of the direction of incidence. In summary, a BRDF accounts for the difference in “shininess” of two objects through its dependence on the directions ω_in and ω_out.
More complicated phenomena can be expressed by generalizing the BRDF. Translucence and light refraction can be easily incorporated by allowing the direction vectors to point back into the surface. We might call this mathematical generalization a bidirectional surface scattering distribution function (BSSDF). Sometimes light strikes an object, bounces around inside of it, and then exits at a different point. This phenomenon is known as subsurface scattering and is an important aspect of the appearances of many common substances, such as skin and milk. This requires splitting the single reflection point into two points: the point where the light enters the surface and the point where it exits.
By the way, there are certain criteria that a BRDF must satisfy in order to be physically plausible. First, it doesn't make sense for a negative amount of light to be reflected in any direction. Second, it's not possible for the total reflected light to be more than the light that was incident, although the surface may absorb some energy so the reflected light can be less than the incident light. This rule is usually called the normalization constraint. A final, less obvious principle obeyed by physical surfaces is Helmholtz reciprocity: if we pick two arbitrary directions, the same fraction of light should be reflected, no matter which is the incident direction and which is the outgoing direction. In other words, swapping the two direction arguments does not change the value of the BRDF:

f(x, ω_1, ω_2, λ) = f(x, ω_2, ω_1, λ).

Due to Helmholtz reciprocity, some authors don't label the two directions in the BRDF as “in” and “out,” because to be physically plausible the computation must be symmetric.
The BRDF contains the complete description of an object's appearance at a given point, since it describes how the surface will reflect light at that point. Clearly, a great deal of thought must be put into the design of this function. Numerous lighting models have been proposed over the last several decades, and what is surprising is that one of the earliest models, Blinn-Phong, is still in widespread use in real-time graphics today. Although it is not physically accurate (nor plausible: it violates the normalization constraint), we study it because it is a good educational stepping stone and an important bit of graphics history. Actually, describing Blinn-Phong as “history” is wishful thinking—perhaps the most important reason to study this model is that it still is in such widespread use! In fact, it's the best example of the phenomenon we mentioned at the start of this chapter: particular methods being presented as if they are “the way graphics work.”
Different lighting models have different goals. Some are better at simulating rough surfaces, others at surfaces with multiple strata. Some focus on providing intuitive “dials” for artists to control, without concern for whether those dials have any physical significance at all. Others are based on taking real-world surfaces and measuring them with special cameras called goniophotometers, essentially sampling the BRDF and then using interpolation to reconstruct the function from the tabulated data. The notable Blinn-Phong model discussed in Section 10.6 is useful because it is simple, inexpensive, and well understood by artists. Consult the sources in the suggested reading for a survey of lighting models.
Graphics is all about measuring light, and you should be aware of some important subtleties, even though we won't have time to go into complete detail here. The first is how to measure the color of light, and the second is how to measure its brightness.
In your middle school science classes you might have learned that every color of light is some mixture of red, green, and blue (RGB) light. This is the popular conception of light, but it's not quite correct. Light can take on any single frequency in the visible band, or it might be a combination of any number of frequencies. Color is a phenomenon of human perception and is not quite the same thing as frequency. Indeed, different combinations of frequencies of light can be perceived as the same color—these are known as metamers. The infinite combinations of frequencies of light are sort of like all the different chords that can be played on a piano (and also tones between the keys). In this metaphor, our color perception is unable to pick out all the different individual notes; instead, any given chord sounds to us like some combination of middle C, F, and G. Three color channels is not a magic number as far as physics is concerned; it's peculiar to human vision. Most other mammals have only two different types of receptors (we would call them “color blind”), and fish, reptiles, and birds have four types of color receptors (they would call us color blind).
However, even very advanced rendering systems project the continuous spectrum of visible light onto some discrete basis, most commonly, the RGB basis. This is a ubiquitous simplification, but we still wanted to let you know that it is a simplification, as it doesn't account for certain phenomena. The RGB basis is not the only color space, nor is it necessarily the best one for many purposes, but it is a very convenient basis because it is the one used by most display devices. In turn, the reason that this basis is used by so many display devices is due to the similarity to our own visual system. Hall [11] does a good job of describing the shortcomings of the RGB system.
Since the visible portion of the electromagnetic spectrum is continuous, an expression such as f(x, ω_in, ω_out, λ), which takes a single wavelength λ as an argument, is really a shorthand; once we adopt a discrete color basis such as RGB, it stands for one value per color channel.
You should also be aware of the different ways that we can measure the intensity of light. If we take a viewpoint from physics, we consider light as energy in the form of electromagnetic radiation, and we use units of measurement from the field of radiometry. The most basic quantity is radiant energy, which in the SI system is measured in the standard unit of energy, the joule (J). Just like any other type of energy, we are often interested in the rate of energy flow per unit time, which is known as power. In the SI system power is measured using the watt (W), which is one joule per second (1 W = 1 J/s). Power in the form of electromagnetic radiation is called radiant power or radiant flux. The term “flux,” which comes from the Latin fluxus for “flow,” refers to some quantity flowing across some cross-sectional area. Thus, radiant flux measures the total amount of energy that is arriving, leaving, or flowing across some area per unit time.
Imagine that a certain amount of radiant flux is emitted from a large surface, and now imagine that same flux flowing through a much smaller surface; intuitively, the smaller surface is “brighter.” To capture this, we divide the flux by the area through which it flows, obtaining the radiant flux density, which is measured in watts per square meter (W/m²). Radiant flux density also goes by the shorter name radiosity.
Several equivalent terms exist for radiosity. First, note that we can measure the flux density (or total flux, for that matter) across any cross-sectional area. We might be measuring the radiant power emitted from some surface with a finite area, or the surface through which the light flows might be an imaginary boundary that exists only mathematically (for example, the surface of some imaginary sphere that surrounds a light source). Although in all cases we are measuring flux density, and thus the term “radiosity” is perfectly valid, we might also use more specific terms, depending on whether the light being measured is coming or going. If the area is a surface and the light is arriving on the surface, then the term irradiance is used. If light is being emitted from a surface, the term radiant exitance or radiant emittance is used. In digital image synthesis, the word “radiosity” is most often used to refer to light that is leaving a surface, having been either reflected or emitted.
When we are talking about the brightness at a particular point, we cannot use plain old radiant power, because the area of that point is infinitesimal (essentially zero), and so the flux through it is zero. We can speak of the flux density at a single point, but to measure flux itself, we need a finite area over which to measure. For a surface of finite area, a single number that characterizes the total for the entire surface area is a flux; to capture the fact that some locations within that area might be brighter than others, we use a function that varies over the surface, which measures the flux density.
Now we are ready to consider what is perhaps the most central quantity we need to measure in graphics: the intensity of a “ray” of light. We can see why radiosity is not the unit for the job by an extension of the ideas from the previous paragraph. Imagine a surface point surrounded by an emissive dome and receiving a certain amount of irradiance coming from all directions in the hemisphere centered on the local surface normal. Now imagine a second surface point experiencing the same amount of irradiance, only all of the illumination is coming from a single direction, in a very thin beam. Intuitively, we can see that a ray along this beam is somehow “brighter” than any one ray that is illuminating the first surface point. The irradiance is somehow “denser.” It is denser per unit solid angle.
The idea of a solid angle is probably new to some readers, but we can easily understand it by comparing it to angles in the plane. A “regular” angle is measured (in radians) based on the length of its projection onto the unit circle. In the same way, a solid angle measures the area as projected onto the unit sphere surrounding the point. The SI unit for solid angle is the steradian, abbreviated “sr.” The complete sphere subtends 4π sr; a hemisphere encompasses 2π sr.
By measuring the flux density per unit solid angle, we can express the intensity of light at a certain point as a function that varies based upon the direction of incidence. We are very close to having the unit of measurement that describes the intensity of a ray. There is just one slight catch, illustrated by Figure 10.1, which is a close-up of a very thin pencil of light rays striking a surface. On the top, the rays strike the surface perpendicularly, and on the bottom, light rays of the same strength strike a different surface at an angle. The key point is that the area of the top surface is smaller than the area of the bottom surface; therefore, the irradiance on the top surface is larger than the irradiance on the bottom surface, despite the fact that the two surfaces are being illuminated by the “same number” of identical light rays. This basic phenomenon, that the angle of the surface causes incident light rays to be spread out and thus contribute less irradiance, is known as Lambert's law. We have more to say about Lambert's law in Section 10.6.3, but for now, the key idea is that the contribution of a bundle of light to the irradiance at a surface depends on the angle of that surface.
Due to Lambert's law, the unit we use in graphics to measure the strength of a ray, radiance, is defined as the radiant flux per unit projected area, per unit solid angle. To measure a projected area, we take the actual surface area and project it onto the plane perpendicular to the ray. (In Figure 10.1, imagine taking the bottom surface and projecting it upwards onto the top surface). Essentially this counteracts Lambert's law.
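The following small sketch ties these units together: it converts the radiance arriving along a thin bundle of directions into the irradiance that bundle contributes at a surface point. The dot helper and the specific parameter names are assumptions for illustration.

// Sketch: irradiance contributed by a thin bundle of directions.
// 'radiance' is in W/(m^2 sr), 'solidAngle' is the bundle's (small)
// solid angle in steradians, and both vectors are unit length.
float irradianceFromBundle(float radiance,
                           const Vector3 &surfaceNormal,
                           const Vector3 &dirTowardsLight,
                           float solidAngle) {

    // Lambert's law: light arriving at a glancing angle is spread over
    // more surface area and so contributes less irradiance.
    float cosTheta = dot(surfaceNormal, dirTowardsLight);
    if (cosTheta < 0.0f) cosTheta = 0.0f;      // light from below the horizon
    return radiance * cosTheta * solidAngle;   // result is in W/m^2
}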
Table 10.1 summarizes the most important radiometric terms.
Quantity | Units | SI unit | Rough translation
Radiant energy | Energy | joule (J) | Total illumination during an interval of time
Radiant flux | Power | watt (W) | Brightness of a finite area from all directions
Radiant flux density | Power per unit area | W/m² | Brightness of a single point from all directions
Irradiance | Power per unit area | W/m² | Radiant flux density of incident light
Radiant exitance | Power per unit area | W/m² | Radiant flux density of emitted light
Radiosity | Power per unit area | W/m² | Radiant flux density of emitted or reflected light
Radiance | Power per unit projected area, per unit solid angle | W/(m²·sr) | Brightness of a ray
Whereas radiometry takes the perspective of physics by measuring the raw energy of the light, the field of photometry weighs that same light using the human eye. For each of the corresponding radiometric terms, there is a similar term from photometry (Table 10.2). The only real difference is a nonlinear conversion from raw energy to perceived brightness.
Radiometric term | Photometric term | SI photometric unit
Radiant energy | Luminous energy | talbot, or lumen second (lm·s)
Radiant flux | Luminous flux, luminous power | lumen (lm)
Irradiance | Illuminance | lux (lx = lm/m²)
Radiant exitance | Luminous emittance | lux (lx = lm/m²)
Radiance | Luminance | candela per square meter (cd/m², also known as the nit)
Throughout the remainder of this chapter, we try to use the proper radiometric units when possible. However, the practical realities of graphics make using proper units confusing, for two particular reasons. It is common in graphics to need to take some integral over a “signal”—for example, the color of some surface. In practice we cannot do the integral analytically, and so we must integrate numerically, which boils down to taking a weighted average of many samples. Although mathematically we are taking a weighted average (which ordinarily would not cause the units to change), in fact what we are doing is integrating, and that means each sample is really being multiplied by some differential quantity, such as a differential area or differential solid angle, which causes the physical units to change. A second cause of confusion is that, although many signals have a finite nonzero domain in the real world, they are represented in a computer by signals that are nonzero at a single point. (Mathematically, we say that the signal is a multiple of a Dirac delta; see Section 12.4.3.) For example, a real-world light source has a finite area, and we would be interested in the radiance of the light at a given point on the emissive surface, in a given direction. In practice, we imagine shrinking the area of this light down to zero while holding the radiant flux constant. The flux density becomes infinite in theory. Thus, for a real area light we would need a signal to describe the flux density, whereas for a point light, the flux density becomes infinite and we instead describe the brightness of the light by its total flux. We'll repeat this information when we talk about point lights.
Now let's fit the BRDF into the rendering algorithm. In step 2 of our rendering algorithm (Section 10.1), we're trying to determine the radiance leaving a particular surface in the direction of our eye. The only way this can happen is for light to arrive from some direction onto the surface and get reflected in our direction. With the BRDF, we now have a way to measure this. Consider all the potential directions that light might be incident upon the surface, which form a hemisphere centered on x, oriented according to the local surface normal n. For each incident direction ω_in, we measure the radiance arriving from that direction, multiply it by the BRDF and by the Lambert factor discussed earlier, and add up (integrate) these contributions over the entire hemisphere; finally, we add any radiance emitted by the surface itself. The result is the rendering equation,

L_out(x, ω_out, λ) = L_emis(x, ω_out, λ) + ∫_Ω L_in(x, ω_in, λ) f(x, ω_in, ω_out, λ) (−ω_in · n) dω_in.   (10.1)
As fundamental as Equation (10.1) may be, its development is relatively recent, having been published in SIGGRAPH in 1986 by Kajiya [13]. Furthermore, it was the result of, rather than the cause of, numerous strategies for producing realistic images. Graphics researchers pursued the creation of images through different techniques that seemed to make sense to them before having a framework to describe the problem they were trying to solve. And for many years after that, most of us in the video game industry were unaware that the problem we were trying to solve had finally been clearly defined. (Many still are.)
Now let's convert this equation into English and see what the heck it means. First of all, notice that the arguments x, ω_out, and λ appear on both sides of the equation: the equation describes, for a single wavelength λ, the radiance leaving a single point x in a single direction ω_out. The term on the left-hand side, L_out(x, ω_out, λ), is exactly the value the rendering algorithm needs: if x is the point visible through a given pixel and ω_out points from x towards the eye, this is the radiance that determines the color of that pixel. On the right-hand side, we have a sum. The first term in the sum, L_emis(x, ω_out, λ), is the radiance emitted by the surface itself at x in the direction ω_out; it is nonzero only for emissive surfaces, such as light sources.
Now let's dig into that intimidating integral. (By the way, if you haven't had calculus and haven't read Chapter 11 yet, just replace the word “integral” with “sum,” and you won't miss any of the main point of this section.) We've actually already discussed how it works when we talked about the BRDF, but let's repeat it with different words. We might rewrite the integral as a sum over a very large number of incident directions ω_in sampled from the hemisphere: for each direction, we ask how much light arrives from that direction and how much of it gets reflected towards ω_out, and we add up all of those contributions.

Note that the symbol Ω under the integral sign denotes the hemisphere of possible incident directions surrounding x, and the variable of integration, ω_in, ranges over that hemisphere.

Now all that is left is to dissect the integrand. It's a product of three factors: the incident radiance L_in(x, ω_in, λ), the BRDF f(x, ω_in, ω_out, λ), and the Lambert factor −(ω_in · n).

The first factor denotes the radiance incident from the direction of ω_in. The second factor, the BRDF, tells us how strongly light arriving from ω_in is reflected towards the outgoing direction ω_out. The last factor is the Lambert factor we met earlier: light arriving at a glancing angle is spread across more surface area and therefore contributes less irradiance. The dot product is negated because, by our convention, ω_in points towards the surface and is thus on the opposite side from the outward surface normal n.
In purely mathematical terms, the rendering equation is an integral equation: it states a relationship between some unknown function—the distribution of radiance L_out leaving every surface in the scene—and its own integral. Notice that L_out effectively appears inside the integral as well, because the radiance L_in arriving at x from some direction is simply the radiance leaving some other surface in the scene.
To render a scene realistically, we must solve the rendering equation, which requires us to know (in theory) not only the radiance arriving at the camera, but also the entire distribution of radiance in the scene in every direction at every point. Clearly, this is too much to ask for a finite, digital computer, since both the set of surface locations and the set of potential incident/exiting directions are infinite. The real art in creating software for digital image synthesis is to allocate the limited processor time and memory most efficiently, to make the best possible approximation.
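To see what “solving” the integral might look like numerically, here is a sketch of the most direct attack: sample the hemisphere, average, and scale by the size of the domain. It is not how any particular production renderer works; incidentRadiance, brdf, and sampleDirectionOnHemisphere are hypothetical helpers (a path tracer, for instance, would recurse to evaluate incidentRadiance and would sample more cleverly).

// Sketch: Monte Carlo estimate of the reflected-light integral in the
// rendering equation, at a single surface point, for one outgoing
// direction.  Directions here point away from the surface, so the
// Lambert factor is a plain dot product.
Color estimateReflectedRadiance(const Vector3 &pos, const Vector3 &normal,
                                const Vector3 &wOut, int sampleCount) {
    Color total(0, 0, 0);
    for (int i = 0; i < sampleCount; ++i) {

        // Pick a random incident direction in the hemisphere above pos
        Vector3 wIn = sampleDirectionOnHemisphere(normal);

        // Incident radiance, times the BRDF, times the Lambert factor
        float lambert = dot(normal, wIn);
        total += incidentRadiance(pos, wIn) * brdf(pos, wIn, wOut) * lambert;
    }

    // Average the samples; the 2*pi is the solid angle of the hemisphere
    // (this assumes the directions were sampled uniformly).
    return total * (2.0f * 3.14159265f / (float)sampleCount);
}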
The simple rendering pipeline we present in Section 10.10 accounts only for direct light. It doesn't account for indirect light that bounced off of one surface and arrived at another. In other words, it only does “one recursion level” in the rendering equation. A huge component of realistic images is accounting for the indirect light—solving the rendering equation more completely. The various methods for accomplishing this are known as global illumination techniques.
This concludes our high-level presentation of how graphics works. Although we admit we have not yet presented a single practical idea, we believe it's very important to understand what you are trying to approximate before you start to approximate it. Even though the compromises we are forced to make for the sake of real-time are quite severe, the available computing power is growing. A video game programmer whose only exposure to graphics has been OpenGL tutorials or demos made by video card manufacturers or books that focused exclusively on real-time rendering will have a much more difficult time understanding even the global illumination techniques of today, much less those of tomorrow.
Before we render a scene, we must pick a camera and a window. That is, we must decide where to render it from (the view position, orientation, and zoom) and where to render it to (the rectangle on the screen). The output window is the simpler of the two, and so we will discuss it first.
Section 10.2.1 describes how to specify the output window. Section 10.2.2 discusses the pixel aspect ratio. Section 10.2.3 introduces the view frustum. Section 10.2.4 describes field of view angles and zoom.
We don't have to render our image to the entire screen. For example, in split-screen multiplayer games, each player is given a portion of the screen. The output window refers to the portion of the output device where our image will be rendered. This is shown in Figure 10.2.
The position of the window is specified by the coordinates of its upper left-hand pixel, and its size by its width and height, measured in pixels.
With that said, it is important to realize that we do not necessarily have to be rendering to the screen at all. We could be rendering into a buffer to be saved into a .TGA file or as a frame in an .AVI, or we may be rendering into a texture as a subprocess of the “main” render, to produce a shadow map, or a reflection, or the image on a monitor in the virtual world. For these reasons, the term render target is often used to refer to the current destination of rendering output.
Regardless of whether we are rendering to the screen or an off-screen buffer, we must know the aspect ratio of the pixels, which is the ratio of a pixel's height to its width. This ratio is often 1:1—that is, we have “square” pixels—but this is not always the case! We give some examples below, but it is common for this assumption to go unquestioned and become the source of complicated kludges applied in the wrong place, to fix up stretched or squashed images.
The physical dimensions of a pixel are determined by the physical dimensions of the display device and its resolution:

pixPhys_x / pixPhys_y = (devPhys_x / devPhys_y) · (devRes_y / devRes_x).

Computing the pixel aspect ratio

The notation pixPhys refers to the physical dimensions of a pixel, devPhys to the physical dimensions of the display device, and devRes to the resolution of the device in pixels. The physical measurements can be in any unit—inches, millimeters, picas—since only their ratio matters.
But, as mentioned already, we often deal with square pixels with an aspect ratio of 1:1. For example, on a desktop monitor with a physical width:height ratio of 4:3, common resolutions resulting in square pixels are 640 × 480, 800 × 600, 1024 × 768, and 1600 × 1200. Likewise, on a 16:9 high-definition display, the resolutions 1280 × 720 and 1920 × 1080 produce square pixels.
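In code, the relationship might look like the snippet below; the parameter names are illustrative, and the function returns the ratio of a pixel's physical width to its physical height.

// Sketch: pixel aspect ratio (pixel width divided by pixel height) from
// the device's physical proportions and resolution.  A 4:3 monitor at
// 1024x768 yields 1.0 (square pixels); the same monitor at 1280x1024
// yields about 1.067 (slightly wide pixels).
float pixelAspect(float devPhysX, float devPhysY,   // physical size, any units
                  int   devResX,  int   devResY) {  // resolution in pixels
    return (devPhysX / devPhysY) * ((float)devResY / (float)devResX);
}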
Notice that nowhere in these calculations is the size or location of the window used; the location and size of the rendering window have no bearing on the physical proportions of a pixel. However, the size of the window will become important when we discuss field of view in Section 10.2.4, and the position is important when we map from camera space to screen space in Section 10.3.5.
At this point, some readers may be wondering how this discussion makes sense in the context of rendering to an off-screen bitmap, where the word “physical” implied by the variable names doesn't really apply. In that case, the “physical” proportions simply refer to the proportions with which the image will eventually be displayed; if those are unknown, the usual assumption is square pixels, so that the aspect ratio is determined entirely by the resolution.
The view frustum is the volume of space that is potentially visible to the camera. It is shaped like a pyramid with the tip snipped off. An example of a view frustum is shown in Figure 10.3.
The view frustum is bounded by six planes, known as the clip planes. The first four of the planes form the sides of the pyramid and are called the top, left, bottom, and right planes, for obvious reasons. They correspond to the sides of the output window. The near and far clip planes, which correspond to certain camera-space values of z, require a bit more explanation.
The reason for the far clip plane is perhaps easier to understand. It prevents rendering of objects beyond a certain distance. There are two practical reasons why a far clip plane is needed. The first is relatively easy to understand: a far clip plane can limit the number of objects that need to be rendered in an outdoor environment. The second reason is slightly more complicated, but essentially it has to do with how the depth buffer values are assigned. As an example, if the depth buffer entries are 16-bit fixed point, then the largest depth value that can be stored is 65,535. The far clip plane establishes which (floating-point) camera-space depth is mapped to that largest storable value; depths between the near and far clip planes are distributed across the finite range of values the depth buffer can represent.
Notice that each of the clip planes is a true plane, with emphasis on the fact that it extends infinitely. The view volume is the intersection of the six half-spaces defined by the clip planes.
A camera has position and orientation, just like any other object in the world. However, it also has an additional property known as field of view. Another term you probably know is zoom. Intuitively, you already know what it means to “zoom in” and “zoom out.” When you zoom in, the object you are looking at appears bigger on screen, and when you zoom out, the apparent size of the object is smaller. Let's see if we can develop this intuition into a more precise definition.
The field of view (FOV) is the angle that is intercepted by the view frustum. We actually need two angles: a horizontal field of view, and a vertical field of view. Let's drop back to 2D briefly and consider just one of these angles. Figure 10.4 shows the view frustum from above, illustrating precisely the angle that the horizontal field of view measures. The labeling of the axes is illustrative of camera space, which is discussed in Section 10.3.
Zoom measures the ratio of the apparent size of an object relative to its apparent size when rendered with a 90° field of view. For example, a zoom of 2.0 means that the object appears twice as big on screen as it would with a 90° field of view.
Zoom can be interpreted geometrically as shown in Figure 10.5. Using some basic trig, we can derive the conversion between zoom and field of view:

zoom = 1 / tan(fov/2) = cot(fov/2),        fov = 2 arctan(1/zoom).

Converting between zoom and field of view

Notice the inverse relationship between zoom and field of view. As zoom gets larger, the field of view gets smaller, causing the view frustum to narrow. It might not seem intuitive at first, but when the view frustum gets narrower, the perceived size of visible objects increases.
Field of view is a convenient measurement for humans to use, but as we discover in Section 10.3.4, zoom is the measurement that we need to feed into the graphics pipeline.
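Here is the same conversion as a pair of one-line helpers; angles are in radians, and the names are illustrative.

// Sketch: converting between field of view and zoom.
// A 90-degree field of view corresponds to a zoom of 1.0.
#include <math.h>

float fovToZoom(float fovRadians) { return 1.0f / tanf(fovRadians * 0.5f); }
float zoomToFov(float zoom)       { return 2.0f * atanf(1.0f / zoom); }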
We need two different field of view angles (or zoom values), one horizontal and one vertical. We are certainly free to choose any two arbitrary values we fancy, but if we do not maintain a proper relationship between these values, then the rendered image will appear stretched. If you've ever watched a movie intended for the wide screen that was simply squashed anamorphically to fit on a regular TV, or watched content with a 4:3 aspect on a 16:9 TV in “full” mode, then you have seen this effect.
In order to maintain proper proportions, the zoom values must be inversely proportional to the physical dimensions of the output window:

zoom_y / zoom_x = winPhys_x / winPhys_y.

The usual relationship between vertical and horizontal zoom

The variable winPhys refers to the physical size of the output window; as usual, we don't need actual measurements, only the ratio. In this formula, if the pixels are square, we can substitute the window's resolution in pixels for its physical size; if they are not, the pixel aspect ratio from Section 10.2.2 must also be taken into account.
Many rendering packages allow you to specify only one field of view angle (or zoom value). When you do this, they automatically compute the other value for you, assuming you want uniform display proportions. For example, you may specify the horizontal field of view, and they compute the vertical field of view for you.
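A sketch of that computation, assuming square pixels, follows; the window dimensions are in pixels and the names are illustrative.

// Sketch: given a horizontal field of view and the window proportions,
// compute the vertical field of view that keeps proportions uniform
// (square pixels assumed).  Angles are in radians.
#include <math.h>

float verticalFov(float horizontalFovRadians, int winResX, int winResY) {
    float zoomX = 1.0f / tanf(horizontalFovRadians * 0.5f);

    // zoomY / zoomX = window width / window height
    float zoomY = zoomX * (float)winResX / (float)winResY;
    return 2.0f * atanf(1.0f / zoomY);
}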
Now that we know how to describe zoom in a manner suitable for consumption by a computer, what do we do with these zoom values? They go into the clip matrix, which is described in Section 10.3.4.
The discussion so far has centered on perspective projection, which is the most commonly used type of projection, since that's how our eyes perceive the world. However, in many situations orthographic projection is also useful. We introduced orthographic projection in Section 5.3; to briefly review, in orthographic projection, the lines of projection (the lines that connect all the points in space that project onto the same screen coordinates) are parallel, rather than intersecting at a single point. There is no perspective foreshortening in orthographic projection; an object will appear the same size on the screen no matter how far away it is, and moving the camera forward or backward along the viewing direction has no apparent effect so long as the objects remain in front of the near clip plane.
Figure 10.6 shows a scene rendered from the same position and orientation, comparing perspective and orthographic projection. On the left, notice that with perspective projection, parallel lines do not remain parallel, and the closer grid squares are larger than the ones in the distance. Under orthographic projection, the grid squares are all the same size and the grid lines remain parallel.
Figure 10.6 Perspective projection (left) compared with orthographic projection (right)
Orthographic views are very useful for “schematic” views and other situations where distances and angles need to be measured precisely. Every modeling tool will support such a view. In a video game, you might use an orthographic view to render a map or some other HUD element.
For an orthographic projection, it makes no sense to speak of the “field of view” as an angle, since the view frustum is shaped like a box, not a pyramid. Rather than defining the visible volume by an angle, we specify the width and height of the box directly (or, equivalently, a pair of zoom values).
The zoom value has a different meaning in orthographic projection compared to perspective. It is related to the physical size of the frustum box:

zoom = 2 / size,

where size is the width of the box for the horizontal zoom value and the height of the box for the vertical one, measured in world units.

Converting between zoom and frustum size in orthographic projection
As with perspective projections, there are two different zoom values, one for x and one for y, and their ratio must be matched to the proportions of the output window to avoid a stretched or squashed image.
This section reviews several important coordinate spaces related to 3D viewing. Unfortunately, terminology is not consistent in the literature on the subject, even though the concepts are. Here, we discuss the coordinate spaces in the order they are encountered as geometry flows through the graphics pipeline.
The geometry of an object is initially described in object space, which is a coordinate space local to the object being described (see Section 3.2.2). The information described usually consists of vertex positions and surface normals. Object space is also known as local space and, especially in the context of graphics, model space.
From model space, the vertices are transformed into world space (see Section 3.2.1). The transformation from modeling space to world space is often called the model transform. Typically, lighting for the scene is specified in world space, although, as we see in Section 10.11, it doesn't really matter what coordinate space is used to perform the lighting calculations provided that the geometry and the lights can be expressed in the same space.
From world space, vertices are transformed by the view transform into camera space (see Section 3.2.3), also known as eye space and view space (not to be confused with canonical view volume space, discussed later). Camera space is a 3D coordinate space in which the origin is at the center of projection, one axis is parallel to the direction the camera is facing (perpendicular to the projection plane), one axis is the intersection of the top and bottom clip planes, and the other axis is the intersection of the left and right clip planes. If we assume the perspective of the camera, then one axis will be “horizontal” and one will be “vertical.”
In a left-handed world, the most common convention is to point +z in the direction the camera is facing, with +x pointing to the right and +y pointing up, from the camera's point of view; this is the convention assumed in this book. In a right-handed world, the usual convention is for the camera to look down the −z axis.
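As a small illustration of these conventions, the sketch below builds the camera-space axes for a left-handed, +z-forward camera from a position and a target point; crossProduct and normalize are assumed vector helpers, and the world is assumed to be y-up.

// Sketch: camera-space axes (expressed in world space) for a left-handed,
// +z-forward camera looking from cameraPos towards target.  Degenerate if
// the view direction is parallel to the world up vector.
void buildCameraAxes(const Vector3 &cameraPos, const Vector3 &target,
                     Vector3 &right, Vector3 &up, Vector3 &forward) {
    const Vector3 worldUp(0, 1, 0);            // assumes a y-up world
    forward = normalize(target - cameraPos);   // camera-space +z
    right   = normalize(crossProduct(worldUp, forward));   // camera-space +x
    up      = crossProduct(forward, right);    // camera-space +y
}

These three axes describe the camera's orientation; the view transform that takes vertices from world space into camera space is the inverse of the rigid body transform defined by this orientation and the camera position.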
From camera space, vertices are transformed once again into clip space, also known as the canonical view volume space. The matrix that transforms vertices from camera space into clip space is called the clip matrix, also known as the projection matrix.
Up until now, our vertex positions have been “pure” 3D vectors—that is, they only had three coordinates, or if they have a fourth coordinate, then w was always equal to 1.

Recall from Section 6.4.1 that a 4D homogeneous vector is mapped to the corresponding physical 3D vector by dividing by w.

The first goal of the clip matrix is to get the correct value into w, so that this homogeneous division produces the desired perspective projection. For the simple projection described here, the value we need in w is the camera-space z coordinate, the depth of the vertex in front of the camera.

If this was the only purpose of the clip matrix, to place the correct value into w, the clip matrix would simply be

[ 1  0  0  0 ]
[ 0  1  0  0 ]
[ 0  0  1  1 ]
[ 0  0  0  0 ]

Multiplying a vector of the form [x, y, z, 1] by this matrix produces [x, y, z, z], and performing the homogeneous division by w then yields [x/z, y/z, 1]—the perspective projection onto the plane z = 1.
At this point, many readers might very reasonably ask two questions. The first question might be, “Why is this so complicated? This seems like a lot of work to accomplish what basically amounts to just dividing by z.” Part of the answer is that the clip matrix does more than just copy z into w, as we're about to see; the rest of the answer is that keeping the division as a separate step lets clipping be performed on the homogeneous coordinates, before the divide, which neatly avoids the trouble that arises when z is zero or negative.
The second question a reader might have is, “What happened to d, the distance from the center of projection to the projection plane?” Earlier, when we derived perspective projection, this distance appeared explicitly, but it is nowhere to be found in the matrix above. To understand why d is no longer needed, let's compare what happens inside a computer with what happens inside a real camera, as shown in the accompanying diagrams.
The vertical line on the left side of each diagram represents the film (or, for modern cameras, the sensing element), which lies in the infinite plane of projection. Importantly, notice that the film is the same height in each diagram. As we increase the focal distance, the field of view narrows, and a given object projects onto a larger portion of the film: we have zoomed in.

Now let's look at what happens inside a computer. The “film” inside a computer is the rectangular portion of the projection plane that intersects the view frustum. Notice that if we increase the focal distance, the size of the projected image increases, just like it did in a real camera. However, inside a computer, the film actually increases by this same proportion, rather than the view frustum changing in size. Because the projected image and the film increase by the same proportion, there is no change to the rendered image or the apparent size of objects within this image.
In summary, zoom is always accomplished by changing the shape of the view frustum, whether we're talking about a real camera or a computer. In a real camera, changing the focal length changes the shape of the view frustum because the film stays the same size. However, in a computer, adjusting the focal distance d by itself changes nothing, because the “film” grows or shrinks along with the projected image. Since the choice of d doesn't matter, we simply use the most convenient value, d = 1, and control zoom solely through the shape of the frustum—that is, through the zoom values that go into the clip matrix.
Some software allows the user to specify the field of view by giving a focal length measured in millimeters. These numbers are in reference to some standard film size, almost always 35 mm film.
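For reference, here is a sketch of that conversion, assuming the full 36 mm × 24 mm frame of 35 mm film; a “50 mm” lens works out to roughly a 40° horizontal field of view.

// Sketch: converting a 35 mm-equivalent focal length to field of view
// angles, assuming the full 36 mm x 24 mm film frame.  Results are in
// radians.
#include <math.h>

float horizontalFovFromFocalLength(float focalLengthMm) {
    return 2.0f * atanf(36.0f / (2.0f * focalLengthMm));
}

float verticalFovFromFocalLength(float focalLengthMm) {
    return 2.0f * atanf(24.0f / (2.0f * focalLengthMm));
}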
What about orthographic projection? In this case, we do not want to divide by z, so the matrix should simply leave the value 1 in w; the homogeneous division still happens, but dividing by 1 has no effect.
The next section fills in the rest of the clip matrix. But for now, the key point is that a perspective projection matrix will always have a right-hand column of [0, 0, 1, 0]^T (it copies z into w), whereas an orthographic projection matrix will have a right-hand column of [0, 0, 0, 1]^T (it passes the original w of 1 through unchanged).

Notice that multiplying the entire matrix by a constant factor doesn't have any effect on the projected values, since the factor appears in w as well and cancels during the homogeneous division; it does, however, scale the clip-space w value itself, which matters because the hardware uses w for other purposes, such as perspective-correct interpolation.
The second goal of the clip matrix is to scale the x, y, and z components such that the six clip planes take on a trivial form in clip space.

So the points inside the view volume satisfy

−w ≤ x ≤ w,    −w ≤ y ≤ w,    −w ≤ z ≤ w.
Any geometry that does not satisfy these inequalities must be clipped to the view frustum. Clipping is discussed in Section 10.10.4.
To stretch things to put the top, left, right, and bottom clip planes in place, we scale the x and y components by the zoom values of the camera. For the near and far clip planes, the z component is scaled and translated so that it works out to −w at the near clip plane and +w at the far clip plane.

Let zoom_x and zoom_y be the horizontal and vertical zoom values, and let n and f be the distances to the near and far clip planes. Then the clip matrix, shown in Equation (10.6) for row vectors on the left, is

[ zoom_x   0        0                   0 ]
[ 0        zoom_y   0                   0 ]
[ 0        0        (f + n)/(f − n)     1 ]
[ 0        0        −2nf/(f − n)        0 ]

This clip matrix assumes a coordinate system with a clip-space z range of [−w, +w]. Other APIs place the near clip plane at z = 0 rather than z = −w; DirectX is the best-known example. In that case, the third column must be adjusted so that z ranges from 0 at the near plane to +w at the far plane, as shown in Equation (10.7):

[ zoom_x   0        0                   0 ]
[ 0        zoom_y   0                   0 ]
[ 0        0        f/(f − n)           1 ]
[ 0        0        −nf/(f − n)         0 ]

Under these DirectX-style conventions, the points inside the view frustum satisfy the inequality 0 ≤ z ≤ w; the x and y ranges are unchanged.
We can easily tell that the two matrices in Equations (10.6) and (10.7) are perspective projection matrices because the right-hand column is [0, 0, 1, 0]^T.
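The sketch below fills in such a matrix for the row-vector, left-handed conventions used above, with the DirectX-style clip-space z range of Equation (10.7); the plain float[4][4] layout is an assumption for illustration.

// Sketch: perspective clip matrix for row vectors on the left, a
// left-handed camera space (+z is the view direction), and a clip-space
// z range of [0, w].  zoomX and zoomY come from the field of view
// (zoom = 1/tan(fov/2)); n and f are the near and far clip distances.
void makePerspectiveClipMatrix(float m[4][4],
                               float zoomX, float zoomY, float n, float f) {
    m[0][0] = zoomX; m[0][1] = 0.0f;  m[0][2] = 0.0f;             m[0][3] = 0.0f;
    m[1][0] = 0.0f;  m[1][1] = zoomY; m[1][2] = 0.0f;             m[1][3] = 0.0f;
    m[2][0] = 0.0f;  m[2][1] = 0.0f;  m[2][2] = f / (f - n);      m[2][3] = 1.0f;
    m[3][0] = 0.0f;  m[3][1] = 0.0f;  m[3][2] = -n * f / (f - n); m[3][3] = 0.0f;
}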
What about orthographic projection? The first and second columns of the projection matrix don't change, and we know the fourth column will become [0, 0, 0, 1]^T. The third column must scale and translate z into the canonical range without any perspective division, mapping z = n to −1 and z = f to +1:

[ zoom_x   0        0                   0 ]
[ 0        zoom_y   0                   0 ]
[ 0        0        2/(f − n)           0 ]
[ 0        0        −(f + n)/(f − n)    1 ]

Alternatively, if we are using a DirectX-style range for the clip-space z values, mapping z = n to 0 and z = f to +1, the matrix is

[ zoom_x   0        0                   0 ]
[ 0        zoom_y   0                   0 ]
[ 0        0        1/(f − n)           0 ]
[ 0        0        −n/(f − n)          1 ]
In this book, we prefer a left-handed convention and row vectors on the left, and all the projection matrices so far assume those conventions. However, both of these choices differ from the OpenGL convention, and we know that many readers may be working in environments that are similar to OpenGL. Since this can be very confusing, let's repeat these matrices, but with the right-handed, column-vector OpenGL conventions. We'll only discuss the clip-space z range of [−w, +w], since that is the range OpenGL uses.
It's instructive to consider how to convert these matrices from one set of conventions to the other. Because OpenGL uses column vectors, the first thing we need to do is transpose our matrix. Second, the right-handed conventions have the camera looking down the −z axis, so visible points have negative camera-space z values; to account for this, we negate the third column of the transposed matrix, which is the column whose entries are multiplied by the incoming z coordinate.
The above procedure results in the following perspective projection matrix:

[ zoom_x   0         0                   0               ]
[ 0        zoom_y    0                   0               ]
[ 0        0        −(f + n)/(f − n)    −2nf/(f − n)     ]
[ 0        0        −1                   0               ]

Clip matrix for perspective projection assuming OpenGL conventions

and the orthographic projection matrix is

[ zoom_x   0         0                   0               ]
[ 0        zoom_y    0                   0               ]
[ 0        0        −2/(f − n)          −(f + n)/(f − n) ]
[ 0        0         0                   1               ]

Clip matrix for orthographic projection assuming OpenGL conventions
So, for OpenGL conventions, you can tell whether a projection matrix is perspective or orthographic based on the bottom row. It will be [0, 0, −1, 0] for a perspective projection and [0, 0, 0, 1] for an orthographic one.
Now that we know a bit about clip space, we can understand the need for the near clip plane. Obviously, there is a singularity precisely at the origin, where a perspective projection is not defined. (This corresponds to a perspective division by zero.) In practice, this singularity would be extremely rare, and however we wanted to handle it—say, by arbitrarily projecting the point to the center of the screen—would be OK, since putting the camera directly in a polygon isn't often needed in practice.
But projecting polygons onto pixels isn't the only issue. Allowing for arbitrarily small (but positive) values of z also wreaks havoc on the depth buffer: the closer the near clip plane is to the camera, the more of the depth buffer's limited precision is spent on a razor-thin slice of space just in front of the camera, leaving too little precision to distinguish surfaces farther away. Pushing the near clip plane out as far as is tolerable keeps the depth precision usable.
Once we have clipped the geometry to the view frustum, it is projected into screen space, which corresponds to actual pixels in the frame buffer. Remember that we are rendering into an output window that does not necessarily occupy the entire display device. However, we usually want our screen-space coordinates to be specified using coordinates that are absolute to the rendering device (Figure 10.9).
Screen space is a 2D space, of course. Thus, we must project the points from clip space to screen space to generate the correct 2D coordinates. The first thing that happens is the standard homogeneous division by w. The resulting coordinates, which range from −1 to +1 for points inside the view volume, are then scaled and offset into the window:

screen_x = windowCenter_x + (clip_x / clip_w) (windowWidth / 2),
screen_y = windowCenter_y − (clip_y / clip_w) (windowHeight / 2),

where windowWidth and windowHeight are the dimensions of the output window in pixels and windowCenter is its center.

A quick comment is warranted about the negation of the y term. In screen space, it is traditional for y to increase from top to bottom (the origin is the upper-left corner of the window), which is the opposite of the clip-space convention, in which +y points up; the negation flips the axis accordingly.
Speaking of z, what happens to it? In general, it's used for depth buffering: after the homogeneous division, the clip-space z/w value falls into the canonical depth range, and (after scaling into the format of the depth buffer) it becomes the value against which new fragments are tested.

An alternative strategy, known as w-buffering, uses the clip-space w value as the depth value instead. Since clip-space w is just a scaled copy of camera-space z, it increases linearly with distance from the camera, whereas z/w does not.

The traditional z/w depth values are far from evenly distributed: the division concentrates most of the available precision in a thin region near the near clip plane, which is another reason the placement of the near clip plane deserves care.
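A sketch of the whole mapping, with hypothetical window parameters (top-left pixel winPosX, winPosY and size winResX × winResY), might look like this; the point is assumed to have already been clipped, so clipW is positive.

// Sketch: mapping a clip-space position to screen-space pixel coordinates.
void clipToScreen(float clipX, float clipY, float clipW,
                  int winPosX, int winPosY, int winResX, int winResY,
                  float &screenX, float &screenY) {

    // Homogeneous division gives normalized device coordinates in [-1,+1]
    float ndcX = clipX / clipW;
    float ndcY = clipY / clipW;

    // Map to pixels.  The y axis is flipped because screen-space y
    // traditionally increases downward.
    screenX = winPosX + (ndcX + 1.0f) * 0.5f * (float)winResX;
    screenY = winPosY + (1.0f - ndcY) * 0.5f * (float)winResY;
}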
On modern graphics APIs at the time of this writing, the conversion of vertex coordinates from clip space to screen space is done for you. Your vertex shader outputs coordinates in clip space. The API clips the triangles to the view frustum and then projects the coordinates to screen space. But that doesn't mean that you will never use the equations in this section in your code. Quite often, we need to perform these calculations in software for visibility testing, level-of-detail selection, and so forth.
Figure 10.10 summarizes the coordinate spaces and matrices discussed in this section, showing the data flow from object space to screen space.
The coordinate spaces we've mentioned are the most important and common ones, but other coordinate spaces are used in computer graphics. For example, a projected light might have its own space, which is essentially the same as camera space, only it is from the perspective that the light “looks” onto the scene. This space is important when the light projects an image (sometimes called a gobo) and also for shadow mapping to determine whether a light can “see” a given point.
Another space that has become very important is tangent space, which is a local space on the surface of an object. One basis vector is the surface normal and the other two basis vectors are locally tangent to the surface, essentially establishing a 2D coordinate space that is “flat” on the surface at that spot. There are many different ways we could determine these basis vectors, but by far the most common reason to establish such a coordinate space is for bump mapping and related techniques. A more complete discussion of tangent space will need to wait until after we discuss texture mapping in Section 10.5, so we'll come back to this subject in Section 10.9.1. Tangent space is also sometimes called surface-local space.
To render a scene, we need a mathematical description of the geometry in that scene. Several different methods are available to us. This section focuses on the one most important for real-time rendering: the triangle mesh. But first, let's mention a few alternatives to get some context. Constructive solid geometry (CSG) is a system for describing an object's shape using Boolean operators (union, intersection, subtraction) on primitives. Within video games, CSG can be especially useful for rapid prototyping tools, with the Unreal engine being a notable example. Another technique that works by modeling volumes rather than their surfaces is metaballs, sometimes used to model organic shapes and fluids, as was discussed in Section 9.1. CSG, metaballs, and other volumetric descriptions are very useful in particular realms, but for rendering (especially real-time rendering) we are interested in a description of the surface of the object, and seldom need to determine whether a given point is inside or outside this surface. Indeed, the surface need not be closed or even define a coherent volume.
The most common surface description is the polygon mesh, of which you are probably already aware. In certain circumstances, it's useful to allow the polygons that form the surface of the object to have an arbitrary number of vertices; this is often the case in importing and editing tools. For real-time rendering, however, modern hardware is optimized for triangle meshes, which are polygon meshes in which every polygon is a triangle. Any given polygon mesh can be converted into an equivalent triangle mesh by decomposing each polygon into triangles individually, as was discussed briefly in Section 9.7.3. Please note that many important concepts introduced in the context of a single triangle or polygon were covered in Section 9.6 and Section 9.7, respectively. Here, our focus is on how more than one triangle can be connected in a mesh.
One very straightforward way to store a triangle mesh would be to use an array of triangles, as shown in Listing 10.4.
struct Triangle {
    Vector3 vertPos[3];   // vertex positions
};

struct TriangleMesh {
    int      triCount;    // number of triangles
    Triangle *triList;    // array of triangles
};
For some applications this trivial representation might be adequate. However, the term “mesh” implies a degree of connectivity between adjacent triangles, and this connectivity is not expressed in our trivial representation. There are three basic types of information in a triangle mesh: vertices, edges, and triangles. Each triangle has exactly three vertices and three edges, and vertices and edges are typically shared among multiple triangles; it is this sharing that the trivial flat array fails to capture.
A variety of methods exist to represent this information efficiently, depending on the operations to be performed most often on the mesh. Here we will focus on a standard storage format known as an indexed triangle mesh.
An indexed triangle mesh consists of two lists: a list of vertices, and a list of triangles.
Listing 10.5 shows a highly simplified example of how an indexed triangle mesh might be stored in C.
// struct Vertex is the information we store at the vertex level
struct Vertex {

    // 3D position of the vertex
    Vector3 pos;

    // Other information could include
    // texture mapping coordinates, a
    // surface normal, lighting values, etc.
};

// struct Triangle is the information we store at the triangle level
struct Triangle {

    // Indices into the vertex list.  In practice, 16-bit indices are
    // almost always used rather than 32-bit, to save memory and bandwidth.
    int vertexIndex[3];

    // Other information could include
    // a normal, material information, etc.
};

// struct TriangleMesh stores an indexed triangle mesh
struct TriangleMesh {

    // The vertices
    int    vertexCount;
    Vertex *vertexList;

    // The triangles
    int      triangleCount;
    Triangle *triangleList;
};
Figure 10.11 shows how a cube and a pyramid might be represented as a polygon mesh or a triangle mesh. Note that both objects are part of a single mesh with 13 vertices. The lighter, thicker wires show the outlines of polygons, and the thinner, dark green wires show one way to add edges to triangulate the polygon mesh.
Assuming the origin is on the “ground” directly between the two objects, the vertex coordinates might be as shown in Table 10.3.
[Table 10.3: Vertex positions for the cube and pyramid mesh of Figure 10.11, listing the coordinates of vertices 0 through 12. The coordinate data is not reproduced here.]
Table 10.4 shows the vertex indices that would form faces of this mesh, either as a polygon mesh or as a triangle mesh. Remember that the order of the vertices is significant; they are listed in clockwise order when viewed from the outside. You should study these figures until you are sure you understand them.
[Table 10.4: Vertex indices for each face of the mesh, given both as a polygon mesh and as a triangle mesh. The faces listed are the cube top, front, right, left, back, and bottom, and the pyramid front, left, right, back, and bottom; the index lists themselves are not reproduced here.]
The vertices must be listed in clockwise order around a face, but it doesn't matter which one is considered the "first" vertex; they can be cycled without changing the logical structure of the mesh. For example, the quad forming the cube top could equivalently have been given starting with any of its four vertices, as long as the clockwise order is preserved.
As indicated by the comments in Listing 10.5, additional data are almost always stored per vertex, such as texture coordinates, surface normals, basis vectors, colors, skinning data, and so on. Each of these is discussed in later sections in the context of the techniques that make use of the data. Additional data can also be stored at the triangle level, such as an index that tells which material to use for that face, or the plane equation (part of which is the surface normal—see Section 9.5) for the face. This is highly useful for editing purposes or in other tools that perform mesh manipulations in software.

For real-time rendering, however, we seldom store data at the triangle level beyond the three vertex indices. In fact, the most common method is to not have a struct Triangle at all, and to represent the entire list of triangles simply as an array (e.g. unsigned short triList[]), where the length of the array is the number of triangles times 3. Triangles with identical properties are grouped into batches so that an entire batch can be fed to the GPU in this optimal format. After we review many of the concepts that give rise to the need to store additional data per vertex, Section 10.10.2 looks at several more specific examples of how we might feed that data to the graphics API.

By the way, as a general rule, things are a lot easier if you do not try to use the same mesh class for both rendering and editing. The requirements are very different, and a bulkier data structure with more flexibility is best for use in tools, importers, and the like.
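Returning to the flat index-array format described above, here is a minimal sketch of how such a render-ready mesh might look; the types and names are ours, not any particular engine's or API's.

#include <vector>

struct Vector3 { float x, y, z; };

// A render-friendly mesh: no per-triangle struct at all, just a flat
// index array whose length is three times the number of triangles.
struct RenderMesh {
    std::vector<Vector3>        vertexPos;  // one entry per vertex
    std::vector<unsigned short> triList;    // 3 indices per triangle

    int triangleCount() const { return (int)triList.size() / 3; }

    // Fetch the three vertex positions of triangle i.
    void getTriangle(int i, Vector3 &v0, Vector3 &v1, Vector3 &v2) const {
        v0 = vertexPos[triList[i*3 + 0]];
        v1 = vertexPos[triList[i*3 + 1]];
        v2 = vertexPos[triList[i*3 + 2]];
    }
};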
Note that in an indexed triangle mesh, the edges are not stored explicitly, but rather the adjacency information contained in an indexed triangle list is stored implicitly: to locate shared edges between triangles, we must search the triangle list. Our original trivial “array of triangles” format in Listing 10.4 did not have any logical connectivity information (although we could have attempted to detect whether the vertices on an edge were identical by comparing the vertex positions or other properties). What's surprising is that the “extra” connectivity information contained in the indexed representation actually results in a reduction of memory usage in most cases, compared to the flat method. The reason for this is that the information stored at the vertex level, which is duplicated in the trivial flat format, is relatively large compared to a single integer index. (At a minimum, we must store a 3D vector position.) In meshes that arise in practice, a typical vertex has a valence of around 3–6, which means that the flat format duplicates quite a lot of data.
The simple indexed triangle mesh scheme is appropriate for many applications, including the very important one of rendering. However, some operations on triangle meshes require a more advanced data structure in order to be implemented more efficiently. The basic problem is that the adjacency between triangles is not expressed explicitly and must be extracted by searching the triangle list. Other representation techniques exist that make this information available in constant time. One idea is to maintain an edge list explicitly. Each edge is defined by listing the two vertices on the ends. We also maintain a list of triangles that share the edge. Then the triangles can be viewed as a list of three edges rather than a list of three vertices, so they are stored as three indices into the edge list rather than the vertex list. An extension of this idea is known as the winged-edge model [7], which also stores, for each vertex, a reference to one edge that uses the vertex. The edges and triangles may be traversed intelligently to quickly locate all edges and triangles that use the vertex.
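As a rough sketch of the edge-based idea (the field names are ours, and a full winged-edge implementation stores more adjacency links than shown here), each edge records its two endpoints and the triangles that share it, each triangle refers to its three edges, and each vertex remembers one edge that uses it as a traversal entry point.

// One possible edge-centric representation.  An index of -1 means
// "no such neighbor" (for example, a border edge used by only one triangle).
struct Edge {
    int vertIndex[2];   // indices of the two endpoint vertices
    int triIndex[2];    // indices of the (up to) two triangles sharing this edge
};

struct Triangle {
    int edgeIndex[3];   // indices into the edge list, not the vertex list
};

struct Vertex {
    // Position and other per-vertex data would go here, plus
    int anyEdgeIndex;   // one edge that uses this vertex, for fast traversal
};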
Surface normals are used for several different purposes in graphics; for example, to compute proper lighting (Section 10.6), and for backface culling (Section 10.10.5). In general, a surface normal is a unit10 vector that is perpendicular to a surface. We might be interested in the normal of a given face, in which case the surface of interest is the plane that contains the face. The surface normals for polygons can be computed easily by using the techniques from Section 9.5.
Vertex-level normals are a bit trickier. First, it should be noted that, strictly speaking, there is not a true surface normal at a vertex (or an edge for that matter), since these locations mark discontinuities in the surface of the polygon mesh. Rather, for rendering purposes, we typically interpret a polygon mesh as an approximation to some smooth surface. So we don't want a normal to the piecewise linear surface defined by the polygon mesh; rather, we want (an approximation of) the surface normal of the smooth surface.
The primary purpose of vertex normals is lighting. Practically every lighting model takes a surface normal at the spot being lit as an input. Indeed, the surface normal is part of the rendering equation itself (in the Lambert factor), so it is always an input, even if the BRDF does not depend on it. We have normals available only at the vertices, but yet we need to compute lighting values over the entire surface. What to do? If hardware resources permit (as they usually do nowadays), then we can approximate the normal of the continuous surface corresponding to any point on a given face by interpolating vertex normals and renormalizing the result. This technique is illustrated in Figure 10.12, which shows a cross section of a cylinder (black circle) that is being approximated by a hexagonal prism (blue outline). Black normals at the vertices are the true surface normals, whereas the interior normals are being approximated through interpolation. (The actual normals used would be the result of stretching these out to unit length.)
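For instance, the normal at a point on a face can be approximated by blending the three vertex normals using the point's barycentric coordinates and then renormalizing. A minimal sketch, with our own small vector type standing in for the book's Vector3:

#include <cmath>

struct Vector3 { float x, y, z; };

static Vector3 scale(Vector3 v, float s) { return { v.x*s, v.y*s, v.z*s }; }
static Vector3 add(Vector3 a, Vector3 b) { return { a.x+b.x, a.y+b.y, a.z+b.z }; }
static Vector3 normalize(Vector3 v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return scale(v, 1.0f / len);
}

// n0, n1, n2 are the (unit) vertex normals; (b0, b1, b2) are the barycentric
// coordinates of the point on the face (b0 + b1 + b2 = 1).
Vector3 interpolatedNormal(Vector3 n0, Vector3 n1, Vector3 n2,
                           float b0, float b1, float b2) {
    Vector3 n = add(add(scale(n0, b0), scale(n1, b1)), scale(n2, b2));
    return normalize(n);   // "stretch back out" to unit length
}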
Once we have a normal at a given point, we can perform the full lighting equation per pixel. This is known as per-pixel shading.11 An alternative strategy to per-pixel shading, known as Gouraud12 shading [9], is to perform lighting calculations only at the vertex level, and then interpolate the results themselves, rather than the normal, across the face. This requires less computation, and is still done on some systems, such as the Nintendo Wii.
Figure 10.13 shows per-pixel lighting of cylinders with a different number of sides. Although the illusion breaks down on the ends of the cylinder, where the silhouette edge gives away the low-poly nature of the geometry, this method of approximating a smooth surface can indeed make even a very low-resolution mesh look “smooth.” Cover up the ends of the cylinder, and even the 5-sided cylinder is remarkably convincing.
Now that we understand how normals are interpolated in order to approximately reconstruct a curved surface, let's talk about how to obtain vertex normals. This information may not be readily available, depending on how the triangle mesh was generated. If the mesh is generated procedurally, for example, from a parametric curved surface, then the vertex normals can be supplied at that time. Or you may simply be handed the vertex normals from the modeling package as part of the mesh. However, sometimes the surface normals are not provided, and we must approximate them by interpreting the only information available to us: the vertex positions and the triangles. One trick that works is to average the normals of the adjacent triangles, and then renormalize the result. This classic technique is demonstrated in Listing 10.6.
struct Vertex { Vector3 pos; Vector3 normal; }; struct Triangle { int vertexIndex[3]; Vector3 normal; }; struct TriangleMesh { int vertexCount; Vertex *vertexList; int triangleCount; Triangle *triangleList; void computeVertexNormals() { // First clear out the vertex normals for (int i = 0 ; i < vertexCount ; ++i) { vertexList[i].normal.zero(); } // Now add in the face normals into the // normals of the adjacent vertices for (int i = 0 ; i < triangleCount ; ++i) { // Get shortcut Triangle &tri = triangleList[i]; // Compute triangle normal. Vector3 v0 = vertexList[tri.vertexIndex[0]].pos; Vector3 v1 = vertexList[tri.vertexIndex[1]].pos; Vector3 v2 = vertexList[tri.vertexIndex[2]].pos; tri.normal = cross(v1-v0, v2-v1); tri.normal.normalize(); // Sum it into the adjacent vertices for (int j = 0 ; j < 3 ; ++j) { vertexList[tri.vertexIndex[j]].normal += tri.normal; } } // Finally, average and normalize the results. // Note that this can blow up if a vertex is isolated // (not used by any triangles), and in some other cases. for (int i = 0 ; i < vertexCount ; ++i) { vertexList[i].normal.normalize(); } } };
Averaging face normals to compute vertex normals is a tried-and-true technique that works well in most cases. However, there are a few things to watch out for. The first is that sometimes the mesh is supposed to have a discontinuity, and if we're not careful, this discontinuity will get “smoothed out.” Take the very simple example of a box. There should be a sharp lighting discontinuity at its edges. However, if we use vertex normals computed from the average of the surface normals, then there is no lighting discontinuity, as shown in Figure 10.14.
The basic problem is that the surface discontinuity at the box edges cannot be properly represented because there is only one normal stored per vertex. The solution to this problem is to “detach” the faces; in other words, duplicate the vertices along the edge where there is a true geometric discontinuity, creating a topological discontinuity to prevent the vertex normals from being averaged. After doing so, the faces are no longer logically connected, but this seam in the topology of the mesh doesn't cause a problem for many important tasks, such as rendering and raytracing. Table 10.5 shows a smoothed box mesh with eight vertices. Compare that mesh to the one in Table 10.6, in which the faces have been detached, resulting in 24 vertices.
An extreme version of this situation occurs when two faces are placed back-to-back. Such infinitely thin double-sided geometry can arise with foliage, cloth, billboards, and the like. In this case, since the normals are exactly opposite, averaging them produces the zero vector, which cannot be normalized. The simplest solution is to detach the faces so that the vertex normals will not average together. Or if the front and back sides are mirror images, the two “single-sided” polygons can be replaced by one “double-sided” one. This requires special treatment during rendering to disable backface culling (Section 10.10.5) and intelligently dealing with the normal in the lighting equation.
A more subtle problem is that the averaging is biased towards large numbers of triangles with the same normal. For example, consider the vertex at index 1 in Figure 10.11. This vertex is adjacent to two triangles on the top of the cube, but only one triangle on the right side and one triangle on the back side. The vertex normal computed by averaging the triangle normals is biased because the top face normal essentially gets twice as many "votes" as each of the side face normals. But this topology is the result of an arbitrary decision as to where to draw the edges to triangulate the faces of the cube. For example, if we were to triangulate the top face by drawing an edge between vertices 0 and 2 (this is known as "turning" the edge), all of the normals on the top face would change.
Techniques exist to deal with this problem, such as weighting the contribution from each adjacent face by the interior angle of that face at the vertex, but the issue is often ignored in practice. Most of the really terrible examples are contrived ones like this, where the faces should be detached anyway. Furthermore, the normals are an approximation to begin with, and a slightly perturbed normal is usually difficult to detect visually.
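A sketch of the angle-weighted variant just mentioned, with our own helper names: each face normal is summed into a vertex normal scaled by the interior angle of the face at that vertex, which removes the bias caused by the arbitrary triangulation.

#include <cmath>
#include <algorithm>

struct Vector3 { float x, y, z; };

static Vector3 sub(Vector3 a, Vector3 b) { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
static float   dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vector3 normalize(Vector3 v) {
    float len = std::sqrt(dot(v, v));
    return { v.x/len, v.y/len, v.z/len };
}

// Interior angle of a triangle at corner v, whose two neighboring corners are a and b.
float cornerAngle(Vector3 v, Vector3 a, Vector3 b) {
    Vector3 e0 = normalize(sub(a, v));
    Vector3 e1 = normalize(sub(b, v));
    float c = std::max(-1.0f, std::min(1.0f, dot(e0, e1)));  // clamp for numeric safety
    return std::acos(c);
}

// Inside the per-triangle loop of Listing 10.6 one would then accumulate
// (using whatever vector type the mesh stores):
//
//   float w0 = cornerAngle(p0, p1, p2);
//   vertexNormal[i0] += faceNormal * w0;   // and similarly for the other two corners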
Although some modeling packages can deliver vertex normals for you, fewer provide the basis vectors needed for bump mapping. As we see in Section 10.9, techniques used to synthesize vertex basis vectors are similar to those described here.
Before we go on, there is one very important fact about surface normals that we must mention. In certain circumstances, they cannot be transformed by the same matrix that is used to transform positions. (This is an entirely separate issue from the fact that normals should not be translated like positions.) The reason for this is that normals are covariant vectors. “Regular” vectors, such as position and velocity, are said to be contravariant: if we scale the coordinate space used to describe the vector, the coordinates will respond in the opposite direction. If we use a coordinate space with a larger scale (for example, using meters instead of feet) the coordinates of a contravariant vector respond to the contrary, by becoming smaller. Notice that this is all about scale; translation and rotation are not part of the discussion. Normals and other types of gradients, known as dual vectors, do not behave like this.
Imagine that we stretch a 2D object, such as a circle, horizontally, as shown in Figure 10.15. Notice that the normals (shown in light blue in the right figure) begin to turn to point more vertically—the horizontal coordinates of the normals are decreasing in absolute value while the horizontal coordinates of the positions are increasing. A stretching of the object (object getting bigger while coordinate space stays the same) has the same effect as scaling down the coordinate space while holding the object at the same size. The coordinates of the normal change in the same direction as the scale of the coordinate space, which is why they are called covariant vectors.
To properly transform surface normals, we must use the inverse transpose of the matrix used to transform positions; that is, the result of transposing and inverting the matrix. This is sometimes denoted $\mathbf{M}^{-\mathrm{T}}$, since it doesn't matter whether we invert first or transpose first: $(\mathbf{M}^{-1})^{\mathrm{T}} = (\mathbf{M}^{\mathrm{T}})^{-1}$. Note that if the matrix contains only rotation, its inverse is its transpose, so the inverse transpose is the matrix itself and the distinction can be ignored.
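The following sketch applies this idea without explicitly inverting the matrix: the cofactor matrix of a 3x3 matrix equals its inverse transpose times the determinant, and since we renormalize the result anyway, the determinant factor doesn't matter (though a negative determinant, i.e. a reflection, flips the normal's direction). We assume the row-vector convention used elsewhere in this book; the types and names are ours.

#include <cmath>

struct Vector3 { float x, y, z; };

static Vector3 cross(Vector3 a, Vector3 b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
static Vector3 normalize(Vector3 v) {
    float len = std::sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    return { v.x/len, v.y/len, v.z/len };
}

// 3x3 matrix stored as three row vectors; row vectors transform as
// v' = v.x*row[0] + v.y*row[1] + v.z*row[2].
struct Matrix3x3 { Vector3 row[3]; };

// Transform a surface normal by the inverse transpose of m (up to scale),
// then renormalize.  The rows of the cofactor matrix are the pairwise
// cross products of the rows of m.
Vector3 transformNormal(Vector3 n, const Matrix3x3 &m) {
    Vector3 c0 = cross(m.row[1], m.row[2]);
    Vector3 c1 = cross(m.row[2], m.row[0]);
    Vector3 c2 = cross(m.row[0], m.row[1]);
    Vector3 result = {
        n.x*c0.x + n.y*c1.x + n.z*c2.x,
        n.x*c0.y + n.y*c1.y + n.z*c2.y,
        n.x*c0.z + n.y*c1.z + n.z*c2.z,
    };
    return normalize(result);
}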
There is much more to the appearance of an object than its shape. Different objects are different colors and have different patterns on their surface. One simple yet powerful way to capture these qualities is through texture mapping. A texture map is a bitmap image that is “pasted” to the surface of an object. Rather than controlling the color of an object per triangle or per vertex, with texture mapping we can control the color at a much finer level—per texel. (A texel is a single pixel in a texture map. This is a handy word to know, since in graphics contexts, there are lots of different bitmaps being accessed, and it's nice to have a short way to differentiate between a pixel in the frame buffer and a pixel in a texture.)
So a texture map is just a regular bitmap that is applied onto the surface of a model. Exactly
how does this work? Actually, there are many different ways to apply a texture map
onto a mesh.
Planar mapping projects the texture orthographically onto the mesh. Spherical, cylindrical, and cubic mapping are various methods of "wrapping" the texture around the object. The details of each of these techniques are not important to us at the moment, since modeling packages such as 3DS Max deal with these user interface issues. The key idea is that, at each point on the surface of the mesh, we can obtain texture-mapping coordinates, which define the 2D location in the texture map that corresponds to this 3D location. Traditionally, these coordinates are assigned the variables $u$ and $v$, where $u$ is the horizontal and $v$ the vertical coordinate, so texture-mapping coordinates are often called UV coordinates or simply UVs.
Although bitmaps come in different sizes, UV coordinates are normalized such that the mapping space ranges from 0 to 1 over the entire width ($u$) and height ($v$) of the image, rather than depending on the dimensions of the image in texels.
In principle, it doesn't matter how we determine the UV coordinates for a given point on the surface. However, even when UV coordinates are calculated dynamically, rather than edited by an artist, we typically compute or assign UV coordinates only at the vertex level, and the UV coordinates at an arbitrary interior position on a face are obtained through interpolation. If you imagine the texture map as a stretchy cloth, then when we assign texture-mapping coordinates to a vertex, it's like sticking a pin through the cloth at those UV coordinates, and then pinning the cloth onto the surface at that vertex. There is one pin per vertex, so the whole surface is covered.
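A sketch of that interpolation (the names are ours): given the UVs at the three vertices and the barycentric coordinates of a point inside the triangle, the point's UVs are simply the weighted average.

struct UV { float u, v; };

// (b0, b1, b2) are the barycentric coordinates of the point with respect to the
// triangle's three vertices; they sum to 1 for points on the face.
UV interpolateUV(UV uv0, UV uv1, UV uv2, float b0, float b1, float b2) {
    UV result;
    result.u = b0*uv0.u + b1*uv1.u + b2*uv2.u;
    result.v = b0*uv0.v + b1*uv1.v + b2*uv2.v;
    return result;
}

Hardware rasterizers actually perform this interpolation in a perspective-correct manner, but the principle is the same.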
Let's look at some examples. Figure 10.17 shows a single texture-mapped quad, with different UV values assigned to the vertices. The bottom of each diagram shows the UV space of the texture. You should study these examples until you are sure you understand them.
UV coordinates outside of the range $[0,1]$ are allowed, and in fact are quite useful. How such coordinates are interpreted is governed by the texture addressing mode. The two most common modes are repeat (also known as tile or wrap), in which only the fractional portion of the coordinate is used so that the image tiles across the surface, and clamp, in which the coordinate is clamped to $[0,1]$ so that the border texels are stretched outward. Figure 10.18 compares repeat and clamp addressing.
[Figure 10.18: The repeat and clamp texture addressing modes applied to the same quad.]
There are other options supported on some hardware, such as mirror, which is similar to repeat except that every other tile is mirrored. (This can be beneficial because it guarantees that no "seam" will exist between adjacent tiles.) On most hardware, the addressing mode can be set independently for the $u$ and $v$ coordinates.
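The addressing modes are easy to express in code. Here is a sketch of how repeat, clamp, and mirror might remap a single coordinate; in practice the texture-sampling hardware does this for us, and the function names are ours.

#include <cmath>
#include <algorithm>

// "Repeat" (also called tile or wrap): keep only the fractional part,
// so the image tiles endlessly across the surface.
float addressRepeat(float t) {
    return t - std::floor(t);
}

// "Clamp": coordinates outside [0,1] are pinned to the edge of the image,
// which smears the border texels outward.
float addressClamp(float t) {
    return std::min(std::max(t, 0.0f), 1.0f);
}

// "Mirror": every other tile is flipped, so adjacent tiles share a seamless edge.
float addressMirror(float t) {
    float f = t - 2.0f * std::floor(t * 0.5f);   // f is in [0,2)
    return (f <= 1.0f) ? f : 2.0f - f;
}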
Figure 10.19 shows one last instructive example: the same mesh is texture mapped two different ways.
In the rendering equation, the BRDF describes the scattering distribution for light of a given frequency and direction of incidence. The differences in distributions between different surfaces are precisely what cause those surfaces (or even different surface points on the same object) to look different from one another. Most BRDFs are expressed in a computer by some sort of formula, where certain numbers in the formula are adjusted to match the desired material properties. The formula itself is often called a lighting model, and the particular values going into the formula come from the material assigned to the surface. It is common for a game engine to use only a handful of lighting models, even though the materials in the scene may be quite diverse and there may be thousands of different BRDFs. Indeed, just a few years ago, almost all real-time rendering was done with a single lighting model. In fact, the practice is not uncommon today.
This lighting model was so ubiquitous that it was hardwired into the very rendering APIs of OpenGL and DirectX. Although these older parts of the API have effectively become legacy features on hardware with programmable shaders, the standard model is still commonly used in the more general framework of shaders and generic constants and interpolants. The great diversity and flexibility available is usually used to determine the best way to feed the parameters into the model (for example, by doing multiple lights at once, or doing all the lighting at the end with deferred shading), rather than using different models. But even ignoring programmable shaders, at the time of this writing, the most popular video game console is the Nintendo Wii,13 which has hardwired support for this standard model.
The venerable standard lighting model is the subject of this section. Since its development precedes the framework of the BRDF and the rendering equation by at least a decade, we first present this model in the simplified context that surrounded its creation. This notation and perspective are still predominant in the literature today, which is why we think we should present the idea in its own terms. Along the way, we show how one component of the model (the diffuse component) is modeled as a BRDF. The standard model is important in the present, but you must understand the rendering equation if you want to be prepared for the future.
Bui Tuong Phong [21] introduced the basic concepts behind the standard lighting model in 1975. Back then, the focus was on a fast way to model direct reflection. While researchers certainly understood the importance of indirect light, it was a luxury that could not yet be afforded. Thus, while the rendering equation (which, as we noted previously, came into focus a decade or so after the proposal of the standard model) is an equation for the radiance outgoing from a point in any particular direction, the only outgoing directions that mattered in those days were the directions that pointed to the eye. Similarly, while the rendering equation considers incident light from the entire hemisphere surrounding the surface normal, if we ignore indirect light, then we need not cast about in all incident directions. We need to consider only those directions that aim at a light source. We examine some different ways that light sources are modeled in real-time graphics in more detail in Section 10.7, but for now an important point is that the light sources are not emissive surfaces in the scene, as they are in the rendering equation and in the real world. Instead, lights are special entities without any corresponding geometry, and are simulated as if the light were emitting from a single point. Thus, rather than including a solid angle of directions corresponding to the projection of the emissive surface of each light source onto the hemisphere surrounding the shading point, we consider only a single direction of incidence for each light source.
Now for the model. The basic idea is to classify light coming into the eye into four distinct categories, each of which has a unique method for calculating its contribution. The four categories are the emissive contribution, the specular contribution, the diffuse contribution, and the ambient contribution. Each contribution is computed separately, and the final color is the sum of the four.
The emissive term is the same as in the rendering equation, so there's not much more detail to say about it. In practice, the emissive contribution is simply a constant color at any given surface point.
The specular component of the standard lighting model accounts for the light that is reflected (mostly) in a “perfect mirror bounce” off the surface. The specular component is what gives surfaces a “shiny” appearance. Rougher surfaces tend to scatter the light in a much broader pattern of directions, which is modeled by the diffuse component described in Section 10.6.3.
Now let's see how the standard model calculates the specular contribution. The important vectors are labeled in Figure 10.20.
For convenience, we assume that all of these vectors are unit vectors. Our convention in this book is to denote unit vectors with hats, but we'll drop the hats to avoid decorating the equations excessively. Many texts on the subject use these standard variable names and, especially in the video game community, they are effectively part of the vernacular. It is not uncommon for job interview questions to be posed in such a way that assumes the applicant is familiar with this framework.
The four vectors of interest are the surface normal $\mathbf{n}$, the vector $\mathbf{v}$ that points toward the viewer, the vector $\mathbf{l}$ that points toward the light source, and the reflection vector $\mathbf{r}$. Of the four, the first three are inherent degrees of freedom of the problem, while the reflection vector is a derived quantity: it is the result of reflecting $\mathbf{l}$ about $\mathbf{n}$, the direction of the "perfect mirror bounce." Figure 10.21 shows the geometry.
As you can see, the reflection vector can be computed by
$$\mathbf{r} = 2(\mathbf{n} \cdot \mathbf{l})\,\mathbf{n} - \mathbf{l}. \tag{10.10}$$
Computing the reflection vector is a popular job interview question. There are many interviewers for whom this equation is a favorite topic, which is why we have displayed it on a line by itself, despite the fact that it would have fit perfectly fine inline in the paragraph. A reader seeking a job in the video game industry is advised to fully digest Figure 10.21 and to be able to produce Equation (10.10) under pressure. Notice that the formula assumes $\mathbf{n}$ is a unit vector.
Now that we know $\mathbf{r}$, we can compute the specular contribution using the Phong model for specular reflection:
$$\mathbf{c}_{\text{spec}} = \big(\mathbf{s}_{\text{spec}} \otimes \mathbf{m}_{\text{spec}}\big)\,(\cos\theta)^{m_{\text{gls}}} = \big(\mathbf{s}_{\text{spec}} \otimes \mathbf{m}_{\text{spec}}\big)\,(\mathbf{v} \cdot \mathbf{r})^{m_{\text{gls}}}, \tag{10.11}$$
where $\theta$ is the angle between $\mathbf{v}$ and $\mathbf{r}$.
In this formula and elsewhere in this book, the symbol $\otimes$ denotes componentwise multiplication of colors.
First, let's consider the exponent $m_{\text{gls}}$, known as the glossiness of the material. It controls how wide the specular highlight is: larger values produce a smaller, tighter highlight (a more polished appearance), while smaller values spread the highlight over a broader area.
Like all of the material properties that are input to the lighting equation, the value for $m_{\text{gls}}$ can vary over the surface of an object, for example by fetching it from a texture map (often called a gloss map).
Another value in Equation (10.11) related to "shininess" is the material's specular color, denoted $\mathbf{m}_{\text{spec}}$. Whereas $m_{\text{gls}}$ controls the size of the specular highlight, $\mathbf{m}_{\text{spec}}$ controls its intensity and color; this value, too, can be varied over the surface with a specular map.
The light specular color, denoted $\mathbf{s}_{\text{spec}}$, is essentially the color and intensity of the light source; it is frequently the same as the light's diffuse color.
Figure 10.22 shows how different values of $m_{\text{gls}}$ and $\mathbf{m}_{\text{spec}}$ affect the appearance of an object with specular reflection.
Blinn [4] popularized a slight modification to the Phong model that produces very similar visual results, but at the time was a significant optimization. In many cases, it is still faster to compute today, but beware that vector operations (which are reduced with this model) are not always the performance bottleneck. The basic idea is this: if the distance to the viewer is large relative to the size of an object, then $\mathbf{v}$ may be computed once and treated as constant for the entire object; the same is true of $\mathbf{l}$ for a distant light source.
Then, rather than using the angle between $\mathbf{v}$ and $\mathbf{r}$, we measure the angle between $\mathbf{n}$ and the halfway vector $\mathbf{h}$, which is the result of averaging $\mathbf{v}$ and $\mathbf{l}$ and normalizing: $\mathbf{h} = (\mathbf{v} + \mathbf{l}) / \lVert\mathbf{v} + \mathbf{l}\rVert$.
The formula for the Blinn model is quite similar to the original Phong model. Only the dot product portion is changed:
$$\mathbf{c}_{\text{spec}} = \big(\mathbf{s}_{\text{spec}} \otimes \mathbf{m}_{\text{spec}}\big)\,(\mathbf{n} \cdot \mathbf{h})^{m_{\text{gls}}}.$$
The Blinn model can be faster to implement in hardware than the Phong model, if the viewer and light source are far enough away from the object to be considered constant, since then $\mathbf{h}$ is also constant and can be computed once per object per light rather than once per shaded point.
One detail we have omitted is that in either model, the dot product can be negative; in that case, we clamp it to zero so that no specular light is contributed.
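As a concrete sketch of the specular term in both forms, including the clamping just mentioned: colors are RGB triples, all direction vectors are assumed to be unit length, and the types and names are ours rather than any particular API's.

#include <cmath>
#include <algorithm>

struct Vector3 { float x, y, z; };
struct Color   { float r, g, b; };

static float   dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vector3 normalize(Vector3 v) {
    float len = std::sqrt(dot(v, v));
    return { v.x/len, v.y/len, v.z/len };
}
static Color modulate(Color a, Color b) { return { a.r*b.r, a.g*b.g, a.b*b.b }; }  // componentwise
static Color scale(Color c, float s)    { return { c.r*s, c.g*s, c.b*s }; }

// Phong form: uses the reflection vector r and the view vector v.
Color specularPhong(Vector3 r, Vector3 v, Color lightSpec, Color matSpec, float gloss) {
    float s = std::pow(std::max(dot(r, v), 0.0f), gloss);
    return scale(modulate(lightSpec, matSpec), s);
}

// Blinn form: uses the halfway vector h = normalize(v + l) and the normal n.
Color specularBlinn(Vector3 n, Vector3 v, Vector3 l, Color lightSpec, Color matSpec, float gloss) {
    Vector3 h = normalize({ v.x + l.x, v.y + l.y, v.z + l.z });
    float s = std::pow(std::max(dot(n, h), 0.0f), gloss);
    return scale(modulate(lightSpec, matSpec), s);
}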
The next component in the standard lighting model is the diffuse component. Like the specular component, the diffuse component also models light that traveled directly from the light source to the shading point. However, whereas specular light accounts for light that reflects preferentially in a particular direction, diffuse light models light that is reflected randomly in all directions due to the rough nature of the surface material. Figure 10.24 compares how rays of light reflect on a perfectly reflective surface and on a rough surface.
To compute specular lighting, we needed to know the location of the viewer, to see how close the eye is to the direction of the perfect mirror bounce. For diffuse lighting, in contrast, the location of the viewer is not relevant, since the reflections are scattered randomly, and no matter where we position the camera, it is equally likely that a ray will be sent our way. However, the direction of incidence, which is determined by the position of the light source relative to the surface, matters a great deal, as Figure 10.25 illustrates.
Notice that, in both cases, the perpendicular distance between the rays is the same. (Due to an optical illusion in the diagram, the rays on the right may appear to be farther apart, but they are not.) So, the perpendicular distance between the rays is the same, but notice that on the right side of Figure 10.25, they strike the object at points that are farther apart. The surface on the left receives nine light rays, and the surface on the right receives only six, even though the “area” of both surfaces is the same. Thus the number of photons per unit area15 is higher on the left, and it will appear brighter, all other factors being equal. This same phenomenon is responsible for the fact that the climate near the equator is warmer than near the poles. Since Earth is round, the light from the sun strikes Earth at a more perpendicular angle near the equator.
Diffuse lighting obeys Lambert's law: the intensity of the reflected light is proportional to the cosine of the angle between the surface normal and the rays of light. We will compute this cosine with the dot product:
$$\mathbf{c}_{\text{diff}} = \big(\mathbf{s}_{\text{diff}} \otimes \mathbf{m}_{\text{diff}}\big)\,(\mathbf{n} \cdot \mathbf{l}). \tag{10.12}$$
As before, $\mathbf{n}$ is the surface normal and $\mathbf{l}$ is a unit vector that points towards the light source. The factor $\mathbf{s}_{\text{diff}}$ is the diffuse color of the light source, and $\mathbf{m}_{\text{diff}}$ is the material's diffuse color, which is the value most commonly supplied by a texture map.
Just as with specular lighting, we must prevent the dot product from going negative by clamping it to zero. This prevents objects from being lit from behind.
It's very instructive to see how diffuse surfaces are implemented in the framework of the rendering equation.
Note the similarity of Equation (10.12) with the contents of the integral from the rendering equation: the incident light, multiplied by the BRDF, multiplied by the Lambert factor. The first factor is the incident light color. The material color $\mathbf{m}_{\text{diff}}$ plays the role of the BRDF, which for a perfectly diffuse ("Lambertian") surface is simply a constant, and the $\mathbf{n} \cdot \mathbf{l}$ term is exactly the Lambert factor.
Specular and diffuse lighting both account for light rays that travel directly from the light source to the surface of the object, “bounce” one time, and then arrive in the eye. However, in the real world, light often bounces off one or more intermediate objects before hitting an object and reflecting to the eye. When you open the refrigerator door in the middle of the night, the entire kitchen will get just a bit brighter, even though the refrigerator door blocks most of the direct light.
To model light that is reflected more than one time before it enters the eye, we can use a very crude approximation known as “ambient light.” The ambient portion of the lighting equation depends only on the properties of the material and an ambient lighting value, which is often a global value used for the entire scene. None of the light sources are involved in the computation. (In fact, a light source is not even necessary.) Equation (10.13) is used to compute the ambient component:
Ambient contribution to the lighting equation:
$$\mathbf{c}_{\text{amb}} = \mathbf{g}_{\text{amb}} \otimes \mathbf{m}_{\text{amb}}. \tag{10.13}$$
The factor $\mathbf{g}_{\text{amb}}$ is the global ambient light value for the entire scene, and $\mathbf{m}_{\text{amb}}$ is the material's ambient color, which in practice is almost always the same as its diffuse color.
Sometimes a ray of light travels directly from the light source to the eye, without striking any surface in between. The standard lighting equation accounts for such rays by assigning a material an emissive color. For example, when we render the surface of a light bulb, this surface will probably appear very bright, even if there are no other light sources in the scene, because the light bulb is emitting light.
In many situations, the emissive contribution doesn't depend on environmental factors; it is simply the emissive color of the material:
The emissive contribution depends only on the material:
$$\mathbf{c}_{\text{emis}} = \mathbf{m}_{\text{emis}}.$$
Most surfaces don't emit light, so their emissive component $\mathbf{m}_{\text{emis}}$ is simply black.
It's important to understand that in real-time graphics, a self-illuminated surface does not light the other surfaces—you need a light source for that. In other words, we don't actually render light sources, we only render the effects that those light sources have on the surfaces in the scene. We do render self-illuminated surfaces, but those surfaces don't interact with the other surfaces in the scene. When using the rendering equation properly, however, emissive surfaces do light up their surroundings.
We may choose to attenuate the emissive contribution due to atmospheric conditions, such as fog, and of course there may be performance reasons to have objects fade out and disappear in the distance. However, as explained in Section 10.7.2, in general the emissive contribution should not be attenuated due to distance in the same way that light sources are.
We have discussed the individual components of the lighting equation in detail. Now it's time to give the complete equation for the standard lighting model.
The standard lighting equation for one light source:
$$\mathbf{c}_{\text{lit}} = \mathbf{c}_{\text{spec}} + \mathbf{c}_{\text{diff}} + \mathbf{c}_{\text{amb}} + \mathbf{c}_{\text{emis}} = \big(\mathbf{s}_{\text{spec}} \otimes \mathbf{m}_{\text{spec}}\big)\max(\mathbf{n}\cdot\mathbf{h},0)^{m_{\text{gls}}} + \big(\mathbf{s}_{\text{diff}} \otimes \mathbf{m}_{\text{diff}}\big)\max(\mathbf{n}\cdot\mathbf{l},0) + \mathbf{g}_{\text{amb}} \otimes \mathbf{m}_{\text{amb}} + \mathbf{m}_{\text{emis}}. \tag{10.14}$$
Figure 10.26 shows what the ambient, diffuse, and specular lighting components actually look like in isolation from the others. (We are ignoring the emissive component, assuming that this particular floating head doesn't emit light.) Several interesting points can be noted from such a breakdown.
Speaking of multiple lights, how do multiple light sources work with the lighting equation? We must sum up the lighting values for all the lights. To simplify the notation, we'll go ahead and make the almost universal assumption that each light's specular and diffuse colors are equal, so that a single color $\mathbf{s}_{j}$ describes the $j$th light, and the diffuse and specular terms are summed per light.
Since there is only one ambient light value and one emissive component for any given surface, these components are not summed per light source.
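Putting the pieces together, here is a sketch of the complete standard model evaluated for a list of lights, with one ambient and one emissive term and the diffuse and specular terms summed per light. The types and names are ours, all vectors are assumed to be unit length, and attenuation and spot effects are omitted for brevity.

#include <cmath>
#include <algorithm>
#include <vector>

struct Vector3 { float x, y, z; };
struct Color   { float r, g, b; };

static float   dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vector3 normalize(Vector3 v) { float k = 1.0f/std::sqrt(dot(v,v)); return {v.x*k, v.y*k, v.z*k}; }
static Color   operator+(Color a, Color b) { return { a.r+b.r, a.g+b.g, a.b+b.b }; }
static Color   operator*(Color a, Color b) { return { a.r*b.r, a.g*b.g, a.b*b.b }; }   // componentwise
static Color   operator*(Color a, float s) { return { a.r*s, a.g*s, a.b*s }; }

struct Material {
    Color emissive, ambient, diffuse, specular;
    float gloss;                      // specular exponent
};

struct Light {
    Color   color;                    // used for both diffuse and specular
    Vector3 dirToLight;               // unit vector from the shading point to the light
};

Color shadeStandard(Vector3 n, Vector3 toViewer, const Material &m,
                    Color globalAmbient, const std::vector<Light> &lights) {
    Color result = m.emissive + globalAmbient * m.ambient;
    for (const Light &lt : lights) {
        float nDotL = std::max(dot(n, lt.dirToLight), 0.0f);
        Vector3 h = normalize({ toViewer.x + lt.dirToLight.x,
                                toViewer.y + lt.dirToLight.y,
                                toViewer.z + lt.dirToLight.z });
        float nDotH = std::pow(std::max(dot(n, h), 0.0f), m.gloss);
        result = result + lt.color * m.diffuse * nDotL
                        + lt.color * m.specular * nDotH;
    }
    return result;
}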
Nowadays we have the freedom of programmable shaders and can choose any lighting model we wish. Since the standard model has some fairly serious shortcomings, you might very well ask, “Why learn about this ancient history?” First, it isn't exactly ancient history; it is alive and well. The reasons that made it a good compromise between realism, usability, and speed of calculation still apply. Yes, we have more processing power; but we also want to render more pixels and more lights, and it currently is very common for the standard lighting model to be the winner when programmers are deciding whether to spend the cycles on more pixels (higher resolution) or more accurate pixels (a more accurate lighting model). Second, the current local lighting model is one that content creators can understand and use. This advantage is not to be taken lightly. Artists have decades of experience with diffuse and specular maps. Switching to a lighting model that replaces those familiar inputs with different ones such as “metalness” (from Strauss's model [26]) for which artists do not have an intuitive grasp is a big price to pay. A final reason to learn the standard lighting model is because many newer models bear similarities to the standard model, and you cannot know when to use more advanced lighting models without understanding the old standard.
If you have read the OpenGL or DirectX documentation for setting material parameters, you are forgiven for thinking that ambient, diffuse, and specular are "how light works" (remember our warning at the beginning of this chapter) as opposed to being arbitrary practical constructs peculiar to a particular lighting model. The dichotomy between diffuse and specular is not an inherent physical reality; rather, it arose (and continues to be used) due to practical considerations. These are descriptive terms for two extreme scattering patterns, and by taking arbitrary combinations of these two patterns, many phenomena can be approximated to a decent degree.
Because of the near unanimous adoption of this model, it is often used without giving it a name, and in fact there is still some confusion as to exactly what to call it. You might call it the Phong lighting model, because Phong introduced the basic idea of modeling reflection as the sum of diffuse and specular contributions, and also provided a useful empirically based calculation for specular reflection. (The Lambert model for diffuse reflection was already known.) We saw that Blinn's computation for specular reflection is similar but sometimes faster. Because this is the specific calculation most often used, perhaps we should call it the Blinn model? But Blinn's name is also attached to a different microfacet model in which diffuse and specular are at different ends of a continuous spectrum, rather than independent “orthogonal” components being mixed together. Since most implementations use Blinn's optimization for Phong's basic idea, the name Blinn-Phong is the one most often used for this model, and that's the name we use.
A huge part of realistic lighting is, of course, realistic shadows. Although the techniques for producing shadows are interesting and important, alas we will not have time to discuss them here. In the theory of the rendering equation, shadows are accounted for when we determine the radiance incident in a given direction. If a light (more accurately, an emissive surface) exists in a particular direction, and the point can “see” that surface, then its light will be incident upon the point. If, however, there is some other surface that obscures the light source when looking in that direction, then the point is in shadow with respect to that light source. More generally, shadows can be cast not just due to the light from emissive surfaces; the light bouncing off reflective surfaces can cause shadows. In all cases, shadows are an issue of light visibility, not reflectance model.
Finally, we would like to mention several important physical phenomena not properly captured by the Blinn-Phong model. The first is Fresnel16 reflectance, which predicts that the reflectance of nonmetals is strongest when the light is incident at a glancing angle, and least when incident from the normal angle. Some surfaces, such as velvet, exhibit retroreflection; you might guess this means that the surface looks like Madonna's earrings, but it actually means that the primary direction of reflection is not the “mirror bounce” as predicted by Blinn-Phong, but rather back towards the light source. Finally, Blinn-Phong is isotropic, which means that if we rotate the surface while keeping the viewer and light source stationary, the reflectance will not change. Some surfaces have anisotropic reflection, due to grooves or other patterns in the surface. This means that the strength of the reflection varies, based on the direction of incidence relative to the direction of the grooves, which is sometimes called the scratch direction. Classic examples of anisotropic materials are brushed metal, hair, and those little Christmas ornaments made of shiny fibers.
On modern shader-based hardware, lighting calculations are usually done on a per-pixel basis. By this we mean that for each pixel, we determine a surface normal (whether by interpolating the vertex normal across the face or by fetching it from a bump map), and then we perform the full lighting equation using this surface normal. This is per-pixel lighting, and the technique of interpolating vertex normals across the face is sometimes called Phong shading, not to be confused with the Phong calculation for specular reflection. The alternative to Phong shading is to perform the lighting equation less frequently (per face, or per vertex). These two techniques are known as flat shading and Gouraud shading, respectively. Flat shading is almost never used in practice except in software rendering. This is because most modern methods of sending geometry efficiently to the hardware do not provide any face-level data whatsoever. Gouraud shading, in contrast, still has some limited use on some platforms. Some important general principles can be gleaned from studying these methods, so let's examine their results.
When using flat shading, we compute a single lighting value for the entire triangle. Usually the “position” used in lighting computations is the centroid of the triangle, and the surface normal is the normal of the triangle. As you can see in Figure 10.27, when an object is lit using flat shading, the faceted nature of the object becomes painfully apparent, and any illusion of smoothness is lost.
Gouraud shading, also known as vertex shading, vertex lighting, or interpolated shading, is a trick whereby values for lighting, fog, and so forth are computed at the vertex level. These values are then linearly interpolated across the face of the polygon. Figure 10.28 shows the same teapot rendered with Gouraud shading.
As you can see, Gouraud shading does a relatively good job at restoring the smooth nature of the object. When the values being approximated are basically linear across the triangle, then, of course, the linear interpolation used by Gouraud shading works well. Gouraud shading breaks down when the values are not linear, as in the case of specular highlights.
Compare the specular highlights in the Gouraud shaded teapot with the highlights in a Phong (per-pixel) shaded teapot, shown in Figure 10.29. Notice how much smoother the highlights are. Except for the silhouette and areas of extreme geometric discontinuities, such as the handle and spout, the illusion of smoothness is very convincing. With Gouraud shading, the individual facets are detectable due to the specular highlights.
The basic problem with interpolated shading is that no value in the middle of the triangle can be larger than the largest value at a vertex; highlights can occur only at a vertex. Sufficient tessellation can overcome this problem. Despite its limitations, Gouraud shading is still in use on some limited hardware, such as hand-held platforms and the Nintendo Wii.
One question that you should be asking is how the lighting can be computed at the vertex level if any maps are used to control inputs to the lighting equation. We can't use the lighting equation as given in Equation (10.14) directly. Most notably, the diffuse color $\mathbf{m}_{\text{diff}}$ usually comes from a texture map, and so it is not available until we are operating at the pixel level. The solution is to factor the equation so that the material colors are pulled outside of the per-light computations.
Finally, we make the very reasonable assumption that the ambient material color $\mathbf{m}_{\text{amb}}$ is the same as the diffuse material color $\mathbf{m}_{\text{diff}}$. Under these assumptions, the standard lighting equation for multiple lights can be rearranged as
$$\mathbf{c}_{\text{lit}} = \mathbf{m}_{\text{emis}} + \mathbf{m}_{\text{diff}} \otimes \Big( \mathbf{g}_{\text{amb}} + \sum_{j} \mathbf{s}_{j}\,\max(\mathbf{n}\cdot\mathbf{l}_{j},0) \Big) + \mathbf{m}_{\text{spec}} \otimes \sum_{j} \mathbf{s}_{j}\,\max(\mathbf{n}\cdot\mathbf{h}_{j},0)^{m_{\text{gls}}}. \tag{10.15}$$
With the lighting equation in the format of Equation (10.15), we can see how to use interpolated lighting values computed at the vertex level. At each vertex, we will compute two values: a diffuse lighting value, which holds the ambient term plus the summed $\mathbf{n}\cdot\mathbf{l}_{j}$ contributions of the lights, and a specular lighting value, which holds the summed $\max(\mathbf{n}\cdot\mathbf{h}_{j},0)^{m_{\text{gls}}}$ contributions.
Each of these values is computed per vertex and interpolated across the face of the triangle. Then, per pixel, the light contributions are multiplied by the corresponding material colors and summed:
Shading pixels using interpolated lighting values:
$$\mathbf{c}_{\text{lit}} = \mathbf{m}_{\text{emis}} + \mathbf{m}_{\text{diff}} \otimes \mathbf{v}_{\text{diff}} + \mathbf{m}_{\text{spec}} \otimes \mathbf{v}_{\text{spec}},$$
where $\mathbf{v}_{\text{diff}}$ and $\mathbf{v}_{\text{spec}}$ denote the interpolated diffuse and specular lighting values.
As mentioned earlier, the diffuse material color $\mathbf{m}_{\text{diff}}$ usually comes from a texture map, which is why it must be applied per pixel rather than per vertex.
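A sketch of the two-accumulator arrangement just described: per vertex we sum a diffuse lighting value (which also absorbs the global ambient term) and a specular lighting value; per pixel these interpolated values would be multiplied by the texture-supplied material colors and added to the emissive term. Names and types are ours.

#include <cmath>
#include <algorithm>
#include <vector>

struct Vector3 { float x, y, z; };
struct Color   { float r, g, b; };

static float   dot(Vector3 a, Vector3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vector3 normalize(Vector3 v) { float k = 1.0f/std::sqrt(dot(v,v)); return {v.x*k, v.y*k, v.z*k}; }
static Color&  operator+=(Color &a, Color b) { a.r += b.r; a.g += b.g; a.b += b.b; return a; }
static Color   operator*(Color a, float s)   { return { a.r*s, a.g*s, a.b*s }; }

struct Light { Color color; Vector3 dirToLight; };

struct VertexLighting {
    Color diffuse;    // g_amb plus the summed n.l terms, interpolated across the face
    Color specular;   // the summed (n.h)^gls terms, interpolated across the face
};

VertexLighting computeVertexLighting(Vector3 n, Vector3 toViewer, float gloss,
                                     Color globalAmbient, const std::vector<Light> &lights) {
    VertexLighting out = { globalAmbient, { 0.0f, 0.0f, 0.0f } };
    for (const Light &lt : lights) {
        out.diffuse  += lt.color * std::max(dot(n, lt.dirToLight), 0.0f);
        Vector3 h = normalize({ toViewer.x + lt.dirToLight.x,
                                toViewer.y + lt.dirToLight.y,
                                toViewer.z + lt.dirToLight.z });
        out.specular += lt.color * std::pow(std::max(dot(n, h), 0.0f), gloss);
    }
    return out;
}
// Per pixel (after interpolation):  c = m_emis + m_diff*lit.diffuse + m_spec*lit.specular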
What coordinate space should be used for lighting computations? We could perform the lighting computations in world space. Vertex positions and normals would be transformed into world space, lighting would be performed, and then the vertex positions would be transformed into clip space. Or we may transform the lights into modeling space, and perform lighting computations in modeling space. Since there are usually fewer lights than there are vertices, this results in fewer overall vector-matrix multiplications. A third possibility is to perform the lighting computations in camera space.
In the rendering equation, light sources produce their effect when we factor in the emissive component of a surface. As mentioned earlier, in real-time graphics, doing this “properly” with emissive surfaces is usually a luxury we cannot afford. Even in offline situations where it can be afforded, we might have reasons to just emit light out of nowhere, to make it easier to get control of the look of the scene for dramatic lighting, or to simulate the light that would be reflecting from a surface for which we're not wasting time to model geometry since it's off camera. Thus we usually have light sources that are abstract entities within the rendering framework with no surface geometry to call their own. This section discusses some of the most common types of light sources.
Section 10.7.1 covers the classic point, directional, and spot lights. Section 10.7.2 considers how light attenuates in the real world and how deviations from this reality are common for practical reasons. The next two sections move away from the theoretically pure territory and into the messy domain of ad-hoc lighting techniques in use in real-time graphics today. Section 10.7.3 presents the subject of Doom-style volumetric lights. Finally, Section 10.7.4 discusses how lighting calculations can be done offline and then used at runtime, especially for the purpose of incorporating indirect lighting effects.
This section lists some of the most basic light types that are supported by most rendering systems, even older or limited platforms, such as the OpenGL and DirectX fixed-function lighting pipelines or the Nintendo Wii. Of course, systems with programmable shaders often use these light types, too. Even when completely different methods, such as spherical harmonics, are used at runtime, standard light types are usually used as an offline editing interface.
A point light source represents light that emanates from a single point outward in all directions. Point lights are also called omni lights (short for "omnidirectional") or spherical lights. A point light has a position and color, which controls not only the hue of the light, but also its intensity. Figure 10.30 shows how 3DS Max represents point lights visually.
As Figure 10.30 illustrates, a point light may have a falloff radius, which controls the size of the sphere that is illuminated by the light. The intensity of the light usually decreases the farther away we are from the center of the light. Although not realistic, it is desirable for many reasons that the intensity drop to zero at the falloff distance, so that the volume of the effect of the light can be bounded. Section 10.7.2 compares real-world attenuation with the simplified models commonly used. Point lights can be used to represent many common light sources, such as light bulbs, lamps, fires, and so forth.
A spot light is used to represent light from a specific location in a specific direction. These are used for lights such as flashlights, headlights, and of course, spot lights! A spot light has a position and an orientation, and optionally a falloff distance. The shape of the lit area is either a cone or a pyramid.
A conical spot light has a circular “bottom.” The width of the cone is defined by a falloff angle (not to be confused with the falloff distance). Also, there is an inner angle that measures the size of the hotspot. A conical spot light is shown in Figure 10.31.
A rectangular spot light forms a pyramid rather than a cone. Rectangular spot lights are especially interesting because they are used to project an image. For example, imagine walking in front of a movie screen while a movie is being shown. This projected image goes by many names, including projected light map, gobo, and even cookie.17 The term gobo originated from the world of theater, where it refers to a mask or filter placed over a spot light used to create a colored light or special effect, and it's the term we use in this book. Gobos are very useful for faking shadows and other lighting effects. If conical spot lights are not directly supported, they can be implemented with an appropriately designed circular gobo.
A directional light represents light emanating from a point in space sufficiently far away that all the rays of light involved in lighting the scene (or at least the object we are currently considering) can be considered as parallel. The sun and moon are the most obvious examples of directional lights, and certainly we wouldn't try to specify the actual position of the sun in world space in order to properly light the scene. Thus directional lights usually do not have a position, at least as far as lighting calculations are concerned, and they usually do not attenuate. For editing purposes, however, it's often useful to create a “box” of directional light that can be moved around and placed strategically, and we might include additional attenuation factors to cause the light to drop off at the edge of the box. Directional lights are sometimes called parallel lights. We might also use a gobo on a directional light, in which case the projection of the image is orthographic rather than perspective, as it is with rectangular spot lights.
As we've said, in the rendering equation and in the real world, lights are emissive surfaces with finite surface areas. Abstract light types do not have any surface area, and thus require special handling during integration. Typically in a Monte Carlo integrator, a sample is specifically chosen to be in the direction of the light source, and the multiplication by the solid angle subtended by the light is omitted, since the light is treated as arriving from a single direction.
While the light types discussed so far are the classic ones supported by fixed-function real-time pipelines, we certainly are free to define light volumes in any way we find useful. The volumetric lights discussed in Section 10.7.3 are an alternative system that is flexible and also amenable to real-time rendering. Warn [27] and Barzel [3] discuss more flexible systems for shaping lights in greater detail.
Light attenuates with distance. That is, objects receive less illumination from a light as the distance between the light and the object increases. In the real world, the intensity of a light is inversely proportional to the square of the distance between the light and the object, as expressed by

Real-world light attenuation:
$$\frac{i_{1}}{i_{2}} = \left(\frac{d_{2}}{d_{1}}\right)^{2}, \tag{10.16}$$

where $i_{1}$ and $i_{2}$ are the intensities of the light at distances $d_{1}$ and $d_{2}$, respectively.
Let's pause here to discuss a finer point: the perceived brightness of an object (or light source) does not decrease with increased distance from the viewer, ignoring atmospheric effects. As a light or object recedes from the viewer, the irradiance on our eye decreases for the reasons just described. However, perceived brightness is related to radiance, not irradiance. Remember that radiance measures power per unit projected area per unit solid angle, and as the object recedes from view, the decrease in irradiance is compensated for by the decrease in solid angle subtended by the object. It's particularly educational to understand how the rendering equation naturally accounts for light attenuation. Inside the integral, for each direction on the hemisphere surrounding the shading point, we measure the incident radiance arriving from whatever surface is visible in that direction. That radiance does not attenuate with distance; rather, as an emissive surface moves farther away, it subtends a smaller solid angle on the hemisphere and thus contributes to fewer incident directions. In this way attenuation arises naturally from the geometry, without any explicit attenuation factor.
In practice, Equation (10.16) can be unwieldy for two reasons. First, the light intensity theoretically increases to infinity as the distance approaches zero. Second, the intensity never actually falls to zero, no matter how large the distance, which makes it impossible to bound the region of space that a light can affect.
Instead of the real-world model, a simpler model based on falloff distance is often used. Section 10.7 mentioned that the falloff distance controls the distance beyond which the light has no effect. It's common to use a simple linear interpolation formula such that the light gradually fades with the distance $d$:
$$i(d) = \begin{cases} 1 & d \le d_{\min}, \\ \dfrac{d_{\max} - d}{d_{\max} - d_{\min}} & d_{\min} < d < d_{\max}, \\ 0 & d \ge d_{\max}. \end{cases} \tag{10.17}$$
As Equation (10.17) shows, there are actually two distances used to control the attenuation. Within $d_{\min}$, the light is at full intensity. As the distance increases from $d_{\min}$ to $d_{\max}$, the intensity drops linearly to zero, and beyond $d_{\max}$ the light has no effect. In many cases $d_{\min}$ is simply zero, meaning the falloff begins immediately.
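In code, this falloff attenuation is a direct reading of Equation (10.17); the function name is ours.

// Returns a multiplier in [0,1]: full intensity inside dMin, zero beyond dMax,
// and a linear ramp in between.
float distanceAttenuation(float d, float dMin, float dMax) {
    if (d <= dMin) return 1.0f;
    if (d >= dMax) return 0.0f;
    return (dMax - d) / (dMax - dMin);
}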
Distance attenuation can be applied to point and spot lights; directional lights are usually not attenuated. An additional attenuation factor is used for spot lights. Hotspot falloff attenuates light as we move closer to the edge of the cone.
In the theoretical framework of the rendering equation, as well as in HLSL shaders implementing the standard Blinn-Phong model, all that is required of a light source for it to be used in shading calculations at a particular point are the color (intensity) of the light arriving at that point and the direction of incidence. Doom-style volumetric lights define these quantities in an unusually flexible way: the light's volume of effect is a box, and the intensity at any point inside that box is read from textures.
The most creative aspect of Doom-style volumetric lights is how they determine the intensity at a given point. It is controlled through two texture maps. One map is essentially a gobo, which can be projected by either orthographic or perspective projection, similar to a spot or directional light. The other map is a one-dimensional map, known as the falloff map, which controls the falloff. The procedure for determining the light intensity at a given point is roughly as follows: the point is transformed into the coordinate space of the light's bounding box, two of the resulting coordinates are used to fetch a texel from the gobo (applying the light's projection), the third coordinate is used to index into the falloff map, and the light intensity is the product of the two fetched values.
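A sketch of that procedure with our own types and names (a real engine would do this in a shader with actual texture fetches; the stubs below merely stand in for them):

#include <cmath>

struct Color { float r, g, b; };

static Color scale(Color c, float s) { return { c.r*s, c.g*s, c.b*s }; }

// Stand-ins for the two texture fetches; these stubs just produce a white
// gobo and a tent-shaped falloff that peaks at the middle of the box.
static Color sampleGobo(float s, float t)  { (void)s; (void)t; return { 1.0f, 1.0f, 1.0f }; }
static float sampleFalloff(float q)        { return 1.0f - std::fabs(2.0f*q - 1.0f); }

// The point has already been transformed into the light's space, so that the
// light's bounding box maps to the unit cube [0,1]^3 (with the perspective
// divide already applied for spot-style lights).  Coordinates s and t index
// the gobo; q indexes the 1D falloff map.
Color volumetricLightIntensity(float s, float t, float q) {
    if (s < 0.0f || s > 1.0f || t < 0.0f || t > 1.0f || q < 0.0f || q > 1.0f)
        return { 0.0f, 0.0f, 0.0f };          // outside the box: no contribution
    return scale(sampleGobo(s, t), sampleFalloff(q));
}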
[Figure 10.33: Examples of Doom-style volumetric lights. Three lights are shown side by side: an omni light, a spot light, and a "fake spot" light. For each, the figure shows an example rendering, the gobo, the falloff map, and the bounding box. The omni and fake spot lights use orthographic projection; the spot light uses perspective projection.]
The examples in Figure 10.33 will make this clear. Let's look at each of the examples in more detail. The omni light projects the circular gobo orthographically across the box, and places the "position" of the light (which is used to compute the direction of incidence for shading) at the center of the box. The falloff map is applied along the axis of projection, so the light is brightest at the center of the box and fades out toward both ends.
Next, let's look at the spot light. It uses a perspective projection, where the center of projection is at one end of the box. The position of the light used for calculating the direction of incidence is placed at that same end, and the falloff map is applied along the direction of projection.
The “fake spot” on the right is perhaps the most interesting. Here, projection is orthographic,
and it is sideways. The conical nature of the light as well as its falloff (what we
ordinarily think of as the falloff, that is) are both encoded in the gobo. The falloff map used
for this light is the same as for the omni light: it is brightest in the center of the box, and
causes the light to fade out as we approach the
You should study these examples until you are sure you know how they work.
Doom-style volumetric lights can be attractive for real-time graphics for several reasons, not least the explicit bounding box, which makes it easy to determine exactly which geometry a light can affect, and the direct artistic control that the gobo and falloff textures provide over the light's shape.
One of the greatest sources of error in the images produced in real time (those positive thinkers among you might say the greatest opportunity for improvement) is indirect lighting: light that has “bounced” at least one time before illuminating the pixel being rendered. This is an extremely difficult problem. A first important step to making it tractable is to break up the surfaces in the scene into discrete patches or sample points. But even with a relatively modest number of patches, we still have to determine which patches can “see” each other and have a conduit of radiance, and which cannot see each other and do not exchange radiance. Then we must solve for the balance of light in the rendering equation. Furthermore, when any object moves, it can potentially alter which patches can see which. In other words, practically any change will alter the distribution of light in the entire scene.
However, it is usually the case that certain lights and geometry in the scene are not moving. In this case, we can perform more detailed lighting calculations (solve the rendering equation more fully), and then use those results, ignoring any error that results due to the difference in the current lighting configuration and the one that was used during the offline calculations. Let's consider several examples of this basic principle.
One technique is lightmapping. In this case, an extra UV channel is used to arrange the polygons of the scene into a special texture map that contains precalculated lighting information. This process of finding a good way to arrange the polygons within the texture map is often called atlasing. In this case, the discrete “patches” that we mentioned earlier are the lightmap texels. Lightmapping works well on large flat surfaces, such as floors and ceilings, which are relatively easy to arrange within the lightmap effectively. But more dense meshes, such as staircases, statues, machinery, and trees, which have much more complicated topology, are not so easily atlased. Luckily, we can just as easily store precomputed lighting values in the vertices, which often works better for relatively dense meshes.
What exactly is the precomputed information that is stored in lightmaps (or vertices)? Essentially, we store incident illumination, but there are many options. One option is the number of samples per patch. If we have only a single lightmap or vertex color, then we cannot account for the directional distribution of this incident illumination and must simply use the sum over the entire hemisphere. (As we have shown in Section 10.1.3, this “directionless” quantity, the incident radiant power per unit area, is properly known as radiosity, and for historical reasons algorithms for calculating lightmaps are sometimes confusingly known as radiosity techniques, even if the lightmaps include a directional component.) If we can afford more than one lightmap or vertex color, then we can more accurately capture the distribution. This directional information is then projected onto a particular basis. We might have each basis correspond to a single direction. A technique known as spherical harmonics [24][15] uses sinusoidal basis functions similar to 2D Fourier techniques. The point in any case is that the directional distribution of incident light does matter, but when saving precomputed incident light information, we are usually forced to discard or compress this information.
Another option is whether the precalculated illumination includes direct lighting, indirect light, or both. This decision can often be made on a per-light basis. The earliest examples of lightmapping simply calculated the direct light from each light in the scene for each patch. The primary advantage of this was that it allowed for shadows, which at the time were prohibitively expensive to produce in real time. (The same basic idea is still useful today, only now the goal is usually to reduce the total number of real-time shadows that must be generated.) Then the view could be moved around in real time, but obviously, any lights that were burned into the lightmaps could not move, and if any geometry moved, the shadows would be “stuck” to them and the illusion would break down. An identical runtime system can be used to render lightmaps that also include indirect lighting, although the offline calculations require much more finesse. It is possible for certain lights to have both their direct and indirect lighting baked into the lightmaps, while other lights have just the indirect portion included in the precalculated lighting and direct lighting done at runtime. This might offer advantages, such as shadows with higher precision than the lightmap texel density, improved specular highlights due to the correct modeling of the direction of incidence (which is lost when the light is burned into the lightmaps), or some limited ability to dynamically adjust the intensity of the light or turn it off or change its position. Of course, the presence of precalculated lighting for some lights doesn't preclude the use of completely dynamic techniques for other lights.
The lightmapping techniques just discussed work fine for static geometry, but what about dynamic objects such as characters, vehicles, platforms, and items? These must be lit dynamically, which makes the inclusion of indirect lighting challenging. One technique, popularized by Valve's Half Life 2 [10][16], is to strategically place light probes at various locations in the scene. At each probe, we render a cubic environment map offline. When rendering a dynamic object, we locate the closest nearby probe and use this probe to get localized indirect lighting. There are many variations on this technique—for example, we might use one environment map for diffuse reflection of indirect light, where each sample is prefiltered to contain the entire cosine-weighted hemisphere surrounding this direction, and a different cubic map for specular reflection of indirect light, which does not have this filtering.
The animation of human creatures is certainly of great importance in video games and in computer graphics in general. One of the most important techniques for animating characters is skeletal animation, although it is certainly not limited to this purpose. The easiest way to appreciate skeletal animation is to compare it to other alternatives, so let's review those first.
Let's say we have created a model of a humanoid creature such as a robot. How do we animate it? Certainly, we could treat it like a chess piece and move it around just like a box of microwavable herring sandwiches or any other solid object—this is obviously not very convincing. Creatures are articulated, meaning they are composed of connected, movable parts. The simplest method of animating an articulated creature is to break the model up into a hierarchy of connected parts—left forearm, left upper arm, left thigh, left shin, left foot, torso, head, and so on—and animate this hierarchy. An early example of this was Dire Straits' Money for Nothing music video. Newer examples include many early 3D console games, such as the first Tomb Raider. The common feature here is that each part is still rigid; it does not bend or flex.