Chapter 6
More on Matrices
Man's mind stretched to a new idea
never goes back to its original dimensions.
— Oliver Wendell Holmes Jr. (1841–1935)
Chapter 4 presented a few of the most important properties and operations of matrices and discussed how matrices can be used to express geometric transformations in general. Chapter 5 considered matrices and geometric transforms in detail. This chapter completes our coverage of matrices by discussing a few more interesting and useful matrix operations.
- Section 6.1 covers the determinant of a matrix.
- Section 6.2 covers the inverse of a matrix.
- Section 6.3 discusses orthogonal matrices.
- Section 6.4 introduces homogeneous vectors and 4 × 4 matrices, and shows how they can be used to perform affine transformations in 3D.
- Section 6.5 discusses perspective projection and shows how to do it with a 4 × 4 matrix.
6.1 Determinant of a Matrix
For square matrices, there is a special scalar called the
determinant of the matrix. The determinant has many useful properties in linear algebra,
and it also has interesting geometric interpretations.
As is our custom, we first discuss some math, and then make some geometric interpretations.
Section 6.1.1 introduces the notation for determinants and gives the linear algebra rules for computing the determinant of a 2 × 2 or 3 × 3 matrix. Section 6.1.2 discusses minors and cofactors. Then, Section 6.1.3 shows how to compute the determinant of an arbitrary n × n matrix, by using minors and cofactors. Finally, Section 6.1.4 interprets the determinant from a geometric perspective.
6.1.1 Determinants of 2 × 2 and 3 × 3 matrices
The determinant of a square matrix M is denoted |M| or, in some other books, as “det M.” The determinant of a nonsquare matrix is undefined. This section shows how to compute determinants of 2 × 2 and 3 × 3 matrices. The determinant of a general n × n matrix, which is fairly complicated, is discussed in Section 6.1.3.

The determinant of a 2 × 2 matrix is given by

$$|\mathbf{M}| = \begin{vmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{vmatrix} = m_{11}m_{22} - m_{12}m_{21}. \tag{6.1}$$

Determinant of a 2 × 2 matrix
Notice that when we write the determinant of a matrix, we replace the brackets with vertical lines.

Equation (6.1) can be remembered more easily with a simple rule: multiply the entries along the diagonal and the back-diagonal, then subtract the back-diagonal term from the diagonal term. An example helps to clarify the simple calculation:

$$\begin{vmatrix} 2 & 1 \\ -1 & 3 \end{vmatrix} = (2)(3) - (1)(-1) = 7.$$
The determinant of a 3 × 3 matrix is given by

$$|\mathbf{M}| = \begin{vmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{vmatrix} = m_{11}m_{22}m_{33} + m_{12}m_{23}m_{31} + m_{13}m_{21}m_{32} - m_{13}m_{22}m_{31} - m_{12}m_{21}m_{33} - m_{11}m_{23}m_{32}. \tag{6.2}$$

Determinant of a 3 × 3 matrix

A similar scheme can be used to memorize Equation (6.2): we write two copies of the matrix M side by side and multiply entries along the diagonals and back-diagonals, adding the diagonal terms and subtracting the back-diagonal terms.
If we interpret the rows of a 3 × 3 matrix as three vectors a, b, and c, then the determinant of the matrix is equivalent to the so-called “triple product” of the three vectors:

$$\begin{vmatrix} a_x & a_y & a_z \\ b_x & b_y & b_z \\ c_x & c_y & c_z \end{vmatrix} = (\mathbf{a} \times \mathbf{b}) \cdot \mathbf{c}.$$

3 × 3 determinant vs. 3D vector triple product
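To make these formulas concrete in code, here is a minimal C++ sketch that evaluates Equations (6.1) and (6.2) directly. The row-major float-array layout is our own assumption for illustration; it is not a matrix class from this book.

// Determinant of a 2x2 matrix, straight from Equation (6.1).
float determinant2x2(const float m[2][2]) {
    return m[0][0]*m[1][1] - m[0][1]*m[1][0];
}

// Determinant of a 3x3 matrix, straight from Equation (6.2):
// add the diagonal products, subtract the back-diagonal products.
float determinant3x3(const float m[3][3]) {
    return m[0][0]*m[1][1]*m[2][2] + m[0][1]*m[1][2]*m[2][0] + m[0][2]*m[1][0]*m[2][1]
         - m[0][2]*m[1][1]*m[2][0] - m[0][1]*m[1][0]*m[2][2] - m[0][0]*m[1][2]*m[2][1];
}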
6.1.2 Minors and Cofactors
Before we can look at determinants in the general case, we need to introduce some other constructs: minors and cofactors.

Assume M is a matrix with r rows and c columns. Consider the matrix obtained by deleting row i and column j from M. This matrix will obviously have r − 1 rows and c − 1 columns. The determinant of this submatrix, denoted M^{ij}, is known as a minor of M. For example, the minor M^{12} is the determinant of the 2 × 2 matrix that is the result of deleting row 1 and column 2 from a 3 × 3 matrix M:

$$\mathbf{M} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix}, \qquad M^{\{12\}} = \begin{vmatrix} m_{21} & m_{23} \\ m_{31} & m_{33} \end{vmatrix}.$$

A minor of a 3 × 3 matrix
The cofactor of a square matrix M at a given row and column is the same as the corresponding minor, but with alternating minors negated:

$$c_{ij} = (-1)^{i+j}\, M^{\{ij\}}. \tag{6.4}$$

Matrix cofactor

As shown in Equation (6.4), we use the notation c_{ij} to denote the cofactor of M in row i, column j. The (−1)^{i+j} term has the effect of negating every other cofactor in a checkerboard pattern:

$$\begin{bmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{bmatrix}$$
In the next section, we use minors and cofactors to compute determinants of an arbitrary dimension n × n, and again in Section 6.2 to compute the inverse of a matrix.
6.1.3 Determinants of Arbitrary n × n Matrices
Several equivalent definitions exist for the determinant of a matrix of arbitrary dimension n. The definition we consider here expresses a determinant in terms of its cofactors. This definition is recursive, since cofactors are themselves signed determinants. First, we arbitrarily select a row or column from the matrix. Now, for each element in the row or column, we multiply this element by the corresponding cofactor. Summing these products yields the determinant of the matrix. For example, arbitrarily selecting row i, the determinant can be computed by

$$|\mathbf{M}| = \sum_{j=1}^{n} m_{ij}\, c_{ij} = \sum_{j=1}^{n} m_{ij}\,(-1)^{i+j}\, M^{\{ij\}}. \tag{6.5}$$

Computing an n × n determinant by using cofactors of row i

As it turns out, it doesn't matter which row or column we choose; they all will produce the same result.
Let's look at an example. We'll rewrite the equation for the 3 × 3 determinant using Equation (6.5):

$$|\mathbf{M}| = m_{11}\begin{vmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{vmatrix} - m_{12}\begin{vmatrix} m_{21} & m_{23} \\ m_{31} & m_{33} \end{vmatrix} + m_{13}\begin{vmatrix} m_{21} & m_{22} \\ m_{31} & m_{32} \end{vmatrix}.$$

Recursive definition of determinant applied to 3 × 3 case

Now, let's derive the 4 × 4 matrix determinant:

$$|\mathbf{M}| = m_{11}\begin{vmatrix} m_{22} & m_{23} & m_{24} \\ m_{32} & m_{33} & m_{34} \\ m_{42} & m_{43} & m_{44} \end{vmatrix} - m_{12}\begin{vmatrix} m_{21} & m_{23} & m_{24} \\ m_{31} & m_{33} & m_{34} \\ m_{41} & m_{43} & m_{44} \end{vmatrix} + m_{13}\begin{vmatrix} m_{21} & m_{22} & m_{24} \\ m_{31} & m_{32} & m_{34} \\ m_{41} & m_{42} & m_{44} \end{vmatrix} - m_{14}\begin{vmatrix} m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \\ m_{41} & m_{42} & m_{43} \end{vmatrix}.$$

Recursive definition of determinant applied to 4 × 4 case

Expanding the cofactors, we have

$$\begin{aligned}
|\mathbf{M}| ={}& m_{11}\bigl(m_{22}(m_{33}m_{44} - m_{34}m_{43}) - m_{23}(m_{32}m_{44} - m_{34}m_{42}) + m_{24}(m_{32}m_{43} - m_{33}m_{42})\bigr)\\
&- m_{12}\bigl(m_{21}(m_{33}m_{44} - m_{34}m_{43}) - m_{23}(m_{31}m_{44} - m_{34}m_{41}) + m_{24}(m_{31}m_{43} - m_{33}m_{41})\bigr)\\
&+ m_{13}\bigl(m_{21}(m_{32}m_{44} - m_{34}m_{42}) - m_{22}(m_{31}m_{44} - m_{34}m_{41}) + m_{24}(m_{31}m_{42} - m_{32}m_{41})\bigr)\\
&- m_{14}\bigl(m_{21}(m_{32}m_{43} - m_{33}m_{42}) - m_{22}(m_{31}m_{43} - m_{33}m_{41}) + m_{23}(m_{31}m_{42} - m_{32}m_{41})\bigr).
\end{aligned}$$

Determinant of a 4 × 4 matrix in expanded form
As you can imagine, the complexity of explicit formulas for determinants of higher degree grows rapidly. Luckily, we can perform an operation known as “pivoting,” which doesn't affect the value of the determinant, but causes a particular row or column to be filled with zeroes except for a single element (the “pivot” element). Then only one cofactor has to be evaluated. Since we won't need determinants of matrices higher than the 4 × 4 case anyway, a complete discussion of pivoting is outside the scope of this book.
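As a sketch of how the recursive definition turns into code, the following C++ function computes a 4 × 4 determinant by expanding cofactors along row 0. It reuses the determinant3x3 sketch from Section 6.1.1 and the same row-major array layout; a production version would more likely expand the products fully (or use pivoting), but the structure below mirrors Equation (6.5).

// Determinant of a 4x4 matrix by cofactor expansion along row 0.
float determinant4x4(const float m[4][4]) {
    float det = 0.0f;
    for (int j = 0; j < 4; ++j) {
        // Build the 3x3 submatrix obtained by deleting row 0 and column j.
        float sub[3][3];
        for (int r = 1; r < 4; ++r) {
            int cc = 0;
            for (int c = 0; c < 4; ++c) {
                if (c == j) continue;
                sub[r - 1][cc++] = m[r][c];
            }
        }
        // Cofactor = (-1)^(0+j) times the minor.
        float sign = (j % 2 == 0) ? 1.0f : -1.0f;
        det += m[0][j] * sign * determinant3x3(sub);
    }
    return det;
}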
Let's briefly state some important characteristics concerning determinants.
- The determinant of an identity matrix of any dimension is 1:
  $$|\mathbf{I}| = 1.$$
  Determinant of identity matrix
- The determinant of a matrix product is equal to the product of the determinants:
  $$|\mathbf{A}\mathbf{B}| = |\mathbf{A}|\,|\mathbf{B}|.$$
  Determinant of matrix product
  This extends to more than two matrices:
  $$|\mathbf{M}_1\mathbf{M}_2\cdots\mathbf{M}_n| = |\mathbf{M}_1|\,|\mathbf{M}_2|\cdots|\mathbf{M}_n|.$$
- The determinant of the transpose of a matrix is equal to the original determinant:
  $$|\mathbf{M}^{\mathrm{T}}| = |\mathbf{M}|.$$
  Determinant of matrix transpose
- If any row or column in a matrix contains all 0s, then the determinant of that matrix is 0.
  Determinant of matrix with a row/column full of 0s
- Exchanging any pair of rows negates the determinant.
  Swapping rows negates the determinant
  This same rule applies for exchanging a pair of columns.
- Adding any multiple of a row (column) to another row (column) does not change the value of the determinant!
  Adding one row to another doesn't change the determinant
  This explains why our shear matrices from Section 5.5 have a determinant of 1.
6.1.4 Geometric Interpretation of Determinant
The determinant of a matrix has an interesting geometric interpretation. In 2D, the determinant
is equal to the signed area of the parallelogram or
skew box that has the basis vectors as two sides (see Figure 6.1). (We
discussed how we can use skew boxes to visualize coordinate space transformations in
Section 4.2.) By signed area, we mean that the area is
negative if the skew box is “flipped” relative to its original orientation.
In 3D, the determinant is the volume of the
parallelepiped that has the transformed basis vectors as three edges. It will be negative
if the object is reflected (“turned inside out”) as a result of the transformation.
The determinant is related to the change in size that results from
transforming by the matrix. The absolute value of the determinant
is related to the change in area (in 2D) or volume (in 3D) that
will occur as a result of transforming an object by the matrix,
and the sign of the determinant indicates whether any reflection
or projection is contained in the matrix.
The determinant of the matrix can also be used to help classify the
type of transformation represented by a matrix. If the determinant
of a matrix is zero, then the matrix contains a projection. If the
determinant of a matrix is negative, then reflection is contained in
the matrix. See
Section 5.7 for more
about different classes of transformations.
6.2 Inverse of a Matrix
Another important operation that applies only to square matrices is the
inverse of a matrix. This section discusses the matrix inverse from a mathematical and
geometric perspective.
The inverse of a square matrix M, denoted M⁻¹, is the matrix such that when we multiply M by M⁻¹ on either side, the result is the identity matrix. In other words,

$$\mathbf{M}\,\mathbf{M}^{-1} = \mathbf{M}^{-1}\mathbf{M} = \mathbf{I}.$$

Matrix inverse
Not all matrices have an inverse. An obvious example is a matrix with a row or column filled
with 0s—no matter what you multiply this matrix by, the corresponding row or column in the result will also be full of 0s.
If a matrix has an inverse, it is said to be invertible or nonsingular. A matrix
that does not have an inverse is said to be noninvertible or singular. For any
invertible matrix M, the vector equality vM = 0 is true only when v = 0. Furthermore, the rows of an invertible matrix are linearly
independent, as are the columns. The rows (and columns) of a singular matrix are linearly
dependent.
The determinant of a singular matrix is zero and the determinant of a nonsingular matrix is
nonzero. Checking the magnitude of the determinant is the most commonly used test for
invertibility because it's the easiest and quickest. In ordinary circumstances, this is OK, but
please note that the method can break down. An example is an extreme shear matrix with basis
vectors that form a very long, thin parallelepiped with unit volume. This
ill conditioned
matrix is nearly singular, even though its determinant is 1. The condition number is the
proper tool for detecting such cases, but this is an advanced topic slightly beyond the scope of
this book.
There are several ways to compute the inverse of a matrix. The one we use is based on the
classical adjoint, which is the subject of the next section.
6.2.1 The Classical Adjoint
Our method for computing the inverse of a matrix is based on the classical adjoint. The classical adjoint of a matrix M, denoted “adj M,” is defined as the transpose of the matrix of cofactors of M.

Let's look at an example. Take a 3 × 3 matrix M with entries m_{ij}. First, we compute the nine cofactors of M, as discussed in Section 6.1.2. The classical adjoint of M is the transpose of the matrix of cofactors:

$$\operatorname{adj}\mathbf{M} = \begin{bmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{bmatrix}^{\mathrm{T}} = \begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}.$$

The classical adjoint
6.2.2 Matrix Inverse—Official Linear Algebra Rules
To compute the inverse of a matrix, we divide the classical adjoint by the determinant:

$$\mathbf{M}^{-1} = \frac{\operatorname{adj}\mathbf{M}}{|\mathbf{M}|}.$$

Computing matrix inverse from classical adjoint and determinant

If the determinant is zero, the division is undefined, which jibes with our earlier statement that matrices with a zero determinant are noninvertible.

Let's look at an example. In the previous section we wrote the classical adjoint of a 3 × 3 matrix M as the transpose of its matrix of cofactors; dividing that matrix by the determinant |M| gives the inverse:

$$\mathbf{M}^{-1} = \frac{1}{|\mathbf{M}|}\begin{bmatrix} c_{11} & c_{21} & c_{31} \\ c_{12} & c_{22} & c_{32} \\ c_{13} & c_{23} & c_{33} \end{bmatrix}.$$
There are other techniques that can be used to compute the inverse of a matrix, such as
Gaussian elimination. Many linear algebra textbooks assert that such techniques are better
suited for implementation on a computer because they require fewer arithmetic operations, and
this assertion is true for larger matrices and matrices with a structure that may be exploited.
However, for arbitrary matrices of smaller order, such as the 2 × 2, 3 × 3, and 4 × 4 matrices encountered most often in geometric applications, the classical adjoint method is generally the method of choice. The reason is that the classical adjoint method provides for a branchless implementation, meaning there are no if statements, or loops that cannot be unrolled statically. On today's superscalar architectures and dedicated vector processors, this is a big win.
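Here is one way the classical adjoint method might look for the 3 × 3 case in C++. It is a sketch under the same row-major array layout assumed earlier, not the book's matrix class; note how every cofactor is written out explicitly, so apart from the safety check there are no branches or loops.

#include <cmath>
#include <cassert>

// Inverse of a 3x3 matrix via the classical adjoint divided by the determinant.
void inverse3x3(const float m[3][3], float out[3][3]) {
    // Cofactors of each entry (sign-alternating minors).
    float c00 =  (m[1][1]*m[2][2] - m[1][2]*m[2][1]);
    float c01 = -(m[1][0]*m[2][2] - m[1][2]*m[2][0]);
    float c02 =  (m[1][0]*m[2][1] - m[1][1]*m[2][0]);
    float c10 = -(m[0][1]*m[2][2] - m[0][2]*m[2][1]);
    float c11 =  (m[0][0]*m[2][2] - m[0][2]*m[2][0]);
    float c12 = -(m[0][0]*m[2][1] - m[0][1]*m[2][0]);
    float c20 =  (m[0][1]*m[1][2] - m[0][2]*m[1][1]);
    float c21 = -(m[0][0]*m[1][2] - m[0][2]*m[1][0]);
    float c22 =  (m[0][0]*m[1][1] - m[0][1]*m[1][0]);

    // Determinant by cofactor expansion along row 0.
    float det = m[0][0]*c00 + m[0][1]*c01 + m[0][2]*c02;
    assert(std::fabs(det) > 1e-6f);   // (nearly) singular matrices cannot be inverted
    float oneOverDet = 1.0f / det;

    // Inverse = adjoint / determinant; the adjoint is the transpose of
    // the matrix of cofactors, hence the swapped indices below.
    out[0][0] = c00*oneOverDet;  out[0][1] = c10*oneOverDet;  out[0][2] = c20*oneOverDet;
    out[1][0] = c01*oneOverDet;  out[1][1] = c11*oneOverDet;  out[1][2] = c21*oneOverDet;
    out[2][0] = c02*oneOverDet;  out[2][1] = c12*oneOverDet;  out[2][2] = c22*oneOverDet;
}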
We close this section with a quick list of several important properties concerning matrix inverses.
- The inverse of the inverse of a matrix is the original matrix: (M⁻¹)⁻¹ = M.
- The identity matrix is its own inverse: I⁻¹ = I.
- The inverse of the transpose of a matrix is the transpose of the inverse: (Mᵀ)⁻¹ = (M⁻¹)ᵀ.
- The inverse of a matrix product is equal to the product of the inverses of the matrices, taken in reverse order: (AB)⁻¹ = B⁻¹A⁻¹. This extends to more than two matrices.
- The determinant of the inverse is the reciprocal of the determinant of the original matrix: |M⁻¹| = 1/|M|.
6.2.3 Matrix Inverse—Geometric Interpretation
The inverse of a matrix is useful geometrically because it allows us to compute the “reverse” or “opposite” of a transformation—a transformation that “undoes” another transformation if they are performed in sequence. So, if we take a vector, transform it by a matrix M, and then transform it by the inverse M⁻¹, then we will get the original vector back. We can easily verify this algebraically:

$$(\mathbf{v}\mathbf{M})\mathbf{M}^{-1} = \mathbf{v}(\mathbf{M}\mathbf{M}^{-1}) = \mathbf{v}\,\mathbf{I} = \mathbf{v}.$$
6.3 Orthogonal Matrices
Previously we made reference to a special class of square matrices known as orthogonal
matrices. This section investigates orthogonal matrices a bit more closely. As usual, we first
introduce some pure math (Section 6.3.1), and then give some geometric
interpretations (Section 6.3.2). Finally, we discuss how to adjust
an arbitrary matrix to make it orthogonal (Section 6.3.3).
6.3.1 Orthogonal Matrices—Official Linear Algebra Rules
A square matrix M is orthogonal if and only if the product of the matrix and its transpose is the identity matrix:

$$\mathbf{M}\,\mathbf{M}^{\mathrm{T}} = \mathbf{I}. \tag{6.7}$$

Definition of orthogonal matrix

Recall from Section 6.2.2 that, by definition, a matrix times its inverse is the identity matrix (M M⁻¹ = I). Thus, if a matrix is orthogonal, its transpose and inverse are equal:

$$\mathbf{M}^{\mathrm{T}} = \mathbf{M}^{-1}.$$

Equivalent definition of orthogonal matrix
This is extremely powerful information, because the inverse of a matrix is often needed, and
orthogonal matrices arise frequently in practice in 3D graphics. For example, as mentioned
in Section 5.7.5, rotation and reflection matrices
are orthogonal. If we know that our matrix is orthogonal, we can essentially avoid computing the
inverse, which is a relatively costly computation.
6.3.2 Orthogonal Matrices—Geometric Interpretation
Orthogonal matrices are interesting to us primarily because their inverse is trivial to compute.
But how do we know if a matrix is orthogonal in order to exploit its structure?
In many cases, we may have information about the way the matrix was constructed and therefore
know a priori that the matrix contains only rotation and/or reflection. This is a very
common situation, and it's very important to take advantage of this when using matrices to
describe rotation. We return to this topic in Section 8.2.1.
But what if we don't know anything in advance about the matrix? In other words, how can we tell if an arbitrary matrix M is orthogonal? Let's look at the 3 × 3 case, which is the most interesting one for our purposes. The conclusions we draw in this section can be extended to matrices of any dimension.
Let M be an orthogonal 3 × 3 matrix. Expanding the definition of orthogonality given by Equation (6.7), we have

$$\mathbf{M}\,\mathbf{M}^{\mathrm{T}} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix}\begin{bmatrix} m_{11} & m_{21} & m_{31} \\ m_{12} & m_{22} & m_{32} \\ m_{13} & m_{23} & m_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

This gives us nine equations, all of which must be true for M to be orthogonal:

$$\begin{gathered}
m_{11}m_{11} + m_{12}m_{12} + m_{13}m_{13} = 1,\qquad m_{11}m_{21} + m_{12}m_{22} + m_{13}m_{23} = 0,\qquad m_{11}m_{31} + m_{12}m_{32} + m_{13}m_{33} = 0,\\
m_{21}m_{11} + m_{22}m_{12} + m_{23}m_{13} = 0,\qquad m_{21}m_{21} + m_{22}m_{22} + m_{23}m_{23} = 1,\qquad m_{21}m_{31} + m_{22}m_{32} + m_{23}m_{33} = 0,\\
m_{31}m_{11} + m_{32}m_{12} + m_{33}m_{13} = 0,\qquad m_{31}m_{21} + m_{32}m_{22} + m_{33}m_{23} = 0,\qquad m_{31}m_{31} + m_{32}m_{32} + m_{33}m_{33} = 1.
\end{gathered}$$

Conditions satisfied by an orthogonal matrix
Let the vectors r1, r2, and r3 stand for the rows of M:

$$\mathbf{r}_1 = \begin{bmatrix} m_{11} & m_{12} & m_{13} \end{bmatrix}, \qquad \mathbf{r}_2 = \begin{bmatrix} m_{21} & m_{22} & m_{23} \end{bmatrix}, \qquad \mathbf{r}_3 = \begin{bmatrix} m_{31} & m_{32} & m_{33} \end{bmatrix}.$$

Now we can rewrite the nine equations more compactly:

$$\begin{gathered}
\mathbf{r}_1 \cdot \mathbf{r}_1 = 1 \;\;(6.8), \qquad \mathbf{r}_1 \cdot \mathbf{r}_2 = 0, \qquad \mathbf{r}_1 \cdot \mathbf{r}_3 = 0,\\
\mathbf{r}_2 \cdot \mathbf{r}_1 = 0, \qquad \mathbf{r}_2 \cdot \mathbf{r}_2 = 1 \;\;(6.9), \qquad \mathbf{r}_2 \cdot \mathbf{r}_3 = 0,\\
\mathbf{r}_3 \cdot \mathbf{r}_1 = 0, \qquad \mathbf{r}_3 \cdot \mathbf{r}_2 = 0, \qquad \mathbf{r}_3 \cdot \mathbf{r}_3 = 1 \;\;(6.10).
\end{gathered}$$

Conditions satisfied by an orthogonal matrix

This notational change makes it easier for us to make some interpretations.
- First, the dot product of a vector with itself is 1 if and only if the vector is a unit vector. Therefore, the equations with a 1 on the right-hand side of the equals sign (Equations (6.8), (6.9), and (6.10)) will be true only when r1, r2, and r3 are unit vectors.
- Second, recall from Section 2.11.2 that the dot product of two vectors is 0 if and only if they are perpendicular. Therefore, the other six equations (with 0 on the right-hand side of the equals sign) are true when r1, r2, and r3 are mutually perpendicular.
So, for a matrix to be orthogonal, the following must be true:
- Each row of the matrix must be a unit vector.
- The rows of the matrix must be mutually perpendicular.
Similar statements can be made regarding the columns of the matrix, since if M is orthogonal, then Mᵀ must be orthogonal as well.
Notice that these criteria are precisely those that we said in
Section 3.3.3 were satisfied by an orthonormal set of basis
vectors. In that section, we also noted that an orthonormal basis was particularly useful
because we could perform, by using the dot product, the “opposite” coordinate transform from
the one that is always available. When we say that the transpose of an orthogonal matrix is
equal to its inverse, we are just restating this fact in the formal language of linear algebra.
Also notice that three of the orthogonality equations are duplicates, because the dot product is
commutative. Thus, these nine equations actually express only six constraints. In an arbitrary 3 × 3 matrix there are nine elements and thus nine degrees of freedom, but in an
orthogonal matrix, six degrees of freedom are removed by the constraints, leaving three degrees
of freedom. It is significant that three is also the number of degrees of freedom inherent in 3D
rotation. (However, rotation matrices cannot contain a reflection, so there is “slightly more
freedom” in the set of orthogonal matrices than in the set of orientations in 3D.)
When computing a matrix inverse, we will usually only take advantage of orthogonality if we know
a priori that a matrix is orthogonal. If we don't know in advance, it's probably a waste
of time to check. In the best case, we check for orthogonality and find that the matrix is
indeed orthogonal, and then we transpose the matrix. But this may take almost as much time as
doing the inversion. In the worst case, the matrix is not orthogonal, and any time we spent
checking was definitely wasted. Finally, even matrices that are orthogonal in the abstract may
not be exactly orthogonal when represented in floating point, and so we must use tolerances,
which have to be tuned.
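If we do decide to test a matrix for orthogonality, the nine conditions above translate into a straightforward check. The sketch below uses a plain 3 × 3 float array and an arbitrary tolerance of our own choosing; as just noted, any such tolerance would have to be tuned for the application.

#include <cmath>

// Dot product of two 3D vectors stored as plain arrays.
static float dot3(const float a[3], const float b[3]) {
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2];
}

// Returns true if every row of m is (nearly) unit length and the rows
// are (nearly) mutually perpendicular -- i.e., r_i . r_j is close to 1
// when i == j and close to 0 otherwise.
bool isOrthogonal3x3(const float m[3][3], float tolerance = 0.001f) {
    for (int i = 0; i < 3; ++i) {
        for (int j = 0; j < 3; ++j) {
            float expected = (i == j) ? 1.0f : 0.0f;
            if (std::fabs(dot3(m[i], m[j]) - expected) > tolerance)
                return false;
        }
    }
    return true;
}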
One important note is needed here on terminology that can be slightly confusing. In linear algebra, we
describe a set of basis vectors as orthogonal if they are mutually perpendicular. It is
not required that they have unit length. If they do have unit length, they are an
orthonormal basis. Thus the rows and columns of an orthogonal matrix are
orthonormal basis vectors. However, constructing a matrix from a set of orthogonal basis
vectors does not necessarily result in an orthogonal matrix (unless the basis vectors are also
orthonormal).
6.3.3 Orthogonalizing a Matrix
It is sometimes the case that we encounter a matrix that is slightly out of orthogonality. We
may have acquired bad data from an external source, or we may have accumulated floating point
error (which is called matrix creep). For basis vectors used for bump mapping (see
Section 10.9), we will often adjust the basis to be orthogonal, even if
the texture mapping gradients aren't quite perpendicular. In these situations, we would like to
orthogonalize the matrix, resulting in a matrix that has mutually perpendicular unit
vector axes and is (hopefully) as close to the original matrix as possible.
The standard algorithm for constructing a set of orthogonal basis vectors (which is what the rows of an orthogonal matrix are) is Gram-Schmidt orthogonalization. The basic idea is to go through the basis vectors in order. For each basis vector, we subtract off the portion of that vector that is parallel to the preceding basis vectors, which must result in a perpendicular vector.
Let's look at the 3 × 3 case as an example. As before, let r1, r2, and r3 stand for the rows of a 3 × 3 matrix M. (Remember, you can also think of these as the x-, y-, and z-axes of a coordinate space.) Then an orthogonal set of row vectors, r1′, r2′, and r3′, can be computed according to the following algorithm:

$$\begin{aligned}
\mathbf{r}_1' &= \mathbf{r}_1,\\
\mathbf{r}_2' &= \mathbf{r}_2 - \frac{\mathbf{r}_2 \cdot \mathbf{r}_1'}{\mathbf{r}_1' \cdot \mathbf{r}_1'}\,\mathbf{r}_1',\\
\mathbf{r}_3' &= \mathbf{r}_3 - \frac{\mathbf{r}_3 \cdot \mathbf{r}_1'}{\mathbf{r}_1' \cdot \mathbf{r}_1'}\,\mathbf{r}_1' - \frac{\mathbf{r}_3 \cdot \mathbf{r}_2'}{\mathbf{r}_2' \cdot \mathbf{r}_2'}\,\mathbf{r}_2'.
\end{aligned}$$

Gram-Schmidt orthogonalization of 3D basis vectors
After applying these steps, the vectors r1′, r2′, and r3′ are guaranteed to be mutually perpendicular, and thus will form an orthogonal basis. However, they may not necessarily be unit vectors. We need an orthonormal basis to form an orthogonal matrix, and so we must normalize the vectors. (Again, the terminology can be confusing; see the note at the end of the previous section.) Notice that if we normalize the vectors as we go, rather than in a second pass, then we can avoid all of the divisions. Also, a trick that works in 3D (but not in higher dimensions) is to compute the third basis vector as the cross product of the first two: r3′ = r1′ × r2′.
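The following C++ sketch applies these ideas to the rows of a 3 × 3 matrix, normalizing as it goes (so the divisions by r·r disappear) and using the cross-product trick for the third row. The bare Vec3 helpers are our own scaffolding, not the vector class used elsewhere in this book.

#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  sub(Vec3 a, Vec3 b)    { return { a.x-b.x, a.y-b.y, a.z-b.z }; }
static Vec3  scale(Vec3 a, float s) { return { a.x*s, a.y*s, a.z*s }; }
static float dot(Vec3 a, Vec3 b)    { return a.x*b.x + a.y*b.y + a.z*b.z; }
static Vec3  cross(Vec3 a, Vec3 b) {
    return { a.y*b.z - a.z*b.y, a.z*b.x - a.x*b.z, a.x*b.y - a.y*b.x };
}
static Vec3  normalize(Vec3 a)      { return scale(a, 1.0f / std::sqrt(dot(a, a))); }

// Gram-Schmidt on the rows of a 3x3 matrix, producing an orthonormal basis.
void orthonormalizeRows(Vec3& r1, Vec3& r2, Vec3& r3) {
    r1 = normalize(r1);                               // first row only gets normalized
    r2 = normalize(sub(r2, scale(r1, dot(r2, r1)))); // strip off the part of r2 parallel to r1
    r3 = cross(r1, r2);                               // unit length and perpendicular by construction
}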
The Gram-Schmidt algorithm is biased, depending on the order in which the basis vectors are listed. For instance, r1 never changes, and r3 is likely to change the most. A variation on the algorithm that is not biased towards any particular axis is to abandon the attempt to completely orthogonalize the entire matrix in one pass. We select some fraction k, and instead of subtracting off all of the projection, we subtract off only k of it. We also subtract the projection onto the original axis, not the adjusted one. In this way, the order in which we perform the operations does not matter and we have no dimensional bias. This algorithm is summarized by

$$\begin{aligned}
\mathbf{r}_1' &= \mathbf{r}_1 - k\,\frac{\mathbf{r}_1 \cdot \mathbf{r}_2}{\mathbf{r}_2 \cdot \mathbf{r}_2}\,\mathbf{r}_2 - k\,\frac{\mathbf{r}_1 \cdot \mathbf{r}_3}{\mathbf{r}_3 \cdot \mathbf{r}_3}\,\mathbf{r}_3,\\
\mathbf{r}_2' &= \mathbf{r}_2 - k\,\frac{\mathbf{r}_2 \cdot \mathbf{r}_1}{\mathbf{r}_1 \cdot \mathbf{r}_1}\,\mathbf{r}_1 - k\,\frac{\mathbf{r}_2 \cdot \mathbf{r}_3}{\mathbf{r}_3 \cdot \mathbf{r}_3}\,\mathbf{r}_3,\\
\mathbf{r}_3' &= \mathbf{r}_3 - k\,\frac{\mathbf{r}_3 \cdot \mathbf{r}_1}{\mathbf{r}_1 \cdot \mathbf{r}_1}\,\mathbf{r}_1 - k\,\frac{\mathbf{r}_3 \cdot \mathbf{r}_2}{\mathbf{r}_2 \cdot \mathbf{r}_2}\,\mathbf{r}_2.
\end{aligned}$$

Nonbiased incremental orthogonalization algorithm
One iteration of this algorithm results in a set of basis vectors
that are slightly “more orthogonal” than the original vectors, but
possibly not completely orthogonal. By repeating this procedure
multiple times, we can eventually converge on an orthogonal basis.
Selecting an appropriately small value for k (say, 1/4) and iterating a sufficient number of times (say, ten) gets us fairly close. Then, we can use the standard Gram-Schmidt algorithm to guarantee a perfectly orthogonal basis.
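A sketch of the incremental variant is shown below. It reuses the Vec3 helpers and the orthonormalizeRows routine from the previous sketch; the fraction k = 0.25 and the iteration count of ten are just the ballpark values mentioned above, not tuned constants.

// One or more passes of the nonbiased incremental orthogonalization:
// from each row, subtract a fraction k of its projection onto each of
// the *original* other rows, then finish with standard Gram-Schmidt.
void orthogonalizeIncremental(Vec3& r1, Vec3& r2, Vec3& r3,
                              float k = 0.25f, int iterations = 10) {
    for (int i = 0; i < iterations; ++i) {
        // Work from copies so every row "sees" the original axes.
        Vec3 a = r1, b = r2, c = r3;
        r1 = sub(a,  scale(b, k * dot(a, b) / dot(b, b)));
        r1 = sub(r1, scale(c, k * dot(a, c) / dot(c, c)));
        r2 = sub(b,  scale(a, k * dot(b, a) / dot(a, a)));
        r2 = sub(r2, scale(c, k * dot(b, c) / dot(c, c)));
        r3 = sub(c,  scale(a, k * dot(c, a) / dot(a, a)));
        r3 = sub(r3, scale(b, k * dot(c, b) / dot(b, b)));
    }
    // A final Gram-Schmidt pass guarantees an exactly orthonormal result.
    orthonormalizeRows(r1, r2, r3);
}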
6.4 4 × 4 Homogeneous Matrices
Up until now, we have used only 2D and 3D vectors. In this
section, we introduce 4D vectors and the so-called “homogeneous” coordinate. There is nothing
magical about 4D vectors and matrices (and no, the fourth coordinate in this case isn't
“time”). As we will see, 4D vectors and 4 × 4 matrices are nothing more than a notational convenience for what are simple 3D operations.

This section introduces 4D homogeneous space and 4 × 4 transformation matrices and their application to affine 3D geometry. Section 6.4.1 discusses the nature of 4D homogeneous space and how it is related to physical 3D space. Section 6.4.2 explains how 4 × 4 transformation matrices can be used to express translations. Section 6.4.3 explains how 4 × 4 transformation matrices can be used to express affine transformations.
6.4.1 4D Homogeneous Space
As was mentioned in Section 2.1, 4D vectors have four components, with the first three components being the standard x, y, and z components. The fourth component in a 4D vector is w, sometimes referred to as the homogeneous coordinate.
To understand how the standard physical 3D space is extended into 4D, let's first examine homogeneous coordinates in 2D, which are of the form (x, y, w). Imagine the standard 2D plane as existing in 3D at the plane w = 1, such that the physical 2D point (x, y) is represented in homogeneous space as (x, y, 1). For all points that are not in the plane w = 1, we can compute the corresponding 2D point by projecting the point onto the plane w = 1, by dividing by w. So the homogeneous coordinate (x, y, w) is mapped to the physical 2D point (x/w, y/w). This is shown in Figure 6.2.
For any given physical 2D point (x, y) there are an infinite number of corresponding points in homogeneous space, all of the form (kx, ky, k), provided that k ≠ 0. These points form a line through the (homogeneous) origin.
When w = 0, the division is undefined and there is no corresponding physical point in 2D space. However, we can interpret a 2D homogeneous point of the form (x, y, 0) as a “point at infinity,” which defines a direction rather than a location. When we make the conceptual distinction between “points” and “vectors” (see Section 2.4), then the “locations” where w = 1 are “points” and the “directions” with w = 0 are “vectors.” There is more on this in the next section.
The same basic idea applies when extending physical 3D space to 4D homogeneous space (although
it's a lot harder to visualize). The physical 3D points can be thought of as living in the
hyperplane in 4D at w = 1. A 4D point is of the form (x, y, z, w), and we project a 4D point onto this hyperplane to yield the corresponding physical 3D point (x/w, y/w, z/w). When w = 0, the 4D point represents a “point at infinity,” which defines a direction rather than a location.
Homogeneous coordinates and projection by division by w are interesting, but why on earth would we want to use 4D space? There are two primary reasons for using 4D vectors and 4 × 4 matrices. The first reason, which we discuss in the next section, is actually nothing more than a notational convenience. The second reason is that if we put the proper value into w, the homogeneous division will result in a perspective projection, as we discuss in Section 6.5.
6.4.2 4 × 4 Translation Matrices
Recall from Section 4.2 that a 3 × 3 transformation matrix represents a linear transformation, which does not contain translation. Due to the nature of matrix multiplication, the zero vector is always transformed into the zero vector, and therefore any transformation that can be represented by a matrix multiplication cannot contain translation. This is unfortunate, because matrix multiplication and inversion are very convenient tools for composing complicated transformations out of simple ones and manipulating nested coordinate space relationships. It would be nice if we could find a way to somehow extend the standard 3 × 3 transformation matrix to be able to handle transformations with translation; 4 × 4 matrices provide a mathematical “kludge” that allows us to do this.
Assume for the moment that w is always 1. Thus, the standard 3D vector [x, y, z] will always be represented in 4D as [x, y, z, 1]. Any 3 × 3 transformation matrix can be represented in 4D by using the conversion

$$\begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \\ m_{31} & m_{32} & m_{33} \end{bmatrix} \;\Longrightarrow\; \begin{bmatrix} m_{11} & m_{12} & m_{13} & 0 \\ m_{21} & m_{22} & m_{23} & 0 \\ m_{31} & m_{32} & m_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Extending a 3 × 3 transform matrix into 4D

When we multiply a 4D vector of the form [x, y, z, 1] by a 4 × 4 matrix of this form, we get the same result as the standard 3 × 3 case, the only difference being the additional coordinate w = 1:

$$\begin{bmatrix} x & y & z & 1 \end{bmatrix}\begin{bmatrix} m_{11} & m_{12} & m_{13} & 0 \\ m_{21} & m_{22} & m_{23} & 0 \\ m_{31} & m_{32} & m_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} xm_{11} + ym_{21} + zm_{31} & \;\; xm_{12} + ym_{22} + zm_{32} & \;\; xm_{13} + ym_{23} + zm_{33} & \;\; 1 \end{bmatrix}.$$
Now for the interesting part. In 4D, we can also express translation as a matrix multiplication, something we were not able to do in 3D:

$$\begin{bmatrix} x & y & z & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix} = \begin{bmatrix} x + \Delta x & y + \Delta y & z + \Delta z & 1 \end{bmatrix}. \tag{6.11}$$

Using a 4 × 4 matrix to perform translation in 3D
It is important to understand that this matrix multiplication is still a linear
transformation. Matrix multiplication cannot represent “translation” in 4D, and the 4D zero
vector will always be transformed back into the 4D zero vector. The reason this trick works to
transform points in 3D is that we are actually shearing 4D space. (Compare
Equation (6.11) with the shear matrices from
Section 5.5.) The 4D hyperplane that corresponds to physical 3D
space does not pass through the origin in 4D. Thus, when we shear 4D space, we are able to
translate in 3D.
Let's examine what happens when we perform a transformation without translation followed by a transformation with only translation. Let R be a rotation matrix. (In fact, R could possibly contain other 3D linear transformations, but for now, let's assume R only contains rotation.) Let T be a translation matrix of the form in Equation (6.11):

$$\mathbf{R} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad \mathbf{T} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix}.$$

Then we could rotate and then translate a point v to compute a new point v′ by

$$\mathbf{v}' = \mathbf{v}\,\mathbf{R}\,\mathbf{T}.$$

Remember that the order of transformations is important, and since we have chosen to use row vectors, the order of transformations coincides with the order that the matrices are multiplied, from left to right. We are rotating first and then translating.

Just as with 3 × 3 matrices, we can concatenate the two matrices into a single transformation matrix, which we assign to the matrix M:

$$\mathbf{M} = \mathbf{R}\,\mathbf{T}, \qquad \mathbf{v}' = \mathbf{v}\,\mathbf{R}\,\mathbf{T} = \mathbf{v}\,\mathbf{M}.$$
Let's now examine the contents of M:

$$\mathbf{M} = \mathbf{R}\,\mathbf{T} = \begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix}.$$

Notice that the upper 3 × 3 portion of M contains the rotation portion, and the bottom row contains the translation portion. The rightmost column (for now) will be [0, 0, 0, 1]ᵀ.

Applying this information in reverse, we can take any 4 × 4 matrix and separate it into a linear transformation portion, and a translation portion. We can express this succinctly with block matrix notation, by assigning the translation vector [Δx, Δy, Δz] to the vector t:

$$\mathbf{M} = \begin{bmatrix} \mathbf{R}_{3\times 3} & \mathbf{0} \\ \mathbf{t} & 1 \end{bmatrix}.$$
For the moment, we are assuming that the rightmost column of a 4 × 4 transformation matrix is always [0, 0, 0, 1]ᵀ. We will begin to encounter situations where this is not the case in Section 6.5.
Let's see what happens with the so-called “points at infinity” (those vectors with w = 0). Multiplying by a “standard” 3 × 3 linear transformation matrix extended into 4D (a transformation that does not contain translation), we get

$$\begin{bmatrix} x & y & z & 0 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} xr_{11} + yr_{21} + zr_{31} & \;\; xr_{12} + yr_{22} + zr_{32} & \;\; xr_{13} + yr_{23} + zr_{33} & \;\; 0 \end{bmatrix}.$$

Multiplying a “point at infinity” by a 4 × 4 matrix without translation

In other words, when we transform a point-at-infinity vector of the form [x, y, z, 0] by a 4 × 4 transformation matrix containing rotation, scale, etc., the expected transformation occurs, and the result is another point-at-infinity vector of the form [x′, y′, z′, 0].
When we transform a point-at-infinity vector by a transformation that does contain translation, we get the following result:

$$\begin{bmatrix} x & y & z & 0 \end{bmatrix}\begin{bmatrix} r_{11} & r_{12} & r_{13} & 0 \\ r_{21} & r_{22} & r_{23} & 0 \\ r_{31} & r_{32} & r_{33} & 0 \\ \Delta x & \Delta y & \Delta z & 1 \end{bmatrix} = \begin{bmatrix} xr_{11} + yr_{21} + zr_{31} & \;\; xr_{12} + yr_{22} + zr_{32} & \;\; xr_{13} + yr_{23} + zr_{33} & \;\; 0 \end{bmatrix}.$$

Multiplying a “point at infinity” by a 4 × 4 matrix with translation

Notice that the result is the same—that is, no translation occurs. In other words, the w component of a 4D vector can be used to selectively “switch off” the translation portion of a 4 × 4 matrix. This is useful because some vectors represent “locations” and should be translated, and other vectors represent “directions,” such as surface normals, and should not be translated. In a geometric sense, we can think of the first type of data, with w = 1, as “points,” and the second type of data, the “points at infinity” with w = 0, as “vectors.”
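The effect of the w component is easy to see directly in code. The sketch below multiplies a row vector by a 4 × 4 matrix (row-vector convention, as in the text) and builds a translation matrix in the form of Equation (6.11); the plain arrays and the bare Vec4 struct are our own simplification, not the book's classes.

struct Vec4 { float x, y, z, w; };

// Row vector times 4x4 matrix: result[j] = sum over i of v[i] * m[i][j].
Vec4 mul(const Vec4& v, const float m[4][4]) {
    return {
        v.x*m[0][0] + v.y*m[1][0] + v.z*m[2][0] + v.w*m[3][0],
        v.x*m[0][1] + v.y*m[1][1] + v.z*m[2][1] + v.w*m[3][1],
        v.x*m[0][2] + v.y*m[1][2] + v.z*m[2][2] + v.w*m[3][2],
        v.x*m[0][3] + v.y*m[1][3] + v.z*m[2][3] + v.w*m[3][3],
    };
}

// Translation by (dx, dy, dz) in the form of Equation (6.11).
void makeTranslation(float m[4][4], float dx, float dy, float dz) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            m[r][c] = (r == c) ? 1.0f : 0.0f;   // start from identity
    m[3][0] = dx;  m[3][1] = dy;  m[3][2] = dz; // bottom row holds the translation
}

// Usage: a "point" {1,2,3,1} is moved by the translation, while a
// "vector" {1,2,3,0} (a point at infinity) is left unchanged.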
So, one reason why 4 × 4 matrices are useful is that a 4 × 4 transformation matrix can contain translation. When we use 4 × 4 matrices solely for this purpose, the rightmost column of the matrix will always be [0, 0, 0, 1]ᵀ. Since this is the case, why don't we just drop the column and use a 4 × 3 matrix? According to linear algebra rules, 4 × 3 matrices are undesirable for several reasons:
- We cannot multiply a 4 × 3 matrix by another 4 × 3 matrix.
- We cannot invert a 4 × 3 matrix, since the matrix is not square.
- When we multiply a 4D vector by a 4 × 3 matrix, the result is a 3D vector.
Strict adherence to linear algebra rules forces us to add the fourth column. Of course, in our code, we are not bound by linear algebra rules. It is a common technique to write a 4 × 3 matrix class that is useful for representing transformations that contain translation. Basically, such a matrix is a 4 × 4 matrix, where the rightmost column is assumed to be [0, 0, 0, 1]ᵀ and therefore isn't explicitly stored.
6.4.3 General Affine Transformations

Chapter 5 presented 3 × 3 matrices for many primitive transformations. Because a 3 × 3 matrix can represent only linear transformations in 3D, translation was not considered. Armed with 4 × 4 transform matrices, though, we can now create more general affine transformations that contain translation, such as:
- rotation about an axis that does not pass through the origin,
- scale about a plane that does not pass through the origin,
- reflection about a plane that does not pass through the origin, and
- orthographic projection onto a plane that does not pass through the origin.
The basic idea is to translate the “center” of the transformation to the origin, perform the linear transformation by using the techniques developed in Chapter 5, and then transform the center back to its original location. We start with a translation matrix T that translates the point p to the origin, and a linear transform matrix R from Chapter 5 that performs the linear transformation. The final affine transformation matrix M will be equal to the matrix product M = T R T⁻¹, where T⁻¹ is the translation matrix with the opposite translation amount as T.
It is interesting to observe the general form of such a matrix. Let's first write T, R, and T⁻¹ in the partitioned form we used earlier:

$$\mathbf{T} = \begin{bmatrix} \mathbf{I}_{3\times 3} & \mathbf{0} \\ -\mathbf{p} & 1 \end{bmatrix}, \qquad \mathbf{R} = \begin{bmatrix} \mathbf{R}_{3\times 3} & \mathbf{0} \\ \mathbf{0} & 1 \end{bmatrix}, \qquad \mathbf{T}^{-1} = \begin{bmatrix} \mathbf{I}_{3\times 3} & \mathbf{0} \\ \mathbf{p} & 1 \end{bmatrix}.$$

Evaluating the matrix multiplication, we get

$$\mathbf{T}\,\mathbf{R}\,\mathbf{T}^{-1} = \begin{bmatrix} \mathbf{R}_{3\times 3} & \mathbf{0} \\ -\mathbf{p}\,\mathbf{R}_{3\times 3} + \mathbf{p} & 1 \end{bmatrix}.$$

Thus, the extra translation in an affine transformation changes only the last row of the 4 × 4 matrix. The upper 3 × 3 portion, which contains the linear transformation, is not affected.
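As an illustration, the sketch below builds such an affine matrix for one concrete case: rotation about a z-axis that passes through an arbitrary point p. It reuses makeTranslation from the earlier sketch; multiply4x4 and the choice of a z-axis rotation are our own illustrative assumptions, standing in for any linear transform from Chapter 5.

#include <cmath>

// out = a * b for 4x4 matrices.
void multiply4x4(const float a[4][4], const float b[4][4], float out[4][4]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c) {
            out[r][c] = 0.0f;
            for (int k = 0; k < 4; ++k) out[r][c] += a[r][k] * b[k][c];
        }
}

// 4x4 affine matrix: rotate by 'theta' radians about a z-axis that
// passes through the point p = (px, py, pz) instead of the origin.
void rotateAboutPointZ(float out[4][4], float theta, float px, float py, float pz) {
    float t[4][4], r[4][4], tInv[4][4], tmp[4][4];
    makeTranslation(t,   -px, -py, -pz);   // T: move p to the origin
    makeTranslation(tInv, px,  py,  pz);   // T^-1: move it back
    // R: 3x3 rotation about the z-axis (row-vector convention), extended into 4D.
    makeTranslation(r, 0.0f, 0.0f, 0.0f);  // start from identity
    r[0][0] =  std::cos(theta);  r[0][1] = std::sin(theta);
    r[1][0] = -std::sin(theta);  r[1][1] = std::cos(theta);
    multiply4x4(t, r, tmp);                // T * R
    multiply4x4(tmp, tInv, out);           // (T * R) * T^-1
}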
Our use of “homogeneous” coordinates so far has really been nothing more than a mathematical kludge to allow us to include translation in our transformations. We use quotations around “homogeneous” because the w value was always 1 (or 0, in the case of points at infinity). In the next section, we will remove the quotations, and discuss meaningful ways to use 4D coordinates with other w values.
6.5 4 × 4 Matrices and Perspective Projection
Section 6.4.1 showed that when we interpret a 4D homogeneous vector in 3D, we divide by w. This division is a mathematical tool that we did not really take advantage of in the previous section, since w was always 1 or 0. However, if we play our cards right, we can use the division by w to encapsulate very succinctly the important geometric operation of perspective projection.
We can learn a lot about perspective projection by comparing it to another type of projection we
have already discussed, orthographic projection.
Section 5.3 showed how to project 3D space onto a 2D
plane, known as the projection plane, by using orthographic projection. Orthographic
projection is also known as parallel projection, because the projectors are parallel. (A
projector is a line from the original point to the resulting projected point on the
plane). The parallel projectors used in orthographic projection are shown in
Figure 6.3.
Perspective projection in 3D also projects onto a 2D plane. However, the projectors are not
parallel. In fact, they intersect at a point, known as the center of projection. This is
shown in Figure 6.4.
Because the center of projection is in front of the projection plane, the projectors cross before
striking the plane, and thus the image is inverted. As we move an object farther away from the
center of projection, its orthographic projection remains constant, but the perspective
projection gets smaller, as illustrated in Figure 6.5. The teapot on the right is farther from the projection plane, and its projection is (slightly) smaller than that of the closer teapot. This is a very important visual cue known as perspective foreshortening.
6.5.1 A Pinhole Camera
Perspective projection is important in graphics because it models the way the human visual system
works. Actually, the human visual system is more complicated because we have two eyes, and for
each eye, the projection surface (our retina) is not flat; so let's look at the simpler example
of a pinhole camera. A pinhole camera is a box with a tiny hole on one end. Rays of light enter the pinhole (thus
converging at a point), and then strike the opposite end of the box, which is the projection
plane. This is shown in Figure 6.6.
In this view, the left and back sides of the box have been removed so you can see the inside.
Notice that the image projected onto the back of the box is inverted. This is because the rays
of light (the projectors) cross as they meet at the pinhole (the center of projection).
Let's examine the geometry behind the perspective projection of a pinhole camera. Consider a 3D coordinate space with the origin at the pinhole, the z-axis perpendicular to the projection plane, and the x- and y-axes parallel to the plane of projection, as shown in Figure 6.7.

Let's see if we can't compute, for an arbitrary point p, the 3D coordinates of p′, which is projected through the pinhole onto the projection plane.
First, we need to know the distance from the pinhole to the projection plane. We assign this distance to the variable d. Thus, the plane is defined by the equation z = −d. Now let's view things from the side and solve for y (see Figure 6.8). By similar triangles, we can see that

$$\frac{y'}{-d} = \frac{y}{z}, \qquad y' = \frac{-d\,y}{z}.$$

Notice that since a pinhole camera flips the image upside down, the signs of y and y′ are opposite. The value of x′ is computed in a similar manner:

$$x' = \frac{-d\,x}{z}.$$
The z values of all the projected points are the same: z = −d. Thus, the result of projecting a point p through the origin onto a plane at z = −d is

$$\mathbf{p} = \begin{bmatrix} x \\ y \\ z \end{bmatrix} \quad\Longrightarrow\quad \mathbf{p}' = \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} -dx/z \\ -dy/z \\ -d \end{bmatrix}.$$

Projecting onto the plane z = −d
In practice, the extra minus signs create unnecessary complexities, and so we move the plane of projection to z = d, which is in front of the center of projection, as shown in Figure 6.9. Of course, this would never work for a real pinhole camera, since the purpose of the pinhole in the first place is to allow in only light that passes through a single point. However, in the mathematical universe inside a computer, it works just fine.

As expected, moving the plane of projection in front of the center of projection removes the annoying minus signs:

$$\mathbf{p}' = \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} dx/z \\ dy/z \\ d \end{bmatrix}. \tag{6.12}$$

Projecting a point onto the plane z = d
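In code, this projection is just a couple of divisions. The sketch below follows Equation (6.12) and assumes z is nonzero (the point is not at the center of projection); the bare Point3 struct is our own scaffolding.

struct Point3 { float x, y, z; };

// Project p through the origin onto the plane z = d, per Equation (6.12).
Point3 projectOntoPlane(const Point3& p, float d) {
    return { d * p.x / p.z, d * p.y / p.z, d };
}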
6.5.2 Perspective Projection Matrices
Because the conversion from 4D to 3D space implies a division, we can encode a perspective projection in a 4 × 4 matrix. The basic idea is to come up with an equation for p′ with a common denominator for x, y, and z, and then set up a 4 × 4 matrix that will set w equal to this denominator. We assume that the original points have w = 1.
First, we manipulate Equation (6.12) to have a common denominator:

$$\mathbf{p}' = \begin{bmatrix} dx/z \\ dy/z \\ d \end{bmatrix} = \begin{bmatrix} dx/z \\ dy/z \\ dz/z \end{bmatrix} = \frac{1}{z/d}\begin{bmatrix} x \\ y \\ z \end{bmatrix}.$$

To divide by this denominator, we put the denominator into w, so the 4D point will be of the form

$$\begin{bmatrix} x & y & z & z/d \end{bmatrix}.$$

So we need a 4 × 4 matrix that multiplies a homogeneous vector [x, y, z, 1] to produce [x, y, z, z/d]. The matrix that does this is

$$\begin{bmatrix} x & y & z & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1/d \\ 0 & 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} x & y & z & z/d \end{bmatrix}.$$

Projecting onto the plane z = d using a 4 × 4 matrix

Thus, we have derived a 4 × 4 projection matrix.
There are several important points to be made here:
- Multiplication by this matrix doesn't actually perform the perspective transform, it just computes the proper denominator into w. Remember that the perspective division actually occurs when we convert from 4D to 3D by dividing by w. (A small code sketch after this list illustrates both steps.)
- There are many variations. For example, we can place the plane of projection at z = 0, and the center of projection at [0, 0, −d]. This results in a slightly different equation.
- This seems overly complicated. It seems like it would be simpler to just divide by z, rather than bothering with matrices. So why is homogeneous space interesting? First, 4 × 4 matrices provide a way to express projection as a transformation that can be concatenated with other transformations. Second, projection onto nonaxially aligned planes is possible. Basically, we don't need homogeneous coordinates, but 4 × 4 matrices provide a compact way to represent and manipulate projection transformations.
- The projection matrix in a real graphics geometry pipeline (perhaps more accurately known as the “clip matrix”) does more than just copy z into w. It differs from the one we derived in two important respects:
  - Most graphics systems apply a normalizing scale factor such that w = 1 at the far clip plane. This ensures that the values used for depth buffering are distributed appropriately for the scene being rendered, to maximize precision of depth buffering.
  - The projection matrix in most graphics systems also scales the x and y values according to the field of view of the camera.
  We'll get into these details in Section 10.3.2, when we show what a projection matrix looks like in practice, using both DirectX and OpenGL as examples.
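Here is a sketch of both steps in code: build the 4 × 4 matrix derived above (which only copies z/d into w), then perform the homogeneous divide when converting back to 3D. It reuses the Vec4, Point3, and mul sketches from earlier in the chapter, and it is the textbook matrix, not the clip matrix of a real API such as DirectX or OpenGL.

// 4x4 matrix that projects onto the plane z = d: x, y, z pass through
// unchanged, and the output w becomes z/d.
void makeProjectOntoPlaneZ(float m[4][4], float d) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            m[r][c] = (r == c) ? 1.0f : 0.0f;
    m[2][3] = 1.0f / d;   // put z/d into the output w
    m[3][3] = 0.0f;       // the input w no longer contributes
}

// Homogeneous divide: convert a 4D result back to 3D by dividing by w.
Point3 homogenize(const Vec4& v) {
    return { v.x / v.w, v.y / v.w, v.z / v.w };
}

// Usage: transforming (x, y, z, 1) by this matrix with mul() and then
// homogenizing yields exactly (dx/z, dy/z, d), matching Equation (6.12).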
Exercises
- Compute the determinant of the following matrix:
- Compute the determinant, adjoint, and inverse of the following matrix:
- Is the following matrix orthogonal?
- Invert the matrix from the previous exercise.
- Invert the following matrix:
- Construct a 4 × 4 matrix to translate by .
- Construct a 4 × 4 matrix to rotate 20° about the -axis and then translate by .
- Construct a 4 × 4 matrix to translate by and then rotate 20° about the -axis.
- Construct a 4 × 4 matrix to perform a perspective projection onto the plane . (Assume the origin is the center of projection.)
- Use the matrix from the previous exercise to compute the 3D coordinates of the projection of the point onto the plane .
An attempt at visualizing the Fourth Dimension:
Take a point, stretch it into a line, curl it into a circle,
twist it into a sphere, and punch through the sphere.
— Albert Einstein (1879–1955)