Chapter 8
Rotation in Three Dimensions
If you do not change direction,
you may end up where you are heading.
— Lao Tzu (600–531 BCE)

This chapter tackles the difficult problem of describing the orientation of an object in 3D. It also discusses the closely related concepts of rotation and angular displacement. There are several different ways we can express orientation and angular displacement in 3D. Here we discuss the three most important methods—matrices, Euler angles, and quaternions—as well as two lesser known forms—axis-angle and exponential map. For each method, we define precisely how the representation method works, and discuss the peculiarities, advantages, and disadvantages of the method.

Different techniques are needed in different circumstances, and each technique has its advantages and disadvantages. It is important to know not only how each method works, but also which technique is most appropriate for a particular situation and how to convert between representations.

The discussion of orientation in 3D is divided into the following sections:

• Section 8.1 discusses the subtle differences between terms like “orientation,” “direction,” and “angular displacement.”
• Section 8.2 describes how to express orientation using a matrix.
• Section 8.3 describes how to express angular displacement using Euler angles.
• Section 8.4 describes the axis-angle and exponential map forms.
• Section 8.5 describes how to express angular displacement using a quaternion.
• Section 8.6 compares and contrasts the different methods.
• Section 8.7 explains how to convert an orientation from one form to another.
This chapter makes extensive use of the terms object space and upright space. If you aren't familiar with these terms, you should flip back to Section 3.2, where the terms were first introduced.

# 8.1What Exactly is “Orientation”?

Before we can begin to discuss how to describe orientation in 3D, let us first define exactly what it is that we are attempting to describe. The term orientation is related to other similar terms, such as

• direction
• angular displacement
• rotation.

Intuitively, we know that the “orientation” of an object basically tells us what direction the object is facing. However, “orientation” is not exactly the same as “direction.”

For example, a vector has a direction, but not an orientation. The difference is that when a vector points in a certain direction, you can twist the vector along its length (see Figure 8.1), and there is no real change to the vector, since a vector has no thickness or dimension other than its length.

Figure 8.1 Twisting a vector results in no appreciable change to the vector

In contrast to a simple vector, consider an object, such as a jet, facing a certain direction. If we twist the jet (see Figure 8.2) in the same way that we twisted the vector, we will change the orientation of the jet. In Section 8.3, we refer to this twisting component of an object's orientation as bank.

Figure 8.2Twisting an object changes its orientation

The fundamental difference between direction and orientation is seen concretely by the fact that we can parameterize a direction in 3D with just two numbers (the spherical coordinate angles—see Section 7.3.2), whereas an orientation requires a minimum of three numbers (Euler angles—see Section 8.3).

Section 2.4.1 discussed that it's impossible to describe the position of an object in absolute terms—we must always do so within the context of a specific reference frame. When we investigated the relationship between “points” and “vectors,” we noticed that specifying a position is actually the same as specifying an amount of translation from some other given reference point (usually the origin of some coordinate system).

In the same way, orientation cannot be described in absolute terms. Just as a position is given by a translation from some known point, an orientation is given by a rotation from some known reference orientation (often called the “identity” or “home” orientation). The amount of rotation is known as an angular displacement. In other words, describing an orientation is mathematically equivalent to describing an angular displacement.

We say “mathematically equivalent” because in this book, we make a subtle distinction between “orientation” and terms such as “angular displacement” and “rotation.” It is helpful to think of an “angular displacement” as an operator that accepts an input and produces an output. A particular direction of transformation is implied; for example, the angular displacement from the old orientation to the new orientation, or from upright space to object space. An example of an angular displacement is, “Rotate $90{}^{\mathrm{o}}$ about the $z$ -axis.” It's an action that we can perform on a vector.

However, we frequently encounter state variables and other situations in which this operator framework of input/output is not helpful and a parent/child relationship is more natural. We tend to use the word “orientation” in those situations. An example of an orientation is, “Standing upright and facing east.” It describes a state of affairs.

Of course, we can describe the orientation “standing upright and facing east” as an angular displacement by saying, “Stand upright, facing north, and then rotate $90{}^{\mathrm{o}}$ about the $z$ -axis.” This distinction between orientation and angular displacement is similar to the distinction between points and vectors, which are two other terms that are equivalent mathematically but not identical conceptually. In both cases, the first term is used primarily to describe a single state, and the second term primarily used to describe a difference between two states. Of course, these conventions are purely a matter of preference, but they can be helpful.

You might also hear the word “attitude” used to refer the orientation of an object, especially if that object is an aircraft.

# 8.2Matrix Form

One way to describe the orientation of a coordinate space in 3D is to tell which way the basis vectors of that coordinate space (the $+x$ , $+y$ , and $+z$ axes) point. Of course, we don't measure these vectors in the coordinate space we are attempting to describe—by definition, they are $\left[1,0,0\right]$ , $\left[0,1,0\right]$ , and $\left[0,0,1\right]$ no matter what orientation the coordinate space is in. We must describe the basis vectors using some other coordinate space. By doing so, we've established the relative orientation of the two coordinate spaces.

When these basis vectors are used to form the rows of a $3×3$ matrix, we have expressed the orientation in matrix form.1 Another way of saying all this is that we can express the relative orientation of two coordinate spaces by giving a rotation matrix that can be used to transform vectors from one coordinate space to the other.

Figure 8.3 Defining an orientation using a matrix

## 8.2.1Which Matrix?

We have already seen how a matrix can be used to transform points from one coordinate space to another. In Figure 8.3, the matrix in the upper right-hand corner can be used to rotate points from the object space of the jet into upright space. We've pulled out the rows of this matrix to emphasize their direct relationship to the coordinates for the jet's body axes. The rotation matrix contains the object axes, expressed in upright space. Simultaneously, it is a rotation matrix: we can multiply row vectors by this matrix to transform those vectors from object-space coordinates to upright-space coordinates.

Legitimate question to ask are: Why does the matrix contain the body axes expressed using upright-space coordinates? Why not the upright axes expressed in object-space coordinates? Another way to phrase this is, Why did we choose to give a rotation matrix that transforms vectors from object space to upright space? Why not from upright space to object space?

From a mathematical perspective, this question is a bit ridiculous. Because rotation matrices are orthogonal, their inverse is the same as their transpose (see Section 6.3.2). Thus, the decision is entirely a cosmetic one.

But practically speaking, in our opinion, it is quite important. At issue is whether you can write code that is intuitive to read and works the first time, or whether it requires a lot of work to decipher, or a knowledge of conventions that are not stated because they are “obvious” to everyone but you. So please allow us a brief digression to continue a line of thought begun when we introduced the term “upright space” in Section 3.2.4 concerning the practical aspects of what happens when the math of coordinate space transformations gets translated into code. Also please allow some latitude to express some opinions based on our observations watching programmers grapple with rotation matrices. We don't expect that everyone will agree with our assertions, but we hope that every reader will at least appreciate the value in considering these issues.

Certainly every good math library will have a $3×3$ matrix class that can represent any arbitrary transformation, which is to say that it makes no assumptions about the value of the matrix elements. (Or perhaps it is a $4×4$ matrix that can do projection, or a $4×3$ which can do translation but not projection—those distinctions are not important here.) For a matrix like this, the operations inherently are in terms of some input coordinate space and an output coordinate space. This is just implicit in the idea of matrix multiplication. If you need to go from output to input, then you must obtain the inverse of the matrix.

It is common practice to use the generic transform matrix class to describe the orientation of an object. In this case, rotation is treated just like any other transformation. The interface remains in terms of a source and destination space. Unfortunately, it is our experience that the following two matrix operations are by far the most commonly used:2

• Take an object-space vector and express it in upright coordinates.
• Take an upright-space vector and express it in object coordinates.

Notice that we need to be able to go in both directions. We have no experience or evidence that either direction is significantly more common than the other. But more important, the very nature of the operations and the way programmers think about the operations is in terms of “object space” and “upright space” (or some other equivalent terminology, such as “parent space” and “child space”). We do not think of them in terms of a source space and a destination space. It is in this context that we wish to consider the question posed at the beginning of this section: Which matrix should we use?

First, we should back up a bit and remind ourselves of the mathematically moot but yet conceptually important distinction between orientation and angular displacement. (See the notes on terminology at the end of Section 8.1.) If your purpose is to create a matrix that performs a specific angular displacement (for example, “rotate 30 degrees about the $x$ -axis”), then the two operations above are not really the ones you probably have in your head, and using a generic transform matrix with its implied direction of transformation is no problem, and so this discussion does not apply. Right now, we are focusing on the situation in which the orientation of some object is stored as a state variable.

Let's assume that we adopt the common policy and store orientation using the generic transformation matrix. We are forced to arbitrarily pick a convention, so let's decide that multiplication by this matrix will transform from object to upright space. If we have a vector in upright space and we need to express it in object-space coordinates, we must multiply this vector by the inverse3 of the matrix.

Now let's see how our policy affects the code that is written and read hundreds of times by average game programmers.

• Rotate some vector from object space to upright space is translated into code as multiplication by the matrix.
• Rotate a vector from upright space to object space is translated into code as multiplication by the inverse (or transpose) of the matrix.

Notice that the code does not match one-to-one with the high-level intentions of the programmer. It forces every user to remember what the conventions are every time they use the matrix. It is our experience that this coding style is a contributing factor to the difficulty that beginning programmers have in learning how to use matrices; they often end up transposing and negating things randomly when things don't look right.

We have found it helpful to have a special $3×3$ matrix class that is used exclusively for storing the orientation of an object, not for arbitrary transforms. The class assumes, as an invariant, that the matrix is orthogonal, meaning it contains only rotation. (We also would probably assume that the matrix does not contain a reflection, even though that is possible in an orthogonal matrix.) With these assumptions in place, we are now free to perform rotations using the matrix at a higher level of abstraction. Our interface functions match exactly the high-level intentions of the programmer. Furthermore, we have removed the confusing linear algebra details having to do with row vectors versus column vectors, which space is on the left or right, which way is the regular way and which is the inverse, and so forth. Or rather, we have confined such details to the class internals—the person implementing the class certainly needs to pick a convention (and hopefully document it). In fact, in this specialized matrix class, the operations of “multiply a vector” and “invert this matrix” really are not that useful. We would advocate keeping this dedicated matrix class confined to operations in terms of upright space and object space, rather than multiply a vector.

So, back to the question posed at the start of this section: Which matrix should we use? Our answer is, “It shouldn't matter.” By that we mean there is a way to design your matrix code in such a way that it can be used without knowing what choice was made. As far as the C++ code goes, this is purely a cosmetic change. For example, perhaps we just replace the function name multiply() with objectToUpright(), and likewise we replace multiplyByTranspose() with uprightToObject(). The version of the code with descriptive, named coordinate spaces is easier to read and write.

## 8.2.2Direction Cosines Matrix

You might come across the (very old school) term direction cosines in the context of using a matrix to describe orientation. A direction cosines matrix is the same thing as a rotation matrix; the term just refers to a special way to interpret (or construct) the matrix, and this interpretation is interesting and educational, so let's pause for a moment to take a closer look. Each element in a rotation matrix is equal to the dot product of a cardinal axis in one space with a cardinal axis in the other space. For example, the center element ${m}_{22}$ in a $3×3$ matrix gives the dot product that the $y$ -axis in one space makes with the $y$ -axis in the other space.

More generally, let's say that the basis vectors of a coordinate space are the mutually orthogonal unit vectors $\mathbf{p}$ , $\mathbf{q}$ , and $\mathbf{r}$ , while a second coordinate space with the same origin has as its basis a different (but also orthonormal) basis ${\mathbf{p}}^{\prime }$ , ${\mathbf{q}}^{\prime }$ , and ${\mathbf{r}}^{\prime }$ . (Please allow us to break from convention by dropping all the hats from the unit vectors in this section, to avoid distracting clutter in the equations.) The rotation matrix that rotates row vectors from the first space to the second can be constructed from the cosines of the angles between each pair of basis vectors. Of course, the dot product of two unit vectors is exactly equal to the cosine of the angle between them, so the matrix product is

These axes can be interpreted as geometric rather than numeric entities, so it really does not matter what coordinates are used to describe the axes (provided we use the same coordinate space to describe all of them), the rotation matrix will be the same.

For example, let's say that our axes are described using coordinates relative to the first basis. Then $\mathbf{p}$ , $\mathbf{q}$ , and $\mathbf{r}$ have the trivial forms $\left[1,0,0\right]$ , $\left[0,1,0\right]$ and $\left[0,0,1\right]$ , respectively. The basis vectors of the second space, ${\mathbf{p}}^{\prime }$ , ${\mathbf{q}}^{\prime }$ , and ${\mathbf{r}}^{\prime }$ have arbitrary coordinates. When we substitute the trivial vectors $\mathbf{p}$ , $\mathbf{q}$ , and $\mathbf{r}$ into the matrix in Equation (8.1) and expand the dot products, we get

In other words, the rows of the rotation matrix are the basis vectors of the output coordinate space, expressed by using the coordinates of the input coordinate space. Of course, this fact is not just true for rotation matrices, it's true for all transformation matrices. This is the central idea of why a transformation matrix works, which was developed in Section 4.2.

Now let's look at the other case. Instead of using coordinates relative to the first basis, we'll measure everything using the second coordinate space (the output space). This time, ${\mathbf{p}}^{\prime }$ , ${\mathbf{q}}^{\prime }$ , and ${\mathbf{r}}^{\prime }$ have trivial forms, and $\mathbf{p}$ , $\mathbf{q}$ , and $\mathbf{r}$ are arbitrary. Putting these into the direction cosines matrix produces

This says that the columns of the rotation matrix are formed from the basis vectors of the input space, expressed using the coordinates of the output space. This is not true of transformation matrices in general; it applies only to orthogonal matrices such as rotation matrices.

Also, remember that our convention is to use row vectors on the left. If you are using column vectors on the right, things will be transposed.

## 8.2.3Advantages of Matrix Form

Matrix form is a very explicit form of representing orientation. This explicit nature provides some benefits.

• Rotation of vectors is immediately available. The most important property of matrix form is that you can use a matrix to rotate vectors between object and upright space. No other representation of orientation allows this4—to rotate vectors, we must convert the orientation to matrix form.
• Format used by graphics APIs. Partly due to reasons in the previous item, graphics APIs use matrices to express orientation. (API stands for Application Programming Interface. Basically, this is the code we use to communicate with the graphics hardware.) When we are communicating with the API, we are going to have to express our transformations as matrices. How we store transformations internally in our program is up to us, but if we choose another representation, we are going to have to convert them into matrices at some point in the graphics pipeline.
• Concatenation of multiple angular displacements. A third advantage of matrices is that it is possible to “collapse” nested coordinate space relationships. For example, if we know the orientation of object A relative to object B, and we know the orientation of object B relative to object C, then by using matrices, we can determine the orientation of object A relative to object C. We encountered these concepts before when we discussed nested coordinate spaces in Chapter 3, and then we discussed how matrices could be concatenated in Section 5.6.
• Matrix inversion. When an angular displacement is represented in matrix form, it is possible to compute the “opposite” angular displacement by using matrix inversion. What's more, since rotation matrices are orthogonal, this computation is a trivial matter of transposing the matrix.

## 8.2.4Disadvantages of Matrix Form

The explicit nature of a matrix provides some advantages, as we have just discussed. However, a matrix uses nine numbers to store an orientation, and it is possible to parameterize orientation with only three numbers. The “extra” numbers in a matrix can cause some problems.

• Matrices take more memory. If we need to store many orientations (for example, keyframes in an animation sequence), that extra space for nine numbers instead of three can really add up. Let's take a modest example. Let's say we are animating a model of a human that is broken up into 15 pieces for different body parts. Animation is accomplished strictly by controlling the orientation of each part relative to its parent part. Assume we are storing one orientation for each part, per frame, and our animation data is stored at a modest rate, say, 15 Hz. This means we will have 225 orientations per second. Using matrices and 32-bit floating point numbers, each frame will take 8,100 bytes. Using Euler angles (which we will meet next in Section 8.3), the same data would take only 2,700 bytes. For a mere 30 seconds of animation data, matrices would take 162K more than the same data stored using Euler angles!
• Difficult for humans to use. Matrices are not intuitive for humans to work with directly. There are just too many numbers, and they are all between $-1$ and $+1$ . What's more, humans naturally think about orientation in terms of angles, but a matrix is expressed in terms of vectors. With practice, we can learn how to decipher the orientation from a given matrix. (The techniques from Section 4.2 for visualizing a matrix help a lot for this.) But still, this is much more difficult than Euler angles. And going the other way is much more difficult—it would take forever to construct the matrix for a nontrivial orientation by hand. In general, matrices just aren't the way people naturally think about orientation.
• Matrices can be ill-formed. As we have said, a matrix uses nine numbers, when only three are necessary. In other words, a matrix contains six degrees of redundancy. There are six constraints that must be satisfied for a matrix to be “valid” for representing an orientation. The rows must be unit vectors, and they must be mutually perpendicular (see Section 6.3.2).

Let's consider this last point in more detail. If we take any nine numbers at random and create a $3×3$ matrix, it is very unlikely that these six constraints will be satisfied, and thus the nine numbers will not form a valid rotation matrix. In other words, matrices can be ill-formed, at least for purposes of representing an orientation. Ill-formed matrices can be a problem because they can lead to numerical exceptions, weird stretched graphics, and other unexpected behavior.

How could we ever end up with a bad matrix? There are several ways:

• We may have a matrix that contains scale, skew, reflection, or projection. What is the “orientation” of an object that has been affected by such operations? There really isn't a clear definition for this. Any nonorthogonal matrix is not a well-defined rotation matrix. (See Section 6.3 for a complete discussion on orthogonal matrices.) And reflection matrices (which are orthogonal) are not valid rotation matrices, either.
• We may just get bad data from an external source. For example, if we are using a physical data acquisition system, such as motion capture, there could be errors due to the capturing process. Many modeling packages are notorious for producing ill-formed matrices.
• We can actually create bad data due to floating point round off error. For example, suppose we apply a large number of incremental changes to an orientation, which could routinely happen in a game or simulation that allows a human to interactively control the orientation of an object. The large number of matrix multiplications, which are subject to limited floating point precision, can result in an ill-formed matrix. This phenomenon is known as matrix creep. We can combat matrix creep by orthogonalizing the matrix, as we already discussed in Section 6.3.3.

## 8.2.5Summary of Matrix Form

Let's summarize what Section 8.2 has said about matrices.

• Matrices are a “brute force” method of expressing orientation: we explicitly list the basis vectors of one space in the coordinates of some different space.
• The term direction cosines matrix alludes to the fact that each element in a rotation matrix is equal to the dot product of one input basis vector with one output basis vector. Like all transformation matrices, the rows of the matrix are the output-space coordinates of the input-space basis vectors. Furthermore, the columns of a rotation matrix are the input-space coordinates of the output-space basis vectors, a fact that is only true by virtue of the orthogonality of a rotation matrix.
• The matrix form of representing orientation is useful primarily because it allows us to rotate vectors between coordinate spaces.
• Modern graphics APIs express orientation by using matrices.
• We can use matrix multiplication to collapse matrices for nested coordinate spaces into a single matrix.
• Matrix inversion provides a mechanism for determining the “opposite” angular displacement.
• Matrices can take two to three times as much memory as other techniques. This can become significant when storing large numbers of orientations, such as animation data.
• The numbers in a matrix aren't intuitive for humans to work with.
• Not all matrices are valid for describing an orientation. Some matrices contain mirroring or skew. We can end up with a ill-formed matrix either by getting bad data from an external source or through matrix creep.

# 8.3Euler Angles

Another common method of representing orientation is known as Euler angles. (Remember, Euler is pronounced “oiler,” not “yoolur.”) The technique is named after the famous mathematician who developed them, Leonhard Euler (1707–1783). Section 8.3.1 describes how Euler angles work and discusses the most common conventions used for Euler angles. Section 8.3.2 discusses other conventions for Euler angles, including the important fixed axis system. We consider the advantages and disadvantages of Euler angles in Section 8.3.3 and Section 8.3.4. Section 8.3.5 summarizes the most important concepts concerning of Euler angles.

This section utilizes many ideas, terms, and conventions from Section 7.3.2 concerning spherical coordinates.

## 8.3.1What Are Euler Angles?

The basic idea behind Euler angles is to define an angular displacement as a sequence of three rotations about three mutually perpendicular axes. This sounds complicated, but actually it is quite intuitive. (In fact, its ease of use by humans is one of its primary advantages.)

So Euler angles describe orientation as three rotations about three mutually perpendicular axes. But which axes? And in what order? As it turns out, any three axes in any order will work, but most people have found it practical to use the cardinal axes in a particular order. The most common convention, and the one we use in this book, is the so-called “heading-pitch-bank” convention for Euler angles. In this system, an orientation is defined by a heading angle, a pitch angle, and a bank angle.

Before we define the terms heading, pitch, and bank precisely, let us briefly review the coordinate space conventions we use in this book. We use a left-handed system, where $+x$ is to the right, $+y$ is up, and $+z$ is forward. (Check out Figure 1.15 for an illustration.) Also, if you have forgotten how positive rotation is defined according to the left-hand rule, you might want to flip back to Figure 1.14 to refresh your memory.

Given heading, pitch, and bank angles, we can determine the orientation described by these Euler angles using a simple four-step process.

Step 1.Begin in the “identity” orientation—that is, with the object-space axes aligned with the upright axes.

Figure 8.4Step 1: An object in its identity orientation

Step 2.Perform the heading rotation, rotating about the $y$ -axis, as shown in Figure 8.5. Positive rotation rotates to the right (clockwise when viewed from above).

Figure 8.5 Step 2: Heading is the first rotation and rotates about the vertical axis

Step 3.After heading has been applied, pitch measures the amount of rotation about the $x$ -axis. This is the object-space $x$ -axis, not the upright $x$ -axis. Staying consistent with the left-hand rule, positive rotation rotates downward. In other words, pitch actually measures the angle of declination. This is illustrated in Figure 8.6.

Figure 8.6 Step 3: Pitch is the second rotation and rotates about the object lateral axis

Step 4.After heading and pitch angles have been applied, bank measures the amount of rotation about the $z$ -axis. Again, this is the object-space $z$ -axis, not the original upright-space $z$ -axis. The left-hand rule dictates that positive bank rotates counterclockwise when viewed from the origin looking towards $+z$ . This is illustrated in Figure 8.7.

Figure 8.7 Step 4: Bank is the third and final rotation and rotates about the object longitudinal axis

It may seem contradictory that positive bank is counterclockwise, since positive heading is clockwise. But notice that positive heading is clockwise when viewed from the positive end of the axis towards the origin, the opposite perspective from the one used when judging clockwise/counterclockwise for bank. If we look from the origin to the positive end of the $y$ -axis, then positive heading does rotate counterclockwise. Or if we look from the positive end of the $z$ -axis towards the origin (looking backward from in front of the object), then positive bank appears to rotate the object clockwise. In either case, the left-hand rule prevails.

Now you have reached the orientation described by the Euler angles. Notice the similarity of Steps 1–3 to the procedure used in Section 7.3.2 to locate the direction described by the spherical coordinate angles. In other words, we can think of heading and pitch as defining the basic direction that the object is facing, and bank defining the amount of twist.

## 8.3.2Other Euler Angle Conventions

The heading-pitch-bank system described in the previous section isn't the only way to define a rotation using three angles about mutually perpendicular axes. There are many variations on this theme. Some of these differences turn out to be purely nomenclature; others are more meaningful. Even if you like our conventions, we encourage you to not skip this section, as some very important concepts are discussed; these topics are the source of much confusion, which we hope to dispel.

First of all, there is the trivial issue of naming. The most common variation you will find was made popular by the field of aerospace, the yaw-pitch-roll method.5 The term “roll” is completely synonymous with bank, and for all purposes they are identical. Similarly, within the limited context of yaw-pitch-roll, the term “yaw” is practically identical to the term heading. (However, in a broader sense, the word “yaw” actually has a subtly different meaning, and it is this subtle difference that drives our preference for the term heading. We discuss this rather nit-picky distinction in just a moment, but for the moment yaw and heading are the same.) So essentially yaw-pitch-roll is the same system as heading-pitch-bank.

Other less common terms are often used. Heading also goes by the name azimuth. The vertical angle that we call pitch is also called attitude or elevation. The final angle of rotation, which we call “bank,” is sometimes called tilt or twist.

And, of course, there are those perverse mathematicians who (motivated by the need to save space when writing on a chalkboard?) insist on assaulting your eyeballs with a slew of Greek letters. You may see any of the following:

It's all Greek to us
 $\left(\varphi ,\theta ,\psi \right)$ $\left(\psi ,\theta ,\varphi \right)$ $\left(\mathrm{\Omega },i,\omega \right)$ $\left(\alpha ,\beta \phantom{\rule{thinmathspace}{0ex}}\gamma \right)$ .

Of course, these are cosmetic differences. Perhaps more interesting is that fact that you will often hear these same three words listed in the opposite order: roll-pitch-yaw. (A quick Google search for “roll pitch yaw” or “yaw pitch roll” yields plenty of results for both forms, with neither appearing more predominant.) Considering how the order of rotations is so critical, are people really that perverse that they choose to list them in the reverse order? We're not just dwelling on terminology here; the distinctions in thinking hinted at by the differences in terminology will actually become useful when we consider how to convert Euler angles to a rotation matrix. As it turns out, there is a perfectly reasonable explanation for this “backwards” convention: it's the order in which we actually do the rotations inside a computer!

The fixed-axis system is very closely related to the Euler angle system. In an Euler angle system, the rotation occurs about the body axes, which change after each rotation. Thus, for example, the physical axis for the bank angle is always the longitudinal body space axis, but in general it is arbitrarily oriented in upright space. In a fixed-axis system, in contrast, the axes of rotation are always the fixed, upright axes. But as it turns out, the fixed-axis system and the Euler angle system are actually equivalent, provided that we take the rotations in the opposite order.

You should visualize the following example to convince yourself this is true. Let's say we have a heading (yaw) of $h$ and a pitch of $p$ . (We'll ignore bank/roll for the moment.) According to the Euler angle convention, we first do the heading axis and rotate about the vertical axis (the $y$ -axis) by $h$ . Then we rotate about the object-space lateral axis (the $x$ -axis) by the angle $p$ . Using a fixed-axis scheme, we arrive at this same ending orientation by doing the rotations in the opposite order. First, we do the pitch, rotating about the upright $x$ -axis by $p$ . Then, we perform the heading rotation, rotating about the upright $y$ -axis by $h$ . Although we might visualize Euler angles, inside a computer when rotating vectors from upright space to object space, we actually use a fixed-axis system. We discuss this in greater detail in Section 8.7.1, when we show how to convert Euler angles to a rotation matrix. The fixed-axis conventions are also called extrinsic, the typical Euler angle conventions being referred to as intrinsic.

Euler angles rotate about the body axes, so the axis of rotation for a given step depends on the angles used in prior rotations. In the fixed-axis system, the axes of rotation are always the same—the upright axes. The two systems are equivalent, provided that the rotations are performed in the opposite order.

Now we'd like to make a brief but humble campaign for a more precise use of the term “yaw.” A lot of aeronautical terminology is inherited nautical terminology.6 In a nautical context, the original meaning of the word “yaw” was essentially the same thing as heading, both in terms of absolute angle and also a change in that angle. In the context of airplanes and other freely rotating bodies, however, we don't feel that yaw and heading are the same thing. A yawing motion produces a rotation about the object $y$ -axis, whereas a change in heading produces a rotation about the upright $y$ -axis. For example, when the pilot of an airplane uses the pedals to control the rudder, he is performing a yaw rotation, because the rotation caused by the rudder is always about the object-space $y$ -axis of the plane. Imagine a plane diving straight down. If the pilot performs a $90{}^{\mathrm{o}}$ yaw, the plane will end up “on its ear,” no longer looking downward, but looking towards the horizon, banked $90{}^{\mathrm{o}}$ . This is illustrated in Figure 8.8.

Figure 8.8Heading versus yaw

In contrast, when players navigating a first-person shooter move the mouse from left to right, they are performing a heading rotation. The rotation is always about the vertical axis (the upright $y$ -axis). If players are looking downward and move the mouse horizontally to perform a heading rotation, they continue to look downward and spin in place. The point is certainly not that heading is better than yaw because that's what we do in first-person shooters. The point is that a yawing motion cannot be accomplished by adjusting a single Euler angle, but a heading motion can. That's why we think “heading” is a better term: it's the action that results when you make an incremental change to the first Euler angle.

Alas, the same argument can be leveled against the term “pitch.” If bank is nonzero, an incremental change to the middle Euler angle does not produce a rotation about the object's lateral axis. But then, there isn't really a simple, good word to describe the angle that the object's longitudinal axis makes with the horizontal, which is what the middle Euler angle really specifies. (“Inclination” is no good as it is specific to the right-handed conventions.)

We hope you have read our opinions with the humility we intended, and also have received the more important message: investigating (seemingly cosmetic) differences in convention can sometimes lead us to a deeper understanding of the finer points. And then sometimes it's just plain nit-picking. Generations of aerospace engineers have been putting men on the moon and robots on Mars, and building airplanes that safely shuttle the authors to and from distant cities, all the while using the terms yaw and roll. Would you believe that some of these guys don't even know who we are!? Given the choice to pick your own terminology, we say to use the word “heading” when you can, but if you hear the word “yaw,” then for goodness sake don't make as big of a deal out of it as we have in these pages, especially if the person you are talking to is smarter than you.

Although in this book we do not follow the right-handed aerospace coordinate conventions (and we have a minor quibble about terminology), when it comes to the basic strategy of Euler angles, in a physical sense, we believe complete compliance with the wisdom of the aerospace forefathers is the only way to go, at least if your universe has some notion of “ground.” Remember that, in theory, any three axes can be used as the axes of rotation, in any order. But really, the conventions they chose are the only ones that make any practical sense, if you want the individual angles to be useful and meaningful. No matter how you label your axes, the first angle needs to rotate about the vertical, the second about the body lateral axis, and the third about the body longitudinal axis.

As if these weren't enough complications, let us throw in a few more. In the system we have been describing, each rotation occurs about a different body axes. However, Euler's own original system was a “symmetric” system in which the first and last rotations are performed around the same axis. These methods are more convenient in certain situations, such as describing the motion of a top, where the three angles correspond to precession, nutation, and spin. You may encounter some purists who object to the name “Euler angles” being attached to an asymmetric system, but this usage is widespread in many fields, so rest assured that you outnumber them. To distinguish between the two systems, the symmetric Euler angles are sometimes called “proper” Euler angles, with the more common conventions being called Tait-Bryan angles, first documented by the aerospace forefathers we mentioned [1]. O'Reilly [10] discusses proper Euler angles, even more methods of describing rotation, such as the Rodrigues vector, Cayley-Klein parameters, and interesting historical remarks. James Diebel's summary [3] compares different Euler angle conventions and the other major methods for describing rotation, much as this chapter does, but assumes a higher level of mathematical sophistication.

If you have to deal with Euler angles that use a different convention from the one you prefer, we offer two pieces of advice:

• First, make sure you understand exactly how the other Euler angle system works. Little details such as the definition of positive rotation and order of rotations make a big difference.
• Second, the easiest way to convert the Euler angles to your format is to convert them to matrix form and then convert the matrix back to your style of Euler angles. We will learn how to perform these conversions in Section 8.7. Fiddling with the angles directly is much more difficult than it would seem. See [12] for more information.

## 8.3.3Advantages of Euler Angles

Euler angles parameterize orientation using only three numbers, and these numbers are angles. These two characteristics of Euler angles provide certain advantages over other forms of representing orientation.

• Euler angles are easy for humans to use—considerably easier than matrices or quaternions. Perhaps this is because the numbers in an Euler angle triple are angles, which is naturally how people think about orientation. If the conventions most appropriate for the situation are chosen, then the most practical angles can be expressed directly. For example, the angle of declination is expressed directly by the heading-pitch-bank system. This ease of use is a serious advantage. When an orientation needs to be displayed numerically or entered at the keyboard, Euler angles are really the only choice.
• Euler angles use the smallest possible representation. Euler angles use three numbers to describe an orientation. No system can parameterize 3D orientation using fewer than three numbers. If memory is at a premium, then Euler angles are the most economical way to represent an orientation.
Another reason to choose Euler angles when you need to save space is that the numbers you are storing are more easily compressed. It's relatively easy to pack Euler angles into a smaller number of bits using a trivial fixed-precision system. Because Euler angles are angles, the data loss due to quantization is spread evenly. Matrices and quaternions require using very small numbers, because the values stored are sines and cosines of the angles. The absolute numeric difference between two values is not proportionate to the perceived difference, however, as it is with Euler angles. In general, matrices and quaternions don't pack into a fixed-point system easily.
Bottom line: if you need to store a lot of 3D rotational data in as little memory as possible, as is very common when handling animation data, Euler angles (or the exponential map format—to be discussed in Section 8.4) are the best choices.
• Any set of three numbers is valid. If we take any three numbers at random, they form a valid set of Euler angles that we can interpret as an expression of an orientation. In other words, there is no such thing as an invalid set of Euler angles. Of course, the numbers may not be correct but at least they are valid. This is not the case with matrices and quaternions.

## 8.3.4Disadvantages of Euler Angles

This section discusses some disadvantages of the Euler angle method of representing orientation; primarily,

• The representation for a given orientation is not unique.
• Interpolating between two orientations is problematic.

Let's address these points in detail. First, we have the problem that for a given orientation, there are many different Euler angle triples that can be used to describe that orientation. This is known as aliasing and can be somewhat of an inconvenience. These irritating problems are very similar to those we met dealing with spherical coordinates in Section 7.3.4. Basic questions such as “Do two Euler angle triples represent the same angular displacement?” are difficult to answer due to aliasing.

We've seen one trivial type of aliasing before with polar coordinates: adding a multiple of $360{}^{\mathrm{o}}$ does not change the orientation expressed, even though the numbers are different.

A second and more troublesome form of aliasing occurs because the three angles are not completely independent of each other. For example, pitching down 135° is the same as heading 180°, pitching down 45°, and then banking 180°.

To deal with aliasing of spherical coordinates, we found it useful to establish a canonical set; any given point has a unique representation in the canonical set that is the “official” way to describe that point using polar coordinates. We use a similar technique for Euler angles. In order to guarantee a unique Euler angle representation for any given orientation, we restrict the ranges of the angles. One common technique is to limit heading and bank to $\left(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}\right]$ and to limit pitch to $\left[-90{}^{\mathrm{o}},+90{}^{\mathrm{o}}\right]$ . For any orientation, there is only one Euler angle triple in the canonical set that represents that orientation. (Actually, there is one more irritating singularity that must be handled, which we describe in just a moment.) Using canonical Euler angles simplifies many basic tests such as “am I facing approximately east?”

The most famous (and irritating) type of aliasing problem suffered by Euler angles is illustrated by this example: if we head right $45{}^{\mathrm{o}}$ and then pitch down $90{}^{\mathrm{o}}$ , this is the same as pitching down 90°and then banking 45°. In fact, once we chose $±90{}^{\mathrm{o}}$ as the pitch angle, we are restricted to rotating about the vertical axis. This phenomenon, in which an angle of $±90{}^{\mathrm{o}}$ for the second rotation can cause the first and third rotations to rotate about the same axis, is known as Gimbal lock. To remove this aliasing from the canonical set of Euler angle triples, we assign all rotation about the vertical axis to heading in the Gimbal lock case. In other words, in the canonical set, if pitch is $±90{}^{\mathrm{o}}$ , then bank is zero.

This last rule for Gimbal lock completes the rules for the canonical set of Euler angles:

Conditions satisfied by Euler angles in the canonical set

When writing C++ that accepts Euler angle arguments, it's usually best to ensure that they work given Euler angles in any range. Luckily this is usually easy; things frequently just work without taking any extra precaution, especially if the angles are fed into trig functions. However, when writing code that computes or returns Euler angles, it's good practice to return the canonical Euler angle triple. The conversion methods shown in Section 8.7 demonstrate these principles.

Figure 8.9 Naïve interpolation can cause excessive rotation
A common misconception is that, because of Gimbal lock, certain orientations cannot be described using Euler angles. Actually, for the purposes of describing an orientation, aliasing doesn't pose any problems. To be clear, any orientation in 3D can be described by using Euler angles, and that representation is unique within the canonical set. Also, as we mentioned in the previous section, there is no such thing as an “invalid” set of Euler angles. Even if the angles are outside the usual range, we can always agree on what orientation is described by the Euler angles.

So for purposes of simply describing orientation, aliasing isn't a huge problem, especially when canonical Euler angles are used. So what's so bad about aliasing and Gimbal lock? Let's say we wish to interpolate between two orientations ${\mathbf{R}}_{0}$ and ${\mathbf{R}}_{1}$ . In other words, for a given parameter $t$ , $0\le t\le 1$ , we wish to compute an intermediate orientation $\mathbf{R}\left(t\right)$ that interpolates smoothly from ${\mathbf{R}}_{0}$ to ${\mathbf{R}}_{1}$ as $t$ varies from $0$ to $1$ . This is an extremely useful operation for character animation and camera control, for example.

Figure 8.10 Naïve interpolation can rotate the long way around.

The naïve approach to this problem is to apply the standard linear interpolation formula (“lerp”) to each of the three angles independently:

Simple linear interpolation between two angles
$\begin{array}{rl}\mathrm{\Delta }\theta & ={\theta }_{1}-{\theta }_{0},\\ {\theta }_{t}& ={\theta }_{0}+t\phantom{\rule{thinmathspace}{0ex}}\mathrm{\Delta }\theta .\end{array}$

This is fraught with problems.

First, if canonical Euler angles are not used, we may have large angle values. For example, imagine the heading of ${\mathbf{R}}_{0}$ , denoted ${h}_{0}$ , is 720°. Assume ${h}_{1}=45{}^{\mathrm{o}}$ . Now, $720{}^{\mathrm{o}}=2×360{}^{\mathrm{o}}$ , which is the same as 0°, so basically the ${h}_{1}$ and ${h}_{2}$ are only 45° apart. However, naïve interpolation will spin around nearly twice in the wrong direction, as shown in Figure 8.9.

Of course, the solution to this problem is to use canonical Euler angles. We could assume that we will always be interpolating between two sets of canonical Euler angles. Or we could attempt to enforce this by converting to canonical values inside our interpolation routine. (Simply wrapping angles within the $\left(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}\right]$ range is easy, but dealing with pitch values outside the $\left[-90{}^{\mathrm{o}},+90{}^{\mathrm{o}}\right]$ range is more challenging.)

However, even using canonical angles doesn't completely solve the problem. A second type of interpolation problem can occur because of the cyclic nature of rotation angles. Suppose ${h}_{0}=-170{}^{\mathrm{o}}$ and ${h}_{1}=170{}^{\mathrm{o}}$ . Notice that these are canonical values for heading, both in the range $\left(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}\right]$ . The two heading values are only $20{}^{\mathrm{o}}$ apart, but again, naïve interpolation will not behave correctly, rotating the long way around by a clockwise rotation of $340{}^{\mathrm{o}}$ instead of taking the shorter counterclockwise path of $20{}^{\mathrm{o}}$ , as shown in Figure 8.10.

The solution to this second type of problem is to wrap the differences between angles used in the interpolation equation in the range $\left(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}\right]$ in order to find the shortest arc. To do this, we introduce the notation

Wrapping an angle between $\mathbf{±}\mathbf{180}{\mathbf{}}^{\mathbf{o}}$
$\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}\left(x\right)=x-360{}^{\mathrm{o}}⌊\left(x+180{}^{\mathrm{o}}\right)/360{}^{\mathrm{o}}⌋,$

where $⌊\cdot ⌋$ denotes the floor function.

The $\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}$ function is a small, sharp tool that every game programmer should have in their toolbox. It elegantly handles common situations in which we must account for the cyclic nature of angles. It works by adding or subtracting the appropriate multiple of $360{}^{\mathrm{o}}$ . Listing 8.1 shows how it would be implemented in C.

float wrapPi(float theta) {

// Check if already in range.  This is not strictly necessary,
// but it will be a very common situation.  We don't want to
// incur a speed hit and perhaps floating precision loss if
// it's not necessary
if (fabs(theta) <= PI) {

// One revolution is 2 PI.
const float TWOPPI = 2.0f*PI;

// Out of range.  Determine how many "revolutions"
// we need to add.
float revolutions = floor((theta + PI) * (1.0f/TWOPPI));

// Subtract it off
theta -= revolutions*TWOPPI;
}

return theta;
}


Let's go back to Euler angles. As expected, using $\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}\left(\right)$ makes it easy to take the shortest arc when interpolating between two angles:

Taking the shortest arc when interpolating between two angles
$\begin{array}{rl}\mathrm{\Delta }\theta & =\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}\left({\theta }_{1}-{\theta }_{0}\right),\\ {\theta }_{t}& ={\theta }_{0}+t\phantom{\rule{thinmathspace}{0ex}}\mathrm{\Delta }\theta .\end{array}$

But even with these two Band-Aids, Euler angle interpolation still suffers from Gimbal lock, which in many situations causes a jerky, unnatural course. The object whips around suddenly and appears to be hung somewhere. The basic problem is that the angular velocity is not constant during the interpolation. If you have never experienced what Gimbal lock looks like, you may be wondering what all the fuss is about. Unfortunately, it is very difficult to fully appreciate the problem from illustrations in a book—you need to experience it in real time. Fortunately, though, it's easy to find an animation demonstrating the problem: just do a youtube.com search for “gimbal lock.”

The first two problems with Euler angle interpolation were irritating, but certainly not insurmountable. Canonical Euler angles and $\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}$ provide relatively simple workarounds. Gimbal lock, unfortunately, is more than a minor nuisance; it's a fundamental problem. Could we perhaps reformulate our rotations and devise a system that does not suffer from these problems? Unfortunately, this is not possible. There is simply an inherent problem with using three numbers to describe 3D orientation. We could change our problems, but not eliminate them. Any system that parameterizes 3-space orientation using three numbers is guaranteed to have singularities in the parameterization space and therefore be subject to problems such as Gimbal lock. The exponential map form (see Section 8.4), a different scheme for parameterizing 3D rotation with three numbers, manages to consolidate the singularities to a single point: the antipodes. This behavior is more benign for certain practical situations, but it does not remove the singularities completely. To do that, we must use quaternions, which are discussed in Section 8.5.

## 8.3.5Summary of Euler Angles

Let's summarize our findings from Section 8.3 about Euler angles.

• Euler angles store orientation by using three angles. These angles are ordered rotations about the three object-space axes.
• The most common system of Euler angles is the heading-pitch-bank system. Heading and pitch tell which way the object is facing—heading gives a “compass reading” and pitch measures the angle of declination. Bank measures the amount of “twist.”
• In a fixed-axis system, the rotations occur about the upright axes rather than the moving body axes. This system is equivalent to Euler angles, provided that we perform the rotations in the opposite order.
• Lots of smart people use lots of different terms for Euler angles, and they can have good reasons for using different conventions.7 It's best not to rely on terminology when using Euler angles. Always make sure you get a precise working definition, or you're likely to get very confused.
• In most situations, Euler angles are more intuitive for humans to work with compared to other methods of representing orientation.
• When memory is at a premium, Euler angles use the minimum amount of data possible for storing an orientation in 3D, and Euler angles are more easily compressed than quaternions.
• There is no such thing as an invalid set of Euler angles. Any three numbers have a meaningful interpretation.
• Euler angles suffer from aliasing problems due to the cyclic nature of rotation angles and because the rotations are not completely independent of one another.
• Using canonical Euler angles can simplify many basic queries on Euler angles. An Euler angle triple is in the canonical set if heading and bank are in the range $\left(-180{}^{\mathrm{o}},+180{}^{\mathrm{o}}\right]$ and pitch is in the range $\left[-90{}^{\mathrm{o}},+90{}^{\mathrm{o}}\right]$ . What's more, if pitch is $±90{}^{\mathrm{o}}$ , then bank is zero.
• Gimbal lock occurs when pitch is $±90{}^{\mathrm{o}}$ . In this case, one degree of freedom is lost because heading and bank both rotate about the vertical axis.
• Contrary to popular myth, any orientation in 3D can be represented by using Euler angles, and we can agree on a unique representation for that orientation within the canonical set.
• The $\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}\mathrm{P}\mathrm{i}$ function is a very handy tool that simplifies situations in which we have to deal with the cyclic nature of angles. Such situations arise frequently in practice, especially in the context of Euler angles, but at other times as well.
• Simple forms of aliasing are irritating, but there are workarounds. Gimbal lock is a more fundamental problem with no easy solution. Gimbal lock is a problem because the parameter space of orientation has a discontinuity. This means small changes in orientation can result in large changes in the individual angles. Interpolation between orientations using Euler angles can freak out or take a wobbly path.

# 8.4Axis-Angle and Exponential MapRepresentations

Euler's name is attached to all sorts of stuff related to rotation (we just discussed Euler angles in Section 8.3). His name is also attached to Euler's rotation theorem, which basically says that any 3D angular displacement can be accomplished via a single rotation about a carefully chosen axis. To be more precise, given any two orientations ${\mathbf{R}}_{1}$ and ${\mathbf{R}}_{2}$ , there exists an axis $\stackrel{^}{\mathbf{n}}$ such that we can get from ${\mathbf{R}}_{1}$ to ${\mathbf{R}}_{2}$ by performing one rotation about $\stackrel{^}{\mathbf{n}}$ . With Euler angles, we need three rotations to describe any orientation, since we are restricted to rotate about the cardinal axis. However, when we are free to choose the axis of rotation, its possible to find one such that only one rotation is needed. Furthermore, as we will show in this section, except for a few minor details, this axis of rotation is uniquely determined.

Euler's rotation theorem leads to two closely related methods for describing orientation. Let's begin with some notation. Assume we have chosen a rotation angle $\theta$ and an axis of rotation that passes through the origin and is parallel to the unit vector $\stackrel{^}{\mathbf{n}}$ . (In this book, positive rotation is defined according to the left-hand rule; see Section 1.3.3.)

Taking the two values $\stackrel{^}{\mathbf{n}}$ and $\theta$ as is, we have described an angular displacement in the axis-angle form. Alternatively, since $\stackrel{^}{\mathbf{n}}$ has unit length, we can multiply it by $\theta$ without loss of information, yielding the single vector $\mathbf{e}=\theta \stackrel{^}{\mathbf{n}}$ . This scheme for describing rotation goes by the rather intimidating and obscure name of exponential map.8 The rotation angle can be deduced from the length of $\mathbf{e}$ ; in other words, $\theta =\parallel \mathbf{e}\parallel$ , and the axis is obtained by normalizing $\mathbf{e}$ . The exponential map is not only more compact than the axis-angle (three numbers instead of four), it elegantly avoids certain singularities and has better interpolation and differentiation properties.

We're not going to discuss the axis-angle and exponential map forms in quite as much detail as the other methods of representing orientation because in practice their use is a bit specialized. The axis-angle format is primarily a conceptual tool. It's important to understand, but the method gets relatively little direct use compared to the other formats. It's one notable capability is that we can directly obtain an arbitrary multiple of the displacement. For example, given a rotation in axis-angle form, we can obtain a rotation that represents one third of the rotation or 2.65 times the rotation, simply by multiplying $\theta$ by the appropriate amount. Of course, we can do this same operation with the exponential map just as easily. Quaternions can do this through exponentiation, but an inspection of the math reveals that it's really using the axis-angle format under the hood. (Even though quaternions claim to be using the exponential map under the hood!) Quaternions can also do a similar operation using slerp, but in a more roundabout way and without the ability for intermediate results to store rotations beyond 180 degrees. We look at quaternions in Section 8.5.

The exponential map gets more use than the axis-angle. First of all, its interpolation properties are nicer than Euler angles. Although it does have singularities (discussed next), they are not as troublesome as for Euler angles. Usually, when one thinks of interpolating rotations, one immediately thinks of quaternions; however, for some applications, such as storage of animation data, the underappreciated exponential map can be a viable alternative [5]. But the most important and frequent use of the exponential map is to store not angular displacement, but rather angular velocity. This is because the exponential map differentiates nicely (which is somewhat related to its nicer interpolation properties) and can represent multiple rotations easily.

Like Euler angles, the axis-angle and exponential map forms exhibit aliasing and singularities, although of a slightly more restricted and benign manner. There is an obvious singularity at the identity orientation, or the quantity “no angular displacement.” In this case, $\theta =0$ , and our choice of axis is irrelevant—any axis may be used. Notice, however, that the exponential map nicely tucks this singularity away, since multiplication by $\theta$ causes $\mathbf{e}$ to vanish, no matter what rotation axis $\stackrel{^}{\mathbf{n}}$ is chosen. Another trivial form of aliasing in axis-angle space can be produced by negating both $\theta$ and $\stackrel{^}{\mathbf{n}}$ . However, the exponential map dodges this issue as well, since negating both $\theta$ and $\stackrel{^}{\mathbf{n}}$ leaves $\mathbf{e}=\theta \stackrel{^}{\mathbf{n}}$ unchanged!

The other aliases cannot be dispatched so easily. As with Euler angles, adding a multiple of $360{}^{\mathrm{o}}$ to $\theta$ produces an angular displacement that results in the same ending orientation, and this form of aliasing affects both the axis-angle and exponential map. However, this is not always a shortcoming—for describing angular velocity, this ability to represent such “extra” rotation is an important and useful property. For example, it's quite important to be able to distinguish between rotation about the $x$ -axis at a rate of $720{}^{\mathrm{o}}$ per second versus rotation about the same axis at a rate of $1080{}^{\mathrm{o}}$ per second, even though these displacements result in the same ending orientation if applied for an integral number of seconds. It is not possible to capture this distinction in quaternion format.

As it turns out, given any angular displacement that can be described by a rotation matrix, the exponential map representation is uniquely determined. Although more than one exponential map may produce the same rotation matrix, it is possible to take a subset of the exponential maps (those for which $\parallel \mathbf{e}\parallel <2\pi$ ) and form a one-to-one correspondence with the rotation matrices. This is the essence of Euler's rotation theorem.

Now let's consider concatenating multiple rotations. Let's say ${\mathbf{e}}_{1}$ and ${\mathbf{e}}_{2}$ are two rotations in exponential map format. The result of performing the rotations in sequence, for example, ${\mathbf{e}}_{1}$ and then ${\mathbf{e}}_{2}$ , is not the same as performing a single rotation ${\mathbf{e}}_{1}+{\mathbf{e}}_{2}$ . We know this cannot be true, because ordinary vector addition is commutative, but three-space rotations are not. Assume that ${\mathbf{e}}_{1}=\left[90{}^{\mathrm{o}},0,0\right]$ , and ${\mathbf{e}}_{2}=\left[0,90{}^{\mathrm{o}},0\right]$ . With our conventions, this is a $90{}^{\mathrm{o}}$ downward pitch rotation, and a $90{}^{\mathrm{o}}$ heading rotation to the east. Performing ${\mathbf{e}}_{1}$ followed by ${\mathbf{e}}_{2}$ , we would end up looking downward with our head pointing east, but doing them in the opposite order, we end up “on our ear” facing east. But what if the angles were much smaller, say $2{}^{\mathrm{o}}$ instead of $90{}^{\mathrm{o}}$ ? Now the ending rotations are more similar. As we take the magnitude of the rotation angles down, the importance of the order decreases, and at the extreme, for “infinitesimal” rotations, the order is completely irrelevant. In other words, for infinitesimal rotations, exponential maps can be added vectorially. Infinitesimals are important topics from calculus, and they are at the heart of defining rate of change. We look at these topics in Chapter 11, but for now, the basic idea is that exponential maps do not add vectorially when used to define an amount of rotation (an angular displacement or an orientation), but they do properly add vectorially when they describe a rate of rotation. This is why exponential maps are perfectly suited for describing angular velocity.

Before we leave this topic, a regretful word of warning regarding terminology. Alternative names for these two simple concepts abound. We have tried to choose the most standard names possible, but it was difficult to find strong consensus. Some authors use the term “axis-angle” to describe both of these (closely related) methods and don't really distinguish between them. Even more confusing is the use of the term “Euler axis” to refer to either form (but not to Euler angles!). “Rotation vector” is another term you might see attached to what we are calling exponential map. Finally, the term “exponential map,” in the broader context of Lie algebra, from whence the term originates, actually refers to an operation (a “map”) rather than a quantity. We apologize for the confusion, but it's not ourfault.

# 8.5Quaternions

The term quaternion is somewhat of a buzzword in 3D math. Quaternions carry a certain mystique—which is a euphemismistic way of saying that many people find quaternions complicated and confusing. We think the way quaternions are presented in most texts contributes to their confusion, and we hope that our slightly different approach will help dispel quaternions' “mystique.”

There is a mathematical reason why using only three numbers to represent a 3-space orientation is guaranteed to cause the problems we discussed with Euler angles, such as Gimbal lock. It has something to do with some fairly advanced9 math terms such as “manifolds.” A quaternion avoids these problems by using four numbers to express an orientation (hence the name quaternion).

This section describes how to use a quaternion to define an angular displacement. We're going to deviate somewhat from the traditional presentation, which emphasizes the interesting (but, in our opinion, nonessential) interpretation of quaternions as complex numbers. Instead, we will be developing quaternions from a primarily geometric perspective. Here's what's in store: First, Section 8.5.1 introduces some basic notation. Section 8.5.2 is probably the most important section—it explains how a quaternion may be interpreted geometrically. Sections 8.5.3 through Section 8.5.11 review the basic quaternion properties and operations, examining each from a geometric perspective. Section 8.5.12 discusses the important slerp operation, which is used to interpolate between two quaternions and is one of the primary advantages of quaternions. Section 8.5.13 discusses the advantages and disadvantages of quaternions. Section 8.5.14 is an optional digression into how quaternions may be interpreted as 4D complex numbers. Section 8.5.15 summaries the properties of quaternions.

Don't be scared off by what seems like a lot of hairy math in this section. The most important things to remember about quaternions are the high-level concepts that are summarized in Section 8.5.15. The nuts and bolts of quaternions are given here to show that everything about quaternions can be derived, and you don't have to take our word on faith. A detailed understanding of quaternions is not really needed in order to use them,9 but you need to understand what quaternions can do.

## 8.5.1Quaternion Notation

A quaternion contains a scalar component and a 3D vector component. We usually refer to the scalar component as $w$ . We may refer to the vector component as a single entity $\mathbf{v}$ or as individual components $x$ , $y$ , and $z$ . Here are examples of both notations:

Two types of quaternion notation
$\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right],\phantom{\rule{.5in}{0ex}}\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right].$

In some cases it will be convenient to use the shorter notation, using $\mathbf{v}$ , and in some cases the “expanded” version is clearer. This chapter presents most equations in both forms.

We also may write expanded quaternions vertically:

$\left[\begin{array}{c}w\\ \left(\begin{array}{c}x\\ y\\ z\end{array}\right)\end{array}\right].$

Unlike regular vectors, there is no significant distinction between “row” and “column” quaternions. We are free to make the choice strictly for aesthetic purposes.

We denote quaternion variables with the same typeface conventions used for vectors: lowercase letters in bold (e.g., $\mathbf{q}$ ). When vectors and quaternions appear together, the context (and the letters chosen for the variables!) usually make clear which are which.

## 8.5.2What Do Those Four Numbers Mean?

The quaternion form is closely related to the axis-angle and exponential map forms from Section 8.4. Let's briefly review the notation from that section, as the same notation will be used here. The unit vector $\stackrel{^}{\mathbf{n}}$ defines an axis of rotation, and the scalar $\theta$ is the amount of rotation about this axis. Thus, the pair $\left(\theta ,\stackrel{^}{\mathbf{n}}\right)$ define an angular displacement using the axis-angle system. You need a left or right hand11 to determine which way is positive rotation.

A quaternion also contains an axis and angle, but $\stackrel{^}{\mathbf{n}}$ and $\theta$ aren't simply stored in the four numbers of the quaternion directly, as they are in axis angle (that would be too easy!). Instead, they are encoded in a way that at first might seem weird, but turns out to be highly practical. Equation (8.2) shows how the values of a quaternion are related to $\theta$ and $\stackrel{^}{\mathbf{n}}$ , using both forms of quaternion notation:

Geometric meaning of the four values of a quaternion
$\begin{array}{rl}\text{(8.2)}& \left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]& =\left[\begin{array}{cc}\mathrm{cos}\left(\theta /2\right)& \mathrm{sin}\left(\theta /2\right)\stackrel{^}{\mathbf{n}}\end{array}\right],\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]& =\left[\begin{array}{cc}\mathrm{cos}\left(\theta /2\right)& \left(\begin{array}{ccc}\mathrm{sin}\left(\theta /2\right){n}_{x}& \mathrm{sin}\left(\theta /2\right){n}_{y}& \mathrm{sin}\left(\theta /2\right){n}_{z}\end{array}\right)\end{array}\right].\end{array}$

Keep in mind that $w$ is related to $\theta$ , but they are not the same thing. Likewise, $\mathbf{v}$ and $\stackrel{^}{\mathbf{n}}$ are related, but not identical.

The next several sections discuss a number of quaternion operations from mathematical and geometric perspectives.

## 8.5.3Quaternion Negation

Quaternions can be negated. This is done in the obvious way of negating each component:

Quaternion negation
$\begin{array}{r}\begin{array}{rl}-\mathbf{q}& =-\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]=\left[\begin{array}{cc}-w& \left(\begin{array}{ccc}-x& -y& -z\end{array}\right)\end{array}\right]\\ & =-\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]=\left[\begin{array}{cc}-w& \mathbf{-}\mathbf{v}\end{array}\right].\end{array}\end{array}$

The surprising fact about negating a quaternion is that it really doesn't do anything, at least in the context of angular displacement.

The quaternions $\mathbf{q}$ and $-\mathbf{q}$ describe the same angular displacement. Any angular displacement in 3D has exactly two distinct representations in quaternion format, and they are negatives of each other.

It's not too difficult to see why this is true. If we add $360{}^{\mathrm{o}}$ to $\theta$ , it doesn't change the angular displacement represented by $\mathbf{q}$ , but it negates all four components of $\mathbf{q}$ .

## 8.5.4Identity Quaternion(s)

Geometrically, there are two “identity” quaternions that represent “no angular displacement.” They are

Identity quaternions
$\left[\begin{array}{cc}1& \mathbf{0}\end{array}\right]\phantom{\rule{.25in}{0ex}}\text{and}\phantom{\rule{.25in}{0ex}}\left[\begin{array}{cc}-1& \mathbf{0}\end{array}\right].$

(Note the boldface zero, which indicates the zero vector.) When $\theta$ is an even multiple of 360°, then $\mathrm{cos}\left(\theta /2\right)=1$ , and we have the first form. If $\theta$ is an odd multiple of 360°, then $\mathrm{cos}\left(\theta /2\right)=-1$ , and we have the second form. In both cases, $\mathrm{sin}\left(\theta /2\right)$ = 0, so the value of $\stackrel{^}{\mathbf{n}}$ is irrelevant. This makes intuitive sense; if the rotation angle $\theta$ is a whole number of complete revolutions about any axis, then no real change is made to the orientation.

Algebraically, there is really only one identity quaternion: $\left[1,\mathbf{0}\right]$ . When we multiply any quaternion $\mathbf{q}$ by the identity quaternion, the result is $\mathbf{q}$ . (We present quaternion multiplication in Section 8.5.7.) When we multiply a quaternion $\mathbf{q}$ by the other “geometric identity” quaternion $\left[-1,\mathbf{0}\right]$ , we get $-\mathbf{q}$ . Geometrically, this results in the same quaternion, since $\mathbf{q}$ and $-\mathbf{q}$ represent the same angular displacement. Mathematically, however, $\mathbf{q}$ and $-\mathbf{q}$ are not equal, so $\left[-1,\mathbf{0}\right]$ is not a “true” identity quaternion.

## 8.5.5Quaternion Magnitude

We can compute the magnitude of a quaternion, just as we can for vectors and complex numbers. The notation and formula shown in Equation (8.4) are similar to those used for vectors:

Quaternion magnitude
$\begin{array}{}\text{(8.4)}& \begin{array}{rl}\parallel \mathbf{q}\parallel & =\parallel \left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]\parallel =\sqrt{{w}^{2}+{x}^{2}+{y}^{2}+{z}^{2}}\\ & =\parallel \left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]\parallel =\sqrt{{w}^{2}+{\parallel \mathbf{v}\parallel }^{2}}.\end{array}\end{array}$

Let's see what this means geometrically for a rotation quaternion:

Rotation quaternions have unit magnitude

This is an important observation.

For our purposes of using quaternions to represent orientation, all quaternions are so-called unit quaternions, which have a magnitude equal to unity.

For information concerning nonnormalized quaternions, see the technical report by Dam et al. [2].

## 8.5.6Quaternion Conjugate and Inverse

The conjugate of a quaternion, denoted ${\mathbf{q}}^{\ast }$ , is obtained by negating the vector portion of the quaternion:

Quaternion conjugate
$\begin{array}{rl}\text{(8.5)}& {\mathbf{q}}^{\ast }& ={\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]}^{\ast }=\left[\begin{array}{cc}w& -\mathbf{v}\end{array}\right]& ={\left[\begin{array}{cc}w& \left(\begin{array}{ccc}x& y& z\end{array}\right)\end{array}\right]}^{\ast }=\left[\begin{array}{cc}w& \left(\begin{array}{ccc}-x& -y& -z\end{array}\right)\end{array}\right].\end{array}$

The term “conjugate” is inherited from the interpretation of a quaternion as a complex number. We look at this interpretation in more detail in Section 8.5.14.

The inverse of a quaternion, denoted ${\mathbf{q}}^{-1}$ , is defined as the conjugate of a quaternion divided by its magnitude:

Quaternion inverse
$\begin{array}{}\text{(8.6)}& {\mathbf{q}}^{-1}=\frac{{\mathbf{q}}^{\ast }}{\parallel \mathbf{q}\parallel }.\end{array}$

The quaternion inverse has an interesting correspondence with the multiplicative inverse for real numbers (scalars). For real numbers, the multiplicative inverse ${a}^{-1}$ is $1/a$ . In other words, $a\left({a}^{-1}\right)={a}^{-1}a=1$ . The same applies to quaternions. When we multiply a quaternion $\mathbf{q}$ by its inverse ${\mathbf{q}}^{-1}$ , we get the identity quaternion $\left[1,\mathbf{0}\right]$ . (We discuss quaternion multiplication in Section 8.5.7.)

Equation (8.6) is the official definition of quaternion inverse. However, if you are interested only in quaternions that represent pure rotations, like we are in this book, then all the quaternions are unit quaternions and so the conjugate and inverse are equivalent.

The conjugate (inverse) is interesting because $\mathbf{q}$ and ${\mathbf{q}}^{\ast }$ represent opposite angular displacements. It is easy to see why this is the case. By negating $\mathbf{v}$ , we are negating the axis of rotation $\stackrel{^}{\mathbf{n}}$ . This doesn't change the axis in the physical sense, since $\stackrel{^}{\mathbf{n}}$ and $-\stackrel{^}{\mathbf{n}}$ are parallel. However, it does flip the direction that we consider to be positive rotation. Thus, $\mathbf{q}$ rotates about an axis by an amount $\theta$ , and ${\mathbf{q}}^{\ast }$ rotates in the opposite direction by the same amount.

For our purposes, an alternative definition of quaternion conjugate could have been to negate $w$ , leaving $\mathbf{v}$ (and thus $\stackrel{^}{\mathbf{n}}$ ) unchanged. This would negate the amount of rotation $\theta$ , rather than reversing what is considered positive rotation by flipping the axis of rotation. This would have been equivalent to the definition given in Equation (8.5) (for our geometric purposes, at least) and provided for a slightly more intuitive geometric interpretation. However, the term conjugate has a special significance in the context of complex numbers, so let's stick with the original definition.

## 8.5.7Quaternion Multiplication

Quaternions can be multiplied. The result is similar to the cross product for vectors, in that it yields another quaternion (not a scalar), and it is not commutative. However, the notation is different: we denote quaternion multiplication simply by placing the two operands side-by-side. The formula for quaternion multiplication can be easily derived based upon the definition of quaternions as complex numbers (see Exercise 6), but we state it here without development, using both quaternion notations:

Quaternion product
$\begin{array}{rl}{\mathbf{q}}_{1}{\mathbf{q}}_{2}& =\left[\begin{array}{cc}{w}_{1}& \left(\begin{array}{ccc}{x}_{1}& {y}_{1}& {z}_{1}\end{array}\right)\end{array}\right]\left[\begin{array}{cc}{w}_{2}& \left(\begin{array}{ccc}{x}_{2}& {y}_{2}& {z}_{2}\end{array}\right)\end{array}\right]\\ \text{(8.5.7)}& & =\left[\begin{array}{c}{w}_{1}{w}_{2}-{x}_{1}{x}_{2}-{y}_{1}{y}_{2}-{z}_{1}{z}_{2}\\ \left(\begin{array}{c}{w}_{1}{x}_{2}+{x}_{1}{w}_{2}+{y}_{1}{z}_{2}-{z}_{1}{y}_{2}\\ {w}_{1}{y}_{2}+{y}_{1}{w}_{2}+{z}_{1}{x}_{2}-{x}_{1}{z}_{2}\\ {w}_{1}{z}_{2}+{z}_{1}{w}_{2}+{x}_{1}{y}_{2}-{y}_{1}{x}_{2}\end{array}\right)\end{array}\right]& =\left[\begin{array}{cc}{w}_{1}& {\mathbf{v}}_{1}\end{array}\right]\left[\begin{array}{cc}{w}_{2}& {\mathbf{v}}_{2}\end{array}\right]\\ & =\left[\begin{array}{cc}{w}_{1}{w}_{2}-{\mathbf{v}}_{1}\cdot {\mathbf{v}}_{2}& {w}_{1}{\mathbf{v}}_{2}+{w}_{2}{\mathbf{v}}_{1}+{\mathbf{v}}_{1}×{\mathbf{v}}_{2}\end{array}\right].\end{array}$

The quaternion product is also known as the Hamilton product; you'll understand why after reading about the history of quaternions in Section 8.5.14.

Let's quickly mention three properties of quaternion multiplication, all of which can be easily shown by using the definition given above. First, quaternion multiplication is associative, but not commutative:

Quaternion multiplication is associative, but not commutative
$\begin{array}{rl}\left(\mathbf{a}\mathbf{b}\right)\mathbf{c}& =\mathbf{a}\left(\mathbf{b}\mathbf{c}\right),\\ \mathbf{a}\mathbf{b}& \ne \mathbf{b}\mathbf{a}.\end{array}$

Second, the magnitude of a quaternion product is equal to the product of the magnitudes (see Exercise 9):

Magnitude of quaternion product
$\parallel {\mathbf{q}}_{1}{\mathbf{q}}_{2}\parallel =\parallel {\mathbf{q}}_{1}\parallel \parallel {\mathbf{q}}_{2}\parallel .$

This is very significant because it guarantees us that when we multiply two unit quaternions, the result is a unit quaternion.

Finally, the inverse of a quaternion product is equal to the product of the inverses taken in reverse order:

Inverse of quaternion product
$\begin{array}{rl}\left(\mathbf{a}\mathbf{b}{\right)}^{-1}& ={\mathbf{b}}^{-1}{\mathbf{a}}^{-1},\\ \left({\mathbf{q}}_{1}{\mathbf{q}}_{2}\cdots {\mathbf{q}}_{n-1}{\mathbf{q}}_{n}{\right)}^{-1}& ={{\mathbf{q}}_{n}}^{-1}{{\mathbf{q}}_{n-1}}^{-1}\cdots {{\mathbf{q}}_{2}}^{-1}{{\mathbf{q}}_{1}}^{-1}.\end{array}$

Now that we know some basic properties of quaternion multiplication, let's talk about why the operation is actually useful. Let us “extend” a standard 3D point $\left(x,y,z\right)$ into quaternion space by defining the quaternion $\mathbf{p}=\left[0,\left(x,y,z\right)\right]$ . In general, $\mathbf{p}$ is not a valid rotation quaternion, since it can have any magnitude. Let $\mathbf{q}$ be a rotation quaternion in the form we have been discussing, $\left[\mathrm{cos}\theta /2,\stackrel{^}{\mathbf{n}}\mathrm{sin}\theta /2\right]$ , where $\stackrel{^}{\mathbf{n}}$ is a unit vector axis of rotation, and $\theta$ is the rotation angle. It is surprising to realize that we can rotate the 3D point $\mathbf{p}$ about $\stackrel{^}{\mathbf{n}}$ by performing the rather odd-looking

quaternion multiplication

Using quaternion multiplication to rotate a 3D vector
$\begin{array}{}\text{(8.7)}& {\mathbf{p}}^{\prime }=\mathbf{q}\mathbf{p}{\mathbf{q}}^{-1}.\end{array}$

We could prove this by expanding the multiplication, substituting in $\stackrel{^}{\mathbf{n}}$ and $\theta$ , and comparing the result to the matrix we derived to rotate about an arbitrary axis (Equation (5.1.3)), and indeed this is the approach taken in most texts on quaternions. While this certainly is an effective way to verify that the trick works, it leaves us wondering how the heck somebody could have ever stumbled upon it in the first place. In Section 8.7.3, we derive the conversion from quaternion to matrix form in a straightforward way, solely from the geometry of the rotations and without referring to $\mathbf{q}\mathbf{p}{\mathbf{q}}^{-1}$ . As for how the association was discovered, we cannot say for sure, but we will offer a train of thought that can lead a person to discover the connection between this strange product and rotations in Section 8.5.14. This discussion also explains how a person might have discovered that it would be fruitful to use half of the rotation angle for the components.

As it turns out, the correspondence between quaternion multiplication and 3D vector rotations is more of a theoretical interest than a practical one. Some people (“quaternio-philes?”) like to attribute quaternions with the useful property that vector rotations are immediately accessible by using Equation (8.7). To the quaternion lovers, we admit that this compact notation is an advantage of sorts, but its practical benefit in computations is dubious. If you actually work through this math, you will find that it is just about the same number of operations involved as converting the quaternion to the equivalent rotation matrix (by using Equation (8.20), which is developed in Section 8.7.3) and then multiplying the vector by this matrix. Because of this, we don't consider quaternions to possess any direct ability to rotate vectors, at least for practical purposes in acomputer.

Although the correspondence between $\mathbf{q}\mathbf{p}{\mathbf{q}}^{-1}$ and rotation is not of direct practical importance, it is of supreme theoretical importance. It leads us to a slightly different use of quaternion multiplication, and this use is highly practical in programming. Examine what happens when multiple rotations are applied to a vector. We'll rotate the vector $\mathbf{p}$ by the quaternion $\mathbf{a}$ , and then rotate that result by another quaternion $\mathbf{b}$ :

Concatenating multiple rotations with quaternion algebra
$\begin{array}{rl}{\mathbf{p}}^{\prime }& =\mathbf{b}\left(\mathbf{a}\mathbf{p}{\mathbf{a}}^{-1}\right){\mathbf{b}}^{-1}\\ & =\left(\mathbf{b}\mathbf{a}\right)\mathbf{p}\left({\mathbf{a}}^{-1}{\mathbf{b}}^{-1}\right)\\ & =\left(\mathbf{b}\mathbf{a}\right)\mathbf{p}\left(\mathbf{b}\mathbf{a}{\right)}^{-1}.\end{array}$

Notice that rotating by $\mathbf{a}$ and then by $\mathbf{b}$ is equivalent to performing a single rotation by the quaternion product $\mathbf{b}\mathbf{a}$ . This is a key observation.

Quaternion multiplication can be used to concatenate multiple rotations, just like matrix multiplication.

We say “just like matrix multiplication,” but in fact there is a slightly irritating difference. With matrix multiplication, our preference to use row vectors puts the vectors on the left, resulting in the nice property that concatenated rotations read left-to-right in the order of transformation. With quaternions, we don't have this flexibility: concatenation of multiple rotations will always read “inside out” from right to left.12

## 8.5.8Quaternion “Difference”

Using the quaternion multiplication and inverse, we can compute the difference between two quaternions, with “difference” meaning the angular displacement from one orientation to another. In other words, given orientations $\mathbf{a}$ and $\mathbf{b}$ , we can compute the angular displacement $\mathbf{d}$ that rotates from $\mathbf{a}$ to $\mathbf{b}$ . This can be expressed compactly as

$\mathbf{d}\mathbf{a}=\mathbf{b}.$

(Remember that quaternion multiplication performs the rotations from right-to-left.)

Let's solve for $\mathbf{d}$ . If the variables in the equation represented scalars, we could simply divide by $\mathbf{a}$ . However, we can't divide quaternions; we can only multiply them. Perhaps multiplication by the inverse will achieve the desired effect? Multiplying both sides by ${\mathbf{a}}^{-1}$ on the right (we have to be careful since quaternion multiplication is not commutative) gives us

The quaternion “difference”
$\begin{array}{rl}\left(\mathbf{d}\mathbf{a}\right){\mathbf{a}}^{-1}& =\mathbf{b}{\mathbf{a}}^{-1},\\ \mathbf{d}\left(\mathbf{a}{\mathbf{a}}^{-1}\right)& =\mathbf{b}{\mathbf{a}}^{-1},\\ \mathbf{d}\phantom{\rule{thinmathspace}{0ex}}\left[\begin{array}{cc}1& \mathbf{0}\end{array}\right]& =\mathbf{b}{\mathbf{a}}^{-1},\\ \mathbf{d}& =\mathbf{b}{\mathbf{a}}^{-1}.\end{array}$

Now we have a way to generate a quaternion that represents the angular displacement from one orientation to another. We use this in Section 8.5.12, when we explore slerp.

Mathematically, the angular difference between two quaternions is actually more similar to a division than a true difference (subtraction).

## 8.5.9Quaternion Dot Product

The dot product operation is defined for quaternions. The notation and definition for this operation is very similar to the vector dot product:

Quaternion dot product
$\begin{array}{r}\begin{array}{rl}{\mathbf{q}}_{1}\cdot {\mathbf{q}}_{2}& =\left[\begin{array}{cc}{w}_{1}& {\mathbf{v}}_{1}\end{array}\right]\cdot \left[\begin{array}{cc}{w}_{2}& {\mathbf{v}}_{2}\end{array}\right]\\ & ={w}_{1}{w}_{2}+{\mathbf{v}}_{1}\cdot {\mathbf{v}}_{2}\\ & =\left[\begin{array}{cc}{w}_{1}& \left(\begin{array}{ccc}{x}_{1}& {y}_{1}& {z}_{1}\end{array}\right)\end{array}\right]\cdot \left[\begin{array}{cc}{w}_{2}& \left(\begin{array}{ccc}{x}_{2}& {y}_{2}& {z}_{2}\end{array}\right)\end{array}\right]\\ & ={w}_{1}{w}_{2}+{x}_{1}{x}_{2}+{y}_{1}{y}_{2}+{z}_{1}{z}_{2}.\end{array}\end{array}$

Like the vector dot product, the result is a scalar. For unit quaternions $\mathbf{a}$ and $\mathbf{b}$ , $-1\le \mathbf{a}\cdot \mathbf{b}\le 1$ .

The dot product is perhaps not one of the most frequently used quaternion operators, at least in video game programming, but it does have an interesting geometric interpretation. In Section 8.5.8, we considered the difference quaternion $\mathbf{d}=\mathbf{b}{\mathbf{a}}^{\ast }$ , which describes the angular displacement from orientation $\mathbf{a}$ to orientation $\mathbf{b}$ . (We assume unit quaternions and replace the quaternion inverse with the conjugate.) If we expand the product and examine the contents of $\mathbf{d}$ , we find that the $w$ component is equal to the dot product $\mathbf{a}\cdot \mathbf{b}$ !

What does this mean geometrically? Remember Euler's rotation theorem: we can rotate from the orientation $\mathbf{a}$ into the orientation $\mathbf{b}$ via a single rotation about a carefully chosen axis. This uniquely determined (up to a reversal of sign) axis and angle are precisely the ones encoded in $\mathbf{d}$ . Remembering the relationship between the $w$ component and the rotation angle $\theta$ , we see that $\mathbf{a}\cdot \mathbf{b}=\mathrm{cos}\left(\theta /2\right)$ , where $\theta$ is the amount of rotation needed to go from the orientation $\mathbf{a}$ to the orientation $\mathbf{b}$ .

In summary, the quaternion dot product has an interpretation similar to the vector dot product. The larger the absolute value of the quaternion dot product $\mathbf{a}\cdot \mathbf{b}$ , the more “similar” are the angular displacements represented by $\mathbf{a}$ and $\mathbf{b}$ . While the vector dot product gives the cosine of the angle between vectors, the quaternion dot product gives the cosine of half of the angle needed to rotate one quaternion into the other. For the purpose of measuring similarity, usually we are interested only in the absolute value of $\mathbf{a}\cdot \mathbf{b}$ , since $\mathbf{a}\cdot \mathbf{b}=-\left(\mathbf{a}\cdot -\mathbf{b}\right)$ , even though $\mathbf{b}$ and $-\mathbf{b}$ represent the same angular displacement.

Although direct use of the dot product is infrequent in most video game code, the dot product is the first step in the calculation of the slerp function, which we discuss in Section 8.5.12.

## 8.5.10Quaternion log, exp, and Multiplication by a Scalar

This section discusses three operations on quaternions that, although they are seldom used directly, are the basis for several important quaternion operations. These operations are the quaternion logarithm, exponential, and multiplication by a scalar.

First, let us reformulate our definition of a quaternion by introducing a variable $\alpha$ to equal the half-angle, $\theta /2$ :

Defining a quaternion in terms of the half-angle $\mathbit{\alpha }$
$\begin{array}{rlrl}\alpha & =\theta /2,& \mathbf{q}& =\left[\begin{array}{cc}\mathrm{cos}\alpha & \stackrel{^}{\mathbf{n}}\mathrm{sin}\alpha \end{array}\right].\end{array}$

The logarithm of $\mathbf{q}$ is defined as

The logarithm of a quaternion
$\begin{array}{rl}\mathrm{log}\mathbf{q}& =\mathrm{log}\left(\left[\begin{array}{cc}\mathrm{cos}\alpha & \stackrel{^}{\mathbf{n}}\mathrm{sin}\alpha \end{array}\right]\right)\equiv \left[\begin{array}{cc}0& \alpha \stackrel{^}{\mathbf{n}}\end{array}\right].\end{array}$

We use the notation $\equiv$ to mean equal by definition. In general, $\mathrm{log}\mathbf{q}$ is not a unit quaternion. Note the similarity between taking the logarithm of a quaternion, and the exponential map format (see Section 8.4).

The exponential function is defined in the exact opposite manner. First we define the quaternion $\mathbf{p}$ to be of the form $\left[0,\alpha \stackrel{^}{\mathbf{n}}\right]$ , with $\stackrel{^}{\mathbf{n}}$ a unit vector:

$\begin{array}{rlrl}\mathbf{p}& =\left[\begin{array}{cc}0& \alpha \stackrel{^}{\mathbf{n}}\end{array}\right],& & \left(\parallel \stackrel{^}{\mathbf{n}}\parallel =1\right).\end{array}$

Then the exponential function is defined as

The exponential function of a quaternion
$\begin{array}{rl}\mathrm{exp}\mathbf{p}& =\mathrm{exp}\left(\left[\begin{array}{cc}0& \alpha \stackrel{^}{\mathbf{n}}\end{array}\right]\right)\equiv \left[\begin{array}{cc}\mathrm{cos}\alpha & \stackrel{^}{\mathbf{n}}\mathrm{sin}\alpha \end{array}\right].\end{array}$

Note that, by definition, $\mathrm{exp}\mathbf{p}$ always returns a unit quaternion.

The quaternion logarithm and exponential are related to their scalar analogs. For any scalar $a$ ,

${e}^{\mathrm{ln}a}=a.$

In the same way, the quaternion exp function is defined to be the inverse of the quaternion log function:

$\mathrm{exp}\left(\phantom{\rule{thinmathspace}{0ex}}\mathrm{log}\mathbf{q}\phantom{\rule{thinmathspace}{0ex}}\right)=\mathbf{q}.$

Finally, quaternions can be multiplied by a scalar, with the result computed in the obvious way of multiplying each component by the scalar. Given a scalar $k$ and a quaternion $\mathbf{q}$ ,

Multiplying a quaternion by a scalar
$\begin{array}{rl}k\mathbf{q}& =k\left[\begin{array}{cc}w& \mathbf{v}\end{array}\right]=\left[\begin{array}{cc}kw& k\mathbf{v}\end{array}\right].\end{array}$

This will not usually result in a unit quaternion, which is why multiplication by a scalar is not a very useful operation in the context of representing angular displacement. (But we will find a use for it in Section 8.5.11.)

## 8.5.11Quaternion Exponentiation

Quaternions can be exponentiated, which means that we can raise a quaternion to a scalar power. Quaternion exponentiation, denoted ${\mathbf{q}}^{t}$ , should not be confused with the exponential function $\mathrm{exp}\mathbf{q}$ . The exponential function accepts only one argument: a quaternion. Quaternion exponentiation has two arguments: the quaternion $\mathbf{q}$ and the scalar exponent $t$ .

The meaning of quaternion exponentiation is similar to that of real numbers. Recall that for any scalar $a$ , besides zero, ${a}^{0}=1$ and ${a}^{1}=a$ . As the exponent $t$ varies from $0$ to $1$ the value of ${a}^{t}$ varies from $1$ to $a$ . A similar statement holds for quaternion exponentiation: as $t$ varies from $0$ to $1$ the quaternion exponentiation ${\mathbf{q}}^{t}$ varies from $\left[1,\mathbf{0}\right]$ to $\mathbf{q}$ .

Quaternion exponentiation is useful because it allows us to extract a “fraction” of an angular displacement. For example, to compute a quaternion that represents one third of the angular displacement represented by the quaternion $\mathbf{q}$ , we would compute ${\mathbf{q}}^{1/3}$ .

Exponents outside the $\left[0,1\right]$ range behave mostly as expected—with one major caveat. For example, ${\mathbf{q}}^{2}$ represents twice the angular displacement as $\mathbf{q}$ . If $\mathbf{q}$ represents a clockwise rotation of 30° about the $x$ -axis, then ${\mathbf{q}}^{2}$ represents a clockwise rotation of 60° about the $x$ -axis, and ${\mathbf{q}}^{-1/3}$ represents a counterclockwise rotation of 10° about the $x$ -axis. Notice in particular that the inverse notation ${\mathbf{q}}^{-1}$ can also be interpreted in this context and the result is the same: the quaternion that performs the opposite rotation.

The caveat we mentioned is this: a quaternion represents angular displacements using the shortest arc. Multiple spins cannot be represented. Continuing our example above, ${\mathbf{q}}^{8}$ is not a 240° clockwise rotation about the $x$ -axis as expected; it is a 120°\ counterclockwise rotation. Of course, rotating 240° in one direction produces the same end result as rotating 120° in the opposite direction, and this is the point: quaternions really capture only the end result. In general, many of the algebraic identities concerning exponentiation of scalars, such as $\left({a}^{s}{\right)}^{t}={a}^{st}$ , do not apply to quaternions.

In some situations, we do care about the total amount of rotation, not just the end result. (The most important example is that of angular velocity.) In these situations, quaternions are not the correct tool for the job; use the exponential map (or its cousin, the axis-angle format) instead.

Now that we understand what quaternion exponentiation is used for, let's see how it is mathematically defined. Quaternion exponentiation is defined in terms of the “utility” operations we learned in the previous section. The definition is given by

Raising a quaternion to a power
$\begin{array}{}\text{(8.8)}& {\mathbf{q}}^{t}=\mathrm{exp}\left(t\mathrm{log}\mathbf{q}\right).\end{array}$

Notice that a similar statement is true regarding exponentiation of a scalar:

${a}^{t}={e}^{\left(t\mathrm{ln}a\right)}.$

It is not too difficult to understand why ${\mathbf{q}}^{t}$ interpolates from identity to $\mathbf{q}$ as $t$ varies from $0$ to $1$ . Notice that the $\mathrm{log}$ operation essentially converts the quaternion to exponential map format (except for a factor of 2). Then, when we perform the scalar multiplication by the exponent $t$ , the effect is to multiply the angle by $t$ . Finally, the $\mathrm{exp}$ “undoes” what the $\mathrm{log}$ operation did, recalculating the new $w$ and $\mathbf{v}$ from the exponential vector. At least this is how it works academically in an equation. Although Equation (8.8) is the official mathematical definition and works elegantly in theory, direct translation into code is more complicated than necessary. Listing 8.2 shows how we could compute the value of ${\mathbf{q}}^{t}$ in C. Essentially, instead of working with a single exponential-map-like quantity as the formula tells us to, we break out the axis and half-angle separately.

// Quaternion (input and output)
float w,x,y,z;

// Input exponent
float exponent;

// Check for the case of an identity quaternion.
// This will protect against divide by zero
if (fabs(w) < .9999f) {

// Extract the half angle alpha (alpha = theta/2)
float alpha = acos(w);

// Compute new alpha value
float newAlpha = alpha * exponent;

// Compute new w value
w = cos(newAlpha);

// Compute new xyz values
float mult = sin(newAlpha) / sin(alpha);
x *= mult;
y *= mult;
z *= mult;
}


There are a few points to notice about this code. First, the check for the identity quaternion is necessary since a value of $w=±1$ would cause the computation of mult to divide by zero. Raising an identity quaternion to any power results in the identity quaternion, so if we detect an identity quaternion on input, we simply ignore the exponent and return the original quaternion.

Second, when we compute alpha, we use the $\mathrm{arccos}$ function, which always returns a positive angle. This does not create a loss of generality. Any quaternion can be interpreted as having a positive angle of rotation, since negative rotation about an axis is the same as positive rotation about the axis pointing in the opposite direction.

## 8.5.12Quaternion Interpolation, a.k.a. Slerp

The raison d'être of quaternions in games and graphics today is an operation known as slerp, which stands for Spherical Linear interpolation. The slerp operation is useful because it allows us to smoothly interpolate between two orientations. Slerp avoids all the problems that plagued interpolation of Euler angles (see Section 8.3.4).

Slerp is a ternary operator, meaning it accepts three operands. The first two operands to slerp are the two quaternions between which we wish to interpolate. We'll assign these starting and ending orientations to the variables ${\mathbf{q}}_{0}$ and ${\mathbf{q}}_{1}$ , respectively. The interpolation parameter will be assigned to the variable $t$ , and as $t$ varies from $0$ to $1$ , the slerp function $\mathrm{s}\mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}\left({\mathbf{q}}_{0},{\mathbf{q}}_{1},t\right)$ returns an orientation that interpolates from ${\mathbf{q}}_{0}$ to ${\mathbf{q}}_{1}$ .

Let's see if we can't derive the slerp formula by using the tools we have so far. If we were interpolating between two scalar values ${a}_{0}$ and ${a}_{1}$ , we could use the standard linear interpolation (lerp) formula:

Simple linear interpolation
$\begin{array}{rl}\mathrm{\Delta }a& ={a}_{1}-{a}_{0},\\ \mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}\left({a}_{0},{a}_{1},t\right)& ={a}_{0}+t\phantom{\rule{thinmathspace}{0ex}}\mathrm{\Delta }a.\end{array}$

The standard linear interpolation formula works by starting at ${a}_{0}$ and adding the fraction $t$ of the difference between ${a}_{1}$ and ${a}_{0}$ . This requires three basic steps:

1. Compute the difference between the two values.
2. Take a fraction of this difference.
3. Take the original value and adjust it by this fraction of the difference.

We can use the same basic idea to interpolate between orientations. (Again, remember that quaternion multiplication reads right-to-left.)

1. Compute the difference between the two values. We showed how to do this in Section 8.5.8. The angular displacement from ${\mathbf{q}}_{0}$ to ${\mathbf{q}}_{1}$ is given by
$\mathrm{\Delta }\mathbf{q}={\mathbf{q}}_{1}{{\mathbf{q}}_{0}}^{-1}.$
2. Take a fraction of this difference. To do this, we use quaternion exponentiation, which we discussed in Section 8.5.11. The fraction of the difference is given by
$\left(\mathrm{\Delta }\mathbf{q}{\right)}^{t}.$
3. Take the original value and adjust it by this fraction of the difference. We “adjust” the initial value by composing the angular displacements via quaternion multiplication:
$\left(\mathrm{\Delta }\mathbf{q}{\right)}^{t}{\mathbf{q}}_{0}.$

Thus, the equation for slerp is given by

Quaternion slerp in theory
$\mathrm{s}\mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}\left({\mathbf{q}}_{0},{\mathbf{q}}_{1},t\right)=\left({\mathbf{q}}_{1}{{\mathbf{q}}_{0}}^{-1}{\right)}^{t}{\mathbf{q}}_{0}.$

This algebraic form is how slerp is computed in theory. In practice, we use a formulation that is mathematically equivalent, but computationally more efficient. To derive this alternative formula, we start by interpreting the quaternions as existing in a 4D Euclidian space. Since all of the quaternions of interest are unit quaternions, they “live” on the surface of a 4D hypersphere. The basic idea is to interpolate around the arc that connects the two quaternions, along the surface of the 4D hypersphere. (Hence the name spherical linear interpolation.)

We can visualize this in the plane (see Figure 8.11). Imagine two 2D vectors ${\mathbf{v}}_{0}$ and ${\mathbf{v}}_{1}$ , both of unit length. We wish to compute the value of ${\mathbf{v}}_{t}$ , which is the result of smoothly interpolating around the arc by a fraction $t$ of the distance from ${\mathbf{v}}_{0}$ to ${\mathbf{v}}_{1}$ . If we let $\omega$ 13 be the angle intercepted by the arc from ${\mathbf{v}}_{0}$ to ${\mathbf{v}}_{1}$ , then ${\mathbf{v}}_{t}$ is the result of rotating ${\mathbf{v}}_{0}$ around this arc by an angle of $t\omega$ .

Figure 8.11Interpolating a rotation

We can express ${\mathbf{v}}_{t}$ as a linear combination of ${\mathbf{v}}_{0}$ and ${\mathbf{v}}_{1}$ . In other words, there exist nonnegative constants ${k}_{0}$ and ${k}_{1}$ such that ${\mathbf{v}}_{t}={k}_{0}{\mathbf{v}}_{0}+{k}_{1}{\mathbf{v}}_{1}$ . We can use elementary geometry to determine the values of ${k}_{0}$ and ${k}_{1}$ . Figure 8.12 shows how this can be done.

Figure 8.12Interpolating a vector about an arc

Applying some trig to the right triangle with ${k}_{1}{\mathbf{v}}_{1}$ as the hypotenuse (and recalling that ${\mathbf{v}}_{1}$ is a unit vector), we see that

$\begin{array}{rl}\mathrm{sin}\omega & =\frac{\mathrm{sin}t\omega }{{k}_{1}},\\ {k}_{1}& =\frac{\mathrm{sin}t\omega }{\mathrm{sin}\omega }.\end{array}$

A similar technique to solve for ${k}_{0}$ yields the following result:

${k}_{0}=\frac{\mathrm{sin}\left(1-t\right)\omega }{\mathrm{sin}\omega }.$

Thus, ${\mathbf{v}}_{t}$ can be expressed as

${\mathbf{v}}_{t}={k}_{0}{\mathbf{v}}_{0}+{k}_{1}{\mathbf{v}}_{1}=\frac{\mathrm{sin}\left(1-t\right)\omega }{\mathrm{sin}\omega }{\mathbf{v}}_{0}+\frac{\mathrm{sin}t\omega }{\mathrm{sin}\omega }{\mathbf{v}}_{1}.$

The same basic idea can be extended into quaternion space, and we can reformulate the slerp as

Quaternion slerp in practice
$\mathrm{s}\mathrm{l}\mathrm{e}\mathrm{r}\mathrm{p}\left({\mathbf{q}}_{0},{\mathbf{q}}_{1},t\right)=\frac{\mathrm{sin}\left(1-t\right)\omega }{\mathrm{sin}\omega }{\mathbf{q}}_{0}+\frac{\mathrm{sin}t\omega }{\mathrm{sin}\omega }{\mathbf{q}}_{1}.$

We just need a way to compute $\omega$ , the “angle” between the two quaternions. As it turns out, an analogy from 2D vector math can be carried into quaternion space; we can think of the quaternion dot product as returning $\mathrm{cos}\omega$ .

There are two slight complications. First, the two quaternions $\mathbf{q}$ and $-\mathbf{q}$ represent the same orientation, but may produce different results when used as an argument to slerp. This problem doesn't happen in 2D or 3D, but the surface of a 4D hypersphere has a different topology than Euclidian space. The solution is to choose the signs of ${\mathbf{q}}_{0}$ and ${\mathbf{q}}_{1}$ such that the dot product ${\mathbf{q}}_{0}\cdot {\mathbf{q}}_{1}$ is nonnegative. This has the effect of always selecting the shortest rotational arc from ${\mathbf{q}}_{0}$ to ${\mathbf{q}}_{1}$ . The second complication is that if ${\mathbf{q}}_{0}$ and ${\mathbf{q}}_{1}$ are very close, then $\omega$ is very small, and thus $\mathrm{sin}\omega$ is also very small, which will cause problems with the division. To avoid this, if $\mathrm{sin}\omega$ is very small, we will use simple linear interpolation. The code snippet in Listing 8.3 applies all of this advice to compute the quaternion slerp.

// The two input quaternions
float w0,x0,y0,z0;
float w1,x1,y1,z1;

// The interpolation parameter
float t;

// The output quaternion will be computed here
float w,x,y,z;

// Compute the "cosine of the angle" between the
// quaternions, using the dot product
float cosOmega = w0*w1 + x0*x1 + y0*y1 + z0*z1;

// If negative dot, negate one of the input
// quaternions, to take the shorter 4D "arc"
if (cosOmega < 0.0f) {
w1 = -w1;
x1 = -x1;
y1 = -y1;
z1 = -z1;
cosOmega = -cosOmega;
}

// Check if they are very close together, to protect
// against divide-by-zero
float k0, k1;
if (cosOmega > 0.9999f) {

// Very close - just use linear interpolation
k0 = 1.0f-t;
k1 = t;

} else {

// Compute the sin of the angle using the
// trig identity sin^2(omega) + cos^2(omega) = 1
float sinOmega = sqrt(1.0f - cosOmega*cosOmega);

// Compute the angle from its sine and cosine
float omega = atan2(sinOmega, cosOmega);

// Compute inverse of denominator, so we only have
// to divide once
float oneOverSinOmega = 1.0f / sinOmega;

// Compute interpolation parameters
k0 = sin((1.0f - t) * omega) * oneOverSinOmega;
k1 = sin(t * omega) * oneOverSinOmega;
}

// Interpolate
w = w0*k0 + w1*k1;
x = x0*k0 + x1*k1;
y = y0*k0 + y1*k1;
z = z0*k0 + z1*k1;


Quaternions offer a number of advantages over other methods of representing angular displacement:

• Smooth interpolation. The interpolation provided by slerp provides smooth interpolation between orientations. No other representation method provides for smooth interpolation.
• Fast concatenation and inversion of angular displacements. We can concatenate a sequence of angular displacements into a single angular displacement by using the quaternion cross product. The same operation using matrices involves more scalar operations, although which one is actually faster on a given architectures is not so clean-cut: single instruction multiple data (SIMD) vector operations can make very quick work of matrix multiplication. The quaternion conjugate provides a way to compute the opposite angular displacement very efficiently. This can be done by transposing a rotation matrix, but is not easy with Euler angles.
• Fast conversion to and from matrix form. As we see in Section 8.7, quaternions can be converted to and from matrix form a bit faster than Euler angles.
• Only four numbers. Since a quaternion contains four scalar values, it is considerably more economical than a matrix, which uses nine numbers. (However, it still is 33%larger than Euler angles.)

These advantages do come at some cost, however. Quaternions suffer from a few of the problems that affect matrices, only to a lesser degree:

• Slightly bigger than Euler angles. That one additional number may not seem like much, but an extra 33%can make a difference when large amounts of angular displacements are needed, for example, when storing animation data. And the values inside a quaternion are not “evenly spaced” along the $\left[-1,+1\right]$ interval; the component values do not interpolate smoothly, even if the orientation does. This makes quaternions more difficult to pack into a fixed-point number than Euler angles or an exponential map.
• Can become invalid. This can happen either through bad input data, or from accumulated floating point roundoff error. (We can address this problem by normalizing the quaternion to ensure that it has unit magnitude.)
• Difficult for humans to work with. Of the three representation methods, quaternions are the most difficult for humans to work with directly.

## 8.5.14Quaternions as Complex Numbers

We end our discussion on quaternions in the place that most texts begin: a discussion of their interpretation as complex numbers. If you are interested in quaternions solely for rotations, you can safely skip this section. If you want a bit deeper understanding or are interested in the mathematical heritage of quaternions and the circumstances that surrounded their invention, this section will be interesting. We will be following an approach due to John McDonald of DePaul University [9]. Among other things, this method is able to explain two peculiarities of quaternions: the appearance of $\theta /2$ rather than $\theta$ and the unusual mathematical form $\mathbf{q}\mathbf{v}{\mathbf{q}}^{-1}$ :

We begin by considering how we can embed the set of real numbers in the set of $2×2$ matrices. For any given scalar $a$ , we associate it with exactly one $2×2$ matrix, namely the matrix that has $a$ on both of the diagonal elements:

Each real scalar maps to a $\mathbf{2}\mathbf{×}\mathbf{2}$ matrix
$a\equiv \left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right].$

We have chosen a subset of the $2×2$ matrices, and established a one-to-one correspondence between this smaller set of matrices and the set of all real numbers. We could have established this one-to-one relationship in other ways, but this particular way of doing it is important because it preserves all the ordinary algebra laws of addition, subtraction, and multiplication: the associative property, distributive property, nonfactorability of zero, and so on. (We can even include division if we treat division as multiplication by the inverse.) For example,

Addition, subtraction, and multiplication work the same
$\begin{array}{rl}\left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right]+\left[\begin{array}{cc}b& 0\\ 0& b\end{array}\right]& =\left[\begin{array}{cc}a+b& 0\\ 0& a+b\end{array}\right],\\ \left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right]-\left[\begin{array}{cc}b& 0\\ 0& b\end{array}\right]& =\left[\begin{array}{cc}a-b& 0\\ 0& a-b\end{array}\right],\\ \left[\begin{array}{cc}a& 0\\ 0& a\end{array}\right]\left[\begin{array}{cc}b& 0\\ 0& b\end{array}\right]& =\left[\begin{array}{cc}ab& 0\\ 0& ab\end{array}\right].\end{array}$

Now let's see if we can create a similar mapping for the set of complex numbers. You probably already have been introduced to complex numbers; if so, you should remember that the complex pair $\left(a,b\right)$ defines the number $a+bi$ . The number $i$ is a special number such that ${i}^{2}=-1$ . It's often called the imaginary number because no ordinary scalar (a “real” number) can have this property. The word “imaginary” gives one the impression that the number doesn't really exist; we're going avoid this term and instead stick with the more descriptive one: “complex.”

Complex numbers can be added, subtracted, and multiplied. All we need to do is follow the ordinary rules for arithmetic, and replace ${i}^{2}$ with $-1$ whenever it appears. This results in the following identities:

Adding, subtracting, and multiplying complex numbers
$\begin{array}{rl}\left(a+bi\right)+\left(c+di\right)& =\left(a+c\right)+\left(b+d\right)i,\\ \left(a+bi\right)-\left(c+di\right)& =\left(a-c\right)+\left(b-d\right)i,\\ \left(a+bi\right)\left(c+di\right)& =ac+adi+bci+bd{i}^{2}\\ & =ac+\left(ad+bc\right)i+bd\left(-1\right)\\ & =\left(ac-bd\right)+\left(ad+bc\right)i.\end{array}$

Now, how can we extend our system of embedding numbers in the space of $2×2$ matrices to include complex numbers? Before, we only had one degree of freedom, $a$ , and now we have two, $a$ and $b$ . The mapping we use is

Mapping each complex number to a $\mathbf{2}\mathbf{×}\mathbf{2}$ matrix
$\begin{array}{}\text{(8.9)}& a+bi\equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right].\end{array}$

We can easily verify that the complex number on the left behaves exactly the same as the matrix on the right. In a certain sense, they are just two notations for writing the same quantity:

Addition, subtraction, and multiplication in standard notation and our $\mathbf{2}\mathbf{×}\mathbf{2}$ form
$\begin{array}{rl}\left(a+bi\right)+\left(c+di\right)& \equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right]+\left[\begin{array}{cc}c& -d\\ d& c\end{array}\right]=\left[\begin{array}{cc}a+c& -\left(b+d\right)\\ b+d& a+c\end{array}\right]\\ & \equiv \left(a+c\right)+\left(b+d\right)i,\\ \left(a+bi\right)-\left(c+di\right)& \equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right]-\left[\begin{array}{cc}c& -d\\ d& c\end{array}\right]=\left[\begin{array}{cc}a-c& -\left(b-d\right)\\ b-d& a-c\end{array}\right]\\ & \equiv \left(a-c\right)+\left(b-d\right)i,\\ \left(a+bi\right)\left(c+di\right)& \equiv \left[\begin{array}{cc}a& -b\\ b& a\end{array}\right]\left[\begin{array}{cc}c& -d\\ d& c\end{array}\right]=\left[\begin{array}{cc}ac-bd& -\left(ad+bc\right)\\ ad+bc& ac-bd\end{array}\right]\\ & \equiv \left(ac-bd\right)+\left(ad+bc\right)i.\end{array}$

We also verify that the equation ${i}^{2}=-1$ still holds:

$\mathbit{i}$ doesn't seem quite as “imaginary” in $\mathbf{2}\mathbf{×}\mathbf{2}$ form
${i}^{2}\equiv {\left[\begin{array}{cc}0& -1\\ 1& 0\end{array}\right]}^{2}=\left[\begin{array}{cc}0& -1\\ 1& 0\end{array}\right]\left[\begin{array}{cc}0& -1\\ 1& 0\end{array}\right]=\left[\begin{array}{cc}-1& 0\\ 0& -1\end{array}\right]\equiv -1.$

Let's apply the geometric perspective from Chapter 5. Interpreting the columns14 $\left[0,1\right]$ and $\left[-1,0\right]$ as the basis vectors of a coordinate space, we see that this matrix performs a $90{}^{\mathrm{o}}$ rotation.

We can interpret multiplication by $i$ as a $90{}^{\mathrm{o}}$ rotation.14

There's nothing “imaginary” about this. Instead of thinking of $i$ as the square root of $-1$ , think instead of the complex number $a+bi$ as a mathematical entity with two degrees of freedom that behaves in particular ways when multiplied. The part we usually call the “real” part, $a$ , is the main degree of freedom, and $b$ measures some secondary degree of freedom. The two degrees of freedom are in some sense “orthogonal” to one another.

Continuing this further, we see that we can represent rotations by any arbitrary angle $\theta$ using this scheme. The basic $2×2$ rotation matrix derived in Section 5.1.1 happens to be in this special set of matrices that we are mapping to the complex numbers. It maps to the complex number $\mathrm{cos}\theta +i\mathrm{sin}\theta$ :

Unit complex numbers as rotations
$\mathrm{cos}\theta +i\mathrm{sin}\theta \equiv \left[\begin{array}{cc}\mathrm{cos}\theta & -\mathrm{sin}\theta \\ \mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right].$

Notice how complex conjugation (negating the complex part) corresponds to matrix transposition. This is particularly pleasing. Remember that the conjugate of a quaternion expresses the inverse angular displacement. A corresponding fact is true for transposing rotation matrices: since they are orthogonal, their transpose is equal to their inverse.

How do ordinary 2D vectors fit into this scheme? We interpret the vector $\left[x,y\right]$ as the complex number $x+iy$ , and then we can interpret the multiplication of two complex numbers

$\begin{array}{rl}\left(\mathrm{cos}\theta +i\mathrm{sin}\theta \right)\left(x+iy\right)& =x\mathrm{cos}\theta +iy\mathrm{cos}\theta +ix\mathrm{sin}\theta +{i}^{2}y\mathrm{sin}\theta \\ & =\left(x\mathrm{cos}\theta -y\mathrm{sin}\theta \right)+i\left(x\mathrm{sin}\theta +y\mathrm{cos}\theta \right)\end{array}$

as performing a rotation. This is equivalent to the matrix multiplication

$\left[\begin{array}{cc}\mathrm{cos}\theta & -\mathrm{sin}\theta \\ \mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right]\left[\begin{array}{c}x\\ y\end{array}\right]=\left[\begin{array}{c}x\mathrm{cos}\theta -y\mathrm{sin}\theta \\ x\mathrm{sin}\theta +y\mathrm{cos}\theta \end{array}\right].$

While this a not much more that mathematical trivia so far, our goal is to build up some parallels that we can carry forward to quaternions, so let's repeat the key result.

In 2D, we can interpret the vector $\left[x,y\right]$ as a complex number $x+yi$ and rotate it by using the complex multiplication $\left(\mathrm{cos}\theta +i\mathrm{sin}\theta \right)\left(x+iy\right)$ .

A similar conversion from ordinary vectors to complex numbers is necessary in order to multiply quaternions and 3D vectors.

Before we leave 2D, let's summarize what we've learned so far. Complex numbers are mathematical objects with two degrees of freedom that obey certain rules when we multiply them. These objects are usually written as $a+bi$ , but can equivalently be written as a $2×2$ matrix. When we write complex numbers as matrices, it begs the geometric interpretation of multiplication by $i$ as a $90{}^{\mathrm{o}}$ rotation. The rule ${i}^{2}=-1$ has the interpretation that combining two $90{}^{\mathrm{o}}$ rotations yields a $180{}^{\mathrm{o}}$ rotation, and that leaves us with a warm fuzzy feeling. More generally, any complex number with unit length can be written as $\mathrm{cos}\theta +i\mathrm{sin}\theta$ and interpreted as a rotation by the angle $\theta$ . If we convert a 2D vector into complex form and multiply it by $\mathrm{cos}\theta +i\mathrm{sin}\theta$ , it has the effect of performing the rotation.

It's very tempting to extend this trick from 2D into 3D. Tempting, but alas not possible in the straightforward way. The Irish mathematician William Hamilton (1805–1865) apparently fell victim to just this temptation, and had looked for a way to extend complex numbers from 2D to 3D for years. This new type of complex number, he thought, would have one real part and two imaginary parts. However, Hamilton was unable to create a useful type of complex number with two imaginary parts. Then, as the story goes, in 1843, on his way to a speech at the Royal Irish Academy, he suddenly realized that three imaginary parts were needed rather than two. He carved the equations that define the properties of this new type of complex number on the Broom Bridge. His original marks have faded into legend, but a commemorative plaque holds their place. Thus quaternions were invented.

Since we weren't on the Broom Bridge in 1843, we can't say for certain what made Hamilton realize that a 3D system of complex numbers was no good, but we can show how such a set could not be easily mapped to $3×3$ matrices and rotations. A 3D complex number would have two complex parts, $i$ and $j$ , with the properties ${i}^{2}={j}^{2}=-1$ . (We would also need to define the values of the products $ij$ and $ji$ . Exactly what these rules should be, we're not sure; perhaps Hamilton realized this was a dead end. In any case, it doesn't matter for the present discussion.) Now, a straightforward extension of the ideas from 2D would mean that we could somehow associate the numbers 1, $i$ , and $j$ with the set of $3×3$ matrices, such that all the usual algebra laws hold. The number 1 must obviously map to the 3D identity matrix ${\mathbf{I}}_{3}$ . The number $-1$ should map to its negative, $-{\mathbf{I}}_{3}$ , which has $-1$ s on the diagonal. But now we run into a problem trying to a find matrices for $i$ and $j$ whose square is $-{\mathbf{I}}_{3}$ . We can quickly see that this is not possible because the determinant of $-{\mathbf{I}}_{3}$ is $-1$ . To be a root of this matrix, $i$ or $j$ must have a determinant that is the square root of $-1$ because the determinant of a matrix product is the product of the determinants. The only way this can work is for $i$ and $j$ to contain entries that are complex. In short, there doesn't seem to be a coherent system of 3D complex numbers; certainly there isn't one that maps elegantly to rotations analogously to standard complex numbers and 2D rotations. For that, we need quaternions.

Quaternions extend the complex number system by having three imaginary numbers, $i$ , $j$ , and $k$ , which are related by Hamilton's famous equations:

The rules for 4D complex numbers that Hamilton wrote on the Broom Bridge
$\begin{array}{}\text{(8.10)}& \begin{array}{rl}& {i}^{2}={j}^{2}={k}^{2}=-1\\ & ij=k,\phantom{\rule{1em}{0ex}}ji=-k,\\ & jk=i,\phantom{\rule{1em}{0ex}}kj=-i,\\ & ki=j,\phantom{\rule{1em}{0ex}}ik=-j.\end{array}\end{array}$

The quaternion we have been denoting $\left[w,\left(x,y,z\right)\right]$ corresponds to the complex number $w+xi+yj+zk$ . The definition of the quaternion product given in Section 8.5.7 follows from these rules. (Also see Exercise 6.) The dot product, however, basically ignores all of this complex $i,j,k$ business and treats the operands as simple 4D vectors.

Now we return to matrices. Can we embed the set of quaternions into the set of matrices such that Hamilton's rules in Equation (8.10) still hold? Yes, we can, although, as you might expect, we map them to $4×4$ matrices. Real numbers are mapped to a matrix with the number on each entry of the diagonal as before,

$a\equiv \left[\begin{array}{cccc}a& 0& 0& 0\\ 0& a& 0& 0\\ 0& 0& a& 0\\ 0& 0& 0& a\end{array}\right],$

and the complex quantities are mapped to the matrices

Mapping the three complex quantities to $\mathbf{4}\mathbf{×}\mathbf{4}$ matrices
$\begin{array}{}\text{(8.11)}& i& \equiv \left[\begin{array}{cccc}0& 0& 0& 1\\ 0& 0& -1& 0\\ 0& 1& 0& 0\\ -1& 0& 0& 0\end{array}\right],& j& \equiv \left[\begin{array}{cccc}0& 0& 1& 0\\ 0& 0& 0& 1\\ -1& 0& 0& 0\\ 0& -1& 0& 0\end{array}\right],& k& \equiv \left[\begin{array}{cccc}0& -1& 0& 0\\ 1& 0& 0& 0\\ 0& 0& 0& 1\\ 0& 0& -1& 0\end{array}\right].\end{array}$

We encourage you to convince yourself that these mappings do preserve all of Hamilton's rules before moving on.

Combining the above associations, we can map an arbitrary quaternion to a $4×4$ matrix as

Encoding quaternions as $\mathbf{4}\mathbf{×}\mathbf{4}$ matrices
$\begin{array}{}\text{(8.12)}& w+xi+yj+zk\equiv \left[\begin{array}{cccc}w& -z& y& x\\ z& w& -x& y\\ -y& x& w& z\\ -x& -y& -z& w\end{array}\right].\end{array}$

Once again, we notice how complex conjugation (negating $x$ , $y$ , and $z$ ) corresponds to matrix transposition.

Everything we've said so far applies to quaternions of any length. Now let's get back to rotations. We can see that the $i$ , $j$ , $k$ matrices in Equation (8.11) permute and negate axes, so they bear some similarity to $90{}^{\mathrm{o}}$ rotations or reflections. Let's see if we can carry the simple ideas from 2D forward with these matrices. Notice how the upper left $2×2$ portion of the $k$ matrix is the same as the very first $2×2$ matrix for $i$ ; in other words, part of $k$ is a $90{}^{\mathrm{o}}$ rotation about $z$ . By analogy with the 2D case, we might reasonably expect the quaternion $\mathrm{cos}\theta +k\mathrm{sin}\theta$ to represent a rotation about the $z$ -axis by an arbitrary angle $\theta$ . Let's multiply it by the vector $\left[1,0,0\right]$ and see what happens. As in the 2D case, we need to “promote” the vector into the complex domain; what's different here is that quaternions have an extra number. We'll map $\left[x,y,z\right]$ to the complex number $0+xi+yj+zk$ , so the vector $\left[1,0,0\right]$ is simply $i$ . Expanding the multiplication, we have

$\begin{array}{rl}\left(\mathrm{cos}\theta +k\mathrm{sin}\theta \right)i& =i\mathrm{cos}\theta +ki\mathrm{sin}\theta ,\\ & =i\mathrm{cos}\theta +j\mathrm{sin}\theta ,\end{array}$

which corresponds to $\left[\mathrm{cos}\theta ,\mathrm{sin}\theta ,0\right]$ , exactly what we would expect when rotating the $x$ -axis about the $z$ -axis. So far, all is good. Let's try a slightly more general vector $\left[1,0,1\right]$ , which is represented in the complex domain as $i+k$ :

$\begin{array}{rl}\left(\mathrm{cos}\theta +k\mathrm{sin}\theta \right)\left(i+k\right)& =i\mathrm{cos}\theta +k\mathrm{cos}\theta +ki\mathrm{sin}\theta +{k}^{2}\mathrm{sin}\theta \\ \text{(8.13)}& & =i\mathrm{cos}\theta +j\mathrm{sin}\theta +k\mathrm{cos}\theta -\mathrm{sin}\theta .\end{array}$

This result does not correspond to a vector at all, since it has a nonzero value for $w$ . The rotation in the $xy$ -plane worked as expected, but unfortunately, the $z$ component did not come out right. There is unwanted rotation in the $zw$ -hyperplane. This is made perfectly clear by looking at how $\left(\mathrm{cos}\theta +k\mathrm{sin}\theta \right)$ is represented as a $4×4$ matrix:

$\begin{array}{r}\mathrm{cos}\theta +k\mathrm{sin}\theta \equiv \left[\begin{array}{cccc}\mathrm{cos}\theta & -\mathrm{sin}\theta & 0& 0\\ \mathrm{sin}\theta & \mathrm{cos}\theta & 0& 0\\ 0& 0& \mathrm{cos}\theta & \mathrm{sin}\theta \\ 0& 0& -\mathrm{sin}\theta & \mathrm{cos}\theta \end{array}\right].\end{array}$

The upper-left $2×2$ rotation matrix is the one we want; the lower-right $2×2$ rotation matrix is not wanted.

Now we are left wondering if maybe we did something wrong. Perhaps there are other $4×4$ roots of $-1$ we could use for $i$ , $j$ , and $k$ —alternative ways that we could embed the quaternion set within the set of $4×4$ matrices. In fact, there are other alternatives, and this is a hint that something is a bit different from the 2D case. Unfortunately, all of these alternatives exhibit variations of what is essentially the same behavior we are seeing here. Perhaps, instead, our problem is that we did the multiplication in the wrong order. (After all, multiplication of $i$ , $j$ , and $k$ is not commutative.) Let's try putting the vector on the left and the rotation quaternion on the right:

$\begin{array}{rl}\left(i+k\right)\left(\mathrm{cos}\theta +k\mathrm{sin}\theta \right)& =i\mathrm{cos}\theta +ik\mathrm{sin}\theta +k\mathrm{cos}\theta +{k}^{2}\mathrm{sin}\theta \\ & =i\mathrm{cos}\theta -j\mathrm{sin}\theta +k\mathrm{cos}\theta -\mathrm{sin}\theta .\end{array}$

Comparing this to Equation (8.13), when the operands were in the opposite order, we see that the only difference is the sign of the $y$ -coordinate. At first glance, it looks like this is actually worse. The rotation in the $xz$ -plane that we want got inverted; now we have a rotation by $-\theta$ . Meanwhile, the extra rotation we didn't want is exactly the same as before. But perhaps you can already see the solution. If we use the opposite rotation, which corresponds to using the conjugate of the quaternion, we fix both problems:

$\left(i+k\right)\left(\mathrm{cos}\theta -k\mathrm{sin}\theta \right)=i\mathrm{cos}\theta +j\mathrm{sin}\theta -k\mathrm{cos}\theta +\mathrm{sin}\theta .$

So, multiplying on the left by $\left(\mathrm{cos}\theta +k\mathrm{sin}\theta \right)$ produced the rotation we wanted, plus some extra rotation we didn't want, and multiplication on the right by the conjugate achieved the same rotation that was desired, with the opposite undesired rotation. If we combine these two steps, the unwanted rotation is canceled out, and we are left with only the rotation we want. Well, not quite, we are left with twice the rotation we want, but this is easily fixed by using $\theta /2$ instead of $\theta$ . Of course, we knew that $\theta /2$ would appear somewhere, but now we see the reason. Let's summarize our findings from the preceding paragraphs.

To extend the ideas about complex numbers and rotations from 2D to quaternions, we first convert the vector $\left[x,y,z\right]$ to quaternion form as $\mathbf{v}=\left[0,\left(x,y,z\right)\right]$ . A straightforward approach to rotate the vector by the angle $\theta$ about the axis $\stackrel{^}{\mathbf{n}}$ would be to create the quaternion $\mathbf{q}=\left[\mathrm{cos}\theta ,\mathrm{sin}\theta \phantom{\rule{thinmathspace}{0ex}}\stackrel{^}{\mathbf{n}}\right]$ and then perform the multiplication $\mathbf{q}\mathbf{v}$ . This, however, does not work; while the result contains the rotation we want, it also contains an unwanted rotation into $w$ . The multiplication $\mathbf{v}{\mathbf{q}}^{\ast }$ also produces the rotation we want plus some unwanted rotation, but in this case the unwanted rotation is exactly opposite of that produced by $\mathbf{q}\mathbf{v}$ . The solution is to use the half angle and set $\mathbf{q}=\left[\mathrm{cos}\left(\theta /2\right),\mathrm{sin}\left(\theta /2\right)\phantom{\rule{thinmathspace}{0ex}}\stackrel{^}{\mathbf{n}}\right]$ , and the rotation is accomplished by performing both multiplications: $\mathbf{q}\mathbf{v}{\mathbf{q}}^{\ast }$ . The first rotation rotates halfway to the goal, plus some unwanted rotation involving $w$ . The second rotation completes the desired rotation while also canceling the unwanted rotation.

Before we leave this section, let us go back and clear up one last finer point. We mentioned that there are other ways we could embed the set of quaternions within the set of $4×4$ matrices. (Equations  (8.11) and (8.12) aren't the only way to do it.) McDonald [9] explores this idea in more detail; here we merely want to note that this is another underlying cause of the need for $\mathbf{q}\mathbf{v}{\mathbf{q}}^{-1}$ . Using just a single multiplication, the variations in the embedding would produce variations in the rotated result. When both multiplications are present, the change from one style to another produces a change on the left that is exactly canceled by the matching change on the right.

## 8.5.15Summary of Quaternions

Section 8.5 has covered a lot of math, and most of it isn't important to remember. The facts that are important to remember about quaternions are summarized here.

• Conceptually, a quaternion expresses angular displacement by using an axis of rotation and an amount of rotation about that axis.
• A quaternion contains a scalar component $w$ and a vector component $\mathbf{v}$ . They are related to the angle of rotation $\theta$ and the axis of rotation $\stackrel{^}{\mathbf{n}}$ by
$\begin{array}{rlrl}w& =\mathrm{cos}\left(\theta /2\right),& \mathbf{v}& =\stackrel{^}{\mathbf{n}}\mathrm{sin}\left(\theta /2\right).\end{array}$
• Every angular displacement in 3D has exactly two different representations in quaternion space, and they are negatives of each other.
• The identity quaternion, which represents “no angular displacement,” is $\left[1,\mathbf{0}\right]$ .
• All quaternions that represent angular displacement are “unit quaternions” with magnitude equal to 1.
• The conjugate of a quaternion expresses the opposite angular displacement and is computed by negating the vector portion $\mathbf{v}$ . The inverse of a quaternion is the conjugate divided by the magnitude. If you use quaternions only to describe angular displacement (as we do in this book), then the conjugate and inverse are equivalent.
• Quaternion multiplication can be used to concatenate multiple rotations into a single angular displacement. In theory, quaternion multiplication can also be used to perform 3D vector rotations, but this is of little practical value.
• Quaternion exponentiation can be used to calculate a multiple of an angular displacement. This always captures the correct end result; however, since quaternions always take the shortest arc, multiple revolutions cannot be represented.
• Quaternions can be interpreted as 4D complex numbers, which creates interesting and elegant parallels between mathematics and geometry.

A lot more has been written about quaternions than we have had the space to discuss here. The technical report by Dam et al [2] is a good mathematical summary. Kuiper's book [8] is written from an aerospace perspective and also does a good job of connecting quaternions and Euler angles. Hanson's modestly titled Visualizing Quaternions [6] analyzes quaternions using tools from several different disciplines (Riemannian Geometry, complex numbers, lie algebra, moving frames) and is sprinkled with interesting engineering and mathematical lore; it also discusses how to visualize quaternions. A shorter presentation on visualizing quaternions is given by Hart et al. [7].

# 8.6Comparison of Methods

Let's review the most important discoveries from the previous sections. Table 8.1 summarizes the differences among the three representation methods.

• Rotating points between coordinate spaces (object and upright)
• Matrix: Possible; can often by highly optimized by SIMD instructions.
• Euler Angles: Impossible (must convert to rotation matrix).
• Exponential Map: Impossible (must convert to rotation matrix).
• Quaternion: On a chalkboard, yes. Practically, in a computer, not really. You might as well convert to rotation matrix.
• Concatenation of multiple rotations
• Matrix: Possible; can often be highly optimized by SIMD instructions. Watch out for matrix creep.
• Euler Angles: Impossible.
• Exponential Map: Impossible.
• Quaternion: Possible. Fewer scalar operations than matrix multiplication, but maybe not as easy to take advantage of SIMD instructions. Watch out for error creep.
• Inversion of rotations
• Matrix: Easy and fast, using matrix transpose.
• Euler Angles: Not easy.
• Exponential Map: Easy and fast, using vector negation.
• Quaternion: Easy and fast, using quaternion conjugate.
• Interpolation
• Matrix: Extremely problematic.
• Euler Angles: Possible, but Gimbal lock causes quirkiness.
• Exponential Map: Possible, with some singularities, but not as troublesome as Euler angles.
• Quaternion: Slerp provides smooth interpolation.
• Direct human interpretation
• Matrix: Difficult.
• Euler Angles: Easiest.
• Exponential Map: Very difficult.
• Quaternion: Very difficult.
• Storage efficiency in memory or in a file
• Matrix: Nine numbers.
• Euler Angles: Three numbers that can be easily quantized.
• Exponential Map: Three numbers that can be easily quantized.
• Quaternion: Four numbers that do not quantize well; can be reduced to three by assuming fourth component is always nonnegative and quaternion has unit length.
• Unique representation for a given rotation
• Matrix: Yes.
• Euler Angles: No, due to aliasing.
• Exponential Map: No, due to aliasing, but not as complicated as Euler angles.
• Quaternion: Exactly two distinct representations for any angular displacement, and they are negatives of each other.
• Possible to become invalid
• Matrix: Six degrees of redundancy inherent in orthogonal matrix. Matrix creep can occur.
• Euler Angles: Any three numbers can be interpreted unambiguously.
• Exponential Map: Any three numbers can be interpreted unambiguously.
• Quaternion: Error creep can occur.
Table 8.1Comparison of matrices, Euler angles, exponential maps, and quaternions

Some situations are better suited for one orientation format or another. The following advice should aid you in selecting the best format:

• Euler angles are easiest for humans to work with. Using Euler angles greatly simplifies human interaction when specifying the orientation of objects in the world. This includes direct keyboard entry of an orientation, specifying orientations directly in the code (i.e., positioning the camera for rendering), and examination in the debugger. This advantage should not be underestimated. Certainly don't sacrifice ease of use in the name of “optimization” until you are certain that it will make a difference.
• Matrix form must eventually be used if vector coordinate space transformations are needed. However, this doesn't mean you can't store the orientation in another format and then generate a rotation matrix when you need it. A common strategy is to store the “main copy” of an orientation in Euler angle or quaternion form, but also to maintain a matrix for rotations, recomputing this matrix any time the Euler angles or quaternion change.
• For storage of large numbers of orientations (e.g., animation data), Euler angles, exponential maps, and quaternions offer various tradeoffs. In general, the components of Euler angles and exponential maps quantize better than quaternions. It is possible to store a rotation quaternion in only three numbers. Before discarding the fourth component, we check its sign; if it's negative, we negate the quaternion. Then the discarded component can be recovered by assuming the quaternion has unit length.
• Reliable quality interpolation can be accomplished only by usingquaternions. Even if you are using a different form, you can always convert to quaternions, perform the interpolation, and then convert back to the original form. Direct interpolation using exponential maps might be a viable alternative in some cases, as the points of singularity are at very extreme orientations and in practice are often easily avoided.
• For angular velocity or any other situation where “extra spins” need to be represented, use the exponential map or axis-angle.

# 8.7Converting between Representations

We have established that different methods of representing orientation are appropriate in different situations, and have also provided some guidelines for choosing the most appropriate method. This section discusses how to convert an angular displacement from one format to another. It is divided into six subsections:

For more on converting between representation forms, see the paper by James Diebel [3].

## 8.7.1Converting Euler Angles to a Matrix

Euler angles define a sequence of three rotations. Each of these three rotations is a simple rotation about a cardinal axis, so each is easy to convert to matrix form individually. We can compute the matrix that defines the total angular displacement by concatenating the matrices for each individual rotation. This exercise is carried out in numerous books and websites. If you've ever tried to use one of these references, you may have been left wondering, “Exactly what happens if I multiply a vector by this matrix?” The reason for your confusion is because they forgot to mention whether the matrix rotates from object space to upright space or from upright space to object space. In other words, there are actually two different matrices, not just one. (Of course, they are transposes of each other, so, in a sense, there really is only one matrix.) This section shows how to compute both.

Some readers16 might think that we are belaboring the point. Maybe you figured out how to use that rotation matrix from that book or website, and now it's totally obvious to you. But we've seen this be a stumbling block for too many programmers, so we have chosen to dwell on this point. One common example will illustrate the confusion we've seen.

Consider a typical real-time situation with objects moving around. Assume that the orientation of each object is maintained in Euler angle format as a state variable. One of these objects is the camera, and naturally we use the same system of Euler angles to describe the orientation of the camera as we do for any other object. Now, at some point we are going to need to communicate those reference frames to the graphics API. This is where the confusion occurs: the matrix we use to describe the orientation of the objects is not the same matrix we use to describe the orientation of the camera! The graphics API needs two types of matrices. (We'll discuss them in more detail in Section 10.3.1.) The model transform is a matrix that transforms vectors from object space to world space. The view transform transforms vectors from world space to the camera's object space. The rotation portion of the model transform matrix is an object-to-upright matrix, but the rotation portion of the view transform matrix is an upright-to-object matrix. So to speak of the Euler rotation matrix leaves out some important practical details.

Now to derive the matrices. We start by deriving the object-to-upright matrix, which rotates points from object space to upright space. We will be using the simple rotation matrices developed in Section 5.1.2, and those were developed using the perspective of active transformation (see Section 3.3.1 if you don't remember the difference between activate and passive transformations). Thus, to visualize the task at hand, imagine an arbitrary point on our object. The object starts out in the “identity” orientation, and the body coordinates for our point, which are the coordinates we know, also happen to be the upright coordinates at this time because the two spaces are aligned. We perform the sequence of Euler rotations on the object, and the point moves in space until, after the third rotation, the object has arrived in the orientation described by the Euler angles. All the while, our upright coordinate space used to measure the coordinates remains fixed. So the final result of these calculations is the upright coordinates for the point in its arbitrary orientation.

There's one last catch. The elementary rotation matrices that we wish to use as building blocks each rotate about a cardinal axis. With Euler angles, the axes of rotation are the body axes, which (after the first rotation) will be arbitrarily oriented. So instead of doing the Euler rotations about the body axes, we do fixed-axis rotations, where the rotations are about the upright axes. This means we actually do the rotations in the reverse order: first bank, then pitch, and finally heading. See Section 8.3.2 if you don't remember what fixed-axis rotations are.

In summary, the generation of the object-to-upright rotation matrix is a straightforward concatenation of three simple rotation matrices,

${\mathbf{M}}_{\mathit{o}bject\to upright}=\mathbf{B}\mathbf{P}\mathbf{H},$

where $\mathbf{B}$ , $\mathbf{P}$ , and $\mathbf{H}$ are the rotation matrices for bank, pitch, and heading, which rotate about the $z$ -, $x$ -, and $y$ -axes, respectively. We learned how to compute these elementary rotation matrices in Section 5.1.2.

Elementary rotation matrices for bank, pitch, and heading
$\begin{array}{rl}\mathbf{B}={\mathbf{R}}_{z}\left(b\right)& =\left[\begin{array}{ccc}\mathrm{cos}b& \mathrm{sin}b& 0\\ -\mathrm{sin}b& \mathrm{cos}b& 0\\ 0& 0& 1\end{array}\right],\\ \mathbf{P}={\mathbf{R}}_{x}\left(p\right)& =\left[\begin{array}{ccc}1& 0& 0\\ 0& \mathrm{cos}p& \mathrm{sin}p\\ 0& -\mathrm{sin}p& \mathrm{cos}p\end{array}\right],\\ \mathbf{H}={\mathbf{R}}_{y}\left(h\right)& =\left[\begin{array}{ccc}\mathrm{cos}h& 0& -\mathrm{sin}h\\ 0& 1& 0\\ \mathrm{sin}h& 0& \mathrm{cos}h\end{array}\right].\end{array}$

Putting it all together (and leaving out the messy math to actually do the matrix multiplications), we have

Object-to-upright rotation matrix from Euler angles