Appendix A — A General Introduction to Matrices

Author

Laura Bernhofen, Richard Ressler

Published

April 16, 2024

A.1 Matrices, Vectors, and Scalars

A.1.1 Definition of a Matrix

A rectangular array is a set of numbers (or symbols representing numbers) arranged in rows and columns, where every column has the same number of rows and every row has the same number of columns.

A matrix is a rectangular array where each entry is called an element of the matrix. A matrix is often denoted by a capital letter, e.g., A.

A.1.2 The Size or Dimension of a Matrix

The size or dimension of a matrix M is specified by the number of rows r and the number of columns c and is denoted as (r×c).

Alternative Notation: dim(M) = r×c

Examples:

Matrix A has 3 rows and 2 columns so has size (3×2) or dim(A) = 3×2.

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}$$


Matrix B has 2 rows and 4 columns so has size (2×4) or dim(B) = 2×4.

$$B = \begin{bmatrix} a & b & c & d \\ e & f & g & h \end{bmatrix}$$

A.1.3 Row and Column Vectors

When a matrix has only one row, it may be called a row vector with dimension (1×m). A vector $V$ may also be written in bold as $\mathbf{V}$.

  • Think of the row vector as a row in a spreadsheet that contains all the values of the variables/attributes for one case or observation.

$$R = \begin{bmatrix} 0 & 2 & 0 & 1 \end{bmatrix}$$

When a matrix only has one column, it may be called a column vector with dimensions (n×1).

  • Think of a column vector as a column in a spreadsheet with the values of the observations for one variable/attribute.

$$C = \begin{bmatrix} 1 \\ 0 \\ 2 \\ 0 \\ 1 \end{bmatrix}$$

A.1.4 Scalars

A scalar is a matrix of size (1×1) and is usually displayed as just its value, a real number.

e.g., $k = 2$ is a scalar.

Warning

In the programming language R, a (1×1) data frame is not the same as a single value. It is still of class data frame.
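A minimal R sketch of the distinction (the object names here are just for illustration):

```r
k <- 2                    # a scalar: a numeric vector of length 1
df <- data.frame(k = 2)   # a 1 x 1 data frame holding the same value

class(k)    # "numeric"
class(df)   # "data.frame" -- still a data frame, not a single value
dim(df)     # 1 1
df[1, 1]    # extracts the single value, 2, as a plain number
```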

A.1.5 General Notation for a Matrix

If matrix A has n rows and m columns, so dimension (n×m), each entry is called an element of A and its position in the matrix is denoted using subscripts. The notation $a_{ij}$ identifies the element at the intersection of row $i$ and column $j$.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}$$

or, in shorthand, $A_{n\times m}$ is the set of the $a_{ij}$ (where $\{\}$ denotes a set), e.g., $A = \{a_{ij}\},\ i = 1, \ldots, n,\ j = 1, \ldots, m$.

At times you may see the dimension of a matrix indicated using a subscript, so $A_{n\times m}$ means A is a matrix with n rows and m columns.

A.2 Special Matrices

A.2.1 A Square Matrix

A square matrix is a matrix with the same number of columns and rows, i.e., of size n×n. A square matrix of size n×n is said to be of order n.

$$Q = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$$

A.2.1.1 Determinants of Square Matrices

  • Given a square matrix, one can calculate a scalar number known as the determinant of the matrix. The determinant of matrix A is denoted in several ways: det(A)=|A| or for a 2×2 matrix as

$$\det(A) = \det\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{vmatrix} 1 & 3 \\ 2 & 4 \end{vmatrix}$$

  • The determinant “determines” or describes how the matrix structure may affect other matrices when used in operations.

  • The determinant of a matrix can be calculated using only the numbers in the matrix.

  • For a matrix of order 2, the calculation is straightforward:

$$\det(A) = \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

  • For higher order matrices, the calculations get more complicated.
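For example, the order-2 formula can be checked against R's built-in det() function (the matrix below is the 2×2 example from above):

```r
A <- matrix(c(1, 2, 3, 4), nrow = 2)    # filled by column: rows (1, 3) and (2, 4)
det(A)                                  # built-in determinant: 1*4 - 3*2 = -2
A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]   # the order-2 formula gives the same value
```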

A.2.2 A Symmetric Matrix

A symmetric matrix A is a square matrix where the elements that correspond by switching i and j are equal.

$$\{a_{ij}\} = \{a_{ji}\} \text{ for all } i = 1, \ldots, n,\ j = 1, \ldots, n$$

$$A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 2 & 5 & 8 & 9 \\ 3 & 8 & 7 & 10 \\ 4 & 9 & 10 & 11 \end{bmatrix}$$

A.2.3 A Diagonal Matrix

A diagonal matrix is a square matrix (n×n) where all the off diagonal elements are 0, i.e., the only non-zero elements are on the diagonal where i=j.

$$D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 10 \end{bmatrix}$$

Diagonal matrices are also symmetric.

A.2.4 The Identity Matrix

The Identity matrix is a diagonal matrix where every diagonal value is equal to 1 and all off-diagonal elements are 0.

$$I = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

A.2.5 The 1n and 0n Vectors

A column vector of all 1s of size (n×1) is denoted as

$$\mathbf{1}_n = \begin{bmatrix} 1_1 \\ 1_2 \\ \vdots \\ 1_n \end{bmatrix} = \mathbf{1}$$

A column vector of all 0s of size (n×1) is denoted as

$$\mathbf{0}_n = \begin{bmatrix} 0_1 \\ 0_2 \\ \vdots \\ 0_n \end{bmatrix} = \mathbf{0}$$

A.3 Matrix Operations

A.3.1 Equality of Matrices

Definition: Two matrices A and B are said to be equal, denoted as A=B, if

$$\{a_{ij}\} = \{b_{ij}\} \text{ for all } i = 1, \ldots, n,\ j = 1, \ldots, m$$

Note

For two matrices to be equal they must be of the same size (dimension).

Example: If

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix} \text{ and } B = \begin{bmatrix} x & \frac{4}{2x} \\ \sqrt{9} & 2 - 2x \\ 3 - 4 & 2^{2x} \end{bmatrix}$$

then $A = B$ if $x = 1$.

A.3.2 Addition of Matrices

To add two matrices A and B, they must have the same size, i.e., dim(A) = dim(B) = n×m.

The addition of two matrices, A+B, results in the matrix S where each element of S is equal to the sum of the two corresponding elements of A and B.

$$S = A + B \iff \{s_{ij} = a_{ij} + b_{ij}\} \text{ for all } i = 1, \ldots, n,\ j = 1, \ldots, m$$

Example: if

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix} \text{ and } B = \begin{bmatrix} -1 & 4 \\ 6 & 0 \\ 2 & -3 \end{bmatrix}$$

then

$$A + B = \begin{bmatrix} 1 + (-1) & 2 + 4 \\ 3 + 6 & 0 + 0 \\ -1 + 2 & 4 + (-3) \end{bmatrix} = \begin{bmatrix} 0 & 6 \\ 9 & 0 \\ 1 & 1 \end{bmatrix} = S$$
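In R, + adds matrices of the same dimension element-wise; a sketch of the example above:

```r
A <- matrix(c(1, 3, -1, 2, 0, 4), nrow = 3)    # [1 2; 3 0; -1 4], filled by column
B <- matrix(c(-1, 6, 2, 4, 0, -3), nrow = 3)   # [-1 4; 6 0; 2 -3]
S <- A + B                                     # element-wise sum
S                                              # [0 6; 9 0; 1 1]
```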

A.3.3 Multiplying a Matrix by a Scalar

Let k be any real number (a scalar) and A a matrix of size n×m.

Multiplying a matrix by a scalar results in a matrix where every element has been multiplied by the scalar.

$$kA = \{k\,a_{ij}\} \text{ for all } i = 1, \ldots, n,\ j = 1, \ldots, m$$

Example for matrix A,

$$\frac{1}{2}A = \frac{1}{2}\begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix} = \begin{bmatrix} \frac{1}{2}(1) & \frac{1}{2}(2) \\ \frac{1}{2}(3) & \frac{1}{2}(0) \\ \frac{1}{2}(-1) & \frac{1}{2}(4) \end{bmatrix} = \begin{bmatrix} \frac{1}{2} & 1 \\ \frac{3}{2} & 0 \\ -\frac{1}{2} & 2 \end{bmatrix}$$

Note

Scalar multiplication is commutative: kA=Ak

A.3.4 Multiplying a Matrix by another Matrix

A.3.4.1 Definition

Let A be an n×m matrix and B be a m×k matrix.

Note

To multiply two matrices A and B, the number of columns in A must equal the number of rows in B, as seen here where m=m.

Multiplying A (n×m) times B (m×k), denoted as AB, results in a matrix M of dimension (n×k) where mij is the sum of the product of the corresponding elements in row i from A with the elements in the column j from B such that

$$AB = M = \{m_{ij}\} \text{ for all } i = 1, \ldots, n,\ j = 1, \ldots, k$$

where

$$m_{ij} = \left(a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{im}b_{mj}\right) = \sum_{k=1}^{m} a_{ik}b_{kj}$$

Example:

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 2 & 4 & 6 & 8 \\ 1 & 3 & 5 & 9 \end{bmatrix}$

Here dim(A) = 3×2 and dim(B) = 2×4.

Therefore,

$$AB = \begin{bmatrix} 1(2)+2(1) & 1(4)+2(3) & 1(6)+2(5) & 1(8)+2(9) \\ 3(2)+0(1) & 3(4)+0(3) & 3(6)+0(5) & 3(8)+0(9) \\ -1(2)+4(1) & -1(4)+4(3) & -1(6)+4(5) & -1(8)+4(9) \end{bmatrix} = \begin{bmatrix} 4 & 10 & 16 & 26 \\ 6 & 12 & 18 & 24 \\ 2 & 8 & 14 & 28 \end{bmatrix}$$
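In R, %*% is the matrix product (plain * is element-wise); a sketch of the example above:

```r
A <- matrix(c(1, 3, -1, 2, 0, 4), nrow = 3)        # 3 x 2
B <- matrix(c(2, 1, 4, 3, 6, 5, 8, 9), nrow = 2)   # 2 x 4: [2 4 6 8; 1 3 5 9]
M <- A %*% B        # 3 x 4 result; M[i, j] = sum(A[i, ] * B[, j])
dim(M)              # 3 4
M                   # [4 10 16 26; 6 12 18 24; 2 8 14 28]
```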

A.3.4.2 Properties

  1. Matrix multiplication is not commutative! If dim(B) = m×k and dim(A) = n×m, the product BA does not exist when $k \neq n$. If A and B are both square with the same dimension, then BA exists, but it is not necessarily true that BA = AB.

  2. Matrix multiplication is distributive under matrix addition such that for three matrices of the correct sizes, A(B+C)=AB+AC.

A.3.4.3 Exercises

A.3.4.3.1 Is matrix multiplication commutative?

Let $C = \begin{bmatrix} 1 & 4 \\ 2 & 3 \end{bmatrix}$ and $D = \begin{bmatrix} 4 & 3 \\ 5 & 4 \end{bmatrix}$

Compute CD and DC and check if CD=DC.

$$CD = \begin{bmatrix} 1(4)+4(5) & 1(3)+4(4) \\ 2(4)+3(5) & 2(3)+3(4) \end{bmatrix} = \begin{bmatrix} 24 & 19 \\ 23 & 18 \end{bmatrix}$$

and

$$DC = \begin{bmatrix} 4(1)+3(2) & 4(4)+3(3) \\ 5(1)+4(2) & 5(4)+4(3) \end{bmatrix} = \begin{bmatrix} 10 & 25 \\ 13 & 32 \end{bmatrix}$$

Warning

In matrix multiplication, order matters! Often $BA \neq AB$.

A.3.4.3.2 Is matrix multiplication distributive?

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 4 & 6 & 8 \\ 1 & 3 & 5 & 9 \end{bmatrix}$, and $C = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 4 & 3 & 2 \end{bmatrix}$

Check if A(B+C)=AB+AC

$$A(B+C) = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}\begin{bmatrix} 3 & 6 & 9 & 12 \\ 6 & 7 & 8 & 11 \end{bmatrix} = \begin{bmatrix} 15 & 20 & 25 & 34 \\ 9 & 18 & 27 & 36 \\ 21 & 22 & 23 & 32 \end{bmatrix}$$

$$AB + AC = \begin{bmatrix} 4 & 10 & 16 & 26 \\ 6 & 12 & 18 & 24 \\ 2 & 8 & 14 & 28 \end{bmatrix} + \begin{bmatrix} 11 & 10 & 9 & 8 \\ 3 & 6 & 9 & 12 \\ 19 & 14 & 9 & 4 \end{bmatrix} = \begin{bmatrix} 15 & 20 & 25 & 34 \\ 9 & 18 & 27 & 36 \\ 21 & 22 & 23 & 32 \end{bmatrix}$$

A.3.4.4 Matrix Multiplication Terminology

AB means we are pre-multiplying B by A.

BA means we are post-multiplying B by A.

Depending upon the sizes of A and B, neither AB nor BA may exist, or, if they both exist, they may not be equal.

A.4 The Transpose of a Matrix

A.4.1 Definition

Let A be an n×m matrix.

The transpose of A, denoted as $A'$ or $A^T$, is the m×n matrix created by switching the columns of A to become the rows of $A'$. It can also be considered as rotating or flipping A about its diagonal.

If $A = \{a_{ij}\}$ then $A' = \{a_{ji}\}$ for all $i = 1, \ldots, n,\ j = 1, \ldots, m$

If

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1m} \\ a_{21} & a_{22} & \cdots & a_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nm} \end{bmatrix}$$

Then

$$A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{n1} \\ a_{12} & a_{22} & \cdots & a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1m} & a_{2m} & \cdots & a_{nm} \end{bmatrix}$$

Example:

If $A = \begin{bmatrix} 1 & 2 \\ 3 & 0 \\ -1 & 4 \end{bmatrix}$ then $A' = \begin{bmatrix} 1 & 3 & -1 \\ 2 & 0 & 4 \end{bmatrix}$
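In R, t() returns the transpose; a sketch with the example matrix:

```r
A <- matrix(c(1, 3, -1, 2, 0, 4), nrow = 3)   # [1 2; 3 0; -1 4]
t(A)                                          # 2 x 3: [1 3 -1; 2 0 4]
t(t(A))                                       # transposing twice recovers A
```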

Note

The transpose of a matrix always exists.

Tip

It is common in print to see column vectors represented as their transpose, e.g., $C = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}'$

A.4.2 Properties of a Transpose

  1. Identity:

$A' = A$ if and only if A is symmetric.

If $S = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 0 \end{bmatrix}$ then $S' = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 0 \end{bmatrix} = S$

  2. Transpose under Multiplication

$$(AB)' = B'A'$$

Warning

Notice when taking the transpose of a product, we switch the order when multiplying the product of the transposes.

Example:

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $B = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$

so

$A' = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}$ and $B' = \begin{bmatrix} 1 & -1 \end{bmatrix}$

Then

$AB = \begin{bmatrix} 1(1)+2(-1) \\ 3(1)+4(-1) \end{bmatrix} = \begin{bmatrix} -1 \\ -1 \end{bmatrix}$ and $(AB)' = \begin{bmatrix} -1 & -1 \end{bmatrix}$

So

$$B'A' = \begin{bmatrix} 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix} = \begin{bmatrix} 1(1)+(-1)(2) & 1(3)+(-1)(4) \end{bmatrix} = \begin{bmatrix} -1 & -1 \end{bmatrix}$$

$$(AB)' = B'A'$$

Note

This property is often used, going both ways, in regression. You should be familiar with it.
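A quick numerical check of the property in R, using the small example above:

```r
A <- matrix(c(1, 3, 2, 4), nrow = 2)    # [1 2; 3 4]
B <- matrix(c(1, -1), nrow = 2)         # the column vector [1; -1]

t(A %*% B)                              # transpose of the product: [-1 -1]
t(B) %*% t(A)                           # product of the transposes in reverse order
all.equal(t(A %*% B), t(B) %*% t(A))    # TRUE
```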

A.5 More on the Determinant of A Square Matrix

  1. If A is any square matrix that contains a row (or column) of zeros, then det(A) = 0.

Example: Let

$$F = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 1 & 1 & 5 & 10 \end{bmatrix}$$

Then det(F) = 0.

  2. If D is an n×n diagonal matrix, then det(D) is the product of the entries on the main diagonal, i.e.,

$$\det(D) = d_{11}d_{22}d_{33}\cdots d_{nn}$$

Example: Let

$$D = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 10 \end{bmatrix} \qquad \det(D) = 1(3)(2)(10) = 60.$$

  3. If A and B are square matrices of the same size, then $\det(AB) = \det(A)\det(B)$.

A.6 The Inverse of a Matrix

A.6.1 Definition

A matrix B is said to be the inverse of the matrix A if

$$AB = I_n \text{ and } BA = I_n$$

If B is the inverse of A, we denote the inverse as $B = A^{-1}$

A.6.2 Remarks

  1. Only square matrices can have inverses.
  2. Not all square matrices have inverses.
  3. If $A^{-1}$ exists for matrix A, then the inverse $A^{-1}$ is unique.
  4. If a matrix A has an inverse, we say A is invertible.
  5. Matrix A is invertible if and only if $\det(A) \neq 0$.

Proof:
If A is an n×n invertible matrix, then there exists an n×n matrix $A^{-1}$ such that $AA^{-1} = I_n$,
where the identity matrix $I_n$ is a diagonal matrix with all 1's on the diagonal.

$$\det(AA^{-1}) = \det(A)\det(A^{-1}) = \det(I_n) = 1$$

Consequently, $\det(A) \neq 0$.

  6. When $\det(A) = 0$, that means at least one row of the matrix can be calculated as a linear combination of other rows in the matrix. If $\det(A) = 0$, then we say the matrix A is singular or non-invertible.
  7. If A is invertible, then the matrix equation $AX = B$ has a unique solution.

A.6.3 Calculating the Inverse of a Matrix

Calculating the inverse of a matrix of order 2 is straightforward:

If $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ then $A^{-1} = \frac{1}{\det(A)}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$

Note

Since $\det(A)$ appears in the denominator, when $\det(A) = 0$, $A^{-1}$ does not exist.

Warning

When calculating the determinant (or inverse) of a matrix using a computer, especially when the matrix is large or has values that differ by several orders of magnitude, special techniques are required to minimize the risk of getting values of 0 or close to 0 just due to the limited precision of a computer.

Example:

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$, then $\det(A) = 1(4) - 2(3) = -2$

Define matrix B as:

$$B = \frac{1}{-2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}$$

Then

$$AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix} = \begin{bmatrix} 1(-2)+2(3/2) & 1(1)+2(-1/2) \\ 3(-2)+4(3/2) & 3(1)+4(-1/2) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2$$

and

$$BA = \begin{bmatrix} -2 & 1 \\ 3/2 & -1/2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} -2(1)+1(3) & -2(2)+1(4) \\ (3/2)(1)+(-1/2)(3) & (3/2)(2)+(-1/2)(4) \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I_2$$

$\therefore B = A^{-1}$ and, equivalently, $A = B^{-1}$
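In R, solve(A) with a single argument returns the inverse of A when it exists; a sketch using the example above:

```r
A <- matrix(c(1, 3, 2, 4), nrow = 2)   # [1 2; 3 4], det(A) = -2
B <- solve(A)                          # the inverse of A
B                                      # [-2 1; 1.5 -0.5]
A %*% B                                # the identity (up to floating-point error)
B %*% A                                # also the identity
```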

A.6.4 Properties of the Inverse

Assuming A, B, and AB are invertible matrices (i.e., $A^{-1}$, $B^{-1}$, and $(AB)^{-1}$ exist), then

  1. (A1)1=A
  2. (AB)1=B1A1
Warning

Notice when taking the inverse of a product, we switch the order when multiplying the product of the inverses.

A.6.4.1 Orthogonality

  1. If C is an n×n matrix, C is said to be orthogonal if $C'C = I_n$.
  2. An n×n matrix C is orthogonal if and only if $C^{-1} = C'$.
  3. The determinant of an orthogonal matrix is either 1 or -1.
  4. Orthogonal matrices have nice properties such as enabling numerical stability in computer-based linear regression algorithms.

A.7 Linear Independence and the Rank of a Matrix

A.7.1 Definition of Linear Independence

Let $V_1, V_2, \ldots, V_m$ be m vectors and $k_1, k_2, \ldots, k_m$ be m scalars.

The vectors $V_1, V_2, \ldots, V_m$ are said to be linearly dependent if there exist scalars $k_i$, not all 0, $i = 1, 2, \ldots, m$, such that the linear combination:

$$k_1V_1 + k_2V_2 + \cdots + k_mV_m = \mathbf{0} = \mathbf{0}_m$$

Note

If the vectors V1,V2,,Vm are linearly dependent, this implies there is at least one of the vectors Vi which can be expressed (calculated) as a linear combination of one or more of the other vectors.

Practical Interpretation: If a set of vectors is linearly dependent, then at least one of the vectors is redundant (does not add any new information to the other vectors).

Example:

Let $S = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 0 \end{bmatrix}$ with row vectors $V_1 = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$, $V_2 = \begin{bmatrix} 2 & 4 & 6 \end{bmatrix}$, $V_3 = \begin{bmatrix} 3 & 6 & 0 \end{bmatrix}$

$V_2 = 2V_1 \implies (-2)\times V_1 + 1\times V_2 + 0\times V_3 = \mathbf{0} \implies$ the vectors are linearly dependent

If a set of vectors V1,V2,,Vm are not linearly dependent, they are said to be linearly independent.

Practical Interpretation: If a set of vectors is linearly independent, then all vectors contribute new information.

Note

Either row vectors or column vectors can be linearly dependent.

  • Row vectors: Cases can be repeated so have identical information.
  • Column vectors: Variables contain similar or redundant information.

A.7.2 Definition of the Rank of a Matrix

The rank of a matrix is the maximum number of linearly independent columns (rows).

A.7.3 Properties

  1. The inverse of an n×n matrix Q exists if and only if rank(Q) = n. We say Q is of full rank.
  2. For an n×m matrix A, $\text{rank}(A) \leq \min\{n, m\}$.
  3. If $C = AB$, then $\text{rank}(C) \leq \min\{\text{rank}(A), \text{rank}(B)\}$.
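Base R's rank() ranks the elements of a vector rather than computing a matrix rank, but the QR decomposition reports the rank of a matrix; a sketch using the linearly dependent matrix S from the example above:

```r
S <- matrix(c(1, 2, 3, 2, 4, 6, 3, 6, 0), nrow = 3)   # rows [1 2 3], [2 4 6], [3 6 0]
qr(S)$rank        # 2: the second row is 2 times the first, so S is not of full rank
qr(diag(3))$rank  # 3: the 3 x 3 identity matrix is of full rank
```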

A.8 Probability Results for Random Vectors

A.8.1 Definitions for a Random Vector.

Let X1,X2,,Xk be a set of k random variables.

A vector X is a random vector when each element Xi is a random variable, e.g.,

$$X = \begin{bmatrix} X_1 \\ X_2 \\ \vdots \\ X_k \end{bmatrix} \text{ is a random vector.}$$

Note

Since each random variable Xi has a probability distribution, the random vector X has what we refer to as a joint distribution.

The joint distribution describes how the {X1,X2,,Xk} are distributed in relation to one another.

Given a random vector X,

  1. The expected value of X is the mean vector of X and represents the center of the joint distribution of X. This is denoted as:

$$\mu_X = E(X) = \begin{bmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_k) \end{bmatrix}$$

  2. Given a k-dimensional random vector X, there exists a k×k symmetric matrix, Cov(X), called the variance-covariance or covariance matrix of X. It has the form:

$$\text{Cov}(X) = \begin{bmatrix} \text{Var}(X_1) & \text{Cov}(X_1, X_2) & \cdots & \text{Cov}(X_1, X_k) \\ \text{Cov}(X_2, X_1) & \text{Var}(X_2) & \cdots & \text{Cov}(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(X_k, X_1) & \text{Cov}(X_k, X_2) & \cdots & \text{Var}(X_k) \end{bmatrix} \tag{A.1}$$

Note
  1. Var(Xi) is equivalent to the variance of just the random variable Xi itself.

  2. $\text{Cov}(X_i, X_j) = E[(X_i - E[X_i])(X_j - E[X_j])] = E[(X_i - \mu_i)(X_j - \mu_j)]$ provides us information on how the random variables $X_i, X_j$ are related (distributionally).

  3. If $\text{Cor}(X_i, X_j) = \dfrac{\text{Cov}(X_i, X_j)}{\sqrt{\text{Var}(X_i)\text{Var}(X_j)}} = 0$, then $X_i, X_j$ are uncorrelated.
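In R, cov() and cor() estimate these matrices from data with one column per variable; a small simulated sketch (the variable names are just for illustration):

```r
set.seed(123)
x1 <- rnorm(100)              # 100 observations of X1
x2 <- 0.8 * x1 + rnorm(100)   # X2 is built to be correlated with X1
X  <- cbind(x1, x2)           # 100 x 2 data matrix

cov(X)   # 2 x 2 sample variance-covariance matrix
cor(X)   # 2 x 2 sample correlation matrix
```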

Example: Contour and surface plots of the joint density function of two normally distributed random variables with a joint distribution.

  • In the first case, the two variables are independent, i.e., covariance is 0. The contour plot makes it easy to see Var(X2)> Var(X1).
  • In the second case, the covariance is greater than zero. The contour plot shows how the values of X1 and X2 are related and that creates an angle in the joint density function.
Figure A.1: Bivariate Normals N(0,1), N(0,2). Panels: (a) Cov = 0 contour, (b) Cov = 0 surface, (c) Cov = .8 contour, (d) Cov = .8 surface.

A.8.2 Definition of a Multivariate Normal Distribution

A random vector X has a multivariate normal distribution if its probability density function (pdf) is given by:

$$f(X) = \frac{1}{(2\pi)^{k/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(X - \mu)'\Sigma^{-1}(X - \mu)} \tag{A.2}$$

where $\mu = E[X]$, $\Sigma = \text{Cov}(X)$ is $k \times k$, and $|\Sigma|$ denotes the determinant of $\Sigma$, $\det(\Sigma)$.

The joint Normal distribution of X is denoted by $X \sim N_k(\mu, \Sigma)$.

A.8.3 Properties

  1. If $X \sim N_k(\mu, \Sigma)$, then the marginal distribution of $X_i$ is $N(\mu_i, \sigma_i^2)$.
  2. If $\text{Cor}(X_i, X_j) = 0$ for all $i \neq j$, then

$$\Sigma = \begin{bmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_k^2 \end{bmatrix}$$

and $X_1, \ldots, X_k$ are independent normally-distributed random variables.
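One way to explore a multivariate normal in R is to simulate from it. This sketch assumes the MASS package is available for mvrnorm():

```r
library(MASS)   # assumed available; provides mvrnorm()

mu    <- c(0, 0)                              # mean vector
Sigma <- matrix(c(1, 0.8, 0.8, 2), nrow = 2)  # covariance matrix with Cov(X1, X2) = 0.8
X     <- mvrnorm(n = 1000, mu = mu, Sigma = Sigma)  # 1000 draws from N_2(mu, Sigma)

colMeans(X)   # close to mu
cov(X)        # close to Sigma
```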

A.9 Matrices and the Classic Multiple Linear Regression Model

Let there be $p - 1$ predictor/explanatory variables $X_1, \ldots, X_{p-1}$.

Assume the true model is:

$$Y_i = \beta_0 + \beta_1X_{i1} + \beta_2X_{i2} + \cdots + \beta_{p-1}X_{i,p-1} + \epsilon_i \qquad i = 1, \ldots, n \tag{A.3}$$

The model can also be written as a set of n equations:

$$\begin{aligned} Y_1 &= \beta_0 + \beta_1X_{11} + \beta_2X_{12} + \cdots + \beta_{p-1}X_{1,p-1} + \epsilon_1 \\ Y_2 &= \beta_0 + \beta_1X_{21} + \beta_2X_{22} + \cdots + \beta_{p-1}X_{2,p-1} + \epsilon_2 \\ Y_3 &= \beta_0 + \beta_1X_{31} + \beta_2X_{32} + \cdots + \beta_{p-1}X_{3,p-1} + \epsilon_3 \\ &\ \vdots \\ Y_n &= \beta_0 + \beta_1X_{n1} + \beta_2X_{n2} + \cdots + \beta_{p-1}X_{n,p-1} + \epsilon_n \end{aligned}$$

One can then convert the n equations into a new form based on four matrices:

$$\text{Let } Y_{n\times 1} = \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \vdots \\ Y_n \end{bmatrix} \text{ and } \epsilon_{n\times 1} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \\ \vdots \\ \epsilon_n \end{bmatrix} \tag{A.4}$$

Then define

$$X_{n\times p} = \begin{bmatrix} 1 & X_{11} & X_{12} & \cdots & X_{1,p-1} \\ 1 & X_{21} & X_{22} & \cdots & X_{2,p-1} \\ 1 & X_{31} & X_{32} & \cdots & X_{3,p-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & X_{n1} & X_{n2} & \cdots & X_{n,p-1} \end{bmatrix} \text{ and } \beta_{p\times 1} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \vdots \\ \beta_{p-1} \end{bmatrix} \tag{A.5}$$

One can combine Equation A.4 and Equation A.5 to get the matrix form of the true model:

$$Y_{n\times 1} = X_{n\times p}\,\beta_{p\times 1} + \epsilon_{n\times 1} \tag{A.6}$$

Note

The multiplication $X\beta$ results in an (n×1) matrix (a column vector).

In the matrix form of the linear model (Equation A.6), the matrix X is called the design matrix.
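In R, model.matrix() builds the design matrix X, including the leading column of 1s for the intercept; a sketch using the built-in mtcars data:

```r
# Design matrix for a model with two predictors (p - 1 = 2, so p = 3 columns)
X <- model.matrix(mpg ~ wt + hp, data = mtcars)
head(X)   # a column of 1s, then wt, then hp
dim(X)    # 32 x 3: n = 32 rows and p = 3 columns

# The familiar least squares estimate written with the matrix operations above
y <- mtcars$mpg
b <- solve(t(X) %*% X) %*% t(X) %*% y   # (X'X)^-1 X'y
b                                       # matches coef(lm(mpg ~ wt + hp, data = mtcars))
```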

A.10 A Geometric Perspective on Matrices and Matrix Operations

The previous sections discuss matrices from an analytical perspective. This section will look at matrices from a geometric perspective. This is based heavily on ideas and content in the YouTube series by 3Blue1Brown called Essence of linear algebra.

A.11 Vectors, Points, Spans, and Bases in Rn

A.11.1 Vectors with One Element: R1

Given a vector with one element, $v = \begin{bmatrix} 2 \end{bmatrix}$, this can be thought of as representing a point on a one-dimensional number line centered at 0, where the element value is the distance from 0.

A one-dimensional vector

We can multiply this vector by any scalar value and get another value on the number line.

This is known as scaling the vector.

Figure A.2: Scaling a one-dimensional vector

A.11.1.1 The 1-d Unit Vector

We can represent any point on the number line by scaling a vector of length 1, i^ (known as the unit vector), by the appropriate scalar a.

The red vector in the figure above is a unit vector $\hat{i} = \begin{bmatrix} 1 \end{bmatrix}$

A.11.1.2 Linear Combinations of 1-d Vectors

We can also create linear combinations of 1-d vectors by adding them together.

$$\begin{bmatrix} 4 \end{bmatrix} + \begin{bmatrix} -2 \end{bmatrix} = \begin{bmatrix} 2 \end{bmatrix} \qquad \begin{bmatrix} 4 \end{bmatrix} + \begin{bmatrix} -2 \end{bmatrix} = 4\hat{i} + (-2)\hat{i} = 2\hat{i} = 2\begin{bmatrix} 1 \end{bmatrix}$$

Span of a Vector and Basis Vectors

The set of all linear combinations of a vector, say i^, is known as the span of the vector.

In 1-d, the span of i^ includes every point on the number line. We don’t need any other vector to find a point.

Thus i^ is known as a generator for the 1-d vector space.

Since there is only 1 vector, i^, in the set of generators for 1-d space, we do not have to check if it is independent of other vectors.

The set of vectors S that are linearly independent and generate the space (their span includes every point in the space) is known as the basis for the space. The member vectors of S are the basis vectors for the space.

Here, i^ is a basis vector for the 1-d vector space.

Vector Space

A vector space of dimension d is a subset of the possible values for a geometric object of d dimensions that passes through the origin and has the following properties.

  • One can do addition and scalar-multiplication operations
  • Those operations are commutative and distributive
  • The subset contains the zero vector 0 (the origin).
  • If the subset contains v then it contains av for every scalar a.
  • If subset contains u and v, then it contains u+v.

A.11.2 Vectors with Two Elements:R2

Let’s consider a vector with two elements: v=[23].

With two elements, this can be thought of as representing a point on a 2-dimensional x, y plane where both x and y range over the real numbers.

  • The element values are the distance from 0 in the x direction and then the y direction.
  • We use the following notation to denote the x and y elements of the 2 dimensional vector.

$$v = \begin{bmatrix} x \\ y \end{bmatrix}$$

  • The point (0,0) is called the origin.
  • We will think of all vectors as being expressed geometrically as an arrow with the tail at the origin and head at the point away from the origin.

Figure A.3: A two-dimensional vector

We can still scale a two dimensional vector by multiplying it by a scalar.

Important

However, we cannot create every point in the 2-d space with one vector, only those points in the span of the original vector.

A single vector does not have a span that generates a 2-d space.

To generate the 2-d space, we need a second vector that is not in the span of the first vector.

A.11.2.1 Unit Vectors in 2-d

Let’s define two unit vectors, vi in the x direction, and vj in the y direction.

$$v_i = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \text{ and } v_j = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

A.11.2.2 Scaling and Vector Addition in 2-d

We can consider the vector $v = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ as representing scalar multiplication of two unit vectors, $v_i$ in the x direction and $v_j$ in the y direction, followed by their addition.

$$v = \begin{bmatrix} 2 \\ 3 \end{bmatrix} = 2v_i + 3v_j = 2\begin{bmatrix} 1 \\ 0 \end{bmatrix} + 3\begin{bmatrix} 0 \\ 1 \end{bmatrix}$$

Figure A.4: A 2-d vector expressed as scaled unit vectors added together.

The orange line in Figure A.4 represents the addition of two vectors as discussed in Section A.3.2.

Geometrically, it can be seen as moving the tail of the second vector to the head of the first vector. Their sum is the new location of the head of the second vector.

Let’s add v=[23]+v=[31].

  • We plot the two vectors with tails at the origin and then move the second (red) vector so its tail is at the head of the first vector.
  • The result is the head of the shifted second vector, here

v=[23]+[31]=[14].

Figure A.5: Addition of 2-d vectors.

A.11.2.3 Linear Combinations and Linear Independence

With two 2-d vectors, u and v, we can use scaling and vector addition to create a new vector w that is a linear combination of u and v, such that

au+bv=w

Every vector space contains the origin.

If a=b=0 in au+bv=w, then w is the origin.

Assuming $a, b \neq 0$, then we say u and v are linearly dependent if we can choose a and b such that $au + bv = \mathbf{0}$.

Important

If two vectors are linearly dependent, it means they share the same span. Thus the dimension of the span of the set {u,v} is 1, not 2.

Consider $u = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $v = \begin{bmatrix} -1 \\ -1.5 \end{bmatrix}$.

  • We can find scalars $a = 1, b = 2$ such that $au + bv = \mathbf{0}$.

$$a\begin{bmatrix} 2 \\ 3 \end{bmatrix} + b\begin{bmatrix} -1 \\ -1.5 \end{bmatrix} \implies 1\begin{bmatrix} 2 \\ 3 \end{bmatrix} + 2\begin{bmatrix} -1 \\ -1.5 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

  • Thus u and v are linearly dependent and share the same span as seen here.
Figure A.6: Two linearly dependent vectors have the same span and cannot generate the 2-d space.

Reducing the span by one dimension, from 2 to 1, is equivalent to converting the vector space from a 2-d plane to a 1-d number line.

To generate a 2-d space, we need a second vector that is not in the span of the first vector i.e., is linearly independent of the first vector.

The unit vectors i^ and j^ are linearly independent.

Thus the set $S = \{\hat{i}, \hat{j}\}$ has dimension 2 and can generate the 2-d space.

The set $S = \{\hat{i}, \hat{j}\}$ is a basis for the 2-d space.

A.11.2.4 Basis Vectors

Is the set $S = \{\hat{i}, \hat{j}\}$ the only set of basis vectors in 2-d space? No.

Any set of two linearly independent vectors u and v, such as $u = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$ and $v = \begin{bmatrix} 1 \\ 3 \end{bmatrix}$, can serve as the basis.

  • The vectors do not even have to be orthogonal.

Changing the basis is equivalent to changing the reference coordinate system for the space.

  • Any time we interpret the values of the elements in a vector, we are implicitly using the basis vectors to shape our interpretation.

Thus we normally assume we are using $S = \{\hat{i}, \hat{j}\}$ as the basis since it corresponds to the 2-d Cartesian plane with x as the horizontal axis and y as the vertical axis.

A.11.3 Three dimensions and Higher: Rn

The same concepts from R1 and R2 apply as we move into higher dimensions.

A.11.3.1 Vector Elements

For three dimensions $\mathbb{R}^3$, a vector now has three elements $u = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$

For n dimensions, a vector has n elements $x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$.

A.11.3.2 Span and Bases in Rn

With two linearly independent vectors in 3-d vector space, their span is still in 2-d space.

  • In 3-d space, graphing the result of every linear combination of 2 linearly independent vectors, e.g., ai^+bj^=w, creates a 2-d plane in 3-d space, centered on the origin.
  • In $\mathbb{R}^n$, the result of every linear combination of $(n-1)$ linearly independent vectors, $a_1x_1 + a_2x_2 + \cdots + a_{n-1}x_{n-1} = w$, creates an $(n-1)$-dimensional hyper-plane in $\mathbb{R}^n$ space, centered on the origin.

If we add a third vector in R3 we can add vectors as before.

  • Add the first two vectors by moving the tail of the second to the head of the first.
  • Then, move the tail of the third vector to the head of the second.
  • The new location of the head of the third vector is the result.

This is equivalent to:

Given $u = \begin{bmatrix} x_1 \\ y_1 \\ z_1 \end{bmatrix}$, $v = \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}$, $w = \begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix}$

Vector addition: $u + v + w = \begin{bmatrix} x_1 + x_2 \\ y_1 + y_2 \\ z_1 + z_2 \end{bmatrix} + \begin{bmatrix} x_3 \\ y_3 \\ z_3 \end{bmatrix} = \begin{bmatrix} x_1 + x_2 + x_3 \\ y_1 + y_2 + y_3 \\ z_1 + z_2 + z_3 \end{bmatrix}$

The span of the three 3-d vectors is the set S of all possible linear combinations of au+bv+cw.

  • The dimension of the S is the number of linearly independent vectors in S.
  • If one or more of the vectors in S is linearly dependent on one or more of the others, (it is in the span of one or their linear combination), the dimension of set S is still the number of linearly independent vectors in S, so may be 1 or 2.
  • If the three vectors are all linearly independent of each other, the span of S has dimension 3, the three vectors can generate R3, and the set S can serve as a basis for R3.
  • This can be thought of as taking the span of the first two vectors (a horizontal x, y plane) and using the third vector to move it in the z dimension.

In Rn, the span of n, n-dimensional vectors is the set S of all possible linear combinations of a1x1+a2x2++anxn.

  • The dimension of a set of vectors S is the number of linearly independent vectors in S.
  • If one or more of the vectors in S is linearly dependent on one or more of the others (it is in the span of one or a linear combination of others), the dimension of set S is still the number of linearly independent vectors in S, so it may range from 1 to n−1.
  • If the n vectors are all linearly independent of each other, the span of S has dimension n, the n vectors can generate Rn, and the set S can serve as a basis for Rn.

In Rn, consider the set of n unit vectors along each axis as the basis for the Rn. This helps with the interpretation of matrices as linear transformations of a vector in Rn.

A.12 Matrices as Linear Transformations of Vectors

A.12.1 Linear Transformations in General

A transformation is a function that maps an input value to an output value.

We are interested in using a function to map a vector to a different vector in Rn.

  • The function can be considered the rules for reshaping and moving the input vector to look like the output vector.
  • This is equivalent to applying a function to a point in Rn to produce another point in Rn.

To make the transformation linear, we have to add two constraints to the function.

  1. It must preserve the linearity of lines - all input lines must be output as lines.
  2. The origin must not be shifted.

These are equivalent to transformations that keep all the grid lines on the plane parallel and evenly spaced.

Important

A linear transformation of any n dimensional vector can be described as a linear combination of the unit vectors in the basis for Rn.

An example in R2.

  • Start with a vector $u = \begin{bmatrix} -1 \\ 2 \end{bmatrix} = -1\hat{i} + 2\hat{j}$

If we use a transformation that moves the vector to v, it preserves the coefficients of the linear combination and we just have to think about what happens to $\hat{i}$ and $\hat{j}$.

$$v = -1(\text{transformed } \hat{i}) + 2(\text{transformed } \hat{j})$$

If our transformation moved $\hat{i}$ to $\begin{bmatrix} 1 \\ -2 \end{bmatrix}$ and $\hat{j}$ to $\begin{bmatrix} 3 \\ 0 \end{bmatrix}$, then the final vector will be

$$v = -1\begin{bmatrix} 1 \\ -2 \end{bmatrix} + 2\begin{bmatrix} 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 5 \\ 2 \end{bmatrix}$$

We can write this transformation in terms of any input as

$$u = \begin{bmatrix} x \\ y \end{bmatrix} \xrightarrow{\text{transformed}} x\begin{bmatrix} 1 \\ -2 \end{bmatrix} + y\begin{bmatrix} 3 \\ 0 \end{bmatrix} = \begin{bmatrix} 1x + 3y \\ -2x + 0y \end{bmatrix}$$

We can make this even more general.

Linear Transformations in R2.

We can describe any linear transformation in R2 using just 4 numbers.

  • Two for the vector where i^ lands after the transformation and
  • Two for the vector where j^ lands after the transformation.

We combine these four numbers (two vectors) into a 2x2 matrix as

$$\begin{bmatrix} x_{\hat{i}} & x_{\hat{j}} \\ y_{\hat{i}} & y_{\hat{j}} \end{bmatrix}$$

where the first column describes the new location of $\hat{i}$ and the second column is the new location of $\hat{j}$.

So, to apply a transformation matrix to any vector in R2, we use the elements in the input vector to create a linear combination of the vectors in the transformation matrix.

Given a 2x2 matrix $\begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix}$, if we want to transform $\begin{bmatrix} 5 \\ 7 \end{bmatrix}$, the linear combination looks like:

$$\begin{bmatrix} 3 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} 5 \\ 7 \end{bmatrix} = 5\begin{bmatrix} 3 \\ 2 \end{bmatrix} + 7\begin{bmatrix} 2 \\ 1 \end{bmatrix} = \begin{bmatrix} 29 \\ 17 \end{bmatrix}$$

Or in general.

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = x\begin{bmatrix} a \\ c \end{bmatrix} + y\begin{bmatrix} b \\ d \end{bmatrix} = \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}$$
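The same computation in R, where the matrix product carries out the linear combination of the matrix's columns:

```r
M <- matrix(c(3, 2, 2, 1), nrow = 2)   # columns are where i-hat and j-hat land
v <- c(5, 7)                           # the input vector

M %*% v                  # [29; 17]
5 * M[, 1] + 7 * M[, 2]  # the same result as a linear combination of the columns
```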

Note

Putting the matrix on the left of the vector is equivalent to using f(v)=w where f() is a transformation function.

We have previously seen f() as the function for matrix multiplication in Section A.3.4.

A square matrix of size n can thus be interpreted as just a function to make a linear transformation of a vector in Rn.

  • The linear transformation is completely described by n2 numbers which describe the new locations of the unit vectors in the basis of the space.
  • We can put these numbers into the columns of a square matrix to describe where each of the n unit vectors winds up after the transformation.
  • Matrix multiplication can be interpreted as reshaping the space to move the input vector into a new position in Rn.

A.12.2 Executing Multiple Transformations

We often want to execute a sequence of transformations, sometimes called creating a composition of transformations.

This is equivalent to multiplying multiple matrices.

Since each multiplication results in a new matrix describing where each of the n unit vectors winds up after the transformation, we can just repeat the process.

Note

When multiplying a vector by two matrices, we write it in the form from left to right as

$$\begin{bmatrix} e & f \\ g & h \end{bmatrix}\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}$$

where we execute it as $g(f(x))$

$$\begin{bmatrix} e & f \\ g & h \end{bmatrix}\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}\right)$$

As noted in Section A.3.4.2, compositions of matrix transformations (multiple multiplications) are not commutative.

That means that, except for special cases,

$$\begin{bmatrix} e & f \\ g & h \end{bmatrix}\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}\right) \neq \begin{bmatrix} a & b \\ c & d \end{bmatrix}\left(\begin{bmatrix} e & f \\ g & h \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix}\right)$$

A.12.3 Determinants

These linear transformations by matrices are reshaping the space of the input vector.

This means they are often either stretching the space or shrinking the space, with or without some rotations in one or more dimensions.

In R2, we are often then interested in by what factor does a transformation change the area of a space.

Consider the transformation matrix $\begin{bmatrix} 3 & 0 \\ 0 & 2 \end{bmatrix}$.

  • It scales i^ by a factor of 3 and j^ by a factor of 2.
  • This means that the 1 x 1 square formed by i^ and j^ now is a 3 x 2 rectangle so has an area of 6.
  • We can say this linear transformation scaled the area by a factor of 6.
Figure A.7: A linear transformation can scale the area of the 2-d space.

Now, consider a linear transformation known as a shear transformation.

  • It has the transformation matrix $\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$
  • This leaves i^ in place but moves j^ over to (1,1).
  • This means the 1 x 1 square is now a parallelogram but it still has area 1.
Figure A.8: A linear shear transformation scales the area of the 2-d space into a parallelogram.

This scaling factor of a linear transformation matrix is called the determinant of the transformation matrix, as seen back in Section A.2.1.1.

  • This scaling factor applies to any area defined by vectors in R2.

  • For Rn, the matrix will scale the volumes instead of the area.

  • For $\mathbb{R}^3$, it is the volume of the parallelepiped created by the three unit vectors along each axis.

  • If A is a transformation matrix with determinant 3, it scales up all the areas by a factor of 3.

  • If B is a transformation matrix with determinant 1/2, it shrinks down all the areas by a factor of 1/2.

Determinant of 0

If a 2-d transformation matrix has a determinant of 0, it shrinks all of the space into a 1-d line or even a single point (0,0).

This is equivalent to the columns of the matrix being linearly dependent.

This can be useful as we will see later on to simplify the representation of a transformation as a matrix.

What does it mean to have a determinant that is <0?

  • This is known as a transformation that flips the coordinate reference system or “invert the orientation of space”.
  • This can also be visualized as i^ moving from the usual position to the right of j^ to now being on the left of j^.
  • The Absolute Value of the determinant still shows how much the areas have been scaled (increased or shrunk down).
  • In R3, this means the orientation has been inverted from a “right hand rule” to a “left-hand rule”.

To take a general view, consider the matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ where

$$\det(A) = \det\left(\begin{bmatrix} a & b \\ c & d \end{bmatrix}\right) = ad - bc$$

If $b = c = 0$, then A is the diagonal matrix $A = \begin{bmatrix} a & 0 \\ 0 & d \end{bmatrix}$.

  • The transformation creates a new rectangle out of the unit vector 1x1 square as seen in Figure A.7.
  • a is the factor for how much $\hat{i}$ is stretched.
  • d is the factor for how much $\hat{j}$ is stretched.
  • ad is the area of the new rectangle compared to the original 1x1 square.

If either b or c is non-zero, then,

  • The transformation creates a new parallelogram out of the unit vector 1x1 square as seen in Figure A.8.
  • ad is still the area of the new parallelogram, with base a and height d, compared to the original 1x1 square.

If b and c are both non-zero, then,

  • The transformation creates a new parallelogram out of the unit vector 1x1 square and stretches (shrinks) it in the diagonal direction.
Note

When multiplying two matrices, A and B, det(AB)=det(A)det(B)

A.13 Matrices and Systems of Linear Equations

A system of linear equations is a set of equations that describes the relationships among n variables through scaling the variables and adding them together to create a result.

We want to solve the system of linear equations and transformation matrices can help us do that.

If you have a system of linear equations you can organize it with all of the variables and their coefficients on the left and all of the results (scalars) on the right of the equal sign.

  • You may need to make 0 or 1 coefficients explicit so you have the same number of coefficients in each equation.

Now, convert the system of equations into matrix form by converting the variables and results to vectors and the coefficients into a matrix.

$$\begin{matrix} 2x + 5y + 3z = -3 \\ 4x + 0y + 8z = 0 \\ 1x + 3y + 0z = 2 \end{matrix} \qquad \begin{bmatrix} 2 & 5 & 3 \\ 4 & 0 & 8 \\ 1 & 3 & 0 \end{bmatrix}\underbrace{\begin{bmatrix} x \\ y \\ z \end{bmatrix}}_{\text{Variables}} = \begin{bmatrix} -3 \\ 0 \\ 2 \end{bmatrix}$$

We can label each part as follows

$$\begin{matrix} 2x + 5y + 3z = -3 \\ 4x + 0y + 8z = 0 \\ 1x + 3y + 0z = 2 \end{matrix} \qquad \underbrace{\begin{bmatrix} 2 & 5 & 3 \\ 4 & 0 & 8 \\ 1 & 3 & 0 \end{bmatrix}}_{A}\underbrace{\begin{bmatrix} x \\ y \\ z \end{bmatrix}}_{\mathbf{x}} = \underbrace{\begin{bmatrix} -3 \\ 0 \\ 2 \end{bmatrix}}_{\mathbf{v}}$$

and now rewrite in matrix vector form:

$$A\mathbf{x} = \mathbf{v} \tag{A.7}$$

We can interpret the system of equations then as a matrix transformation (or function) of the input vector x that results in the vector v.

Our interest is in figuring out what input vector x, when the space is scaled and squished by A, now looks like v.

There are two cases

  1. The matrix A squishes the space down by a dimension, i.e., $\det(A) = 0$
  2. The matrix A squishes the space in a way that preserves the dimension of the space, i.e., $\det(A) \neq 0$

A.13.0.1 Solutions with det(A) ≠ 0

When $\det(A) \neq 0$, there will always be one, and only one, $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{v}$.

We can find this by reversing the transformation. This reverse of the transformation is its own transformation matrix, which is called the inverse of A or $A^{-1}$.

As an example, if A were a counterclockwise rotation of 90°, $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, then

$A^{-1}$ would be the matrix of a clockwise rotation of 90°, $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$.

So in case 2, where $A^{-1}$ exists, like with functions where $f^{-1}(f(x)) = x$, you wind up where you started.

$$A^{-1}A\mathbf{x} = \mathbf{x} \text{ and } A^{-1}A = I = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

Now we can solve for $\mathbf{x}$ by multiplying both sides of Equation A.7 by $A^{-1}$ to get

$$A^{-1}A\mathbf{x} = A^{-1}\mathbf{v} \implies \mathbf{x} = A^{-1}\mathbf{v}$$

This can be interpreted as using v as the input vector and shifting its space using A1 to see what vector is the output, the x of interest.
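In R, solve(A, v) solves the system directly (without forming the inverse explicitly, which is usually preferable numerically); a sketch with the system above:

```r
A <- matrix(c(2, 4, 1, 5, 0, 3, 3, 8, 0), nrow = 3)   # [2 5 3; 4 0 8; 1 3 0]
v <- c(-3, 0, 2)

x <- solve(A, v)   # solves A x = v
x
A %*% x            # reproduces v (up to floating-point error)
solve(A) %*% v     # the same solution via the explicit inverse of A
```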

A.13.0.2 Solutions with det(A)=0

When the system of equations has a transformation matrix A with det(A)=0, then the matrix squishes the space down into a volume of 0, effectively dropping the dimension of the space by one.

  • $\det(A) = 0 \implies A^{-1}$ does not exist

However, there can still be a solution to the system of equations if the solution exists in the lower dimensional space.

As an example, if a 2-d matrix squishes the space down to a line, it could be true that $\mathbf{v}$ is in the span of that line.

In R3, it is also possible for a solution to exist if A squishes down the solution space by 2 dimensions to a plane or even 1 dimension to a line.

  • It would be much harder to find a solution on the single line than on the plane, even though both have $\det(A) = 0$.

To differentiate the different types of output in higher dimensions, we use the term Rank.

  • If the transformation matrix has an output of 1-d, the matrix has Rank = 1.
  • If the transformation matrix has an output of 2-d, the matrix has Rank = 2.
  • and so on.

The set of all possible outputs of Av is called the Column Space of A.

  • You can think about it as the number of columns in A where each represents the effect on the unit vectors.
  • The span of these vectors is all possible outputs, which by definition is the column space.
  • So, Rank is also the number of dimensions of the column space.
  • If a matrix has a Rank = the number of columns, which is as high as it could be, the matrix has Full Rank.

If a matrix has full rank, then the only vector that transforms to the origin is the 0 vector.

If a matrix has less than full rank, so it squishes down to a smaller dimension, then you can have a lot of vectors that transform to 0.

If a 2-d matrix squishes to a line, there is a second line, in a different direction, where all the vectors on the second line, get squished onto the origin.

This set of vectors (or planes in higher dimensions) that transform to the origin, to 0, are called the Null Space or Kernel of the matrix.

Null Space

In a system of linear equations, the Null Space of the matrix A gives all possible solutions to the system when Ax=0

A.14 Change of Bases

In Section A.11.2.4, we discussed that there are many possible basis vectors for a vector space.

The choice of basis vectors determines how to describe other vectors in terms of the origin, the direction of movement and the unit of distance.

As an example, $\hat{i}$ and $\hat{j}$ mean we interpret a vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ as saying the head of the vector can be found by moving three units horizontally to the right and two units vertically up from the origin.

This relationship between the numbers and a geometric interpretation as a vector is defined by the Coordinate System.

  • $\hat{i}$ and $\hat{j}$ are part of a "standard" coordinate system with horizontal and vertical basis vectors of length 1.

A.14.1 Differing Linear Coordinate Systems

Suppose someone else uses a different set of basis vectors, b1 and b2 where from the perspective of the standard coordinate system, b1 points up to the right at a slight angle, and b2 points up to the left at a slight angle.

What the standard system defines as $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$ would be described as $\begin{bmatrix} 5/3 \\ 1/3 \end{bmatrix}$ in the system with basis vectors $b_1$ and $b_2$.

Important

The original vector has not moved - it is just being described from the perspective of a different coordinate system.

  • The origin is the same 0
  • It is the same approach of scaling each basis vector and adding the results.
  • However the orientation of the axes and the scaling of the units is different.
  • The choice of these is arbitrary and can be changed to provide a more visually or mathematically convenient perspective of a vector.
  • Consider a picture taken by a tilted camera and tilting your head a few degrees so it looks like a normal portrait or landscape perspective. The location of objects and the relationships among objects in the picture did not change, but it might be easier to interpret now that you are looking at it from a new perspective.

In this example, the standard coordinate system would describe $b_1 = \begin{bmatrix} 2 \\ 1 \end{bmatrix}$ and $b_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$

In the alternate system, $b_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $b_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, and they are the unit basis vectors.

A.14.2 Translating (Transforming) a Vector Between Coordinate Systems

Given two coordinate systems we can translate a vector from one representation to the other if we can describe the basis vectors in one system in terms of the other.

Assume there is a vector identified as $\begin{bmatrix} -1 \\ 2 \end{bmatrix}$ in the alternate system.

What would its description be in the standard coordinate system?

We know how to describe the basis vectors of the alternate system using standard coordinates, so we can apply the scalars from the alternate system to those.

$$-1\begin{bmatrix} 2 \\ 1 \end{bmatrix} + 2\begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -4 \\ 1 \end{bmatrix}$$

This is just the same as using the basis vectors of the alternate system, as a transformation matrix to describe the change using the standard coordinate system.

Important

This matrix is called the Change of Basis matrix as it changes i^ and j^ to b1 and b2.

That is equivalent to changing the description of the vector based on scaling the basis vectors b1 and b2 to a description using the basis vectors i^ and j^ using the standard coordinate system.

  • The change of basis matrix allows us to describe a vector from the alternate system in terms of the standard coordinate system basis vectors i^ and j^.

$$\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} -1 \\ 2 \end{bmatrix} = \begin{bmatrix} -4 \\ 1 \end{bmatrix}$$

To transform from the standard to the alternate system just requires using the inverse of the change of basis matrix.

  • To translate a standard coordinate vector $\begin{bmatrix} 3 \\ 2 \end{bmatrix}$, multiply it by the inverse of the change of basis matrix to get the description in terms of scaling the basis vectors $b_1$ and $b_2$.
  • We get the result mentioned earlier.

$$\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1/3 & 1/3 \\ -1/3 & 2/3 \end{bmatrix} \qquad\qquad \begin{bmatrix} 1/3 & 1/3 \\ -1/3 & 2/3 \end{bmatrix}\begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 5/3 \\ 1/3 \end{bmatrix}$$
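A sketch of both translations in R, using the change of basis matrix from this example:

```r
CB <- matrix(c(2, 1, -1, 1), nrow = 2)   # change of basis matrix [2 -1; 1 1]

# The alternate-basis coordinates (-1, 2) described in the standard system:
CB %*% c(-1, 2)         # [-4; 1]

# The standard coordinates (3, 2) described in the alternate system:
solve(CB) %*% c(3, 2)   # [5/3; 1/3]
```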

A.14.3 Translating (Transforming) a Matrix Between Coordinate Systems

We can also translate a matrix that describes a transformation for a vector written in one coordinate system so the translated matrix describes the same spatial transformation from the perspective of the basis vectors in the alternate coordinate system.

  • This means we can’t just multiply by the coordinate transformation matrix as that would still be describing the transformation in terms of i^ and j^.

Assume we have a transformation matrix $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ to rotate a vector 90° to the left (counterclockwise) in the standard coordinate system. We want to apply that transformation to a vector in an alternate coordinate system where we know the change of basis matrix is $\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}$.

The following steps will translate a transformation matrix written in one system so it can describe the same transformation of a vector in an alternate coordinate system by using the change of basis matrix.

  1. Start with the vector written in the alternate coordinate system.
  2. Translate it to the standard coordinate system using the change of basis matrix.
  3. Transform it with the transformation matrix in the standard coordinate system.
  4. Translate it back to the alternate system using the inverse of the change of basis matrix.

$$\underbrace{\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}^{-1}}_{\text{4. Translate to Alternate}}\;\underbrace{\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}}_{\text{3. Transform in Standard}}\;\underbrace{\begin{bmatrix} 2 & -1 \\ 1 & 1 \end{bmatrix}}_{\text{2. Translate to Standard (CofB)}}\;\underbrace{\mathbf{v}}_{\text{1. Alternate Basis}} = \underbrace{\begin{bmatrix} 1/3 & -2/3 \\ 5/3 & -1/3 \end{bmatrix}}_{\text{Transform in Alternate}}\mathbf{v}$$

The result of this series of matrix multiplications is the new matrix on the right, which will now create the equivalent rotation of 90° left for any vector in the alternate coordinate system, in terms of scaling/shrinking the basis vectors $b_1$ and $b_2$.

Important

We can now choose to use different coordinate systems, each with its own set of basis vectors, and go back and forth in a way that makes it easier to do our analysis.

  • This can be thought of as similar to using a log transform to minimize numerical precision errors when multiplying a very large number and a very small number on a computer.
    • We take the log of each number, add the logged values, and then use the anti-log (or exponentiation) to translate back to the original space to get the final result.

A.15 Eigenvectors and Eigenvalues of a Linear Transformation Matrix

A.15.1 Background

Eigenvectors and their associated eigenvalues have been used in mathematics to “simplify” the analysis of linear transformations since the 18th century.

  • Euler was using linear transformations in R3 to analyze the rotation of solid bodies centered on the origin.
  • He found that the linear transformation matrix describing the rotation also rotated most vectors.
  • However, the vector of the axis of rotation was special in that it did not rotate under the transformation.
  • Another way of saying this is the axis of rotation vector remains on its own span after the transformation and is not rotated off of it like most vectors.
  • Others studied these special vectors and determined that many (but not all) matrices of linear transformations may have one or more of these special vectors, which remain on their original span, and which may be scaled by the transformation but not rotated off the span.
  • In 1904, David Hilbert coined the term eigenvector ("eigen" is German for "own"), as each transformation matrix may have one or more of its "own" special vectors.
  • The eigenvalues are the scale factors associated with one or more eigenvectors for a matrix.
Important

When a matrix has a set of eigenvectors that span the space, changing the coordinate system to use them as the basis vectors (an eigenbasis), greatly simplifies the transformation matrix.

  • Translating the transformation matrix into the eigenbasis coordinate system translates the original matrix into a diagonal matrix.
  • The columns now represent each eigenvector.
  • The values on the diagonals are the eigenvalues (scale factors) for each eigenvector!

A.15.2 Example

Let’s assume we have a transformation matrix [3102].

  • We can see from the matrix that i^ is special in that it is not rotated off of its span (the x axis). It is only scaled by a factor of 3.

  • Any other vector on the x axis is also scaled by a factor of 3.

  • Thus i^ is an eigenvector with eigenvalue 3.

  • It turns out the vector $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ is also not rotated off its span and is scaled by a factor of 2.

  • Any other vector in its span is also not rotated off the span and is just scaled by a factor of 2.

  • Thus $\begin{bmatrix} -1 \\ 1 \end{bmatrix}$ is an eigenvector with eigenvalue 2.

Important

Using eigenvectors as the basis provides a way of describing a linear transformation that emphasizes the effects of the transformation without worrying about which coordinate system is being used to describe it.

A.15.3 Derivation

Given a transformation matrix A, the definition of an eigenvector v is that the output of A transforming v is a scaled version of v. That can be expressed as:

$$A\vec{v} = \underbrace{\lambda}_{\text{eigenvalue}}\vec{v} \tag{A.8}$$

To find the eigenvectors and eigenvalues of a matrix A, we have to solve Equation A.8.

We can do some rearranging to put Equation A.8 into a matrix form.

$$\begin{aligned} A\vec{v} &= \lambda\vec{v} \\ A\vec{v} &= (\lambda I)\vec{v} \\ A\vec{v} - (\lambda I)\vec{v} &= \vec{0} \\ (A - \lambda I)\vec{v} &= \vec{0} \end{aligned} \tag{A.9}$$

The last line in Equation A.9 says we are looking for a vector $\vec{v}$ such that the new transformation matrix $(A - \lambda I)$ applied to $\vec{v}$ maps it to $\vec{0}$.

  • This means that we are looking for a vector $\vec{v} \neq \vec{0}$ that is squished by $(A - \lambda I)$ to $\vec{0}$.
  • That only happens if $\det(A - \lambda I) = 0$.
Important

We are thus looking for a $\lambda$ such that it causes $(A - \lambda I)$ to be singular, with a determinant of zero.

As an example, if $A = \begin{bmatrix} 2 & 2 \\ 1 & 3 \end{bmatrix}$, a value of $\lambda$ that causes $\det(A - \lambda I) = \det\left(\begin{bmatrix} 2 - \lambda & 2 \\ 1 & 3 - \lambda \end{bmatrix}\right) = 0$ is $\lambda = 1$.

  • If we had chosen another matrix, the eigenvalue might not be 1.

Since $\det(A - (1)I) = 0$, that means there exists a $\vec{v} \neq \vec{0}$ where $(A - \lambda I)\vec{v} = \vec{0}$, or $A\vec{v} = (1)\vec{v}$.

To check if a value of $\lambda$ is an eigenvalue, you can substitute it into $\det(A - \lambda I)$ and see if it equals 0.

Using the formula for a 2x2 matrix determinant, we can compute $\det\left(\begin{bmatrix} 3 - \lambda & 1 \\ 0 & 2 - \lambda \end{bmatrix}\right)$ for the original example of $\begin{bmatrix} 3 & 1 \\ 0 & 2 \end{bmatrix}$.

  • That gives us the following:

$$\det\left(\begin{bmatrix} 3 - \lambda & 1 \\ 0 & 2 - \lambda \end{bmatrix}\right) = (3 - \lambda)(2 - \lambda) - 0(1) = (3 - \lambda)(2 - \lambda) = 0.$$

This is a quadratic polynomial in $\lambda$ where the roots of the polynomial are 3 and 2.

To figure out the eigenvectors with these eigenvalues, substitute the eigenvalue back into $\begin{bmatrix} 3 - \lambda & 1 \\ 0 & 2 - \lambda \end{bmatrix}$ and compute the vector.

For eigenvalue 2, we get

$$\begin{bmatrix} 3 - 2 & 1 \\ 0 & 2 - 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \vec{0} \implies \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \vec{0} \implies x + y = 0 \implies \{x = 1, y = -1\}$$

  • So an eigenvector for $\lambda = 2$ is $\vec{v} = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and any vector in the span of that line, $x = -y$, is a solution for the equation.

Similarly, for the eigenvector for $\lambda = 3$ we get

$$\begin{bmatrix} 3 - 3 & 1 \\ 0 & 2 - 3 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \vec{0} \implies \begin{bmatrix} 0 & 1 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \vec{0} \implies y = 0 \implies \{x \text{ free}, y = 0\}$$

  • So an eigenvector for $\lambda = 3$ is $\vec{v} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and any vector in the span of that line, $y = 0$, is a solution for the equation.
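In R, eigen() returns the eigenvalues and (unit-length) eigenvectors of a square matrix; a sketch with the example matrix:

```r
A <- matrix(c(3, 0, 1, 2), nrow = 2)   # [3 1; 0 2]
e <- eigen(A)

e$values    # 3 and 2
e$vectors   # columns are eigenvectors scaled to length 1,
            # proportional to [1; 0] and [1; -1] from the work above

A %*% e$vectors[, 1]   # equals 3 * e$vectors[, 1]
```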
Note

Note not every matrix has eigenvectors.

A rotation of 90° in $\mathbb{R}^2$ by definition rotates every vector off of its span.

  • The transformation matrix is $\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$ and the roots of $\det(A - \lambda I) = 0$ are the roots of $(-\lambda)(-\lambda) = -1$, so the roots are the complex numbers $\pm i$.
  • Having only complex roots means there are no eigenvectors in Rn.

For a shear transformation in R2, the only eigenvector is i^ with eigenvalue 1. But i^ does not generate R2 so you can’t have an eigenbasis.

A.15.4 Eigenbases

Assume we have a transformation matrix in $\mathbb{R}^2$ with two eigenvectors that span the space, say $v_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $v_2 = \begin{bmatrix} 0 \\ 2 \end{bmatrix}$.

Putting these into a transformation matrix, we have $\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$, which is a diagonal matrix.

Diagonal Matrices, Eigenvectors, and Eigenvalues

Any time we have a diagonal matrix, where all off diagonal values = 0, we can interpret this as the matrix of an eigenbasis where each column is an eigenvector with an eigenvalue of the diagonal element for that column.

Diagonal matrices are often easier to work with than non-diagonal matrices.

  • As an example, raising a diagonal matrix to the power n simply requires raising the diagonal values to the power n and then multiplying by the vector.

If you have a transformation that has sufficient eigenvectors to create an eigenbasis, you can change to the eigenbasis to make your computations and then change back as desired, as seen in Section A.14.3.

Follow similar steps.

  1. Take the original transformation
  2. Put the eigenvectors into a change of basis matrix and put that on the right.
  3. Put the inverse of the eigenvector change of basis matrix on the left.

The resulting matrix is guaranteed to be diagonal which may make the computations easier.
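A sketch of those steps in R: put the eigenvectors into a change of basis matrix P, and $P^{-1}AP$ is diagonal with the eigenvalues on the diagonal (the object names are just for illustration):

```r
A <- matrix(c(3, 0, 1, 2), nrow = 2)   # [3 1; 0 2] from the earlier example
P <- eigen(A)$vectors                  # eigenvectors as columns: the change of basis matrix

D <- solve(P) %*% A %*% P              # translate A into the eigenbasis
round(D, 10)                           # diagonal, with the eigenvalues 3 and 2 on the diagonal

# Powers are easy in the eigenbasis: A^3 = P D^3 P^-1
P %*% D^3 %*% solve(P)                 # same as A %*% A %*% A
```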

A.15.5 Beyond Matrices

The concepts of eigenvectors and eigenvalues have been looked at from the perspective of matrices which represent linear transformations.

These matrices are functions that map an input vector to an output vector in Rn.

The concepts of eigenvectors and eigenvalues can be extended to other functions that express linear transformations.