Vectors, Matricies, and Operations

Computational Mathematics and Statistics Camp University of Chicago September 2018

Linear algebra

Data stored in matricies
Higher dimensional spaces
- Flood of big data, stored in many dimensions
Linear algebra
- Algebra of matricies
- Geometry of high dimensional space
- Calculus (multivariable) in many dimensions
Very important for regression/machine learning/deep learning

Points and vectors

A point in \(\Re^1\)
- \(1\)
- \(\pi\)
- \(e\)
An ordered pair in \(\Re^2 = \Re \times \Re\)
- \((1,2)\)
- \((0,0)\)
- \((\pi, e)\)
An ordered triple in \(\Re^3 = \Re \times \Re \times \Re\)
- \((3.1, 4.5, 6.1132)\)
An ordered \(n\)-tuple in \(R^n = \Re \times \Re \times \ldots \times \Re\)
- \((a_{1}, a_{2}, \ldots, a_{n})\)

Points and vectors

A point \(\mathbf{x} \in \Re^{n}\) is an ordered n-tuple, \((x_{1}, x_{2}, \ldots, x_{n})\). The vector \(\mathbf{x} \in \Re^{n}\) is the arrow pointing from the origin \((0, 0, \ldots, 0)\) to \(\mathbf{x}\).

One dimensional example

Two dimensional example

Three dimensional example

(Latitude, Longitude, Elevation)
\((1,2,3)\)
\((0,1,2)\)

\(N\)-dimensional example

Individual campaign donation records \[\mathbf{x} = (1000, 0, 10, 50, 15, 4, 0, 0, 0, \ldots, 2400000000)\]
Counties have proportion of vote for Trump \[\mathbf{y} = (0.8, 0.5, 0.6, \ldots, 0.2)\]
Run experiment, assess feeling thermometer of elected official \[\mathbf{t} = (0, 100, 50, 70, 80, \ldots, 100)\]

Vector arithmetic

Suppose:

\[ \begin{aligned} \mathbf{u} & = (1, 2, 3, 4, 5) \nonumber \\ \mathbf{v} & = (1, 1, 1, 1, 1) \nonumber \\ k & = 2 \end{aligned} \]

Then,

\[ \begin{aligned} \mathbf{u} + \mathbf{v} & = (1 + 1, 2 + 1, 3+ 1, 4 + 1, 5+ 1) = (2, 3, 4, 5, 6)\nonumber \\ k \mathbf{u} & = (2 \times 1, 2 \times 2, 2 \times 3, 2 \times 4, 2 \times 5) = (2, 4, 6, 8, 10) \nonumber \\ k \mathbf{v} & = (2 \times 1,2 \times 1,2 \times 1,2 \times 1,2 \times 1) = (2, 2, 2, 2, 2) \end{aligned} \]

Inner product

\[ \begin{aligned} \mathbf{u} \cdot \mathbf{v} &= u_{1} v_{1} + u_{2}v_{2} + \ldots + u_{n} v_{n} \nonumber \\ & = \sum_{i=1}^{N} u_{i} v_{i} \nonumber \end{aligned} \]

Inner product

Suppose \(\mathbf{u} = (1, 2, 3)\) and \(\mathbf{v} = (2, 3, 1)\). Then:

\[ \begin{aligned} \mathbf{u} \cdot \mathbf{v} & = 1 \times 2 + 2 \times 3 + 3 \times 1 \\ & = 2+ 6 + 3 \\ & = 11 \end{aligned} \]

Calculating vector length

Vector norm

\[ \begin{aligned} \| \mathbf{v}\| & = (\mathbf{v} \cdot \mathbf{v} )^{1/2} \\ & = (v_{1}^2 + v_{2}^{2} + v_{3}^{2} + \ldots + v_{n}^{2} )^{1/2} \end{aligned} \]

Vector norm of a three-dimensional vector \(\mathbf{x} = (1,1,1)\)

\[ \begin{aligned} \| \mathbf{x}\| & = (\mathbf{x} \cdot \mathbf{x} )^{1/2} \\ & = (x_{1}^2 + x_{2}^{2} + x_{3}^{2})^{1/2} \\ & = (1 + 1 + 1)^{1/2} \\ &= \sqrt{3} \end{aligned} \]

Example: text analysis

 a abandoned abc ability able about above abroad absorbed absorbing abstract
43         0   0       0    0    10     0      0        0         0        1

\((43,0,0,0,0,10,\dots)\)

Example: text analysis

\[ \begin{aligned} \text{Doc1} & = (1, 1, 3, \ldots, 5) \\ \text{Doc2} & = (2, 0, 0, \ldots, 1) \\ \textbf{Doc1}, \textbf{Doc2} & \in \Re^{M} \end{aligned} \]

Inner product

\[ \begin{aligned} \textbf{Doc1} \cdot \textbf{Doc2} & = (1, 1, 3, \ldots, 5) (2, 0, 0, \ldots, 1)' \\ & = 1 \times 2 + 1 \times 0 + 3 \times 0 + \ldots + 5 \times 1 \\ & = 7 \end{aligned} \]

Length

\[ \begin{aligned} \| \textbf{Doc1} \| & \equiv \sqrt{ \textbf{Doc1} \cdot \textbf{Doc1} } \\ & = \sqrt{(1, 1, 3, \ldots , 5) (1, 1, 3, \ldots, 5)' } \\ & = \sqrt{1^{2} +1^{2} + 3^{2} + 5^{2} } \\ & = 6 \end{aligned} \]

Cosine

\[ \begin{aligned} \cos (\theta) & \equiv \left(\frac{\textbf{Doc1} \cdot \textbf{Doc2}}{\| \textbf{Doc1}\| \|\textbf{Doc2} \|} \right) \\ & = \frac{7} { 6 \times 2.24} \\ & = 0.52 \end{aligned} \]

Measuring similarity

Why it is useful
What properties show a similarity measure have?
- The maximum should be the document with itself
- The minimum should be documents which have no words in common (i.e. orthogonal)
- Increasing when more of the same words are used
- Normalize for document length

Measure 1: inner product

\[(2,1) \cdot (1,4) = 6\]

Measure 1: inner product

\[(4,2) \cdot (1,4) = 12\]

Measure 2: cosine similarity

\[ \begin{aligned} (4,2) \cdot (1,4) &= 12 \\ \mathbf{a} \cdot \mathbf{b} &= \|\mathbf{a} \| \times \|\mathbf{b} \| \times \cos(\theta) \end{aligned} \]

Measure 2: cosine similarity

\[ \begin{aligned} \cos (\theta) & \equiv \left(\frac{\textbf{Doc1} \cdot \textbf{Doc2}}{\| \textbf{Doc1}\| \|\textbf{Doc2} \|} \right) \\ & = \frac{(2, 1) \cdot (1, 4)} {\| (2,1)\| \| (1,4) \|} \\ & = \frac{6} {(\sqrt{2^2 + 1^2}) (\sqrt{1^2 + 4^2})} \\ & = \frac{6} {(\sqrt{5}) (\sqrt{17})} \\ & \approx 0.65 \end{aligned} \]

\[ \begin{aligned} \cos (\theta) & \equiv \left(\frac{\textbf{Doc3} \cdot \textbf{Doc2}}{\| \textbf{Doc3}\| \|\textbf{Doc2} \|} \right) \\ & = \frac{(4,2) \cdot (1, 4)} {\| (24,2)\| \| (1,4) \|} \\ & = \frac{12} {(\sqrt{4^2 + 2^2}) (\sqrt{1^2 + 4^2})} \\ & = \frac{12} {(\sqrt{20}) (\sqrt{17})} \\ & \approx 0.65 \end{aligned} \]

Matricies

Rectangular arrangement (array) of numbers defined by two axes
1. Rows
2. Columns
  
  \[ \mathbf{A} = \begin{array}{rrrr} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \\ \end{array} \]
  
  If \(\mathbf{A}\) has \(m\) rows \(n\) columns we will say that \(\mathbf{A}\) is an \(m \times n\) matrix. Suppose \(\mathbf{X}\) and \(\mathbf{Y}\) are \(m \times n\) matrices. Then \(\mathbf{X} = \mathbf{Y}\) if \(x_{ij} = y_{ij}\) for all \(i\) and \(j\).

Simple examples

\[ \mathbf{X} = \left[ \begin{array}{rrr} 1 & 2 & 3 \\ 2 & 1 & 4 \\ \end{array} \right] \]

\[ \mathbf{Y} = \left[ \begin{array}{rr} 1 & 2 \\ 3 & 2 \\ 1 & 4 \\ \end{array} \right] \]

Matrix addition

Suppose \(\mathbf{X}\) and \(\mathbf{Y}\) are \(m \times n\) matrices. Then:

\[ \begin{aligned} \mathbf{X} + \mathbf{Y} & = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \\ \end{pmatrix} + \begin{pmatrix} y_{11} & y_{12} & \ldots & y_{1n} \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \ldots & y_{mn} \\ \end{pmatrix} \nonumber \\ & = \begin{pmatrix} x_{11} + y_{11} & x_{12} + y_{12} & \ldots & x_{1n} + y_{1n} \\ x_{21} + y_{21} & x_{22} + y_{22} & \ldots & x_{2n} + y_{2n} \\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} + y_{m1} & x_{m2} + y_{m2} & \ldots & x_{mn} + y_{mn} \\ \end{pmatrix} \end{aligned} \]

Scalar addition

Suppose \(\mathbf{X}\) is an \(m \times n\) matrix and \(k \in \Re\). Then:

\[ \begin{aligned} k \mathbf{X} & = \begin{pmatrix} k x_{11} & k x_{12} & \ldots & k x_{1n} \\ k x_{21} & k x_{22} & \ldots & k x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ k x_{m1} & k x_{m2} & \ldots & k x_{mn} \\ \end{pmatrix} \end{aligned} \]

Transposition

\[ \begin{aligned} \mathbf{X} & = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \\ \end{pmatrix} \\ \mathbf{X}' & = \begin{pmatrix} x_{11} & x_{21} & \ldots & x_{m1} \\ x_{12} & x_{22} & \ldots & x_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \ldots & x_{mn} \end{pmatrix} \end{aligned} \]

If \(\mathbf{X}\) is an \(m \times n\) then \(\mathbf{X}'\) is \(n \times m\)
If \(\mathbf{X} = \mathbf{X}'\) then we say \(\mathbf{X}\) is symmetric

\[\mathbf{X} = \begin{pmatrix} 4 & 1 & 2 \\1 & 2 & 3 \\ \end{pmatrix} \quad \mathbf{X}^{'} = \begin{pmatrix} 4 & 1 \\ 1 & 2 \\ 2 & 3 \\ \end{pmatrix}\]

Matrix multiplication

\[\mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1& 1 \\ \end{pmatrix} , \quad \mathbf{Y} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ \end{pmatrix} \]

Create a new matrix \(\mathbf{A}\) by matrix multiplication:

\[ \begin{aligned} \mathbf{A} & = \mathbf{X} \mathbf{Y} \\ & = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ \end{pmatrix} \\ &= \begin{pmatrix} 1 \times 1 + 1 \times 3 & 1 \times 2 + 1 \times 4 \\ 1 \times 1 + 1 \times 3 & 1 \times 2 + 1 \times 4\\ \end{pmatrix} \\ &= \begin{pmatrix} 4 & 6 \\ 4 & 6 \end{pmatrix} \end{aligned} \]

Algebraic properties

Matricies must be conformable
Associative property: \((\mathbf{XY})\mathbf{Z} = \mathbf{X}(\mathbf{YZ})\)
Additive distributive property: \((\mathbf{X} + \mathbf{Y})\mathbf{Z} = \mathbf{XZ} + \mathbf{YZ}\)
Zero property: \(\mathbf{X0} = 0\)
Order matters: \(\mathbf{XY} \neq \mathbf{YX}\)
- Different from scalar multiplication: \(xy = yx\)

What is deep learning?

Learning representations from data

The “deep” in deep learning

How is deep learning used

Self-driving cars
Voice activated assistants
Automatic machine translation
Image recognition
Detection of diseases

Tensor operations

Generalizations of matrix operations
Add tensors
Multiple tensors
If you can do it with a matrix, you can do it with a tensor

Geometric interpretation

Here is the linear algebra

\[\text{output} = \text{relu}([W \cdot \text{input}] + b)\]

input - the input data
dot() - dot product (tensor multiplication)
relu() - element-wise operations
- Operations applied independently to each entry (cell/value) in the tensor
- Highly scalable
\(W, b\) - tensors that are attributes of the layer
- Weights
- Parameters

Vectors, Matricies, and Operations

Computational Mathematics and Statistics Camp University of Chicago September 2018

Linear algebra

Points and vectors

Points and vectors

One dimensional example

Two dimensional example

Three dimensional example

\(N\)-dimensional example

Vector arithmetic

Inner product

Inner product

Calculating vector length

Vector norm

Example: text analysis

Example: text analysis

Inner product

Length

Cosine

Measuring similarity

Measure 1: inner product

Measure 1: inner product

Measure 2: cosine similarity

Measure 2: cosine similarity

Matricies

Simple examples

Matrix addition

Scalar addition

Transposition

Matrix multiplication

Algebraic properties

What is deep learning?

Learning representations from data

The “deep” in deep learning

How is deep learning used

How is deep learning related to linear algebra?

Tensor operations

Geometric interpretation

Here is the linear algebra