Vectors, Matricies, and Operations

Computational Mathematics and Statistics Camp University of Chicago September 2018

Linear algebra

  • Data stored in matricies
  • Higher dimensional spaces
    • Flood of big data, stored in many dimensions
  • Linear algebra
    • Algebra of matricies
    • Geometry of high dimensional space
    • Calculus (multivariable) in many dimensions
  • Very important for regression/machine learning/deep learning

Points and vectors

  • A point in \(\Re^1\)
    • \(1\)
    • \(\pi\)
    • \(e\)
  • An ordered pair in \(\Re^2 = \Re \times \Re\)
    • \((1,2)\)
    • \((0,0)\)
    • \((\pi, e)\)
  • An ordered triple in \(\Re^3 = \Re \times \Re \times \Re\)
    • \((3.1, 4.5, 6.1132)\)
  • An ordered \(n\)-tuple in \(R^n = \Re \times \Re \times \ldots \times \Re\)
    • \((a_{1}, a_{2}, \ldots, a_{n})\)

Points and vectors

A point \(\mathbf{x} \in \Re^{n}\) is an ordered n-tuple, \((x_{1}, x_{2}, \ldots, x_{n})\). The vector \(\mathbf{x} \in \Re^{n}\) is the arrow pointing from the origin \((0, 0, \ldots, 0)\) to \(\mathbf{x}\).

One dimensional example

Two dimensional example

Three dimensional example

  • (Latitude, Longitude, Elevation)
  • \((1,2,3)\)
  • \((0,1,2)\)

\(N\)-dimensional example

  • Individual campaign donation records \[\mathbf{x} = (1000, 0, 10, 50, 15, 4, 0, 0, 0, \ldots, 2400000000)\]
  • Counties have proportion of vote for Trump \[\mathbf{y} = (0.8, 0.5, 0.6, \ldots, 0.2)\]
  • Run experiment, assess feeling thermometer of elected official \[\mathbf{t} = (0, 100, 50, 70, 80, \ldots, 100)\]

Vector arithmetic

Suppose:

\[ \begin{aligned} \mathbf{u} & = (1, 2, 3, 4, 5) \nonumber \\ \mathbf{v} & = (1, 1, 1, 1, 1) \nonumber \\ k & = 2 \end{aligned} \]

Then,

\[ \begin{aligned} \mathbf{u} + \mathbf{v} & = (1 + 1, 2 + 1, 3+ 1, 4 + 1, 5+ 1) = (2, 3, 4, 5, 6)\nonumber \\ k \mathbf{u} & = (2 \times 1, 2 \times 2, 2 \times 3, 2 \times 4, 2 \times 5) = (2, 4, 6, 8, 10) \nonumber \\ k \mathbf{v} & = (2 \times 1,2 \times 1,2 \times 1,2 \times 1,2 \times 1) = (2, 2, 2, 2, 2) \end{aligned} \]

Inner product

\[ \begin{aligned} \mathbf{u} \cdot \mathbf{v} &= u_{1} v_{1} + u_{2}v_{2} + \ldots + u_{n} v_{n} \nonumber \\ & = \sum_{i=1}^{N} u_{i} v_{i} \nonumber \end{aligned} \]

Inner product

Suppose \(\mathbf{u} = (1, 2, 3)\) and \(\mathbf{v} = (2, 3, 1)\). Then:

\[ \begin{aligned} \mathbf{u} \cdot \mathbf{v} & = 1 \times 2 + 2 \times 3 + 3 \times 1 \\ & = 2+ 6 + 3 \\ & = 11 \end{aligned} \]

Calculating vector length

Vector norm

\[ \begin{aligned} \| \mathbf{v}\| & = (\mathbf{v} \cdot \mathbf{v} )^{1/2} \\ & = (v_{1}^2 + v_{2}^{2} + v_{3}^{2} + \ldots + v_{n}^{2} )^{1/2} \end{aligned} \]

  • Vector norm of a three-dimensional vector \(\mathbf{x} = (1,1,1)\)

    \[ \begin{aligned} \| \mathbf{x}\| & = (\mathbf{x} \cdot \mathbf{x} )^{1/2} \\ & = (x_{1}^2 + x_{2}^{2} + x_{3}^{2})^{1/2} \\ & = (1 + 1 + 1)^{1/2} \\ &= \sqrt{3} \end{aligned} \]

Example: text analysis

 a abandoned abc ability able about above abroad absorbed absorbing abstract
43         0   0       0    0    10     0      0        0         0        1
  • \((43,0,0,0,0,10,\dots)\)

Example: text analysis

\[ \begin{aligned} \text{Doc1} & = (1, 1, 3, \ldots, 5) \\ \text{Doc2} & = (2, 0, 0, \ldots, 1) \\ \textbf{Doc1}, \textbf{Doc2} & \in \Re^{M} \end{aligned} \]

Inner product

\[ \begin{aligned} \textbf{Doc1} \cdot \textbf{Doc2} & = (1, 1, 3, \ldots, 5) (2, 0, 0, \ldots, 1)' \\ & = 1 \times 2 + 1 \times 0 + 3 \times 0 + \ldots + 5 \times 1 \\ & = 7 \end{aligned} \]

Length

\[ \begin{aligned} \| \textbf{Doc1} \| & \equiv \sqrt{ \textbf{Doc1} \cdot \textbf{Doc1} } \\ & = \sqrt{(1, 1, 3, \ldots , 5) (1, 1, 3, \ldots, 5)' } \\ & = \sqrt{1^{2} +1^{2} + 3^{2} + 5^{2} } \\ & = 6 \end{aligned} \]

Cosine

\[ \begin{aligned} \cos (\theta) & \equiv \left(\frac{\textbf{Doc1} \cdot \textbf{Doc2}}{\| \textbf{Doc1}\| \|\textbf{Doc2} \|} \right) \\ & = \frac{7} { 6 \times 2.24} \\ & = 0.52 \end{aligned} \]

Measuring similarity

  • Why it is useful
  • What properties show a similarity measure have?
    • The maximum should be the document with itself
    • The minimum should be documents which have no words in common (i.e. orthogonal)
    • Increasing when more of the same words are used
    • Normalize for document length

Measure 1: inner product

\[(2,1) \cdot (1,4) = 6\]

Measure 1: inner product

\[(4,2) \cdot (1,4) = 12\]

Measure 2: cosine similarity

\[ \begin{aligned} (4,2) \cdot (1,4) &= 12 \\ \mathbf{a} \cdot \mathbf{b} &= \|\mathbf{a} \| \times \|\mathbf{b} \| \times \cos(\theta) \end{aligned} \]

Measure 2: cosine similarity

\[ \begin{aligned} \cos (\theta) & \equiv \left(\frac{\textbf{Doc1} \cdot \textbf{Doc2}}{\| \textbf{Doc1}\| \|\textbf{Doc2} \|} \right) \\ & = \frac{(2, 1) \cdot (1, 4)} {\| (2,1)\| \| (1,4) \|} \\ & = \frac{6} {(\sqrt{2^2 + 1^2}) (\sqrt{1^2 + 4^2})} \\ & = \frac{6} {(\sqrt{5}) (\sqrt{17})} \\ & \approx 0.65 \end{aligned} \]

\[ \begin{aligned} \cos (\theta) & \equiv \left(\frac{\textbf{Doc3} \cdot \textbf{Doc2}}{\| \textbf{Doc3}\| \|\textbf{Doc2} \|} \right) \\ & = \frac{(4,2) \cdot (1, 4)} {\| (24,2)\| \| (1,4) \|} \\ & = \frac{12} {(\sqrt{4^2 + 2^2}) (\sqrt{1^2 + 4^2})} \\ & = \frac{12} {(\sqrt{20}) (\sqrt{17})} \\ & \approx 0.65 \end{aligned} \]

Matricies

  • Rectangular arrangement (array) of numbers defined by two axes
    1. Rows
    2. Columns

      \[ \mathbf{A} = \begin{array}{rrrr} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \\ \end{array} \]

      If \(\mathbf{A}\) has \(m\) rows \(n\) columns we will say that \(\mathbf{A}\) is an \(m \times n\) matrix. Suppose \(\mathbf{X}\) and \(\mathbf{Y}\) are \(m \times n\) matrices. Then \(\mathbf{X} = \mathbf{Y}\) if \(x_{ij} = y_{ij}\) for all \(i\) and \(j\).

Simple examples

\[ \mathbf{X} = \left[ \begin{array}{rrr} 1 & 2 & 3 \\ 2 & 1 & 4 \\ \end{array} \right] \]

\[ \mathbf{Y} = \left[ \begin{array}{rr} 1 & 2 \\ 3 & 2 \\ 1 & 4 \\ \end{array} \right] \]

Matrix addition

Suppose \(\mathbf{X}\) and \(\mathbf{Y}\) are \(m \times n\) matrices. Then:

\[ \begin{aligned} \mathbf{X} + \mathbf{Y} & = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \\ \end{pmatrix} + \begin{pmatrix} y_{11} & y_{12} & \ldots & y_{1n} \\ y_{21} & y_{22} & \ldots & y_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ y_{m1} & y_{m2} & \ldots & y_{mn} \\ \end{pmatrix} \nonumber \\ & = \begin{pmatrix} x_{11} + y_{11} & x_{12} + y_{12} & \ldots & x_{1n} + y_{1n} \\ x_{21} + y_{21} & x_{22} + y_{22} & \ldots & x_{2n} + y_{2n} \\ \vdots & \vdots & \ddots & \vdots\\ x_{m1} + y_{m1} & x_{m2} + y_{m2} & \ldots & x_{mn} + y_{mn} \\ \end{pmatrix} \end{aligned} \]

Scalar addition

Suppose \(\mathbf{X}\) is an \(m \times n\) matrix and \(k \in \Re\). Then:

\[ \begin{aligned} k \mathbf{X} & = \begin{pmatrix} k x_{11} & k x_{12} & \ldots & k x_{1n} \\ k x_{21} & k x_{22} & \ldots & k x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ k x_{m1} & k x_{m2} & \ldots & k x_{mn} \\ \end{pmatrix} \end{aligned} \]

Transposition

\[ \begin{aligned} \mathbf{X} & = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \\ \end{pmatrix} \\ \mathbf{X}' & = \begin{pmatrix} x_{11} & x_{21} & \ldots & x_{m1} \\ x_{12} & x_{22} & \ldots & x_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{1n} & x_{2n} & \ldots & x_{mn} \end{pmatrix} \end{aligned} \]

  • If \(\mathbf{X}\) is an \(m \times n\) then \(\mathbf{X}'\) is \(n \times m\)
  • If \(\mathbf{X} = \mathbf{X}'\) then we say \(\mathbf{X}\) is symmetric

    \[\mathbf{X} = \begin{pmatrix} 4 & 1 & 2 \\1 & 2 & 3 \\ \end{pmatrix} \quad \mathbf{X}^{'} = \begin{pmatrix} 4 & 1 \\ 1 & 2 \\ 2 & 3 \\ \end{pmatrix}\]

Matrix multiplication

\[\mathbf{X} = \begin{pmatrix} 1 & 1 \\ 1& 1 \\ \end{pmatrix} , \quad \mathbf{Y} = \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ \end{pmatrix} \]

Create a new matrix \(\mathbf{A}\) by matrix multiplication:

\[ \begin{aligned} \mathbf{A} & = \mathbf{X} \mathbf{Y} \\ & = \begin{pmatrix} 1 & 1 \\ 1 & 1 \\ \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 3 & 4 \\ \end{pmatrix} \\ &= \begin{pmatrix} 1 \times 1 + 1 \times 3 & 1 \times 2 + 1 \times 4 \\ 1 \times 1 + 1 \times 3 & 1 \times 2 + 1 \times 4\\ \end{pmatrix} \\ &= \begin{pmatrix} 4 & 6 \\ 4 & 6 \end{pmatrix} \end{aligned} \]

Algebraic properties

  • Matricies must be conformable
  • Associative property: \((\mathbf{XY})\mathbf{Z} = \mathbf{X}(\mathbf{YZ})\)
  • Additive distributive property: \((\mathbf{X} + \mathbf{Y})\mathbf{Z} = \mathbf{XZ} + \mathbf{YZ}\)
  • Zero property: \(\mathbf{X0} = 0\)
  • Order matters: \(\mathbf{XY} \neq \mathbf{YX}\)
    • Different from scalar multiplication: \(xy = yx\)

What is deep learning?

Learning representations from data

The “deep” in deep learning

How is deep learning used

  • Self-driving cars
  • Voice activated assistants
  • Automatic machine translation
  • Image recognition
  • Detection of diseases

Tensor operations

  • Generalizations of matrix operations
  • Add tensors
  • Multiple tensors
  • If you can do it with a matrix, you can do it with a tensor

Geometric interpretation

Here is the linear algebra

\[\text{output} = \text{relu}([W \cdot \text{input}] + b)\]

  • input - the input data
  • dot() - dot product (tensor multiplication)
  • relu() - element-wise operations
    • Operations applied independently to each entry (cell/value) in the tensor
    • Highly scalable
  • \(W, b\) - tensors that are attributes of the layer
    • Weights
    • Parameters