Introduction to computational social science, notation, and functions

Computational Mathematics and Statistics Camp, University of Chicago, September 2018

Why we are here

  • Social science
    • Anthropology
    • Economics
    • History
    • Political science
    • Psychology
    • Sociology
    • Comparative human development(?)

Computational social science

Source: GivenTheData

Acquiring CSS skills

  • Computer science
  • Social science
  • Math/statistics

Mathematics

  • Purely abstract
  • Based on axioms that are independent from the real world
  • If the axioms are accepted, mathematical inferences are certain
  • Language for expressing structure and relationships
  • Generally proof-based

Probability

  • Systematic and rigorous method for treating uncertainty
  • “Mathematical models of uncertain reality”
  • A branch of “applied mathematics”

Statistics

  • Practice or science of collecting and analyzing numerical data in large quantities, especially for the purpose of inferring proportions in a whole from those in a sample
  • Making inferences from data that are not entirely certain
  • Based on mathematical models, but with deviations from the model (not deterministic)

Their uses

  • Mathematical models
    • Game theory
    • Formal theory
    • Much bigger in economics
    • Defining statistical models and relationships
  • Probability/statistics
    • Establishing a structure for relationships between variables using data
    • Inferring relationships and assessing their validity

Goals for the camp

  • Computational Mathematics and Statistics Camp
  • Survey math and statistical tools that are foundational to CSS
  • Review common mathematical notation
  • Apply math/statistics methods
  • Prepare students to pass the MACSS math/stats placement exam

Course logistics

All course materials are located on GitHub

Course staff

  • Me
  • Teaching assistants
    • Sanja Miklin
    • Nora Nickels

Prerequisites

  • No formal prerequisites
  • Students most likely to succeed have seen
    • Linear algebra
    • Calculus
    • Probability theory
    • Statistical inference
  • We assume prior exposure to these topics

Alternatives to this camp

  • SOSC 30100 - Mathematics for Social Sciences (aka the Hansen math camp)
  • MACS 30100 - Mathematics and Statistics for Computational Social Science
  • Departmental methods sequences
  • Math/stats departments

Evaluation

  • Pass/fail (no credit)
  • Grades no longer matter (or should not)
  • Learn as much material as possible
  • If you truly only care about learning material, you’ll get amazing grades

Pedagogical approach

  • Flipped-classroom design
  • Instructional staff serve as your “sherpa,” guiding you along your journey

Computational tools for the future

  • Vision is open-source
  • Shift away from proprietary formats (e.g. SPSS, Stata, SAS)
  • Emphasis on reproducibility
    • Code
    • Results
    • Analysis
    • Publication

Programming languages

  • Python
  • R
  • Things R does well
    • Statistical analysis
    • Data visualization
  • Things R does not do as well
    • Speed
  • Things Python does well
  • Things Python does not do as well
    • Visualizations
    • Add-on libraries

Version control

  • Revisions in research
  • Tracking revisions
  • Multiple copies
    • analysis-1.r
    • analysis-2.r
    • analysis-3.r
  • Cloud storage (e.g. Dropbox, Google Drive, Box)
  • Version control software
    • Repository
    • Git

Move away from WYSIWYG

  • What You See Is What You Get
  • Microsoft Word or Google Docs
  • Not a reproducible format

Notebooks

  • Integrate code, output, and written text
  • Reproducible
    • Rerun the notebook to regenerate all the output
  • Good for prototyping and exploratory data analysis
  • For Python - Jupyter Notebooks
  • For R - R Markdown

\(\LaTeX\)

  • High-quality typesetting system
  • De facto standard for production of technical and scientific documentation
  • Free software
  • Renders documents as PDFs
  • Makes typesetting easy

    \[f(x) = \frac{\exp(-\frac{(x - \mu)^2}{2\sigma^2} )}{ \sqrt{2\pi \sigma^2}}\]

  • Tables/figures/general typesetting/nice presentations - easier in \(\LaTeX\)
  • Steep learning curve up front, but leads to big dividends later

Markdown

  • Lightweight markup language with plain text formatting syntax
  • Easy to convert to HTML, PDF, and more
  • Commonly used in GitHub documentation, Jupyter Notebooks, R Markdown, and more
  • Simplified syntax compared to \(\LaTeX\) - also less flexibility
  • Publishing formats

How you will acquire these skills

Irony - you won’t learn any of that here. No time to teach programming AND math AND statistics in 3 weeks.

  • Python
  • R
  • Git
  • \(\LaTeX\)/Markdown/notebooks

Why is math important to social science

  • Consistent language to communicate ideas in an orderly and systematic way
  • Science uses highly precise language that is not easily interpretable to outsiders
  • Mathematics is an effective way to describe our world
  • Mathematical notation lets us convey precision and minimizes the risk of misinterpretation by other scholars

Example: Rational voter theory

  • Rational voters should weigh the rewards vs. costs of voting
  • Mathematical notation

    \[R = PB - C\]

    • \(R =\) the utility (satisfaction) the voter gets from voting
    • \(P =\) the actual probability that the voter will affect the outcome with her particular vote
    • \(B =\) the perceived difference in benefits between the two candidates measured in utiles (units of utility)
    • \(C =\) the actual cost of voting in utiles (e.g. time, effort, money)
  • Does the model seem reasonable?
  • What implications does the model provide?
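
One way to explore the model is to plug numbers into \(R = PB - C\). The sketch below is only illustrative: the function name `voting_utility` and the values of \(P\), \(B\), and \(C\) are assumptions made up for this example, not estimates from any study.

```python
def voting_utility(p: float, b: float, c: float) -> float:
    """Return R = P*B - C, the net utility of voting (in utiles)."""
    return p * b - c


# Hypothetical values chosen only for illustration:
# a one-in-a-million chance of deciding the election, a large perceived
# benefit, and a small cost of voting.
r = voting_utility(p=1e-6, b=1000.0, c=1.0)

print(f"R = {r:.3f}")
print("vote" if r > 0 else "abstain")
```

With a very small \(P\), even a large \(B\) is outweighed by a modest \(C\), so \(R\) is negative under these assumed values.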

| Implication | Formal statement |
| --- | --- |
| If individuals do not get enough benefit from voting, they will abstain. | The voter will abstain if \(R < 0\). |
| Individuals have other things to do on election day (like going to work). If the benefit of voting is not as large as alternative benefits, then individuals will abstain. | The voter may still not vote even if \(R > 0\) if there exist other competing activities that produce a higher \(R\). |
| Most elections have thousands, if not millions, of ballots cast. There is no point to voting since any individual ballot is unlikely to change the outcome of the election. Therefore everyone should abstain. | If \(P\) is very small, then it is unlikely that this individual will vote. |

  • Leads to the paradox of voting
  • Clearly a question worth answering in political science
  • Pure mathematical model may not be fully accurate, but allows us to delve deeper into the paradox

Equations

  • An equation “equates” two quantities - they are arithmetically identical
    • \(R = PB - C\)
    • \(R\) and \(PB - C\) are exactly equal to one another
  • Mathematical statements do not need to be equalities
    • Inequalities: greater than/less than (\(>\), \(<\))
    • Approximations: approximately equal (\(\approx\))

Functions

  • A mapping that assigns each input value to exactly one output value
  • Mapping from one defined space to another, such as \(f \colon \Re \rightarrow \Re\)

    \[f(x) = x^2 - 1\]

    • Maps \(x\) to \(f(x)\) by squaring \(x\) and subtracting 1
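
The same mapping can be written as a small Python function. This is a minimal sketch; the input values in the loop are arbitrary choices for illustration.

```python
def f(x: float) -> float:
    """Map x to f(x) by squaring x and subtracting 1."""
    return x ** 2 - 1


# Each input produces exactly one output
for x in [-2, -1, 0, 1, 2]:
    print(x, "->", f(x))
```

Note that two different inputs (such as \(-2\) and \(2\)) can share an output. That is still a function, since each input has only one output.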

Not a function

  • All functions are relations, but not all relations are functions
  • Functions have exactly one value returned by \(f(x)\) for each value of \(x\)
  • Relations may have more than one value returned
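
The distinction can be checked mechanically for a finite set of \((x, y)\) pairs. The helper `is_function` below is a hypothetical name introduced for this sketch, and the two example relations are chosen only for illustration.

```python
def is_function(pairs) -> bool:
    """Return True if no input x is paired with two different outputs."""
    seen = {}
    for x, y in pairs:
        if x in seen and seen[x] != y:
            return False  # same x, different y: a relation, not a function
        seen[x] = y
    return True


# y = x^2 on a few integers: every x has exactly one y
square = [(x, x ** 2) for x in range(-3, 4)]

# Points on the unit circle x^2 + y^2 = 1: x = 0 has two y values
circle = [(0, 1), (0, -1), (1, 0), (-1, 0)]

print(is_function(square))
print(is_function(circle))
```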

Two major properties of functions

\[f(x) = y\]

  1. Continuous
  2. Invertible

    \[f^{-1}(y) = x, \text{where } f^{-1}(f(x)) = x\]

Not all functions are continuous and invertible

\[ f(x) = \left\{ \begin{array}{ll} \frac{1}{x} & \quad x \neq 0 \text{ and } x \text{ is rational}\\ 0 & \quad \text{otherwise} \end{array} \right. \]

Why this is important

  • A function must be continuous at a point to have a derivative there; it must be invertible to solve \(f(x) = y\) uniquely for \(x\)
  • Important for optimization and solving for parameter values in modeling strategies
  • Non-continuous functions - not much can be done about these
  • Non-invertible functions
    • Can make invertible by restricting the domain

Restrict the domain
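
As a minimal sketch of restricting the domain, take \(f(x) = x^2\) as an assumed example. On all real numbers it has no inverse, because \(-3\) and \(3\) map to the same value. Restricting the domain to \(x \geq 0\) makes the square root an inverse.

```python
import math


def f(x: float) -> float:
    """f(x) = x^2, defined on all real numbers."""
    return x ** 2


def f_inverse(y: float) -> float:
    """Inverse of f when the domain is restricted to x >= 0."""
    return math.sqrt(y)


# Two different inputs share one output, so f is not invertible on all reals
print(f(-3) == f(3))

# On the restricted domain x >= 0, the inverse recovers the input
print(f_inverse(f(3)))

# Outside the restricted domain, the input is not recovered
print(f_inverse(f(-3)))
```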

Logarithms and exponents

  • Important component to many mathematical and statistical methods in social science
  • Exponents - repeatedly multiply a number by itself
  • Logarithms - reverse of an exponent

Common rules of exponents

  • \(x^0 = 1\) (for \(x \neq 0\))
  • \(x^1 = x\)
  • \(\left ( \frac{x}{y} \right )^a = \left ( \frac{x^a}{y^a}\right ) = x^a y^{-a}\)
  • \((x^a)^b = x^{ab}\)
  • \((xy)^a = x^a y^a\)
  • \(x^a \times x^b = x^{a+b}\)
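
These rules can be spot-checked numerically. The sketch below uses arbitrary positive values for \(x\), \(y\), \(a\), and \(b\), and compares with `math.isclose` because floating-point arithmetic is only approximate. A numerical check illustrates the rules; it does not prove them.

```python
import math

x, y, a, b = 2.0, 5.0, 3.0, 4.0  # arbitrary positive values for illustration

checks = {
    "x^0 = 1": math.isclose(x ** 0, 1.0),
    "x^1 = x": math.isclose(x ** 1, x),
    "(x/y)^a = x^a / y^a": math.isclose((x / y) ** a, x ** a / y ** a),
    "(x/y)^a = x^a * y^(-a)": math.isclose((x / y) ** a, x ** a * y ** (-a)),
    "(x^a)^b = x^(ab)": math.isclose((x ** a) ** b, x ** (a * b)),
    "(xy)^a = x^a * y^a": math.isclose((x * y) ** a, x ** a * y ** a),
    "x^a * x^b = x^(a+b)": math.isclose(x ** a * x ** b, x ** (a + b)),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
```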

Logarithms

  • Class of functions
    • \(\log_{b}(x) = a \Rightarrow b^a = x\)
    • What number \(a\) solves \(b^a = x\)
  • Commonly used bases
    • Base 10

      \[\log_{10}(100) = 2 \Rightarrow 10^2 = 100\]

      \[\log_{10}(0.1) = -1 \Rightarrow 10^{-1} = 0.1\]
    • Base 2

      \[\log_{2}(8) = 3 \Rightarrow 2^3 = 8\]

      \[\log_{2}(1) = 0 \Rightarrow 2^0 = 1\]
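
The examples above can be reproduced in Python. This sketch defines a hypothetical helper, `log_base`, using the change-of-base formula \(\log_b(x) = \log(x) / \log(b)\). Python's standard library also provides `math.log10` and `math.log2` for these two bases.

```python
import math


def log_base(x: float, b: float) -> float:
    """Return the exponent a that solves b**a == x (change-of-base formula)."""
    return math.log(x) / math.log(b)


examples = [(100, 10), (0.1, 10), (8, 2), (1, 2)]
for x, b in examples:
    a = log_base(x, b)
    # Raising the base to the computed exponent should recover x
    print(f"log_{b}({x}) = {a:.4f}; {b}^{a:.4f} = {b ** a:.4f}")
```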

Logarithms

  • Base \(e\) - Euler’s number (a logarithm with this base is called a natural logarithm)

    \[\log_{e}(e) = 1 \Rightarrow e^1 = e\]

    • Natural logarithms are incredibly useful in math
    • Often \(\log()\) is assumed to be a natural log
    • Also seen as \(\ln()\)
  • Rules of logarithms
    • \(\log_b(1) = 0\)
    • \(\log(x \times y) = \log(x) + \log(y)\)
    • \(\log(\frac{x}{y}) = \log(x) - \log(y)\)
    • \(\log(x^y) = y \log(x)\)
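
A quick numerical check of these rules, using natural logarithms (`math.log`) and arbitrary positive values for \(x\) and \(y\). The values are assumptions made for illustration, and the comparisons use `math.isclose` to allow for floating-point rounding.

```python
import math

x, y = 8.0, 2.5  # arbitrary positive values; logarithms require positive inputs

rules = {
    "log(1) = 0": math.isclose(math.log(1.0), 0.0, abs_tol=1e-12),
    "log(e) = 1": math.isclose(math.log(math.e), 1.0),
    "log(xy) = log(x) + log(y)": math.isclose(
        math.log(x * y), math.log(x) + math.log(y)
    ),
    "log(x/y) = log(x) - log(y)": math.isclose(
        math.log(x / y), math.log(x) - math.log(y)
    ),
    "log(x^y) = y log(x)": math.isclose(math.log(x ** y), y * math.log(x)),
}

for rule, holds in rules.items():
    print(f"{rule}: {holds}")
```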