Data Mining and Analysis: Fundamental Concepts and Algorithms

Size: px
Start display at page:

Download "Data Mining and Analysis: Fundamental Concepts and Algorithms"

Transcription

1 Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA Department of Computer Science Universidade Federal de Minas Gerais, Belo Horizonte, Brazil Chapter 1: Data Mining and Analysis Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 1 / 3

2 Data Matrix Data can often be represented or abstracted as an n d data matrix, with n rows and d columns, given as X 1 X X d x 1 x 11 x 1 x 1d D = x x 1 x x d x n x n1 x n x nd Rows: Also called instances, examples, records, transactions, objects, points, feature-vectors, etc. Given as a d-tuple x i = (x i1, x i,...,x id ) Columns: Also called attributes, properties, features, dimensions, variables, fields, etc. Given as an n-tuple X j = (x 1j, x j,...,x nj ) Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis / 3

3 Iris Dataset Extract Sepal Sepal Petal Petal Class length width length width X 1 X X 3 X 4 X 5 x Iris-versicolor x Iris-versicolor x Iris-versicolor x Iris-setosa x Iris-versicolor x Iris-setosa x Iris-virginica x Iris-virginica x Iris-virginica x Iris-setosa Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 3 / 3

4 Attributes Attributes may be classified into two main types Numeric Attributes: real-valued or integer-valued domain Interval-scaled: only differences are meaningful e.g., temperature Ratio-scaled: differences and ratios are meaningful e..g,age Categorical Attributes: set-valued domain composed of a set of symbols Nominal: only equality is meaningful e.g., domain(sex) = {M, F} Ordinal: both equality (are two values the same?) and inequality (is one value less than another?) are meaningful e.g., domain(education) = {High School,BS,MS,PhD} Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 4 / 3

5 Data: Algebraic and Geometric View For numeric data matrix D, each row or point is a d-dimensional column vector: x i1 x i x i =. = ( x i1 x i ) T x id R d x id whereas each column or attribute is a n-dimensional column vector: X j = ( x 1j x j x nj ) T R n X3 X x 1 = (5.9, 3.0) T 3 x1 = (5.9, 3.0, 4.) T (a) R X 1 X (b) R 3 X Figure: Projections of x 1 = (5.9, 3.0, 4., 1.5) T in D and 3D Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 5 / 3

6 Scatterplot: D Iris Dataset sepal length versussepal width. Visualizing Iris dataset as points/vectors in D Solid circle shows the mean point X: sepal width X 1 : sepal length Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 6 / 3

7 Numeric Data Matrix If all attributes are numeric, then the data matrix D is an n d matrix, or equivalently a set of n row vectors x T i R d or a set of d column vectors X j R n x 11 x 1 x 1d x T 1 x 1 x x d D = = x T. = X 1 X X d x n1 x n x nd x T n The mean of the data matrix D is the average of all the points: mean(d) = µ = 1 n x i n i=1 The centered data matrix is obtained by subtracting the mean from all the points: x T 1 µ T x T 1 µ T z T 1 x T Z = D 1 µ T =. µ T. = x T µ T. = z T. x T n µ T x T n µ T z T n (1) where z i = x i µ is a centered point, and 1 R n is the vector of ones. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 7 / 3

8 Norm, Distance and Angle Given two points a, b R m, their dot product is defined as the scalar a T b = a 1 b 1 + a b + +a mb m m = a i b i i=1 The Euclidean norm or length of a vector a is defined as a = a T a = m i=1 The unit vector in the direction of a is u = a with a = 1. a a i Distance between a and b is given as a b = m (a i b i ) i=1 Angle between a and b is given as ( ) T ( ) cosθ = at b a b a b = a b X (1, 4) b θ a a b (5, 3) X1 Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 8 / 3

9 Orthogonal Projection Two vectors a and b are orthogonal iff a T b = 0, i.e., the angle between them is 90. Orthogonal projection of b on a comprises the vector p = b parallel to a, and r = b perpendicular or orthogonal to a, given as b = b + b = p+r where X p = b = ( a T ) b a T a a 4 b 3 r = b a 1 p = b Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 9 / 3 X 1

10 Projection of Centered Iris Data Onto a Line l. l X X 1 Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 10 / 3

11 Data: Probabilistic View The sample space of A is O = [4.3, 7.9], and its range is {0, 1}. Thus, A is discrete. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 11 / 3 A random variable X is a function X: O R, where O is the set of all possible outcomes of the experiment, also called the sample space. A discrete random variable takes on only a finite or countably infinite number of values, whereas a continuous random variable if it can take on any value in R. By default, a numeric attribute X j is considered as the identity random variable given as X(v) = v for all v O. Here O = R. Discrete Variable: Long Sepal Length Define random variable A, denoting long sepal length (7cm or more) as follows: { 0 if v < 7 A(v) = 1 if v 7

12 Probability Mass Function If X is discrete, the probability mass function of X is defined as f(x) = P(X = x) for all x R f must obey the basic rules of probability. That is, f must be non-negative: f(x) 0 and the sum of all probabilities should add to 1: f(x) = 1 x Intuitively, for a discrete variable X, the probability is concentrated or massed at only discrete values in the range of X, and is zero for all other values. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 1 / 3

13 Sepal Length: Bernoulli Distribution Iris Dataset Extract: sepal length (in centimeters) Define random variable A as follows: A(v) = { 0 if v < 7 1 if v 7 We find that only 13 Irises have sepal length of at least 7 cm. Thus, the probability mass function of A can be estimated as: f(1) = P(A = 1) = = = p and f(0) = P(A = 0) = 137 = = 1 p 150 A has a Bernoulli distribution with parameter p [0, 1], which denotes the probability of a success, that is, the probability of picking an Iris with a long sepal length at random from the set of all points. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 13 / 3

14 Sepal Length: Binomial Distribution Define discrete random variable B, denoting the number of Irises with long sepal length in m independent Bernoulli trials with probability of success p. In this case, B takes on the discrete values [0, m], and its probability mass function is given by the Binomial distribution f(k) = P(B = k) = ( m k ) p k (1 p) m k Binomial distribution for long sepal length (p = 0.087) for m = 10 trials P(B=k) k Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 14 / 3

15 Probability Density Function If X is continuous, the probability density function of X is defined as P ( X [a, b] ) = b a f(x) dx f must obey the basic rules of probability. That is, f must be non-negative: f(x) 0 and the sum of all probabilities should add to 1: f(x) dx = 1 Note that P(X = v) = 0 for all v R since there are infinite possible values in the sample space. What it means is that the probability mass is spread so thinly over the range of values that it can be measured only over intervals [a, b] R, rather than at specific points. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 15 / 3

16 Sepal Length: Normal Distribution We model sepal length via the Gaussian or normal density function, given as { } 1 (x µ) f(x) = exp πσ σ where µ = 1 n n i=1 x i is the mean value, and σ = 1 n n i=1 (x i µ) is the variance. Normal distribution for sepal length: µ = 5.84, σ = f(x) 0.5 µ±ǫ x Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 16 / 3

17 Cumulative Distribution Function For random variable X, its cumulative distribution function (CDF) F : R [0, 1], gives the probability of observing a value at most some given value x: F(x) = P(X x) for all < x < When X is discrete, F is given as F(x) = P(X x) = u x f(u) CDF for binomial distribution (p = 0.087, m = 10) F(x) CDF for the normal distribution (µ = 5.84,σ = 0.681) F(x) x When X is continuous, F is given as F(x) = P(X x) = x f(u) du (µ, F(µ)) = (5.84, 0.5) x Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 17 / 3

18 Bivariate Random Variable: Joint Probability Mass Function Iris: joint PMF for long sepal length and Define discrete random variables sepal width { 1 ifv 7 f (0, 0) = P(X 1 = 0, X = 0) = 116/150 = long sepal length:x 1 (v) = 0 otherwise f (0, 1) = P(X 1 = 0, X = 1) = 1/150 = { f (1, 0) = P(X 1 = 1, X = 0) = 10/150 = ifv 3.5 long sepal width:x (v) = f (1, 1) = P(X 0 otherwise 1 = 1, X = 1) = 3/150 = 0.00 The bivariate random variable ( ) X1 X = X has the joint probability mass function f(x) = P(X = x) i.e., f(x 1, x ) = P(X 1 = x 1, X = x ) X f(x) X Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 18 / 3

19 Bivariate Random Variable: Probability Density Function Bivariate Normal: modeling joint distribution for long sepal length (X 1 ) and sepal width (X ) 1 f(x µ,σ) = π exp Σ (x µ)t Σ 1 (x µ) Bivariate Normal µ = (5.843, 3.054) T ( ) Σ = where µ and Σ specify the D mean and covariance matrix: ( ) µ = (µ 1,µ ) T σ Σ = 1 σ 1 σ 1 σ f(x) with mean µ i = 1 n n k=1 x ki and covariance σ ij = 1 n k=1 (x ki µ i )(x kj µ j ). Also, X1 σi = σ ii X Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 19 / 3

20 Random Sample and Statistics Given a random variable X, a random sample of size n from X is defined as a set of n independent and identically distributed (IID) random variables S 1, S,...,S n The S i s have the same probability distribution as X, and are statistically independent. Two random variables X 1 and X are (statistically) independent if, for every W 1 R and W R, we have which also implies that P(X 1 W 1 and X W ) = P(X 1 W 1 ) P(X W ) F(x) = F(x 1, x ) = F 1 (x 1 ) F (x ) f(x) = f(x 1, x ) = f 1 (x 1 ) f (x ) where F i is the cumulative distribution function, and f i is the probability mass or density function for random variable X i. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 0 / 3

21 Multivariate Sample Given dataset D, the n data points x i (with 1 i n) constitute a d-dimensional multivariate random sample drawn from the vector random variable X = (X 1, X,...,X d ). Since the x i are assumed to be independent and identically distributed, their joint distribution is given as f(x 1, x,...,x n ) = n f X (x i ) where f X is the probability mass or density function for X. Assuming that the d attributes X 1, X,...,X d are statistically independent, the joint distribution for the entire dataset is given as: f(x 1, x,...,x n ) = i=1 n f(x i ) = i=1 n i=1 j=1 d f Xj (x ij ) Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 1 / 3

22 Sample Statistics Let {S i } m i=1 be a random sample of size m drawn from a (multivariate) random variable X. A statistic ˆθ is a function ˆθ: (S 1, S,...,S m) R The statistic is an estimate of the corresponding population parameterθ, where the population refers to the entire universe of entities under study. The statistic is itself a random variable. The sample mean is a statistic, defined as the average ˆµ = 1 n n i=1 x i For sepal length, we have ˆµ = 5.84, which is an estimator for the (unknown) true population mean sepal length. Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis / 3

23 Sample Statistics: Variance The sample variance is a statistic ˆσ = 1 n n (x i µ) i=1 For sepal length, we have ˆσ = The total variance is a multivariate statistic var(d) = 1 n n x i µ For the Iris data (with 4 attributes: sepal length and width, petal length and width), we have var(d) = i=1 Zaki & Meira Jr. (RPI and UFMG) Data Mining and Analysis Chapter 1: Data Mining and Analysis 3 / 3

Data Mining and Analysis

Data Mining and Analysis 978--5-766- - Data Mining and Analysis: Fundamental Concepts and Algorithms CHAPTER Data Mining and Analysis Data mining is the process of discovering insightful, interesting, and novel patterns, as well

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki Wagner Meira Jr. Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA Department

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms : Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA 2 Department of Computer

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification

ISyE 6416: Computational Statistics Spring Lecture 5: Discriminant analysis and classification ISyE 6416: Computational Statistics Spring 2017 Lecture 5: Discriminant analysis and classification Prof. Yao Xie H. Milton Stewart School of Industrial and Systems Engineering Georgia Institute of Technology

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Classification Methods II: Linear and Quadratic Discrimminant Analysis

Classification Methods II: Linear and Quadratic Discrimminant Analysis Classification Methods II: Linear and Quadratic Discrimminant Analysis Rebecca C. Steorts, Duke University STA 325, Chapter 4 ISL Agenda Linear Discrimminant Analysis (LDA) Classification Recall that linear

More information

Basics on Probability. Jingrui He 09/11/2007

Basics on Probability. Jingrui He 09/11/2007 Basics on Probability Jingrui He 09/11/2007 Coin Flips You flip a coin Head with probability 0.5 You flip 100 coins How many heads would you expect Coin Flips cont. You flip a coin Head with probability

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

An Introduction to Multivariate Methods

An Introduction to Multivariate Methods Chapter 12 An Introduction to Multivariate Methods Multivariate statistical methods are used to display, analyze, and describe data on two or more features or variables simultaneously. I will discuss multivariate

More information

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology

Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology Review (probability, linear algebra) CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adopted from Prof. H.R. Rabiee s and also Prof. R. Gutierrez-Osuna

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 2 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA

More information

Four Paradigms in Data Mining

Four Paradigms in Data Mining Four Paradigms in Data Mining dataminingbook.info Wagner Meira Jr. 1 1 Department of Computer Science Universidade Federal de Minas Gerais, Belo Horizonte, Brazil October 13, 2015 Meira Jr. (UFMG) Four

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

EE4601 Communication Systems

EE4601 Communication Systems EE4601 Communication Systems Week 2 Review of Probability, Important Distributions 0 c 2011, Georgia Institute of Technology (lect2 1) Conditional Probability Consider a sample space that consists of two

More information

[POLS 8500] Review of Linear Algebra, Probability and Information Theory

[POLS 8500] Review of Linear Algebra, Probability and Information Theory [POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians Engineering Part IIB: Module F Statistical Pattern Processing University of Cambridge Engineering Part IIB Module F: Statistical Pattern Processing Handout : Multivariate Gaussians. Generative Model Decision

More information

Multiple Random Variables

Multiple Random Variables Multiple Random Variables This Version: July 30, 2015 Multiple Random Variables 2 Now we consider models with more than one r.v. These are called multivariate models For instance: height and weight An

More information

Review of Probability. CS1538: Introduction to Simulations

Review of Probability. CS1538: Introduction to Simulations Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed

More information

Linear Models. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Linear Models. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Linear Models DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Linear regression Least-squares estimation

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes.

A Probability Primer. A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes. A Probability Primer A random walk down a probabilistic path leading to some stochastic thoughts on chance events and uncertain outcomes. Are you holding all the cards?? Random Events A random event, E,

More information

Contents 1. Contents

Contents 1. Contents Contents 1 Contents 6 Distributions of Functions of Random Variables 2 6.1 Transformation of Discrete r.v.s............. 3 6.2 Method of Distribution Functions............. 6 6.3 Method of Transformations................

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable Lecture Notes 1 Probability and Random Variables Probability Spaces Conditional Probability and Independence Random Variables Functions of a Random Variable Generation of a Random Variable Jointly Distributed

More information

Random variables. DS GA 1002 Probability and Statistics for Data Science.

Random variables. DS GA 1002 Probability and Statistics for Data Science. Random variables DS GA 1002 Probability and Statistics for Data Science http://www.cims.nyu.edu/~cfgranda/pages/dsga1002_fall17 Carlos Fernandez-Granda Motivation Random variables model numerical quantities

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section

More information

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator

Estimation Theory. as Θ = (Θ 1,Θ 2,...,Θ m ) T. An estimator Estimation Theory Estimation theory deals with finding numerical values of interesting parameters from given set of data. We start with formulating a family of models that could describe how the data were

More information

L2: Review of probability and statistics

L2: Review of probability and statistics Probability L2: Review of probability and statistics Definition of probability Axioms and properties Conditional probability Bayes theorem Random variables Definition of a random variable Cumulative distribution

More information

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians

University of Cambridge Engineering Part IIB Module 4F10: Statistical Pattern Processing Handout 2: Multivariate Gaussians University of Cambridge Engineering Part IIB Module 4F: Statistical Pattern Processing Handout 2: Multivariate Gaussians.2.5..5 8 6 4 2 2 4 6 8 Mark Gales mjfg@eng.cam.ac.uk Michaelmas 2 2 Engineering

More information

Random Variables Example:

Random Variables Example: Random Variables Example: We roll a fair die 6 times. Suppose we are interested in the number of 5 s in the 6 rolls. Let X = number of 5 s. Then X could be 0, 1, 2, 3, 4, 5, 6. X = 0 corresponds to the

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

Introduction to Probability and Stocastic Processes - Part I

Introduction to Probability and Stocastic Processes - Part I Introduction to Probability and Stocastic Processes - Part I Lecture 2 Henrik Vie Christensen vie@control.auc.dk Department of Control Engineering Institute of Electronic Systems Aalborg University Denmark

More information

Multivariate Random Variable

Multivariate Random Variable Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate

More information

Lecture 1 October 9, 2013

Lecture 1 October 9, 2013 Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/

More information

Statistics for scientists and engineers

Statistics for scientists and engineers Statistics for scientists and engineers February 0, 006 Contents Introduction. Motivation - why study statistics?................................... Examples..................................................3

More information

Recitation. Shuang Li, Amir Afsharinejad, Kaushik Patnaik. Machine Learning CS 7641,CSE/ISYE 6740, Fall 2014

Recitation. Shuang Li, Amir Afsharinejad, Kaushik Patnaik. Machine Learning CS 7641,CSE/ISYE 6740, Fall 2014 Recitation Shuang Li, Amir Afsharinejad, Kaushik Patnaik Machine Learning CS 7641,CSE/ISYE 6740, Fall 2014 Probability and Statistics 2 Basic Probability Concepts A sample space S is the set of all possible

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

Naïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 3 September 14, Readings: Mitchell Ch Murphy Ch.

Naïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 3 September 14, Readings: Mitchell Ch Murphy Ch. School of Computer Science 10-701 Introduction to Machine Learning aïve Bayes Readings: Mitchell Ch. 6.1 6.10 Murphy Ch. 3 Matt Gormley Lecture 3 September 14, 2016 1 Homewor 1: due 9/26/16 Project Proposal:

More information

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations

Chapter 5. Statistical Models in Simulations 5.1. Prof. Dr. Mesut Güneş Ch. 5 Statistical Models in Simulations Chapter 5 Statistical Models in Simulations 5.1 Contents Basic Probability Theory Concepts Discrete Distributions Continuous Distributions Poisson Process Empirical Distributions Useful Statistical Models

More information

1 Review of Probability and Distributions

1 Review of Probability and Distributions Random variables. A numerically valued function X of an outcome ω from a sample space Ω X : Ω R : ω X(ω) is called a random variable (r.v.), and usually determined by an experiment. We conventionally denote

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

LEC 4: Discriminant Analysis for Classification

LEC 4: Discriminant Analysis for Classification LEC 4: Discriminant Analysis for Classification Dr. Guangliang Chen February 25, 2016 Outline Last time: FDA (dimensionality reduction) Today: QDA/LDA (classification) Naive Bayes classifiers Matlab/Python

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Chapter 2. Discrete Distributions

Chapter 2. Discrete Distributions Chapter. Discrete Distributions Objectives ˆ Basic Concepts & Epectations ˆ Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric Distributions ˆ Introduction to the Maimum Likelihood Estimation

More information

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models

System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models System Simulation Part II: Mathematical and Statistical Models Chapter 5: Statistical Models Fatih Cavdur fatihcavdur@uludag.edu.tr March 20, 2012 Introduction Introduction The world of the model-builder

More information

Introduction to Normal Distribution

Introduction to Normal Distribution Introduction to Normal Distribution Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 17-Jan-2017 Nathaniel E. Helwig (U of Minnesota) Introduction

More information

Lecture Notes 1: Vector spaces

Lecture Notes 1: Vector spaces Optimization-based data analysis Fall 2017 Lecture Notes 1: Vector spaces In this chapter we review certain basic concepts of linear algebra, highlighting their application to signal processing. 1 Vector

More information

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014 Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions

More information

The Gaussian distribution

The Gaussian distribution The Gaussian distribution Probability density function: A continuous probability density function, px), satisfies the following properties:. The probability that x is between two points a and b b P a

More information

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables.

Random vectors X 1 X 2. Recall that a random vector X = is made up of, say, k. X k. random variables. Random vectors Recall that a random vector X = X X 2 is made up of, say, k random variables X k A random vector has a joint distribution, eg a density f(x), that gives probabilities P(X A) = f(x)dx Just

More information

Slides 8: Statistical Models in Simulation

Slides 8: Statistical Models in Simulation Slides 8: Statistical Models in Simulation Purpose and Overview The world the model-builder sees is probabilistic rather than deterministic: Some statistical model might well describe the variations. An

More information

PRINCIPAL COMPONENTS ANALYSIS

PRINCIPAL COMPONENTS ANALYSIS PRINCIPAL COMPONENTS ANALYSIS Iris Data Let s find Principal Components using the iris dataset. This is a well known dataset, often used to demonstrate the effect of clustering algorithms. It contains

More information

Probability Distributions Columns (a) through (d)

Probability Distributions Columns (a) through (d) Discrete Probability Distributions Columns (a) through (d) Probability Mass Distribution Description Notes Notation or Density Function --------------------(PMF or PDF)-------------------- (a) (b) (c)

More information

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester

Physics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability

More information

Probability Distributions: Continuous

Probability Distributions: Continuous Probability Distributions: Continuous INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber FEBRUARY 28, 2017 INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Probability Distributions:

More information

Formulas for probability theory and linear models SF2941

Formulas for probability theory and linear models SF2941 Formulas for probability theory and linear models SF2941 These pages + Appendix 2 of Gut) are permitted as assistance at the exam. 11 maj 2008 Selected formulae of probability Bivariate probability Transforms

More information

Sample Spaces, Random Variables

Sample Spaces, Random Variables Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted

More information

15 Discrete Distributions

15 Discrete Distributions Lecture Note 6 Special Distributions (Discrete and Continuous) MIT 4.30 Spring 006 Herman Bennett 5 Discrete Distributions We have already seen the binomial distribution and the uniform distribution. 5.

More information

LINEAR ALGEBRA - CHAPTER 1: VECTORS

LINEAR ALGEBRA - CHAPTER 1: VECTORS LINEAR ALGEBRA - CHAPTER 1: VECTORS A game to introduce Linear Algebra In measurement, there are many quantities whose description entirely rely on magnitude, i.e., length, area, volume, mass and temperature.

More information

Bayesian Classification Methods

Bayesian Classification Methods Bayesian Classification Methods Suchit Mehrotra North Carolina State University smehrot@ncsu.edu October 24, 2014 Suchit Mehrotra (NCSU) Bayesian Classification October 24, 2014 1 / 33 How do you define

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

Lecture Notes 6: Linear Models

Lecture Notes 6: Linear Models Optimization-based data analysis Fall 17 Lecture Notes 6: Linear Models 1 Linear regression 1.1 The regression problem In statistics, regression is the problem of characterizing the relation between a

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

Lecture 2: Review of Basic Probability Theory

Lecture 2: Review of Basic Probability Theory ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent

More information

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

Definitions and Properties of R N

Definitions and Properties of R N Definitions and Properties of R N R N as a set As a set R n is simply the set of all ordered n-tuples (x 1,, x N ), called vectors. We usually denote the vector (x 1,, x N ), (y 1,, y N ), by x, y, or

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition A Brief Mathematical Review Hamid R. Rabiee Jafar Muhammadi, Ali Jalali, Alireza Ghasemi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Probability theory

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

Northwestern University Department of Electrical Engineering and Computer Science

Northwestern University Department of Electrical Engineering and Computer Science Northwestern University Department of Electrical Engineering and Computer Science EECS 454: Modeling and Analysis of Communication Networks Spring 2008 Probability Review As discussed in Lecture 1, probability

More information

STAT 730 Chapter 1 Background

STAT 730 Chapter 1 Background STAT 730 Chapter 1 Background Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 27 Logistics Course notes hopefully posted evening before lecture,

More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion

Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence

More information

In most cases, a plot of d (j) = {d (j) 2 } against {χ p 2 (1-q j )} is preferable since there is less piling up

In most cases, a plot of d (j) = {d (j) 2 } against {χ p 2 (1-q j )} is preferable since there is less piling up THE UNIVERSITY OF MINNESOTA Statistics 5401 September 17, 2005 Chi-Squared Q-Q plots to Assess Multivariate Normality Suppose x 1, x 2,..., x n is a random sample from a p-dimensional multivariate distribution

More information

Methods for sparse analysis of high-dimensional data, II

Methods for sparse analysis of high-dimensional data, II Methods for sparse analysis of high-dimensional data, II Rachel Ward May 23, 2011 High dimensional data with low-dimensional structure 300 by 300 pixel images = 90, 000 dimensions 2 / 47 High dimensional

More information

2. The Sample Mean and the Law of Large Numbers

2. The Sample Mean and the Law of Large Numbers 1 of 11 7/16/2009 6:05 AM Virtual Laboratories > 6. Random Samples > 1 2 3 4 5 6 7 2. The Sample Mean and the Law of Large Numbers The Sample Mean Suppose that we have a basic random experiment and that

More information

Statistical Data Analysis

Statistical Data Analysis DS-GA 0 Lecture notes 8 Fall 016 1 Descriptive statistics Statistical Data Analysis In this section we consider the problem of analyzing a set of data. We describe several techniques for visualizing the

More information

Intro Vectors 2D implicit curves 2D parametric curves. Graphics 2012/2013, 4th quarter. Lecture 2: vectors, curves, and surfaces

Intro Vectors 2D implicit curves 2D parametric curves. Graphics 2012/2013, 4th quarter. Lecture 2: vectors, curves, and surfaces Lecture 2, curves, and surfaces Organizational remarks Tutorials: TA sessions for tutorial 1 start today Tutorial 2 will go online after lecture 3 Practicals: Make sure to find a team partner very soon

More information

Intro Vectors 2D implicit curves 2D parametric curves. Graphics 2011/2012, 4th quarter. Lecture 2: vectors, curves, and surfaces

Intro Vectors 2D implicit curves 2D parametric curves. Graphics 2011/2012, 4th quarter. Lecture 2: vectors, curves, and surfaces Lecture 2, curves, and surfaces Organizational remarks Tutorials: Tutorial 1 will be online later today TA sessions for questions start next week Practicals: Exams: Make sure to find a team partner very

More information

Continuous Distributions

Continuous Distributions Continuous Distributions 1.8-1.9: Continuous Random Variables 1.10.1: Uniform Distribution (Continuous) 1.10.4-5 Exponential and Gamma Distributions: Distance between crossovers Prof. Tesler Math 283 Fall

More information