Descriptive Statistics

1 Descriptive Statistics
DS-GA 1002: Probability and Statistics for Data Science
Carlos Fernandez-Granda

2 Descriptive statistics
Techniques to visualize and summarize data.
They can often be interpreted within a probabilistic framework.
Often the probabilistic assumptions do not hold, but the techniques are still useful.
We describe them from a deterministic point of view.

3 Outline: Histogram; Empirical mean and variance; Order statistics; Empirical covariance; Empirical covariance matrix

4 Histogram
Technique to visualize one-dimensional data.
Bin the range of the data, then count the number of instances in each bin.
The width of the bins can be adjusted to yield higher or lower resolution.
If the data are iid, the histogram (suitably normalized) approximates their pmf or pdf.
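The binning step can be sketched in a few lines of plain Python. This is a minimal illustration, not the code used to produce the slides: the function name `histogram`, the equal-width bins, and the toy temperature values are all assumptions made for the example, and the data are assumed to contain at least two distinct values.

```python
def histogram(data, num_bins):
    """Count how many values fall in each of num_bins equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / num_bins  # assumes hi > lo
    counts = [0] * num_bins
    for x in data:
        # The maximum lies exactly on the right edge; assign it to the last bin.
        k = min(int((x - lo) / width), num_bins - 1)
        counts[k] += 1
    edges = [lo + k * width for k in range(num_bins + 1)]
    return edges, counts


# Hypothetical temperatures in degrees Celsius, for illustration only.
temps = [4.0, 5.5, 6.0, 6.5, 7.0, 7.0, 8.5, 10.0]
edges, counts = histogram(temps, 3)
print(edges)   # [4.0, 6.0, 8.0, 10.0]
print(counts)  # [2, 4, 2]
```

Calling `histogram(temps, 6)` instead halves the bin width and gives a higher-resolution view of the same data.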

5 Temperature in Oxford
[Figure: panels for January and August; axis: degrees (Celsius)]

6 GDP per capita of different countries
[Figure; axis: thousands of dollars]

8 Empirical mean
Let \(\{x_1, x_2, \ldots, x_n\}\) be a set of real-valued data.
The empirical mean is defined as
\[ \operatorname{av}(x_1, x_2, \ldots, x_n) := \frac{1}{n} \sum_{i=1}^{n} x_i \]
Temperature data: 6.73 °C in January and 21.3 °C in August.
GDP per capita: $16 500.

9 Empirical mean
Let \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}\) be a set of d-dimensional real-valued data.
The empirical mean is defined as
\[ \operatorname{av}(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n) := \frac{1}{n} \sum_{i=1}^{n} \vec{x}_i \]

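10 Centering
Let \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}\) be a set of d-dimensional real-valued data.
To center the data set we:
1. Compute the empirical mean.
2. Subtract it from each vector:
\[ \vec{y}_i := \vec{x}_i - \operatorname{av}(\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n), \quad 1 \le i \le n \]
The vectors \(\vec{y}_1, \ldots, \vec{y}_n\) are centered at the origin.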
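The two centering steps can be sketched as follows. The helper names `empirical_mean` and `center` and the three toy points are assumptions chosen for illustration; vectors are represented as plain Python lists.

```python
def empirical_mean(vectors):
    """Entry-wise average of a list of d-dimensional vectors."""
    n = len(vectors)
    d = len(vectors[0])
    return [sum(v[j] for v in vectors) / n for j in range(d)]


def center(vectors):
    """Subtract the empirical mean from every vector."""
    mean = empirical_mean(vectors)
    return [[v[j] - mean[j] for j in range(len(mean))] for v in vectors]


# Hypothetical 2D data points, for illustration only.
points = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
centered = center(points)
print(empirical_mean(points))    # [3.0, 6.0]
print(centered)                  # [[-2.0, -4.0], [0.0, 0.0], [2.0, 4.0]]
print(empirical_mean(centered))  # [0.0, 0.0]
```

The last line checks the defining property: after centering, the empirical mean is the origin.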

11 Centering
[Figure: two panels, uncentered data and centered data]

12 Empirical variance
Let \(\{x_1, x_2, \ldots, x_n\}\) be a set of real-valued data.
The empirical variance is defined as
\[ \operatorname{var}(x_1, x_2, \ldots, x_n) := \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \operatorname{av}(x_1, x_2, \ldots, x_n) \right)^2 \]
The empirical standard deviation is the square root of the empirical variance.
Temperature data (standard deviation): 1.99 °C in January and 1.73 °C in August.
GDP per capita: $[value missing]
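As a small sketch of the definition, the code below computes the empirical variance with the 1/(n-1) normalization and cross-checks it against `statistics.variance` from the Python standard library, which uses the same normalization. The function name and the data values are hypothetical.

```python
import math
import statistics


def empirical_variance(data):
    """Empirical variance with the 1/(n-1) normalization."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


# Hypothetical data, for illustration only.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
var = empirical_variance(data)
std = math.sqrt(var)  # empirical standard deviation

# The mean is 5 and the squared deviations sum to 32, so var = 32/7.
print(var, std)
print(statistics.variance(data))  # library value, same normalization
```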

14 Temperature dataset
In January the temperature in Oxford is around 6.73 °C, give or take 2 °C.

15 GDP dataset
Countries typically have a GDP per capita of about $16 500, give or take $25 300.

16 Quantiles and percentiles
Let \(x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}\) denote the ordered elements of a dataset \(\{x_1, x_2, \ldots, x_n\}\).
The q quantile of the data, for \(0 < q < 1\), is \(x_{([q(n+1)])}\), where \([q(n+1)]\) is the closest integer to \(q(n+1)\).
The q quantile is also known as the 100q percentile.
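The nearest-integer rule can be sketched directly. Two details are not specified by the definition and are assumptions of this example: ties (a fractional part of exactly 0.5) are rounded up, and the index is clamped to the range 1..n so that extreme values of q still return a data point.

```python
import math


def quantile(data, q):
    """q quantile x_([q(n+1)]), with [.] the closest integer."""
    xs = sorted(data)
    n = len(xs)
    k = math.floor(q * (n + 1) + 0.5)  # closest integer; ties round up (assumption)
    k = min(max(k, 1), n)              # clamp to a valid 1-based index (assumption)
    return xs[k - 1]                   # convert to a 0-based list index


# Hypothetical data, for illustration only.
data = [7, 1, 5, 3, 9, 11, 13]
print(quantile(data, 0.25))  # 3  (index [0.25 * 8] = 2)
print(quantile(data, 0.5))   # 7  (index [0.5 * 8] = 4)
print(quantile(data, 0.75))  # 11 (index [0.75 * 8] = 6)
```

Library routines such as `statistics.quantiles` interpolate between neighboring order statistics, so they can return slightly different values from this rule.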

17 Quartiles and median
The 0.25 and 0.75 quantiles are the first and third quartiles.
The 0.5 quantile is the empirical median.
If n is even, the empirical median is usually set to
\[ \frac{x_{(n/2)} + x_{(n/2+1)}}{2} \]
The difference between the third and first quartiles is the interquartile range (IQR).
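A minimal sketch of the median convention for even n and of the interquartile range. The quartiles use a nearest-integer quantile rule with ties rounded up and the index clamped to 1..n, which are assumptions of this example; all function names and data are hypothetical.

```python
import math


def quantile(data, q):
    """q quantile via the closest-integer index rule (ties round up, clamped)."""
    xs = sorted(data)
    n = len(xs)
    k = min(max(math.floor(q * (n + 1) + 0.5), 1), n)
    return xs[k - 1]


def median(data):
    """Empirical median; averages the two middle values when n is even."""
    xs = sorted(data)
    n = len(xs)
    if n % 2 == 0:
        # x_(n/2) and x_(n/2+1) in 1-based indexing
        return (xs[n // 2 - 1] + xs[n // 2]) / 2
    return xs[n // 2]


def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    return quantile(data, 0.75) - quantile(data, 0.25)


# Hypothetical data, for illustration only.
odd = [7, 1, 5, 3, 9, 11, 13]
even = [1, 3, 5, 7]
print(median(odd))   # 7
print(median(even))  # 4.0
print(iqr(odd))      # 8  (11 - 3)
```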

18 Quartiles and median
Temperature data (January): sample mean 6.73 °C, median 6.80 °C, interquartile range 2.9 °C.
Temperature data (August): sample mean 21.3 °C, median 21.2 °C, interquartile range 2.1 °C.

19 Quartiles and median
GDP per capita:
Sample mean: $16 500 (71% of the countries have a lower GDP per capita!)
Median: $6 350
Interquartile range: $[value missing]
Five-number summary: $130, $1 960, $6 350, $20 100, $[value missing]
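The gap between mean and median in the GDP data is typical of skewed data: a few very large values pull the mean above most of the observations. The sketch below reproduces the effect on made-up numbers (the list `incomes` is hypothetical and is not the GDP dataset), using the standard-library `statistics` module.

```python
import statistics

# Hypothetical right-skewed data, for illustration only.
incomes = [1, 2, 2, 3, 4, 5, 6, 8, 20, 49]

mean = statistics.mean(incomes)      # pulled up by the two large values
median = statistics.median(incomes)  # average of the two middle values
frac_below_mean = sum(x < mean for x in incomes) / len(incomes)

print(mean)             # 10
print(median)           # 4.5
print(frac_below_mean)  # 0.8
```

Here 80% of the values lie below the mean, while by construction about half lie below the median.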

20 Boxplot of temperature data
[Figure: boxplots for January, April, August and November; axis: degrees (Celsius)]

21 Boxplot of GDP data
[Figure; axis: thousands of dollars]

23 Multidimensional data
Each dimension represents a feature.
We can visualize two-dimensional data using scatter plots.

24 Scatter plot
[Figure; axes: April and August]

25 Scatter plot
[Figure; axes: minimum temperature and maximum temperature]

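26 Empirical covariance
Data: \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\)
The empirical covariance is defined as
\[ \operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n)) := \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \operatorname{av}(x_1, \ldots, x_n) \right) \left( y_i - \operatorname{av}(y_1, \ldots, y_n) \right) \]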

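27 Empirical correlation coefficient
Data: \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}\)
The empirical correlation coefficient is defined as
\[ \rho((x_1, y_1), \ldots, (x_n, y_n)) := \frac{\operatorname{cov}((x_1, y_1), \ldots, (x_n, y_n))}{\operatorname{std}(x_1, \ldots, x_n) \, \operatorname{std}(y_1, \ldots, y_n)} \]
Cauchy-Schwarz inequality: for any nonzero vectors \(\vec{a}\), \(\vec{b}\),
\[ -1 \le \frac{\vec{a}^T \vec{b}}{\|\vec{a}\|_2 \, \|\vec{b}\|_2} \le 1 \]
Consequence:
\[ -1 \le \rho((x_1, y_1), \ldots, (x_n, y_n)) \le 1 \]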
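A minimal sketch of the empirical covariance and correlation coefficient in plain Python. The function names and the toy sequences are assumptions chosen for illustration. The standard deviation is obtained as the square root of the covariance of a sequence with itself, which equals its empirical variance.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Empirical covariance with the 1/(n-1) normalization."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Empirical correlation coefficient; always lies in [-1, 1]."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(correlation(xs, [2.0, 4.0, 6.0, 8.0, 10.0]))  # increasing linear relation: 1
print(correlation(xs, [5.0, 4.0, 3.0, 2.0, 1.0]))   # decreasing linear relation: -1
print(correlation(xs, [1.0, 3.0, 2.0, 5.0, 4.0]))   # noisy increasing relation: 0.8
```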

28 ρ = [value missing]
[Figure; axes: April and August]

29 ρ = [value missing]
[Figure; axes: minimum temperature and maximum temperature]

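31 Empirical covariance matrix
Data: \(\{\vec{x}_1, \vec{x}_2, \ldots, \vec{x}_n\}\) (d features)
The empirical covariance matrix is defined as
\[ \Sigma(\vec{x}_1, \ldots, \vec{x}_n) := \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right) \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right)^T \]
The (i, j) entry, \(1 \le i, j \le d\), is given by
\[ \Sigma(\vec{x}_1, \ldots, \vec{x}_n)_{ij} = \begin{cases} \operatorname{var}\left( (\vec{x}_1)_i, \ldots, (\vec{x}_n)_i \right) & \text{if } i = j, \\ \operatorname{cov}\left( ((\vec{x}_1)_i, (\vec{x}_1)_j), \ldots, ((\vec{x}_n)_i, (\vec{x}_n)_j) \right) & \text{if } i \neq j. \end{cases} \]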
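The definition translates into a few lines of NumPy. This is a sketch under the assumption that the data are stored as an array `X` of shape (n, d), one row per data point; the function name and the toy values are hypothetical. The result is compared with `np.cov`, which uses the same 1/(n-1) normalization by default.

```python
import numpy as np


def empirical_covariance_matrix(X):
    """Covariance matrix of the rows of X, where X has shape (n, d)."""
    n = X.shape[0]
    centered = X - X.mean(axis=0)  # subtract the empirical mean from every row
    # Sum of outer products of the centered rows, divided by n - 1.
    return centered.T @ centered / (n - 1)


# Hypothetical data: n = 4 points with d = 2 features.
X = np.array([[2.0, 1.0], [3.0, 4.0], [5.0, 4.0], [6.0, 7.0]])
Sigma = empirical_covariance_matrix(X)
print(Sigma)                    # [[10/3, 4], [4, 6]]
print(np.cov(X, rowvar=False))  # same matrix from the library routine
```

The diagonal entries are the empirical variances of the two features and the off-diagonal entry is their empirical covariance.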

32-36 Empirical variance in a certain direction
Let \(\vec{v}\) be a unit-norm vector aligned with a direction of interest.
\[
\begin{aligned}
\operatorname{var}\left( \vec{v}^T \vec{x}_1, \ldots, \vec{v}^T \vec{x}_n \right)
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{v}^T \vec{x}_i - \operatorname{av}\left( \vec{v}^T \vec{x}_1, \ldots, \vec{v}^T \vec{x}_n \right) \right)^2 \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{v}^T \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right) \right)^2 \\
&= \vec{v}^T \left( \frac{1}{n-1} \sum_{i=1}^{n} \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right) \left( \vec{x}_i - \operatorname{av}(\vec{x}_1, \ldots, \vec{x}_n) \right)^T \right) \vec{v} \\
&= \vec{v}^T \, \Sigma(\vec{x}_1, \ldots, \vec{x}_n) \, \vec{v}
\end{aligned}
\]
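The identity can be checked numerically. The sketch below draws hypothetical random data (the seed, sizes, and direction are arbitrary assumptions), projects it onto a unit-norm vector `v`, and compares the empirical variance of the projections with the quadratic form of the covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))    # hypothetical data: n = 200 points, d = 3
Sigma = np.cov(X, rowvar=False)  # empirical covariance matrix, 1/(n-1) scaling

v = np.array([1.0, 2.0, -1.0])
v /= np.linalg.norm(v)           # unit-norm direction of interest

lhs = np.var(X @ v, ddof=1)      # empirical variance of the projections v^T x_i
rhs = v @ Sigma @ v              # quadratic form v^T Sigma v

print(lhs, rhs)                  # the two values agree up to rounding error
```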

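37 Eigendecomposition of the covariance matrix
\[ \Sigma(\vec{x}_1, \ldots, \vec{x}_n) = U \Lambda U^T = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_d \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_d \end{bmatrix} \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_d \end{bmatrix}^T \]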

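38 Eigendecomposition of the covariance matrix
For any symmetric matrix \(A \in \mathbb{R}^{n \times n}\) with normalized eigenvectors \(\vec{u}_1, \vec{u}_2, \ldots, \vec{u}_n\) and corresponding eigenvalues \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n\):
\[ \lambda_1 = \max_{\|\vec{v}\|_2 = 1} \vec{v}^T A \vec{v}, \qquad \vec{u}_1 = \arg\max_{\|\vec{v}\|_2 = 1} \vec{v}^T A \vec{v} \]
\[ \lambda_k = \max_{\|\vec{v}\|_2 = 1, \; \vec{v} \perp \vec{u}_1, \ldots, \vec{u}_{k-1}} \vec{v}^T A \vec{v}, \qquad \vec{u}_k = \arg\max_{\|\vec{v}\|_2 = 1, \; \vec{v} \perp \vec{u}_1, \ldots, \vec{u}_{k-1}} \vec{v}^T A \vec{v} \]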

39 Principal component analysis
Compute the eigenvectors of the empirical covariance matrix to determine the directions of maximum variation.
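A minimal NumPy sketch of this computation, under the assumption that the data are stored as rows of an array. The helper `principal_directions` and the synthetic 2D data (stretched along the direction (1, 1)) are hypothetical. `np.linalg.eigh` is used because the covariance matrix is symmetric; it returns eigenvalues in ascending order, so they are re-sorted.

```python
import numpy as np


def principal_directions(X):
    """Eigenvalues (descending) and matching eigenvectors of the covariance of X."""
    Sigma = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]         # indices for descending order
    return eigvals[order], eigvecs[:, order]  # columns are u_1, u_2, ...


rng = np.random.default_rng(0)
# Hypothetical data: standard deviation 3 along (1, 1)/sqrt(2), 0.5 across it.
z = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
R = np.array([[1.0, -1.0], [1.0, 1.0]]) / np.sqrt(2.0)  # rotation by 45 degrees
X = z @ R.T

eigvals, U = principal_directions(X)
print(eigvals)  # variance along each principal direction, largest first
print(U[:, 0])  # close to +/- (1, 1)/sqrt(2), the direction of maximum variation
```

The sign of each eigenvector is arbitrary, so the first principal direction may point along (1, 1) or (-1, -1).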

40-42 Example: 2D data
[Figures: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\); the values of the quantities labeled \(\sigma_1\) and \(\sigma_2\) are missing]

43-44 Centering is important!
[Figures: 2D data with the directions \(\vec{u}_1\) and \(\vec{u}_2\); the values of the quantities labeled \(\sigma_1\) and \(\sigma_2\) are missing]

45 Dimensionality reduction
Projection of data onto a lower-dimensional space.
Applications: visualization, computational efficiency, denoising.
Example: seeds from 3 varieties of wheat (Kama, Rosa and Canadian).
7 features: area, perimeter, compactness, length of kernel, width of kernel, asymmetry coefficient and length of kernel groove.

46 PCA dimensionality reduction
[Figure; axes: projection onto first PC and projection onto second PC]

47 PCA dimensionality reduction
[Figure; axes: projection onto (d-1)th PC and projection onto dth PC]
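Projecting onto the first principal directions can be sketched as below. The function `pca_project` is a hypothetical helper, and the random array `X` is only a stand-in with 7 features; it is not the wheat-seed dataset.

```python
import numpy as np


def pca_project(X, k):
    """Project the centered rows of X onto the first k principal directions."""
    centered = X - X.mean(axis=0)
    Sigma = centered.T @ centered / (X.shape[0] - 1)
    eigvals, eigvecs = np.linalg.eigh(Sigma)           # ascending eigenvalues
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]    # shape (d, k)
    return centered @ top                              # shape (n, k)


rng = np.random.default_rng(1)
# Hypothetical stand-in data: 210 points, 7 features with different scales.
scales = np.array([5.0, 3.0, 1.0, 1.0, 0.5, 0.5, 0.1])
X = rng.normal(size=(210, 7)) * scales

Y = pca_project(X, 2)
print(Y.shape)                  # (210, 2)
print(np.cov(Y, rowvar=False))  # diagonal: the two projections are uncorrelated
```

Keeping the first two components retains the directions with the most variance, whereas the last two (as in the second figure) retain the least.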

48 Whitening
Preprocessing procedure.
Linear transformation to eliminate skew in the data.
Enhances nonlinear structure.
After whitening, the data are uncorrelated.

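49 Whitening
Let \(\vec{x}_1, \ldots, \vec{x}_n\) be a set of d-dimensional centered data with a full-rank covariance matrix. To whiten the data we:
1. Compute the eigendecomposition of the empirical covariance matrix \(\Sigma(\vec{x}_1, \ldots, \vec{x}_n) = U \Lambda U^T\).
2. For \(i = 1, \ldots, n\) set
\[ \vec{y}_i := \sqrt{\Lambda}^{-1} U^T \vec{x}_i, \qquad \sqrt{\Lambda} := \begin{bmatrix} \sqrt{\lambda_1} & 0 & \cdots & 0 \\ 0 & \sqrt{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\lambda_d} \end{bmatrix} \]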

50-56 Whitening
Since the \(\vec{x}_i\) are centered, so are the \(\vec{y}_i\), and
\[
\begin{aligned}
\Sigma(\vec{y}_1, \ldots, \vec{y}_n)
&= \frac{1}{n-1} \sum_{i=1}^{n} \vec{y}_i \vec{y}_i^T \\
&= \frac{1}{n-1} \sum_{i=1}^{n} \left( \sqrt{\Lambda}^{-1} U^T \vec{x}_i \right) \left( \sqrt{\Lambda}^{-1} U^T \vec{x}_i \right)^T \\
&= \sqrt{\Lambda}^{-1} U^T \left( \frac{1}{n-1} \sum_{i=1}^{n} \vec{x}_i \vec{x}_i^T \right) U \sqrt{\Lambda}^{-1} \\
&= \sqrt{\Lambda}^{-1} U^T \, \Sigma(\vec{x}_1, \ldots, \vec{x}_n) \, U \sqrt{\Lambda}^{-1} \\
&= \sqrt{\Lambda}^{-1} U^T U \sqrt{\Lambda} \sqrt{\Lambda} U^T U \sqrt{\Lambda}^{-1} \\
&= I
\end{aligned}
\]
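The whitening procedure and its effect on the covariance matrix can be checked numerically. This is a sketch under stated assumptions: data points are stored as rows, the function `whiten` centers the data itself before transforming it, and the correlated 2D data are synthetic. Dividing each eigenvector coordinate by the square root of its eigenvalue is the row-wise form of multiplying by the inverse of the diagonal matrix of square-rooted eigenvalues.

```python
import numpy as np


def whiten(X):
    """Whiten the rows of X (centering first); assumes a full-rank covariance."""
    centered = X - X.mean(axis=0)
    Sigma = np.cov(centered, rowvar=False)
    eigvals, U = np.linalg.eigh(Sigma)       # Sigma = U diag(eigvals) U^T
    rotated = centered @ U                   # rows are (U^T x_i)^T
    return rotated / np.sqrt(eigvals)        # scale each coordinate by 1/sqrt(lambda)


rng = np.random.default_rng(2)
A = np.array([[2.0, 0.0], [1.5, 0.5]])       # hypothetical invertible mixing matrix
X = rng.normal(size=(1000, 2)) @ A.T         # correlated, skewed 2D data

Y = whiten(X)
print(np.cov(X, rowvar=False))  # non-diagonal covariance before whitening
print(np.cov(Y, rowvar=False))  # identity matrix, up to rounding error
```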

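57-59 [Figures: the data \(\vec{x}\), the rotated data \(U^T \vec{x}\), and the whitened data \(\sqrt{\Lambda}^{-1} U^T \vec{x}\)]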
