MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS Tommy Khoo Your friendly neighbourhood graduate student.

It is Time for Homework! (´・ω・`)
The first homework and its data will be posted on the website, under the homework tab, and also sent out via email. Weekly homework is worth 30%. The first one goes out today and is due the following Tuesday. Each homework might have a different number of points assigned, but they all carry the same weight. Your other homework: read and understand the relevant chapters in the textbook.

Week 2
- Chapter 4, Bivariate Data (today's lecture): data with two/paired variables, Pearson correlation coefficient and its properties, general variance sum law.
- Chapter 6, Research Design (today's lecture): data collection, sampling bias, causation.
- Chapter 5, Probability: probability, gambler's fallacy, permutations and combinations, binomial distribution, Bayes' theorem.

Quick Recap on Important Parts of Last Week: Central Tendency
- Mean: minimizes the sum of squared deviations. It balances the scale because the deviations from the mean sum to zero.
- Median: minimizes the sum of absolute deviations. The median is unique in this course, but there is a general definition that allows for more than one median. More robust to outliers than the mean.
- Mode: a data set can have no mode or multiple modes. Common usage of the word "bimodal" often does not strictly follow the definition of a mode.
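The two "minimizes" claims in this recap can be checked numerically. The sketch below uses a made-up data set with one outlier and a coarse grid of candidate centers; both the data and the grid are assumptions chosen only for illustration.

```python
from statistics import mean, median

data = [1, 2, 3, 4, 100]  # made-up values; 100 plays the role of an outlier


def sum_sq_dev(c):
    """Sum of squared deviations of the data from a candidate center c."""
    return sum((x - c) ** 2 for x in data)


def sum_abs_dev(c):
    """Sum of absolute deviations of the data from a candidate center c."""
    return sum(abs(x - c) for x in data)


m = mean(data)      # pulled towards the outlier
med = median(data)  # barely notices the outlier

# Deviations from the mean cancel out exactly.
deviations_sum = sum(x - m for x in data)

# Try every candidate center 0, 0.5, 1, ..., 100 and keep the best one.
candidates = [c / 2 for c in range(0, 201)]
best_sq = min(candidates, key=sum_sq_dev)    # should land on the mean
best_abs = min(candidates, key=sum_abs_dev)  # should land on the median

print(m, med, deviations_sum, best_sq, best_abs)
```

The grid search is deliberately naive: it is there to show which center wins, not to be an efficient way of finding it.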

Quick Recap on Important Parts of Last Week: Variability
- Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$. Take the square root to get the population standard deviation.
- Estimator of the variance: $s^2 = \frac{\sum (X - M)^2}{N - 1}$. Dividing by N underestimates the variance; dividing by N - 1 instead is Bessel's correction.
- Estimator of the standard deviation: s is still biased even after Bessel's correction. On average it is actually an underestimate of σ!
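One way to see the bias claims is to enumerate every possible sample from a tiny population and average the estimators exactly. The sketch below assumes a toy population {1, 2, 3} and samples of size 2 drawn with replacement; these choices are illustrative and not taken from the textbook.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

population = [1, 2, 3]  # toy population, assumed for illustration
N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance


def sum_sq(sample):
    """Sum of squared deviations of a sample from its own mean M."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample)


n = 2
samples = list(product(population, repeat=n))  # all 9 equally likely samples

# Average of each estimator over every possible sample.
avg_div_n = sum(sum_sq(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq(s) / (n - 1) for s in samples) / len(samples)
avg_s = sum(sqrt(sum_sq(s) / (n - 1)) for s in samples) / len(samples)

print("population variance      :", sigma2)
print("average, dividing by n   :", avg_div_n)
print("average, dividing by n-1 :", avg_div_n_minus_1)
print("population sd            :", sqrt(sigma2))
print("average of s             :", avg_s)
```

In this toy setting, dividing by n - 1 hits the population variance exactly on average, dividing by n falls short, and the average of s sits below the population standard deviation.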

Sum of Variance for Uncorrelated Variables (leads into Chapter 4)
Chapter 3, Section 19: $\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$ (population version).
Curiously, the textbook left out the sample version: $s^2_{X \pm Y} = s^2_X + s^2_Y$.
First, let's talk about the population case. In the context of chapter 3, X and Y are (usually) data for two different populations. The math works but it is actually rather contrived/artificial (will explain later). The context of chapter 4 makes more sense / is more natural: paired bivariate data.

Sum of Variance for Uncorrelated Variables (leads into Chapter 4)
$\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$
My sloppy example last lecture: let's pretend our two populations, NH and CA, have the same size N (of course they don't). Let X = income of a person in NH, and Y = income of a person in CA. When comparing two populations, we usually care about the difference in their means. E.g. if we have population data, we can easily calculate $\mu_Z = \mu_X - \mu_Y$ to see if $\mu_Z > 0$. Later in this course you will see that most comparisons of two populations are based on this approach: whether there is a difference in population means.

Sum of Variance for Uncorrelated Variables (leads into Chapter 4)
$\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$
New variable Z = X - Y. Then $\mu_Z = \mu_X - \mu_Y$ by our calculation last lecture. However, $Z_i = X_i - Y_i$ depends on choosing a pairing of $X_i$, $Y_i$. For example, if this pairing is done with people in similar industries or job types, there would probably be some correlation (since both NH and CA are in the USA). You can get zero correlation by pairing them randomly, but this is an artificial choice. Chapter 4: paired bivariate data, where the pairing is natural / given to us.

Sum of Variance for Uncorrelated Variables (leads into Chapter 4)
Now, let's talk about the sample case: $s^2_{X \pm Y} = s^2_X + s^2_Y$
Note: the textbook did not give the sample version of this formula in chapter 3. The sampling process can affect whether the variables X and Y are correlated. Your textbook offered a way to make sure X and Y are uncorrelated: pick an $X_i$ at random, then pick a $Y_i$ at random. Pair them together as $(X_i, Y_i)$. Since the pairing is done randomly, X and Y will be uncorrelated.

Sum of Variance for Uncorrelated Variables (leads into Chapter 4)
$s^2_{X \pm Y} = s^2_X + s^2_Y$
Random pairing makes a lot more sense when it comes to samples, e.g. the difference between two dice rolls. It is not so useful for comparing two populations. As mentioned previously, in chapter 9 we will see that we care more about estimating the difference (if any) between the two population means. We will have a powerful new tool in chapter 9: the sampling distribution of the mean.
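The dice example can be worked out exactly. The sketch below assumes two fair six-sided dice rolled independently, so all 36 (X, Y) pairs are equally likely and X and Y are uncorrelated; it then compares the variance of the sum and of the difference with the sum of the individual variances.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # a fair six-sided die (assumed)


def pvar(values):
    """Population variance, computed exactly with fractions."""
    values = list(values)
    mu = Fraction(sum(values), len(values))
    return sum((v - mu) ** 2 for v in values) / len(values)


pairs = list(product(faces, repeat=2))  # 36 equally likely (X, Y) pairs

var_x = pvar(x for x, _ in pairs)
var_y = pvar(y for _, y in pairs)
var_sum = pvar(x + y for x, y in pairs)
var_diff = pvar(x - y for x, y in pairs)

print(var_x, var_y)       # variance of a single die, twice
print(var_sum, var_diff)  # both equal var_x + var_y
```

Note that the variance of the difference is the sum of the variances, not their difference: subtracting Y adds its variability just as much as adding Y does.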

Chapter 4: Bivariate Data
Data with 2 quantitative variables for each individual / object. In practice, you often have k variables for each individual, not all of them quantitative. E.g. using *cough* Facebook *cough*, we might have access to this set of data about each person: (age, gender, location, number of friends).

Chapter 4: Bivariate Data
Dot or scatter plots from chapter 2 are great for visualizing bivariate data. Important distinction: linear vs non-linear relationship.

Very awesome example of bivariate data and dot/scatter plot from: https://cosmos-book.github.io/human-spaceflight/index.html

Pearson Correlation: Chapter 4 Section 3
We can eyeball an intuitive difference between these three sets of data. How can we quantify this difference?

Pearson Correlation: Chapter 4 Section 3
Super official name: Pearson product-moment correlation coefficient. Usual name: correlation coefficient, r.
$-1 \le r \le 1$
- r = 1: perfect positive linear relationship
- r = -1: perfect negative linear relationship
- r = 0: no linear relationship
Warning: only applicable for linear relationships!

Pearson Correlation: Chapter 4 Section 3
Warning: only applicable for linear relationships! There are other mathematical measures of bivariate relationships.
Guessing r: Chapter 4 Section 4, an important skill for the exam.

Properties of r: Chapter 4 Section 5
Also important for the exam.
- Symmetric: correlation coefficient of X and Y = correlation coefficient of Y and X (you will see why this matters when we give you the formula later).
- Unaffected by linear transformations aX + b, when a > 0 (useful when working with different units of measurement).
- Warning: affected by non-linear transformations!
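A short derivation shows why a linear transformation leaves r alone. It assumes the usual sample formula for r (sum of products of deviations divided by the square root of the product of the sums of squared deviations) and writes X' = aX + b for the transformed variable; the symbol X' is introduced here only for illustration.

```latex
\begin{align*}
M_{X'} &= a M_X + b
  \quad\Longrightarrow\quad X' - M_{X'} = a\,(X - M_X) \\[4pt]
r_{X'Y}
  &= \frac{\sum a\,(X - M_X)(Y - M_Y)}
          {\sqrt{\sum a^2 (X - M_X)^2 \,\sum (Y - M_Y)^2}}
   = \frac{a}{|a|}\, r_{XY}
\end{align*}
```

The shift b cancels completely and the scale a survives only through its sign: for a > 0 the correlation is unchanged, and for a < 0 it flips sign but keeps its magnitude.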

Break Time! \o/ See you back here after 10 minutes. Cat meme kindly provided by: lovecuteanimals.com/lol/24268/

Computing r: Chapter 4 Section 6
Population version: $\rho = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{\sqrt{\sum (X - \mu_X)^2 \sum (Y - \mu_Y)^2}}$
Sample version: $r = \frac{\sum (X - M_X)(Y - M_Y)}{\sqrt{\sum (X - M_X)^2 \sum (Y - M_Y)^2}}$
For paired data (X, Y). Curiously, the textbook uses ρ later but does not talk about computing ρ.
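The sample formula translates almost directly into code. The sketch below is one possible implementation; the function name `pearson_r` and the three small data sets are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation for paired data (xs[i], ys[i])."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    # Division by zero here means one variable is constant: r is undefined.
    return numerator / sqrt(ss_x * ss_y)


# Made-up paired data with a positive but imperfect linear trend.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_xy = pearson_r(x, y)
r_yx = pearson_r(y, x)  # same value: the formula is symmetric in X and Y

# Points exactly on a rising line.
r_line = pearson_r([1, 2, 3], [2, 4, 6])

# A perfect but non-linear relationship (y = x^2) that r does not pick up.
xs = [-2, -1, 0, 1, 2]
r_parabola = pearson_r(xs, [v ** 2 for v in xs])

print(r_xy, r_yx, r_line, r_parabola)
```

The parabola case is the warning from the earlier slides in miniature: y is completely determined by x, yet the linear measure r comes out as zero.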

Chapter 4 Section 7
A possible exam question might be: what might happen to the value of r if you throw away outliers, as shown in the figure?
A quick word about restricting the range of your data: usually, more data is better than less data. If you are throwing away data, you need a very good reason. If you are throwing away data just to create stronger evidence to support your hypothesis, don't let anyone know I taught you statistics. Please? :p

Variance Sum Law: Chapter 4 Section 8
Population version: $\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y \pm 2\rho\,\sigma_X \sigma_Y$
Sample version: $s^2_{X \pm Y} = s^2_X + s^2_Y \pm 2r\,s_X s_Y$
These formulas and the exercises in the book assume that the data comes with a pairing (bivariate data). We will tell you how the data is paired or it will come with an obvious pairing. I will be surprised if you have to choose a pairing in this course.
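The sample version of the law can be checked on any paired data set, since it is an algebraic identity. The sketch below uses made-up paired values and the standard library's `statistics.variance` and `statistics.stdev`, both of which divide by N - 1.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Made-up paired data: (x[i], y[i]) belong to the same individual.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Sample correlation r, straight from the Section 6 formula.
mx, my = mean(x), mean(y)
r = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sqrt(
    sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)
)

cross = 2 * r * stdev(x) * stdev(y)  # the extra term for correlated data

# Left-hand sides: variance of the paired sums and paired differences.
lhs_sum = variance([a + b for a, b in zip(x, y)])
lhs_diff = variance([a - b for a, b in zip(x, y)])

# Right-hand sides: what the variance sum law predicts.
rhs_sum = variance(x) + variance(y) + cross
rhs_diff = variance(x) + variance(y) - cross

print(lhs_sum, rhs_sum)
print(lhs_diff, rhs_diff)
```

Because r is positive for this data, the sum varies more than the uncorrelated formula would predict and the difference varies less. Reordering y changes the pairing, and with it r and both results.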

Week 2
- Chapter 4, Bivariate Data (today's lecture): data with two/paired variables, Pearson correlation coefficient and its properties, general variance sum law.
- Chapter 6, Research Design (today's lecture): data collection, sampling bias, causation.
- Chapter 5, Probability: probability, gambler's fallacy, permutations and combinations, binomial distribution, Bayes' theorem.

Research Design, Chapter 6
Theories in science can never be proved, since one can never be 100% certain that a new empirical finding inconsistent with the theory will never be found. Scientific theories must be potentially falsifiable. If a theory can accommodate all possible results then it is not a scientific theory. Therefore, a scientific theory should lead to testable hypotheses. If a hypothesis derived from a theory is confirmed, then the theory has survived a test and it becomes more useful and better thought of by the researchers in the field. A theory is not confirmed when correct hypotheses are derived from it.
"In so far as a scientific statement speaks about reality, it must be falsifiable; and in so far as it is not falsifiable, it does not speak about reality." - Karl Popper

Public Service Announcement: We are skipping Chapter 6, Section 3, titled Measurement.

Sampling Bias, Chapter 6, Section 5
For the exams, you do not have to memorize the names of these biases. But you should be able to recognize when a bias could occur.
- Self-Selection Bias: asking people to nominate themselves, not randomizing the treatment and control groups.
- Under-coverage Bias: the way you collect samples could discourage certain groups from responding. E.g. the example in the textbook of the poor not having telephones.
- Survivorship Bias: the famous World War II aircraft armor example in the textbook, or studying only people who perform well in gambling or the stock market.

Experimental Design, Chapter 6, Section 6
Public service announcement: we have decided to nuke this section from orbit (it's the only way to be sure). Memorize: no. Understand: yes.
- Between-Subjects: different groups are used for the different levels of the independent variable.
- Within-Subjects: the same subject is tested at every level of the independent variable.
- Both cases: we want to study the relationship between the independent variable and the dependent variable.
- Counterbalancing: in a within-subjects design, the order in which you test the levels might affect the result. You need to randomize the order.
- Multi-Factor Between-Subjects: two or more independent variables.

Causation, Chapter 6, Section 7
Correlation does not imply causation: http://www.tylervigen.com/spurious-correlations

Causation, Chapter 6, Section 7
Correlation does not imply causation: confounding / third variable.
How can we establish causation? This is a deep philosophical question. Even defining causation is difficult. From a statistics point of view: we use statistics to guide us towards possible theories. Then, we try to explain why the theory should be true. E.g. smoking is correlated with cancer, but this is supported by scientific understanding of the underlying mechanism.
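A small simulation can make the third-variable problem concrete. In the sketch below, a hidden variable Z drives both X and Y, while X and Y have no direct effect on each other. The variable names, the normal distributions, the noise level, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Sample Pearson correlation for paired data."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den


random.seed(0)  # fixed seed so the run is repeatable
n = 5000

# Z is the confounder. X and Y each equal Z plus their own independent noise,
# so neither one causes the other.
z = [random.gauss(0, 1) for _ in range(n)]
x = [zi + random.gauss(0, 0.5) for zi in z]
y = [zi + random.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)  # strong correlation, created entirely by Z

# Once Z's contribution is subtracted out, what is left is unrelated noise.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson_r(x_without_z, y_without_z)

print(round(r_xy, 3), round(r_without_z, 3))
```

In real data the confounder is usually not observed, so it cannot simply be subtracted out like this. That is one reason an explanation of the underlying mechanism matters.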