MATH 10 INTRODUCTORY STATISTICS

MATH 10 INTRODUCTORY STATISTICS Ramesh Yapalparvi

It is Time for Homework! The first homework and its data will be posted on the website, under the homework tab, and also sent out via email. Weekly homework: 30%. The first one will be given out next Tuesday, due the following Tuesday. Each homework might have a different number of points assigned, but they all carry the same weight. Your other "homework": read and understand the relevant chapters in the textbook.

Addendum: Sum of Variance for Uncorrelated Variables. Chapter 3, Section 19: $\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$ (population version). Curiously, the textbook left out the sample version: $s^2_{X \pm Y} = s^2_X + s^2_Y$. First, let's talk about the population case.

Addendum: Sum of Variance for Uncorrelated Variables (population case). $\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y$. My sloppy example last lecture: let's pretend our two populations, NH and CA, have the same size $N$ (of course they don't). Let X = income of a person in NH, and Y = income of a person in CA. When comparing two populations, we usually care about the difference in their means. E.g. if we have population data, we can easily calculate $\mu_D = \mu_X - \mu_Y$ to see if $\mu_D > 0$.

New variable $D = X - Y$. Then $\mu_D = \mu_X - \mu_Y$ by our calculation last lecture. What I forgot to mention is that $D_i = X_i - Y_i$ depends on choosing a pairing of $X_i$, $Y_i$. For example, if this pairing is done with people in similar industries or job types, there would probably be some correlation (since both NH and CA are in the USA). You can get (approximately) zero correlation by pairing them randomly. When we do bivariate/paired data today, we will see that in the context of Chapter 3, this population formula is actually not very meaningful.
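
To make the pairing issue concrete, here is a minimal sketch with made-up incomes (the numbers and variable names are assumptions for illustration, not data from the lecture). The mean of the difference does not care how the two lists are paired, but the variance of the difference does.

```python
from statistics import fmean, pvariance

# Hypothetical incomes in $1000s (made up for illustration).
nh = [40, 50, 60, 70]   # X: incomes in NH
ca = [45, 60, 75, 90]   # Y: incomes in CA

def difference_stats(xs, ys):
    """Mean and population variance of D_i = X_i - Y_i for one pairing."""
    d = [x - y for x, y in zip(xs, ys)]
    return fmean(d), pvariance(d)

# Pairing 1: lowest with lowest, highest with highest (positive correlation).
mean_sorted, var_sorted = difference_stats(nh, ca)

# Pairing 2: lowest with highest (negative correlation).
mean_reversed, var_reversed = difference_stats(nh, list(reversed(ca)))

# What the uncorrelated formula would predict for the variance of D.
var_uncorrelated = pvariance(nh) + pvariance(ca)

print(mean_sorted, var_sorted)
print(mean_reversed, var_reversed)
print(var_uncorrelated)
```

Both pairings give the same mean difference, while the two variances land on either side of the value predicted for uncorrelated variables.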

Addendum: Sum of Variance for Uncorrelated Variables. Now, let's talk about the sampling case: $s^2_{X \pm Y} = s^2_X + s^2_Y$. Note: the textbook did not give the sample version of this formula in Chapter 3. The sampling process can affect whether the variables X and Y are correlated. Your textbook offered a way to make sure $X$ and $Y$ are uncorrelated: pick an $X_i$ at random, then pick a $Y_i$ at random, and pair them together as $(X_i, Y_i)$. Since the pairing is done randomly, $X$ and $Y$ will be uncorrelated. Random pairing makes a lot more sense when it comes to samples, e.g. the difference between two dice rolls. It is not so useful for comparing two populations (in Chapter 9 we will have a more powerful tool called the sampling distribution).
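
The dice example can be checked exactly. The sketch below (helper name `pvar` is my own) enumerates all 36 equally likely pairs of two independent dice and computes the variance of their sum and of their difference with exact fractions.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)

def pvar(values):
    """Population variance, computed exactly with fractions."""
    values = [Fraction(v) for v in values]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

var_one_die = pvar(faces)

# All 36 equally likely (X, Y) pairs: independent rolls, hence uncorrelated.
var_diff = pvar([x - y for x, y in product(faces, repeat=2)])
var_sum = pvar([x + y for x, y in product(faces, repeat=2)])

print(var_one_die, var_diff, var_sum)  # 35/12 35/6 35/6
```

Both the sum and the difference have variance 35/6, which is twice the variance of a single die, as the formula for uncorrelated variables says.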

Some comments related to last lecture.

Medians in the Wild. In this course we define the median to be the middle value, or the average of the middle two values, of a ranked list. The official definition is actually "a number such that 50% of the data is below this value". So, many numbers can be a median.

Estimator of the Variance, and of the Standard Deviation. For this course, the estimator of the variance is $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$. Also, the estimator of the standard deviation (SD) is $s$. Bessel's Correction (use of $n - 1$) removes the underestimation in the former case. Unfortunately, $s$ still underestimates the population SD on average, because taking a square root is not a linear operation.
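
The bias of these two estimators can be checked with a quick simulation. This is only a sketch under assumptions of my own choosing (a normal population with SD 1, samples of size 5, a fixed seed): it averages $s^2$ and $s$ over many samples and compares them with the true values.

```python
import random
from math import sqrt
from statistics import fmean, variance

random.seed(0)   # fixed seed so the run is repeatable
SIGMA = 1.0      # true population SD (assumed normal population)
N = 5            # small sample size, where the bias is easiest to see
TRIALS = 20000

var_estimates = []
sd_estimates = []
for _ in range(TRIALS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    s2 = variance(sample)          # sample variance, divides by n - 1
    var_estimates.append(s2)
    sd_estimates.append(sqrt(s2))  # sample SD

mean_var = fmean(var_estimates)
mean_sd = fmean(sd_estimates)
print(f"average s^2 = {mean_var:.3f}  (true variance = {SIGMA ** 2})")
print(f"average s   = {mean_sd:.3f}  (true SD = {SIGMA})")
```

In runs like this the average of $s^2$ sits close to the true variance, while the average of $s$ comes out a few percent below the true SD.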

Week 2
Chapter 4 Bivariate Data (today's lecture): data with two/paired variables, Pearson correlation coefficient and its properties, general variance sum law.
Chapter 6 Research Design (today's lecture): data collection, sampling bias, causation.
Chapter 5 Probability: probability, gambler's fallacy, permutations and combinations, binomial distribution, Bayes' theorem.

Chapter 4 Bivariate Data. Data with 2 quantitative variables for each individual/object. In practice, you often have k quantitative variables for each individual. E.g. using *cough* Facebook *cough*, we might have access to this set of data about each person: (age, gender, location, number of friends).

Chapter 4 Bivariate Data. Dot or scatter plots from Chapter 2 are great for visualizing bivariate data. Important distinction: linear vs non-linear relationship.

Pearson Correlation : Chapter 4 Section 3 We can eyeball an intuitive difference between these three sets of data. How can we quantify this difference?

Pearson Correlation: Chapter 4 Section 3. Super official name: Pearson product-moment correlation coefficient. Usual name: correlation coefficient, $r$. $-1 \le r \le 1$. $r = 1$: perfect positive linear relationship. $r = -1$: perfect negative linear relationship. $r = 0$: no linear relationship. Warning: only applicable for linear relationships! (See next slide.)

Pearson Correlation : Chapter 4 Section 3 Warning: only applicable for linear relationships! There are other mathematical measures of bivariate relationships. Guessing r : Chapter 4 Section 4, important skill for the exam.
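
A small worked example of why the warning matters (the data are made up for illustration). Take $x = -2, -1, 0, 1, 2$ and $y = x^2$, so $y$ is completely determined by $x$. Here $\bar{x} = 0$ and $\bar{y} = 2$, and the numerator of $r$ vanishes:

```latex
\sum_i (x_i - \bar{x})(y_i - \bar{y})
  = (-2)(4 - 2) + (-1)(1 - 2) + (0)(0 - 2) + (1)(1 - 2) + (2)(4 - 2)
  = -4 + 1 + 0 - 1 + 4
  = 0
\quad\Longrightarrow\quad r = 0 .
```

So $r = 0$ only says there is no linear relationship; it does not say there is no relationship.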

Properties of r: Chapter 4 Section 5. Also important for the exam. Symmetric: the correlation coefficient of X and Y equals the correlation coefficient of Y and X (you will see why this matters when we give you the formula later). Unaffected by linear transformations, which is useful when working with different units of measurement (one caveat: multiplying a variable by a negative constant flips the sign of r). Warning: affected by non-linear transformations!
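
These properties are easy to try out numerically. The sketch below uses made-up data and a small helper `pearson_r` of my own (the sample formula for r); it is an illustration, not the textbook's code.

```python
from math import isclose, sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired data."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up paired data: temperature in Celsius and ice creams sold.
celsius = [10, 15, 20, 25, 30]
sales = [12, 18, 21, 30, 33]

r = pearson_r(celsius, sales)

# Symmetric: swapping the roles of X and Y gives the same r.
r_swapped = pearson_r(sales, celsius)

# Linear transformation (change of units): Fahrenheit = 1.8 * Celsius + 32.
fahrenheit = [1.8 * c + 32 for c in celsius]
r_fahrenheit = pearson_r(fahrenheit, sales)

# A negative multiplier keeps the size of r but flips its sign.
r_negated = pearson_r([-c for c in celsius], sales)

# Non-linear transformation: squaring the sales changes r.
r_squared_sales = pearson_r(celsius, [s ** 2 for s in sales])

print(r, r_swapped, r_fahrenheit, r_negated, r_squared_sales)
```

Only the last value differs in size from the others.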

Computing r: Chapter 4 Section 6, for paired data $(X, Y)$.
Population version: $\rho = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{\sqrt{\sum (X - \mu_X)^2 \sum (Y - \mu_Y)^2}}$
Sample version: $r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
Curiously, the textbook uses $\rho$ later but does not talk about computing $\rho$.
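
Here is the sample formula carried out step by step on a tiny made-up data set, small enough to check by hand. The variable names are my own; the population version works the same way with the population means in place of the sample means.

```python
from math import sqrt

# Made-up paired data (X_i, Y_i).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x = sum(xs) / len(xs)   # 3.0
mean_y = sum(ys) / len(ys)   # 4.0

# Deviations from the means.
dev_x = [x - mean_x for x in xs]
dev_y = [y - mean_y for y in ys]

sum_xy = sum(dx * dy for dx, dy in zip(dev_x, dev_y))  # numerator
sum_xx = sum(dx ** 2 for dx in dev_x)
sum_yy = sum(dy ** 2 for dy in dev_y)

r = sum_xy / sqrt(sum_xx * sum_yy)
print(sum_xy, sum_xx, sum_yy, round(r, 4))  # 6.0 10.0 6.0 0.7746
```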

A quick word about restricting the range of your data: Chapter 4 Section 7. A possible exam question might be: what might happen to the value of r if you throw away outliers, as shown in the figure? Usually, more data is better than less data. If you are throwing away data, you need a very good reason. If you are throwing away data just to create stronger evidence to support your hypothesis, don't let anyone know I taught you statistics. Please? :p
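
To get a feel for what restricting the range can do to r, here is a sketch on simulated data. Everything about the setup is an assumption for illustration: x is uniform on [0, 10], y is x plus normal noise, and the "restricted" data keep only the pairs with x between 4 and 6.

```python
import random
from math import sqrt

def pearson_r(pairs):
    """Sample Pearson correlation coefficient of a list of (x, y) pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable

# Simulated linear relationship with noise.
pairs = []
for _ in range(2000):
    x = random.uniform(0, 10)
    pairs.append((x, x + random.gauss(0, 2)))

# Keep only the middle of the x range.
restricted = [(x, y) for x, y in pairs if 4 <= x <= 6]

r_full = pearson_r(pairs)
r_restricted = pearson_r(restricted)
print(f"full range: r = {r_full:.2f}, restricted range: r = {r_restricted:.2f}")
```

In this setup the restricted data give a noticeably smaller r, even though the underlying relationship is the same.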

Variance Sum Law: Chapter 4 Section 8.
Population version: $\sigma^2_{X \pm Y} = \sigma^2_X + \sigma^2_Y \pm 2\rho\sigma_X\sigma_Y$
Sample version: $s^2_{X \pm Y} = s^2_X + s^2_Y \pm 2 r s_X s_Y$
These formulas and the exercises in the book assume that the data come with a pairing (bivariate data).
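
The sample version can be checked directly on paired data. The sketch below uses made-up numbers: it computes the variance of the sums and of the differences, and compares each with the right-hand side of the law.

```python
from math import isclose
from statistics import fmean, stdev, variance

# Made-up paired data (X_i, Y_i).
xs = [2, 4, 4, 7, 8]
ys = [1, 3, 6, 6, 9]
n = len(xs)

# Sample correlation r, via the sample covariance.
mx, my = fmean(xs), fmean(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
sx, sy = stdev(xs), stdev(ys)
r = cov / (sx * sy)

# Left-hand side: variance of the paired sums and differences.
var_sum = variance([x + y for x, y in zip(xs, ys)])
var_diff = variance([x - y for x, y in zip(xs, ys)])

# Right-hand side: the variance sum law.
law_sum = variance(xs) + variance(ys) + 2 * r * sx * sy
law_diff = variance(xs) + variance(ys) - 2 * r * sx * sy

print(var_sum, law_sum)
print(var_diff, law_diff)
```

The two sides agree up to floating-point rounding. Re-pairing the same numbers changes r, and with it both variances.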

Break Time! \o/

Research Design, Chapter 6. Theories in science can never be proved, since one can never be 100% certain that a new empirical finding inconsistent with the theory will never be found. Scientific theories must be potentially falsifiable. If a theory can accommodate all possible results, then it is not a scientific theory. Therefore, a scientific theory should lead to testable hypotheses. If a hypothesis derived from a theory is confirmed, then the theory has survived a test, and it becomes more useful and better thought of by the researchers in the field. A theory is not confirmed when correct hypotheses are derived from it.
"In so far as a scientific statement speaks about reality, it must be falsifiable; and in so far as it is not falsifiable, it does not speak about reality." - Karl Popper

Public Service Announcement: We are skipping Chapter 6, Section 3, titled Measurement.

Sampling Bias, Chapter 6, Section 5. For the exams, you do not have to memorize the names of these biases, but you should be able to recognize when a bias could occur. Self-Selection Bias: asking people to nominate themselves; not randomizing the treatment and control groups. Under-coverage Bias: the way you collect samples could leave out certain groups or discourage them from responding, e.g. the example in the textbook of the poor not having telephones. Survivorship Bias: the famous World War 2 aircraft armor example in the textbook; studying only people who perform well in gambling or the stock market.
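
Survivorship bias is easy to reproduce in a toy simulation. The setup below is entirely hypothetical: 10,000 gamblers each place 20 fair $1 bets, so nobody has any skill, and then we "study" only those who finished ahead.

```python
import random
from statistics import fmean

random.seed(2)  # fixed seed so the run is repeatable

def net_winnings(bets=20):
    """Net result of a number of fair $1 coin-flip bets."""
    return sum(random.choice((-1, 1)) for _ in range(bets))

everyone = [net_winnings() for _ in range(10_000)]

# Only the winners get interviewed for the "how to beat the casino" book.
survivors = [w for w in everyone if w > 0]

print(f"average over everyone:  {fmean(everyone):.2f}")
print(f"average over survivors: {fmean(survivors):.2f}")
```

The average over everyone is close to zero, as it should be for a fair game, while the average over the survivors looks like a winning strategy.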

Experimental Design, Chapter 6, Section 6. Memorize: no. Understand: yes. Between-Subjects: a different group is used for each possible value (level) of the independent variable. Within-Subject: the same subject is tested with all possible values of the independent variable. Both cases: we want to study the relationship between the independent variable and the dependent variable. Counter-balancing: in a within-subject design, the order in which you test the possibilities might affect the result, so you need to randomize the order. Multi-Factor Between-Subjects: two or more independent variables.
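
Randomizing the order in a within-subject design takes only a few lines. In this sketch the condition labels and subject IDs are hypothetical placeholders; each subject gets every condition, in an independently shuffled order.

```python
import random

random.seed(3)  # fixed seed so the assignment is repeatable

conditions = ["A", "B", "C"]          # hypothetical levels of the independent variable
subjects = ["s1", "s2", "s3", "s4"]   # hypothetical subject IDs

# random.sample with k = len(conditions) returns a shuffled copy of the list.
orders = {s: random.sample(conditions, k=len(conditions)) for s in subjects}

for subject, order in orders.items():
    print(subject, order)
```

With few subjects, pure randomization may not spread the orders evenly, so this is only the simplest version of the idea.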

Causation, Chapter 6, Section 7. Correlation does not imply causation: http://www.tylervigen.com/spurious-correlations

Causation, Chapter 6, Section 7. Correlation does not imply causation. Confounding variable: a third variable that influences both of the variables you are studying and can produce a correlation between them. How can we establish causation? This is a deep philosophical question. Even defining causation is difficult. From a statistics point of view: we use statistics to guide us towards possible theories, then try to explain why the theory should be true. E.g. smoking is correlated with cancer, and the causal claim is also supported by scientific understanding of the underlying mechanism.
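
A confounding variable can be simulated in a few lines. In this hypothetical setup, Z drives both X and Y, and X and Y have no effect on each other. Because we built the data ourselves, we know the effect of Z exactly and can subtract it out; with real data that step is much harder.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of paired data."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(4)  # fixed seed so the run is repeatable

# Z is the confounder; X and Y each equal Z plus their own independent noise.
z = [random.gauss(0, 1) for _ in range(5000)]
x = [zi + random.gauss(0, 1) for zi in z]
y = [zi + random.gauss(0, 1) for zi in z]

r_xy = pearson_r(x, y)

# Remove the (known, by construction) contribution of Z from both variables.
x_without_z = [xi - zi for xi, zi in zip(x, z)]
y_without_z = [yi - zi for yi, zi in zip(y, z)]
r_without_z = pearson_r(x_without_z, y_without_z)

print(f"r(X, Y) = {r_xy:.2f}, after removing Z: r = {r_without_z:.2f}")
```

X and Y come out clearly correlated, yet the correlation all but disappears once Z is accounted for.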