Statistics Introduction to Probability

Similar documents
Statistics Introduction to Probability

Probability Review - Bayes Introduction

Statistics 220 Bayesian Data Analysis

Math 493 Final Exam December 01

MATH 450: Mathematical statistics

Econometrics I G (Part I) Fall 2004

ORF 245 Fundamentals of Statistics Chapter 9 Hypothesis Testing

MAT 271E Probability and Statistics

Lecture 14 Bayesian Models for Spatio-Temporal Data

IEOR 3106: Introduction to Operations Research: Stochastic Models. Professor Whitt. SOLUTIONS to Homework Assignment 2

Math 447. Introduction to Probability and Statistics I. Fall 1998.

Probability Rules. MATH 130, Elements of Statistics I. J. Robert Buchanan. Fall Department of Mathematics

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Bayesian Models in Machine Learning

Seasonal Climate Watch June to October 2018

Lecture 5: Introduction to Probability

Learning Bayesian belief networks

Bayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007

6.4 Type I and Type II Errors

NRCSE. Statistics, data, and deterministic models

Seasonal Climate Watch July to November 2018

Data Mining Techniques. Lecture 3: Probability

Introduction to Machine Learning CMU-10701

Probability- describes the pattern of chance outcomes

Probability (Devore Chapter Two)

Instructor Dr. Tomislav Pintauer Mellon Hall Office Hours: 1-2 pm on Thursdays and Fridays, and by appointment.

Chapter 8: An Introduction to Probability and Statistics

Introduction and Overview STAT 421, SP Course Instructor

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

STATISTICS 3A03. Applied Regression Analysis with SAS. Angelo J. Canty

Introduction to Machine Learning

All of this information is on the course home page in courseworks:

Lecture 6 Probability

MATH-3150H-A: Partial Differential Equation 2018WI - Peterborough Campus

PHYS 480/580 Introduction to Plasma Physics Fall 2017

COS 341: Discrete Mathematics

Fundamental Probability and Statistics

Lecture 1: Probability Fundamentals

MTH 163, Sections 40 & 41 Precalculus I FALL 2015

MAT 271E Probability and Statistics

Randomized Algorithms

Stats Probability Theory

Lecture 2: Repetition of probability theory and statistics

Great Theoretical Ideas in Computer Science

Machine Learning CSE546

Econ 325: Introduction to Empirical Economics

Bayesian analysis in nuclear physics

Stellar Astronomy 1401 Spring 2009

AS 101: The Solar System (Spring 2017) Course Syllabus

S n = x + X 1 + X X n.

MAT 271E Probability and Statistics

MATH/STAT 3360, Probability

PHYSICAL GEOLOGY Geology 110 Spring Semester, 2018 Syllabus

Today s Outline. Biostatistics Statistical Inference Lecture 01 Introduction to BIOSTAT602 Principles of Data Reduction

Department of Physics & Astronomy Trent University

July Forecast Update for North Atlantic Hurricane Activity in 2018

W/F = 10:00 AM - 1:00 PM Other times available by appointment only

Seasonal Climate Watch September 2018 to January 2019

Notes on probability : Exercise problems, sections (1-7)

Machine Learning. Instructor: Pranjal Awasthi

CS 361: Probability & Statistics

Probability & Statistics - FALL 2008 FINAL EXAM

Physics 141 Course Information

Differential Equations MTH 205

July Forecast Update for Atlantic Hurricane Activity in 2017

Physics 141 Course Information

Important Dates. Non-instructional days. No classes. College offices closed.

Math Linear Algebra Spring Term 2014 Course Description

Introduction to Stochastic Processes

Climate Outlook and Review

SYLLABUS Stratigraphy and Sedimentation GEOL 4402 Fall, 2015 MWF 10:00-10:50AM VIN 158 Labs T or W 2-4:50 PM

Estimation of reliability parameters from Experimental data (Parte 2) Prof. Enrico Zio

Chemistry 401 : Modern Inorganic Chemistry (3 credits) Fall 2014

General Info. Grading

Last time. Numerical summaries for continuous variables. Center: mean and median. Spread: Standard deviation and inter-quartile range

Lecture 2 APPLICATION OF EXREME VALUE THEORY TO CLIMATE CHANGE. Rick Katz

Math 410 Linear Algebra Summer Session American River College

Probability Theory for Machine Learning. Chris Cremer September 2015

Chemistry 401: Modern Inorganic Chemistry (3 credits) Fall 2017

Probabilistic Reasoning

Chemistry 8 Principles of Organic Chemistry Spring Semester, 2013

Chem 406/Nuc E 405 Nuclear and Radiochemistry Fall Semester 2010

MAT 135B Midterm 1 Solutions

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Lecture 2: Probability Distributions

18.05 Practice Final Exam

Homework 2. Spring 2019 (Due Thursday February 7)


ATM 10. Severe and Unusual Weather. Prof. Richard Grotjahn.

Statistical Data Analysis Stat 3: p-values, parameter estimation

University of California, Los Angeles Department of Statistics. Joint probability distributions

STAT 414: Introduction to Probability Theory

Pre-Season Forecast for North Atlantic Hurricane Activity in 2018

Probability (Devore Chapter Two)

Bayesian data analysis using JASP

Introduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf

Probabilistic and Bayesian Machine Learning

Discrete distribution. Fitting probability models to frequency data. Hypotheses for! 2 test. ! 2 Goodness-of-fit test

Introductory Probability

Transcription:

Statistics 110 - Introduction to Probability Mark E. Irwin Department of Statistics Harvard University Summer Term Monday, June 28, 2004 - Wednesday, August 18, 2004

Personnel Instructor: Mark Irwin Office: 611 Science Center Phone: 617-495-5617 E-mail: irwin@stat.harvard.edu Web-site: http://www.people.fas.harvard.edu/ mirwin/stat110/ Lectures: M-F 11:00-12:00, Science Center 109 Office Hours: Tuesday, 12:00-1:00, Thursday 1:00-2:00, or by appointment Teaching Fellow: E-mail: Tingting Zhang tzang@stat.harvard.edu Section: Monday 1:00-2:00, Science Center 109 Administration 1

Text Books Required Text: Optional Texts: Rice JA (1994). Mathematical Statistics and Data Analysis, 2nd Edition. Duxbury Press. Ross S (1994), A First Course in Probability, 6th Edition. Prentice Hall. Feller WA (1968). An Introduction to Probability and its Applications Vol 1, 3rd edition. Wiley. Administration 2

Grading Homework (25%): 6 or 7 during the term. Late assignments will not be accepted. The lowest homework grade will be dropped when computing your final grade. Quizzes (15%): There will be two 30 minute quizzes during the term. Dates TBA. Midterm (20%): Tentatively scheduled for Wednesday, July 21 in lecture. Final (40%): Wednesday, August 18th, 9:00 am. Location to be announced. Administration 3

Syllabus Basics: sample space, basic laws of probability, conditional probability, Bayes Theorem, independence. Univariate distributions: mass functions and densities, expectation and variance, binomial, Poisson, normal, and gamma distributions. Multivariate distributions: joint and conditional distribution, independence, covariance and correlation, transformations, multivariate normal and related distributions. Limit laws: moment generating and characteritic functions, probability inequalities, laws of large numbers, central limit theorem, delta rule. Markov chains: transition probabilities, classification of states, stationary distributions and convergence results. A more detailed schedule, with suggested readings, is given in the syllabus available on line Administration 4

What is Probability? An approach (language) for describing randomness, uncertainty, and levels of belief. Examples: Rolling dice (Uniform distribution) P [1] = P [2] =... = P [6] = 1 6 Radioactive decay (Poisson Process) The probability of having k alpha particles emitted from time T to time T + t is P [k decays from T to T + t] = eλt (λt) k λ is a rate parameter giving the expected number of decays in 1 time unit. k! Introductory material 5

Genetics - Mendel s breeding experiments. The expected fraction of observed phenotypes in one of Mendel s experiments is given by the following model Round/Yellow Round/Green Wrinkled/Yellow Wrinkled/Green 2+(1 θ) 2 4 θ(1 θ) 4 θ(1 θ) 4 (1 θ) 2 4 The value θ, known as the recombination fraction, is a measure of distance between the two genes which regulate the two traits. θ must satisfy 0 θ 0.5 (under some assumptions about the process of meiosis) and when the two genes are on different chromosomes, θ = 0.5. Introductory material 6

Tropical Pacific sea surface temperature evolution Probabilistic based model involving wind, sea surface temperature, and the Southern Oscillation Index (air pressure measure) data to forecast sea surface temperatures 7 months in the future. A highly complex, hierarchical model was used. In addition to the three sets of input variables into the model, there is a regime component, recognizing that the region goes through periods of below normal, above normal, and typical temperatures. As part of the analysis, we can ask what would temperature field look like conditional on being in a given temperature regime. The forecast map is a summary of the probabilistic model (the posterior mean), which is a weighted average over all possible sea surface temperature maps. As we can determine the likelihood of any temperature field, summary measures of the uncertainty in the forecasts can be determined. Introductory material 7

April 2004 anomaly forecast (top) and observed anomaly (bottom) based on Jan 1970 to September 2003 data. Introductory material 8

April 2004 regime specific anomaly forecasts based on Jan 1970 to September 2003 data. Introductory material 9

November 2004 anomaly forecasts with 25th and 75th percentiles based on Jan 1970 to April 2004 data. For the Niño 3.4 region, the average forecast anomaly is 1.37 C, suggesting a possible El Niño this winter. Introductory material 10

November 2004 regime specific anomaly forecasts based on Jan 1970 to April 2004 data. Introductory material 11

Refs: Berliner, Wikle, and Cressie (2000). Long-Lead Prediction of Pacific SSTs via Bayesian Dynamic Modeling. Journal of Climate, 13, 3953-3968. http://www.stat.ohio-state.edu/ sses/collab enso.php Note that the model used here is only an approximation of reality. This does not include information about the underlying physics of the meteorology. The models for the other 3 examples are much closer to the true mechanism generating the data, though there still may be some approximation going on, particularly with the Mendel example. Introductory material 12

What is Statistics? To gain understanding from data Making decisions or taking actions under uncertainty Parameter estimation Mendel: recombination fraction θ. SST forecasts: there are a large number of parameters that need to be estimated from the past data. These involve the associations (correlations) of the winds and the SOI with the SST field, how the three variables evolve over time, how likely it is to switch from one regime to another, and so on. Introductory material 13

One approach used is to use data to determine which probability models are plausible For the Mendel data Phenotype Count Obs. Rel. Freq. Exp. Rel. Freq. Round/Yellow 315 0.5625 0.5625 = 9 16 Round/Green 108 0.1929 0.1875 = 3 16 Wrinkled/Yellow 105 0.1875 0.1875 = 3 16 Wrinkled/Green 32 0.0571 0.0625 = 1 16 For this pair of traits, the data is consistent with the two genes being on different chromosomes. In fact the genes for all 7 traits studied by Mendel are each on different chromosomes. Introductory material 14

For a second example using the same breeding design (sorry I can t find out what the two traits actually are here) Phenotype Count Obs. Rel. Freq. Exp. Rel. Freq. A/B 125 0.6345 0.5625 = 9 16 A/b 18 0.0914 0.1875 = 3 16 a/b 20 0.1015 0.1875 = 3 16 a/b 34 0.1726 0.0625 = 1 16 This data does not appear to be consistent with the hypothesis that the genes are on different chromosomes. In fact the data is most consistent with θ = 0.21. This estimate is known as the maximum likelihood estimate of the parameter. Introductory material 15

Where to probabilities come from? Long run relative frequencies If an experiment is repeated over and over again, the relative frequency of an event will converge to the probability of the event. For example, three different experiments looked at the probability of getting a head when flipping a coin. The French naturalist Count Buffon: 4040 tosses, 2048 heads (ˆp = 0.5069). While imprisoned during WWII, the South African statistician John Kerrich: 10000 tosses, 5067 heads (ˆp = 0.5067) Statistician Karl Pearson: 24000 tosses, 12012 heads (ˆp = 0.5005) Introductory material 16

Subjective beliefs Can be used for experiments that can t be repeated exactly, such as sporting event. For example, what is the probability that the Red Sox will win the World Series this year. Can be done through comparison (i.e. is getting a head on a single flip of a coin more or less likely, getting a 6 when rolling a fair 6 sided die, etc). Can also be done by comparing different possible outcomes (1.5 times more likely than the Yankees, 10 more likely than than the A s, 1,000,000 times more likely than the Expos, etc). Often expressed in terms of odds Odds = Prob Odds ; Prob = 1 Prob 1 + Odds Introductory material 17

Models, physical understanding, etc. The structure of the problem will often suggest a probability model. For example, the physics of rolling a die suggest that no one side should be favoured, giving the uniform model discussed earlier. However this could be verified by looking at the long run frequencies. The probabilities used to describe Mendel s experiments come from current beliefs of the underlying processes of meiosis (actually approximations to the processes). Introductory material 18