Data Analysis, Statistics, Machine Learning


1 Data Analysis, Statistics, Machine Learning
Leland Wilkinson
Adjunct Professor, UIC Computer Science
Chief Scientist, H2O.ai

2 Data Analysis
What is data analysis?
  Summaries of batches of data
  Methods for discovering patterns in data
  Methods for visualizing data
Benefits
  Data analysis helps us support suppositions
  Data analysis helps us discredit false explanations
  Data analysis helps us generate new ideas to investigate
Copyright 2016 Leland Wilkinson

3 Statistics
What is (are) statistics?
  Summaries of samples from populations
  Methods for analyzing samples
  Making inferences based on samples
Benefits
  Statistics help us avoid false conclusions when evaluating evidence
  Statistics protect us from being fooled by randomness
  Statistics help us find patterns in nonrandom events
  Statistics quantify risk
  Statistics counteract ingrained bias in human judgment
  Statistical models are understandable by humans

4 Machine Learning
What is machine learning?
  Data mining systems
    Discover patterns in data
  Learning systems
    Adapt models over time
Benefits
  ML helps to predict outcomes
  ML often outperforms traditional statistical prediction methods
  ML models do not need to be understood by humans
  Most ML results are unintelligible (the exceptions prove the rule)
  ML people care about the quality of a prediction, not the meaning of the result
  ML is hot (Deep Learning!, Big Data!)

5 Course Outline
  1. Introduction
  2. Data
  3. Visualizing
  4. Exploring
  5. Summarizing
  6. Distributions
  7. Inference
  8. Predicting
  9. Smoothing
  10. Time Series
  11. Comparing
  12. Reducing
  13. Grouping
  14. Learning
  15. Anomalies
  16. Analyzing

6 Data
What is (are) data?
  A datum is a given (as in French donnée)
  "Data" is the plural of "datum"
Data may have many different forms
  Set, Bag, List, Table, etc.
  Many of these forms are amenable to data analysis
  None of these forms is suitable for statistical analysis
Statistics operate on variables, not data
  A variable is a function mapping data objects to values
  A random variable is a variable whose values are each associated with a probability p (0 ≤ p ≤ 1)
Visualizations operate on data or variables
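
The distinction between data and variables can be made concrete with a small sketch. The records and the `height` function below are hypothetical illustrations, not taken from the slides: the list of records plays the role of the data, the function plays the role of a variable, and the relative frequencies stand in for the probabilities attached to each value.

```python
from collections import Counter

# Hypothetical data objects: each record is one "datum".
records = [
    {"name": "Ann", "height_cm": 170.0},
    {"name": "Bob", "height_cm": 182.5},
    {"name": "Cy", "height_cm": 170.0},
]


def height(record):
    """A variable: a function mapping a data object to a value."""
    return record["height_cm"]


# Applying the variable to every data object yields its values.
values = [height(r) for r in records]

# Relative frequencies: one probability p (0 <= p <= 1) per distinct value.
counts = Counter(values)
probabilities = {v: c / len(values) for v, c in counts.items()}

print(values)
print(probabilities)
```

Statistical summaries would then be computed on `values` (the variable), not on `records` (the data).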

7 Visualizing
Visualizations represent data
  Tallies, stem-and-leaf plots, histograms, pie charts, bar charts, ...
Statistical visualizations represent variables
  Probability plots, density plots, ...
Statistical visualizations aid diagnosis of models
  Does a variable derive from a given distribution?
  Are there outliers and other anomalies?
  Are there trends (or periodicity, etc.) across time?
  Are there relationships between variables?
  Are there clusters of points (cases)?

8 Exploring
Exploratory Data Analysis (John W. Tukey, EDA)
  Summaries
  Transformations
  Smoothing
  Robustness
  Interactivity
What EDA is not
  Letting the data speak for itself
  Fishing expeditions
  Null hypothesis testing
  Qualitative Data Analysis
  Mixed methods
  Old wine in new bottles

9 Summarizing
We summarize to remove irrelevant detail
  We summarize batches of data in a few numbers
  We summarize variables through their distributions
  The best summaries preserve important information
  All summaries sacrifice information (lossy)
Summaries
  Location
    Popular: mean, median, mode
    Others: weighted mean, trimmed mean, ...
  Spread
    Popular: sd, range
    Others: Interquartile Range, Median Absolute Deviation, ...
  Shape
    Skewness
    Kurtosis
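
The location, spread, and shape summaries listed above can be sketched with Python's standard `statistics` module. This is a minimal illustration under stated assumptions: the batch of numbers is made up, the helper names (`trimmed_mean`, `mad`, `iqr`, `skewness`, `kurtosis`) are hypothetical, and the shape measures use the simple moment-based (population) formulas, which is only one of several conventions.

```python
import statistics


def trimmed_mean(xs, proportion=0.1):
    """Mean after dropping the given proportion of values from each end."""
    xs = sorted(xs)
    k = int(len(xs) * proportion)
    return statistics.fmean(xs[k:len(xs) - k])


def mad(xs):
    """Median absolute deviation from the median (unscaled)."""
    m = statistics.median(xs)
    return statistics.median(abs(x - m) for x in xs)


def iqr(xs):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1


def skewness(xs):
    """Moment-based skewness: third central moment divided by sd cubed."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3


def kurtosis(xs):
    """Moment-based excess kurtosis (0 for a Normal distribution)."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 4 for x in xs) / s ** 4 - 3.0


# Made-up batch with one extreme value.
batch = [2, 3, 3, 4, 5, 5, 5, 6, 7, 40]

location = {
    "mean": statistics.fmean(batch),
    "median": statistics.median(batch),
    "mode": statistics.mode(batch),
    "trimmed mean": trimmed_mean(batch),
}
spread = {
    "sd": statistics.stdev(batch),
    "range": max(batch) - min(batch),
    "IQR": iqr(batch),
    "MAD": mad(batch),
}
shape = {"skewness": skewness(batch), "kurtosis": kurtosis(batch)}

print(location)
print(spread)
print(shape)
```

In this batch the single extreme value pulls the mean well above the median, while the trimmed mean and the MAD are barely affected, which is one way to see that every summary throws away different information.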

10 Distributions
A probability function is a nonnegative function
  Its area (or mass) is 1
Distributions are families of probability functions
  Most statistical methods depend on distributions
  Nonparametric methods are distribution-free
The Normal (Gaussian) distribution is most popular
  Other distributions (Binomial, Poisson, ...) are often used
We use the Normal because of the Central Limit Theorem
  Variables based on real data are rarely normally distributed
  But sums or means of random variables tend to be
  So if we are drawing inferences about means, Normal is usually OK
  This involves a leap of faith
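
The Central Limit Theorem claim above (raw values are rarely Normal, but means tend to be) can be checked with a small simulation. This is a sketch under assumptions that are not in the slides: the exponential distribution, the sample size of 50, the seed, and the use of skewness as a rough symmetry check are all illustrative choices.

```python
import random
import statistics


def skewness(xs):
    """Moment-based skewness: 0 for a symmetric distribution such as the Normal."""
    m, s = statistics.fmean(xs), statistics.pstdev(xs)
    return statistics.fmean((x - m) ** 3 for x in xs) / s ** 3


rng = random.Random(2016)  # fixed seed so the run is repeatable

# Raw draws from a strongly right-skewed distribution (exponential, mean 1).
raw = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 2000 independent samples of size 50 from the same distribution.
means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(50))
    for _ in range(2000)
]

skew_raw = skewness(raw)
skew_means = skewness(means)

print(f"skewness of raw values:   {skew_raw:.2f}")
print(f"skewness of sample means: {skew_means:.2f}")
```

The raw values stay heavily skewed while the sample means are much closer to symmetric. The remaining skewness in the means is a reminder that the Normal approximation is exactly that, an approximation.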

11 Inference
Inference involves drawing conclusions from evidence
  In logic, the evidence is a set of premises
  In data analysis, the evidence is a set of data
  In statistics, the evidence is a sample from a population
    A population is assumed to have a distribution
    The sample is assumed to be random (sometimes there are ways around that)
    The population may be the same size as the sample (not usually a good idea)
There are two historical approaches to statistical inference
  Frequentist
  Bayesian
There are many widespread abuses of statistical inference
  We cherry pick our results (scientists, journals, reporters, ...)
  We didn't have a big enough sample to detect a real difference
  We think a large sample guarantees accuracy (the bigger the better)

12 Predicting
Most statistical prediction models take one of two forms
  $y = \sum_j \beta_j x_j + \varepsilon$ (additive function)
  $y = f(x_j, \varepsilon)$ (nonlinear function)
The distinction is important
  The first form is called an additive model
  The second form is called a nonlinear model
  Additive models can be curvilinear (if terms are nonlinear)
  Nonlinear models cannot be transformed to linear
Examples of linear or linearizable models are
  $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$
  $y = \alpha e^{\beta x + \varepsilon}$
Examples of nonlinear models are
  $y = \beta_1 x_1 / \beta_2 x_2 + \varepsilon$
  $y = \log \beta_1 x_1 \varepsilon$
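
To see what "linearizable" means for the exponential example, note that taking logs of y = α·exp(βx + ε) gives log y = log α + βx + ε, which is an ordinary straight-line model in x. The sketch below fits that line by least squares and transforms the intercept back. The function names and the noise-free data (α = 2, β = 0.5) are hypothetical choices for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_exponential(xs, ys):
    """Fit y = alpha * exp(beta * x) by regressing log(y) on x.

    Requires every y to be positive so that the logarithm exists.
    """
    intercept, slope = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(intercept), slope


# Noise-free illustrative data generated with alpha = 2 and beta = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

alpha, beta = fit_exponential(xs, ys)
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

The same trick does not work when the error is added outside the exponential (y = α·exp(βx) + ε), because the log of a sum does not separate into a linear form.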

13 Smoothing
Sometimes we want to smooth variables or relations
  Tukey phrased this as data = smooth + rough
  The smoothed version should show patterns not evident in raw data
Many of these methods are nonparametric
  Some are parametric
  But we use them to discover, not to confirm
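
Tukey's decomposition data = smooth + rough can be sketched with one of the simplest nonparametric smoothers, a running median of three. The choice of smoother, the endpoint handling (endpoints are copied unchanged), and the data are assumptions made for illustration; the slides do not prescribe a particular method.

```python
import statistics


def running_median3(xs):
    """Running median of three; the two endpoints are copied unchanged."""
    smooth = list(xs)
    for i in range(1, len(xs) - 1):
        smooth[i] = statistics.median(xs[i - 1:i + 2])
    return smooth


# Made-up series with one spike at position 2.
data = [1, 2, 30, 4, 5, 6, 7]

smooth = running_median3(data)
rough = [d - s for d, s in zip(data, smooth)]

print("smooth:", smooth)
print("rough: ", rough)
```

The smooth shows the steady upward pattern, and the spike ends up almost entirely in the rough, where it is easy to spot.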

14 Time Series
Time series statistics involve random processes over time
Spatial statistics involve random processes over space
Both involve similar mathematical models
  When there is no temporal or spatial influence, these boil down to ordinary statistical methods
DO NOT USE i.i.d. methods on temporal/spatial data
  These require stochastic models, not trend lines
  Measurements at each time/space point are not independent
[Figure: Quarterly US Ecommerce Retail Sales, Seasonally Adjusted (Sales vs. Year), with its Autocorrelation Plot (Correlation vs. Lag)]
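
The lack of independence between neighboring time points is what an autocorrelation plot measures. The sketch below computes the usual sample autocorrelation at a given lag. The two series are made up for illustration and are not the retail sales data from the slide's figure.

```python
def autocorrelation(xs, lag):
    """Sample autocorrelation of a series at the given non-negative lag."""
    n = len(xs)
    mean = sum(xs) / n
    denominator = sum((x - mean) ** 2 for x in xs)
    numerator = sum(
        (xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# A steadily rising series: neighboring points are strongly alike.
trend = list(range(20))

# A series that flips sign every step: neighbors are strongly opposed.
alternating = [1, -1] * 10

for name, series in [("trend", trend), ("alternating", alternating)]:
    values = [round(autocorrelation(series, lag), 3) for lag in range(4)]
    print(name, values)
```

For independent observations these values would hover near zero at every lag above 0. Values far from zero, as in both series here, signal that i.i.d. methods are not appropriate.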

15 Comparing
Statistical methods exist for comparing 2 or more groups
The classical approach is Analysis of Variance (ANOVA)
  This method was invented by Sir Ronald Fisher
  It revolutionized industrial/scientific experiments
  The researcher was able to examine more than one treatment at a time
  With only two groups, results of Student's t-test and F-test are equivalent
Multivariate Analysis of Variance (MANOVA)
  This is ANOVA for more than one dependent variable (outcome)
Hierarchical modeling is for nested data
  There are several forms of this multilevel modeling
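
The equivalence of the t-test and the F-test for two groups can be verified numerically: the one-way ANOVA F statistic equals the square of the pooled-variance t statistic. The sketch below computes both from scratch. The group values are made up, and only the test statistics are computed, not p-values.

```python
import statistics


def pooled_t(a, b):
    """Student's two-sample t statistic with a pooled variance estimate."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    ss = sum((x - mean_a) ** 2 for x in a) + sum((x - mean_b) ** 2 for x in b)
    pooled_variance = ss / (len(a) + len(b) - 2)
    standard_error = (pooled_variance * (1 / len(a) + 1 / len(b))) ** 0.5
    return (mean_a - mean_b) / standard_error


def anova_f(*groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [statistics.fmean(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up measurements for two treatment groups.
group_a = [5.1, 4.9, 6.0, 5.5, 5.8]
group_b = [6.2, 6.8, 7.1, 6.5, 7.4]

t = pooled_t(group_a, group_b)
f = anova_f(group_a, group_b)
print(f"t = {t:.4f}, t^2 = {t * t:.4f}, F = {f:.4f}")
```

With three or more groups only `anova_f` applies, which is the sense in which ANOVA lets a researcher examine more than one treatment at a time.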

16 Reducing
Reducing takes many variables and reduces them to a smaller number of variables
There are many ways to do this
  Principal components (PC) constructs orthogonal weighted composites based on correlations (covariances) among variables
  Multidimensional Scaling (MDS) embeds them in a low-dimensional space based on distances between variables
  Manifold learning projects them onto a low-dimensional nonlinear manifold
  Random projection is like principal components except the weights are random
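
A principal component is a weighted composite whose weights come from the covariances among the variables. As a rough sketch, the first component's weights can be found by power iteration on the covariance matrix. Power iteration is a stand-in chosen here for brevity (real software typically uses an eigen or singular value decomposition), and the two-variable data set is made up.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the columns of `rows`."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def first_component(cov, iterations=100):
    """Weights of the first principal component via power iteration.

    Starts from the first coordinate axis, so it assumes the leading
    eigenvector is not exactly orthogonal to that axis.
    """
    d = len(cov)
    w = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


# Made-up cases: two variables that mostly rise and fall together.
rows = [
    (-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
    (0.5, -0.5), (-0.5, 0.5),
]

weights = first_component(covariance_matrix(rows))

# One composite score per case replaces the two original variables.
scores = [sum(w * x for w, x in zip(weights, row)) for row in rows]

print("weights:", [round(w, 4) for w in weights])
print("scores: ", [round(s, 4) for s in scores])
```

Both variables receive equal weight because they vary together. A random projection would use the same kind of weighted sum but draw the weights at random instead of deriving them from the covariances.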

17 Grouping
We can create groups of variables or groups of cases
These methods involve what we call Cluster Analysis
  Hierarchical methods make trees of nested clusters
  Non-hierarchical methods group cases into k clusters
  These k clusters may be discrete or overlapping
Two considerations are especially important
  Distance/Dissimilarity measure
  Agglomeration or splitting rule
The collection of clustering methods is huge
Early applications were for numerical taxonomy in biology
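
As one example of a non-hierarchical method that groups cases into k discrete clusters, here is a minimal k-means sketch. k-means is named here as an illustrative choice, not as the method the slides single out. The dissimilarity measure is squared Euclidean distance, and the points and starting centers are made up.

```python
def squared_distance(p, q):
    """Squared Euclidean distance: the dissimilarity measure used here."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iterations=100):
    """Assign each point to its nearest center, then recompute the centers.

    Repeats until the assignments stop changing. A cluster that loses all
    of its points keeps its previous center.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Made-up cases forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

# Deliberately poor start: both initial centers lie in the first group.
labels, centers = kmeans(points, centers=[points[0], points[1]])

print("labels: ", labels)
print("centers:", centers)
```

Changing `squared_distance` to another dissimilarity, or changing the update rule, gives a different clustering method, which is why those two considerations matter so much.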

18 Learning
Machine Learning (ML) methods look for patterns that persist across a large collection of data objects
ML learns from new data
Key concepts
  Curse of dimensionality
  Random projections
  Regularization
  Kernels
  Bootstrap aggregation
  Boosting
  Ensembles
  Validation
Methods
  Supervised
    Classification (Discriminant Analysis, Support Vector Machines, Trees, Set Covers)
    Prediction (Regression, Trees, Neural Networks)
  Unsupervised
    Neural Networks
    Clustering
    Projections (PC, MDS, Manifold Learning)

19 Anomalies
Anomalies are, literally, lack of a law (nomos)
The best-known anomaly is an outlier
  This presumes a distribution with tail(s)
  All outliers are anomalies, but not all anomalies are outliers
  Identifying outliers is not simple
  Almost every software system and statistics text gets it wrong
Other anomalies don't involve distributions
  Coding errors in data
  Misspellings
  Singular events
Often anomalies in residuals are more interesting than the estimated values
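
One way to see why identifying outliers is not simple: the familiar "more than 3 standard deviations from the mean" rule cannot flag anything in a small batch, because an extreme value inflates the very standard deviation it is judged against. In a sample of size n, no z-score can exceed (n - 1) / sqrt(n), which is about 2.85 for n = 10. The sketch below demonstrates this on made-up data. The median-based score is shown only as one common alternative and is not presented as the author's recommended method.

```python
import statistics


def z_scores(xs):
    """Classical z-scores using the sample mean and standard deviation."""
    mean, sd = statistics.fmean(xs), statistics.stdev(xs)
    return [(x - mean) / sd for x in xs]


def robust_scores(xs):
    """Scores based on the median and the median absolute deviation.

    Assumes the median absolute deviation is not zero.
    """
    median = statistics.median(xs)
    mad = statistics.median(abs(x - median) for x in xs)
    return [(x - median) / mad for x in xs]


# Made-up batch of ten values, one of them wildly different.
batch = [10, 11, 9, 10, 12, 10, 11, 9, 10, 1000]

largest_z = max(abs(z) for z in z_scores(batch))
bound = (len(batch) - 1) / len(batch) ** 0.5
largest_robust = max(abs(r) for r in robust_scores(batch))

print(f"largest |z|:             {largest_z:.3f}")
print(f"upper bound for n = 10:  {bound:.3f}")
print(f"largest robust score:    {largest_robust:.1f}")
```

The value 1000 never reaches a z-score of 3, even though it is obviously out of line with the rest of the batch.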

20 Analyzing
What Statistics is not
  mathematics
  machine learning
  computer science
  probability theory
Statistical reasoning is rational
  Statistics conditions conclusions
  Statistics factors out randomness
Wise words
  David Moore
  Stephen Stigler
  TFSI



Source Coding and Compression Surce Cding and Cmpressin Heik Schwarz Cntact: Dr.-Ing. Heik Schwarz heik.schwarz@hhi.fraunhfer.de Heik Schwarz Surce Cding and Cmpressin September 22, 2013 1 / 60 PartI: Surce Cding Fundamentals Heik

More information

making triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y=

making triangle (ie same reference angle) ). This is a standard form that will allow us all to have the X= y= Intrductin t Vectrs I 21 Intrductin t Vectrs I 22 I. Determine the hrizntal and vertical cmpnents f the resultant vectr by cunting n the grid. X= y= J. Draw a mangle with hrizntal and vertical cmpnents

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information
