Road map. Machine learning and its antecedents in microarray statistics. Metadata administration. QC/Preprocessing. VJ Carey

Channing Laboratory, Brigham and Women's Hospital, Boston, MA, USA
June / CSHL C-DATA05

Road map
- Preprocessing
- Differential expression

Metadata administration
- affy users: use read.phenoData at an early stage, and bind this to the AffyBatch
- cDNA users: place relevant sample-level information in the targets file
- use read.MIAME at an early stage, and bind this to the exprSet before downstream work begins

QC/Preprocessing
- variation in experimental results: both structural and random
  - population-level phenomena
  - individual-level phenomena
  - measurement technique: both structural and random
- we are interested primarily in biological variation
- technical sources of variation are an obscuring nuisance

Preprocessing: microarray normalization premises
- many of the prominent methods assume majority of genes are NOT differentially expressed
- variation observed in genes that are not differentially expressed can be used to expose/estimate biologically irrelevant variation
- within-slide: variation due to structural factors such as print tip or sector can be isolated
- intensity-dependent bias in log ratios estimable/removable under assumption that majority of genes are not differentially expressed
- between-slide: location and/or scale and/or complete distribution can be made comparable

Preprocessing: microarray normalization objectives
- within-slide: intensity-dependent bias, correctible print tip or sector defects
- between-slide: changes in overall level, or scale, that have no biological basis

Preprocessing: microarray normalization tools

affy, expresso:
  > bgcorrect.methods
  [1] "mas"  "none" "rma"  "rma2"
  > normalize.AffyBatch.methods
  [1] "constant"         "contrasts"
  [3] "invariantset"     "loess"
  [5] "qspline"          "quantiles"
  [7] "quantiles.robust"
  > express.summary.stat.methods
  [1] "avgdiff"      "liwong"       "mas"
  [4] "medianpolish" "playerout"

cDNA (marray), maNorm:
  > args(maNorm)
  function (mbatch, norm = c("printTipLoess", "none", "median",
      "loess", "twoD", "scalePrintTipMAD"), subset = TRUE, span = [...],
      Mloc = TRUE, Mscale = TRUE, echo = FALSE, ...)
  NULL
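The between-slide goal of making complete distributions comparable can be illustrated with quantile normalization. What follows is a minimal base-R sketch, not the affy implementation: the function name `quantile_normalize` and the simulated intensity matrix are assumptions made for illustration, and ties are broken by order of appearance for simplicity.

```r
# Hypothetical helper: force every column (array) of x to share one distribution.
quantile_normalize <- function(x) {
  ranks  <- apply(x, 2, rank, ties.method = "first")  # rank of each probe within its array
  sorted <- apply(x, 2, sort)                          # each array sorted ascending
  target <- rowMeans(sorted)                           # reference: mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) target[r])        # map ranks back onto the reference
  dimnames(out) <- dimnames(x)
  out
}

set.seed(1)
# Simulated log intensities: three arrays that differ in overall level and scale
raw <- cbind(a1 = rnorm(1000, 8, 1),
             a2 = rnorm(1000, 9, 1.5),
             a3 = rnorm(1000, 7.5, 0.8))
normed <- quantile_normalize(raw)

# After normalization every array has the same quantiles,
# while the ordering of probes within each array is unchanged.
apply(normed, 2, quantile)
```

The sketch relies on the premise stated above: if most genes are not differentially expressed, differences between whole-array distributions are treated as technical rather than biological.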

Data structure hierarchies
- build from scanners/CELs: AffyBatch, exprSet, marrayNorm, RGList instances
- build by hand: MIAME, phenoData instances (widget=TRUE)
- mutate the AffyBatch (so that metadata propagates through processing) or exprSet:
  > phenoData(myab) <- pdd
  > phenoData(eset) <- pdd
  > description(eset) <- mymiame
- use class(obj) and getClass(classname) to learn about classes

Tools for differential expression
- t.test(g1, g2): fine for a single gene across two phenotypes or treatments, need either normality or relatively large sample for validity; use aov() for multifactor inference
- fastT(x, inds1, inds2): works over a matrix; can follow with multtest functions to adjust for multiplicity; can have misleading results with small samples
- f.sam or (Bioconductor) siggenes functions: improved analog of t.test that diminishes artifactual significance in small samples
- limma: general approach to linear models (possibly multifactor or trends) based on a specified design matrix; includes options for corrections for multiple testing; limmaGUI, affylmGUI
- EBarrays, many other possibilities

QC and claims of differential expression
- many times scientists ask: what is the quantitative threshold of acceptability for the assay?
- we look at various quality criteria, and sometimes a decision is clear
- remember the concept of sensitivity analysis: if a few chips are questionable, it is helpful to know that the inferences are not severely sensitive to their inclusion
- redo the entire analysis without the questionable chips and compare results
- explain any important differences before deciding against the analysis based on the full set

Stat models with additive structure
- data = fit + residual: five symbols, many questions
- start with fit: a model instance that has an interpretation. typically the model family confers structure on a relationship between measurements
- data $(Y, X_{p \times 1})$
- $Y$ may denote a phenotype, say SBP
- $X$ may denote a correlate or predictor of the phenotype, say CRP expression
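The gene-by-gene testing pattern described above (a two-sample t test applied over the rows of a matrix, followed by a multiplicity adjustment) can be sketched in base R. This is an illustration under stated assumptions, not the genefilter or multtest code: the data are simulated, the index vectors `inds1` and `inds2` are hypothetical, and Benjamini-Hochberg is chosen only as one common adjustment.

```r
set.seed(1)
# Simulated expression matrix: 200 genes (rows) by 10 samples (columns)
x <- matrix(rnorm(200 * 10), nrow = 200)
inds1 <- 1:5    # hypothetical column indices for phenotype 1
inds2 <- 6:10   # hypothetical column indices for phenotype 2
x[1:10, inds2] <- x[1:10, inds2] + 3   # make the first 10 genes truly different

# One Welch t test per gene; with 5 samples per group the normality
# assumption matters, as noted for t.test above.
pvals <- apply(x, 1, function(g) t.test(g[inds1], g[inds2])$p.value)

# Adjust for testing 200 genes at once
adj <- p.adjust(pvals, method = "BH")

sum(pvals < 0.05)  # unadjusted calls
sum(adj < 0.05)    # calls after multiplicity adjustment
```

Rerunning the same code with a questionable column removed from `inds1` or `inds2` is one simple way to carry out the sensitivity analysis suggested above.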

a linear model
- $Y = a + bX + e$
- fit $= a + bX$
- $e$ is a random variable with 0 mean and specified variance structure
- putative realizations of the model used for estimation of $a$, $b$ (e.g., by least squares)
- $\hat{a}$ is mean SBP for CRP expression level 0
- $\hat{b}$ is mean difference in SBP for two observations differing by one unit in CRP expression

a quadratic model
- some nonlinearity can be accommodated through $y = a + bx + cx^2 + e$
- parameters $a$, $b$, $c$ can be estimated using a linear function of $y$, so this is still an instance of the family of linear models

choosing among models
- all models are wrong; some are more useful than others
- does the expression of an asparagine-synthetase like gene vary linearly or nonlinearly in fibroblasts exposed to human serum?
- prefer approximate answer to right scientific question over exact answer to wrong scientific question

look at the data! identical bivariate summaries:
- for each graph, cor(x, y) = .816
- for each graph, t.test(x, y) has p-value 0.22
- for each graph, the regression line is estimated as $y = 3 + .5x$, with identical standard errors
- beware the concise statistical summary
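The warning about concise summaries can be reproduced with the `anscombe` data frame that ships with base R. Treating the four graphs on the slide as Anscombe's quartet is an assumption, made because the quoted correlation and regression line match that data set. The sketch also fits the quadratic model to the second pair, to show that it is still fitted by `lm`.

```r
data(anscombe)  # four x/y pairs from the base R 'datasets' package

# Correlation and least-squares line for each of the four pairs
summaries <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  fit <- lm(y ~ x)
  c(cor       = cor(x, y),
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]))
})
round(summaries, 2)   # near-identical columns despite very different scatterplots

# The second pair is curved: a quadratic term, still a linear model in a, b, c
lin  <- lm(y2 ~ x2, data = anscombe)
quad <- lm(y2 ~ x2 + I(x2^2), data = anscombe)
c(linear = summary(lin)$r.squared, quadratic = summary(quad)$r.squared)
```

Plotting each pair (for example `plot(anscombe$x2, anscombe$y2)`) shows what the shared summaries hide.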

Basic intentions of machine learning
- pattern recognition: faces are complex, humans have good recall capabilities; can image processing algorithms be found to perform identification?
- supervised recognition: there are labels associated with objects that we wish to recover
- unsupervised recognition: no a priori labeling or grouping; seek regularities in data to form groupings
- prediction: patients in ER have a collection of symptoms; under which configurations is it likely that a heart attack has occurred or is imminent?
- rationalization: a recognition or prediction algorithm has been constructed; does the way in which it employs the data tell us something meaningful about the world?
- objectivity: the creation and the deployment of the algorithm should require minimal human involvement

Simple formalism for supervised machine learning
- Training data accumulate in a structure D. D includes a p-vector of features X and a categorical response Y
- ML is a machine learning procedure, operating on D to produce an algorithm (machine) M
- M is required to associate with each X a predicted response (an instance of Y, or doubt, or outlier)
- Typically a test data structure E is used to assess the predictive accuracy of M; we use the features in E to predict responses Y using M and compare them to the true responses in E
- Important principle: ML must avoid overfitting to D; must try to anticipate the kind of data that will come but have not yet been observed

Classical technique: linear discriminant analysis
- Fisher's linear discriminant analysis (LDA) works with a categorical response Y and continuous inputs X
- Finds a linear combination of components of X for which the ratio (separation of class-specific means / within-class variance) is maximized
- in the case of two features and two response classes, this criterion determines a line in the plane spanned by the two features
- the line separates the data into two groups; there may be misclassification

Example: two genes in Golub's full data
[Figure: scatterplot of the two genes, KLK3 and APOA1]
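The formalism above (training structure D, machine M, test structure E) and Fisher's criterion for two features and two classes can be put together in a short base-R sketch. Everything here is illustrative: the data are simulated rather than taken from the Golub set, `fisher_lda` and `predict_lda` are hypothetical helper names, and the cutoff at the midpoint of the projected class means assumes equal class priors.

```r
set.seed(2)
n <- 100
# Two simulated classes in a two-feature plane
X <- rbind(matrix(rnorm(2 * n, mean = 0), ncol = 2),
           matrix(rnorm(2 * n, mean = 2), ncol = 2))
y <- rep(c("A", "B"), each = n)

train <- sample(seq_len(2 * n), n)        # indices forming D
test  <- setdiff(seq_len(2 * n), train)   # indices forming E

# ML: Fisher's discriminant direction w = Sw^{-1} (m2 - m1)
fisher_lda <- function(X, y) {
  cls <- sort(unique(y))
  X1 <- X[y == cls[1], , drop = FALSE]
  X2 <- X[y == cls[2], , drop = FALSE]
  m1 <- colMeans(X1)
  m2 <- colMeans(X2)
  Sw <- cov(X1) * (nrow(X1) - 1) + cov(X2) * (nrow(X2) - 1)  # within-class scatter
  w  <- solve(Sw, m2 - m1)
  list(w = w, cut = sum(w * (m1 + m2) / 2), classes = cls)
}

# M: classify by which side of the separating line a point falls on
predict_lda <- function(M, X) {
  ifelse(drop(X %*% M$w) > M$cut, M$classes[2], M$classes[1])
}

M <- fisher_lda(X[train, ], y[train])
pred <- predict_lda(M, X[test, ])
table(given = y[test], predicted = pred)
test_error <- mean(pred != y[test])   # estimate of error on data not used for fitting
test_error
```

Because E was not used to build M, `test_error` is a less optimistic estimate than the error measured on D itself.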

Upshots
- the line drawn above is regarded as a machine
- give it the values of APOA1 and KLK3 and it will predict one of the two classes
- Is it good? To say how good it is we seek to estimate the generalization error
- Generalization error (GE) is intrinsically unobservable
- Procedures for estimating GE or for designing learning procedures that theoretically minimize GE are important in machine learning theory and practice

Some extensions of linear discriminants
- Kernel methods of feature transformation: when there is no separation of classes in feature space, transform features to a representation in which there is!
- Maximum margin criteria: when there are multiple (hyper)planes that can separate the classes, choose the one maximizing the margin to nearest data point

Support vector machines
- Support vector machines are learners that
  - exploit rich feature space transformation potentials
  - employ maximum margin criteria in selecting separating hyperplanes
- some theory on generalization error control is available
- there are various tuning parameters that need to be set
- Illustration with the Golub data:
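The idea of transforming features until the classes separate can be shown without any SVM software. The following base-R sketch is a toy under stated assumptions: the two classes are simulated as an inner disc and an outer ring, so no straight line in the original plane separates them, and the added feature (squared distance from the origin) is chosen by hand rather than supplied by a kernel.

```r
set.seed(4)
n <- 100
angle  <- runif(2 * n, 0, 2 * pi)
radius <- c(runif(n, 0, 1), runif(n, 2, 3))   # inner class first, then outer class
cls <- rep(c("inner", "outer"), each = n)
X <- cbind(x1 = radius * cos(angle), x2 = radius * sin(angle))

# Hand-picked transformation: squared distance from the origin
r2 <- X[, "x1"]^2 + X[, "x2"]^2

# In the transformed feature the classes are separable by a single threshold.
# Placing it midway between the closest points of the two classes is the
# maximum-margin choice in this one-dimensional feature.
cutoff <- (max(r2[cls == "inner"]) + min(r2[cls == "outer"])) / 2
pred <- ifelse(r2 > cutoff, "outer", "inner")
table(given = cls, predicted = pred)
```

A support vector machine automates both steps: the kernel supplies the transformation implicitly, and the optimizer finds the maximum-margin separator.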

Support vector machine from package e1071
[Figure: SVM classification plot; axes KLK3 and APOA1]

Classification trees: a breast cancer example

rpart and tree structured modeling options
  > library(rpart)
  > args(rpart)
  function (formula, data, weights, subset, na.action = na.rpart,
      method, model = FALSE, x = FALSE, y = TRUE, parms, control,
      cost, ...)
  NULL
  > args(rpart.control)
  function (minsplit = 20, minbucket = round(minsplit/3), cp = 0.01,
      maxcompete = 4, maxsurrogate = 5, usesurrogate = 2, xval = 10,
      surrogatestyle = 0, maxdepth = 30, ...)
  NULL

a filtering of the data
- totally artificial, select those genes with CV greater than 1.2
  > library(genefilter)
  > ff <- filterfun(cv(1.2))
  > k <- genefilter(golubMerge, ff)
  > golubMerge2 <- golubMerge[k, ]
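The filtering step above keeps genes whose coefficient of variation (standard deviation divided by the absolute mean) exceeds a threshold. A base-R sketch of the same rule follows; `cv_filter` is a hypothetical helper written for illustration, not the genefilter function, and the three-gene matrix is made up so the result can be checked by eye.

```r
# Hypothetical helper: TRUE for rows whose coefficient of variation exceeds the threshold
cv_filter <- function(x, threshold) {
  cvs <- apply(x, 1, function(g) sd(g) / abs(mean(g)))
  cvs > threshold
}

# Toy expression matrix: genes in rows, samples in columns
expr <- rbind(g1 = c(1, 1, 1, 1.1),     # nearly constant, tiny CV
              g2 = c(-5, 5, -4, 6),     # mean near 0, large spread, large CV
              g3 = c(10, 12, 9, 11))    # moderate spread around a large mean, small CV

keep <- cv_filter(expr, 1.2)
filtered <- expr[keep, , drop = FALSE]  # drop = FALSE keeps a matrix even for one row
rownames(filtered)
```

Because the rule ignores the class labels, it reduces the number of features without using information that the classifier is later asked to predict.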

neural networks
- neural networks are very flexible nonlinear models
- architecture of the network, along with regularization, determines the flexibility
- flexibility of the feed-forward, 1-hidden layer model is determined by size and decay parameters

A unified interface to machine learning with exprSets
- basic pattern: [meth]B( es, labelname, traininds, otherparms )
  > library(MLInterfaces)
  > nn1 <- nnetB(golubMerge2[1:100, ], ".",
  +     1:36, size = 5, decay = 0.01)
  > confuMat(nn1)
  [Output: confusion matrix, given by predicted; counts not preserved]

Variable importance measures
  > library(MLInterfaces)
  > rf1 <- randomForestB(golubMerge2[1:100, ],
  +     ".", 1:36, importance = TRUE)
  > confuMat(rf1)
  [Output: confusion matrix, given by predicted; counts not preserved]
- random forests are sequences of classification trees; each is constructed from a randomly reduced dataset, each split of the tree is formed with a randomly reduced set of variables
- majority voting across the sequence of trees is used to make classifications
[Figure: variable importance plot; probe sets ranked by mean decrease in accuracy]

boosting
- another majority vote over a sequence of classifiers
- here each classifier is fairly weak (a small tree)
- key innovation: data are re-weighted as sequence grows, downweighting the records that are easy to classify
  > gb1 <- gbmB(golubMerge2[1:500, ], ".",
  +     1:45, n.trees = 5000)
  > confuMat(gb1)
  [Output: confusion matrix, given by predicted; counts not preserved]

back to LDA; cross-validation
  > lda2 <- ldaB(golubMerge2[1:500, ], ".",
  +     1:45)
  > confuMat(lda2)
  [Output: confusion matrix, given by predicted; counts not preserved]
  > <- xval(golubMerge2[1:500, ], ".",
  +     ldaB, "LOG", rep(1:6, each = 12))
  > table(, golubMerge2$)

summary
- tools fit statistical models
- Overfitting is a real concern with flexible models
- Cross-validation (repeatedly setting aside elements of a partition of the data) and other approaches to estimation of generalization ability are important
- interpretation of fitted models can be illuminating; variable importance measures?
- use of classification processes to define distance/proximity metrics seems promising?
- Human intervention seems necessary (setting tuning parameters, defining generalization error thresholds)
- Good references: Ripley (Pattern Recognition), Hastie, Tibshirani and Friedman (Elements of Statistical Learning), Schölkopf and Smola (Learning with Kernels), Vapnik (Nature of statistical learning theory), Duda, Hart and Stork (Pattern Classification)
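Cross-validation by repeatedly setting aside one element of a partition can be written out in a few lines of base R. The sketch below mirrors the grouping `rep(1:6, each = 12)` used above but is otherwise an assumption-laden illustration: the 72 samples are simulated, and a nearest-centroid rule (the hypothetical helper `nearest_centroid`) stands in for the classifier so that no add-on package is needed.

```r
set.seed(3)
n <- 72
y <- factor(rep(c("A", "B"), length.out = n))   # simulated class labels
X <- matrix(rnorm(n * 5), nrow = n)             # 5 simulated features
X[y == "B", 1:2] <- X[y == "B", 1:2] + 2        # two features carry class signal
groups <- rep(1:6, each = 12)                   # partition into 6 groups of 12

# Stand-in classifier: assign each test row to the class with the nearest mean
nearest_centroid <- function(Xtr, ytr, Xte) {
  lv <- levels(ytr)
  cents <- sapply(lv, function(l) colMeans(Xtr[ytr == l, , drop = FALSE]))
  d <- sapply(lv, function(l) rowSums(sweep(Xte, 2, cents[, l])^2))
  factor(lv[max.col(-d, ties.method = "first")], levels = lv)
}

# Leave one group out at a time; predict it from a fit to the other five
pred <- factor(rep(NA_character_, n), levels = levels(y))
for (g in unique(groups)) {
  held <- groups == g
  pred[held] <- nearest_centroid(X[!held, ], y[!held], X[held, ])
}

confusion <- table(given = y, predicted = pred)
confusion
cv_error <- mean(pred != y)   # cross-validated estimate of generalization error
cv_error
```

Every sample is predicted exactly once by a machine that never saw it, which is what makes `cv_error` a more honest estimate than the resubstitution error from a single fit.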

Pattern Recognition 2014 Support Vector Machines

Pattern Recognition 2014 Support Vector Machines Pattern Recgnitin 2014 Supprt Vectr Machines Ad Feelders Universiteit Utrecht Ad Feelders ( Universiteit Utrecht ) Pattern Recgnitin 1 / 55 Overview 1 Separable Case 2 Kernel Functins 3 Allwing Errrs (Sft

More information

Tree Structured Classifier

Tree Structured Classifier Tree Structured Classifier Reference: Classificatin and Regressin Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stne, Chapman & Hall, 98. A Medical Eample (CART): Predict high risk patients

More information

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels

k-nearest Neighbor How to choose k Average of k points more reliable when: Large k: noise in attributes +o o noise in class labels Mtivating Example Memry-Based Learning Instance-Based Learning K-earest eighbr Inductive Assumptin Similar inputs map t similar utputs If nt true => learning is impssible If true => learning reduces t

More information

IAML: Support Vector Machines

IAML: Support Vector Machines 1 / 22 IAML: Supprt Vectr Machines Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester 1 2 / 22 Outline Separating hyperplane with maimum margin Nn-separable training data Epanding the input int

More information

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw:

In SMV I. IAML: Support Vector Machines II. This Time. The SVM optimization problem. We saw: In SMV I IAML: Supprt Vectr Machines II Nigel Gddard Schl f Infrmatics Semester 1 We sa: Ma margin trick Gemetry f the margin and h t cmpute it Finding the ma margin hyperplane using a cnstrained ptimizatin

More information

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff

Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeoff Lecture 2: Supervised vs. unsupervised learning, bias-variance tradeff Reading: Chapter 2 STATS 202: Data mining and analysis September 27, 2017 1 / 20 Supervised vs. unsupervised learning In unsupervised

More information

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint

Biplots in Practice MICHAEL GREENACRE. Professor of Statistics at the Pompeu Fabra University. Chapter 13 Offprint Biplts in Practice MICHAEL GREENACRE Prfessr f Statistics at the Pmpeu Fabra University Chapter 13 Offprint CASE STUDY BIOMEDICINE Cmparing Cancer Types Accrding t Gene Epressin Arrays First published:

More information

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017

Resampling Methods. Cross-validation, Bootstrapping. Marek Petrik 2/21/2017 Resampling Methds Crss-validatin, Btstrapping Marek Petrik 2/21/2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins in R (Springer, 2013) with

More information

Support-Vector Machines

Support-Vector Machines Supprt-Vectr Machines Intrductin Supprt vectr machine is a linear machine with sme very nice prperties. Haykin chapter 6. See Alpaydin chapter 13 fr similar cntent. Nte: Part f this lecture drew material

More information

Agenda. What is Machine Learning? Learning Type of Learning: Supervised, Unsupervised and semi supervised Classification

Agenda. What is Machine Learning? Learning Type of Learning: Supervised, Unsupervised and semi supervised Classification Agenda Artificial Intelligence and its applicatins Lecture 6 Supervised Learning Prfessr Daniel Yeung danyeung@ieee.rg Dr. Patrick Chan patrickchan@ieee.rg Suth China University f Technlgy, China Learning

More information

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9.

Internal vs. external validity. External validity. This section is based on Stock and Watson s Chapter 9. Sectin 7 Mdel Assessment This sectin is based n Stck and Watsn s Chapter 9. Internal vs. external validity Internal validity refers t whether the analysis is valid fr the ppulatin and sample being studied.

More information

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis

SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical model for microarray data analysis SUPPLEMENTARY MATERIAL GaGa: a simple and flexible hierarchical mdel fr micrarray data analysis David Rssell Department f Bistatistics M.D. Andersn Cancer Center, Hustn, TX 77030, USA rsselldavid@gmail.cm

More information

The blessing of dimensionality for kernel methods

The blessing of dimensionality for kernel methods fr kernel methds Building classifiers in high dimensinal space Pierre Dupnt Pierre.Dupnt@ucluvain.be Classifiers define decisin surfaces in sme feature space where the data is either initially represented

More information

Resampling Methods. Chapter 5. Chapter 5 1 / 52

Resampling Methods. Chapter 5. Chapter 5 1 / 52 Resampling Methds Chapter 5 Chapter 5 1 / 52 1 51 Validatin set apprach 2 52 Crss validatin 3 53 Btstrap Chapter 5 2 / 52 Abut Resampling An imprtant statistical tl Pretending the data as ppulatin and

More information

Chapter 3: Cluster Analysis

Chapter 3: Cluster Analysis Chapter 3: Cluster Analysis } 3.1 Basic Cncepts f Clustering 3.1.1 Cluster Analysis 3.1. Clustering Categries } 3. Partitining Methds 3..1 The principle 3.. K-Means Methd 3..3 K-Medids Methd 3..4 CLARA

More information

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank

CAUSAL INFERENCE. Technical Track Session I. Phillippe Leite. The World Bank CAUSAL INFERENCE Technical Track Sessin I Phillippe Leite The Wrld Bank These slides were develped by Christel Vermeersch and mdified by Phillippe Leite fr the purpse f this wrkshp Plicy questins are causal

More information

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data

x 1 Outline IAML: Logistic Regression Decision Boundaries Example Data Outline IAML: Lgistic Regressin Charles Suttn and Victr Lavrenk Schl f Infrmatics Semester Lgistic functin Lgistic regressin Learning lgistic regressin Optimizatin The pwer f nn-linear basis functins Least-squares

More information

Elements of Machine Intelligence - I

Elements of Machine Intelligence - I ECE-175A Elements f Machine Intelligence - I Ken Kreutz-Delgad Nun Vascncels ECE Department, UCSD Winter 2011 The curse The curse will cver basic, but imprtant, aspects f machine learning and pattern recgnitin

More information

Simple Linear Regression (single variable)

Simple Linear Regression (single variable) Simple Linear Regressin (single variable) Intrductin t Machine Learning Marek Petrik January 31, 2017 Sme f the figures in this presentatin are taken frm An Intrductin t Statistical Learning, with applicatins

More information

COMP 551 Applied Machine Learning Lecture 4: Linear classification

COMP 551 Applied Machine Learning Lecture 4: Linear classification COMP 551 Applied Machine Learning Lecture 4: Linear classificatin Instructr: Jelle Pineau (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted

More information

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition

The Kullback-Leibler Kernel as a Framework for Discriminant and Localized Representations for Visual Recognition The Kullback-Leibler Kernel as a Framewrk fr Discriminant and Lcalized Representatins fr Visual Recgnitin Nun Vascncels Purdy H Pedr Mren ECE Department University f Califrnia, San Dieg HP Labs Cambridge

More information

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines

COMP 551 Applied Machine Learning Lecture 11: Support Vector Machines COMP 551 Applied Machine Learning Lecture 11: Supprt Vectr Machines Instructr: (jpineau@cs.mcgill.ca) Class web page: www.cs.mcgill.ca/~jpineau/cmp551 Unless therwise nted, all material psted fr this curse

More information

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came.

CHAPTER 24: INFERENCE IN REGRESSION. Chapter 24: Make inferences about the population from which the sample data came. MATH 1342 Ch. 24 April 25 and 27, 2013 Page 1 f 5 CHAPTER 24: INFERENCE IN REGRESSION Chapters 4 and 5: Relatinships between tw quantitative variables. Be able t Make a graph (scatterplt) Summarize the

More information

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa

PSU GISPOPSCI June 2011 Ordinary Least Squares & Spatial Linear Regression in GeoDa There are tw parts t this lab. The first is intended t demnstrate hw t request and interpret the spatial diagnstics f a standard OLS regressin mdel using GeDa. The diagnstics prvide infrmatin abut the

More information

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification

COMP 551 Applied Machine Learning Lecture 5: Generative models for linear classification COMP 551 Applied Machine Learning Lecture 5: Generative mdels fr linear classificatin Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Jelle Pineau Class web page: www.cs.mcgill.ca/~hvanh2/cmp551

More information

A Matrix Representation of Panel Data

A Matrix Representation of Panel Data web Extensin 6 Appendix 6.A A Matrix Representatin f Panel Data Panel data mdels cme in tw brad varieties, distinct intercept DGPs and errr cmpnent DGPs. his appendix presents matrix algebra representatins

More information

Part 3 Introduction to statistical classification techniques

Part 3 Introduction to statistical classification techniques Part 3 Intrductin t statistical classificatin techniques Machine Learning, Part 3, March 07 Fabi Rli Preamble ØIn Part we have seen that if we knw: Psterir prbabilities P(ω i / ) Or the equivalent terms

More information

What is Statistical Learning?

What is Statistical Learning? What is Statistical Learning? Sales 5 10 15 20 25 Sales 5 10 15 20 25 Sales 5 10 15 20 25 0 50 100 200 300 TV 0 10 20 30 40 50 Radi 0 20 40 60 80 100 Newspaper Shwn are Sales vs TV, Radi and Newspaper,

More information

NUMBERS, MATHEMATICS AND EQUATIONS

NUMBERS, MATHEMATICS AND EQUATIONS AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t

More information

Hypothesis Tests for One Population Mean

Hypothesis Tests for One Population Mean Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be

More information

COMP9444 Neural Networks and Deep Learning 3. Backpropagation

COMP9444 Neural Networks and Deep Learning 3. Backpropagation COMP9444 Neural Netwrks and Deep Learning 3. Backprpagatin Tetbk, Sectins 4.3, 5.2, 6.5.2 COMP9444 17s2 Backprpagatin 1 Outline Supervised Learning Ockham s Razr (5.2) Multi-Layer Netwrks Gradient Descent

More information

Name: Block: Date: Science 10: The Great Geyser Experiment A controlled experiment

Name: Block: Date: Science 10: The Great Geyser Experiment A controlled experiment Science 10: The Great Geyser Experiment A cntrlled experiment Yu will prduce a GEYSER by drpping Ments int a bttle f diet pp Sme questins t think abut are: What are yu ging t test? What are yu ging t measure?

More information

Distributions, spatial statistics and a Bayesian perspective

Distributions, spatial statistics and a Bayesian perspective Distributins, spatial statistics and a Bayesian perspective Dug Nychka Natinal Center fr Atmspheric Research Distributins and densities Cnditinal distributins and Bayes Thm Bivariate nrmal Spatial statistics

More information

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM

The general linear model and Statistical Parametric Mapping I: Introduction to the GLM The general linear mdel and Statistical Parametric Mapping I: Intrductin t the GLM Alexa Mrcm and Stefan Kiebel, Rik Hensn, Andrew Hlmes & J-B J Pline Overview Intrductin Essential cncepts Mdelling Design

More information

Bootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) >

Bootstrap Method > # Purpose: understand how bootstrap method works > obs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(obs) > Btstrap Methd > # Purpse: understand hw btstrap methd wrks > bs=c(11.96, 5.03, 67.40, 16.07, 31.50, 7.73, 11.10, 22.38) > n=length(bs) > mean(bs) [1] 21.64625 > # estimate f lambda > lambda = 1/mean(bs);

More information

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d)

COMP 551 Applied Machine Learning Lecture 9: Support Vector Machines (cont d) COMP 551 Applied Machine Learning Lecture 9: Supprt Vectr Machines (cnt d) Instructr: Herke van Hf (herke.vanhf@mail.mcgill.ca) Slides mstly by: Class web page: www.cs.mcgill.ca/~hvanh2/cmp551 Unless therwise

More information

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours

STATS216v Introduction to Statistical Learning Stanford University, Summer Practice Final (Solutions) Duration: 3 hours STATS216v Intrductin t Statistical Learning Stanfrd University, Summer 2016 Practice Final (Slutins) Duratin: 3 hurs Instructins: (This is a practice final and will nt be graded.) Remember the university

More information

INTRODUCTION TO MACHINE LEARNING FOR MEDICINE

INTRODUCTION TO MACHINE LEARNING FOR MEDICINE Fall 2017 INTRODUCTION TO MACHINE LEARNING FOR MEDICINE Carla E. Brdley Prfessr & Dean Cllege f Cmputer and Infrmatin Science Nrtheastern University WHAT IS MACHINE LEARNING/DATA MINING? Figure is frm

More information

Comparing Several Means: ANOVA. Group Means and Grand Mean

Comparing Several Means: ANOVA. Group Means and Grand Mean STAT 511 ANOVA and Regressin 1 Cmparing Several Means: ANOVA Slide 1 Blue Lake snap beans were grwn in 12 pen-tp chambers which are subject t 4 treatments 3 each with O 3 and SO 2 present/absent. The ttal

More information

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation

A New Evaluation Measure. J. Joiner and L. Werner. The problems of evaluation and the needed criteria of evaluation III-l III. A New Evaluatin Measure J. Jiner and L. Werner Abstract The prblems f evaluatin and the needed criteria f evaluatin measures in the SMART system f infrmatin retrieval are reviewed and discussed.

More information

, which yields. where z1. and z2

, which yields. where z1. and z2 The Gaussian r Nrmal PDF, Page 1 The Gaussian r Nrmal Prbability Density Functin Authr: Jhn M Cimbala, Penn State University Latest revisin: 11 September 13 The Gaussian r Nrmal Prbability Density Functin

More information

Relationship Between Pollination Behavior of Invasive Honeybees and Native Bumblebees

Relationship Between Pollination Behavior of Invasive Honeybees and Native Bumblebees Relatinship Between Pllinatin Behavir f Invasive Hneybees and Native Bumblebees Carlyn Silverman Barnard Cllege, Clumbia University 7 Ec-Infrmatics Summer Institute HJ Andrews Experimental Frest Oregn

More information

Linear Classification

Linear Classification Linear Classificatin CS 54: Machine Learning Slides adapted frm Lee Cper, Jydeep Ghsh, and Sham Kakade Review: Linear Regressin CS 54 [Spring 07] - H Regressin Given an input vectr x T = (x, x,, xp), we

More information

How do scientists measure trees? What is DBH?

How do scientists measure trees? What is DBH? Hw d scientists measure trees? What is DBH? Purpse Students develp an understanding f tree size and hw scientists measure trees. Students bserve and measure tree ckies and explre the relatinship between

More information

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards:

MODULE FOUR. This module addresses functions. SC Academic Elementary Algebra Standards: MODULE FOUR This mdule addresses functins SC Academic Standards: EA-3.1 Classify a relatinship as being either a functin r nt a functin when given data as a table, set f rdered pairs, r graph. EA-3.2 Use

More information

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date

AP Statistics Practice Test Unit Three Exploring Relationships Between Variables. Name Period Date AP Statistics Practice Test Unit Three Explring Relatinships Between Variables Name Perid Date True r False: 1. Crrelatin and regressin require explanatry and respnse variables. 1. 2. Every least squares

More information

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007

CS 477/677 Analysis of Algorithms Fall 2007 Dr. George Bebis Course Project Due Date: 11/29/2007 CS 477/677 Analysis f Algrithms Fall 2007 Dr. Gerge Bebis Curse Prject Due Date: 11/29/2007 Part1: Cmparisn f Srting Algrithms (70% f the prject grade) The bjective f the first part f the assignment is

More information

Writing Guidelines. (Updated: November 25, 2009) Forwards

Writing Guidelines. (Updated: November 25, 2009) Forwards Writing Guidelines (Updated: Nvember 25, 2009) Frwards I have fund in my review f the manuscripts frm ur students and research assciates, as well as thse submitted t varius jurnals by thers that the majr

More information

Math Foundations 20 Work Plan

Math Foundations 20 Work Plan Math Fundatins 20 Wrk Plan Units / Tpics 20.8 Demnstrate understanding f systems f linear inequalities in tw variables. Time Frame December 1-3 weeks 6-10 Majr Learning Indicatrs Identify situatins relevant

More information

INSTRUMENTAL VARIABLES

INSTRUMENTAL VARIABLES INSTRUMENTAL VARIABLES Technical Track Sessin IV Sergi Urzua University f Maryland Instrumental Variables and IE Tw main uses f IV in impact evaluatin: 1. Crrect fr difference between assignment f treatment

More information
