17. Introduction to Tree and Neural Network Regression
As we have repeated often throughout this book, the classical multiple regression model is clearly wrong in many ways. One way that the model is wrong is in the assumption of a normal distribution. Instead, the distribution of Y | X = x is correctly given as p(y | x), where p(y | x) can be any distribution (recall that x = (x1, x2, ..., xk)). Another way that the classical model is wrong is in the assumption that the conditional mean function is given by E(Y | X1 = x1, X2 = x2, ..., Xk = xk) = β0 + β1x1 + β2x2 + ... + βkxk. Instead, the conditional mean function (assuming the mean is finite) is correctly given as E(Y | X1 = x1, X2 = x2, ..., Xk = xk) = f(x1, x2, ..., xk), where f(x1, x2, ..., xk) can have any shape whatsoever in (k + 1)-dimensional space, from curved hyperplanes, to functions with bumps, to discontinuous functions, depending upon the nature of the process that you are studying. As we noted in Chapter 1, Section 1.7, even the simple linear model f(x1) = β0 + β1x1 is usually wrong: we gave a logical argument that there must be some curvature as long as the X1 variable has at least three levels, and as long as there is dependence between Y and X1. This argument becomes even more relevant in multidimensional space, where the curvature can take many forms, including twists, turns, and bumps.

The goal of neural network and tree regression is to estimate f(x1, x2, ..., xk) as a general, nonlinear (or non-planar) function. In particular, these methods allow, and can easily find, strong interactions between various combinations of the X variables, which can be very tricky to tease out using the classical regression model with interaction terms. By allowing greater generality and flexibility than the restrictive classical (planar) model f(x1, x2, ..., xk) = β0 + β1x1 + β2x2 + ... + βkxk, these models can, in some cases, also give you predicted values Ŷ that tend to be closer to future Y values than those from classical linear models.
Since these methods attempt to find unusual model features (twists, turns, and bumps) of the function f(x1, x2, ..., xk) in high-dimensional space, they typically require large amounts of data to estimate such features. With the data revolution, such large data sets are routinely available. Thus, it has become nearly a default position in data science to analyze large data sets using tree and neural network regressions, rather than the classical regression model.

Tree Regression

Not only does tree regression allow complex functional relationships, it also provides output that is very easy to interpret, even easier than the classical linear model. Further, the output of tree regression models can provide a clear recipe for action; e.g., in the determination of profiles (particular combinations of X values) that are associated with fraudulent activity.

To start, let us show an example of what tree regression can do for you. We will use a data set from a survey of n = 1,020 individuals who were asked how they pronounce "data," either as "day-tuh" or "daa-tuh." The raw data show 653/1020 = 64.0% "day-tuh" and 367/1020 = 36.0%
"daa-tuh." As in all regression, we would like to know how this distribution (64.0% "day-tuh," 36.0% "daa-tuh") changes for particular values of the X variables. In addition to the "day-tuh"/"daa-tuh" pronunciation, the survey data also contain demographic information such as gender, age, region of the U.S., and other variables. You can use the rpart function to construct tree regression estimates; rpart is short for "recursive partitioning," which describes the algorithm that is used. Code to fit a tree model is as follows:

pron = read.csv("
attach(pron)
Y = ifelse(Q4 == 1, 1, 0)
library(rpart)
ft = rpart(Y ~ Q1 + Q2 + Q3 + Q5 + Q6 + Q7 + Q8 + Q9 + Q11 + Q12 + Q13 + Age + Gender)
plot(ft)
text(ft)

The results are shown in the figure.

Figure: Results of tree regression to predict pronunciation of "data." Numbers below nodes are proportions of "day-tuh" pronunciation.
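The numbers below the nodes are nothing more than subset means of the 0/1 response. As a language-neutral illustration (the actual analysis above is in R), here is a Python sketch with made-up stand-in data, showing that the mean of a 0/1 variable over a subset is exactly the proportion of 1s in that subset:

```python
import numpy as np

# Made-up stand-ins for the survey columns (NOT the real survey data):
# Y = 1 if the respondent says "day-tuh"; Q7 = colleague pronunciation.
rng = np.random.default_rng(0)
Q7 = rng.integers(1, 3, size=1000)            # 1 = "day-tuh", 2 = "daa-tuh"
Y = np.where(Q7 == 1,
             rng.random(1000) < 0.87,         # mimic strong agreement
             rng.random(1000) < 0.35).astype(int)

# The mean of a 0/1 variable over a subset equals the proportion of 1s
# in that subset, which is what the numbers below the nodes report.
prop_as_mean = Y[Q7 < 1.5].mean()                       # analogue of mean(Y[Q7 < 1.5]) in R
prop_as_count = Y[Q7 < 1.5].sum() / (Q7 < 1.5).sum()    # same value, computed as a count ratio
print(abs(prop_as_mean - prop_as_count) < 1e-12)        # -> True
```

This is why the same rpart machinery handles a 0/1 response: predicted means of 0/1 data are simply predicted proportions.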
To interpret the figure, note the following. There is an automatic variable selection algorithm inside the software, and it picked only the variables Q7, Q2, and Age as predictors of pronunciation. The variable Q7 is colleague pronunciation, with Q7 = 1 indicating the colleague pronounces "day-tuh," and Q7 = 2 indicating the colleague pronounces "daa-tuh." The first split is on Q7. The left side of the split corresponds to truth of Q7 >= 1.5, which means that Q7 = 2, or the colleague pronounces "daa-tuh." The right side corresponds to falseness of Q7 >= 1.5, which means that Q7 = 1, or the colleague pronounces "day-tuh." Below the right split, there is just one entry, 0.8686, which is the mean of Y for this condition. You can verify this as mean(Y[Q7 < 1.5]), which gives you 0.8686. Since the means of 0/1 data are proportions, this result means that 86.86% of the respondents who said their colleagues say "day-tuh" also say "day-tuh."

Down the left-hand split, which initially only considers people whose colleagues say "daa-tuh," the next split involves Q2, which is region, and takes the values 1, 2, 3, 4 for the West, South, Midwest, and Northeast regions, respectively. The left split again indicates truth of Q2 < 3.5, meaning that the left split contains data from regions 1, 2, and 3. The entry 0.2 means that 20% of the subset determined by (i) colleague says "daa-tuh" and (ii) person lives in region 1, 2, or 3, pronounce "day-tuh." In other words, in this subset, 80% pronounce "data" like their colleagues do. To verify this result manually, enter the command mean(Y[(Q7 >= 1.5) & (Q2 < 3.5)]). The right-hand split on Q2 < 3.5 indicates that the condition is false, so that Q2 > 3.5. With these data, the only possibility here is that Q2 = 4, i.e., people live in the Northeast region. Among these people (whose colleagues say "daa-tuh" because of the higher-level split), younger people are likely to ignore their colleagues (57.78% say "day-tuh"), but older people are likely to mimic their colleagues (18.42% say "day-tuh").

This is an example of the kind of high-level interaction that tree regression can tease out: the effect of the colleague's pronunciation of "data" depends on both region and age, a three-way interaction. (Self-study problem: Find the percentages 57.78% and 18.42% by hand.)

As you can see, tree regression is very useful for identifying "pockets," based on the X variables, where the Y variable differs greatly from pocket to pocket. If the Y variable were an indication of fraudulent activity, the pockets of observations where Y is high, as shown by the tree output, would identify specific observations to investigate further.

To understand what the tree regression model is doing in more detail, consider the charity data set again, and consider the prediction of Y = charitable contributions (in log scale) as a function of a single X variable, income (again in log scale). The following R code shows the default analysis from rpart, as well as a simplified analysis that will facilitate explanation.

## Initial analysis.
char = read.csv("
attach(char)
library(rpart)
fit1 = rpart(CHARITY ~ INCOME)
plot(fit1); text(fit1, cex=.7)

This code produces the tree shown in the figure.

Figure: Tree regression using rpart to predict log charitable contributions in terms of log income (denoted INCOME in the figure).

The figure tells you that, as you might expect, charitable contributions tend to be higher among people with higher income. But rather than express that relationship as a linear function, as in the classical regression model, the tree regression expresses that relationship in terms of ranges of values of INCOME where contributions are either lower or higher. To understand the method further, consider the following code, which provides a simplified analysis.

## Simplified analysis.
char = read.csv("
attach(char)
library(rpart)
fit2 = rpart(CHARITY ~ INCOME, maxdepth=1)
plot(fit2); text(fit2, cex=.7)
Figure: Tree regression using rpart with maxdepth=1 to predict log charitable contributions in terms of log income (denoted INCOME in the figure).

In this figure, notice that we are just looking at the first split from the default tree. The split is the same, at INCOME < 10.91. Notes on these two analyses are as follows:

- The split value, 10.91, is chosen to maximize separation in the Y variable (here, log charitable contributions) between the two groups. Different measures of separation are possible, such as the F statistic or the log-likelihood statistic. The default measure of separation used by rpart is somewhat unusual (the Gini coefficient).

- The method is called "recursive partitioning" because the further splits shown in the default tree use the same splitting method as the first split, except that the method is applied to subsets of the data. For example, after the split on INCOME < 10.91, the subset of data where INCOME < 10.91 is subjected to the same splitting algorithm, which finds the next optimal split at INCOME < 10.32. The right-hand side of that split indicates INCOME >= 10.32, but since this split is down the path where INCOME < 10.91, the actual range of INCOME in this group is 10.32 <= INCOME < 10.91.

- The predicted values (Ŷ) from the model are constant over the intervals, as shown in the following code and figure.
R code for the figure:

Income.plot = seq(8.5, 12, .001)
Income.pred = data.frame(Income.plot)
names(Income.pred) = c("INCOME")
Yhat1 = predict(fit1, Income.pred)
Yhat2 = predict(fit2, Income.pred)
par(mfrow=c(1,2))
plot(INCOME, CHARITY, pch=".", cex=1.5)
points(Income.plot, Yhat1, type="l", lwd=2)
abline(lsfit(INCOME, CHARITY), lwd=1.5, lty=2)
plot(INCOME, CHARITY, pch=".", cex=1.5)
points(Income.plot, Yhat2, type="l", lwd=2)
abline(lsfit(INCOME, CHARITY), lwd=1.5, lty=2)

Figure: Scatterplots of (ln(Income), ln(Charitable Contributions)), with predictions of mean charitable contributions using tree regression (solid lines) and ordinary least squares (dashed lines). Left panel: default tree from rpart. Right panel: tree pruned to depth 1.

As the figure shows, the tree regression estimates of the conditional mean function are flat line segments over interval ranges. Clearly, these functions are not very good as estimates of the true mean function, because Nature favors continuity over discontinuity. On the other hand, the results are very simple to use and interpret. In multiple regression, the tree functions are not flat lines, but instead flat planes or hyperplanes, defined over various rectangular regions determined by combinations of the X variables. Picture a city building that has been built in pieces, with different parts added on over time, all with different heights. That's what the tree regression function looks like in multiple regression.
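The splitting step itself is simple enough to sketch in a few lines. Below is a hedged Python illustration of one recursive-partitioning step for a single predictor. It uses the sum-of-squared-errors criterion for clarity (rpart's default separation measure differs, so treat this as an illustration of the idea, not a reimplementation of rpart):

```python
import numpy as np

def best_split(x, y):
    """One recursive-partitioning step for a single predictor.

    Returns (split value, left mean, right mean), choosing the split that
    minimizes the total within-group sum of squared errors.  Candidate
    splits are midpoints between adjacent unique x values.  This SSE
    criterion is illustrative; rpart's default criterion differs."""
    xs = np.unique(x)
    candidates = (xs[:-1] + xs[1:]) / 2
    best, best_sse = (None, None, None), np.inf
    for c in candidates:
        left, right = y[x < c], y[x >= c]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best_sse:
            best_sse = sse
            best = (float(c), float(left.mean()), float(right.mean()))
    return best

# A response that jumps at x = 5 is split there, and the fitted values are
# the heights of the flat segments (the subset means) on each side.
x = np.arange(10.0)
y = np.where(x < 5, 1.0, 3.0)
print(best_split(x, y))   # -> (4.5, 1.0, 3.0)
```

The two subset means returned here are exactly the flat-line-segment heights discussed above; recursing on each side of the split yields the full tree.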
The most interesting and useful applications of tree regression occur when there are many X variables, and where the nature of the relationship between Y and the X variables is highly nonlinear, involving high-level interactions. With simpler, well-behaved examples, such as the regression of CHARITY on INCOME shown above, the simple linear regression is obviously much better, as shown in the figure.

Neural Network Regression

Neural network regression has a goal similar to that of tree regression; namely, to estimate conditional mean functions that are highly nonlinear, perhaps also involving high-level interactions. An advantage of neural networks over trees is that the functions are estimated as continuous functions rather than discontinuous flat-line (or flat-plane) segments. Thus, neural network estimates are generally more realistic than tree estimates because, again, Nature generally favors continuity over discontinuity.

A disadvantage of neural network estimates is that the results are not simple to interpret. Trees are the easiest to interpret, classical regression is next easiest, and neural nets are hardest. Thus, instead of trying to interpret the results, users of neural nets usually just treat them as a "black box" that produces a predicted mean, Ŷ = f̂(x1, x2, ..., xk). Like LOESS, the neural net regression function f̂(x1, x2, ..., xk) has a more complicated form than other regression functions, and researchers therefore do not ordinarily examine its functional form. There are many practical applications where you really do not care about the functional form of Ŷ; all you care about is getting a Ŷ that is close to Y. Examples include:

- The predicted range of your car (in miles or kilometers), which predicts how far your car can travel, given the current fuel in the tank and current (and recent) fuel economy.

- The predicted EKG reading of a heart patient, given current (more simply obtained) readings of pulse, sweat, and blood pressure.

- The creditworthiness of a customer who is applying for a loan (think of your credit score).

These examples are all cases where you want a good prediction of Y, but you do not necessarily care how that prediction was obtained. In such cases, neural network regression can be useful.

Universal Approximators

First, a word about function approximation. A main goal in regression is to approximate the conditional mean function E(Y | X1 = x1, X2 = x2, ..., Xk = xk) = f(x1, x2, ..., xk). As noted repeatedly throughout this book, linear, quadratic, interaction, exponential, and other functions are nearly always different from the true f(x1, x2, ..., xk). But some types of functions can be made arbitrarily close to f(x1, x2, ..., xk); such functions are called universal approximators.
Polynomial functions are universal approximators. That is, given any continuous function f(x1, x2, ..., xk) defined over a bounded X set, you can approximate f(x1, x2, ..., xk) arbitrarily well by a polynomial function g(x1, x2, ..., xk). You might need an extremely high order of polynomial (recall that such functions involve all possible interaction terms up to the given order), but still, there exists a polynomial g(.) such that the maximum difference between f(.) and g(.) is as small as you would like. This result is mathematically proven as the Stone-Weierstrass Theorem.

While the Stone-Weierstrass Theorem seems to suggest that you should use high-order polynomial models to estimate regression functions, you know from the variance-bias trade-off that, while such higher-order models will be less biased (because of the Stone-Weierstrass Theorem), they also require a huge number of estimated parameters. Hence, high-order polynomial models will suffer from extremely high variance when estimated using data. In addition, polynomial models can have extremely poor behavior at the extremes of the data, or where data are sparse. At the extremes, the predictions shoot off quickly toward positive or negative infinity, because that is how high-order polynomial terms like x^3, x^4, x^5, etc., behave. Where data are sparse, the behavior can also be erratic; see Figure for an example.

There are universal approximators other than polynomials that do not have such wild behavior. Fourier analysis gives you functions g(.) involving sines and cosines that can approximate general functions f(.); such Fourier functions g(.) are also universal approximators, but do not have the wild extrapolation properties of polynomials. Being based on sines and cosines, Fourier functions g(.) are best[1] for approximating functions f(.) that are cyclical; but like polynomials, they can also approximate any function arbitrarily well.
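The two faces of polynomial approximation, excellent interior fit but wild extrapolation, are easy to demonstrate numerically. Here is a hedged Python sketch, with an arbitrarily chosen target f(x) = sin(x) standing in for the unknown true mean function:

```python
import numpy as np

# An assumed target function, standing in for the unknown f(.)
x = np.linspace(0, 3, 200)
f = np.sin(x)

# Per Stone-Weierstrass, a high-order polynomial fits very well over
# the bounded range on which it was fit...
coef = np.polyfit(x, f, deg=9)
err_inside = np.abs(np.polyval(coef, x) - f).max()
print(err_inside < 1e-5)   # -> True: excellent fit on [0, 3]

# ...but outside that range the high-order terms dominate, and the
# extrapolated predictions shoot off rapidly.
err_outside = abs(np.polyval(coef, 10.0) - np.sin(10.0))
print(err_outside > 1.0)   # -> True: extrapolation error is huge
```

A degree-9 polynomial tracks sin(x) on [0, 3] to within roughly 1e-7, yet at x = 10 its x^9 term has taken over and the prediction is far from sin(10), which is exactly the extrapolation hazard described above.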
There are infinitely many universal approximators, depending on the particular set of basis functions that you use. In polynomial approximators, the basis functions are polynomials; in Fourier analysis, the basis functions involve sines and cosines. Pick an appropriate set of basis functions, and voilà! You can construct a universal approximator. "Neural network" is just a fancy-sounding name (meant to suggest that it somehow operates like the brain!) for a universal approximator that involves a particular type of basis function, called an "activation function" in neural network jargon. The standard activation function is the same one that you use in logistic regression, namely 1/(1 + e^(-x)). Just as polynomial functions are constructed of various orders of polynomial terms, their interactions, and constant multipliers, neural network functions are constructed of logistic functions, their interactions, and constant multipliers.

Example: Predicting Charitable Contributions Using a Neural Network

Let's see how the neural network works in a real example first, and then we'll explain what it is doing.

[1] "Best" means that they require fewer terms to arrive at a good approximation. Statistically, such parsimony is preferred because there will be fewer constant multipliers to estimate.
R code for the figure:

char = read.csv("
attach(char)
f = as.formula(paste("CHARITY ~ INCOME"))
dat = data.frame(CHARITY, INCOME)
library(neuralnet)
fit.nn = neuralnet(f, data=dat, hidden=c(1,1))
Income.plot = seq(8.5, 12, .001)
Income.pred = data.frame(Income.plot)
names(Income.pred) = c("INCOME")
Yhat.nn = compute(fit.nn, Income.pred)$net.result
plot(INCOME, CHARITY, pch=".", cex=1.5)
points(Income.plot, Yhat.nn, type="l", lwd=2)
abline(lsfit(INCOME, CHARITY), lwd=1.5, lty=2)

Figure: Scatterplot of (ln(Income), ln(Charitable Contributions)) data, with neural network (solid) and ordinary least squares (dashed) fits.

The figure shows that the neural network fit is akin to LOESS in that it allows curved relationships. The flattened appearance on the left is due to the use of the logistic basis functions.
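To see concretely what "constructed of logistic functions and constant multipliers" means, here is a Python sketch of a one-hidden-layer neural network regression function. The weights are hand-picked for illustration, not fitted by neuralnet or estimated from any data set:

```python
import numpy as np

def logistic(z):
    """The standard activation function, 1/(1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def nn_predict(x, v0, v, a, b):
    """One-hidden-layer neural network regression function:
    f(x) = v0 + sum_j v[j] * logistic(a[j] + b[j]*x).
    The weights are illustrative, not estimated values."""
    x = np.asarray(x, dtype=float)
    return v0 + logistic(a + b * x[:, None]) @ v

# Hand-picked weights: the difference of two shifted logistic curves
# already gives a smooth bump that no single logistic curve can produce.
v0 = 0.0
v = np.array([1.0, -1.0])     # output-layer constant multipliers
a = np.array([2.0, -2.0])     # hidden-unit intercepts
b = np.array([2.0, 2.0])      # hidden-unit slopes
xs = np.array([-3.0, 0.0, 3.0])
print(nn_predict(xs, v0, v, a, b).round(3))   # small at the edges, peak near 0
```

Fitting a neural network amounts to estimating weights like v0, v, a, and b from data; the flattening seen at the left of the charity fit is just a logistic basis function leveling off toward its asymptote.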
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationUniversity of California Berkeley CS170: Efficient Algorithms and Intractable Problems November 19, 2001 Professor Luca Trevisan. Midterm 2 Solutions
University of California Berkeley Handout MS2 CS170: Efficient Algorithms and Intractable Problems November 19, 2001 Professor Luca Trevisan Midterm 2 Solutions Problem 1. Provide the following information:
More informationLecture 9: Classification, LDA
Lecture 9: Classification, LDA Reading: Chapter 4 STATS 202: Data mining and analysis October 13, 2017 1 / 21 Review: Main strategy in Chapter 4 Find an estimate ˆP (Y X). Then, given an input x 0, we
More informationNP-Completeness I. Lecture Overview Introduction: Reduction and Expressiveness
Lecture 19 NP-Completeness I 19.1 Overview In the past few lectures we have looked at increasingly more expressive problems that we were able to solve using efficient algorithms. In this lecture we introduce
More informationTutorial on obtaining Taylor Series Approximations without differentiation
Tutorial on obtaining Taylor Series Approximations without differentiation Professor Henry Greenside February 2, 2018 1 Overview An important mathematical technique that is used many times in physics,
More informationLecture 2: Linear regression
Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued
More informationCSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes
CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes Roger Grosse Roger Grosse CSC2541 Lecture 2 Bayesian Occam s Razor and Gaussian Processes 1 / 55 Adminis-Trivia Did everyone get my e-mail
More information17 Neural Networks NEURAL NETWORKS. x XOR 1. x Jonathan Richard Shewchuk
94 Jonathan Richard Shewchuk 7 Neural Networks NEURAL NETWORKS Can do both classification & regression. [They tie together several ideas from the course: perceptrons, logistic regression, ensembles of
More information35 Chapter CHAPTER 4: Mathematical Proof
35 Chapter 4 35 CHAPTER 4: Mathematical Proof Faith is different from proof; the one is human, the other is a gift of God. Justus ex fide vivit. It is this faith that God Himself puts into the heart. 21
More informationRegression Analysis IV... More MLR and Model Building
Regression Analysis IV... More MLR and Model Building This session finishes up presenting the formal methods of inference based on the MLR model and then begins discussion of "model building" (use of regression
More informationAP Calculus Chapter 9: Infinite Series
AP Calculus Chapter 9: Infinite Series 9. Sequences a, a 2, a 3, a 4, a 5,... Sequence: A function whose domain is the set of positive integers n = 2 3 4 a n = a a 2 a 3 a 4 terms of the sequence Begin
More informationCSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18
CSE 417T: Introduction to Machine Learning Final Review Henry Chai 12/4/18 Overfitting Overfitting is fitting the training data more than is warranted Fitting noise rather than signal 2 Estimating! "#$
More information9 Classification. 9.1 Linear Classifiers
9 Classification This topic returns to prediction. Unlike linear regression where we were predicting a numeric value, in this case we are predicting a class: winner or loser, yes or no, rich or poor, positive
More information1 ** The performance objectives highlighted in italics have been identified as core to an Algebra II course.
Strand One: Number Sense and Operations Every student should understand and use all concepts and skills from the pervious grade levels. The standards are designed so that new learning builds on preceding
More informationChris Piech CS109 CS109 Final Exam. Fall Quarter Dec 14 th, 2017
Chris Piech CS109 CS109 Final Exam Fall Quarter Dec 14 th, 2017 This is a closed calculator/computer exam. You are, however, allowed to use notes in the exam. The last page of the exam is a Standard Normal
More information19. TAYLOR SERIES AND TECHNIQUES
19. TAYLOR SERIES AND TECHNIQUES Taylor polynomials can be generated for a given function through a certain linear combination of its derivatives. The idea is that we can approximate a function by a polynomial,
More information8. Classification and Regression Trees (CART, MRT & RF)
8. Classification and Regression Trees (CART, MRT & RF) Classification And Regression Tree analysis (CART) and its extension to multiple simultaneous response variables, Multivariate Regression Tree analysis
More informationFinding the Gold in Your Data
Finding the Gold in Your Data An introduction to Data Mining Originally presented @ SAS Global Forum David A. Dickey North Carolina State University Decision Trees A divisive method (splits) Start with
More informationACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Lines and Their Equations
ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER 1 017/018 DR. ANTHONY BROWN. Lines and Their Equations.1. Slope of a Line and its y-intercept. In Euclidean geometry (where
More informationAt the start of the term, we saw the following formula for computing the sum of the first n integers:
Chapter 11 Induction This chapter covers mathematical induction. 11.1 Introduction to induction At the start of the term, we saw the following formula for computing the sum of the first n integers: Claim
More informationNotes on Discriminant Functions and Optimal Classification
Notes on Discriminant Functions and Optimal Classification Padhraic Smyth, Department of Computer Science University of California, Irvine c 2017 1 Discriminant Functions Consider a classification problem
More information8/04/2011. last lecture: correlation and regression next lecture: standard MR & hierarchical MR (MR = multiple regression)
psyc3010 lecture 7 analysis of covariance (ANCOVA) last lecture: correlation and regression next lecture: standard MR & hierarchical MR (MR = multiple regression) 1 announcements quiz 2 correlation and
More informationSolutions to Math 41 First Exam October 12, 2010
Solutions to Math 41 First Eam October 12, 2010 1. 13 points) Find each of the following its, with justification. If the it does not eist, eplain why. If there is an infinite it, then eplain whether it
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 06 - Regression & Decision Trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom
More informationPhysics 509: Propagating Systematic Uncertainties. Scott Oser Lecture #12
Physics 509: Propagating Systematic Uncertainties Scott Oser Lecture #1 1 Additive offset model Suppose we take N measurements from a distribution, and wish to estimate the true mean of the underlying
More informationGeneralization to Multi-Class and Continuous Responses. STA Data Mining I
Generalization to Multi-Class and Continuous Responses STA 5703 - Data Mining I 1. Categorical Responses (a) Splitting Criterion Outline Goodness-of-split Criterion Chi-square Tests and Twoing Rule (b)
More informationBayesian regression tree models for causal inference: regularization, confounding and heterogeneity
Bayesian regression tree models for causal inference: regularization, confounding and heterogeneity P. Richard Hahn, Jared Murray, and Carlos Carvalho June 22, 2017 The problem setting We want to estimate
More information9. Linear Regression and Correlation
9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,
More informationM15/5/MATME/SP2/ENG/TZ2/XX/M MARKSCHEME. May 2015 MATHEMATICS. Standard level. Paper pages
M15/5/MATME/SP/ENG/TZ/XX/M MARKSCHEME May 015 MATHEMATICS Standard level Paper 18 pages M15/5/MATME/SP/ENG/TZ/XX/M This markscheme is the property of the International Baccalaureate and must not be reproduced
More informationStatistical Prediction
Statistical Prediction P.R. Hahn Fall 2017 1 Some terminology The goal is to use data to find a pattern that we can exploit. y: response/outcome/dependent/left-hand-side x: predictor/covariate/feature/independent
More informationLecture 15: Exploding and Vanishing Gradients
Lecture 15: Exploding and Vanishing Gradients Roger Grosse 1 Introduction Last lecture, we introduced RNNs and saw how to derive the gradients using backprop through time. In principle, this lets us train
More informationWeek 3: Linear Regression
Week 3: Linear Regression Instructor: Sergey Levine Recap In the previous lecture we saw how linear regression can solve the following problem: given a dataset D = {(x, y ),..., (x N, y N )}, learn to
More informationMathematics for Chemists 2 Lecture 14: Fourier analysis. Fourier series, Fourier transform, DFT/FFT
Mathematics for Chemists 2 Lecture 14: Fourier analysis Fourier series, Fourier transform, DFT/FFT Johannes Kepler University Summer semester 2012 Lecturer: David Sevilla Fourier analysis 1/25 Remembering
More informationNotes on Continuous Random Variables
Notes on Continuous Random Variables Continuous random variables are random quantities that are measured on a continuous scale. They can usually take on any value over some interval, which distinguishes
More informationEssential facts about NP-completeness:
CMPSCI611: NP Completeness Lecture 17 Essential facts about NP-completeness: Any NP-complete problem can be solved by a simple, but exponentially slow algorithm. We don t have polynomial-time solutions
More informationRatios, Proportions, Unit Conversions, and the Factor-Label Method
Ratios, Proportions, Unit Conversions, and the Factor-Label Method Math 0, Littlefield I don t know why, but presentations about ratios and proportions are often confused and fragmented. The one in your
More information15. NUMBERS HAVE LOTS OF DIFFERENT NAMES!
5 NUMBERS HAVE LOTS OF DIFFERENT NAMES! a fun type of game with numbers one such game playing the game: 3 pets There are lots of number games that can make you look clairvoyant One such game goes something
More informationRegression I: Mean Squared Error and Measuring Quality of Fit
Regression I: Mean Squared Error and Measuring Quality of Fit -Applied Multivariate Analysis- Lecturer: Darren Homrighausen, PhD 1 The Setup Suppose there is a scientific problem we are interested in solving
More informationMachine Learning Linear Regression. Prof. Matteo Matteucci
Machine Learning Linear Regression Prof. Matteo Matteucci Outline 2 o Simple Linear Regression Model Least Squares Fit Measures of Fit Inference in Regression o Multi Variate Regession Model Least Squares
More informationOutline of GLMs. Definitions
Outline of GLMs Definitions This is a short outline of GLM details, adapted from the book Nonparametric Regression and Generalized Linear Models, by Green and Silverman. The responses Y i have density
More information