CLASSIFICATION (DISCRIMINATION)
Caren Marzban

Generalities

This lecture relies on the previous lecture to some extent. So, read that one first. Also, as in the previous lecture, what you are currently reading is an expanded version of what is presented at the lecture. Recall that the problem at hand is to estimate the weights that parameterize the relationship between a set of input variables and a set of output variables (or targets). Actually, we have to decide on the parameterization, too, since the map also depends on H, the number of hidden nodes. The same objects go by several names:

    Attributes, Features, Inputs (x)  |  Predictand, Target (t), Outputs (y)  |  Map, Function, Relation
    Face                              |  Name                                 |

In the previous lecture we considered the regression problem, where the targets are continuous variables. Now, we will see how to handle the case where the targets are discrete, with each value of the target representing some class of objects. For example, a tornado classification problem could consist of finding the map that relates some set of input variables to, say, two classes - the class of non-tornadoes, and the class of tornadoes. This way we can develop a model that can predict whether a set of input values corresponds to a tornado or not. To be more precise, the map or the function we are looking for in a classification problem is the (decision) boundary that best demarcates the regions of the classes. The figure below shows examples of a linear boundary (left) and a nonlinear boundary (right) for a noise-free problem.
Discriminant Analysis

A traditional classification model, called Discriminant Analysis, is an enlightening ancestor of NNs. Let's look at the simple case of one input variable, x, and two classes labeled as i = 0, 1 (e.g., non-tornado, and tornado). The model begins by assuming that the likelihood of x, when it happens to belong to class = i, is gaussian. In other words, the conditional probability of x, given class = i, is written as

    L_i(x) = 1/(sqrt(2 pi) sigma_i) e^(-(x - mu_i)^2 / (2 sigma_i^2)) = prob(x | C = i),

where mu_i and sigma_i are the mean and standard deviation of x when it belongs to the i-th class. Both are estimated from a training set. Note that L_0 + L_1 is not 1. For forecasting purposes, this likelihood is the wrong probability to look at. We are not interested in the probability of finding, say, wind speed = 5 Km/hr, given that there is a tornado on the ground. We want to know the probability that there is a tornado on the ground, given that wind speed = 5 Km/hr. This other probability is called the posterior probability, and the laws of probability imply that it is given by

    P_i(x) = p_i L_i(x) / (p_0 L_0(x) + p_1 L_1(x)),

where p_0, p_1 are called prior (or climatological) probabilities for the two classes. They are estimated as N_0/N, and N_1/N, respectively, where N_i is just the sample size of the i-th class, and N = N_0 + N_1 is the total sample size. Note that p_0 + p_1 = 1 and P_0 + P_1 = 1. Then, when we observe some value of x, if P_1(x) > P_0(x), then we forecast (or classify) it as a 1 (i.e., a tornado). Actually, instead of P_1(x) > P_0(x), it's more convenient to look at log(P_1(x)/P_0(x)) > 0, because we can then write the left-hand side as log(P_1(x)/P_0(x)) = D_1(x), where the so-called discriminant function, D_1(x), is given by

    D_1(x) = (1/(2 sigma_0^2) - 1/(2 sigma_1^2)) x^2 + (mu_1/sigma_1^2 - mu_0/sigma_0^2) x + (mu_0^2/(2 sigma_0^2) - mu_1^2/(2 sigma_1^2)) + log(sigma_0/sigma_1) + log(p_1/p_0).

This equation follows in a straightforward way by just combining the above equations (homework). Again, an observation x from either training set or validation set is forecast as (or assigned to) class 1 (e.g., tornado), if D_1(x) > 0; otherwise it is classified (forecast) as a 0 (e.g., non-tornado).
In spite of its simplicity, this is a very powerful classification model. But as seen from the last equation, the best it can do is to handle quadratic (i.e., x^2) decision boundaries. It will fail if the boundary underlying our data is more nonlinear than that. Of course, the source of this limitation is the starting assumption - normality. I urge you to consider using this model, because you may be surprised by how well it does. But even if you never see or hear about this model again, one lesson that you should not forget is that the distribution of the data is related to the nonlinearity of the boundary. In the above example, if the distributions are gaussian, then the boundary is quadratic. As a special case, if the distributions are gaussian and equivariant (i.e., sigma_0 = sigma_1), then the boundary is linear. All of this should be reminding you of the close relationship between gaussian noise and mean square error in linear regression (previous lecture).
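As a minimal sketch of the whole procedure - estimate mu_i, sigma_i, p_i from a training set, form D_1(x), and classify by its sign - here is some Python with synthetic data (none of this code or data is from the lecture; it only illustrates the equations above):

```python
import numpy as np

def fit_da(x0, x1):
    """Estimate the discriminant-analysis parameters (means, standard
    deviations, priors) from class-0 and class-1 training samples."""
    n0, n1 = len(x0), len(x1)
    return ((np.mean(x0), np.std(x0), n0 / (n0 + n1)),
            (np.mean(x1), np.std(x1), n1 / (n0 + n1)))

def discriminant(x, params):
    """D_1(x) = log(P_1(x)/P_0(x)); forecast class 1 wherever D_1(x) > 0."""
    (mu0, s0, p0), (mu1, s1, p1) = params
    return ((x - mu0)**2 / (2 * s0**2) - (x - mu1)**2 / (2 * s1**2)
            + np.log(s0 / s1) + np.log(p1 / p0))

# Synthetic training data: two gaussian classes with different means.
rng = np.random.default_rng(0)
params = fit_da(rng.normal(0.0, 1.0, 500), rng.normal(3.0, 1.0, 500))
forecast = int(discriminant(2.9, params) > 0)  # classify a new observation
```

With equal class variances, as here, the fitted boundary comes out (nearly) linear, in line with the equivariant special case above.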
The following figures show all the above ingredients in both the linear and nonlinear case. Just study it a bit; I'll go over it during the lecture. It will even be in color then.

[Figure: the likelihoods L_0 and L_1, the discriminant function, and the posteriors P_0 and P_1, for the linear and the nonlinear case.]

NN

Discriminant Analysis provides a stepping stone for moving on to classification NNs. First, notice that P_1(x) can be re-written in the form (homework)

    P_1(x) = 1 / (1 + e^(-D_1(x))).

But this is the form of the logistic function discussed previously. In fact, for the case where the discriminant function is linear, it is exactly the logistic function. And that is nothing but an NN with no hidden nodes. So, you can see that an NN with hidden nodes -

    y(x, omega, H) = g( sum_{i=1}^{H} omega_i f( sum_j omega_ij x_j - theta_i ) - omega_0 ),

is a generalization of Discriminant Analysis. Then, you should not be surprised to find out that an NN with enough hidden nodes can fit any nonlinear boundary. Additionally, you should not be surprised to find out that the output of an NN represents a posterior probability. To strengthen the probabilistic interpretation of NN outputs, in a beautiful paper, Richard & Lippmann (Neural Computation, 3, 461-483, 1991) showed that "Neural network classifiers estimate Bayesian a-posteriori probabilities." In fact, that's the title of their paper. The exact statement is this: If the activation functions f and g are logistic, then the minimization of cross-entropy - where the target takes values t = 0, 1 -

    E = - sum_{n=1}^{N} [ t_n log y(x_n, omega) + (1 - t_n) log(1 - y(x_n, omega)) ],

guarantees

    y(x, omega) = P_1(x) (= prob(C = 1 | x)).

This is a very important result because it tells you what you need to do in order to have an NN that makes probabilistic forecasts. Again, this result should remind you of the analogous regression result - that the minimization of MSE implies that the NN outputs will be conditional averages: y(x, omega) = <t|x>. So, whatever NN simulation you employ to perform classification, instruct it to use cross-entropy as the error function, and to use the logistic function for all the activation functions. By the way, for more
than two classes, the generalization of the logistic activation function that preserves the probabilistic interpretation of the outputs is called the softmax activation function.

Method

As I said, we still have to worry about all the things we worried about in the regression lecture: weight decay, local minima, bootstrap (or something like it), etc. Then, fill in the big table in the previous lecture with different Seed and Bootstrap Trials. The table will allow you not only to identify the optimal number of hidden nodes, but also to place error bounds on the performance of the NN. One difference with the regression NN is that the choice of the error function is a little less ad hoc, in the sense that only cross-entropy (with logistic activation functions) will yield probabilistic outputs. Another difference is that you may have to worry about the representation of the different classes in the training set. For example, if one of the classes is drastically under-represented in the training set (compared with the other classes), then the NN may just treat the cases in that class as noise. It is this concern that often prompts some people to balance the classes in the training set. This is actually not recommended, because the outputs will no longer represent posterior probabilities. For the outputs to have that probabilistic interpretation, it is crucial for the different classes in the training set to be represented exactly proportional to their respective prior (or climatological) probabilities. So, it's better not to dabble with the class sizes. Having said that, there will be times when the above concern will in fact be realized. In that case, it is possible to practice that balancing act, if you assure that the outputs are scaled in such a way as to resurrect their probabilistic interpretation. In particular, you would have to multiply each output node by the prior probability of the class designated by that output node, and then renormalize to assure that all the probabilities add up to 1.
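That rescaling is mechanical enough to sketch in a few lines of Python. The outputs and priors below are made-up numbers for illustration, not values from any real NN:

```python
import numpy as np

def rescale_by_priors(outputs, priors):
    """Restore the probabilistic interpretation of outputs from an NN
    trained on a class-balanced set: weight each output node by the
    prior of its class, then renormalize each case to sum to 1."""
    scaled = np.asarray(outputs) * np.asarray(priors)
    return scaled / scaled.sum(axis=1, keepdims=True)

# Hypothetical outputs of a balance-trained 3-class NN for two cases,
# and hypothetical climatological priors for the three classes.
y = np.array([[0.2, 0.5, 0.3],
              [0.6, 0.3, 0.1]])
p = rescale_by_priors(y, [0.70, 0.25, 0.05])
```

Note how the rescaling pulls the probability of the rare third class down, as the priors demand.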
One final difference is in assessing the performance of a classification NN, especially if the outputs are arranged to be posterior probabilities. Because then you are dealing with probabilistic forecasts, and if you are a meteorologist you probably know that the verification of probabilistic forecasts is a topic that requires a whole book. I'll discuss some of the important concepts below, and in my next lecture. But at least check out the following two classic papers by Murphy, A. H., and R. L. Winkler: Int. J. Forecasting, 7, 435-455 (1992), and Mon. Wea. Rev., 115, 1330-1338 (1987). Wilks' book is good to have as well (Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, NY).
Overfitting

In practice, we still have to worry about all the nuisances we talked about for the regression NN. Overfitting (or the optimal number of hidden nodes) is the first concern. The following figures illustrate overfitting at work. I have concocted a noisy version of the classification example I showed in the beginning of the lecture. The boundary is the same inverted Mexican hat as before (labeled with light circles), but it's noisy in the sense that a few data points are on the wrong side of the boundary. Instead of 0 and 1, I've used filled and unfilled triangles. So, some filled triangles are above the boundary and some unfilled triangles are below the boundary. I've trained a series of NNs with different numbers of hidden nodes: H = 2, 4, 6, 8, 10, and 15. The corresponding decision boundary is shown with the dark line. You can see that the H = 2 NN underfits the boundary. H = 4 does better, but H = 6 is already overfitting in the sense that it thinks the single filled triangle at the top of the upper region is a real data point, rather than noise, which it clearly is. Larger NNs, with H = 8, 10, and 15, just continue to overfit even more. So, H = 4 is optimal.

[Figure: the data and the NN decision boundary, one panel for each value of H.]
Practical Application - Hail Size

[Figure: histogram of hail size (mm), showing three distinct peaks in the frequency distribution.]

Treating hail size as a continuous quantity, we developed an NN that predicts hail size. Given the peaks in the distribution of hail size (above), it is natural to develop an NN that predicts the size class to which a set of input values belongs. In fact, it would be useful to find the probability of getting hail of a certain size class. The reason for the three peaks in the distribution is mostly human related. It turns out there are two types of meteorologists - those who report hail size in millimeters (or hundredths of an inch), and those who classify hail as being coin size, golf-ball size, or baseball size. The preprocessing for a classification NN is mostly the same as for a regression NN: excluding collinear inputs, removing outliers, transforming to z-scores, etc. One difference is that consideration has to be given to the class. For instance, instead of examining the distribution of input variables (like we did in the regression NN), we must look at the class-conditional distribution of the input variables. That means that you must plot the distribution/histogram of the input variables separately for each class. The figure below shows this for one of the input variables, for the three classes of hail size. As expected, the distributions are bell-shaped, and shift to the right as the size of the corresponding class increases.

[Figure: class-conditional histograms of one input variable for the three hail-size classes.]
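Computing class-conditional histograms is simple; here is a Python sketch on synthetic data (three gaussian classes standing in for the real hail inputs; the shift of the means mimics the figure just described, and none of the numbers come from the actual data set):

```python
import numpy as np

# Synthetic samples of one input variable, grouped by hail-size class.
rng = np.random.default_rng(1)
samples = {0: rng.normal(-1.0, 1.0, 400),   # smallest size class
           1: rng.normal( 0.0, 1.0, 200),
           2: rng.normal( 1.0, 1.0, 100)}   # largest size class

# One histogram per class, all on the same bin edges so they compare.
edges = np.linspace(-4.0, 4.0, 17)
hists = {c: np.histogram(x, bins=edges)[0] for c, x in samples.items()}

# Each histogram should peak further to the right as the class grows.
peaks = [edges[np.argmax(hists[c])] for c in (0, 1, 2)]
```

Using common bin edges for all classes is the detail that makes the per-class histograms directly comparable.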
Similarly, in identifying collinear inputs, instead of calculating the r of two inputs, you should look at the r of two inputs for each class separately. It's possible that two inputs are highly collinear when they belong to one class, but not at all collinear in another class. Clearly, you would not want to exclude a member of such a pair, at least not based on their collinearity. Going through the previously outlined method, we find the following optimal values: the number of input nodes = 9, and the number of hidden nodes = 4. The number of output nodes is a bit tricky; it depends on how we choose to encode the class information in the target nodes. For instance, it's possible to have one output node, with values 0, 0.5, and 1 representing the three classes. And if the three classes have an order associated with them, then that's probably fine. A better way - and one that preserves the probabilistic interpretation of the outputs - is to use what's called 1-of-C encoding. In that scheme, there is one output node for each class, and the largest one designates the class. During training, for class 0 cases, the three output nodes are assigned the values (1,0,0) as their target values; class 1 cases are assigned (0,1,0) as their corresponding target values, etc.

Performance

This section is a little long, so make yourself comfortable. And, no, it could not have been placed as an appendix because its content is crucial for properly assessing the performance of an NN. Performance measures come in a variety of types, depending on the type of forecast (and observation). For example, if the forecasts are categorical, then one may have a scalar measure of performance, or a multidimensional measure (like a diagram). Similarly, probabilistic forecasts can be assessed in terms of scalars and/or diagrams. The following table lists an example of each type.
                    Scalar                      Multi-dimensional
    Categorical     Equitable Threat Score      ROC Diagram
    Probabilistic   Ranked Probability Score    Attributes Diagram

ROC stands for Receiver's Operating Characteristic, and will be defined below. Needless to say, multidimensional measures carry more information than scalar ones; similarly, probabilistic ones contain more information than categorical ones. Then again, they all have their place in the world. At one extreme we have categorical forecasts and observations, like hail cases being classified (forecast) into 3 classes. The starting point for assessing the performance of such a classifier is the contingency table:

    C-table = ( n_00  n_01  n_02 )
              ( n_10  n_11  n_12 )
              ( n_20  n_21  n_22 )

where n_01 is the number of class 0 cases (incorrectly) classified as belonging to class 1. Etc. In my (nonuniversal) convention, the rows correspond to observations, and the columns represent forecasts. There is an infinity of ways to construct scalar measures from this contingency table, but almost no multidimensional measures. By contrast, there are numerous multidimensional measures that can be constructed from a 2x2 contingency table. For that reason, the problem of classifying three classes - labeled 0, 1, and 2 - is broken down to three 2-class problems: classification of class 0 or otherwise, class 1 or otherwise, and class 2 or otherwise. Although this treatment of a 3-class problem can lead to some loss of information, the loss is partially compensated by the availability of multidimensional measures. Then, performance can be represented with three 2x2 contingency tables:

    C-table = ( n_00  n_01 )
              ( n_10  n_11 )
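The reduction from one 3x3 table to three 2x2 tables can be sketched in Python (the 3x3 counts below are invented, purely for illustration):

```python
import numpy as np

def one_vs_rest(ctab, k):
    """Collapse an m-class contingency table (rows = observations,
    columns = forecasts) into the 2x2 table for class k vs. otherwise.
    Returns [[n_00, n_01], [n_10, n_11]] with class k in the role of 1."""
    ctab = np.asarray(ctab)
    n11 = ctab[k, k]                    # class k forecast as class k
    n10 = ctab[k].sum() - n11           # class k forecast as otherwise
    n01 = ctab[:, k].sum() - n11        # otherwise forecast as class k
    n00 = ctab.sum() - n11 - n10 - n01  # otherwise forecast as otherwise
    return np.array([[n00, n01], [n10, n11]])

# A made-up 3x3 contingency table for the three hail classes.
ctab3 = [[50, 10,  5],
         [12, 30,  8],
         [ 3,  9, 20]]
tables = [one_vs_rest(ctab3, k) for k in range(3)]
```

Each collapsed table preserves the total sample size; only the distinction among the "otherwise" classes is lost, which is the loss of information mentioned above.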
A couple of common measures derived from this table are the Probability of Detection and the False Alarm Rate (not Ratio - that's something else). If the class we are trying to detect is the one labeled 1, then

    POD = n_11 / (n_10 + n_11),    FAR = n_01 / (n_00 + n_01).

Note that the total number of class 0 cases is given by N_0 = n_00 + n_01, and that of class 1 cases is N_1 = n_10 + n_11. The total sample size is N = N_0 + N_1. Even though our NN produces probabilistic forecasts, it is possible to categorize the forecasts by thresholding them; you simply pick a number between 0 and 1 (because probability lives in that range) and then say that any forecast less than that number belongs to class 0, and any forecast larger than the threshold belongs to class 1. In this way, one can reduce a continuous quantity like probability into a categorical one. Then, the question is which threshold to pick. The ROC diagram handles that problem. In fact, an ROC diagram is a parametric plot of POD versus FAR, as the threshold is varied from 0 to 1. So, you pick a threshold, but you don't have to stick to it; you vary it from 0 to 1 in some increments, calculating POD and FAR at each increment, and then plotting them on an x-y plot. It is easy to show that a classifier with no ability to discriminate between two classes yields a diagonal line of slope one and intercept zero; otherwise it bows above the diagonal line. (The area under the curve is often taken as a scalar measure of the classifier's performance, and so, a perfect classifier would have an area of 1 under its ROC curve, while a random classifier would have an area of 0.5.) In addition to its multidimensionality, another virtue of a ROC diagram is its ability to express performance without a specific reference to a unique threshold. A specific choice of the threshold calls for knowledge of the costs of misclassification, which are user dependent. In this way, a ROC diagram offers a user-independent assessment of performance. For the hail problem, the ROC plots are shown below.
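The threshold sweep is easy to sketch in Python. The forecasts below are synthetic (made to have some skill), not the hail NN's output:

```python
import numpy as np

def pod_far(obs, prob, thresh):
    """Threshold probabilistic forecasts of class 1 and compute POD and
    FAR from the resulting 2x2 counts (obs are 0/1 observations)."""
    fcst = (prob >= thresh).astype(int)
    n11 = np.sum((obs == 1) & (fcst == 1))
    n10 = np.sum((obs == 1) & (fcst == 0))
    n01 = np.sum((obs == 0) & (fcst == 1))
    n00 = np.sum((obs == 0) & (fcst == 0))
    return n11 / (n10 + n11), n01 / (n00 + n01)

# Synthetic observations and probabilistic forecasts with some skill.
rng = np.random.default_rng(2)
obs = (rng.random(2000) < 0.3).astype(int)
prob = np.clip(0.3 + 0.3 * (obs - 0.3) + 0.2 * rng.standard_normal(2000),
               0.0, 1.0)

# Vary the threshold from 0 to 1 to trace out the ROC curve.
roc = [pod_far(obs, prob, t) for t in np.linspace(0.0, 1.0, 21)]
```

At threshold 0 everything is forecast as class 1, so POD = FAR = 1; both POD and FAR can only shrink as the threshold rises, which is what makes the sweep trace out a curve from the top-right corner toward the origin.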
The left diagram is for the training set and the right one is for the validation set. (The error bars are the standard error from the bootstrap trials.) It can be seen that the performance of class 2 and class 0 forecasts is quite high (given by the amount of bowing away from the diagonal). However, class 1 forecasts are not doing too well, since according to the validation ROC plot, they are almost as good as random (evidenced by the overlap of the error bars and the diagonal line). So, the NN does well in predicting the largest and the smallest of the hail classes (in that order), but not too well with the midsize class. If you think about it, it makes sense that the midsize class would be the hardest one to discriminate from the others.

[Figure: ROC diagrams (Probability of Detection vs. False Alarm Rate) for the training set (left) and the validation set (right).]

Of course, it is possible to assess the quality of the forecasts without categorizing the forecasts. Murphy and Winkler (referenced above) have shown us how to do that. They show that the various aspects of the quality of probabilistic forecasts can be captured by three diagrams - reliability (or calibration), refinement, and discrimination diagrams. (A slightly improved version of the reliability diagram is an
attributes diagram; it shows whether or not a given forecast actually contributes to the skill of the classifier.) These are all computed from the joint probability of forecasts and observations, but I'll give a simpler description here. I'll return to the joint distribution in the next lecture.

A reliability diagram displays the extent to which the observed frequency of an event at a given forecast probability matches that forecast probability. To construct it, take all the cases that have a forecast of, say, 10%. Mark this number on the x-axis. In what fraction of these cases did the event actually occur? Mark that answer on the y-axis. We now have one point on the x-y plane, which is hopefully somewhere near the diagonal, because a 10% probability is supposed to mean that 10% of the time we get that outcome. Then repeat this for forecasts that are 20%, 30%, ..., 90%. All of the resulting points should lie somewhere near the diagonal; the closer to the diagonal, the more reliable the forecasts.

A refinement diagram displays the sharpness of the forecasts, i.e., the extent to which very high or very low probabilities are issued. To construct it, just plot the histogram of all the forecasts. There are many ways in which a classifier can be poor in terms of refinement. For example, a classifier that produces only two forecasts, say 10% and 90%, has poor refinement. A good classifier would produce a range of all forecasts (between 0 and 100%). I put reliability and refinement plots on the same diagram (below). Again, the error bars are the standard error from the bootstrap trials. The diagonal line corresponds to perfect reliability. We can see that within the error bars the NN produces forecasts that are perfectly reliable for all three classes. The curve (blue) not along the diagonal is the refinement plot. The class 0 forecasts have a very reasonable refinement plot in that they are relatively flat in the full range of 0 to 100%.
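Both constructions amount to binning the forecasts. Here is a Python sketch on synthetic, perfectly reliable forecasts (the event is made to occur with exactly the forecast probability; nothing here is the hail NN's actual output):

```python
import numpy as np

def reliability_points(obs, prob, n_bins=10):
    """For each forecast-probability bin, pair the mean forecast (x)
    with the observed event frequency (y). Points near the diagonal
    x = y indicate reliable forecasts; the per-bin counts are the
    refinement histogram."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    points, counts = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (prob >= lo) & (prob < hi)
        counts.append(int(mask.sum()))
        if mask.any():
            points.append((prob[mask].mean(), obs[mask].mean()))
    return points, counts

# Synthetic, perfectly reliable forecasts: the event occurs with
# exactly the forecast probability.
rng = np.random.default_rng(3)
prob = rng.random(100000)
obs = (rng.random(100000) < prob).astype(int)
points, counts = reliability_points(obs, prob)
```

With this construction, the same pass over the data yields both diagrams: the (x, y) pairs for reliability, and the bin counts for refinement.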
The refinement plot for class 1 forecasts suggests that they never exceed 60%; below 60% they are reasonably refined. Class 2 forecasts are reasonably refined as well; the peak at 0% reflects the fact that class 2 cases are rare. The shaded regions are what make these attributes diagrams. See Wilks' book (referenced above) for details.

[Figure: attributes diagrams (reliability and refinement) for the class 0, class 1, and class 2 forecast probabilities.]
A discrimination plot exhibits the extent to which the forecasts discriminate between the classes. To construct it, plot the class-conditional distributions (or histograms) of the forecasts. The top figure (below) is almost a text-book case of what a good classifier's discrimination plot should look like. When class 0 cases are presented to the NN, it produces class 0 forecasts that peak at higher probabilities. At the same time, if class 2 cases are presented to the NN, the class 0 forecasts are mostly in the lower probabilities. Class 1 cases occupy the middle ground. Skipping to the bottom diagram, the behavior is almost the complement of that seen in the top diagram. Class 2 forecasts are more frequent when class 2 cases are shown, etc. Finally, in the middle diagram, it can be seen that classes 0 and 1 are reasonably discriminated by class 1 forecasts. Same with classes 1 and 2. But classes 0 and 2 are not at all discriminated by class 1 forecasts. This, again, points to the problem the NN is having in classifying class 1 cases.

[Figure: class-conditional histograms of the class 0 (top), class 1 (middle), and class 2 (bottom) forecast probabilities.]

Conclusion

We are done. My goal was to teach you about regression and classification as done with NNs. Although I have covered only the tip of the iceberg, it should give you enough to play with and to get your hands dirty. In the process, we developed a number of NNs for estimating hail size from some measurable attributes. We even ended up using the same data set for developing both a regression and a classification NN for that task. Together they provide an output like this: Forecast hail size = 4mm, Prob. of coin-size hail = 80%, Prob. of golf-ball-size hail = 15%, Prob. of baseball-size hail = 5%. My other goal was to address the issue of performance (or verification). In my experience, performance does not get nearly the attention it deserves. After all, what's the point of developing a high-powered NN if we don't have a proper way of assessing if it is indeed high-powered?
As if the performance assessment of categorical forecasts is not complex enough, probabilistic forecasts make the business even more complex. The source of the complexity is really the multidimensionality of performance itself. But don't lose hope; the situation is relatively under control, and a little reading in the literature should confirm that. Just remember that because of the multidimensionality of performance, it's better not to quantify performance too much, because quantification usually requires distillation, which in turn implies loss of information.
1 Learning Goals Linear regression Class 25, 18.05 Jerem Orloff and Jonathan Bloom 1. Be able to use the method of least squares to fit a line to bivariate data. 2. Be able to give a formula for the total
More information8.7 Systems of Non-Linear Equations and Inequalities
8.7 Sstems of Non-Linear Equations and Inequalities 67 8.7 Sstems of Non-Linear Equations and Inequalities In this section, we stud sstems of non-linear equations and inequalities. Unlike the sstems of
More informationMathematics. Mathematics 1. hsn.uk.net. Higher HSN21000
Higher Mathematics UNIT Mathematics HSN000 This document was produced speciall for the HSN.uk.net website, and we require that an copies or derivative works attribute the work to Higher Still Notes. For
More information1 History of statistical/machine learning. 2 Supervised learning. 3 Two approaches to supervised learning. 4 The general learning procedure
Overview Breiman L (2001). Statistical modeling: The two cultures Statistical learning reading group Aleander L 1 Histor of statistical/machine learning 2 Supervised learning 3 Two approaches to supervised
More informationLesson 29 MA Nick Egbert
Lesson 9 MA 16 Nick Egbert Overview In this lesson we build on the previous two b complicating our domains of integration and discussing the average value of functions of two variables. Lesson So far the
More informationResearch Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D.
Research Design - - Topic 15a Introduction to Multivariate Analses 009 R.C. Gardner, Ph.D. Major Characteristics of Multivariate Procedures Overview of Multivariate Techniques Bivariate Regression and
More informationChapter 6. Exploring Data: Relationships
Chapter 6 Exploring Data: Relationships For All Practical Purposes: Effective Teaching A characteristic of an effective instructor is fairness and consistenc in grading and evaluating student performance.
More information1.6 ELECTRONIC STRUCTURE OF THE HYDROGEN ATOM
1.6 ELECTRONIC STRUCTURE OF THE HYDROGEN ATOM 23 How does this wave-particle dualit require us to alter our thinking about the electron? In our everda lives, we re accustomed to a deterministic world.
More informationThe Force Table Introduction: Theory:
1 The Force Table Introduction: "The Force Table" is a simple tool for demonstrating Newton s First Law and the vector nature of forces. This tool is based on the principle of equilibrium. An object is
More informationDIFFERENTIAL EQUATIONS First Order Differential Equations. Paul Dawkins
DIFFERENTIAL EQUATIONS First Order Paul Dawkins Table of Contents Preface... First Order... 3 Introduction... 3 Linear... 4 Separable... 7 Eact... 8 Bernoulli... 39 Substitutions... 46 Intervals of Validit...
More informationUnit 2 Notes Packet on Quadratic Functions and Factoring
Name: Period: Unit Notes Packet on Quadratic Functions and Factoring Notes #: Graphing quadratic equations in standard form, verte form, and intercept form. A. Intro to Graphs of Quadratic Equations: a
More informationPerformance Evaluation
Performance Evaluation David S. Rosenberg Bloomberg ML EDU October 26, 2017 David S. Rosenberg (Bloomberg ML EDU) October 26, 2017 1 / 36 Baseline Models David S. Rosenberg (Bloomberg ML EDU) October 26,
More informationL20: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs
L0: MLPs, RBFs and SPR Bayes discriminants and MLPs The role of MLP hidden units Bayes discriminants and RBFs Comparison between MLPs and RBFs CSCE 666 Pattern Analysis Ricardo Gutierrez-Osuna CSE@TAMU
More informationFitting a Straight Line to Data
Fitting a Straight Line to Data Thanks for your patience. Finally we ll take a shot at real data! The data set in question is baryonic Tully-Fisher data from http://astroweb.cwru.edu/sparc/btfr Lelli2016a.mrt,
More informationTopic 3 Notes Jeremy Orloff
Topic 3 Notes Jerem Orloff 3 Line integrals and auch s theorem 3.1 Introduction The basic theme here is that comple line integrals will mirror much of what we ve seen for multivariable calculus line integrals.
More information10.2 The Unit Circle: Cosine and Sine
0. The Unit Circle: Cosine and Sine 77 0. The Unit Circle: Cosine and Sine In Section 0.., we introduced circular motion and derived a formula which describes the linear velocit of an object moving on
More informationPolynomial and Rational Functions
Polnomial and Rational Functions Figure -mm film, once the standard for capturing photographic images, has been made largel obsolete b digital photograph. (credit film : modification of work b Horia Varlan;
More informationf x, y x 2 y 2 2x 6y 14. Then
SECTION 11.7 MAXIMUM AND MINIMUM VALUES 645 absolute minimum FIGURE 1 local maimum local minimum absolute maimum Look at the hills and valles in the graph of f shown in Figure 1. There are two points a,
More informationPhysics Gravitational force. 2. Strong or color force. 3. Electroweak force
Phsics 360 Notes on Griffths - pluses and minuses No tetbook is perfect, and Griffithsisnoeception. Themajorplusisthat it is prett readable. For minuses, see below. Much of what G sas about the del operator
More informationUnit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents
Unit NOTES Honors Common Core Math Da : Properties of Eponents Warm-Up: Before we begin toda s lesson, how much do ou remember about eponents? Use epanded form to write the rules for the eponents. OBJECTIVE
More informationDependence and scatter-plots. MVE-495: Lecture 4 Correlation and Regression
Dependence and scatter-plots MVE-495: Lecture 4 Correlation and Regression It is common for two or more quantitative variables to be measured on the same individuals. Then it is useful to consider what
More informationData Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur
Data Mining Prof. Pabitra Mitra Department of Computer Science & Engineering Indian Institute of Technology, Kharagpur Lecture - 17 K - Nearest Neighbor I Welcome to our discussion on the classification
More informationHST.582J / 6.555J / J Biomedical Signal and Image Processing Spring 2007
MIT OpenCourseWare http://ocw.mit.edu HST.582J / 6.555J / 16.456J Biomedical Signal and Image Processing Spring 2007 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
More informationt s time we revisit our friend, the equation of a line: y = mx + b
CH PARALLEL AND PERPENDICULAR LINES Introduction I t s time we revisit our friend, the equation of a line: mx + b SLOPE -INTERCEPT To be precise, b is not the -intercept; b is the -coordinate of the -intercept.
More informationPolynomial and Rational Functions
Polnomial and Rational Functions Figure -mm film, once the standard for capturing photographic images, has been made largel obsolete b digital photograph. (credit film : modification of ork b Horia Varlan;
More informationChapter 18 Quadratic Function 2
Chapter 18 Quadratic Function Completed Square Form 1 Consider this special set of numbers - the square numbers or the set of perfect squares. 4 = = 9 = 3 = 16 = 4 = 5 = 5 = Numbers like 5, 11, 15 are
More informationMachine Learning Linear Classification. Prof. Matteo Matteucci
Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)
More informationLESSON #28 - POWER FUNCTIONS COMMON CORE ALGEBRA II
1 LESSON #8 - POWER FUNCTIONS COMMON CORE ALGEBRA II Before we start to analze polnomials of degree higher than two (quadratics), we first will look at ver simple functions known as power functions. The
More informationOverview. IAML: Linear Regression. Examples of regression problems. The Regression Problem
3 / 38 Overview 4 / 38 IAML: Linear Regression Nigel Goddard School of Informatics Semester The linear model Fitting the linear model to data Probabilistic interpretation of the error function Eamples
More informationLaurie s Notes. Overview of Section 3.5
Overview of Section.5 Introduction Sstems of linear equations were solved in Algebra using substitution, elimination, and graphing. These same techniques are applied to nonlinear sstems in this lesson.
More information1.2 Relations. 20 Relations and Functions
0 Relations and Functions. Relations From one point of view, all of Precalculus can be thought of as studing sets of points in the plane. With the Cartesian Plane now fresh in our memor we can discuss
More informationCoordinate geometry. + bx + c. Vertical asymptote. Sketch graphs of hyperbolas (including asymptotic behaviour) from the general
A Sketch graphs of = a m b n c where m = or and n = or B Reciprocal graphs C Graphs of circles and ellipses D Graphs of hperbolas E Partial fractions F Sketch graphs using partial fractions Coordinate
More informationUnit 10 - Graphing Quadratic Functions
Unit - Graphing Quadratic Functions PREREQUISITE SKILLS: students should be able to add, subtract and multipl polnomials students should be able to factor polnomials students should be able to identif
More informationChapter 1 Review of Equations and Inequalities
Chapter 1 Review of Equations and Inequalities Part I Review of Basic Equations Recall that an equation is an expression with an equal sign in the middle. Also recall that, if a question asks you to solve
More informationF6 Solving Inequalities
UNIT F6 Solving Inequalities: Tet F6 Solving Inequalities F6. Inequalities on a Number Line An inequalit involves one of the four smbols >,, < or The following statements illustrate the meaning of each
More informationFunctions. Introduction
Functions,00 P,000 00 0 970 97 980 98 990 99 000 00 00 Figure Standard and Poor s Inde with dividends reinvested (credit "bull": modification of work b Praitno Hadinata; credit "graph": modification of
More informationLESSON #42 - INVERSES OF FUNCTIONS AND FUNCTION NOTATION PART 2 COMMON CORE ALGEBRA II
LESSON #4 - INVERSES OF FUNCTIONS AND FUNCTION NOTATION PART COMMON CORE ALGEBRA II You will recall from unit 1 that in order to find the inverse of a function, ou must switch and and solve for. Also,
More informationMethods for Advanced Mathematics (C3) Coursework Numerical Methods
Woodhouse College 0 Page Introduction... 3 Terminolog... 3 Activit... 4 Wh use numerical methods?... Change of sign... Activit... 6 Interval Bisection... 7 Decimal Search... 8 Coursework Requirements on
More informationSection 1.2: Relations, from College Algebra: Corrected Edition by Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons
Section.: Relations, from College Algebra: Corrected Edition b Carl Stitz, Ph.D. and Jeff Zeager, Ph.D. is available under a Creative Commons Attribution-NonCommercial-ShareAlike.0 license. 0, Carl Stitz.
More informationCubic and quartic functions
3 Cubic and quartic functions 3A Epanding 3B Long division of polnomials 3C Polnomial values 3D The remainder and factor theorems 3E Factorising polnomials 3F Sum and difference of two cubes 3G Solving
More informationEXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING
EXAM IN STATISTICAL MACHINE LEARNING STATISTISK MASKININLÄRNING DATE AND TIME: June 9, 2018, 09.00 14.00 RESPONSIBLE TEACHER: Andreas Svensson NUMBER OF PROBLEMS: 5 AIDING MATERIAL: Calculator, mathematical
More informationIntroduction. So, why did I even bother to write this?
Introduction This review was originally written for my Calculus I class, but it should be accessible to anyone needing a review in some basic algebra and trig topics. The review contains the occasional
More informationReview Topics for MATH 1400 Elements of Calculus Table of Contents
Math 1400 - Mano Table of Contents - Review - page 1 of 2 Review Topics for MATH 1400 Elements of Calculus Table of Contents MATH 1400 Elements of Calculus is one of the Marquette Core Courses for Mathematical
More informationMathematical Notation Math Introduction to Applied Statistics
Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and can be printed and given to the
More information6. This sum can be rewritten as 4( ). We then recall the formula n =
. c = 9b = 3 b = 3 a 3 = a = = 6.. (3,, ) = 3 + + 3 = 9 + + 3 = 6 6. 3. We see that this is equal to. 3 = ( +.) 3. Using the fact that (x + ) 3 = x 3 + 3x + 3x + and replacing x with., we find that. 3
More informationChapter 4 Analytic Trigonometry
Analtic Trigonometr Chapter Analtic Trigonometr Inverse Trigonometric Functions The trigonometric functions act as an operator on the variable (angle, resulting in an output value Suppose this process
More information4 The Cartesian Coordinate System- Pictures of Equations
The Cartesian Coordinate Sstem- Pictures of Equations Concepts: The Cartesian Coordinate Sstem Graphs of Equations in Two Variables -intercepts and -intercepts Distance in Two Dimensions and the Pthagorean
More informationAutonomous Equations / Stability of Equilibrium Solutions. y = f (y).
Autonomous Equations / Stabilit of Equilibrium Solutions First order autonomous equations, Equilibrium solutions, Stabilit, Longterm behavior of solutions, direction fields, Population dnamics and logistic
More informationThe American School of Marrakesh. Algebra 2 Algebra 2 Summer Preparation Packet
The American School of Marrakesh Algebra Algebra Summer Preparation Packet Summer 016 Algebra Summer Preparation Packet This summer packet contains eciting math problems designed to ensure our readiness
More informationPolynomial Functions of Higher Degree
SAMPLE CHAPTER. NOT FOR DISTRIBUTION. 4 Polynomial Functions of Higher Degree Polynomial functions of degree greater than 2 can be used to model data such as the annual temperature fluctuations in Daytona
More informationLecture 2. Judging the Performance of Classifiers. Nitin R. Patel
Lecture 2 Judging the Performance of Classifiers Nitin R. Patel 1 In this note we will examine the question of how to udge the usefulness of a classifier and how to compare different classifiers. Not only
More informationUnit 3 Notes Mathematical Methods
Unit 3 Notes Mathematical Methods Foundational Knowledge Created b Triumph Tutoring Copright info Copright Triumph Tutoring 07 Triumph Tutoring Pt Ltd ABN 60 607 0 507 First published in 07 All rights
More informationPolynomial and Rational Functions
Polnomial and Rational Functions Figure -mm film, once the standard for capturing photographic images, has been made largel obsolete b digital photograph. (credit film : modification of ork b Horia Varlan;
More information12 Statistical Justifications; the Bias-Variance Decomposition
Statistical Justifications; the Bias-Variance Decomposition 65 12 Statistical Justifications; the Bias-Variance Decomposition STATISTICAL JUSTIFICATIONS FOR REGRESSION [So far, I ve talked about regression
More informationACCUPLACER MATH 0311 OR MATH 0120
The University of Teas at El Paso Tutoring and Learning Center ACCUPLACER MATH 0 OR MATH 00 http://www.academics.utep.edu/tlc MATH 0 OR MATH 00 Page Factoring Factoring Eercises 8 Factoring Answer to Eercises
More informationTangent Lines. Limits 1
Limits Tangent Lines The concept of the tangent line to a circle dates back at least to the earl das of Greek geometr, that is, at least 5 ears. The tangent line to a circle with centre O at a point A
More informationCourse 15 Numbers and Their Properties
Course Numbers and Their Properties KEY Module: Objective: Rules for Eponents and Radicals To practice appling rules for eponents when the eponents are rational numbers Name: Date: Fill in the blanks.
More informationMA 1128: Lecture 08 03/02/2018. Linear Equations from Graphs And Linear Inequalities
MA 1128: Lecture 08 03/02/2018 Linear Equations from Graphs And Linear Inequalities Linear Equations from Graphs Given a line, we would like to be able to come up with an equation for it. I ll go over
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationModel verification / validation A distributions-oriented approach
Model verification / validation A distributions-oriented approach Dr. Christian Ohlwein Hans-Ertel-Centre for Weather Research Meteorological Institute, University of Bonn, Germany Ringvorlesung: Quantitative
More informationMethods and Criteria for Model Selection. CS57300 Data Mining Fall Instructor: Bruno Ribeiro
Methods and Criteria for Model Selection CS57300 Data Mining Fall 2016 Instructor: Bruno Ribeiro Goal } Introduce classifier evaluation criteria } Introduce Bias x Variance duality } Model Assessment }
More informationES.182A Topic 36 Notes Jeremy Orloff
ES.82A Topic 36 Notes Jerem Orloff 36 Vector fields and line integrals in the plane 36. Vector analsis We now will begin our stud of the part of 8.2 called vector analsis. This is the stud of vector fields
More information