Statistics for classification


1 AstroInformatics

2 Statistics for classification
A useful representation is the confusion matrix. The element on row i and column j is the absolute number, or the percentage, of cases of true class i that the classifier assigned to class j. The main diagonal contains the correctly classified cases; all the other entries are errors.

True \ Predicted     A     B     C   Total   Accuracy
A                   60    14    13      87      69.0%
B                    —    34     —      60      56.7%
C                    —     —    42      53      79.2%
Total                                  200      68.0%

In the training set there are 200 cases. Class A contains 87 cases: 60 correctly classified as A, and 27 misclassified, of which 14 as B and 13 as C. On class A the accuracy is 60 / 87 = 69.0%; on class B it is 34 / 60 = 56.7%, and on class C it is 42 / 53 = 79.2%. The mean accuracy is (60 + 34 + 42) / 200 = 136 / 200 = 68.0%. The errors amount to 32%, i.e. 64 cases out of 200. The value of this classification depends not only on the percentages, but also on the cost of each type of error: for example, if C is the class that is most important to classify well, the result can be considered positive.
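As a quick check of the slide's arithmetic, here is a minimal sketch in Python/numpy; it uses only the entries reported above, since the off-diagonal counts for rows B and C are not given:

```python
import numpy as np

# Known entries of the confusion matrix above (rows = true classes A, B, C).
diagonal = np.array([60, 34, 42])     # correctly classified per class
row_totals = np.array([87, 60, 53])   # number of true cases per class

per_class_accuracy = diagonal / row_totals
print(per_class_accuracy)                 # [0.690 0.567 0.792]
print(diagonal.sum() / row_totals.sum())  # 0.68 -> mean accuracy
```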

3 Confusion Matrix
A binary classifier has two possible output classes. The response is also known as:
o Output variable;
o Label;
o Target;
o Dependent variable.
Let's now define the most basic terms:
true positives (TP): we predicted yes, and the cases are actually yes.
true negatives (TN): we predicted no, and the cases are actually no.
false positives (FP): we predicted yes, but the cases are actually no.
false negatives (FN): we predicted no, but the cases are actually yes.
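A minimal sketch of how these four counts can be obtained from binary label vectors (the labels and predictions here are invented for illustration):

```python
import numpy as np

# Hypothetical binary labels and predictions; 1 = "yes", 0 = "no".
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0])

TP = np.sum((y_pred == 1) & (y_true == 1))  # predicted yes, actually yes
TN = np.sum((y_pred == 0) & (y_true == 0))  # predicted no, actually no
FP = np.sum((y_pred == 1) & (y_true == 0))  # predicted yes, actually no
FN = np.sum((y_pred == 0) & (y_true == 1))  # predicted no, actually yes
print(TP, TN, FP, FN)  # 3 3 1 1
```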

4 Classification estimators
There is a list of basic rates often computed from a confusion matrix for a binary classifier:
Classification accuracy: fraction of patterns (objects) correctly classified, with respect to the total number of objects in the sample;
Purity/Completeness: fraction of objects correctly classified for each class (with respect to the objects assigned to the class, or to the true members of the class, respectively);
Contamination: fraction of objects erroneously classified, for each class;
DICE: the Sørensen–Dice index, also known as F1-score, is a statistic used for comparing the similarity of two samples:

$$\mathrm{DICE} = \frac{2\,|X \cap Y|}{|X| + |Y|} = \frac{2AB}{A^2 + B^2} = \frac{2TP}{2TP + FP + FN}$$

These five basic quality evaluation criteria all exploit the output representation given by the confusion matrix.
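For instance, a one-line implementation of the DICE/F1 formula from the confusion-matrix counts (shown here with the counts used on the next slide):

```python
def dice(tp, fp, fn):
    """Sørensen-Dice index (F1-score) from confusion-matrix counts."""
    return 2 * tp / (2 * tp + fp + fn)

print(dice(tp=100, fp=10, fn=5))  # 0.930...
```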

5 Classification estimators
More in general:
Accuracy: overall, how often is the classifier correct? (TP+TN)/total = (100+50)/165 = 0.91
Misclassification Rate: overall, how often is it wrong? (FP+FN)/total = (10+5)/165 = 0.09; equivalent to 1 - Accuracy; also known as "Error Rate"
True Positive Rate: when it's actually yes, how often does it predict yes? TP/actual yes = 100/105 = 0.95; also known as "Sensitivity", "Recall" or "Completeness"
False Positive Rate: when it's actually no, how often does it predict yes? FP/actual no = 10/60 = 0.17
Specificity: when it's actually no, how often does it predict no? TN/actual no = 50/60 = 0.83; equivalent to 1 - FPR
Precision (Purity): when it predicts yes, how often is it correct? TP/predicted yes = 100/110 = 0.91
Prevalence: how often does the yes condition actually occur in our sample? actual yes/total = 105/165 = 0.64
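All of these rates follow directly from the four counts; a small sketch verifying the slide's numbers (TP=100, TN=50, FP=10, FN=5):

```python
TP, TN, FP, FN = 100, 50, 10, 5
total = TP + TN + FP + FN   # 165

print((TP + TN) / total)    # 0.909 -> Accuracy
print((FP + FN) / total)    # 0.091 -> Misclassification (error) rate
print(TP / (TP + FN))       # 0.952 -> True Positive Rate (recall)
print(FP / (FP + TN))       # 0.167 -> False Positive Rate
print(TN / (FP + TN))       # 0.833 -> Specificity = 1 - FPR
print(TP / (TP + FP))       # 0.909 -> Precision (purity)
print((TP + FN) / total)    # 0.636 -> Prevalence
```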

6 ROC curve
Together with the DICE estimator, another useful operator is the ROC curve. The ROC (Receiver Operating Characteristic) is a statistical estimator used to assess the predictive power of a binary classifier (e.g. a logistic regression model). It comes from signal theory but is heavily used in analytics: it is a graph that summarizes the performance of a classifier over all possible thresholds. It is generated by plotting the True Positive Rate (y-axis) against the False Positive Rate (x-axis) as the threshold for assigning observations to a given class is varied.

7 ROC curve
To draw a ROC curve, only TPR and FPR are needed. TPR defines how many correct positive results occur among all positive samples available during the test. FPR, on the other hand, defines how many incorrect positive results occur among all negative samples available during the test. A ROC space is defined by FPR and TPR as the x and y axes respectively, and shows the relative trade-offs between true positives (benefits) and false positives (costs). Since TPR is equivalent to sensitivity and FPR is equal to 1 − specificity, the ROC graph is sometimes called the sensitivity vs (1 − specificity) plot. Each prediction result, or instance of a confusion matrix, represents one point in the ROC space.

8 ROC curve
The best possible prediction method would yield a point in the upper left corner, at coordinate (0,1) of the ROC space, representing 100% sensitivity (no false negatives) and 100% specificity (no false positives). The (0,1) point is also called a perfect classification. A completely random guess would give a point along the diagonal line (the so-called line of no-discrimination) from the bottom-left to the top-right corner, regardless of the positive and negative base rates. An intuitive example of random guessing is a decision made by flipping coins. As the sample size increases, a random classifier's ROC point migrates towards (0.5, 0.5).

9 ROC classifier estimation
In order to compare arbitrary classifiers, the Receiver Operating Characteristic (ROC) curve plots may give a quick evaluation of their behavior. The overall effectiveness of the algorithm is measured by the area under the ROC curve, where an area of 1 represents a perfect classification, while an area of 0.5 indicates a useless result (i.e. like a flipped coin). It is obtained by varying the threshold used to discriminate among classes. If the target labels lie in the [0,1] range, the ROC plot is built by calculating the pair (FPR, TPR) for each value of a threshold (e.g. 0, 0.1, 0.15, 0.2, ..., 0.95, 1) and plotting all these results, thus describing a curve. The ROC value is therefore the area under that curve.
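A sketch of the threshold sweep described above, with invented labels and scores (scikit-learn's roc_curve and roc_auc_score provide the same functionality ready-made):

```python
import numpy as np

def roc_points(y_true, scores, thresholds):
    """(FPR, TPR) pairs obtained by sweeping the decision threshold."""
    pts = []
    for thr in thresholds:
        y_pred = (scores >= thr).astype(int)
        tp = np.sum((y_pred == 1) & (y_true == 1))
        fp = np.sum((y_pred == 1) & (y_true == 0))
        fn = np.sum((y_pred == 0) & (y_true == 1))
        tn = np.sum((y_pred == 0) & (y_true == 0))
        pts.append((fp / (fp + tn), tp / (tp + fn)))
    return np.array(pts)

# Invented binary labels and scores in [0, 1] for illustration.
y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5])

pts = roc_points(y_true, scores, np.linspace(0, 1, 21))
order = np.argsort(pts[:, 0])                  # sort by FPR before integrating
auc = np.trapz(pts[order, 1], pts[order, 0])   # area under the curve
print(auc)  # 1 = perfect classification, 0.5 = random guessing
```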

10 Probability Density Function
In regression experiments, where the goal is to predict a distribution based on a restricted sample of a true population (the KB, or Knowledge Base), the usual mechanism is to infer the knowledge acquired on the true sample through a model able to learn the hidden and unknown correlation between the data parameter space and the expected output. A typical example in astrophysics is the prediction of the photometric redshift for millions of sky objects, by learning the hidden correlation between the multi-band photometric fluxes and the spectroscopic redshift (almost precise, thus considered the true redshift). The true redshift is usually known only for a very limited sample of objects (those spectroscopically observed). The advantage of predicting the photo-z is that in real cases photometric catalogues are much easier to obtain than very complex and expensive spectroscopic observation runs and their reduction. Forcing a model to predict a single-point estimate y = f(x) may yield largely inaccurate predictions (outliers), while a prediction based on a Probability Density Function PDF(x) may reduce or minimize physical bias (systematic errors) as well as the occurrence of outliers. In other words, a PDF improves performance, at the price of a more complex mechanism to infer and calculate the prediction.

11 Probability Density Function
The importance of the PDF is that in most real-life problems it is impossible to answer questions like: given the distribution function of a random variable X, what is the probability that X is exactly equal to a certain value n? We can better answer questions like: what is the probability that X lies between n−a and n+a? This corresponds to calculating the area under the density over the interval [n−a, n+a]: the probability is an integral value, not a single point! There exists a plethora of statistical methods which can produce a PDF for analytical problems approached by traditional (deterministic/probabilistic) models. But for models for which an analytical expression y = f(x) does not exist (such as machine learning models), it is extremely difficult to find a well-posed PDF(x), since it is intrinsically complex to separate the error due to the model itself from the error embedded in the data. And, importantly, a PDF is well-posed only for known (continuous) probability distributions.
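A small sketch of this point for a standard Gaussian (assuming scipy is available): the probability of an interval is the integral of the density, while any single point has probability zero. The centre n and half-width a are arbitrary illustrative values:

```python
from scipy.integrate import quad
from scipy.stats import norm

n, a = 0.5, 0.2  # arbitrary centre and half-width of the interval

# P(n - a <= X <= n + a) via the cumulative distribution function...
prob_cdf = norm.cdf(n + a) - norm.cdf(n - a)
# ...and as an explicit integral of the PDF over [n - a, n + a].
prob_int, _ = quad(norm.pdf, n - a, n + a)
print(prob_cdf, prob_int)  # both ~0.140; P(X == n) is exactly 0
```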

12 Confidence Statistics
As said before, p(z) cannot be verified on a single-point basis. A large sample, however, does support p(z) verification. Let's assume we have a population of N observed objects. For a limited number of them we know their real nature. To the others we applied an estimation model which predicted their nature with a residual uncertainty, i.e. it provides a probability p(z). We want to verify the reliability and accuracy of such estimation. What do we expect? About 1% of the predictions should be nearly perfect, i.e. their p(z) extremely close to the real value with at least a 99% confidence level. What happens if such an amount occurs for, let's say, about 40%? The model estimation suffers from overconfidence, i.e. predictions more precise than the supported evidence! What happens in the opposite case (i.e. less than 0.5%)? The model is underconfident! In many astrophysical cases, astronomers spend much time calibrating their measurements, keeping the error budget of the observation quality under strict control. They remove most of the bias (systematic effects) sources, tune the physical models, increase the statistics of the used samples, compare with past experience, use empirical ("magic") rules of thumb, etc. This means that in most cases the results are overconfident.

13 Confidence statistics
The key idea is the concept of the confidence interval. Let's suppose we have a statistical variable X distributed over a population with mean µ and variance σ². We want to build a confidence interval for µ at level 1−α, based on a simple random sample (x₁, ..., xₙ) of size n. The quantity 1−α is called the confidence level. In practice, we want to find an interval of values which contains the true value of the parameter µ. First, we have to distinguish the case in which the variance is known from the one in which it is unknown.

14 Confidence Statistics
Variance known (rare case in the real world)
The sampling mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$

is a random variable distributed approximately like a Gaussian $N(\mu, \sigma^2/n)$, and this approximation improves by extending the sample size n. The quantity $\sigma^2/n$ measures the precision of the estimator (1): as n increases, $\sigma^2/n$ decreases and (1) becomes more precise. Hence $\bar{x} \sim N(\mu, \sigma^2/n)$ implies

$$Z = \frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}} \sim N(0, 1)$$

so we can use the z-score of a standard normal.

15 Confidence Statistics
Therefore, for each probability value 1−α, we can write:

$$P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}} \le z_{\alpha/2}\right) = 1 - \alpha \qquad (2)$$

where $z_{\alpha/2}$ is the quantile of the Gaussian distribution of order 1−α/2, i.e. the point leaving a left area under the Gaussian equal to 1−α/2. The values of the quantiles are usually tabulated for each distribution.

16 Confidence Statistics
Quantiles of a standard Gaussian distribution (table not reproduced here). The table reports the quantiles of order p₀ + p₁ of a distribution N(0,1), with p₀ on the rows and p₁ on the columns. Remember that a standard Gaussian is symmetric around zero, so the quantiles with p < 0.5 can be obtained from those of order 1−p by a change of sign (see the example below).
Example: to obtain the quantile of order 0.975 of an N(0,1) means to find x such that $P(N(0,1) \le x) = 0.975$; the table gives x = 1.96. The symmetric value corresponds to $P(N(0,1) \le x) = 0.025$, giving x = −1.96. Therefore x = ±1.96 are the quantiles of order 0.975 and 0.025 of the distribution N(0,1).
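In practice the table lookup can be replaced by the inverse CDF; a sketch with scipy:

```python
from scipy.stats import norm

print(norm.ppf(0.975))  #  1.96  -> quantile of order 0.975
print(norm.ppf(0.025))  # -1.96  -> symmetric quantile of order 0.025
print(norm.cdf(1.96))   #  0.975 -> inverse check
```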

17 Confidence Statistics
The confidence interval can then be built by expanding the formula

$$P\left(-z_{\alpha/2} \le \frac{\bar{x} - \mu}{\sqrt{\sigma^2/n}} \le z_{\alpha/2}\right) = 1 - \alpha \qquad (2)$$

which yields the confidence interval(s)

$$\left[\bar{x} - z_{\alpha/2}\sqrt{\frac{\sigma^2}{n}},\; \bar{x} + z_{\alpha/2}\sqrt{\frac{\sigma^2}{n}}\right] \qquad (3)$$

In other words, the probability that the intervals (3) contain the true value of the mean µ of the population is approximately equal to the confidence level 1−α.

18 Confidence Statistics
The confidence level 1−α indicates the «level» of the coverage given by the confidence intervals (3). In other words, there always exists a residual probability α that the sampled data come from a population whose mean lies outside those intervals. Consider that (3) is centered on the mean estimate x̄, with a radius equal to $z_{\alpha/2}\sqrt{\sigma^2/n}$, whose length depends on the desired level of coverage (i.e. on the chosen quantile) and on the precision of the estimator, measured by $\sqrt{\sigma^2/n}$, which is called the standard error of the estimate. We speak about multiple confidence interval(s) because any choice of the quantile determines a different confidence interval. Let's see an example.

19 Confidence Statistics - Example
From an observed image, after reduction, we calculated the absolute magnitudes of the brightest stars present in that sky region. We know that these magnitudes are distributed with a variance of σ² = 16 squared magnitudes. We want to calculate a confidence interval with a confidence level of 95% (~2σ) for the mean of the magnitudes. Let's consider 10 stars with absolute magnitudes: 7.36, 11.91, 12.91, 9.77, 5.99, 10.91, 9.57, 11.01, 6.11, … We start from the sampling mean and its standard error:

$$\bar{m} = \frac{1}{10}\sum_{i=1}^{10} m_i = 9.7658 \qquad \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{16}{10}} = 1.2649$$

Since we fixed a confidence level of 95%, 1−α = 0.95 and consequently α = 0.05. Therefore the desired quantile is $z_{\alpha/2} = z_{0.05/2} = z_{0.025} = 1.96$. The radius of the confidence interval is then given by $z_{\alpha/2}\sqrt{\sigma^2/n} = 1.96 \times 1.2649 = 2.4792$. Therefore the confidence interval is [(9.7658 − 2.4792), (9.7658 + 2.4792)] = [7.2866, 12.2450]. We have 95% confidence that the true value of the mean magnitude of the bright stars in that sky region is between 7.29 and 12.25.

20 Confidence Statistics - Example
What happens if we increase the confidence level to about 3σ (99.7%)? We start from the same sampling mean and standard error:

$$\bar{m} = 9.7658 \qquad \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{16}{10}} = 1.2649$$

Since now the confidence level is 99.7%, 1−α = 0.997 and consequently α = 0.003. Therefore the desired quantile is $z_{\alpha/2} = z_{0.003/2} = z_{0.0015} \approx 2.9677$. The radius of the confidence interval is then given by $z_{\alpha/2}\sqrt{\sigma^2/n} = 2.9677 \times 1.2649 = 3.7539$. Therefore the confidence interval is [(9.7658 − 3.7539), (9.7658 + 3.7539)] = [6.0119, 13.5197]. We have 99.7% confidence that the true value of the mean magnitude of the bright stars in that sky region is between 6.01 and 13.52. In practice, by increasing the confidence level the radius of the confidence interval is increased. This is obvious, since a higher confidence implies enlarging the interval for the true value of the mean estimator.
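Both intervals can be reproduced programmatically; a minimal sketch using scipy (the tenth magnitude is missing from the list above, so the sample mean is taken as quoted in the example rather than recomputed from the data):

```python
import numpy as np
from scipy.stats import norm

m_bar = 9.7658             # sampling mean quoted in the example
se = np.sqrt(16.0 / 10)    # standard error sqrt(sigma^2 / n) = 1.2649

for level in (0.95, 0.997):
    z = norm.ppf(1 - (1 - level) / 2)      # quantile z_{alpha/2}
    print(level, (m_bar - z * se, m_bar + z * se))
# 0.95  -> (7.2866, 12.2450)
# 0.997 -> (6.0119, 13.5197)
```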

21 Confidence Statistics
What changes if the variance is unknown? In the most frequent cases of the real world, a precise estimate of the variance of a population is difficult (if not impossible) to obtain.
Variance unknown (real world)
The formula

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{n}{n-1}\left(\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2\right) \qquad (4)$$

is the sampling variance corrected by the factor n/(n−1), due to the fact that for small samples the sampling variance is a distorted (biased) estimate, whose precision increases with the sample size n. For large samples n/(n−1) → 1 and (4) becomes the standard expression of the variance. In such cases, to obtain a correct confidence interval for the mean µ of the population, we must consider that the random variable

$$\frac{\bar{x} - \mu}{\sqrt{s^2/n}}$$

follows the Student's t distribution with n−1 Degrees of Freedom (DoF), where n is the size of the extracted sample.

22 Student t-distribution
The Student t-distribution describes small samples drawn from a full population that is normally distributed. It is useful to evaluate the difference between two sample means and to assess their statistical significance. It occurs whenever the following random variable is considered:

$$t = \frac{\bar{X} - \mu}{\sqrt{S^2/N}} \qquad (5)$$

with the sample variance $S^2 = \frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2$. This statistic, if the sample is Gaussian, is the ratio between a standard normal N(0,1) and the square root of a Chi-square divided by its n−1 degrees of freedom. The Student's t is symmetric around zero like a Gaussian, but has «heavier tails» than a normal distribution, i.e. values far from 0 have a higher probability of being extracted than in the case of a standard Gaussian distribution. Such differences decrease as the sample size n increases: in the figure (not reproduced here), as the degrees of freedom ν grow, the Student's t approximates the Gaussian N(0,1).

23 Student t-distribution
The construction of a confidence interval for the mean estimator is similar to the previous case, and here too the quantiles play a key role. Taking the quantile of order 1−α/2 of the Student's t distribution with n−1 DoF, denoted $t_{n-1,\alpha/2}$, the confidence interval is derived by the usual chain of inequalities:

$$\left[\bar{x} - t_{n-1,\alpha/2}\sqrt{\frac{s^2}{n}},\; \bar{x} + t_{n-1,\alpha/2}\sqrt{\frac{s^2}{n}}\right]$$

We obtain that the probability that this interval contains the true value of the mean µ of the population is approximately equal to 1−α. Note that the t quantile is larger than the corresponding Gaussian quantile (the t has heavier tails), so for a given sample the radius is typically larger than the one obtained in the known-variance case.
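A sketch of the unknown-variance interval with scipy's Student t, reusing the nine legible magnitudes from the earlier example as a stand-in sample:

```python
import numpy as np
from scipy.stats import t

x = np.array([7.36, 11.91, 12.91, 9.77, 5.99, 10.91, 9.57, 11.01, 6.11])
n, x_bar = len(x), x.mean()
s2 = x.var(ddof=1)   # corrected sample variance, eq. (4)

radius = t.ppf(1 - 0.05 / 2, df=n - 1) * np.sqrt(s2 / n)  # 95% level
print(x_bar - radius, x_bar + radius)
```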

24 Recap of Confidence Statistics
By summarising (under the hypothesis that the population is normally distributed):
Known variance: $\bar{x} \pm z_{\alpha/2}\sqrt{\sigma^2/n}$
Unknown variance: $\bar{x} \pm t_{n-1,\alpha/2}\sqrt{s^2/n}$
In many real situations it is preferable to infer an interval for any parameter estimate, rather than a single value; such an interval should also indicate the error associated with the estimate. A confidence interval for any parameter θ (such as the mean or the variance) of a population is an interval, bounded by two limits $L_{inf}$ and $L_{sup}$, with a defined probability (1−α) of containing the true parameter of the population:

$$P(L_{inf} < \theta < L_{sup}) = 1 - \alpha$$

where 1−α is the confidence level and α the error probability.

25 PDF statistics
As underlined before, p(z) cannot be verified on a single-point basis. A large sample, however, does support p(z) verification. Let's assume we have a population of N observed objects. For a limited number of them we know their real nature. To the others we applied an estimation model which predicted their nature with a residual uncertainty, i.e. it provides a probability p(z). We want to verify the reliability and accuracy of such estimation. The key concept is the confidence interval (CI). We can analyze the over/under-confidence of our model prediction by checking whether x% of the samples have their true value within their x% CI, y% have the true value within their y% CI, etc. This can be done by calculating and plotting the Empirical Cumulative Distribution Function (ECDF) after having obtained the p(z) for our model (known as the posterior probability).

26 Empirical Cumulative Distribution Function
An empirical cumulative distribution function (ECDF) is a non-parametric estimator of the underlying CDF of a random variable. It assigns a probability to each datum, orders the data from smallest to largest in value, and calculates the sum of the assigned probabilities up to and including each datum. The result is a step function that increases at each datum. The ECDF is usually denoted by $F_n(x)$ or $P(X \le x)$ and is defined as

$$F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(x_i \le x)$$

where $I(\cdot)$ is the indicator function:

$$I(x_i \le x) = \begin{cases} 1, & x_i \le x \\ 0, & x_i > x \end{cases}$$

Essentially, to calculate the value of $F_n(x)$ at x, simply (1) count the number of data less than or equal to x, and (2) divide the number found by the total number n of data in the sample.
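A direct numpy implementation of this two-step recipe (a sketch; with tied values, the ECDF at the tie is the last, largest step):

```python
import numpy as np

def ecdf(data):
    """Sorted data and F_n evaluated at each datum: count(x_i <= x) / n."""
    x = np.sort(data)
    return x, np.arange(1, len(x) + 1) / len(x)

x, f = ecdf(np.array([3.1, 1.2, 4.8, 2.5, 2.5]))
print(x)  # [1.2 2.5 2.5 3.1 4.8]
print(f)  # [0.2 0.4 0.6 0.8 1. ]
```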

27 ECDF Empirical Cumulative Distribution Function
The ECDF is useful because:
it approximates the true CDF well if the sample size (the number of data) is large, and knowing the distribution is helpful for statistical inference;
a plot of the ECDF can be visually compared to the known CDFs of frequently used distributions, to check whether the data came from one of those common distributions;
it can visually display how fast the CDF increases to 1; hence, it can be useful to get a feel for the data, for example to check for over- or under-confidence of any prediction (Wittman et al. 2016).
