Soft and hard models. Jiří Militky Computer assisted statistical modeling in the


1 Soft and hard models. Jiří Militky. Computer assisted statistical modeling in the applied research

2 Monsters Giants
Moore's law: processing capacity doubles every 18 months (CPU, cache, memory). Even more aggressive: disk storage capacity doubles every 9 months. What do the two laws combined produce? A rapidly growing gap between our ability to generate data and our ability to make use of it.
Example of extreme model dependence for out-of-sample predictions.
[Figure: disk TB shipped per year (Disk Trend, Jim Porter), with an ExaByte mark; disk TB growth 112%/y versus Moore's law 58.7%/y.]

3 Basic terms
Models represent the relationships between variables: $y = X b$, where y is (n x 1), X is (n x m) and b is (m x 1).
Independent variable: x-value, predictor, input, explanatory.
Dependent variable: y-value, predictand, output, response.

4 Models
High bias, low variance. Nonlinear regression. Low bias, high variance: overfitting, i.e. modeling the random component.

5 Style of analysis
Data exploration. Simplification of data structures. Interactive model selection. Interpretation of results.
Depth contours: multidimensional analog of the median.

6 Dimensionality problem I
A basic characteristic of multivariate data is their dimension (number of elements). High dimensions bring about huge problems in statistical analysis.
Variable reduction: some variables often have variability at the noise level and can therefore be excluded from the data (they bring no information). There are also redundancies due to near-linear dependencies between some variables or due to linkages arising from their physical essence. In both cases it is possible to replace the original set by a reduced number of uncorrelated new variables.
[Figure: groups labeled setosa, versicolor, virginica.]

7 Dimensionality problem II
Multivariate curse: the number of data points necessary to achieve a given precision of multivariate estimates is an exponential function of the number of variables.
Empty space phenomenon: multivariate data are concentrated in the peripheral part of the variable space.
Distance problem: the distance between objects is often weighted by the strength of the mutual links between variables.

8 Distance
Euclidean distance: $d_i^2 = (x_i - x_A)^T (x_i - x_A)$
Mahalanobis distance: $d_i^2 = (x_i - x_A)^T S^{-1} (x_i - x_A)$
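The difference between the two distances can be illustrated with a minimal sketch. It assumes that S is the covariance matrix of the variables and that x_A is a reference point such as the mean; the two-variable data, the function names and the numbers below are hypothetical and not taken from the slides.

```python
import numpy as np

def euclidean_sq(x, center):
    """Squared Euclidean distance d^2 = (x - c)^T (x - c)."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    return float(diff @ diff)

def mahalanobis_sq(x, center, cov):
    """Squared Mahalanobis distance d^2 = (x - c)^T S^{-1} (x - c)."""
    diff = np.asarray(x, dtype=float) - np.asarray(center, dtype=float)
    # Solve S z = diff instead of forming the explicit inverse of S.
    return float(diff @ np.linalg.solve(cov, diff))

# Hypothetical example: two strongly correlated variables.
center = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])
along = np.array([1.0, 1.0])    # point lying along the correlation direction
across = np.array([1.0, -1.0])  # point lying against the correlation direction

d_e_along = euclidean_sq(along, center)
d_e_across = euclidean_sq(across, center)
d_m_along = mahalanobis_sq(along, center, cov)
d_m_across = mahalanobis_sq(across, center, cov)
print(d_e_along, d_e_across, d_m_along, d_m_across)
```

Both points are equally far from the center in the Euclidean sense, while the Mahalanobis distance treats the point that contradicts the correlation structure as much more distant.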

9 Curse of dimensionality
Consider n points scattered at random in a K-dimensional unit sphere. Let D be the distance from the centre of the sphere to the closest point.
[Table: median of the distribution of D for several values of n and K.]
Smoothing doesn't work in high dimensions: points are too far apart. Solution: pick the estimate of f(x) from a class of functions that is flexible enough to match f(x) reasonably closely.
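The formula for the median did not survive in the transcript. The sketch below uses the standard result for n points distributed uniformly in a K-dimensional unit ball, median(D) = (1 - (1/2)^(1/n))^(1/K), and checks it by simulation. Uniform scattering and the function names are assumptions made here for illustration.

```python
import numpy as np

def median_closest_distance(n, K):
    """Median distance from the centre to the closest of n uniform points in a K-dim unit ball."""
    return (1.0 - 0.5 ** (1.0 / n)) ** (1.0 / K)

def simulated_median(n, K, trials=4000, seed=0):
    """Monte Carlo check: the radius of a uniform point in the unit ball is U**(1/K)."""
    rng = np.random.default_rng(seed)
    radii = rng.random((trials, n)) ** (1.0 / K)
    # Only the radius matters for the distance to the centre.
    return float(np.median(radii.min(axis=1)))

# The closest point moves towards the boundary as the dimension K grows.
for K in (1, 2, 10, 20):
    print(K, round(median_closest_distance(100, K), 3))
```

Even with hundreds of points, the nearest neighbour of the centre lies more than halfway to the boundary once K reaches about ten, which is why local smoothing breaks down.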

10 Curse of dimensionality
Multivariate normal distribution $X \sim MVN_p(0, I)$. Gaussian kernel density estimation, with the bandwidth chosen to minimize the MSE at the mean. Suppose we want $E[(\hat{p}(x) - p(x))^2] / p(x)^2 < 0.1$ at $x = 0$.

11 Data Projection: PCA and PP
PCA: usually the first two principal components (PC) are used for a 2D projection. The information in the last two PCs can be interesting as well. These projections preserve angles and distances between objects (points). On the other hand, there is no objective criterion here for revealing hidden structures in the data.
PP: linear projections of multivariate data (projection pursuit) satisfy a criterion called the projection index $IP(C_i)$. The projection vectors $C_i$ maximizing $IP(C_i)$ under the constraints $C_i^T C_i = 1$ are computed. With $f_P(x)$ the pdf of the data in the projection, $IP(C) = \int f_P^2(x)\,dx$. The projection onto these vectors is then $C_i^T X$.
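A minimal sketch of the PCA part only: the data are centered and projected onto unit-length vectors C_i obtained from a singular value decomposition. Projection pursuit would use the same projection step but choose the vectors by maximizing a projection index instead of the variance. The synthetic data and the function name are hypothetical.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project centered data onto the first principal directions."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by variance.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    C = Vt[:n_components].T              # columns satisfy C_i^T C_i = 1
    scores = Xc @ C                      # projected data
    explained = s**2 / np.sum(s**2)      # share of total variance per component
    return scores, C, explained[:n_components]

# Hypothetical data: 200 objects, 4 correlated variables.
rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.5, 0.0, 0.0],
                   [0.0, 1.0, 0.3, 0.0],
                   [0.0, 0.0, 0.5, 0.1],
                   [0.0, 0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 4)) @ mixing
scores, C, explained = pca_project(X, n_components=2)
print(scores.shape, explained)
```

The last components can be inspected the same way by taking the final rows of Vt instead of the first ones.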

12 Hard and soft models
According to the actual type of task, an approach to building the model f(x, β) is chosen. For the so-called hard models, the main aim is to select an adequate function f(x, β). This function is typically in explicit form and is used instead of the original. The so-called soft models are in fact used for approximation of an unknown function given by a table of values {x_i, y_i}, i = 1, ..., n. The function f(x, β) is here often replaced by a linear combination of some elementary functions.

13 Soft models (overfitting)
Smoothing (low-dimensional problems): loess, splines.
Regression models in high dimensions: PPR, additive, interaction, neural nets, trees, MARS.

14 Linear models and extensions
$y = f(x_1, \ldots, x_p) + \text{error}$
$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - y_{\mathrm{fit},i})^2}{\sigma_i^2}$

15 Various models - similar results

16 Kernels
Data define a stepwise function. New (averaged) points are obtained by convolving a kernel with the data.
Kernel properties: centered around zero, symmetric, has finite support, area under the curve equals 1.
Gaussian: $K_N(x) = c \exp(-x^2/2)$
Epanechnikov: $K_E(x) = c\,(1 - x^2)$ for $|x| \le 1$, 0 otherwise
Tent: $K_T(x) = c\,(1 - |x|)$ for $|x| \le 1$, 0 otherwise
Box: $K_U(x) = c$ for $|x| \le 1$, 0 otherwise
Convolution: the new value at x is $P(x) = \frac{1}{n} \sum_{i=1}^{n} K(x - x_i)\,P_i$, where $P_i$ are the data.
1. Slide the kernel over all points.
2. Watch for overlap (area of overlap) at the beginning and end.
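Kernel smoothing of this kind can be sketched as a weighted average of the data. The version below is an assumption-laden illustration rather than the slide's exact formula: it uses the Epanechnikov kernel with c = 3/4, introduces a hypothetical bandwidth parameter, and divides by the sum of the weights (a Nadaraya-Watson style normalization), which is one simple way to handle the partial overlap at the beginning and end of the data.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2) on |u| <= 1, zero otherwise."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def kernel_smooth(x_new, x, y, bandwidth):
    """Kernel-weighted average of y at each location in x_new."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Scaled distances between every new location and every data point.
    u = (x_new[:, None] - x[None, :]) / bandwidth
    w = epanechnikov(u)
    # Normalizing by the weight sum compensates for truncated kernels at the ends.
    return (w @ y) / w.sum(axis=1)

# Hypothetical noisy data.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.2, x.size)
y_smooth = kernel_smooth(x, x, y, bandwidth=0.1)
print(y_smooth[:5])
```

A larger bandwidth averages over more neighbours and gives a smoother, more biased curve; a smaller one follows the noise more closely.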

17 Flexible linear models
We need a flexible method for constructing a function y = f(x) that can track local curvature. We pick a system of K basis functions f_k(x) and call this the basis for f(x). We express f(x) as a weighted sum of these basis functions:
f(x) = a_1 f_1(x) + a_2 f_2(x) + ... + a_K f_K(x)
The coefficients a_1, ..., a_K determine the shape of the function.
Powers: 1, x, x^2, and so on. They are the basis functions for polynomials. These are not very flexible and are used only for simple problems.
Fourier series: 1, sin(ωx), cos(ωx), sin(2ωx), cos(2ωx), and so on for a fixed known frequency ω. These are used for periodic functions.
B-spline functions: these have now more or less replaced polynomials for non-periodic problems.
[Figure: five Fourier basis functions φ_k(t) versus t.]
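Because the model is linear in the coefficients a_1, ..., a_K, they can be estimated by ordinary least squares once the basis functions are evaluated at the data. The sketch below does this for five Fourier basis functions; the test signal, the noise level and the function name are hypothetical.

```python
import numpy as np

def fourier_basis(x, omega, K):
    """First K Fourier basis functions 1, sin(wx), cos(wx), sin(2wx), cos(2wx), ..."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x)]
    k = 1
    while len(cols) < K:
        cols.append(np.sin(k * omega * x))
        if len(cols) < K:
            cols.append(np.cos(k * omega * x))
        k += 1
    return np.column_stack(cols)

# Hypothetical periodic signal with known coefficients plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
omega = 2.0 * np.pi
y = (1.0 + 2.0 * np.sin(omega * x) - 0.5 * np.cos(2.0 * omega * x)
     + rng.normal(0.0, 0.1, x.size))

Phi = fourier_basis(x, omega, K=5)            # design matrix, one column per basis function
a, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # least-squares coefficients a_1..a_K
y_fit = Phi @ a
print(np.round(a, 2))
```

Swapping `fourier_basis` for a power or B-spline basis changes only the design matrix; the least-squares step stays the same.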

18 Continuity: C^0, C^1, C^2

19 A measure of roughness
What do we mean by smooth? A function that is smooth has limited curvature. Curvature depends on the second derivative. A straight line is completely smooth. We can measure the roughness of a function y(x) by integrating its squared second derivative. The second derivative in this notation is $D^2 f(x)$:
$PEN_2(f) = \int [D^2 f(x)]^2\,dx$
When we want acceleration to be smooth, we measure roughness at the level of acceleration:
$PEN_4(f) = \int [D^4 f(x)]^2\,dx$
Penalized least squares:
$PLS(f) = \sum_i (y_i - f(x_i))^2 + \lambda\,PEN(f)$
The parameter λ controls roughness. When λ = 0, only fitting the data matters. As λ increases, more emphasis is placed on penalizing roughness. As λ → ∞, only roughness matters, and functions having zero roughness are used.
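The role of λ can be seen in a discrete analogue of penalized least squares. The sketch below is not the spline method itself: it assumes equally spaced data and replaces the integral of the squared second derivative by the sum of squared second differences of the fitted values, which gives a linear system with a closed-form solution. The data and the function name are hypothetical.

```python
import numpy as np

def penalized_smooth(y, lam):
    """Minimize sum (y_i - f_i)^2 + lam * sum (second differences of f)^2."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # (n-2) x n matrix whose rows compute f[i] - 2 f[i+1] + f[i+2].
    D2 = np.diff(np.eye(n), n=2, axis=0)
    # Normal equations of the penalized criterion: (I + lam * D2^T D2) f = y.
    return np.linalg.solve(np.eye(n) + lam * D2.T @ D2, y)

def roughness(f):
    """Discrete roughness: sum of squared second differences."""
    return float(np.sum(np.diff(f, n=2) ** 2))

# Hypothetical noisy data on an equally spaced grid.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
y = np.sin(2.0 * np.pi * t) + rng.normal(0.0, 0.3, t.size)

for lam in (0.0, 1.0, 100.0, 1e8):
    print(lam, roughness(penalized_smooth(y, lam)))
```

With λ = 0 the data are reproduced exactly, and for very large λ the solution approaches a straight line, the only shape with zero second differences.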

20 Smoothing by cubic splines
Basis functions are composed of cubic polynomial segments, and they belong to the class $C^2[a, b]$. Generally, a function from the class $C^m[a, b]$ is continuous on the interval [a, b] in its functional values and in its first m derivatives. Finding the best smoothing function f(x) leads to the minimization of the modified sum of squares
$\sum_{i=1}^{n} (y_i - f(x_i))^2 + \alpha \int (f''(x))^2\,dx$
The first term measures the degree of fit to the data, the integral measures smoothness, and α controls the tradeoff between smoothness and closeness to the data.
[Figure: true response, noisy response and estimated response.]

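21 Runge data example
The 11 points in the interval [-1, 1] were generated from the corrupted Runge function $y = \frac{1}{1 + 25x^2} + N(0, c)$.
[Figure: four panels with data (*), model (...) and spline (---) for c = .1 (alpha = .62), c = .5, c = .25 and c = .75; the remaining alpha values are missing.]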

22 Regression splines
Truncated polynomials:
$S_m(x) = \sum_{j=0}^{m} a_j x^j + \sum_{j=1}^{n} b_j (x - \xi_j)_+^m$, where $(x)_+ = x$ for $x > 0$ and $(x)_+ = 0$ otherwise.
The corresponding model is linear in the parameters a and b and contains (n + m + 1) parameters in total. When the number and positions of the knots are also estimated, the corresponding model is nonlinear.
B-splines:
$f(x) = \sum_j \beta_j h_j(x)$, with B-spline basis functions $h_j(x)$.
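With fixed knots, a truncated-polynomial regression spline is an ordinary linear least-squares problem. The sketch below builds the basis 1, x, ..., x^m, (x - ξ_1)_+^m, ..., (x - ξ_n)_+^m and fits it to noise-free values of the Runge function. The knot positions, the number of data points and the function name are hypothetical choices for illustration.

```python
import numpy as np

def truncated_power_basis(x, knots, m=3):
    """Design matrix with columns 1, x, ..., x^m and (x - xi_j)_+^m for each knot."""
    x = np.asarray(x, dtype=float)
    powers = [x**j for j in range(m + 1)]
    truncated = [np.maximum(x - xi, 0.0) ** m for xi in knots]
    return np.column_stack(powers + truncated)

# Hypothetical setup: Runge function on [-1, 1] with 9 fixed interior knots.
x = np.linspace(-1.0, 1.0, 101)
y = 1.0 / (1.0 + 25.0 * x**2)
knots = np.linspace(-0.8, 0.8, 9)

B = truncated_power_basis(x, knots, m=3)       # n + m + 1 = 13 columns
coef, *_ = np.linalg.lstsq(B, y, rcond=None)
rms_spline = float(np.sqrt(np.mean((B @ coef - y) ** 2)))

# Plain cubic polynomial (no knots) for comparison: the first m + 1 columns only.
P = B[:, :4]
coef_poly, *_ = np.linalg.lstsq(P, y, rcond=None)
rms_poly = float(np.sqrt(np.mean((P @ coef_poly - y) ** 2)))
print(B.shape, rms_spline, rms_poly)
```

The truncated power basis is easy to write down but can become ill-conditioned with many knots, which is one practical reason for preferring the equivalent B-spline basis.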

23 B splines

24 Universal 95% CI confidence bands
Bootstrap: randomly draw datasets with replacement from the training data. Each dataset has the same size as the original training set.
Parametric bootstrap: simulate new data by adding noise to the predicted values.
Bagging: bootstrap aggregating.
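A minimal sketch of the nonparametric bootstrap for a confidence band: resample the (x, y) pairs with replacement, refit the model each time, and take pointwise percentiles of the refitted curves. The percentile construction, the cubic polynomial model, the data and all names are assumptions made for illustration; any other fitting routine could be passed in the same way.

```python
import numpy as np

def bootstrap_band(x, y, fit_predict, x_grid, n_boot=500, level=0.95, seed=0):
    """Pointwise percentile band from refits on resampled (x, y) pairs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    preds = np.empty((n_boot, len(x_grid)))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # n indices drawn with replacement
        preds[b] = fit_predict(x[idx], y[idx], x_grid)
    alpha = 1.0 - level
    lower = np.quantile(preds, alpha / 2.0, axis=0)
    upper = np.quantile(preds, 1.0 - alpha / 2.0, axis=0)
    return lower, upper

def cubic_fit_predict(x_train, y_train, x_eval):
    """Hypothetical model: least-squares cubic polynomial."""
    coefs = np.polyfit(x_train, y_train, deg=3)
    return np.polyval(coefs, x_eval)

# Hypothetical data.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 60)
y = x**3 - x + rng.normal(0.0, 0.1, x.size)
x_grid = np.linspace(-1.0, 1.0, 41)

lower, upper = bootstrap_band(x, y, cubic_fit_predict, x_grid)
full_fit = cubic_fit_predict(x, y, x_grid)
print(np.round(upper - lower, 3)[:5])
```

Averaging the rows of `preds` instead of taking quantiles would give the bagged prediction.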

25 Monotonicity
Any strictly monotonic function y(x) must satisfy a simple linear differential equation:
$D^2 y(x) = w(x)\,D^1 y(x)$
Because of strict monotonicity, the first derivative $D^1 y(x)$ will never be 0, and the function w(x) is therefore simply $D^2 y(x) / D^1 y(x)$. Any strictly monotonic function y(x) must be expressible in the form
$y(x) = \beta_0 + \beta_1 \int_{x_0}^{x} \exp\left( \int_{x_0}^{u} w(v)\,dv \right) du$
The unconstrained function w(v) could be a B-spline.
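The construction can be checked numerically: whatever unconstrained function w is chosen, the exponential of its integral is positive, so the outer integral is strictly increasing. The sketch below evaluates both integrals with the trapezoidal rule on a grid that starts at the lower integration limit; the particular w and the function names are hypothetical.

```python
import numpy as np

def cumulative_trapezoid(values, grid):
    """Cumulative integral of values over grid (trapezoidal rule), starting at 0."""
    steps = 0.5 * (values[1:] + values[:-1]) * np.diff(grid)
    return np.concatenate(([0.0], np.cumsum(steps)))

def monotone_from_w(w, grid, beta0=0.0, beta1=1.0):
    """y(x) = beta0 + beta1 * int exp(int w(v) dv) du, both integrals from grid[0]."""
    inner = cumulative_trapezoid(w(grid), grid)        # integral of w up to u
    outer = cumulative_trapezoid(np.exp(inner), grid)  # integral of exp(...) up to x
    return beta0 + beta1 * outer

grid = np.linspace(0.0, 3.0, 601)
# Hypothetical unconstrained w: an oscillating function that changes sign.
y = monotone_from_w(lambda v: 3.0 * np.sin(4.0 * v), grid)
# With w = 0 the construction reduces to the straight line y = x.
y_line = monotone_from_w(lambda v: np.zeros_like(v), grid)
print(y[:3], y_line[:3])
```

A negative β_1 gives a strictly decreasing function by the same argument.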

26 Neural networks
Brain: number of neurons ~10^10; connections per neuron ~10^4 to 10^5; neuron switching time ~0.001 second; scene recognition time ~0.1 second.
Perceptron: $\sigma(x) = \frac{1}{1 + e^{-x}}$

27 Statistics vs. Neural Networks
Statistics term: neural network term
Model: network
Estimation: learning
Regression: supervised learning
Observations: training set
Independent variables: inputs
Dependent variables: outputs
Parameters: synaptic weights
$f(x) = \sum_{j=1}^{m} w_j h_j(x)$, where $h_j(x)$ are basis functions (hidden units) and $w_j$ are weights.
Logistic basis (artificial neural networks): $h(x) = \frac{1}{1 + \exp(b^T x - b_0)}$

28 Regression Neural Networks
Neural networks are very useful when there is no idea of the functional relationship between the dependent and independent variables. If there is an idea, it is better to use a regression model. Neural networks are not based on an assumed functional relationship between the independent variables (predictors) and the dependent variable; the data alone define the functional form.

29 Classical Neural Networks Nodes (neurons) connected in layers

30 Varying parameters I

31 Varying parameters II

32 Radial basis functions (RBF)
The response decreases or increases monotonically with distance from a central point.
Gaussian RBF: $h(x) = \exp\left( -\frac{(x - c)^2}{r^2} \right)$, where c is the center and r is the radius.
[Figures: radial basis transfer function; weighted sum of radial basis transfer functions (output a versus input p).]

33 RBF Network
Single-layer NN. Each of the n components of the input vector feeds forward to m basis functions whose outputs are linearly combined with weights w_j.
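For fixed centers and radius, the weights of such a network enter linearly and can be found by least squares. The sketch below assumes a one-dimensional input, Gaussian basis functions exp(-(x - c)^2 / r^2) with a common radius, equally spaced centers and noise-free Runge-function data; all of these choices and the function names are hypothetical, and this is not the NETLAB procedure mentioned in the slides.

```python
import numpy as np

def gaussian_rbf_design(x, centers, radius):
    """Matrix of Gaussian RBF outputs h_j(x_i) = exp(-(x_i - c_j)^2 / r^2)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    centers = np.asarray(centers, dtype=float)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / radius**2)

def fit_rbf_network(x, y, centers, radius):
    """Least-squares weights w_j of the linear output layer."""
    H = gaussian_rbf_design(x, centers, radius)
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return w

def predict_rbf_network(x_new, centers, radius, w):
    """Network output: weighted sum of the basis function outputs."""
    return gaussian_rbf_design(x_new, centers, radius) @ w

# Hypothetical training data on [-1, 1].
x = np.linspace(-1.0, 1.0, 101)
y = 1.0 / (1.0 + 25.0 * x**2)
centers = np.linspace(-1.0, 1.0, 11)   # m = 11 hidden units
radius = 0.3

w = fit_rbf_network(x, y, centers, radius)
rms = float(np.sqrt(np.mean((predict_rbf_network(x, centers, radius, w) - y) ** 2)))
far_away = float(predict_rbf_network(5.0, centers, radius, w)[0])
print(rms, far_away)
```

Far from all centers every Gaussian output is essentially zero, so the prediction collapses to zero regardless of the data, which illustrates why such networks extrapolate poorly.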

34 Prediction vs. prognosis
[Figure: target and output versus input, showing good prediction and bad prognosis. MATLAB fragments on the slide: P = -1:.1:1; T = [ ]; p = -3:.1:]

35 Regression tree RBF (m = 3)
[Figure: RBF tree fit to data; y versus predicted y.]
c: 8.375, 7, w: , , r: , 6,

36 Runge data example
Results of optimized RBF neural network regression (NETLAB) for the Runge model with noise level c =
[Figure: four panels of target versus input for different numbers of hidden nodes, including 25 hidden nodes.]

37 Neural networks drawbacks
Hard to interpret the individual effects of each predictor variable on the response.
Poor extrapolation properties.
The programs are full of settings that must be entered, and a small error in them also causes errors in the predictions.
No shape preservation; it is not possible to add information about limiting behavior.
Regression performs better when theory or experience indicates an underlying relationship.
The connection weights usually do not have obvious interpretations.
ANNs do not produce an explicit model, even though new cases can be fed into them and new results obtained.

38 Gains from neural networks
Can reduce preliminary analysis in modeling: discovery of interactions and nonlinear relationships becomes automatic.
Increased predictive power of models (tunable smoothness).
Since they are data dependent, performance will improve as sample size increases.
Flexibility and ease of maintenance: ANNs are very flexible in adapting their behavior to new and changing environments. They are also easier to maintain, with some having the ability to learn from experience to improve their own performance.

39 Data locations
Maximal spread of explanatory variables.
Small variability often leads to non-significance of a variable.
Nearly uniform data location (against DOE).

40 Real data
Nature, experiment, data, facts, hypothesis, analysis.
Various error sources and noise due to experiments, parasite variables, false correlations, complexity of effects.
Non-additivity (interaction of effects): $f(x + y) \ne f(x) + f(y)$
Non-linearity: $f(ax + b) \ne a f(x) + b$
[Figure labels: A = age, W = weight, M = manoeuvrability.]

41 Thank you!!!


Soft Sensor Modelling based on Just-in-Time Learning and Bagging-PLS for Fermentation Processes 1435 A publication of CHEMICAL ENGINEERING TRANSACTIONS VOL. 70, 2018 Guest Editors: Timothy G. Walmsley, Petar S. Varbanov, Rongin Su, Jiří J. Klemeš Copyright 2018, AIDIC Servizi S.r.l. ISBN 978-88-95608-67-9;

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Pattern Recognition Feature Extraction Hamid R. Rabiee Jafar Muhammadi, Alireza Ghasemi, Payam Siyari Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Dimensionality Reduction

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 254 Part V

More information

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann

Neural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable

More information

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions

Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks. Cannot approximate (learn) non-linear functions BACK-PROPAGATION NETWORKS Serious limitations of (single-layer) perceptrons: Cannot learn non-linearly separable tasks Cannot approximate (learn) non-linear functions Difficult (if not impossible) to design

More information

CSC242: Intro to AI. Lecture 21

CSC242: Intro to AI. Lecture 21 CSC242: Intro to AI Lecture 21 Administrivia Project 4 (homeworks 18 & 19) due Mon Apr 16 11:59PM Posters Apr 24 and 26 You need an idea! You need to present it nicely on 2-wide by 4-high landscape pages

More information

PATTERN CLASSIFICATION

PATTERN CLASSIFICATION PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS

More information

Variance Reduction and Ensemble Methods

Variance Reduction and Ensemble Methods Variance Reduction and Ensemble Methods Nicholas Ruozzi University of Texas at Dallas Based on the slides of Vibhav Gogate and David Sontag Last Time PAC learning Bias/variance tradeoff small hypothesis

More information

PDEEC Machine Learning 2016/17

PDEEC Machine Learning 2016/17 PDEEC Machine Learning 2016/17 Lecture - Model assessment, selection and Ensemble Jaime S. Cardoso jaime.cardoso@inesctec.pt INESC TEC and Faculdade Engenharia, Universidade do Porto Nov. 07, 2017 1 /

More information

10-701/ Machine Learning, Fall

10-701/ Machine Learning, Fall 0-70/5-78 Machine Learning, Fall 2003 Homework 2 Solution If you have questions, please contact Jiayong Zhang .. (Error Function) The sum-of-squares error is the most common training

More information

Artificial Neural Networks Francesco DI MAIO, Ph.D., Politecnico di Milano Department of Energy - Nuclear Division IEEE - Italian Reliability Chapter

Artificial Neural Networks Francesco DI MAIO, Ph.D., Politecnico di Milano Department of Energy - Nuclear Division IEEE - Italian Reliability Chapter Artificial Neural Networks Francesco DI MAIO, Ph.D., Politecnico di Milano Department of Energy - Nuclear Division IEEE - Italian Reliability Chapter (Chair) STF - China Fellow francesco.dimaio@polimi.it

More information

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression. Lijun Zhang Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

More information

Chemometrics: Classification of spectra

Chemometrics: Classification of spectra Chemometrics: Classification of spectra Vladimir Bochko Jarmo Alander University of Vaasa November 1, 2010 Vladimir Bochko Chemometrics: Classification 1/36 Contents Terminology Introduction Big picture

More information

Learning from Data: Regression

Learning from Data: Regression November 3, 2005 http://www.anc.ed.ac.uk/ amos/lfd/ Classification or Regression? Classification: want to learn a discrete target variable. Regression: want to learn a continuous target variable. Linear

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Thomas G. Dietterich tgd@eecs.oregonstate.edu 1 Outline What is Machine Learning? Introduction to Supervised Learning: Linear Methods Overfitting, Regularization, and the

More information

Linear Discriminant Functions

Linear Discriminant Functions Linear Discriminant Functions Linear discriminant functions and decision surfaces Definition It is a function that is a linear combination of the components of g() = t + 0 () here is the eight vector and

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

NONLINEAR CLASSIFICATION AND REGRESSION. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition NONLINEAR CLASSIFICATION AND REGRESSION Nonlinear Classification and Regression: Outline 2 Multi-Layer Perceptrons The Back-Propagation Learning Algorithm Generalized Linear Models Radial Basis Function

More information

Pattern Recognition and Machine Learning

Pattern Recognition and Machine Learning Christopher M. Bishop Pattern Recognition and Machine Learning ÖSpri inger Contents Preface Mathematical notation Contents vii xi xiii 1 Introduction 1 1.1 Example: Polynomial Curve Fitting 4 1.2 Probability

More information

MRC: The Maximum Rejection Classifier for Pattern Detection. With Michael Elad, Renato Keshet

MRC: The Maximum Rejection Classifier for Pattern Detection. With Michael Elad, Renato Keshet MRC: The Maimum Rejection Classifier for Pattern Detection With Michael Elad, Renato Keshet 1 The Problem Pattern Detection: Given a pattern that is subjected to a particular type of variation, detect

More information

Learning from Examples

Learning from Examples Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble

More information

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION

SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION SUPERVISED LEARNING: INTRODUCTION TO CLASSIFICATION 1 Outline Basic terminology Features Training and validation Model selection Error and loss measures Statistical comparison Evaluation measures 2 Terminology

More information

Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2

Nearest Neighbor. Machine Learning CSE546 Kevin Jamieson University of Washington. October 26, Kevin Jamieson 2 Nearest Neighbor Machine Learning CSE546 Kevin Jamieson University of Washington October 26, 2017 2017 Kevin Jamieson 2 Some data, Bayes Classifier Training data: True label: +1 True label: -1 Optimal

More information

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others)

Machine Learning. Neural Networks. (slides from Domingos, Pardo, others) Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward

More information

CS:4420 Artificial Intelligence

CS:4420 Artificial Intelligence CS:4420 Artificial Intelligence Spring 2018 Neural Networks Cesare Tinelli The University of Iowa Copyright 2004 18, Cesare Tinelli and Stuart Russell a a These notes were originally developed by Stuart

More information

Decision-Oriented Environmental Mapping with Radial Basis Function Neural Networks

Decision-Oriented Environmental Mapping with Radial Basis Function Neural Networks Decision-Oriented Environmental Mapping with Radial Basis Function Neural Networks V. Demyanov (1), N. Gilardi (2), M. Kanevski (1,2), M. Maignan (3), V. Polishchuk (1) (1) Institute of Nuclear Safety

More information

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights

Linear, threshold units. Linear Discriminant Functions and Support Vector Machines. Biometrics CSE 190 Lecture 11. X i : inputs W i : weights Linear Discriminant Functions and Support Vector Machines Linear, threshold units CSE19, Winter 11 Biometrics CSE 19 Lecture 11 1 X i : inputs W i : weights θ : threshold 3 4 5 1 6 7 Courtesy of University

More information

Computational statistics

Computational statistics Computational statistics Lecture 3: Neural networks Thierry Denœux 5 March, 2016 Neural networks A class of learning methods that was developed separately in different fields statistics and artificial

More information

Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes

Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes Part I Week 7 Based in part on slides from textbook, slides of Susan Holmes Support Vector Machine, Random Forests, Boosting December 2, 2012 1 / 1 2 / 1 Neural networks Artificial Neural networks: Networks

More information

VBM683 Machine Learning

VBM683 Machine Learning VBM683 Machine Learning Pinar Duygulu Slides are adapted from Dhruv Batra Bias is the algorithm's tendency to consistently learn the wrong thing by not taking into account all the information in the data

More information

Unit III. A Survey of Neural Network Model

Unit III. A Survey of Neural Network Model Unit III A Survey of Neural Network Model 1 Single Layer Perceptron Perceptron the first adaptive network architecture was invented by Frank Rosenblatt in 1957. It can be used for the classification of

More information

M.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011

M.S. Project Report. Efficient Failure Rate Prediction for SRAM Cells via Gibbs Sampling. Yamei Feng 12/15/2011 .S. Project Report Efficient Failure Rate Prediction for SRA Cells via Gibbs Sampling Yamei Feng /5/ Committee embers: Prof. Xin Li Prof. Ken ai Table of Contents CHAPTER INTRODUCTION...3 CHAPTER BACKGROUND...5

More information

Learning with multiple models. Boosting.

Learning with multiple models. Boosting. CS 2750 Machine Learning Lecture 21 Learning with multiple models. Boosting. Milos Hauskrecht milos@cs.pitt.edu 5329 Sennott Square Learning with multiple models: Approach 2 Approach 2: use multiple models

More information

Neural Networks Introduction

Neural Networks Introduction Neural Networks Introduction H.A Talebi Farzaneh Abdollahi Department of Electrical Engineering Amirkabir University of Technology Winter 2011 H. A. Talebi, Farzaneh Abdollahi Neural Networks 1/22 Biological

More information

ECE 5424: Introduction to Machine Learning

ECE 5424: Introduction to Machine Learning ECE 5424: Introduction to Machine Learning Topics: Ensemble Methods: Bagging, Boosting PAC Learning Readings: Murphy 16.4;; Hastie 16 Stefan Lee Virginia Tech Fighting the bias-variance tradeoff Simple

More information

5.6 Nonparametric Logistic Regression

5.6 Nonparametric Logistic Regression 5.6 onparametric Logistic Regression Dmitri Dranishnikov University of Florida Statistical Learning onparametric Logistic Regression onparametric? Doesnt mean that there are no parameters. Just means that

More information

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.

Machine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang. Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning

More information

day month year documentname/initials 1

day month year documentname/initials 1 ECE471-571 Pattern Recognition Lecture 13 Decision Tree Hairong Qi, Gonzalez Family Professor Electrical Engineering and Computer Science University of Tennessee, Knoxville http://www.eecs.utk.edu/faculty/qi

More information