The blessing of dimensionality for kernel methods
Building classifiers in high dimensional space
Pierre Dupont

Classifiers define decision surfaces in some feature space, where the data is either initially represented or mapped to. Representing or mapping the data in a high dimensional space may ease the separability between the classes (see Cover's theorem), but:
- discrimination is not easier if the mapped points naturally lie on a lower dimensional manifold
- the higher the dimension of the feature space, the more parameters may need to be estimated

Outline
- The curse of dimensionality
- The 3 core ideas of kernel methods, and specifically SVMs
- How to avoid the curse of dimensionality? Some results from Vapnik's Statistical Learning Theory
- Why SVMs are interesting techniques but not the panacea
- Kernels and regularized risk

The curse of dimensionality

If the number of parameters is too large with respect to the number of training samples, there is a risk of over-fitting the training data. Over-fitting implies poor generalization, i.e. a failure to correctly classify new data. Additionally, sensitivity to noise and computational complexity may increase with the dimension of the feature space. This problem is known as the curse of dimensionality. However...
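The last point above can be made concrete with a small count: if the feature space is a polynomial expansion of degree b of d input variables, a linear classifier on it needs one weight per monomial, and the number of monomials of degree at most b is C(d+b, b). A minimal sketch, standard library only (the helper name is ours, not from the slides):

```python
from math import comb

def n_poly_features(d, b):
    """Number of monomials of degree <= b in d variables,
    i.e. the number of parameters of a linear classifier
    on the degree-b polynomial expansion."""
    return comb(d + b, b)

# parameter count explodes with the input dimension d
for d in (2, 10, 100):
    print(d, n_poly_features(d, 2), n_poly_features(d, 5))
```

For d = 2 and b = 2 this gives 6 features (1, x1, x2, x1^2, x1 x2, x2^2); at d = 100 even a quadratic expansion already needs over five thousand parameters.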
The 3 core ideas of kernel methods

1. The so-called kernel trick defines an implicit mapping to a higher dimensional feature space, with two interesting consequences:
   - there is no need to compute anything in the higher dimensional space
   - the number of parameters to be estimated becomes independent of the dimension of the feature space
2. The capacity of the class of discriminant functions considered matters more than the dimension of the space they lie in. Capacity is a measure of the complexity of a class of functions; the best known capacity concept is the Vapnik-Chervonenkis (VC) dimension.
3. Controlling the capacity of linear discriminants can be done by maximizing the margin of the hyperplane with respect to the training samples.

Kernels + capacity control through margin maximization = the blessing of dimensionality.

VC dimension

The Vapnik-Chervonenkis dimension (VC-dim) of a function class F defined over an instance space X is the size of the largest subset of X shattered by F. If arbitrarily large finite sets of X can be shattered by F, then VC(F) = ∞.

We have seen a set of 3 points in IR² which can be shattered by hyperplanes, even though there are sets of 3 points (e.g. collinear ones) which cannot be shattered. No set of 4 points can be shattered by a hyperplane in IR², no matter how the points are placed, so the VC-dim of hyperplanes in IR² is 3. More generally, the VC-dim of hyperplanes in IR^d is d + 1.
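The shattering claim for 3 points in IR² can be checked numerically: for 3 points in general position, the 3×3 linear system w·x_i + b = z_i is invertible, so every one of the 8 labelings is realized exactly by some line. A small sketch assuming numpy is available:

```python
import numpy as np
from itertools import product

# three non-collinear points in IR^2
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
A = np.hstack([X, np.ones((3, 1))])   # rows: (x1, x2, 1)

shattered = True
for labels in product([-1.0, 1.0], repeat=3):
    z = np.array(labels)
    wb = np.linalg.solve(A, z)        # solve w.x_i + b = z_i exactly
    pred = np.sign(A @ wb)
    shattered &= np.array_equal(pred, z)

print("all 8 labelings realized by a line:", shattered)
```

The same construction fails for collinear points (the matrix A becomes singular), matching the remark that some 3-point sets cannot be shattered.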
Shattering

A set of samples S is shattered by a function class F if and only if, for every possible +/− labeling of the samples in S, there exists some function in F which perfectly classifies the samples. For instance, if the function class F is the set of hyperplanes in IR² (= lines) and we consider 3 samples, there are 2³ = 8 possible labelings. For any such labeling there is a hyperplane classifying the samples correctly.

Empirical Risk

Let g(x) be a discriminant for a binary classification problem Ω = {ω₁, ω₂}. The decision function f : X → {−1, 1} is defined as f(x) = sign(g(x)). Let x₁, ..., x_n be a training set of n samples with associated labels z₁, ..., z_n. By definition z_i = 1 if x_i is labeled ω₁, and z_i = −1 otherwise. The zero-one loss function ½|f(x) − z| measures the correctness of the classification of any sample x: the loss is 0 if x is correctly classified, and 1 otherwise. The average training error or empirical risk is defined as

R_{emp}[f] = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{2} |f(x_i) - z_i|
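The empirical risk above translates directly into code; a minimal numpy sketch with a toy discriminant (all names hypothetical):

```python
import numpy as np

def empirical_risk(f, X, z):
    """R_emp[f] = (1/n) * sum_i (1/2)|f(x_i) - z_i|, labels z_i in {-1,+1}."""
    preds = np.array([f(x) for x in X])
    return np.mean(0.5 * np.abs(preds - z))

# toy discriminant g(x) = x1 + x2 - 1, decision f = sign(g)
f = lambda x: np.sign(x[0] + x[1] - 1.0)
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])
z = np.array([-1.0, 1.0, -1.0, 1.0])   # third point deliberately mislabeled
print(empirical_risk(f, X, z))          # one of four wrong -> 0.25
```

Each correctly classified sample contributes 0 and each error contributes 1, so the empirical risk is simply the training error rate.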
Bounding the Risk

The true risk, or probability of misclassification for any test sample drawn from P(x, z), the (unknown) joint distribution of samples and class labels, is defined as

R[f] = \int \frac{1}{2} |f(x) - z| \, dP(x, z)

Over-fitting occurs when a function f minimizing the empirical risk R_emp[f] does not minimize the true risk. Fortunately, we can bound the true risk if we know the VC-dim h of the function class F to which f belongs. In particular, if h < n (the number of training samples), then for all functions f ∈ F, independently of the underlying distribution P, with probability at least 1 − δ:

R[f] \le R_{emp}[f] + \sqrt{ \frac{ h \left( \ln\frac{2n}{h} + 1 \right) + \ln\frac{4}{\delta} }{ n } }

Practical use of the VC-bound?

The above bound is not tight because it derives from a worst-case analysis and must hold for any distribution P(x, z). In practice, the distribution P(x, z) is unknown but not arbitrary. The interest of this bound is not so much its practical use: it motivates fitting the data with the simplest possible class of functions.

Interpretation of the VC-bound

In the bound above, the square-root term is called the capacity or confidence term. The result holds only with probability (at least) 1 − δ because the test data may be particularly difficult. When the training set size n → ∞, the capacity term → 0 and R[f] → R_emp[f]. Considering a function class F with low VC-dim h reduces the capacity term. However, if the function class is too simple (too low a VC-dim), it will be difficult to minimize R_emp[f].

Structural risk minimization

Minimizing both R_emp[f] and the capacity term, by choosing the class of functions suitable for the amount of training data, is the core of structural risk minimization. This can be seen as another formulation of the classical bias-variance trade-off: there is an optimum to be found. There is no curse of dimensionality, but there is a curse of capacity.
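The confidence term of the bound is easy to tabulate. The sketch below assumes the standard Vapnik form sqrt((h(ln(2n/h) + 1) + ln(4/δ)) / n) and shows it shrinking as the training set grows:

```python
from math import log, sqrt

def capacity_term(n, h, delta=0.05):
    """Confidence term of the VC bound, valid for h < n."""
    assert h < n
    return sqrt((h * (log(2 * n / h) + 1) + log(4 / delta)) / n)

# the term decreases with n and increases with the VC-dim h
for n in (100, 1000, 10000):
    print(n, round(capacity_term(n, h=10), 3))
```

Note how loose the bound is for moderate n: with n = 100 and h = 10 the confidence term alone exceeds 0.5, consistent with the remark that its interest is conceptual rather than practical.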
Support Vectors and Maximal Margin Hyperplane

When the data is linearly separable in some appropriate feature space, the separating hyperplane is not unique. The maximal margin hyperplane separates the data with the largest margin. For each separating hyperplane, there is an associated set of support vectors.

[Figure: two separating hyperplanes with their margins; the maximal margin hyperplane and its support vectors.]

Maximizing the margin is a good idea

Recall that the VC-dim of hyperplanes in IR^d is d + 1, so the capacity term increases with the dimension of the space. Fortunately, for hyperplanes with margin ρ it was shown that the VC-dim h is bounded:

h \le \frac{R^2}{\rho^2} + 1

where R is the radius of the smallest hypersphere containing the data. The key advantage of this bound is that it is independent of the dimension d. Maximizing the margin is thus a way to control the curse of capacity while working in very high dimensional spaces. Maximizing the margin is also a way to increase robustness to noise, since perturbations around the training points do not affect the decision boundary much.

Discussion

The above property defines the VC-dim of canonical hyperplanes relative to a dataset (not of all hyperplanes in IR^d). The margin ρ needs to be defined a priori (so this is not strictly equivalent to the SVM optimization problem). A similar and more general result holds for another capacity concept: the fat-shattering dimension. Over-fitting is still possible, depending on the kernel choice (see later...).

Mercer kernels

A kernel k is a symmetric function with k(x, x') = ⟨φ(x), φ(x')⟩ = k(x', x), where φ is a mapping from the original input space X to a feature space Y.

Mercer conditions: a symmetric function k : X × X → IR is a kernel if, for any finite subset {x₁, ..., x_n} of X, the Gram matrix K = [k(x_i, x_j)]_{i,j=1}^{n} is positive semi-definite (has non-negative eigenvalues).

k(x, x') can be thought of as a similarity measure between x and x' which generalizes the simple dot product ⟨x, x'⟩.
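Both Mercer statements can be probed numerically: the Gram matrix of an RBF kernel on arbitrary points should have non-negative eigenvalues, and for the degree-2 polynomial kernel (⟨x, x'⟩ + 1)² the implicit map φ can be written out explicitly and compared against k. A numpy sketch (the explicit φ below is one standard choice of feature map, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

# 1) Mercer check: the RBF Gram matrix is positive semi-definite
sigma = 1.0
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
K = np.exp(-sq / (2 * sigma ** 2))
psd_ok = np.linalg.eigvalsh(K).min() >= -1e-10

# 2) explicit feature map for the degree-2 polynomial kernel (<x,x'> + 1)^2
def phi(x):
    x1, x2, x3 = x
    s = np.sqrt(2.0)
    return np.array([x1 * x1, x2 * x2, x3 * x3,
                     s * x1 * x2, s * x1 * x3, s * x2 * x3,
                     s * x1, s * x2, s * x3, 1.0])

x, y = X[0], X[1]
trick_ok = np.isclose((x @ y + 1.0) ** 2, phi(x) @ phi(y))
print(psd_ok, trick_ok)
```

The second check is the kernel trick in miniature: the 3-dimensional inputs are implicitly mapped to a 10-dimensional space, yet only the dot product in IR³ is ever computed.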
Implicit mapping induced by a kernel

If k satisfies the Mercer conditions, there exists a mapping φ such that k(x, x') = ⟨φ(x), φ(x')⟩. We can directly specify k rather than φ: there is an implicit mapping to a new feature space.

- Linear kernel: k(x, x') = ⟨x, x'⟩ (φ maps x to itself)
- Polynomial kernel: k(x, x') = (⟨x, x'⟩ + c)^b with b ∈ N, c ≥ 0
- Gaussian Radial Basis Function kernel: k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) with σ > 0
- Sigmoid kernel: k(x, x') = \tanh(κ⟨x, x'⟩ + ϑ) with κ > 0 and ϑ < 0

The kernel trick: any learning algorithm that uses the data only via dot products can rely on this implicit mapping, by replacing ⟨x, x'⟩ with k(x, x').

SVMs: pros

- theoretically motivated by Vapnik's statistical learning theory
- estimation is a convex optimization problem (no multiple local minima)
- primal-dual formulation, with the duality gap to measure the distance to the optimum
- sparse solution: only the support vectors matter in the decision function
- state-of-the-art results on many different datasets
- the kernel trick makes it possible to build classifiers for structured data such as strings, trees, graphs, probability distributions, etc.
- relatively few meta-parameters: C (soft margin formulation), σ (RBF kernel), the kernel itself

Hard margin SVMs

The SVM estimation problem (i.e.
finding a maximal margin hyperplane in the feature space) may be formulated (in its dual form) as

\max_\alpha W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{n} \alpha_i \alpha_j z_i z_j k(x_i, x_j)

The number of parameters only depends on the number of training samples n, not on the dimension of the input or the feature space. The decision function is defined as

f(x) = \mathrm{sign}\left( \sum_{i \in SV} \alpha_i z_i k(x_i, x) + w_0 \right)

which only depends on the (so-called support) vectors x_i such that α_i > 0.

SVMs are interesting but not the panacea

Many aspects are not new:
- The kernel trick is nearly a century old (Mercer, 1909), but it was used only much later to build a classifier (Boser, Guyon and Vapnik; COLT 92).
- The Ho and Kashyap algorithm (1965) estimates a hyperplane with a large margin (with a minimum-squared-error criterion and without the kernel trick).
- SVMs with RBF kernels are close to RBF networks (identical decision functions, different estimation procedures: centers chosen by k-means vs. prototypes selected as support vectors).

The computational cost of the training procedure becomes prohibitive for very large datasets (but chunking can help). The kernel choice is critical. This is a practical concern, but also a theoretical issue: the VC-bound applies in the (implicit) feature space!
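The dual above can be solved, for illustration, with plain projected gradient ascent. The sketch below simplifies the problem: the bias w0 is dropped, so the usual equality constraint on the α_i disappears and the only remaining constraint is α_i ≥ 0, handled by clipping. This is a toy solver under those assumptions, not what SVM packages actually do:

```python
import numpy as np

rng = np.random.default_rng(1)
# linearly separable toy data: class +1 around (3,3), class -1 around (-3,-3)
Xp = rng.normal(loc=3.0, size=(20, 2))
Xn = rng.normal(loc=-3.0, size=(20, 2))
X = np.vstack([Xp, Xn])
z = np.array([1.0] * 20 + [-1.0] * 20)

K = X @ X.T                               # linear kernel k(x_i, x_j)
Q = (z[:, None] * z[None, :]) * K         # Q_ij = z_i z_j k(x_i, x_j)

# maximize W(alpha) = sum_i alpha_i - 0.5 alpha' Q alpha  subject to alpha >= 0
alpha = np.zeros(len(z))
eta = 1.0 / np.linalg.eigvalsh(Q).max()   # safe step size for the concave dual
for _ in range(2000):
    alpha = np.maximum(0.0, alpha + eta * (1.0 - Q @ alpha))

sv = alpha > 1e-6                         # support vectors: alpha_i > 0

def f(x):
    """Decision function: depends only on the support vectors."""
    return np.sign(((alpha * z)[sv] * (X[sv] @ x)).sum())

acc = np.mean([f(x) == zi for x, zi in zip(X, z)])
print("support vectors:", int(sv.sum()), "training accuracy:", acc)
```

The run illustrates both claims from the slide: the optimization variables number n regardless of any feature-space dimension, and most α_i end up at zero, so the decision function is carried by a few support vectors.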
Over-fitting induced by the kernel

Consider an RBF kernel k(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) with σ > 0. As σ → 0, the Gram matrix K = [k(x_i, x_j)]_{i,j=1}^{n} tends to the identity matrix. In other words, training points are only considered (very) similar to themselves: fitting the training set is easy, but generalization is likely to be poor.

The ideal kernel is such that any pair of points (x, x') is considered similar if and only if they should be associated to the same class label z. The design of this kernel would require knowledge of P(x, z) to minimize the true risk R[f].

Regularized risk

True risk minimization can be approximated by minimizing a regularized risk

R_{reg}[f] = R_{emp}[f] + \lambda \Omega[f]

where Ω[f] penalizes the lack of smoothness of the function f and λ is a regularization constant. Maximizing the margin of classification by a hyperplane in feature space is equivalent to minimizing Ω[f] = ‖w‖². This setting corresponds to soft-margin SVMs, with R_emp[f] approximated by a function of the slack variables ξ_i:

\min_{w, \xi} \; \underbrace{\|w\|^2}_{\text{margin maximization}} + \underbrace{\frac{C}{n} \sum_{i=1}^{n} \xi_i}_{\text{margin error}}

[Figure: soft-margin hyperplane with slack variables ξ_i, ξ_j.]

Kernel choice is a regularization choice

Representer theorem (see [Schölkopf and Smola, 2002], chap. 4): let H denote the feature space associated to a kernel k and {x₁, ..., x_n} be a labeled data set. Each minimizer f ∈ H of the regularized risk R_emp[f] + λΩ[f] admits a representation of the form

f(x) = \sum_{i=1}^{n} \alpha_i k(x_i, x)

In other words, the kernel choice is a regularization choice. The RBF kernel can be shown to penalize derivatives of all orders, and thus to enforce more or less smoothness depending on σ > 0.

Take home message

- The curse of capacity matters more than the curse of dimensionality.
- Maximizing the margin is a good idea, both to control the capacity of the function class considered and to build classifiers robust to noise.
- There is no free lunch in the kernel choice, but each kernel corresponds to a regularization operator.
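The degeneration of the RBF Gram matrix as σ → 0 is easy to observe: the diagonal is always 1, while every off-diagonal entry exp(−‖x_i − x_j‖² / (2σ²)) collapses to 0, so K approaches the identity and every training point is similar only to itself. A numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(10, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances

offs = {}
for sigma in (2.0, 0.5, 0.05):
    K = np.exp(-sq / (2 * sigma ** 2))
    offs[sigma] = np.abs(K - np.eye(10)).max()        # largest off-diagonal entry
    print(sigma, round(offs[sigma], 6))
```

In the memorization regime (small σ) the kernel matrix carries almost no information about the geometry of the training set, which is exactly why the fit is perfect and the generalization poor.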
References

[Boser et al., 1992] Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144-152, Pittsburgh, PA, USA.
[Cristianini and Shawe-Taylor, 2000] Cristianini, N. and Shawe-Taylor, J. (2000). Support Vector Machines and other kernel-based learning methods. Cambridge University Press.
[Ho and Kashyap, 1965] Ho, Y.-C. and Kashyap, R. (1965). An algorithm for linear inequalities and its applications. IEEE Transactions on Electronic Computers, EC-14:683-688.
[Mercer, 1909] Mercer, J. (1909). Functions of positive and negative type and their connection to the theory of integral equations. Philosophical Transactions of The Royal Society London, A 209:415-446.
[Schölkopf and Smola, 2002] Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA.
[Shawe-Taylor and Cristianini, 2004] Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
[Vapnik, 2000] Vapnik, V. (2000). The Nature of Statistical Learning Theory. Springer, 2nd edition.
More informationSmoothing, penalized least squares and splines
Smthing, penalized least squares and splines Duglas Nychka, www.image.ucar.edu/~nychka Lcally weighted averages Penalized least squares smthers Prperties f smthers Splines and Reprducing Kernels The interplatin
More informationStats Classification Ji Zhu, Michigan Statistics 1. Classification. Ji Zhu 445C West Hall
Stats 415 - Classificatin Ji Zhu, Michigan Statistics 1 Classificatin Ji Zhu 445C West Hall 734-936-2577 jizhu@umich.edu Stats 415 - Classificatin Ji Zhu, Michigan Statistics 2 Examples f Classificatin
More informationHypothesis Tests for One Population Mean
Hypthesis Tests fr One Ppulatin Mean Chapter 9 Ala Abdelbaki Objective Objective: T estimate the value f ne ppulatin mean Inferential statistics using statistics in rder t estimate parameters We will be
More informationCambridge Assessment International Education Cambridge Ordinary Level. Published
Cambridge Assessment Internatinal Educatin Cambridge Ordinary Level ADDITIONAL MATHEMATICS 4037/1 Paper 1 Octber/Nvember 017 MARK SCHEME Maximum Mark: 80 Published This mark scheme is published as an aid
More informationModelling of Clock Behaviour. Don Percival. Applied Physics Laboratory University of Washington Seattle, Washington, USA
Mdelling f Clck Behaviur Dn Percival Applied Physics Labratry University f Washingtn Seattle, Washingtn, USA verheads and paper fr talk available at http://faculty.washingtn.edu/dbp/talks.html 1 Overview
More informationx x
Mdeling the Dynamics f Life: Calculus and Prbability fr Life Scientists Frederick R. Adler cfrederick R. Adler, Department f Mathematics and Department f Bilgy, University f Utah, Salt Lake City, Utah
More informationFive Whys How To Do It Better
Five Whys Definitin. As explained in the previus article, we define rt cause as simply the uncvering f hw the current prblem came int being. Fr a simple causal chain, it is the entire chain. Fr a cmplex
More informationLim f (x) e. Find the largest possible domain and its discontinuity points. Why is it discontinuous at those points (if any)?
THESE ARE SAMPLE QUESTIONS FOR EACH OF THE STUDENT LEARNING OUTCOMES (SLO) SET FOR THIS COURSE. SLO 1: Understand and use the cncept f the limit f a functin i. Use prperties f limits and ther techniques,
More informationCompressibility Effects
Definitin f Cmpressibility All real substances are cmpressible t sme greater r lesser extent; that is, when yu squeeze r press n them, their density will change The amunt by which a substance can be cmpressed
More informationand the Doppler frequency rate f R , can be related to the coefficients of this polynomial. The relationships are:
Algrithm fr Estimating R and R - (David Sandwell, SIO, August 4, 2006) Azimith cmpressin invlves the alignment f successive eches t be fcused n a pint target Let s be the slw time alng the satellite track
More informationChapter Summary. Mathematical Induction Strong Induction Recursive Definitions Structural Induction Recursive Algorithms
Chapter 5 1 Chapter Summary Mathematical Inductin Strng Inductin Recursive Definitins Structural Inductin Recursive Algrithms Sectin 5.1 3 Sectin Summary Mathematical Inductin Examples f Prf by Mathematical
More informationIntroduction: A Generalized approach for computing the trajectories associated with the Newtonian N Body Problem
A Generalized apprach fr cmputing the trajectries assciated with the Newtnian N Bdy Prblem AbuBar Mehmd, Syed Umer Abbas Shah and Ghulam Shabbir Faculty f Engineering Sciences, GIK Institute f Engineering
More informationOptimization Programming Problems For Control And Management Of Bacterial Disease With Two Stage Growth/Spread Among Plants
Internatinal Jurnal f Engineering Science Inventin ISSN (Online): 9 67, ISSN (Print): 9 676 www.ijesi.rg Vlume 5 Issue 8 ugust 06 PP.0-07 Optimizatin Prgramming Prblems Fr Cntrl nd Management Of Bacterial
More informationYou need to be able to define the following terms and answer basic questions about them:
CS440/ECE448 Sectin Q Fall 2017 Midterm Review Yu need t be able t define the fllwing terms and answer basic questins abut them: Intr t AI, agents and envirnments Pssible definitins f AI, prs and cns f
More informationChecking the resolved resonance region in EXFOR database
Checking the reslved resnance regin in EXFOR database Gttfried Bertn Sciété de Calcul Mathématique (SCM) Oscar Cabells OECD/NEA Data Bank JEFF Meetings - Sessin JEFF Experiments Nvember 0-4, 017 Bulgne-Billancurt,
More informationData Mining: Concepts and Techniques. Classification and Prediction. Chapter February 8, 2007 CSE-4412: Data Mining 1
Data Mining: Cncepts and Techniques Classificatin and Predictin Chapter 6.4-6 February 8, 2007 CSE-4412: Data Mining 1 Chapter 6 Classificatin and Predictin 1. What is classificatin? What is predictin?
More informationPhysics 2B Chapter 23 Notes - Faraday s Law & Inductors Spring 2018
Michael Faraday lived in the Lndn area frm 1791 t 1867. He was 29 years ld when Hand Oersted, in 1820, accidentally discvered that electric current creates magnetic field. Thrugh empirical bservatin and
More informationFIELD QUALITY IN ACCELERATOR MAGNETS
FIELD QUALITY IN ACCELERATOR MAGNETS S. Russenschuck CERN, 1211 Geneva 23, Switzerland Abstract The field quality in the supercnducting magnets is expressed in terms f the cefficients f the Furier series
More informationFloating Point Method for Solving Transportation. Problems with Additional Constraints
Internatinal Mathematical Frum, Vl. 6, 20, n. 40, 983-992 Flating Pint Methd fr Slving Transprtatin Prblems with Additinal Cnstraints P. Pandian and D. Anuradha Department f Mathematics, Schl f Advanced
More informationNUMBERS, MATHEMATICS AND EQUATIONS
AUSTRALIAN CURRICULUM PHYSICS GETTING STARTED WITH PHYSICS NUMBERS, MATHEMATICS AND EQUATIONS An integral part t the understanding f ur physical wrld is the use f mathematical mdels which can be used t
More informationMaterials Engineering 272-C Fall 2001, Lecture 7 & 8 Fundamentals of Diffusion
Materials Engineering 272-C Fall 2001, Lecture 7 & 8 Fundamentals f Diffusin Diffusin: Transprt in a slid, liquid, r gas driven by a cncentratin gradient (r, in the case f mass transprt, a chemical ptential
More informationthe results to larger systems due to prop'erties of the projection algorithm. First, the number of hidden nodes must
M.E. Aggune, M.J. Dambrg, M.A. El-Sharkawi, R.J. Marks II and L.E. Atlas, "Dynamic and static security assessment f pwer systems using artificial neural netwrks", Prceedings f the NSF Wrkshp n Applicatins
More informationCS 109 Lecture 23 May 18th, 2016
CS 109 Lecture 23 May 18th, 2016 New Datasets Heart Ancestry Netflix Our Path Parameter Estimatin Machine Learning: Frmally Many different frms f Machine Learning We fcus n the prblem f predictin Want
More information