Random Variables and Densities

Random Variables and Densities. Review: Probability and Statistics. Sam Roweis. October 2, 2002.

Random variables: X represents outcomes or states of the world. Instantiations of variables are usually written in lower case: we will write p(x) to mean probability(X = x). Sample space: the space of all possible outcomes/states (may be discrete, continuous, or mixed). Probability mass (density) function p(x) >= 0: it assigns a non-negative number to each point in sample space and sums (integrates) to unity: Σ_x p(x) = 1 or ∫ p(x) dx = 1. Intuitively: how often does x occur, how much do we believe in x. Ensemble: random variable + sample space + probability function.

Probability. We use probabilities p(x) to represent our beliefs B(x) about the states x of the world. There is a formal calculus for manipulating uncertainties represented by probabilities. Any consistent set of beliefs obeying the Cox Axioms can be mapped into probabilities. 1. Rationally ordered degrees of belief: if B(x) > B(y) and B(y) > B(z) then B(x) > B(z). 2. Belief in x and its negation are related: B(x) = f[B(not x)]. 3. Belief in a conjunction depends only on conditionals: B(x and y) = g[B(x), B(y|x)] = g[B(y), B(x|y)].

Expectations, Moments. The expectation of a function a(x) is written E[a] or ⟨a⟩: E[a] = ⟨a⟩ = Σ_x p(x) a(x), e.g. mean = Σ_x x p(x), variance = Σ_x (x - E[x])^2 p(x). Moments are expectations of higher order powers. (The mean is the first moment; the autocorrelation is the second moment.) Centralized moments have lower moments subtracted away (e.g. variance, skew, kurtosis). Deep fact: knowledge of all orders of moments completely defines the entire distribution.
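To make the expectation formula concrete, here is a minimal numpy sketch that computes the mean and variance of a small discrete distribution; the support and probabilities are made-up illustrative values.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])      # sample space (values x can take)
p = np.array([0.1, 0.4, 0.3, 0.2])      # p(x): non-negative, sums to 1
assert np.isclose(p.sum(), 1.0)

def expect(a, x, p):
    """E[a(x)] = sum_x p(x) a(x)."""
    return np.sum(p * a(x))

mean = expect(lambda v: v, x, p)                     # first moment
variance = expect(lambda v: (v - mean) ** 2, x, p)   # second centralized moment
print(mean, variance)
```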

Joint Probability. Key concept: two or more random variables may interact. Thus, the probability of one taking on a certain value depends on which value(s) the others are taking. We call this a joint ensemble and write p(x, y) = prob(X = x and Y = y).

Conditional Probability. If we know that some event has occurred, it changes our belief about the probability of other events. This is like taking a "slice" through the joint table: p(x|y) = p(x, y)/p(y).

Marginal Probabilities. We can "sum out" part of a joint distribution to get the marginal distribution of a subset of variables: p(x) = Σ_y p(x, y). This is like adding slices of the table together.

Bayes' Rule. Manipulating the basic definition of conditional probability gives one of the most important formulas in probability theory: p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / Σ_x' p(y|x') p(x'). This gives us a way of "reversing" conditional probabilities. Thus, all joint probabilities can be factored by selecting an ordering for the random variables and using the "chain rule": p(x, y, z, ...) = p(x) p(y|x) p(z|x, y) p(...|x, y, z). Another equivalent definition: p(x, y) = p(x|y) p(y).
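As a small illustration of marginalization, conditioning and Bayes' rule, here is a sketch that manipulates a 2-by-3 joint probability table in numpy; the table entries are arbitrary toy numbers.

```python
import numpy as np

p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])      # rows index x, columns index y
assert np.isclose(p_xy.sum(), 1.0)

p_x = p_xy.sum(axis=1)                     # marginal: p(x) = sum_y p(x, y)
p_y = p_xy.sum(axis=0)                     # marginal: p(y) = sum_x p(x, y)

p_x_given_y = p_xy / p_y                   # conditional slices p(x | y), one per column
# Bayes' rule reverses the conditional: p(y | x) = p(x | y) p(y) / p(x)
p_y_given_x = (p_x_given_y * p_y) / p_x[:, None]
assert np.allclose(p_y_given_x, p_xy / p_x[:, None])
print(p_x, p_y, p_y_given_x)
```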

Independence & Conditional Independence. Two variables are independent iff their joint factors: p(x, y) = p(x) p(y). Two variables are conditionally independent given a third one if for all values of the conditioning variable, the resulting slice factors: p(x, y|z) = p(x|z) p(y|z) for all z.

Be Careful! Watch the context: e.g. Simpson's paradox. Define random variables and sample spaces carefully: e.g. the Prisoner's paradox.

Entropy. Measures the amount of ambiguity or uncertainty in a distribution: H(p) = -Σ_x p(x) log p(x). It is the expected value of -log p(x) (a function which depends on p(x)!). H(p) > 0 unless only one outcome is possible, in which case H(p) = 0. The maximal value is attained when p is uniform. It tells you the expected cost if each event costs -log p(event).

Cross Entropy (KL Divergence). An asymmetric measure of the distance between two distributions: KL[p||q] = Σ_x p(x)[log p(x) - log q(x)]. KL > 0 unless p = q, in which case KL = 0. It tells you the extra cost if events were generated by p(x) but instead of charging under p(x) you charged under q(x).
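A rough numpy sketch of the entropy and KL formulas above, evaluated on two arbitrary toy distributions (the specific numbers are only for illustration):

```python
import numpy as np

def entropy(p):
    """H(p) = -sum_x p(x) log p(x); terms with p(x) = 0 contribute nothing."""
    p = np.asarray(p)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl(p, q):
    """KL[p || q] = sum_x p(x) (log p(x) - log q(x)); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p), np.asarray(q)
    nz = p > 0
    return np.sum(p[nz] * (np.log(p[nz]) - np.log(q[nz])))

p = np.array([0.5, 0.25, 0.25])
q = np.array([1/3, 1/3, 1/3])
print(entropy(p), entropy(q))   # the uniform q attains the maximal entropy log(3)
print(kl(p, q), kl(p, p))       # KL >= 0, and KL[p||p] = 0
```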

Statistics. Probability: inferring probabilistic quantities for data given fixed models (e.g. probabilities of events, marginals, conditionals, etc.). Statistics: inferring a model given fixed data observations (e.g. clustering, classification, regression). There are many approaches to statistics: frequentist, Bayesian, decision theory, ...

(Conditional) Probability Tables. For discrete (categorical) quantities, the most basic parametrization is the probability table, which lists p(x_i = k'th value). Since PTs must be non-negative and sum to 1, for k-ary variables there are k - 1 free parameters. If a discrete variable is conditioned on the values of some other discrete variables we make one table for each possible setting of the parents: these are called conditional probability tables or CPTs.

Some (Conditional) Probability Functions. Probability density functions p(x) (for continuous variables) or probability mass functions p(x = k) (for discrete variables) tell us how likely it is to get a particular value for a random variable (possibly conditioned on the values of some other variables). We can consider various types of variables: binary/discrete (categorical), continuous, interval, and integer counts. For each type we'll see some basic probability models which are parametrized families of distributions.

Exponential Family. For a (continuous or discrete) random variable x, p(x|η) = h(x) exp{η^T T(x) - A(η)} = (1/Z(η)) h(x) exp{η^T T(x)} is an exponential family distribution with natural parameter η. The function T(x) is a sufficient statistic. The function A(η) = log Z(η) is the log normalizer. Key idea: all you need to know about the data is captured in the summarizing function T(x).
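For the (conditional) probability tables, a minimal sketch: estimate a CPT from made-up counts, with one row p(y|x) per setting of a discrete parent x.

```python
import numpy as np

# Counts of a 3-ary child y for each of 2 settings of a discrete parent x (toy data).
counts = np.array([[12.0, 3.0, 5.0],
                   [ 2.0, 8.0, 10.0]])

cpt = counts / counts.sum(axis=1, keepdims=True)   # row i is the table p(y | x = i)
# Each row is a probability table: for a k-ary child there are k - 1 free parameters.
assert np.allclose(cpt.sum(axis=1), 1.0)
print(cpt)
```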

Bernoulli. For a binary random variable with p(heads) = π: p(x|π) = π^x (1 - π)^(1-x) = exp{ x log(π/(1 - π)) + log(1 - π) }. Exponential family with: η = log(π/(1 - π)), T(x) = x, A(η) = -log(1 - π) = log(1 + e^η), h(x) = 1. The logistic function relates the natural parameter and the chance of heads: π = 1/(1 + e^(-η)).

Multinomial. For a set of integer counts x_i on k trials: p(x|π) = (k!/(x_1! x_2! ... x_n!)) π_1^(x_1) π_2^(x_2) ... π_n^(x_n) = h(x) exp{ Σ_i x_i log π_i }. But the parameters are constrained: Σ_i π_i = 1, so we define the last one as π_n = 1 - Σ_{i=1}^{n-1} π_i, giving p(x|π) = h(x) exp{ Σ_{i=1}^{n-1} x_i log(π_i/π_n) + k log π_n }. Exponential family with: η_i = log π_i - log π_n, T(x_i) = x_i, A(η) = -k log π_n = k log Σ_i e^(η_i), h(x) = k!/(x_1! x_2! ... x_n!). The softmax function relates the basic and natural parameters: π_i = e^(η_i) / Σ_j e^(η_j).

Poisson. For an integer count variable with rate λ: p(x|λ) = λ^x e^(-λ)/x! = (1/x!) exp{x log λ - λ}. Exponential family with: η = log λ, T(x) = x, A(η) = λ = e^η, h(x) = 1/x!. E.g. the number of photons x that arrive at a pixel during a fixed interval given mean intensity λ. Other count densities: binomial, exponential.
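A small sketch of the logistic and softmax maps that relate natural and mean parameters for the Bernoulli and multinomial; the numeric values of π are arbitrary examples.

```python
import numpy as np

def logistic(eta):
    """pi = 1 / (1 + exp(-eta)): chance of heads from the natural parameter."""
    return 1.0 / (1.0 + np.exp(-eta))

def softmax(eta):
    """pi_i = exp(eta_i) / sum_j exp(eta_j), with a max-shift for numerical stability."""
    e = np.exp(eta - np.max(eta))
    return e / e.sum()

pi = 0.8
eta = np.log(pi / (1 - pi))           # natural parameter of the Bernoulli
print(logistic(eta))                   # recovers pi = 0.8

eta_vec = np.log([0.2, 0.5, 0.3])      # log-probabilities as (unnormalized) natural params
print(softmax(eta_vec))                # recovers [0.2, 0.5, 0.3]
```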

Gaussian (normal). For a continuous univariate random variable: p(x|µ, σ^2) = (1/(√(2π) σ)) exp{ -(x - µ)^2/(2σ^2) } = (1/(√(2π) σ)) exp{ µx/σ^2 - x^2/(2σ^2) - µ^2/(2σ^2) - log σ }. Exponential family with: η = [µ/σ^2 ; -1/(2σ^2)], T(x) = [x ; x^2], A(η) = log σ + µ^2/(2σ^2), h(x) = 1/√(2π). Note: a univariate Gaussian is a two-parameter distribution with a two-component vector of sufficient statistics.

Multivariate Gaussian Distribution. For a continuous vector random variable: p(x|µ, Σ) = |2πΣ|^(-1/2) exp{ -(1/2)(x - µ)^T Σ^(-1) (x - µ) }. Exponential family with: η = [Σ^(-1)µ ; -(1/2)Σ^(-1)], T(x) = [x ; x x^T], A(η) = (1/2) log|Σ| + (1/2) µ^T Σ^(-1) µ, h(x) = (2π)^(-n/2). Sufficient statistics: mean vector and correlation matrix. Other densities: Student-t, Laplacian. For non-negative values use the exponential, Gamma, or log-normal.

Important Gaussian Facts. All marginals of a Gaussian are again Gaussian. Any conditional of a Gaussian is again Gaussian.

Gaussian Marginals/Conditionals. Finding these parameters is mostly linear algebra. Let z = [x ; y] be normally distributed according to z ~ N([a ; b], [A C ; C^T B]), where C is the (non-symmetric) cross-covariance matrix between x and y, which has as many rows as the size of x and as many columns as the size of y. The marginal distributions are x ~ N(a, A) and y ~ N(b, B), and the conditional distributions are x|y ~ N(a + C B^(-1)(y - b), A - C B^(-1) C^T) and y|x ~ N(b + C^T A^(-1)(x - a), B - C^T A^(-1) C).
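The conditioning formulas above reduce to simple arithmetic in the scalar case; here is a toy sketch (the values of a, b, A, B, C are arbitrary, and only the x-given-y direction is shown).

```python
import numpy as np

a, b = 1.0, -2.0           # means of x and y
A, B, C = 2.0, 1.5, 0.8    # var(x), var(y), cov(x, y)

y_obs = 0.0
cond_mean = a + C / B * (y_obs - b)    # E[x | y] = a + C B^-1 (y - b)
cond_var  = A - C / B * C              # var[x | y] = A - C B^-1 C^T
print(cond_mean, cond_var)

# The marginal of x needs no computation at all: it is simply N(a, A).
```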

Moments. For continuous variables, moment calculations are important. We can easily compute moments of an exponential family distribution by taking derivatives of the log normalizer A(η): the q'th derivative gives the q'th centred moment, with dA(η)/dη = mean and d^2 A(η)/dη^2 = variance. When the sufficient statistic is a vector, partial derivatives need to be considered.

Generalized Linear Models (GLMs). In a generalized linear model, p(y|x) is exponential family with conditional mean µ = f(θ^T x). The function f is called the response function. If we choose f to be the inverse of the mapping between the conditional mean and the natural parameters, then it is called the canonical response function: η = ψ(µ), f(·) = ψ^(-1)(·).

Parameterizing Conditionals. When the variable(s) being conditioned on (the parents) are discrete, we just have one density for each possible setting of the parents, e.g. a table of natural parameters in exponential models or a table of tables for discrete models. When the conditioned variable is continuous, its value sets some of the parameters for the other variables. A very common instance of this for regression is the "linear-Gaussian": p(y|x) = gauss(θ^T x; Σ). For discrete children and continuous parents, we often use a Bernoulli/multinomial whose parameters are some function f(θ^T x).

Potential Functions. We can be even more general and define distributions by arbitrary "energy" functions proportional to the log probability: p(x) ∝ exp{ -Σ_k H_k(x) }. A common choice is to use pairwise terms in the energy: H(x) = Σ_i a_i x_i + Σ_{pairs ij} w_ij x_i x_j.
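To illustrate that derivatives of the log normalizer give moments, here is a quick numeric check using the Poisson family, where A(η) = e^η and both the mean and the variance equal λ; the step size and rate are arbitrary choices.

```python
import numpy as np

A = np.exp                      # Poisson log normalizer A(eta) = exp(eta)
eta = np.log(3.0)               # rate lambda = 3
h = 1e-4                        # finite-difference step

mean = (A(eta + h) - A(eta - h)) / (2 * h)             # dA/deta   ~= lambda
var  = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h**2   # d2A/deta2 ~= lambda
print(mean, var)                # both approximately 3, the Poisson mean and variance
```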

Special Variables. If certain variables are always observed we may not want to model their density, for example the inputs in regression or classification. This leads to conditional density estimation. If certain variables are always unobserved, they are called hidden or latent variables. They can always be marginalized out, but they can make the density modeling of the observed variables easier. (We'll see more on this later.)

Likelihood Function. So far we have focused on the (log) probability function p(x|θ), which assigns a probability (density) to any joint configuration of variables x given fixed parameters θ. But in learning we turn this on its head: we have some fixed data and we want to find parameters. Think of p(x|θ) as a function of θ for fixed x: L(θ; x) = p(x|θ), l(θ; x) = log p(x|θ). This function is called the (log) likelihood. Choose θ to maximize some cost function c(θ) which includes l(θ): c(θ) = l(θ; D) for maximum likelihood (ML), or c(θ) = l(θ; D) + r(θ) for maximum a posteriori (MAP)/penalized ML. (There are also cross-validation, Bayesian estimators, BIC, AIC, ...)

Multiple Observations, Complete Data, IID Sampling. A single observation of the data X is rarely useful on its own. Generally we have data including many observations, which creates a set of random variables: D = {x_1, x_2, ..., x_M}. Two very common assumptions: 1. Observations are independently and identically distributed according to the joint distribution of the graphical model: IID samples. 2. We observe all random variables in the domain on each observation: complete data.

Maximum Likelihood. For IID data: p(D|θ) = Π_m p(x_m|θ), so l(θ; D) = Σ_m log p(x_m|θ). The idea of maximum likelihood estimation (MLE) is to pick the setting of parameters most likely to have generated the data we saw: θ_ML = argmax_θ l(θ; D). It is very commonly used in statistics and often leads to intuitive, appealing, or natural estimators.
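A minimal sketch of the iid log likelihood l(θ; D) = Σ_m log p(x_m|θ) for a univariate Gaussian model; the data points and candidate parameter values are fabricated for illustration.

```python
import numpy as np

def log_lik(mu, sigma2, data):
    """Sum over data points of the Gaussian log density with parameters (mu, sigma2)."""
    return np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                  - (data - mu) ** 2 / (2 * sigma2))

data = np.array([1.18, -0.25, 0.78, 0.45])
print(log_lik(0.0, 1.0, data), log_lik(0.5, 1.0, data))
# Maximum likelihood picks the parameter setting with the largest value of this function.
```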

Example: Bernoulli Trials. We observe M iid coin flips: D = H, H, T, H, ... Model: p(H) = θ, p(T) = 1 - θ. Likelihood: l(θ; D) = log p(D|θ) = log Π_m θ^(x_m) (1 - θ)^(1 - x_m) = log θ Σ_m x_m + log(1 - θ) Σ_m (1 - x_m) = N_H log θ + N_T log(1 - θ). Take derivatives and set to zero: ∂l/∂θ = N_H/θ - N_T/(1 - θ) = 0, so θ_ML = N_H/(N_H + N_T).

Example: Multinomial. We observe M iid die rolls (K-sided): D = 3, 1, K, 2, ... Model: p(k) = θ_k with Σ_k θ_k = 1. Likelihood (using binary indicators [x = k]): l(θ; D) = log p(D|θ) = log Π_m θ_1^([x_m = 1]) ... θ_K^([x_m = K]) = Σ_m Σ_k [x_m = k] log θ_k = Σ_k N_k log θ_k. Take derivatives and set to zero (enforcing Σ_k θ_k = 1): ∂l/∂θ_k = N_k/θ_k - M, so θ_k = N_k/M.

Example: Univariate Normal. We observe M iid real samples: D = 1.18, -.25, .78, ... Model: p(x) = (2πσ^2)^(-1/2) exp{ -(x - µ)^2/(2σ^2) }. Likelihood (using the probability density): l(θ; D) = log p(D|θ) = -(M/2) log(2πσ^2) - (1/2) Σ_m (x_m - µ)^2/σ^2. Take derivatives and set to zero: ∂l/∂µ = (1/σ^2) Σ_m (x_m - µ), ∂l/∂σ^2 = -M/(2σ^2) + (1/(2σ^4)) Σ_m (x_m - µ)^2, so µ_ML = (1/M) Σ_m x_m and σ^2_ML = (1/M) Σ_m x_m^2 - µ_ML^2.
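The three closed-form estimators above, computed on tiny made-up datasets (coin flips, die rolls, and real-valued samples):

```python
import numpy as np

# Bernoulli: theta_ML = N_H / (N_H + N_T), i.e. the fraction of heads.
flips = np.array([1, 1, 0, 1, 0, 1])             # 1 = heads, 0 = tails
theta_ml = flips.mean()

# Multinomial: theta_k = N_k / M.
rolls = np.array([3, 1, 6, 2, 3, 3, 5, 1])       # rolls of a K = 6 sided die
counts = np.bincount(rolls, minlength=7)[1:]     # N_1, ..., N_6
pi_ml = counts / counts.sum()

# Univariate normal: mu_ML = sample mean, sigma2_ML = mean of squares minus mu_ML^2.
data = np.array([1.18, -0.25, 0.78, 0.45])
mu_ml = data.mean()
sigma2_ml = (data ** 2).mean() - mu_ml ** 2

print(theta_ml, pi_ml, mu_ml, sigma2_ml)
```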

Example: Linear Regression. In linear regression, some inputs (covariates, parents) x and all outputs (responses, children) y are continuous-valued variables. For each child and setting of discrete parents we use the model: p(y|x, θ) = gauss(θ^T x; σ^2). The likelihood is the familiar "squared error" cost: l(θ; D) = -(1/(2σ^2)) Σ_m (y_m - θ^T x_m)^2. The ML parameters can be solved for using linear least-squares: ∂l/∂θ = -Σ_m (y_m - θ^T x_m) x_m, so θ_ML = (X^T X)^(-1) X^T Y.

Sufficient Statistics. A statistic is a function of a random variable. T(X) is a "sufficient statistic" for X if T(x_1) = T(x_2) implies L(θ; x_1) = L(θ; x_2) for all θ. Equivalently (by the Neyman factorization theorem) we can write: p(x|θ) = h(x, T(x)) g(T(x), θ). Example: exponential family models, p(x|θ) = h(x) exp{η^T T(x) - A(η)}.

Sufficient Statistics are Sums. In the examples above, the sufficient statistics were merely sums (counts) of the data: Bernoulli: number of heads and tails; Multinomial: number of each type; Gaussian: mean and mean-square; Regression: correlations. As we will see, this is true for all exponential family models: sufficient statistics are average natural parameters. Only exponential family models have simple sufficient statistics.
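A sketch of the least-squares solution θ_ML = (X^T X)^(-1) X^T Y on synthetic data; the design matrix, true parameters, and noise level are all assumptions for the example (for ill-conditioned problems one would prefer a solver such as lstsq).

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])   # bias column + one covariate
true_theta = np.array([2.0, -1.5])                         # assumed ground truth
Y = X @ true_theta + 0.1 * rng.normal(size=50)             # noisy responses

theta_ml = np.linalg.solve(X.T @ X, X.T @ Y)               # normal equations
print(theta_ml)                                             # close to [2.0, -1.5]
```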

MLE for Exponential Family Models. Recall the probability function for exponential family models: p(x|θ) = h(x) exp{η^T T(x) - A(η)}. For iid data, the sufficient statistic is Σ_m T(x_m): l(η; D) = log p(D|η) = (Σ_m log h(x_m)) - M A(η) + η^T (Σ_m T(x_m)). Take derivatives and set to zero: ∂l/∂η = Σ_m T(x_m) - M ∂A(η)/∂η = 0, so ∂A(η)/∂η = (1/M) Σ_m T(x_m) at η = η_ML, recalling that the natural moments of an exponential family distribution are the derivatives of the log normalizer.

Fundamental Operations with Distributions. Generate data: draw samples from the distribution. This often involves generating a uniformly distributed variable in the range [0,1] and transforming it. For more complex distributions it may involve an iterative procedure that takes a long time to produce a single sample (e.g. Gibbs sampling, MCMC). Compute log probabilities: when all variables are either observed or marginalized, the result is a single number which is the log prob of the configuration. Inference: compute expectations of some variables given others which are observed or marginalized. Learning: set the parameters of the density functions given some (partially) observed data to maximize likelihood or penalized likelihood.

Basic Statistical Problems. Let's remind ourselves of the basic problems we discussed on the first day: density estimation, clustering, classification and regression. Density estimation is hardest. If we can do joint density estimation then we can always condition to get what we want: Regression: p(y|x) = p(x, y)/p(x). Classification: p(c|x) = p(c, x)/p(x). Clustering: p(c|x) = p(c, x)/p(x), with c unobserved.

Learning. In AI the bottleneck is often knowledge acquisition. Human experts are rare, expensive, unreliable, and slow. But we have lots of data. We want to build systems automatically based on data and a small amount of prior information (from experts).
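A tiny illustration of exponential-family MLE by moment matching, using the Poisson case where ∂A/∂η = e^η, so η_ML is the log of the average count; the count data are invented.

```python
import numpy as np

data = np.array([2, 0, 3, 1, 4, 2, 2])        # iid counts
avg_T = data.mean()                           # (1/M) sum_m T(x_m), with T(x) = x
eta_ml = np.log(avg_T)                        # solves exp(eta) = avg_T
lambda_ml = np.exp(eta_ml)                    # back to the rate parameter
print(eta_ml, lambda_ml)                      # lambda_ML is just the sample mean
```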

Known Models. Many systems we build will be essentially probability models. Assume the prior information we have specifies the type and structure of the model, as well as the form of the (conditional) distributions or potentials. In this case learning = setting parameters. It is also possible to do "structure learning" to learn the model itself.

Jensen's Inequality. For any concave function f(x) and any distribution on x, E[f(x)] <= f(E[x]); e.g. log(x) and √x are concave. This allows us to bound expressions like log p(x) = log Σ_z p(x, z).
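A quick numeric check of Jensen's inequality for the concave log, E[log x] <= log E[x], using samples from an arbitrary positive distribution (a Gamma, chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.0, size=100000)   # positive samples
print(np.mean(np.log(x)), np.log(np.mean(x)))      # the left-hand side is smaller
```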
