Random Variables and Densities
Random Variables and Densities
Review: Probability and Statistics
Sam Roweis
October 2, 2002

Random Variables

Random variables X represent outcomes or states of the world. Instantiations of variables are usually written in lower case: x. We will write p(x) to mean probability(X = x).
Sample Space: the space of all possible outcomes/states. (May be discrete, continuous, or mixed.)
Probability mass (density) function p(x) >= 0: assigns a non-negative number to each point in sample space. Sums (integrates) to unity: Σ_x p(x) = 1 or ∫ p(x) dx = 1. Intuitively: how often does x occur, how much do we believe in x.
Ensemble: random variable + sample space + probability function.

Probability

We use probabilities p(x) to represent our beliefs B(x) about the states x of the world. There is a formal calculus for manipulating uncertainties represented by probabilities. Any consistent set of beliefs obeying the Cox Axioms can be mapped into probabilities:
1. Rationally ordered degrees of belief: if B(x) > B(y) and B(y) > B(z) then B(x) > B(z).
2. Belief in x and its negation x̄ are related: B(x) = f[B(x̄)].
3. Belief in a conjunction depends only on conditionals: B(x and y) = g[B(x), B(y|x)] = g[B(y), B(x|y)].

Expectations, Moments

The expectation of a function a(x) is written E[a] or ⟨a⟩:
E[a] = ⟨a⟩ = Σ_x p(x) a(x)
e.g. mean = Σ_x x p(x), variance = Σ_x (x − E[x])² p(x).
Moments are expectations of higher order powers. (The mean is the first moment; the autocorrelation is the second moment.) Centralized moments have lower moments subtracted away (e.g. variance, skew, kurtosis).
Deep fact: knowledge of moments of all orders completely defines the entire distribution.
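The expectation definitions above can be checked numerically. A minimal sketch, using a fair six-sided die as a made-up example distribution (not one from the notes):

```python
# Expectation and variance of a discrete random variable,
# directly from the definition E[a] = sum_x p(x) a(x).
p = {1: 1/6, 2: 1/6, 3: 1/6, 4: 1/6, 5: 1/6, 6: 1/6}

assert abs(sum(p.values()) - 1.0) < 1e-12  # p must sum to unity

def expectation(p, a=lambda x: x):
    """E[a(x)] = sum_x p(x) a(x)."""
    return sum(px * a(x) for x, px in p.items())

mean = expectation(p)                                  # first moment
second_moment = expectation(p, lambda x: x * x)        # autocorrelation
variance = expectation(p, lambda x: (x - mean) ** 2)   # centralized second moment
```

For the fair die this gives mean 3.5 and variance 35/12, matching the hand calculation.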
Joint Probability

Key concept: two or more random variables may interact. Thus, the probability of one taking on a certain value depends on which value(s) the others are taking. We call this a joint ensemble and write
p(x, y) = prob(X = x and Y = y).

Conditional Probability

If we know that some event has occurred, it changes our belief about the probability of other events. This is like taking a "slice" through the joint table:
p(x|y) = p(x, y)/p(y).

Marginal Probabilities

We can "sum out" part of a joint distribution to get the marginal distribution of a subset of variables:
p(x) = Σ_y p(x, y).
This is like adding slices of the table together.

Bayes' Rule

Manipulating the basic definition of conditional probability gives one of the most important formulas in probability theory:
p(x|y) = p(y|x) p(x) / p(y) = p(y|x) p(x) / Σ_x' p(y|x') p(x').
This gives us a way of "reversing" conditional probabilities. Thus, all joint probabilities can be factored by selecting an ordering for the random variables and using the "chain rule":
p(x, y, z, ...) = p(x) p(y|x) p(z|x, y) p(...|x, y, z).
Another equivalent definition: p(x, y) = p(x|y) p(y).
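Marginalization, conditioning, and Bayes' rule can all be demonstrated on a tiny joint table. The joint distribution below is a made-up example, not one from the notes:

```python
# A joint table p(x, y) over two binary variables (made-up numbers).
joint = {('x0', 'y0'): 0.1, ('x0', 'y1'): 0.3,
         ('x1', 'y0'): 0.2, ('x1', 'y1'): 0.4}

def marginal_x(joint, x):
    """p(x) = sum_y p(x, y): add slices of the table together."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def marginal_y(joint, y):
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_x_given_y(joint, x, y):
    """p(x|y) = p(x, y) / p(y): a slice through the joint table."""
    return joint[(x, y)] / marginal_y(joint, y)

def bayes(joint, x, y):
    """p(x|y) via Bayes' rule: p(y|x) p(x) / p(y)."""
    p_y_given_x = joint[(x, y)] / marginal_x(joint, x)
    return p_y_given_x * marginal_x(joint, x) / marginal_y(joint, y)
```

Direct conditioning and the Bayes'-rule route necessarily agree, since both are rearrangements of the same definition.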
Independence & Conditional Independence

Two variables are independent iff their joint factors:
p(x, y) = p(x) p(y).
Two variables are conditionally independent given a third one if, for all values of the conditioning variable, the resulting slice factors:
p(x, y|z) = p(x|z) p(y|z)  for all z.

Entropy

Measures the amount of ambiguity or uncertainty in a distribution:
H(p) = −Σ_x p(x) log p(x).
This is the expected value of −log p(x) (a function which itself depends on p(x)!). H(p) > 0 unless only one outcome is possible, in which case H(p) = 0. The maximal value is attained when p is uniform. It tells you the expected cost if each event costs −log p(event).

Be Careful!

Define random variables and sample spaces carefully: e.g. the Prisoner's paradox. Watch the context: e.g. Simpson's paradox.

Cross Entropy (KL Divergence)

An asymmetric measure of the "distance" between two distributions:
KL[p‖q] = Σ_x p(x) [log p(x) − log q(x)].
KL > 0 unless p = q, in which case KL = 0. It tells you the extra cost if events were generated by p(x) but instead of charging under p(x) you charged under q(x).
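The entropy and KL formulas translate directly into code. A sketch over small made-up distributions, with the convention 0 log 0 = 0:

```python
from math import log

def entropy(p):
    """H(p) = -sum_x p(x) log p(x), with 0 log 0 taken as 0."""
    return -sum(px * log(px) for px in p if px > 0)

def kl(p, q):
    """KL[p||q] = sum_x p(x) (log p(x) - log q(x))."""
    return sum(px * (log(px) - log(qx)) for px, qx in zip(p, q) if px > 0)

uniform = [0.25] * 4                  # maximal-entropy distribution on 4 outcomes
peaked  = [0.85, 0.05, 0.05, 0.05]   # low-entropy, made-up distribution
```

As the notes state, entropy is maximal for the uniform distribution (log 4 here), and KL is nonnegative, vanishing only when p = q.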
Statistics

Probability: inferring probabilistic quantities for data given fixed models (e.g. probabilities of events, marginals, conditionals, etc.).
Statistics: inferring a model given fixed data observations (e.g. clustering, classification, regression).
Many approaches to statistics: frequentist, Bayesian, decision theory, ...

(Conditional) Probability Tables

For discrete (categorical) quantities, the most basic parametrization is the probability table, which lists p(x_i = k-th value). Since PTs must be nonnegative and sum to 1, for k-ary variables there are k − 1 free parameters. If a discrete variable is conditioned on the values of some other discrete variables, we make one table for each possible setting of the parents: these are called conditional probability tables, or CPTs.

Some (Conditional) Probability Functions

Probability density functions p(x) (for continuous variables) or probability mass functions p(x = k) (for discrete variables) tell us how likely it is to get a particular value for a random variable (possibly conditioned on the values of some other variables). We can consider various types of variables: binary/discrete (categorical), continuous, interval, and integer counts. For each type we'll see some basic probability models which are parametrized families of distributions.

Exponential Family

For a (continuous or discrete) random variable x,
p(x|η) = h(x) exp{ηᵀ T(x) − A(η)} = (1/Z(η)) h(x) exp{ηᵀ T(x)}
is an exponential family distribution with natural parameter η. The function T(x) is a sufficient statistic. The function A(η) = log Z(η) is the log normalizer. Key idea: all you need to know about the data is captured in the summarizing function T(x).
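The exponential-family form h(x) exp{η T(x) − A(η)} can be checked concretely. A sketch instantiating it for the Bernoulli distribution (a standard example) and verifying it against the familiar π^x (1 − π)^(1−x) form:

```python
from math import exp, log

def expfam_bernoulli(x, eta):
    """Bernoulli density in exponential-family form h(x) exp{eta*T(x) - A(eta)}."""
    h = 1.0                      # base measure h(x) = 1
    T = x                        # sufficient statistic T(x) = x
    A = log(1.0 + exp(eta))      # log normalizer A(eta) = log(1 + e^eta)
    return h * exp(eta * T - A)

pi = 0.3
eta = log(pi / (1 - pi))         # natural parameter (log-odds)
```

Evaluating at x = 1 and x = 0 recovers π and 1 − π exactly.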
Bernoulli

For a binary random variable x with p(heads) = π:
p(x|π) = π^x (1 − π)^(1−x) = exp{ x log(π/(1 − π)) + log(1 − π) }.
Exponential family with:
η = log(π/(1 − π))
T(x) = x
A(η) = −log(1 − π) = log(1 + e^η)
h(x) = 1
The logistic function relates the natural parameter and the chance of heads:
π = 1/(1 + e^(−η)).

Multinomial

For a set of integer counts x_i on k trials:
p(x|π) = (k!/(x_1! x_2! ⋯ x_n!)) π_1^{x_1} π_2^{x_2} ⋯ π_n^{x_n} = h(x) exp{ Σ_i x_i log π_i }.
But the parameters are constrained: Σ_i π_i = 1, so we define the last one as π_n = 1 − Σ_{i=1}^{n−1} π_i:
p(x|π) = h(x) exp{ Σ_{i=1}^{n−1} x_i log(π_i/π_n) + k log π_n }.
Exponential family with:
η_i = log π_i − log π_n
T(x_i) = x_i
A(η) = −k log π_n = k log Σ_i e^{η_i}
h(x) = k!/(x_1! x_2! ⋯ x_n!)
The softmax function relates the basic and natural parameters:
π_i = e^{η_i} / Σ_j e^{η_j}.

Poisson

For an integer count variable with rate λ:
p(x|λ) = λ^x e^{−λ}/x! = (1/x!) exp{ x log λ − λ }.
Exponential family with:
η = log λ
T(x) = x
A(η) = λ = e^η
h(x) = 1/x!
e.g. the number of photons that arrive at a pixel during a fixed interval given mean intensity λ.
Other count densities: binomial, exponential.
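The logistic and softmax link functions described above are a few lines each. A sketch showing the round trip between mean parameters and natural parameters:

```python
from math import exp, log

def logistic(eta):
    """pi = 1 / (1 + e^-eta): natural parameter -> chance of heads."""
    return 1.0 / (1.0 + exp(-eta))

def natural_param(pi):
    """eta = log(pi / (1 - pi)): the log-odds, inverse of the logistic."""
    return log(pi / (1.0 - pi))

def softmax(etas):
    """pi_i = e^{eta_i} / sum_j e^{eta_j}: natural -> basic multinomial parameters."""
    zs = [exp(e) for e in etas]
    total = sum(zs)
    return [z / total for z in zs]
```

The logistic inverts the log-odds exactly, and softmax always returns a valid probability vector, uniform when all natural parameters are equal.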
Gaussian (Normal)

For a continuous univariate random variable:
p(x|μ, σ²) = (1/(√(2π) σ)) exp{ −(x − μ)²/(2σ²) }
           = (1/√(2π)) exp{ μx/σ² − x²/(2σ²) − μ²/(2σ²) − log σ }.
Exponential family with:
η = [μ/σ² ; −1/(2σ²)]
T(x) = [x ; x²]
A(η) = log σ + μ²/(2σ²)
h(x) = 1/√(2π)
Note: a univariate Gaussian is a two-parameter distribution with a two-component vector of sufficient statistics.

Important Gaussian Facts

All marginals of a Gaussian are again Gaussian. Any conditional of a Gaussian is again Gaussian.

Multivariate Gaussian Distribution

For a continuous vector random variable:
p(x|μ, Σ) = |2πΣ|^(−1/2) exp{ −(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ) }.
Exponential family with:
η = [Σ⁻¹μ ; −(1/2)Σ⁻¹]
T(x) = [x ; xxᵀ]
A(η) = log|Σ|/2 + μᵀΣ⁻¹μ/2
h(x) = (2π)^(−n/2)
Sufficient statistics: mean vector and correlation matrix.
Other densities: Student-t, Laplacian. For non-negative values use exponential, Gamma, log-normal.

Gaussian Marginals/Conditionals

Finding these parameters is mostly linear algebra. Let z = [x ; y] be normally distributed according to
z = [x ; y] ~ N( [a ; b] ; [A C ; Cᵀ B] )
where C is the (non-symmetric) cross-covariance matrix between x and y, which has as many rows as the size of x and as many columns as the size of y. The marginal distributions are
x ~ N(a ; A)    y ~ N(b ; B)
and the conditional distributions are
x|y ~ N(a + C B⁻¹ (y − b) ; A − C B⁻¹ Cᵀ)
y|x ~ N(b + Cᵀ A⁻¹ (x − a) ; B − Cᵀ A⁻¹ C).
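In the scalar case the conditional formulas above reduce to simple arithmetic, since the matrix inverses become divisions. A sketch with made-up means and (co)variances:

```python
# Scalar instance of the Gaussian conditional formula:
# x | y ~ N(a + C B^{-1} (y - b), A - C B^{-1} C).
# All numbers below are made-up illustration values.
a, b = 1.0, -2.0          # means of x and y
A, B, C = 2.0, 1.0, 0.5   # var(x), var(y), cov(x, y)

def conditional_x_given_y(y):
    """Return (mean, variance) of x given y = y."""
    mean = a + (C / B) * (y - b)
    var = A - C * C / B
    return mean, var

m, v = conditional_x_given_y(0.0)
```

Observing y shifts the mean of x toward its correlated value and always shrinks the variance (here from 2.0 to 1.75), which is exactly what conditioning on correlated information should do.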
Moments

For continuous variables, moment calculations are important. We can easily compute moments of an exponential family distribution by taking derivatives of the log normalizer A(η). The q-th derivative gives the q-th centred moment:
dA(η)/dη = mean
d²A(η)/dη² = variance
When the sufficient statistic is a vector, partial derivatives need to be considered.

Generalized Linear Models (GLMs)

In a generalized linear model, p(y|x) is exponential family with conditional mean μ = f(θᵀx). The function f is called the response function. If we choose f to be the inverse of the mapping between the conditional mean and the natural parameters, it is called the canonical response function:
η = ψ(μ)    f(·) = ψ⁻¹(·).

Parameterizing Conditionals

When the variable(s) being conditioned on (parents) are discrete, we just have one density for each possible setting of the parents, e.g. a table of natural parameters in exponential models or a table of tables for discrete models. When the variable being conditioned on is continuous, its value sets some of the parameters for the other variables. A very common instance of this for regression is the "linear-Gaussian": p(y|x) = gauss(θᵀx ; Σ). For discrete children and continuous parents, we often use a Bernoulli/multinomial whose parameters are some function f(θᵀx).

Potential Functions

We can be even more general and define distributions by arbitrary "energy" functions proportional to the log probability:
p(x) ∝ exp{ −Σ_k H_k(x) }.
A common choice is to use pairwise terms in the energy:
H(x) = Σ_i a_i x_i + Σ_{pairs ij} w_ij x_i x_j.
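The derivatives-of-the-log-normalizer fact is easy to verify numerically. A sketch using the Bernoulli log normalizer A(η) = log(1 + e^η) and finite differences, checking that A'(η) recovers the mean π and A''(η) the variance π(1 − π):

```python
from math import exp, log

def A(eta):
    """Bernoulli log normalizer A(eta) = log(1 + e^eta)."""
    return log(1.0 + exp(eta))

eta = 0.7                                 # an arbitrary natural parameter
pi = 1.0 / (1.0 + exp(-eta))              # implied mean parameter

h = 1e-5
mean = (A(eta + h) - A(eta - h)) / (2 * h)            # central difference: dA/deta
h2 = 1e-4
var = (A(eta + h2) - 2 * A(eta) + A(eta - h2)) / h2**2  # second difference: d2A/deta2
```

The numerical derivatives agree with π and π(1 − π) to within finite-difference error, illustrating that moments fall out of A(η) without any explicit summation over outcomes.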
Special Variables

If certain variables are always observed, we may not want to model their density: for example, inputs in regression or classification. This leads to conditional density estimation. If certain variables are always unobserved, they are called hidden or latent variables. They can always be marginalized out, but they can make density modeling of the observed variables easier. (We'll see more on this later.)

Likelihood Function

So far we have focused on the (log) probability function p(x|θ), which assigns a probability (density) to any joint configuration of variables x given fixed parameters θ. But in learning we turn this on its head: we have some fixed data and we want to find parameters. Think of p(x|θ) as a function of θ for fixed x:
L(θ; x) = p(x|θ)
ℓ(θ; x) = log p(x|θ)
This function is called the (log) likelihood. Choose θ to maximize some cost function c(θ) which includes ℓ(θ):
c(θ) = ℓ(θ; D)            maximum likelihood (ML)
c(θ) = ℓ(θ; D) + r(θ)     maximum a posteriori (MAP)/penalized ML
(also cross-validation, Bayesian estimators, BIC, AIC, ...).

Multiple Observations, Complete Data, IID Sampling

A single observation of the data X is rarely useful on its own. Generally we have data including many observations, which creates a set of random variables:
D = {x_1, x_2, ..., x_M}
Two very common assumptions:
1. Observations are independently and identically distributed according to the joint distribution of the graphical model: IID samples.
2. We observe all random variables in the domain on each observation: complete data.

Maximum Likelihood

For IID data:
p(D|θ) = Π_m p(x_m|θ)
ℓ(θ; D) = Σ_m log p(x_m|θ)
The idea of maximum likelihood estimation (MLE) is to pick the setting of parameters most likely to have generated the data we saw:
θ*_ML = argmax_θ ℓ(θ; D)
Very commonly used in statistics. Often leads to "intuitive", "appealing", or "natural" estimators.
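Viewing the iid log likelihood as a function of θ for fixed data can be made concrete with a brute-force grid search. A sketch for Bernoulli data (the flip sequence is made up); the grid maximizer should land on the fraction of heads:

```python
from math import log

data = [1, 1, 0, 1, 0, 1, 1, 0]          # 1 = heads, 0 = tails (made-up data)

def loglik(theta, data):
    """l(theta; D) = sum_m log p(x_m | theta) for iid Bernoulli data."""
    return sum(log(theta) if x == 1 else log(1.0 - theta) for x in data)

grid = [i / 1000 for i in range(1, 1000)]          # avoid theta in {0, 1}
theta_hat = max(grid, key=lambda t: loglik(t, data))
```

With 5 heads in 8 flips, the grid search picks theta_hat = 0.625, matching the closed-form answer that the next page derives analytically.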
Example: Bernoulli Trials

We observe M iid coin flips: D = H, H, T, H, ...
Model: p(H) = θ, p(T) = 1 − θ.
Likelihood:
ℓ(θ; D) = log p(D|θ)
        = log Π_m θ^{x_m} (1 − θ)^{1−x_m}
        = log θ Σ_m x_m + log(1 − θ) Σ_m (1 − x_m)
        = N_H log θ + N_T log(1 − θ)
Take derivatives and set to zero:
∂ℓ/∂θ = N_H/θ − N_T/(1 − θ) = 0  ⇒  θ*_ML = N_H/(N_H + N_T).

Example: Univariate Normal

We observe M iid real samples: D = 1.18, −0.25, 0.78, ...
Model: p(x) = (2πσ²)^(−1/2) exp{ −(x − μ)²/(2σ²) }.
Likelihood (using the probability density):
ℓ(θ; D) = log p(D|θ) = −(M/2) log(2πσ²) − (1/2) Σ_m (x_m − μ)²/σ²
Take derivatives and set to zero:
∂ℓ/∂μ = (1/σ²) Σ_m (x_m − μ)
∂ℓ/∂σ² = −M/(2σ²) + (1/(2σ⁴)) Σ_m (x_m − μ)²
⇒ μ_ML = (1/M) Σ_m x_m
  σ²_ML = (1/M) Σ_m x_m² − μ²_ML

Example: Multinomial

We observe M iid die rolls (K-sided): D = 3, 1, K, 2, ...
Model: p(k) = θ_k, with Σ_k θ_k = 1.
Likelihood (using binary indicators [x_m = k]):
ℓ(θ; D) = log p(D|θ) = log Π_m θ_1^{[x_m=1]} ⋯ θ_K^{[x_m=K]}
        = Σ_k log θ_k Σ_m [x_m = k] = Σ_k N_k log θ_k
Take derivatives and set to zero (enforcing the constraint Σ_k θ_k = 1):
∂ℓ/∂θ_k = N_k/θ_k − M  ⇒  θ*_k = N_k/M.
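The closed-form estimators from these examples are one-liners. A sketch computing the Bernoulli and univariate-normal MLEs; the flip string is made up, while the three real samples are the ones shown in the notes:

```python
# Bernoulli MLE: fraction of heads.
flips = "HHTHTTHHHT"                     # made-up data
N_H = flips.count("H")
N_T = flips.count("T")
theta_ml = N_H / (N_H + N_T)

# Univariate normal MLEs: sample mean and (biased) sample variance.
xs = [1.18, -0.25, 0.78]                 # the samples listed in the notes
M = len(xs)
mu_ml = sum(xs) / M
sigma2_ml = sum(x * x for x in xs) / M - mu_ml ** 2   # E[x^2] - mean^2 form
```

The mean-of-squares-minus-squared-mean form of the variance agrees exactly with the centered form (1/M) Σ (x − μ_ML)², which is a useful internal consistency check.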
Example: Linear Regression

In linear regression, some inputs (covariates, parents) x and all outputs (responses, children) y are continuous-valued variables. For each child and setting of discrete parents we use the model:
p(y|x, θ) = gauss(θᵀx ; σ²)
The likelihood is the familiar "squared error" cost:
ℓ(θ; D) = −(1/(2σ²)) Σ_m (y_m − θᵀx_m)²
The ML parameters can be solved for using linear least-squares:
∂ℓ/∂θ = −Σ_m (y_m − θᵀx_m) x_m = 0  ⇒  θ*_ML = (XᵀX)⁻¹ XᵀY.

Sufficient Statistics

A statistic is a function of a random variable. T(X) is a "sufficient statistic" for X if
T(x_1) = T(x_2)  ⇒  L(θ; x_1) = L(θ; x_2)  for all θ.
Equivalently (by the Neyman factorization theorem) we can write:
p(x|θ) = h(x, T(x)) g(T(x), θ).
Example: exponential family models, p(x|θ) = h(x) exp{ηᵀT(x) − A(η)}.

Sufficient Statistics Are Sums

In the examples above, the sufficient statistics were merely sums (counts) of the data:
Bernoulli: number of heads and tails
Multinomial: number of each type
Gaussian: mean, mean-square
Regression: correlations
As we will see, this is true for all exponential family models: sufficient statistics are average natural parameters. Only exponential family models have simple sufficient statistics.
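The normal equations θ = (XᵀX)⁻¹XᵀY can be solved by hand in the single-input-plus-bias case, since XᵀX is then a 2×2 matrix. A sketch on made-up points lying exactly on y = 2x + 1:

```python
# Linear least squares via the normal equations, solved explicitly for
# a design matrix X = [[x_m, 1], ...] (slope and intercept).
xs = [0.0, 1.0, 2.0, 3.0]        # made-up inputs
ys = [1.0, 3.0, 5.0, 7.0]        # exactly y = 2x + 1

n = len(xs)
Sxx = sum(x * x for x in xs)     # entries of X^T X
Sx = sum(xs)
Sxy = sum(x * y for x, y in zip(xs, ys))   # entries of X^T Y
Sy = sum(ys)

det = Sxx * n - Sx * Sx          # determinant of the 2x2 system
slope = (n * Sxy - Sx * Sy) / det
intercept = (Sxx * Sy - Sx * Sxy) / det
```

Because the data are noiseless, the least-squares solution recovers the generating line exactly: slope 2, intercept 1.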
MLE for Exponential Family Models

Recall the probability function for exponential models:
p(x|θ) = h(x) exp{ηᵀT(x) − A(η)}
For iid data, the sufficient statistic is Σ_m T(x_m):
ℓ(η; D) = log p(D|η) = (Σ_m log h(x_m)) − M A(η) + ηᵀ Σ_m T(x_m)
Take derivatives and set to zero:
∂ℓ/∂η = Σ_m T(x_m) − M ∂A(η)/∂η = 0
⇒ ∂A(η)/∂η |_{η_ML} = (1/M) Σ_m T(x_m)
recalling that the natural moments of an exponential distribution are the derivatives of the log normalizer.

Fundamental Operations with Distributions

Generate data: draw samples from the distribution. This often involves generating a uniformly distributed variable in the range [0, 1] and transforming it. For more complex distributions it may involve an iterative procedure that takes a long time to produce a single sample (e.g. Gibbs sampling, MCMC).
Compute log probabilities: when all variables are either observed or marginalized, the result is a single number which is the log probability of the configuration.
Inference: compute expectations of some variables given others which are observed or marginalized.
Learning: set the parameters of the density functions given some (partially) observed data to maximize likelihood or penalized likelihood.

Basic Statistical Problems

Let's remind ourselves of the basic problems we discussed on the first day: density estimation, clustering, classification, and regression. Density estimation is the hardest. If we can do joint density estimation then we can always condition to get what we want:
Regression: p(y|x) = p(y, x)/p(x)
Classification: p(c|x) = p(c, x)/p(x)
Clustering: p(c|x) = p(c, x)/p(x), with c unobserved

Learning

In AI the bottleneck is often knowledge acquisition. Human experts are rare, expensive, unreliable, and slow. But we have lots of data. We want to build systems automatically based on data and a small amount of prior information (from experts).
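The "generate data by transforming a uniform variable" operation can be illustrated with inverse-CDF sampling. A sketch for the exponential distribution (a standard choice, not the notes' own example), whose CDF 1 − e^{−λx} inverts in closed form:

```python
import random
from math import log

def sample_exponential(lam, rng):
    """Inverse-transform sampling: u ~ Uniform[0,1), x = -log(1-u)/lam."""
    u = rng.random()
    return -log(1.0 - u) / lam     # has CDF 1 - e^{-lam * x}

rng = random.Random(0)             # fixed seed for reproducibility
lam = 2.0
samples = [sample_exponential(lam, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)   # should approach 1/lam = 0.5
```

The empirical mean of the transformed uniforms converges to 1/λ, and every sample is nonnegative, as the exponential distribution requires.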
Known Models

Many systems we build will be essentially probability models. Assume the prior information we have specifies the type and structure of the model, as well as the form of the (conditional) distributions or potentials. In this case learning ≡ setting parameters. It is also possible to do "structure learning" to learn the model itself.

Jensen's Inequality

For any concave function f(x) and any distribution on x,
E[f(x)] ≤ f(E[x]).
e.g. log(x) and √x are concave. This allows us to bound expressions like
log p(x) = log Σ_z p(x, z).
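Jensen's inequality is easy to see numerically for the concave log. A sketch on a made-up two-point distribution:

```python
from math import log

# Jensen's inequality for concave f = log: E[log x] <= log E[x].
xs = [1.0, 4.0]        # made-up outcomes
ps = [0.5, 0.5]        # made-up probabilities

E_x = sum(p * x for p, x in zip(ps, xs))          # E[x] = 2.5
E_log = sum(p * log(x) for p, x in zip(ps, xs))   # E[log x] = 0.5 * log 4 = log 2
```

Here E[log x] = log 2 ≈ 0.693 while log E[x] = log 2.5 ≈ 0.916, so the inequality holds strictly, which is exactly the slack exploited when bounding log Σ_z p(x, z).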
A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with
More informationPhysics 18 Spring 2011 Homework 3 - Solutions Wednesday February 2, 2011
Phsics 18 Spring 2011 Hoework 3 - s Wednesda Februar 2, 2011 Make sure our nae is on our hoework, and please bo our final answer. Because we will be giving partial credit, be sure to attept all the probles,
More informationGaussians. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University.
Note to other teachers and users of these slides. Andrew would be delighted if you found this source aterial useful in giing your own lectures. Feel free to use these slides erbati, or to odify the to
More informationAarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell
Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework
More informationSupport Vector Machines. Machine Learning Series Jerry Jeychandra Blohm Lab
Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a
More informationTEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES
TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,
More informationLearnability of Gaussians with flexible variances
Learnability of Gaussians with flexible variances Ding-Xuan Zhou City University of Hong Kong E-ail: azhou@cityu.edu.hk Supported in part by Research Grants Council of Hong Kong Start October 20, 2007
More informationMulti-Scale/Multi-Resolution: Wavelet Transform
Multi-Scale/Multi-Resolution: Wavelet Transfor Proble with Fourier Fourier analysis -- breaks down a signal into constituent sinusoids of different frequencies. A serious drawback in transforing to the
More informationDerivative at a point
Roberto s Notes on Differential Calculus Capter : Definition of derivative Section Derivative at a point Wat you need to know already: Te concept of liit and basic etods for coputing liits. Wat you can
More informationSampling How Big a Sample?
C. G. G. Aitken, 1 Ph.D. Sapling How Big a Saple? REFERENCE: Aitken CGG. Sapling how big a saple? J Forensic Sci 1999;44(4):750 760. ABSTRACT: It is thought that, in a consignent of discrete units, a certain
More informationImproving Ground Based Telescope Focus through Joint Parameter Estimation. Maj J. Chris Zingarelli USAF AFIT/ENG
Iproving Ground Based Telescope Focus through Joint Paraeter Estiation Maj J Chris Zingarelli USAF AFIT/ENG Lt Col Travis Blake DARPA/TTO - Space Systes Dr Stephen Cain USAF AFIT/ENG Abstract-- Space Surveillance
More informationMonte Carlo integration
Monte Carlo integration Eample of a Monte Carlo sampler in D: imagine a circle radius L/ within a square of LL. If points are randoml generated over the square, what s the probabilit to hit within circle?
More informationPOSITIVITY CONSTRAINT IMPLEMENTATIONS FOR MULTIFRAME BLIND DECONVOLUTION RECONSTRUCTIONS
POSITIVITY CONSTRAINT IMPLEMENTATIONS FOR MULTIFRAME BLIND DECONVOLUTION RECONSTRUCTIONS A DISSERTATION SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAI I AT MĀNOA IN PARTIAL FULFILLMENT OF
More informationINF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification
INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging
More informationIntelligent Systems: Reasoning and Recognition. Perceptrons and Support Vector Machines
Intelligent Systes: Reasoning and Recognition Jaes L. Crowley osig 1 Winter Seester 2018 Lesson 6 27 February 2018 Outline Perceptrons and Support Vector achines Notation...2 Linear odels...3 Lines, Planes
More informationAn Approximate Model for the Theoretical Prediction of the Velocity Increase in the Intermediate Ballistics Period
An Approxiate Model for the Theoretical Prediction of the Velocity... 77 Central European Journal of Energetic Materials, 205, 2(), 77-88 ISSN 2353-843 An Approxiate Model for the Theoretical Prediction
More informationSection 3.1. ; X = (0, 1]. (i) f : R R R, f (x, y) = x y
Paul J. Bruillard MATH 0.970 Problem Set 6 An Introduction to Abstract Mathematics R. Bond and W. Keane Section 3.1: 3b,c,e,i, 4bd, 6, 9, 15, 16, 18c,e, 19a, 0, 1b Section 3.: 1f,i, e, 6, 1e,f,h, 13e,
More information1 Generalization bounds based on Rademacher complexity
COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #0 Scribe: Suqi Liu March 07, 08 Last tie we started proving this very general result about how quickly the epirical average converges
More informationA Note on the Applied Use of MDL Approximations
A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 3 Linear
More informationGEE ESTIMATORS IN MIXTURE MODEL WITH VARYING CONCENTRATIONS
ACTA UIVERSITATIS LODZIESIS FOLIA OECOOMICA 3(3142015 http://dx.doi.org/10.18778/0208-6018.314.03 Olesii Doronin *, Rostislav Maiboroda ** GEE ESTIMATORS I MIXTURE MODEL WITH VARYIG COCETRATIOS Abstract.
More informationTracking using CONDENSATION: Conditional Density Propagation
Tracking using CONDENSATION: Conditional Density Propagation Goal Model-based visual tracking in dense clutter at near video frae rates M. Isard and A. Blake, CONDENSATION Conditional density propagation
More informationPhys101 Lectures 13, 14 Momentum and Collisions
Phs0 Lectures 3, 4 Moentu and ollisions Ke points: Moentu and ipulse ondition for conservation of oentu and wh How to solve collision probles entre of ass Ref: 7-,,3,4,5,6,7,8,9,0. Page Moentu is a vector:
More informationThis model assumes that the probability of a gap has size i is proportional to 1/i. i.e., i log m e. j=1. E[gap size] = i P r(i) = N f t.
CS 493: Algoriths for Massive Data Sets Feb 2, 2002 Local Models, Bloo Filter Scribe: Qin Lv Local Models In global odels, every inverted file entry is copressed with the sae odel. This work wells when
More informationSome Probability and Statistics
Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my
More informationINF Introduction to classifiction Anne Solberg
INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300
More informationBinomial and Poisson Probability Distributions
Binoial and Poisson Probability Distributions There are a few discrete robability distributions that cro u any ties in hysics alications, e.g. QM, SM. Here we consider TWO iortant and related cases, the
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2017 Lessons 7 20 Dec 2017 Outline Artificial Neural networks Notation...2 Introduction...3 Key Equations... 3 Artificial
More informationIntroduction to Machine Learning. Recitation 11
Introduction to Machine Learning Lecturer: Regev Schweiger Recitation Fall Seester Scribe: Regev Schweiger. Kernel Ridge Regression We now take on the task of kernel-izing ridge regression. Let x,...,
More informationSimple Expressions for Success Run Distributions in Bernoulli Trials
Siple Epressions for Success Run Distributions in Bernoulli Trials Marco Muselli Istituto per i Circuiti Elettronici Consiglio Nazionale delle Ricerche via De Marini, 6-16149 Genova, Ital Eail: uselli@ice.ge.cnr.it
More informationSummary of Random Variable Concepts March 17, 2000
Summar of Random Variable Concepts March 17, 2000 This is a list of important concepts we have covered, rather than a review that devives or eplains them. Tpes of random variables discrete A random variable
More informationPattern Recognition and Machine Learning. Artificial Neural networks
Pattern Recognition and Machine Learning Jaes L. Crowley ENSIMAG 3 - MMIS Fall Seester 2016/2017 Lessons 9 11 Jan 2017 Outline Artificial Neural networks Notation...2 Convolutional Neural Networks...3
More informationSharp Time Data Tradeoffs for Linear Inverse Problems
Sharp Tie Data Tradeoffs for Linear Inverse Probles Saet Oyak Benjain Recht Mahdi Soltanolkotabi January 016 Abstract In this paper we characterize sharp tie-data tradeoffs for optiization probles used
More informationTail Estimation of the Spectral Density under Fixed-Domain Asymptotics
Tail Estiation of the Spectral Density under Fixed-Doain Asyptotics Wei-Ying Wu, Chae Young Li and Yiin Xiao Wei-Ying Wu, Departent of Statistics & Probability Michigan State University, East Lansing,
More information