Signal Processing - Lecture 7


1 Introduction

Fitting a function to a set of data gathered in time sequence can be viewed as signal processing or learning, and is an important topic in information theory. In this process a signal is measured along with noise of known functional form. The estimator then filters the input, converging to a measure of the signal as the data are collected. Schematically the problem appears in Figure 1, where Z is the signal, Y is the measured signal, W is the noise, Ẑ is the estimated signal, and E = Z − Ẑ is the estimation error. The estimator is designed as an optimum filter to obtain Ẑ in the presence of the noise W. Some knowledge of the signal generation is assumed, as well as an understanding of the noise. Estimators must be designed to work with a variety of inputs, and they must work with the inverted model equations; thus they need to avoid discontinuities or infinities. Note that this development is inspired by Bayes theorem, which allows probability updating.

The best known example of such an estimator is the Kalman filter. In its simplest case the generator is linear and time invariant, and the noise is distributed normally about zero. The filter minimizes the mean square of the estimation error at each data step.

Figure 1: An illustration of a time-sequenced measurement of a signal. The generator (with stochastic input Z) feeds the measurement Y, which includes the stochastic noise input W; the estimator output Ẑ is compared with the measurement in a comparator to form the error E, and the estimator improves the measurement by learning.

H∞ optimal estimators are another type of time-invariant filter. In these estimators the quantity

Σ_{k=0}^{∞} E[ |w(k)|^2 ]

is interpreted as the mean energy in a signal w. This estimator minimizes the largest mean energy gain from the noise input w into the estimator error.

2 Kalman Filter

The best known estimator of this type is the Kalman filter. It is a set of mathematical equations that provides an efficient, iterative method to estimate parameters, using previous information to influence future calculations in the spirit of Bayes theorem.

As an example, the free 3-D motion of a particle in space can be written as a linear set of equations given by the change of the state vector (x, y, z, p_x, p_y, p_z) to the vector (x + (p_x/m)Δt, y + (p_y/m)Δt, z + (p_z/m)Δt, p_x, p_y, p_z). Generalization to more complicated motion is straightforward. The state vector evolves by the linear operation;

x_k = A x_{k-1} + B u_{k-1} + w_{k-1}

In the above, k numbers the order of the measurements, u represents a control input to the system, and w the noise input, which is assumed to be random. An actual measurement of x at step k is given by z_k, written;

z_k = H x_k + ν_k

where ν_k is a normally distributed random variable associated with the noise in the measurement process. In this example A, B, and H are tunable parameters. We then proceed to take an average, removing the random variables and smoothing the variations;

x̂_k = x̂_{k-1} + K(z_k − H x̂_{k-1})

The 2nd term corrects the value due to a measurement; note that if z_k and H x̂_{k-1} agree, the correction vanishes. The expectation of the square of the difference between the predicted value and the mean is the variance σ_k^2, and this is minimized by the choice of K. The solution is independent of the initial choice of starting values and is iterated to convergence, although the convergence speed depends on the selection of the filter parameters. Examples of the use of the Kalman filter are shown in Figures 2 and 3. The Kalman filter has mainly been applied to signal processing, but it has also been used as a sophisticated way to determine the trajectory of particles in a background of many position measurements. Tuning of the filter parameters can be important for the speed of convergence, and the appropriate values differ with the problem and the noise input to the data.
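As a minimal sketch of the state propagation x_k = A x_{k-1} + w_{k-1} (not code from the lecture), the transition matrix for the free-particle state (x, y, z, p_x, p_y, p_z) can be built directly; the time step, mass, initial state, and noise level below are illustrative values.

```python
import numpy as np

# Illustrative values for the free-particle transition matrix.
dt, m = 0.1, 1.0

A = np.eye(6)
A[0, 3] = A[1, 4] = A[2, 5] = dt / m   # x -> x + (p_x/m) dt, and similarly for y, z

x = np.array([0.0, 0.0, 0.0, 1.0, 2.0, 0.5])            # initial state vector
w = 0.01 * np.random.default_rng(0).standard_normal(6)  # process noise w_{k-1}

x_next = A @ x + w   # one step of x_k = A x_{k-1} + w_{k-1} (no control input, B u = 0)
print(x_next)
```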

Figure 2: An illustration of the application of the Kalman filter to a set of voltage measurements, showing the data points fluctuating with noise and the convergence of the filter to the data.

The Kalman filter has the following properties;

1. The average value of the estimate equals the average value of the state.
2. The Kalman algorithm minimizes the expected value of the squared estimation error; thus on average the algorithm gives the smallest possible error in the estimator.

The Kalman filter may be viewed as a predictor-corrector series. The filter recursively conditions the current estimate on all past measurements. In implementation, the measurement noise covariance is usually determined prior to the filter operation. On the other hand, the generator (process) noise is more difficult to obtain, as it cannot be directly measured. These covariances can be viewed as parameters and tuned by the analysis, or determined from another set of data. The Kalman filter can be extended to include processes where the noise is not constant and/or the process is not linear. In the latter case one linearizes the equations and restricts changes to small values (i.e. use a Taylor expansion, solve the linear equations for small steps, and iterate).

In the simplest case, a measurement z is obtained from the state variables x_k. For a linear relationship;

z_k = H x_k + v_k

In the above, v_k is a random variable, which for simplicity is assumed to be normally distributed with zero mean. The generator and measurement noise are assumed to be independent. Write the noise covariance matrix of w as Q and that of v as R; these are also assumed constant as k changes.

Figure 3: The same as Figure 2, but with different filter values, showing slower convergence.

The noise covariances are;

R = E[v_k v_k^T]        Q = E[w_k w_k^T]

The matrix A relates the state at the previous time, k−1, to the state at k, and the matrix H relates the state x_k to the measurement z_k. The prior and posterior estimation errors are;

a_k^- = x_k − x̂_k^-        a_k = x_k − x̂_k

The estimated error covariances are;

P_k^- = E[a_k^- (a_k^-)^T]   (prior)
P_k  = E[a_k (a_k)^T]        (posterior)
P_k^- = A P_{k-1} A^T + Q

The covariance matrix in general has the form (shown here for two state variables);

P_k = | ⟨(x_1 − x̄_1)(x_1 − x̄_1)⟩   ⟨(x_1 − x̄_1)(x_2 − x̄_2)⟩ |
      | ⟨(x_1 − x̄_1)(x_2 − x̄_2)⟩   ⟨(x_2 − x̄_2)(x_2 − x̄_2)⟩ |

We are to obtain a posterior estimate x̂_k using the prior estimate x̂_k^- and the actual measurement z_k. Thus, using z_k = H x_k + v_k;

x̂_k = x̂_k^- + K(z_k − H x̂_k^-)

In the above, K is a weighting factor (the gain) multiplying the difference between the measured value and the prediction.
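Before deriving the optimal gain below, here is a minimal sketch (not from the lecture notes) of one predict/update cycle in matrix form. The gain expression is the standard one given in the next paragraph, and the posterior covariance update P_k = (I − K H) P_k^- is the standard textbook form rather than a formula stated in the text; all inputs are assumed to be NumPy arrays of compatible shapes.

```python
import numpy as np

def kalman_step(x_hat, P, z, A, H, Q, R):
    """One predict/update cycle of the linear Kalman filter (sketch, no control input)."""
    # Predict: propagate the state estimate and its error covariance.
    x_prior = A @ x_hat
    P_prior = A @ P @ A.T + Q
    # Update: gain K = P^- H^T (H P^- H^T + R)^-1, then correct with the residual.
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
    x_post = x_prior + K @ (z - H @ x_prior)
    P_post = (np.eye(len(x_hat)) - K @ H) @ P_prior   # standard posterior covariance
    return x_post, P_post
```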

Now we wish to choose K to minimize the error covariance P_k. A solution is;

K = P_k^- H^T (H P_k^- H^T + R)^{-1}

The solution is intuitive. As the measurement error covariance R → 0, the gain weights the residual more heavily,

lim_{R→0} K_k = H^{-1}

so that the updated estimate satisfies z_k = H x̂_k. If the prior estimate error covariance P_k^- → 0, the residual is weighted less heavily,

lim_{P_k^- → 0} K_k = 0

Thus as the error covariance R → 0 the actual measurement is trusted more, while as the prior estimate error covariance P_k^- → 0 the prediction is trusted more.

As an example, measure the value of a constant voltage on which normally distributed noise is imposed. Write;

x_k = A x_{k-1} + B u_{k-1} + w_k = x_{k-1} + w_k

The measurement is;

z_k = H x_k + v_k = x_k + v_k

In this case the state does not change, so A = 1; there is no control input, so B = 0; and the state is directly measured, so H = 1. The predicted (prior) state is simply the previous estimate;

x̂_k^- = x̂_{k-1}

and the prior error covariance is;

P_k^- = P_{k-1} + Q

The resulting update equations are;

K_k = P_k^- / (P_k^- + R)
x̂_k = x̂_k^- + K_k (z_k − x̂_k^-)

Look at Figure 2. In this calculation the noise covariance was R = 0.01, and the filter was slower to believe the measurements, relying on the calculation. In Figure 3 the noise covariance was chosen so that the filter was quicker to believe the measurements. In all cases convergence to the same result occurs, but in the latter case more slowly. Thus an appropriate choice of parameters is important for convergence.
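A minimal sketch of this constant-voltage example (not code from the lecture); the true voltage, the process noise Q, and the number of steps are illustrative, with R = 0.01 chosen in the spirit of the case discussed above.

```python
import numpy as np

rng = np.random.default_rng(0)

true_voltage = 1.25          # illustrative constant signal
R = 0.01                     # measurement noise covariance, as in the text's example
Q = 1e-6                     # small, illustrative process noise covariance
n_steps = 50

z = true_voltage + np.sqrt(R) * rng.standard_normal(n_steps)   # noisy measurements

x_hat, P = 0.0, 1.0          # arbitrary starting estimate and covariance
estimates = []
for zk in z:
    # Predict: A = 1, B = 0, H = 1 for this problem.
    x_prior = x_hat
    P_prior = P + Q
    # Update with the scalar gain K = P^- / (P^- + R).
    K = P_prior / (P_prior + R)
    x_hat = x_prior + K * (zk - x_prior)
    P = (1.0 - K) * P_prior
    estimates.append(x_hat)

print(estimates[-1])         # converges toward the true constant voltage
```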

Figure 4: An example of a 2-D grid cell used to derive the interpolation equations; the corners labeled 1-4 surround the point (x, y), and the cell spans x1(L) to x1(L+1) and x2(L) to x2(L+1).

3 Interpolation

We have been estimating functions by expansions about a point, and attempting to find values of a function using predictor-corrector methods. However, in most cases this is a more complicated process than simply using interpolation. Note immediately that interpolation and extrapolation are not the same. In interpolation the function is bounded, by assuming it is analytic within the bounds of the known values. When extrapolating, one is finding the value of a function outside of the known values, and errors in this latter case can be large.

Suppose we want to find the value of y(x_1, x_2, …, x_n) knowing the values on a grid. Suppose the grid is Cartesian and, to simplify, use 2-D. Thus one knows the values;

y = y(x_i, x_j)    with grid points x_i = x_1(i), x_j = x_2(j)

To find y at some point within the grid, consider the diagram shown in Figure 4 and find the fractional displacements from the corners;

t = [x_1 − X1(L)] / [X1(L+1) − X1(L)]
u = [x_2 − X2(L)] / [X2(L+1) − X2(L)]

Combining these;

y(x_1, x_2) = (1 − t)(1 − u) y_1 + t(1 − u) y_2 + t u y_3 + (1 − t) u y_4

The equation can be developed by using a Taylor expansion in the variables and keeping the lowest terms; a better representation of the value is possible by keeping higher terms.
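A minimal sketch of this bilinear interpolation formula (the grid and test function below are illustrative, not from the lecture):

```python
import numpy as np

def bilinear(x1_grid, x2_grid, y_grid, x1, x2):
    """Bilinear interpolation of y on a Cartesian 2-D grid (sketch)."""
    L = np.searchsorted(x1_grid, x1) - 1      # cell index along x1
    M = np.searchsorted(x2_grid, x2) - 1      # cell index along x2
    t = (x1 - x1_grid[L]) / (x1_grid[L + 1] - x1_grid[L])
    u = (x2 - x2_grid[M]) / (x2_grid[M + 1] - x2_grid[M])
    y1, y2 = y_grid[L, M], y_grid[L + 1, M]           # corners 1 and 2
    y4, y3 = y_grid[L, M + 1], y_grid[L + 1, M + 1]   # corners 4 and 3
    return (1 - t) * (1 - u) * y1 + t * (1 - u) * y2 + t * u * y3 + (1 - t) * u * y4

# Illustrative grid: y(x1, x2) = x1 * x2 is reproduced exactly by bilinear interpolation.
x1_grid = np.linspace(0.0, 1.0, 6)
x2_grid = np.linspace(0.0, 1.0, 6)
y_grid = np.outer(x1_grid, x2_grid)
print(bilinear(x1_grid, x2_grid, y_grid, 0.37, 0.62))   # ~ 0.37 * 0.62
```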

Now suppose we consider the smoothness of the function, applying the derivatives as well as the values of the function at the grid points. This gives a spline fit. Suppose we have not only the values of the function on the grid, but also the gradient. The derivative terms, if not given analytically, can be determined numerically. A 1-D linear interpolation, as determined from the 2-D exposition above, is;

y = A y_L + B y_{L+1}

A = (x_{L+1} − x) / (x_{L+1} − x_L)
B = 1 − A = (x − x_L) / (x_{L+1} − x_L)

or equivalently y = y_L + B(y_{L+1} − y_L). Now suppose we also have the 2nd derivative, y''. Then;

y = A y_L + B y_{L+1} + C y''_L + D y''_{L+1}

C = (1/6)(A^3 − A)(x_{L+1} − x_L)^2
D = (1/6)(B^3 − B)(x_{L+1} − x_L)^2

Then y'' is the second derivative of the interpolating polynomial. Write all terms to 0th and 1st order in the difference coefficients and solve the simultaneous equations. A spline fit (or interpolation) not only gives the interpolated value but also assures that the 1st derivative at that point is continuous.

4 Measuring Information

In probability there is some uncertainty, and a measure of uncertainty in knowledge is entropy. Here we are more interested in a measure of information, as opposed to thermodynamic entropy; these are not quite the same. If we know a source will transmit a poem by Frost or by Eliot, and we receive a poem by Frost, then we have obtained 1 bit of information. The presumption, of course, is that we know the poems. However, both information entropy and thermodynamic entropy depend on probability distributions.

Note that two observers may assign different values to the information and entropy of a source. Two students listening to a lecture can perceive different levels of information: one who has no prior knowledge of the subject could assign a completely random distribution to the information content, while another might be more prepared to assimilate the information. Thus think of a data stream as a measurement process. This is information transfer, and we could assign a prior to the process which, with a likelihood, would produce a posterior; this process could be iterated as previously discussed.

We introduce entropy as a measure of the uncertainty contained in a probability distribution. Suppose a discrete distribution has a set of outcomes A_x = {a_1, …, a_N}. Then, if P(a_i) = p_i, completeness requires Σ_{x ∈ A_x} P(a_i) = 1, and there is a set of corresponding probabilities {p_1, …, p_N}. A fundamental theorem in information theory is Shannon's theorem, which defines the measure of information S(p_1, …, p_N) as;

S = −Σ_{i=1}^{N} p_i ln(p_i)

Here S is the information entropy. The principle of maximum entropy states that the set of probabilities (a distribution) is expected to occur in such a way as to maximize the entropy. If there are no constraints on the probabilities (other than Σ_i p_i = 1), then maximum entropy is equivalent to the principle of indifference, which states that without prior knowledge we assign equal probabilities to all allowed values. The entropy of a probability distribution is the expectation value of the information content of the distribution.

Suppose one has a die whose average value over a number of throws was 3.7 instead of 3.5;

Σ_i i p_i = 3.7

We would then need to maximize S subject to this constraint. For indifference we have p_i = 1/6 and find;

Σ_i i p_i = 3.5

Now as an example use the entropy S = −Σ_i p_i ln(p_i) with only the constraint Σ_i p_i = 1. Use the constraint to eliminate p_N and write the entropy as;

S = −Σ_{i=1}^{N−1} p_i ln(p_i) − (1 − Σ_{i=1}^{N−1} p_i) ln(1 − Σ_{j=1}^{N−1} p_j)

For maximum entropy set ∂S/∂p_i = 0;

∂S/∂p_i = −ln(p_i) − 1 + ln(1 − Σ_{j=1}^{N−1} p_j) + 1 = 0

ln(p_i) = ln(1 − Σ_{j=1}^{N−1} p_j)

Thus for maximum entropy, all the probabilities are equal.

For example, let p_1 + p_2 + p_3 = 1, so that;

p_1 = 1 − p_2 − p_3
p_2 = 1 − p_1 − p_3

and the entropy can be written in terms of any two of the probabilities before maximizing.

5 Example

Return to the example of the asymmetric die introduced above. The average value over the trials is found to be;

⟨i⟩ = Σ_i i p_i = 3.7

A completely symmetric die would give the average value of 3.5, with a probability of 1/6 for any face to show after a toss. In that case all probabilities are equal and the information entropy is;

S = Σ_{i=1}^{6} (1/6) ln(6) = ln(6) = 1.79

If the ordering of the states is unimportant, the number of possible states is;

Number of states = 6!

Now suppose we choose a set of probabilities which gives an average of 3.7. This is not the only possible set of probabilities with this average, of course;

(a_1 = 1, p_1 = 0.1), (a_2 = 2, p_2 = 0.1), (a_3 = 3, p_3 = 0.1)
(a_4 = 4, p_4 = 0.5), (a_5 = 5, p_5 = 0.1), (a_6 = 6, p_6 = 0.1)

S = −Σ_i p_i ln(p_i) = 5 × (1/10) ln(10) + (1/2) ln(2) = 1.5

⟨i⟩ = Σ_i i p_i = 3.7

The number of states is then 5!, and the entropy has decreased.
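A quick numerical check of these two entropy values and means (a sketch, not from the notes):

```python
import numpy as np

def entropy(p):
    """Information entropy S = -sum p ln p (in nats)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p))

uniform = np.full(6, 1.0 / 6.0)
skewed = np.array([0.1, 0.1, 0.1, 0.5, 0.1, 0.1])   # the mean-3.7 example above
faces = np.arange(1, 7)

print(entropy(uniform), faces @ uniform)   # ~1.79 and 3.5
print(entropy(skewed), faces @ skewed)     # ~1.50 and 3.7
```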

6 Information

Information is defined, for our purposes here, in terms of probability, and it is measured by the information entropy. The expectation value of a function f is;

⟨f⟩ = Σ_i f_i p_i

Entropy is then defined as the expectation value of the information content, and the information carried by an outcome of probability p_i is −ln(p_i). From the above exposition, the probability which maximizes the entropy for N items is p_i = 1/N for all i, as follows from the stationarity condition on ∂S/∂p_i = −[ln(p_i) + 1] together with the normalization constraint. Thus;

0 ≤ S(p) ≤ ln(N)

with S(p) = 0 if one p_i = 1 and all others vanish, and S(p) = ln(N) if p_i = 1/N for all i. There are different bases for the log operation, but these only provide different units of information. Thus;

1. For log_2 the units are bits
2. For log_3 the units are trits
3. For log_e the units are nats
4. For log_10 the units are Hartleys

Information is non-negative, and an event with probability 1 carries no information, thus I(1) = 0. If two independent events occur, the probability is the product of the separate probabilities, and the information content is the sum of the information from each event;

I(p_1 p_2) = I(p_1) + I(p_2)

Information is a continuous, monotonic function of probability. Suppose we have N symbols {a_1, …, a_N} which represent outcomes with probabilities {p_1, …, p_N}, and define the average amount of information per symbol. For each symbol a_i there is information ln(1/p_i). In N observations there will be N p_i occurrences of the symbol a_i, so the total information is;

I = Σ_i (N p_i) ln(1/p_i)

This is just the number of occurrences of the symbol a_i times its information content, summed over all symbols, i.e. N times the expectation value of the information content. The average information per symbol is;

I/N = −Σ_i p_i ln(p_i)
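A small sketch (illustrative probabilities, not from the notes) of the average information per symbol in different log bases, and of the additivity of information for independent events:

```python
import math

p = [0.5, 0.25, 0.125, 0.125]           # illustrative symbol probabilities

def avg_info(p, base=math.e):
    """Average information per symbol, -sum p log_base(p)."""
    return -sum(pi * math.log(pi, base) for pi in p)

print(avg_info(p, 2), "bits")            # 1.75 bits per symbol
print(avg_info(p), "nats")               # 1.75 * ln 2 nats
print(avg_info(p, 10), "Hartleys")

# Additivity: I(p1 * p2) = I(p1) + I(p2) for independent events.
I = lambda prob: -math.log(prob, 2)
p1, p2 = 0.5, 0.25
print(I(p1 * p2), I(p1) + I(p2))         # both equal 3 bits
```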

In the continuum limit;

S(p) = −∫ dx p(x) ln(p(x))

7 Review

Assume an ensemble of random variables x_i which produce a set of observables A with elements a_i. Each element has probability p_i, with the constraint Σ_{i=1}^{N} p_i = 1. Also assume a joint probability where the outcome is the ordered pair (x, y). This means;

P(x) = Σ_{y ∈ A} P(x, y)

This expresses the marginalization of the probability with respect to y. The conditional probability of x given y is;

P(x|y) = P(x, y) / P(y)

The product rule for the probability P(x, y) given z is;

P(x, y|z) = P(x|y, z) P(y|z)

which leads to Bayes theorem;

P(y|x, z) = P(x|y, z) P(y|z) / P(x|z)

Then entropy is defined as;

S(X) = −Σ_{x ∈ A} P(x) ln[P(x)]

where we must define P(x) ln[P(x)] = 0 when P(x) = 0. The joint entropy is;

S(X, Y) = −Σ_{x,y ∈ A} P(x, y) ln[P(x, y)]

Entropy is additive. For P(x, y) = P(x)P(y);

S(X, Y) = −Σ_{x,y} P(x)P(y) ln[P(x)P(y)] = S(X) + S(Y)
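A short numerical sketch of these definitions (the joint distribution below is illustrative, not from the notes):

```python
import numpy as np

def S(p):
    """Entropy -sum p ln p, with 0 ln 0 treated as 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Independent joint distribution P(x, y) = P(x) P(y).
Px = np.array([0.2, 0.3, 0.5])
Py = np.array([0.6, 0.4])
Pxy = np.outer(Px, Py)

print(np.allclose(Pxy.sum(axis=1), Px))      # marginalization recovers P(x)
print(S(Pxy), S(Px) + S(Py))                 # additivity: S(X,Y) = S(X) + S(Y)

# Conditional probability P(x|y) = P(x, y) / P(y).
Px_given_y = Pxy / Py[np.newaxis, :]
print(np.allclose(Px_given_y, Px[:, np.newaxis]))   # independence: P(x|y) = P(x)
```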

The conditional entropy of X given Y is the sum over all values of S(X|y) weighted by the probability of y;

S(X|y) = −Σ_{x ∈ A} P(x|y) ln[P(x|y)]

S(X|Y) = Σ_{y ∈ A} P(y) S(X|y) = −Σ_{x,y ∈ A} P(x, y) ln[P(x|y)]

This measures the uncertainty in x when y is known. There is a chain rule for combining entropies, written as;

S(X, Y) = S(X) + S(Y|X) = S(Y) + S(X|Y)

The average reduction in the uncertainty of X that results from learning the value of Y (the mutual information) is;

I(X; Y) = S(X) − S(X|Y)

Finally, entropy is maximized by variation of the probabilities subject to the constraints on the data.

8 Connection to thermodynamics and statistical mechanics

Thermodynamics measures the macroscopic properties of systems that are microscopically complex. Suppose a macroscopic system with states {x_1, …, x_N} having probabilities {p_1, …, p_N} which are assigned by the principle of maximum entropy. In practice this is not possible because the number of microscopic states is huge; we can of course use expectation values, for example the system energy. The maximum entropy principle is useful in combinatorial problems.

To illustrate, consider levels of description of a complex system such as a cube of sugar. The first level would consist of all possible sizes and orientations of the crystals and the assignment of probabilities to these observables. At the next level the crystals are composed of molecules, and one could study all possible molecular arrangements; in this case many arrangements are identical, but probabilities could describe departures from the norm, allowing predictions of cleavage or heat conductivity. At the third level the configurations of the molecular arrangements, their rotations and vibrations, could be considered; at this level classical physics would need to be replaced with quantum mechanics. At the final level all quantum states would be studied, and the equilibrium macrostates would be the ones with greatest multiplicity, i.e. the states of greatest entropy. This is the level of quantum statistical mechanics. [At each level, probability is used to replace imperfect knowledge, and this makes the connection to entropy.] Statistical mechanics is a probabilistic theory dealing with variables at the peak of the probability distributions - maximum entropy.

Thus statistical mechanics is a mixed micro-macroscopic theory. Data at the microscopic level are paired with information at the macroscopic level as encoded in the partition function. All predictions are probabilistic, and states of maximum entropy are chosen (the peak of the probability distribution). This works because of the large number of states, which means the variance is small.

8.1 Classical thermodynamics

Suppose we define a discrete stochastic variable ε_i having probability p_i, subject to the constraint Σ_{i=1}^{N} p_i = 1. Choose two functions of ε, r(ε) and η(ε), with expectation values;

x = Σ_i p_i r(ε_i)        y = Σ_i p_i η(ε_i)

Assign probabilities constrained by these expectation values and the normalization. Then define a general function of the form;

F(λ_1, λ_2, λ_3) = −Σ_i p_i ln(p_i) + λ_1[Σ_i p_i − 1] + λ_2[Σ_i p_i r(ε_i) − x] + λ_3[Σ_i p_i η(ε_i) − y]

In the above, the λ_i are Lagrange multipliers used to apply the constraints. We interpret the probability as the likelihood of the occurrence of an event given all possible outcomes, and from this we obtain a probability distribution using the maximum entropy principle. To eliminate the Lagrange multipliers, apply the calculus of variations and set;

∂F/∂p_i = −ln(p_i) − 1 + λ_1 + λ_2 r(ε_i) + λ_3 η(ε_i) = 0

which gives (absorbing signs and constants into the definitions of the multipliers);

p_i = EXP[−(λ_2 r(ε_i) + λ_3 η(ε_i))] / Normalization

Normalization = Σ_i EXP[−(λ_2 r(ε_i) + λ_3 η(ε_i))] = e^{λ_1}

The normalization is Z, the partition function. Therefore we have found the solution to the following problem: given a set of N discrete observables {x_i} (outcomes of an experiment) and the expectation values of m functions {g_k(x_i)}, we have a probability distribution {p_i} which describes the known information about the observables. That is, given ⟨g_k(x_i)⟩ = Σ_i p_i g_k(x_i), we have obtained the best unbiased use of the available information for the choice of {p_i}.
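As an illustrative sketch of this machinery (not from the lecture), the maximum-entropy distribution for the die of Section 5 with mean 3.7 has the form p_i ∝ exp(−λ i); the single Lagrange multiplier λ can be found numerically, here by a simple bisection on the mean.

```python
import numpy as np

faces = np.arange(1, 7)

def maxent_die(target_mean, lo=-5.0, hi=5.0, tol=1e-12):
    """Maximum-entropy p_i proportional to exp(-lam * i) with a fixed mean (sketch)."""
    def mean(lam):
        w = np.exp(-lam * faces)
        return (faces * w).sum() / w.sum()
    # mean(lam) decreases monotonically with lam, so bisect on lam.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) > target_mean:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = np.exp(-lam * faces)
    return w / w.sum()

p = maxent_die(3.7)
print(p)                                   # probabilities tilted toward the high faces
print(faces @ p, -(p * np.log(p)).sum())   # mean 3.7, entropy just below ln 6
```

Note that this constrained maximum-entropy distribution has a larger entropy than the ad hoc choice (0.1, 0.1, 0.1, 0.5, 0.1, 0.1) used earlier with the same mean.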

Now suppose we allow ε_i → ε, i.e. a continuous variable. The sum in the partition function is replaced by an integral;

Z = ∫ dε w(ε) EXP[−(λ_2 r(ε) + λ_3 η(ε))]

In the above, w(ε) expresses the degeneracy of the states in the sum, i.e. the density of states, the number of states between ε and ε + dε. Let r = ε and η = ε²;

Z = ∫ dε w(ε) EXP[−(λ_2 ε + λ_3 ε²)]

The probability distribution is then;

P(ε) ∝ w(ε) EXP[−(λ_2 ε + λ_3 ε²)]

By evaluating the Lagrange multipliers (taking the partial derivatives of F with respect to r and η and setting the results equal to zero) we determine λ_2 and λ_3; these two variables are conjugate to r and η. Thus knowing the expectation values is equivalent to providing the entropy and the partition function. This can be interpreted as saying that the Normal distribution is the best choice of probability distribution given only the mean and the variance.

As a further example, suppose that the number of events in a time interval is k. Divide the interval into N sub-intervals so that only 0 or 1 events occur in each sub-interval. The number of ways to distribute k events among N sub-intervals is;

N! / [(N − k)! k!] ≈ N^k / k!

The partition function with one constraint (on the mean of k) is;

Z = Σ_{k=0}^{N} (N^k / k!) e^{kλ} = EXP[N e^{λ}]

Solve for the Lagrange multiplier. Define the mean of k to be m (the mean number of events in a time interval, equivalent to the expectation value ⟨k⟩ = Σ_k p_k k), and let m = N e^{λ}. Then;

λ = ln(m/N)        Z = e^{m}

P(k | ⟨k⟩ = m) = (N^k / k!) e^{kλ} / Z = e^{−m} m^k / k!

This illustrates that for an event-counting distribution where only the mean is known, the best choice for the probability distribution is the Poisson distribution.
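A small numerical check of this result (the mean value is illustrative, not from the notes): build p_k = e^{−m} m^k / k! directly and confirm normalization and the mean.

```python
import math

m = 3.7                                   # illustrative mean count
p = [math.exp(-m) * m**k / math.factorial(k) for k in range(60)]

print(sum(p))                             # ~1: the distribution is normalized
print(sum(k * pk for k, pk in enumerate(p)))   # ~3.7: the mean equals m
```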

Finally, look at the case of Fermi-Dirac statistics. The occupation number of a state is either 0 or 1. The constraints are;

1. The occupancy of a state is j = 0 or 1
2. The average occupancy over all states is ⟨j⟩ = a
3. The average energy of the system is ⟨j E_i⟩ = b, where E_i is the energy of the i-th state

We maximize the entropy subject to the above conditions. This leads to;

p_{ij} = e^{−j(λ_1 + λ_2 E_i)} / Z_i

Z_i = EXP[−0·(λ_1 + λ_2 E_i)] + EXP[−1·(λ_1 + λ_2 E_i)] = 1 + EXP[−(λ_1 + λ_2 E_i)]

The multiplier conjugate to the average occupancy is identified as λ_1 = −μ/kT, which defines the chemical potential μ, and the multiplier conjugate to the average energy is λ_2 = 1/kT, which defines the temperature. In the above, k is the Boltzmann constant. Thus the expected occupancy of the i-th state is;

⟨j⟩_i = 0·p_{i0} + 1·p_{i1}

⟨j⟩_i = 1 / (EXP[(E_i − μ)/kT] + 1)
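A minimal sketch of this occupancy function (the energies, chemical potential, and temperature below are illustrative):

```python
import numpy as np

k_B = 8.617e-5                 # Boltzmann constant in eV/K

def fermi_dirac(E, mu, T):
    """Expected occupancy <j> = 1 / (exp((E - mu)/kT) + 1)."""
    return 1.0 / (np.exp((E - mu) / (k_B * T)) + 1.0)

E = np.linspace(0.0, 2.0, 5)   # illustrative state energies in eV
print(fermi_dirac(E, mu=1.0, T=300.0))   # ~1 below mu, ~0 above, 0.5 at E = mu
```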
