ECE531 Lecture 2b: Bayesian Hypothesis Testing
1 ECE531 Lecture 2b: Bayesian Hypothesis Testing. D. Richard Brown III, Worcester Polytechnic Institute, 29-January-2009.
2 Minimizing a Weighted Sum of Conditional Risks. We would like to minimize all of the conditional risks $R_0(D),\ldots,R_{N-1}(D)$, but usually we have to make some sort of tradeoff. A standard approach to solving a multi-objective optimization problem is to form a linear combination of the functions to be minimized:
$$r(D,\lambda) := \sum_{j=0}^{N-1} \lambda_j R_j(D) = \lambda^\top R(D)$$
where $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$. The problem is then to solve the single-objective optimization problem
$$D^* = \arg\min_{D\in\mathcal{D}} r(D,\lambda) = \arg\min_{D\in\mathcal{D}} \sum_{j=0}^{N-1} \lambda_j c_j^\top D p_j$$
where we are focusing on the case of finite $\mathcal{Y}$ for now.
3 Minimizing a Weighted Sum of Conditional Risks. The problem
$$D^* = \arg\min_{D\in\mathcal{D}} r(D,\lambda) = \arg\min_{D\in\mathcal{D}} \sum_{j=0}^{N-1} \lambda_j c_j^\top D p_j$$
is a finite-dimensional linear programming problem, since the variables that we control ($D_{ij}$) are constrained by the linear (in)equalities
$$D_{ij} \ge 0 \ \text{ for all } i\in\{0,\ldots,M-1\},\ j\in\{0,\ldots,L-1\}, \qquad \sum_{i=0}^{M-1} D_{ij} = 1 \ \text{ for each } j,$$
and the objective function $r(D,\lambda)$ is linear in the variables $D_{ij}$. For notational convenience, we call this minimization problem LPF($\lambda$).
4 Decomposition: Part 1. Since
$$R_j(D) = c_j^\top D p_j = \sum_{i=0}^{M-1} \sum_{l=0}^{L-1} D_{il} P_{lj} C_{ij},$$
the problem LPF($\lambda$) can be written as
$$D^* = \arg\min_{D\in\mathcal{D}} \sum_{j=0}^{N-1} \lambda_j \sum_{i=0}^{M-1} \sum_{l=0}^{L-1} D_{il} P_{lj} C_{ij} = \arg\min_{D\in\mathcal{D}} \sum_{l=0}^{L-1} \underbrace{\sum_{i=0}^{M-1} D_{il} \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}}_{r_l(d,\lambda)}$$
Note that we can minimize each term $r_l(d,\lambda)$ separately.
5 Decomposition: Part 2. The $l$th subproblem is then
$$d^* = \arg\min_{d\in\mathbb{R}^M} r_l(d,\lambda) = \arg\min_{d} \sum_{i=0}^{M-1} d_i \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
subject to the constraints that $d_i \ge 0$ and $\sum_i d_i = 1$. Note that $d_i \leftrightarrow D_{il}$ from the original LPF($\lambda$) problem. For notational convenience, we call this subproblem LPF$_l$($\lambda$).
6 Some Intuition: The Hiker. Suppose you are going on a hike with a one-liter drinking bottle and you must completely fill the bottle before your hike. At the general store, you can fill your bottle with any combination of: tap water at $0.25 per liter, iced tea at $0.50 per liter, soda at $1.00 per liter, and Red Bull at $5.00 per liter. You want to minimize your cost for the hike. What should you do? We can define a per-liter cost vector $h = [0.25, 0.50, 1.00, 5.00]^\top$ and a proportional purchase vector $\nu \in \mathbb{R}^4$ with elements satisfying $\nu_i \ge 0$ and $\sum_i \nu_i = 1$. Clearly, your cost $h^\top \nu$ is minimized when $\nu = [1,0,0,0]^\top$.
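A quick numerical check of the hiker intuition, written as a minimal NumPy sketch (the cost vector $h$ is taken from the slide; putting all mass on the cheapest commodity is just an argmin):

```python
import numpy as np

# Per-liter cost of each commodity (tap water, iced tea, soda, Red Bull).
h = np.array([0.25, 0.50, 1.00, 5.00])

# Any valid purchase vector nu satisfies nu >= 0 and sum(nu) == 1.
# The cost h @ nu is minimized by putting all mass on the cheapest commodity.
nu = np.zeros_like(h)
nu[np.argmin(h)] = 1.0

print("optimal purchase vector:", nu)      # [1. 0. 0. 0.]
print("minimum cost per liter :", h @ nu)  # 0.25
```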
7 Solution to the Subproblem LPF$_l$($\lambda$). Theorem: For fixed $h \in \mathbb{R}^n$, the function $h^\top \nu$ is minimized over the set $\{\nu \in \mathbb{R}^n : \nu_i \ge 0 \text{ and } \sum_i \nu_i = 1\}$ by the vector $\nu$ with $\nu_m = 1$ and $\nu_k = 0$ for $k \ne m$, where $m$ is any index with $h_m = \min_{i\in\{0,\ldots,n-1\}} h_i$. Note that, if two or more commodities have the same minimum price, the solution is not unique. Back to our subproblem LPF$_l$($\lambda$):
$$d^* = \arg\min_{d\in\mathbb{R}^M} \sum_{i=0}^{M-1} d_i \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
subject to the constraints that $d_i \ge 0$ and $\sum_i d_i = 1$. The Theorem tells us how to easily solve LPF$_l$($\lambda$). We just have to find the index
$$m_l = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
and then set $d_{m_l} = 1$ and $d_i = 0$ for all $i \ne m_l$.
8 Solution to the Problem LPF($\lambda$). Solving the subproblem LPF$_l$($\lambda$) for $l = 0,\ldots,L-1$ and assembling the results of the subproblems into $M \times L$ decision matrices yields a non-empty, finite set of deterministic decision matrices that solve LPF($\lambda$). All deterministic decision matrices in the set have the same minimal cost:
$$r^*(\lambda) = \sum_{l=0}^{L-1} \min_i \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
It is easy to show that any randomized decision rule in the convex hull of this set of deterministic decision matrices also achieves $r^*(\lambda)$, and hence is also optimal.
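The finite-$\mathcal{Y}$ solution above amounts to a column-wise argmin over the matrix with entries $\sum_j \lambda_j C_{ij} P_{lj}$. Below is a minimal NumPy sketch of that procedure; the matrices C (costs), P (conditional pmfs), and the weighting lam are placeholders to be supplied by the user:

```python
import numpy as np

def solve_lpf(C, P, lam):
    """Solve LPF(lambda) for a finite observation space.

    C   : (M, N) cost matrix, C[i, j] = cost of deciding H_i when the state is x_j
    P   : (L, N) conditional pmf matrix, P[l, j] = Prob(y_l | state x_j)
    lam : (N,)   nonnegative weights summing to one

    Returns a deterministic decision matrix D (M, L) and the minimal
    weighted risk r*(lambda).
    """
    # G[i, l] = sum_j lam[j] * C[i, j] * P[l, j]
    G = (C * lam) @ P.T

    # One subproblem per observation column: pick the row with minimum cost.
    m = np.argmin(G, axis=0)               # m[l] = best hypothesis index for y_l
    D = np.zeros_like(G)
    D[m, np.arange(G.shape[1])] = 1.0      # one-hot columns

    r_star = G.min(axis=0).sum()
    return D, r_star
```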
9 Example Part 1: Bayesian Hypothesis Testing. Let $G_{il} := \sum_j \lambda_j C_{ij} P_{lj}$ and let $G \in \mathbb{R}^{M\times L}$ be the matrix composed of elements $G_{il}$. Suppose $G$ is a given matrix (entries not reproduced here). What is the solution to the subproblem LPF$_0$($\lambda$)? $d^* = [1,0,0]^\top$. What is the solution to the problem LPF($\lambda$)? The decision matrix $D^*$ assembled from the column-wise minimizers of $G$. Is this solution unique? Yes. What is $r^*(\lambda)$? $r^*(\lambda) = 1.0$.
10 Example Part 2: Bayesian Hypothesis Testing. What happens if an entry of $G$ changes so that one of its columns has two equal minima? The solution to LPF$_2$($\lambda$) is no longer unique. Two different deterministic decision matrices $D^*$, differing only in column 2, solve LPF($\lambda$) and achieve $r^*(\lambda) = 1.1$. In fact, any decision matrix of the form $D^* = \alpha D_1^* + (1-\alpha) D_2^*$ for $\alpha \in [0,1]$ will also achieve $r^*(\lambda) = 1.1$.
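To illustrate the tie case numerically, here is a small, purely hypothetical $G$ matrix (not the one from the slides) whose third column has two equal minima, together with the set of optimal hypotheses per column:

```python
import numpy as np

# Hypothetical G with a tie in column 2 (rows 0 and 1 both cost 0.4).
G = np.array([[0.2, 0.5, 0.4],
              [0.7, 0.3, 0.4],
              [0.9, 0.8, 0.6]])

r_star = G.min(axis=0).sum()    # 0.2 + 0.3 + 0.4 = 0.9
print("r*(lambda) =", r_star)

# All minimizing rows per column; column 2 has two of them.
for l in range(G.shape[1]):
    winners = np.flatnonzero(np.isclose(G[:, l], G[:, l].min()))
    print(f"column {l}: optimal hypotheses {winners}")

# Any convex combination of the resulting deterministic decision
# matrices (differing only in column 2) achieves the same r*(lambda).
```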
11 Remarks. Note that the solution to the subproblem LPF$_l$($\lambda$) always yields at least one deterministic decision rule for the case $y = y_l$. If $n_l$ elements of the $l$th column of $G$ are equal to the minimum of the column, then there are $n_l$ deterministic decision rules solving the subproblem LPF$_l$($\lambda$). There are $\prod_l n_l \ge 1$ deterministic decision rules that solve LPF($\lambda$). If there is more than one deterministic solution to LPF($\lambda$), then there is also a corresponding convex hull of randomized decision rules that achieve $r^*(\lambda)$.
12 Working Example: Effect of Changing the Weighting $\lambda$. (Figure: conditional risk pairs $(R_0, R_1)$ of the deterministic decision rules; as the weighting $\lambda$ varies, different rules on the optimal tradeoff surface minimize the weighted risk.)
13 Weighted Risk Level Sets. Notice in the last example that, by varying our risk weighting $\lambda$, we can create a family of level sets that cover every point of the optimal tradeoff surface. Given a constant $c \in \mathbb{R}$, the level set of value $c$ is defined as
$$\mathcal{L}_c^\lambda := \{x \in \mathbb{R}^N : \lambda^\top x = c\}$$
(Figure: level sets in the $(x_0, x_1)$ plane for two different weightings $\lambda$; the specific $\lambda$ values are not reproduced here.)
14 Infinite Observation Spaces: Part 1. When the observations are specified by conditional pdfs, our conditional risk function becomes
$$R_j(\rho) = \int_{\mathcal{Y}} \left[ \sum_{i=0}^{M-1} \rho_i(y) C_{ij} \right] p_j(y)\, dy$$
Recall that the randomized decision rule $\rho_i(y)$ specifies the probability of deciding $H_i$ when the observation is $y$. We still have $N < \infty$ states, so, as before, we can write a linear combination of the conditional risk functions as
$$r(\rho,\lambda) := \sum_{j=0}^{N-1} \lambda_j R_j(\rho)$$
where $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$.
15 Infinite Observation Spaces: Part 2. The problem is then to solve
$$\rho^* = \arg\min_{\rho\in\mathcal{D}} r(\rho,\lambda) = \arg\min_{\rho\in\mathcal{D}} \sum_{j=0}^{N-1} \lambda_j \int_{\mathcal{Y}} \left[ \sum_{i=0}^{M-1} \rho_i(y) C_{ij} \right] p_j(y)\, dy$$
where $\mathcal{D}$ is the set of all valid decision rules satisfying $\rho_i(y) \ge 0$ for all $i = 0,\ldots,M-1$ and $y \in \mathcal{Y}$, as well as $\sum_{i=0}^{M-1} \rho_i(y) = 1$ for all $y \in \mathcal{Y}$. This is an infinite-dimensional linear programming problem. We refer to this problem as LP($\lambda$).
16 Infinite Observation Spaces: Decomposition Part 1. As before, we can decompose the big problem LP($\lambda$) into lots of little subproblems, one for each observation $y \in \mathcal{Y}$. We rewrite LP($\lambda$) as
$$\rho^* = \arg\min_{\rho\in\mathcal{D}} \sum_{j=0}^{N-1} \lambda_j \int_{\mathcal{Y}} \left[ \sum_{i=0}^{M-1} \rho_i(y) C_{ij} \right] p_j(y)\, dy = \arg\min_{\rho\in\mathcal{D}} \int_{\mathcal{Y}} \sum_{i=0}^{M-1} \rho_i(y) \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)\, dy$$
To minimize the integral, we can minimize the term
$$(\cdots) = \sum_{i=0}^{M-1} \rho_i(y) \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)$$
at each fixed value of $y$.
17 Infinite Observation Spaces: Decomposition Part 2. The subproblem of minimizing the term $(\cdots)$ inside the integral when $y = y_0$ is a finite-dimensional linear programming problem (since $\mathcal{X}$ is still finite):
$$\nu^* = \arg\min_{\nu} \sum_{i=0}^{M-1} \nu_i \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y_0)$$
subject to $\nu_i \ge 0$ for all $i = 0,\ldots,M-1$ and $\sum_{i=0}^{M-1} \nu_i = 1$. We already know how to do this. We just find the index that has the lowest commodity cost
$$m_{y_0} = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y_0)$$
and set $\nu_{m_{y_0}} = 1$ and all other $\nu_i = 0$. Same technique as solving LPF$_l$($\lambda$).
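For infinite observation spaces the per-observation subproblem is the same commodity-cost argmin, now evaluated with conditional densities. The sketch below assumes the user supplies the conditional pdfs as callables and a particular observation y0; everything else mirrors the finite-Y case:

```python
import numpy as np

def decide_at_y(y0, C, pdfs, lam):
    """Solve the per-observation subproblem of LP(lambda) at y = y0.

    C    : (M, N) cost matrix
    pdfs : list of N callables, pdfs[j](y) = p_j(y)
    lam  : (N,)  nonnegative weights summing to one

    Returns the minimizing hypothesis index m_{y0}.
    """
    p = np.array([pj(y0) for pj in pdfs])   # p[j] = p_j(y0)
    costs = C @ (lam * p)                   # costs[i] = sum_j lam_j C_ij p_j(y0)
    return int(np.argmin(costs))
```

A deterministic rule solving LP($\lambda$) is then obtained by applying this function pointwise over $\mathcal{Y}$.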
18 Infinite Observation Spaces: Minimum Weighted Risk. As we saw with finite $\mathcal{Y}$, solving LP($\lambda$) will yield at least one deterministic decision rule that achieves the minimum weighted risk
$$r^*(\lambda) = \int_{\mathcal{Y}} \min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)\, dy = \int_{\mathcal{Y}} \min_{i\in\{0,\ldots,M-1\}} g_i(y,\lambda)\, dy$$
where $g_i(y,\lambda) := \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)$. Two remarks: 1. The existence of at least one deterministic decision rule solving LPF($\lambda$) or LP($\lambda$) implies the existence of an optimal partition of $\mathcal{Y}$. 2. Since the problems LPF($\lambda$) and LP($\lambda$) always yield at least one deterministic decision rule, another way to think of these problems is: find a partition of the observation space that minimizes the weighted risk.
19 Infinite Observation Spaces: Putting it All Together. A decision rule that solves LP($\lambda$) is then
$$\rho_i^*(y) = \begin{cases} 1 & \text{if } i = \arg\min_{m\in\{0,\ldots,M-1\}} g_m(y,\lambda) \\ 0 & \text{otherwise} \end{cases}$$
where $\rho^*(y) = [\rho_0^*(y),\ldots,\rho_{M-1}^*(y)]^\top$. Remarks: 1. The solution to LP($\lambda$) may not be unique. 2. If there is more than one deterministic solution to LP($\lambda$), then there is also a corresponding convex hull of randomized decision rules that achieve $r^*(\lambda)$. 3. Note that the standard convention for deterministic decision rules is $\delta : \mathcal{Y} \to \mathcal{Z}$. An equivalent way of specifying a deterministic decision rule that solves LP($\lambda$) is $\delta^*(y) = \arg\min_{i\in\{0,\ldots,M-1\}} g_i(y,\lambda)$.
20 The Bayesian Approach. We assume a prior state distribution $\pi \in \mathcal{P}_N$ such that Prob(state is $x_j$) $= \pi_j$. Note that this is our model of the state probabilities prior to the observation. We denote the Bayes risk of the decision rule $\rho$ as
$$r(\rho,\pi) = \sum_{j=0}^{N-1} \pi_j R_j(\rho) = \sum_{j=0}^{N-1} \pi_j \int_{\mathcal{Y}} p_j(y) \sum_{i=0}^{M-1} C_{ij} \rho_i(y)\, dy.$$
This is simply the weighted overall risk, or average risk, given our prior belief of the state probabilities. A decision rule that minimizes this risk is called a Bayes decision rule for the prior $\pi$.
21 Solving Bayesian Hypothesis Testing Problems. We know how to solve this problem. For finite $\mathcal{Y}$, we just find the index
$$m_l = \arg\min_{i\in\{0,\ldots,M-1\}} \underbrace{\sum_{j=0}^{N-1} \pi_j C_{ij} P_{lj}}_{G_{il}}$$
and set $D^{B\pi}_{m_l,l} = 1$ and $D^{B\pi}_{i,l} = 0$ for all $i \ne m_l$. For infinite $\mathcal{Y}$, we just find the index
$$m_{y_0} = \arg\min_{i\in\{0,\ldots,M-1\}} \underbrace{\sum_{j=0}^{N-1} \pi_j C_{ij} p_j(y_0)}_{g_i(y_0,\pi)}$$
and set $\delta^{B\pi}(y_0) = m_{y_0}$ (or, equivalently, set $\rho^{B\pi}_{m_{y_0}}(y_0) = 1$ and $\rho^{B\pi}_i(y_0) = 0$ for all $i \ne m_{y_0}$).
22 Prior and Posterior Probabilities. The conditional probability that $H_j$ is true given the observation $y$:
$$\pi_j(y) := \mathrm{Prob}(H_j \text{ is true} \mid Y = y) = \frac{p_j(y)\,\pi_j}{p(y)}$$
where $p(y) = \sum_{j=0}^{N-1} \pi_j p_j(y)$. Recall that $\pi_j$ is the prior probability that $H_j$ is true, before we have any observations. The quantity $\pi_j(y)$ is the posterior probability that $H_j$ is true, conditioned on the observation $y$.
23 A Bayes Decision Rule Minimizes the Posterior Cost. General expression for a deterministic Bayes decision rule:
$$\delta^{B\pi}(y) = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} \pi_j C_{ij} p_j(y)$$
But $\pi_j p_j(y) = \pi_j(y)\, p(y)$, hence
$$\delta^{B\pi}(y) = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} C_{ij} \pi_j(y)\, p(y) = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} C_{ij} \pi_j(y)$$
since $p(y)$ does not affect the minimizer. What does this mean? $\sum_j C_{ij} \pi_j(y)$ is the average cost of choosing hypothesis $H_i$ given $Y = y$, i.e. the posterior cost of choosing $H_i$. The Bayes decision rule chooses the hypothesis that yields the minimum expected posterior cost.
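The posterior-cost view translates directly into a few lines of code. In this sketch (my own illustration, with user-supplied placeholder inputs), the posteriors are computed from the prior and the conditional densities, and the rule picks the hypothesis with the smallest expected posterior cost:

```python
import numpy as np

def bayes_decision(y, C, pdfs, prior):
    """Deterministic Bayes rule: minimize the expected posterior cost at y.

    C     : (M, N) cost matrix
    pdfs  : list of N callables, pdfs[j](y) = p_j(y)
    prior : (N,)  prior state probabilities pi_j
    """
    p = np.array([pj(y) for pj in pdfs])   # p_j(y)
    post = prior * p
    post /= post.sum()                     # posterior pi_j(y)
    posterior_cost = C @ post              # cost[i] = sum_j C_ij * pi_j(y)
    return int(np.argmin(posterior_cost)), post
```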
24 Bayesian Hypothesis Testing with UCA: Part 1. The uniform cost assignment (UCA):
$$C_{ij} = \begin{cases} 0 & \text{if } x_j \in H_i \\ 1 & \text{otherwise} \end{cases}$$
The conditional risk $R_j(\rho)$ under the UCA is simply the probability of not choosing the hypothesis that contains $x_j$, i.e. the probability of error when the state is $x_j$. The Bayes risk in this case is
$$r(\rho,\pi) = \sum_{j=0}^{N-1} \pi_j R_j(\rho) = \mathrm{Prob(error)}.$$
25 Bayesian Hypothesis Testing with UCA: Part 2. Under the UCA, a Bayes decision rule can be written in terms of the posterior probabilities as
$$\delta^{B\pi}(y) = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{j=0}^{N-1} C_{ij} \pi_j(y) = \arg\min_{i\in\{0,\ldots,M-1\}} \sum_{x_j \notin H_i} \pi_j(y) = \arg\min_{i\in\{0,\ldots,M-1\}} \left[ 1 - \sum_{x_j \in H_i} \pi_j(y) \right] = \arg\max_{i\in\{0,\ldots,M-1\}} \sum_{x_j \in H_i} \pi_j(y)$$
Hence, for hypothesis tests under the UCA, the Bayes decision rule is the MAP (maximum a posteriori) decision rule. When the hypothesis test is simple, $\delta^{B\pi}(y) = \arg\max_i \pi_i(y)$.
26 Bayesian Hypothesis Testing with UCA and Uniform Prior. Under the UCA and a uniform prior, i.e. $\pi_j = 1/N$ for all $j = 0,\ldots,N-1$,
$$\delta^{B\pi}(y) = \arg\max_{i\in\{0,\ldots,M-1\}} \sum_{x_j \in H_i} \pi_j(y) = \arg\max_{i\in\{0,\ldots,M-1\}} \sum_{x_j \in H_i} \pi_j p_j(y) = \arg\max_{i\in\{0,\ldots,M-1\}} \sum_{x_j \in H_i} p_j(y)$$
since $p(y)$ and $\pi_j = 1/N$ do not affect the maximizer. In this case, $\delta^{B\pi}(y)$ selects the most likely hypothesis, i.e. the hypothesis which best explains the observation $y$. This is called the maximum likelihood (ML) decision rule. Which is better, MAP or ML?
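A small sketch contrasting the MAP and ML rules for simple hypotheses (each state is its own hypothesis). The Gaussian densities and the skewed prior below are made-up illustration values, not from the slides; they are chosen so that the two rules disagree:

```python
import numpy as np
from scipy.stats import norm

# Two simple hypotheses with hypothetical means and a skewed prior.
means = [0.0, 1.0]
sigma = 1.0
prior = np.array([0.8, 0.2])

def map_rule(y):
    post = prior * np.array([norm.pdf(y, m, sigma) for m in means])
    return int(np.argmax(post))        # maximize pi_j * p_j(y)

def ml_rule(y):
    lik = np.array([norm.pdf(y, m, sigma) for m in means])
    return int(np.argmax(lik))         # ignore the prior entirely

y = 0.6
print("MAP decides H%d, ML decides H%d" % (map_rule(y), ml_rule(y)))
# With these values: MAP decides H0 (the prior dominates), ML decides H1.
```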
27 Simple Binary Bayesian Hypothesis Testing: Part 1. We have two states $x_0$ and $x_1$ and two hypotheses $H_0$ and $H_1$. For each $y \in \mathcal{Y}$, our problem is to compute $m_y = \arg\min_{i\in\{0,1\}} g_i(y,\pi)$ where
$$g_0(y,\pi) = \pi_0 C_{00} p_0(y) + \pi_1 C_{01} p_1(y), \qquad g_1(y,\pi) = \pi_0 C_{10} p_0(y) + \pi_1 C_{11} p_1(y).$$
We only have two things to compare. We can simplify this comparison:
$$g_0(y,\pi) \ge g_1(y,\pi) \;\Leftrightarrow\; \pi_0 C_{00} p_0(y) + \pi_1 C_{01} p_1(y) \ge \pi_0 C_{10} p_0(y) + \pi_1 C_{11} p_1(y) \;\Leftrightarrow\; p_1(y)\,\pi_1(C_{01} - C_{11}) \ge p_0(y)\,\pi_0(C_{10} - C_{00}) \;\Leftrightarrow\; \frac{p_1(y)}{p_0(y)} \ge \frac{\pi_0(C_{10} - C_{00})}{\pi_1(C_{01} - C_{11})}$$
where we have assumed that $C_{01} > C_{11}$ to get the final result. The expression $L(y) := \frac{p_1(y)}{p_0(y)}$ is known as the likelihood ratio.
28 Simple Binary Bayesian Hypothesis Testing: Part 2. Given $L(y) := \frac{p_1(y)}{p_0(y)}$ and $\tau := \frac{\pi_0(C_{10} - C_{00})}{\pi_1(C_{01} - C_{11})}$, our Bayes decision rule is then simply
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } L(y) > \tau \\ 0/1 & \text{if } L(y) = \tau \\ 0 & \text{if } L(y) < \tau. \end{cases}$$
Remark: for any $y \in \mathcal{Y}$ that results in $L(y) = \tau$, the Bayes risk is the same whether we decide $H_0$ or $H_1$. You can deal with this by always deciding $H_0$ in this case, or always deciding $H_1$, or flipping a coin, etc.
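The binary Bayes rule is just a likelihood-ratio threshold test. A minimal sketch, with the conditional densities left as user-supplied callables and the cost matrix and prior as placeholders:

```python
def binary_bayes_rule(y, p0, p1, prior0, C):
    """Binary Bayes (likelihood ratio) test.

    p0, p1 : callables returning p_0(y) and p_1(y)
    prior0 : prior probability pi_0 (so pi_1 = 1 - prior0)
    C      : 2x2 cost matrix [[C00, C01], [C10, C11]], assuming C01 > C11
    """
    tau = prior0 * (C[1][0] - C[0][0]) / ((1 - prior0) * (C[0][1] - C[1][1]))
    L = p1(y) / p0(y)              # likelihood ratio
    return 1 if L > tau else 0     # ties broken in favor of H_0 here
```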
29 Performance of Simple Binary Bayesian Hypothesis Testing. Since the observation $Y$ is a random variable, so is the likelihood ratio $L(Y)$. Assume that $L(Y)$ has a conditional pdf $\Lambda_j(t)$ when the state is $x_j$. When the state is $x_0$, the probability of making an error (deciding $H_1$) is
$$P_{\text{false positive}} = \mathrm{Prob}(L(Y) > \tau \mid \text{state is } x_0) = \int_{\tau}^{\infty} \Lambda_0(t)\, dt$$
Similarly, when the state is $x_1$, the probability of making an error (deciding $H_0$) is
$$P_{\text{false negative}} = \mathrm{Prob}(L(Y) \le \tau \mid \text{state is } x_1) = \int_{0}^{\tau} \Lambda_1(t)\, dt$$
What is the overall (unconditional) probability of error?
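The question at the end of the slide is answered by averaging the conditional error probabilities over the prior: Prob(error) = $\pi_0 P_{\text{FP}} + \pi_1 P_{\text{FN}}$. A quick Monte Carlo sketch of these quantities for a hypothetical Gaussian pair under the UCA (means, noise level, and prior are illustration values, not from the slides):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
a0, a1, sigma, pi0 = 0.0, 1.0, 1.0, 0.7
tau = pi0 / (1 - pi0)              # UCA threshold: pi_0 / pi_1

def L(y):                          # likelihood ratio p_1(y) / p_0(y)
    return norm.pdf(y, a1, sigma) / norm.pdf(y, a0, sigma)

n = 100_000
y0 = rng.normal(a0, sigma, n)      # observations drawn when the state is x_0
y1 = rng.normal(a1, sigma, n)      # observations drawn when the state is x_1

p_fp = np.mean(L(y0) > tau)        # estimated false positive probability
p_fn = np.mean(L(y1) <= tau)       # estimated false negative probability
p_err = pi0 * p_fp + (1 - pi0) * p_fn
print(p_fp, p_fn, p_err)
```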
30 Simple Binary Bayesian Hypothesis Testing with UCA. Uniform cost assignment: $C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$. In this case, the discriminant functions are simply
$$g_0(y,\pi) = \pi_1 p_1(y) = \pi_1(y)\, p(y), \qquad g_1(y,\pi) = \pi_0 p_0(y) = \pi_0(y)\, p(y)$$
and a Bayes decision rule can be written in terms of the posterior probabilities as
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } \pi_1(y) > \pi_0(y) \\ 0/1 & \text{if } \pi_1(y) = \pi_0(y) \\ 0 & \text{if } \pi_1(y) < \pi_0(y). \end{cases}$$
In this case, it should be clear that the Bayes decision rule is the MAP (maximum a posteriori) decision rule.
31 Example: Coherent Detection of BPSK. Suppose a transmitter sends one of two scalar signals and the signals arrive at a receiver corrupted by zero-mean additive white Gaussian noise (AWGN) with variance $\sigma^2$. We want to use Bayesian hypothesis testing to determine which signal was sent. Signal model conditioned on state $x_j$: $Y = a_j + \eta$, where $a_j$ is the scalar signal and $\eta \sim \mathcal{N}(0,\sigma^2)$. Hence
$$p_j(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - a_j)^2}{2\sigma^2} \right)$$
Hypotheses: $H_0$: $a_0$ was sent, or $Y \sim \mathcal{N}(a_0,\sigma^2)$; $H_1$: $a_1$ was sent, or $Y \sim \mathcal{N}(a_1,\sigma^2)$.
32 Example: Coherent Detection of BPSK. This is just a simple binary hypothesis testing problem. Under the UCA, we can write
$$R_0(\rho) = \int_{\mathcal{Y}_1} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - a_0)^2}{2\sigma^2} \right) dy, \qquad R_1(\rho) = \int_{\mathcal{Y}_0} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(y - a_1)^2}{2\sigma^2} \right) dy$$
where $\mathcal{Y}_j = \{y \in \mathcal{Y} : \rho_j(y) = 1\}$. (Figure: intuitively, the observation axis splits into a decision region $\mathcal{Y}_0$ around $a_0$ and a region $\mathcal{Y}_1$ around $a_1$.)
33 Example: Coherent Detection of BPSK. Given a prior $\pi_0$, $\pi_1 = 1 - \pi_0$, we can write the Bayes risk as
$$r(\rho,\pi) = \pi_0 R_0(\rho) + (1 - \pi_0) R_1(\rho)$$
The Bayes decision rule can be found by computing the likelihood ratio and comparing it to the threshold $\tau$:
$$\frac{p_1(y)}{p_0(y)} > \tau \;\Leftrightarrow\; \frac{\exp\left(-\frac{(y-a_1)^2}{2\sigma^2}\right)}{\exp\left(-\frac{(y-a_0)^2}{2\sigma^2}\right)} > \frac{\pi_0}{\pi_1} \;\Leftrightarrow\; \frac{(y-a_0)^2 - (y-a_1)^2}{2\sigma^2} > \ln\frac{\pi_0}{\pi_1} \;\Leftrightarrow\; y > \frac{a_0 + a_1}{2} + \frac{\sigma^2}{a_1 - a_0} \ln\frac{\pi_0}{\pi_1}$$
where the final step assumes $a_1 > a_0$.
34 Example: Coherent Detection of BPSK. Let $\gamma := \frac{a_0 + a_1}{2} + \frac{\sigma^2}{a_1 - a_0} \ln\frac{\pi_0}{\pi_1}$. Our Bayes decision rule is then
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } y > \gamma \\ 0/1 & \text{if } y = \gamma \\ 0 & \text{if } y < \gamma. \end{cases}$$
Using this decision rule, the Bayes risk is then
$$r(\delta^{B\pi},\pi) = \pi_0\, Q\!\left( \frac{\gamma - a_0}{\sigma} \right) + (1 - \pi_0)\, Q\!\left( \frac{a_1 - \gamma}{\sigma} \right)$$
where $Q(x) := \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$. Remarks: When $\pi_0 = \pi_1 = \frac{1}{2}$, the decision boundary is simply $\frac{a_0 + a_1}{2}$, i.e. the midpoint between $a_0$ and $a_1$, and $r(\delta^{B\pi},\pi) = Q\!\left( \frac{a_1 - a_0}{2\sigma} \right)$. When $\sigma$ is very small, the prior has little effect on the decision boundary. When $\sigma$ is very large, the prior becomes more important.
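A minimal numerical sketch of the BPSK detector above, using SciPy's survival function for $Q(x)$; the signal levels, noise standard deviation, and prior below are illustration values:

```python
import numpy as np
from scipy.stats import norm

def Q(x):
    """Gaussian tail probability Q(x) = Prob(N(0,1) > x)."""
    return norm.sf(x)

a0, a1, sigma, pi0 = -1.0, 1.0, 0.8, 0.3

# Decision threshold gamma and the resulting Bayes decision rule.
gamma = (a0 + a1) / 2 + sigma**2 / (a1 - a0) * np.log(pi0 / (1 - pi0))
decide = lambda y: 1 if y > gamma else 0

# Bayes risk (probability of error under the UCA).
risk = pi0 * Q((gamma - a0) / sigma) + (1 - pi0) * Q((a1 - gamma) / sigma)
print(f"gamma = {gamma:.3f}, Bayes risk = {risk:.4f}")
```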
35 Composite Binary Bayesian Hypothesis Testing. We have $N > 2$ states $x_0,\ldots,x_{N-1}$ and two hypotheses $H_0$ and $H_1$ that form a valid partition on these states. For each $y \in \mathcal{Y}$, we compute $m_y = \arg\min_{i\in\{0,1\}} g_i(y,\pi)$, where $g_i(y,\pi) = \sum_{j=0}^{N-1} \pi_j C_{ij} p_j(y)$. As was the case for simple binary Bayesian hypothesis testing, we only have two things to compare. The comparison boils down to
$$g_0(y,\pi) \ge g_1(y,\pi) \;\Leftrightarrow\; J(y) := \frac{\sum_j \pi_j C_{0j} p_j(y)}{\sum_j \pi_j C_{1j} p_j(y)} \ge 1.$$
Hence, a Bayes decision rule for the composite binary hypothesis testing problem is then simply
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } J(y) > 1 \\ 0/1 & \text{if } J(y) = 1 \\ 0 & \text{if } J(y) < 1. \end{cases}$$
36 Composite Binary Bayesian Hypothesis Testing with UCA. For $i = 0,1$ and $j = 0,\ldots,N-1$, the UCA is
$$C_{ij} = \begin{cases} 0 & \text{if } x_j \in H_i \\ 1 & \text{otherwise} \end{cases}$$
Hence, the discriminant functions are
$$g_i(y,\pi) = \sum_{x_j \in H_i^c} \pi_j p_j(y) = \left[ 1 - \sum_{x_j \in H_i} \pi_j(y) \right] p(y)$$
Under the UCA, the hypothesis test comparison boils down to
$$g_0(y,\pi) \ge g_1(y,\pi) \;\Leftrightarrow\; J(y) := \frac{\sum_{x_j \in H_1} \pi_j(y)}{\sum_{x_j \in H_0} \pi_j(y)} \ge 1$$
and the Bayes decision rule $\delta^{B\pi}(y)$ is the same as before. Note that
$$J(y) = \frac{\sum_{x_j \in H_1} \pi_j(y)}{\sum_{x_j \in H_0} \pi_j(y)} = \frac{\mathrm{Prob}(H_1 \mid y)}{\mathrm{Prob}(H_0 \mid y)}$$
so, as before, the Bayes decision rule is the MAP decision rule.
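A sketch of the composite binary MAP rule under the UCA: sum the posterior state probabilities over the states assigned to each hypothesis and pick the larger sum. The partition of states, the prior, and the densities are all user-supplied placeholders here:

```python
import numpy as np

def composite_binary_map(y, pdfs, prior, H1_states):
    """Composite binary Bayes (MAP) rule under the UCA.

    pdfs      : list of N callables, pdfs[j](y) = p_j(y)
    prior     : (N,) prior state probabilities
    H1_states : set of state indices assigned to H_1 (the rest form H_0)
    """
    post = prior * np.array([pj(y) for pj in pdfs])
    post /= post.sum()                          # posterior pi_j(y)
    prob_H1 = sum(post[j] for j in H1_states)   # Prob(H_1 | y)
    return 1 if prob_H1 > 1 - prob_H1 else 0    # compare against Prob(H_0 | y)
```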
37 Final Remarks on Bayesian Hypothesis Testing. 1. A Bayes decision rule minimizes the Bayes risk $r(\rho,\pi) = \sum_j \pi_j R_j(\rho)$ under the prior state distribution $\pi$. 2. At least one deterministic decision rule $\delta^{B\pi}(y)$ achieves the minimum Bayes risk for any prior state distribution $\pi$. 3. The prior state distribution $\pi$ can also be thought of as a parameter. By varying $\pi$, we can generate a family of Bayes decision rules $\delta^{B\pi}$ parameterized by $\pi$. This family of Bayes decision rules can achieve any operating point on the optimal tradeoff surface (or ROC curve). 4. In some applications, a prior state distribution is difficult to determine.
38 Bayesian Hypothesis Testing with an Incorrect Prior. (Figure: the conditional risk vector region of the deterministic decision rules in the $(R_0, R_1)$ plane, showing the minimum Bayes risk under our assumed prior, the minimum Bayes risk under the true prior, and the larger actual Bayes risk incurred by the mismatched rule; the specific prior and risk values are not reproduced here.)
39 Conclusions and Motivation for the Minimax Criterion. If the true prior state distribution $\pi_{\text{true}}$ is far from our belief $\pi$, a Bayes detector $\delta^{B\pi}$ may result in a significant loss in performance. One way to select a decision rule that is robust with respect to uncertainty in the prior is to minimize
$$\max_{\pi} r(\rho,\pi) = \max_j R_j(\rho).$$
The minimax criterion leads to a decision rule that minimizes the Bayes risk for the least favorable prior distribution. This minimax criterion is conservative, but it allows one to guarantee performance over all possible priors.