ECE531 Lecture 2b: Bayesian Hypothesis Testing


1 ECE531 Lecture 2b: Bayesian Hypothesis Testing
D. Richard Brown III, Worcester Polytechnic Institute, 29-January-2009.

2 Minimizing a Weighted Sum of Conditional Risks
We would like to minimize all of the conditional risks $R_0(D),\dots,R_{N-1}(D)$, but usually we have to make some sort of tradeoff. A standard approach to solving a multi-objective optimization problem is to form a linear combination of the functions to be minimized:
$$r(D,\lambda) := \sum_{j=0}^{N-1} \lambda_j R_j(D) = \lambda^\top R(D)$$
where $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$. The problem is then to solve the single-objective optimization problem:
$$D^* = \arg\min_{D \in \mathcal{D}} r(D,\lambda) = \arg\min_{D \in \mathcal{D}} \sum_{j=0}^{N-1} \lambda_j c_j^\top D p_j$$
where we are focusing on the case of finite $\mathcal{Y}$ for now.

3 Minimizing a Weighted Sum of Conditional Risks
The problem
$$D^* = \arg\min_{D \in \mathcal{D}} r(D,\lambda) = \arg\min_{D \in \mathcal{D}} \sum_{j=0}^{N-1} \lambda_j c_j^\top D p_j$$
is a finite-dimensional linear programming problem, since the variables that we control ($D_{ij}$) are constrained by the linear inequalities
$$D_{ij} \ge 0 \quad \text{and} \quad \sum_{i=0}^{M-1} D_{ij} = 1 \quad \text{for all } i \in \{0,\dots,M-1\} \text{ and } j \in \{0,\dots,L-1\}$$
and the objective function $r(D,\lambda)$ is linear in the variables $D_{ij}$. For notational convenience, we call this minimization problem LPF($\lambda$).

4 Decomposition: Part 1
Since
$$R_j(D) = c_j^\top D p_j = \sum_{i=0}^{M-1} \sum_{l=0}^{L-1} D_{il} P_{lj} C_{ij}$$
the problem LPF($\lambda$) can be written as
$$D^* = \arg\min_{D \in \mathcal{D}} \sum_{j=0}^{N-1} \lambda_j \sum_{i=0}^{M-1} \sum_{l=0}^{L-1} D_{il} P_{lj} C_{ij} = \arg\min_{D \in \mathcal{D}} \sum_{l=0}^{L-1} \underbrace{\sum_{i=0}^{M-1} D_{il} \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}}_{r_l(d,\lambda)}$$
Note that we can minimize each term $r_l(d,\lambda)$ separately.

5 Decomposition: Part 2
The $l$th subproblem is then
$$d^* = \arg\min_{d \in \mathbb{R}^M} r_l(d,\lambda) = \arg\min_{d} \sum_{i=0}^{M-1} d_i \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
subject to the constraints that $d_i \ge 0$ and $\sum_i d_i = 1$. Note that $d_i \leftrightarrow D_{il}$ from the original LPF($\lambda$) problem. For notational convenience, we call this subproblem LPF$_l(\lambda)$.

6 Some Intuition: The Hiker
Suppose you are going on a hike with a one-liter drinking bottle and you must completely fill the bottle before your hike. At the general store, you can fill your bottle with any combination of:
Tap water: $0.25 per liter
Iced tea: $0.50 per liter
Soda: $1.00 per liter
Red Bull: $5.00 per liter
You want to minimize your cost for the hike. What should you do? We can define a per-liter cost vector $h = [0.25, 0.50, 1.00, 5.00]^\top$ and a proportional purchase vector $\nu \in \mathbb{R}^4$ with elements satisfying $\nu_i \ge 0$ and $\sum_i \nu_i = 1$. Clearly, your cost $h^\top \nu$ is minimized when $\nu = [1,0,0,0]^\top$.
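A minimal numerical sketch of the hiker's cost minimization (assuming numpy is available); the prices are the per-liter costs from the slide, and the linear cost is minimized by putting all of the purchase fraction on the cheapest commodity.

```python
import numpy as np

# Per-liter costs for tap water, iced tea, soda, and Red Bull.
h = np.array([0.25, 0.50, 1.00, 5.00])

# Put all of the purchase fraction on the cheapest commodity.
nu = np.zeros_like(h)
nu[np.argmin(h)] = 1.0

print("purchase vector:", nu)          # [1. 0. 0. 0.]
print("total cost:", float(h @ nu))    # 0.25
```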

7 Solution to the Subproblem LPF$_l(\lambda)$
Theorem. For fixed $h \in \mathbb{R}^n$, the function $h^\top \nu$ is minimized over the set $\{\nu \in \mathbb{R}^n : \nu_i \ge 0 \text{ and } \sum_i \nu_i = 1\}$ by the vector $\nu$ with $\nu_m = 1$ and $\nu_k = 0$ for all $k \ne m$, where $m$ is any index with $h_m = \min_{i \in \{0,\dots,n-1\}} h_i$.
Note that, if two or more commodities have the same minimum price, the solution is not unique.
Back to our subproblem LPF$_l(\lambda)$:
$$d^* = \arg\min_{d \in \mathbb{R}^M} \sum_{i=0}^{M-1} d_i \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
subject to the constraints that $d_i \ge 0$ and $\sum_i d_i = 1$. The Theorem tells us how to easily solve LPF$_l(\lambda)$: we just have to find the index
$$m_l = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
and then set $d_{m_l} = 1$ and $d_i = 0$ for all $i \ne m_l$.

8 Solution to the Problem LPF($\lambda$)
Solving the subproblem LPF$_l(\lambda)$ for $l = 0,\dots,L-1$ and assembling the results of the subproblems into $M \times L$ decision matrices yields a non-empty, finite set of deterministic decision matrices that solve LPF($\lambda$). All deterministic decision matrices in the set have the same minimal cost:
$$r^*(\lambda) = \sum_{l=0}^{L-1} \min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$$
It is easy to show that any randomized decision rule in the convex hull of this set of deterministic decision matrices also achieves $r^*(\lambda)$, and hence is also optimal.
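A minimal sketch of this column-by-column construction in Python (assuming numpy), with made-up values for the costs $C$, the conditional observation probabilities $P$, and the weights $\lambda$; none of these numbers come from the lecture's example.

```python
import numpy as np

# Hypothetical problem data (not from the lecture): M hypotheses,
# N states, L possible observations.
C = np.array([[0.0, 1.0, 1.0],    # C[i, j] = cost of deciding H_i when the state is x_j
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
P = np.array([[0.7, 0.2, 0.1],    # P[l, j] = Prob(observe y_l | state x_j)
              [0.2, 0.6, 0.3],
              [0.1, 0.2, 0.6]])
lam = np.array([0.5, 0.3, 0.2])   # risk weights, nonnegative and summing to one

# G[i, l] = sum_j lam[j] * C[i, j] * P[l, j]
G = (C * lam) @ P.T

# Solve each subproblem LPF_l(lam): put all mass on the cheapest row of column l.
M, L = G.shape
D = np.zeros((M, L))
D[G.argmin(axis=0), np.arange(L)] = 1.0

r_star = G.min(axis=0).sum()      # minimum weighted risk r*(lam)
print(D)
print("r*(lam) =", r_star)
```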

9 Example Part 1: Bayesian Hypothesis Testing
Let $G_{il} := \sum_{j=0}^{N-1} \lambda_j C_{ij} P_{lj}$ and $G \in \mathbb{R}^{M \times L}$ be the matrix composed of elements $G_{il}$. Suppose $G$ is given (matrix omitted). What is the solution to the subproblem LPF$_0(\lambda)$? $d^* = [1,0,0]^\top$. What is the solution to the problem LPF($\lambda$)? $D^*$ (matrix omitted). Is this solution unique? Yes. What is $r^*(\lambda)$? $r^*(\lambda) = 1.0$.

10 Example Part 2: Bayesian Hypothesis Testing
What happens if $G$ is changed (matrix omitted)? The solution to LPF$_2(\lambda)$ is no longer unique. Two deterministic decision matrices (omitted) both solve LPF($\lambda$) and achieve $r^*(\lambda) = 1.1$. In fact, any decision matrix that interpolates between these two solutions with a parameter $\alpha \in [0,1]$ (i.e., any point in their convex hull) will also achieve $r^*(\lambda) = 1.1$.

11 Remarks
Note that the solution to the subproblem LPF$_l(\lambda)$ always yields at least one deterministic decision rule for the case $y = y_l$. If $n_l$ elements of the column $G_l$ are equal to the minimum of the column, then there are $n_l$ deterministic decision rules solving the subproblem LPF$_l(\lambda)$. There are $\prod_l n_l \ge 1$ deterministic decision rules that solve LPF($\lambda$). If there is more than one deterministic solution to LPF($\lambda$), then there is also a corresponding convex hull of randomized decision rules that achieve $r^*(\lambda)$.
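A quick illustration of the counting argument above, using a hypothetical $G$ matrix with a tie in one column (not the lecture's example): count how many entries of each column achieve the column minimum and take the product.

```python
import numpy as np

# Hypothetical G matrix with a tie in its last column.
G = np.array([[0.3, 0.9, 0.4],
              [0.6, 0.2, 0.4],
              [0.8, 0.7, 0.9]])

# n_l = number of entries in column l achieving the column minimum.
n = np.isclose(G, G.min(axis=0)).sum(axis=0)
print("ties per column:", n)                        # [1 1 2]
print("deterministic solutions:", int(np.prod(n)))  # 2
```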

12 Working Example: Effect of Changing the Weighting λ
[Figure: conditional risks of the deterministic decision rules (labeled through D8) plotted in the $(R_0, R_1)$ plane, illustrating how the weighting $\lambda$ selects different optimal operating points.]

13 Weighted Risk Level Sets
Notice in the last example that, by varying our risk weighting $\lambda$, we can create a family of level sets that cover every point of the optimal tradeoff surface. Given a constant $c \in \mathbb{R}$, the level set of value $c$ is defined as
$$L^{\lambda}_{c} := \{ x \in \mathbb{R}^N : \lambda^\top x = c \}$$
[Figure: level sets of $\lambda^\top x$ in the $(x_0, x_1)$ plane for two example weight vectors $\lambda$ (values omitted).]

14 Infinite Observation Spaces: Part 1
When the observations are specified by conditional pdfs, our conditional risk function becomes
$$R_j(\rho) = \int_{y \in \mathcal{Y}} \left[ \sum_{i=0}^{M-1} \rho_i(y) C_{ij} \right] p_j(y)\, dy$$
Recall that the randomized decision rule $\rho_i(y)$ specifies the probability of deciding $H_i$ when the observation is $y$. We still have $N < \infty$ states, so, as before, we can write a linear combination of the conditional risk functions as
$$r(\rho,\lambda) := \sum_{j=0}^{N-1} \lambda_j R_j(\rho)$$
where $\lambda_j \ge 0$ and $\sum_j \lambda_j = 1$.

15 Infinite Observation Spaces: Part 2
The problem is then to solve
$$\rho^* = \arg\min_{\rho \in \mathcal{D}} r(\rho,\lambda) = \arg\min_{\rho \in \mathcal{D}} \sum_{j=0}^{N-1} \lambda_j \int_{y \in \mathcal{Y}} \left[ \sum_{i=0}^{M-1} \rho_i(y) C_{ij} \right] p_j(y)\, dy$$
where $\mathcal{D}$ is the set of all valid decision rules satisfying $\rho_i(y) \ge 0$ for all $i = 0,\dots,M-1$ and $y \in \mathcal{Y}$, as well as $\sum_{i=0}^{M-1} \rho_i(y) = 1$ for all $y \in \mathcal{Y}$. This is an infinite-dimensional linear programming problem. We refer to this problem as LP($\lambda$).

16 Infinite Observation Spaces: Decomposition Part 1
As before, we can decompose the big problem LP($\lambda$) into lots of little subproblems, one for each observation $y \in \mathcal{Y}$. We rewrite LP($\lambda$) as
$$\rho^* = \arg\min_{\rho \in \mathcal{D}} \sum_{j=0}^{N-1} \lambda_j \int_{y \in \mathcal{Y}} \left[ \sum_{i=0}^{M-1} \rho_i(y) C_{ij} \right] p_j(y)\, dy = \arg\min_{\rho \in \mathcal{D}} \int_{y \in \mathcal{Y}} \sum_{i=0}^{M-1} \rho_i(y) \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)\, dy$$
To minimize the integral, we can minimize the term
$$(\cdots) = \sum_{i=0}^{M-1} \rho_i(y) \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)$$
at each fixed value of $y$.

17 Infinite Observation Spaces: Decomposition Part 2
The subproblem of minimizing the term $(\cdots)$ inside the integral when $y = y_0$ is a finite-dimensional linear programming problem (since $\mathcal{X}$ is still finite):
$$\nu^* = \arg\min_{\nu} \sum_{i=0}^{M-1} \nu_i \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y_0)$$
subject to $\nu_i \ge 0$ for all $i = 0,\dots,M-1$ and $\sum_{i=0}^{M-1} \nu_i = 1$. We already know how to do this. We just find the index that has the lowest commodity cost
$$m_{y_0} = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y_0)$$
and set $\nu_{m_{y_0}} = 1$ and all other $\nu_i = 0$. Same technique as solving LPF$_l(\lambda)$.

18 Infinite Observation Spaces: Minimum Weighted Risk
As we saw with finite $\mathcal{Y}$, solving LP($\lambda$) will yield at least one deterministic decision rule that achieves the minimum weighted risk
$$r^*(\lambda) = \int_{y \in \mathcal{Y}} \min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)\, dy = \int_{y \in \mathcal{Y}} \min_{i \in \{0,\dots,M-1\}} g_i(y,\lambda)\, dy$$
where $g_i(y,\lambda) := \sum_{j=0}^{N-1} \lambda_j C_{ij} p_j(y)$.
1. The existence of at least one deterministic decision rule solving LPF($\lambda$) or LP($\lambda$) implies the existence of an optimal partition of $\mathcal{Y}$.
2. Since the problems LPF($\lambda$) and LP($\lambda$) always yield at least one deterministic decision rule, another way to think of these problems is: find a partition of the observation space that minimizes the weighted risk.

19 Infinite Observation Spaces: Putting it All Together
A decision rule that solves LP($\lambda$) is then
$$\rho_i^*(y) = \begin{cases} 1 & \text{if } i = \arg\min_{m \in \{0,\dots,M-1\}} g_m(y,\lambda) \\ 0 & \text{otherwise} \end{cases}$$
where $\rho^*(y) = [\rho_0^*(y),\dots,\rho_{M-1}^*(y)]^\top$. Remarks:
1. The solution to LP($\lambda$) may not be unique.
2. If there is more than one deterministic solution to LP($\lambda$), then there is also a corresponding convex hull of randomized decision rules that achieve $r^*(\lambda)$.
3. Note that the standard convention for deterministic decision rules is $\delta : \mathcal{Y} \to \mathcal{Z}$. An equivalent way of specifying a deterministic decision rule that solves LP($\lambda$) is $\delta^*(y) = \arg\min_{i \in \{0,\dots,M-1\}} g_i(y,\lambda)$.
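A minimal sketch of the rule $\delta^*(y) = \arg\min_i g_i(y,\lambda)$ for a continuous observation, assuming Gaussian conditional densities $p_j(y)$ and illustrative costs and weights (none of these numbers are from the lecture); scipy supplies the Gaussian pdf.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical three-state, two-hypothesis setup (numbers are illustrative only).
means, sigma = np.array([-1.0, 0.0, 2.0]), 1.0   # p_j(y) = N(means[j], sigma^2)
C = np.array([[0.0, 0.0, 1.0],                   # C[i, j]: cost of deciding H_i in state x_j
              [1.0, 1.0, 0.0]])
lam = np.array([0.4, 0.3, 0.3])                  # risk weights

def g(y, i):
    """Discriminant g_i(y, lam) = sum_j lam[j] * C[i, j] * p_j(y)."""
    p = norm.pdf(y, loc=means, scale=sigma)      # vector of p_j(y), j = 0..N-1
    return float(np.sum(lam * C[i] * p))

def delta_star(y):
    """Deterministic rule solving LP(lam) at the observation y."""
    return int(np.argmin([g(y, i) for i in range(C.shape[0])]))

for y in (-0.5, 1.0, 2.5):
    print(y, "->", "H%d" % delta_star(y))
```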

20 The Bayesian Approach
We assume a prior state distribution $\pi \in \mathcal{P}_N$ such that $\mathrm{Prob}(\text{state is } x_j) = \pi_j$. Note that this is our model of the state probabilities prior to the observation. We denote the Bayes risk of the decision rule $\rho$ as
$$r(\rho,\pi) = \sum_{j=0}^{N-1} \pi_j R_j(\rho) = \sum_{j=0}^{N-1} \pi_j \int_{y \in \mathcal{Y}} p_j(y) \sum_{i=0}^{M-1} C_{ij} \rho_i(y)\, dy.$$
This is simply the weighted overall risk, or average risk, given our prior belief of the state probabilities. A decision rule that minimizes this risk is called a Bayes decision rule for the prior $\pi$.

21 Solving Bayesian Hypothesis Testing Problems
We know how to solve this problem. For finite $\mathcal{Y}$, we just find the index
$$m_l = \arg\min_{i \in \{0,\dots,M-1\}} \underbrace{\sum_{j=0}^{N-1} \pi_j C_{ij} P_{lj}}_{G_{il}}$$
and set $D^{B\pi}_{m_l,l} = 1$ and $D^{B\pi}_{i,l} = 0$ for all $i \ne m_l$. For infinite $\mathcal{Y}$, we just find the index
$$m_{y_0} = \arg\min_{i \in \{0,\dots,M-1\}} \underbrace{\sum_{j=0}^{N-1} \pi_j C_{ij} p_j(y_0)}_{g_i(y_0,\pi)}$$
and set $\delta^{B\pi}(y_0) = m_{y_0}$ (or, equivalently, set $\rho^{B\pi}_{m_{y_0}}(y_0) = 1$ and $\rho^{B\pi}_i(y_0) = 0$ for all $i \ne m_{y_0}$).

22 Prior and Posterior Probabilities
The conditional probability that $H_j$ is true given the observation $y$ is
$$\pi_j(y) := \mathrm{Prob}(H_j \text{ is true} \mid Y = y) = \frac{p_j(y)\,\pi_j}{p(y)}$$
where $p(y) = \sum_{j=0}^{N-1} \pi_j p_j(y)$. Recall that $\pi_j$ is the prior probability that $H_j$ is true, before we have any observations. The quantity $\pi_j(y)$ is the posterior probability of $H_j$, conditioned on the observation $y$.
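A small sketch of the posterior computation $\pi_j(y) = \pi_j p_j(y)/p(y)$, assuming Gaussian likelihoods and a prior chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

prior = np.array([0.7, 0.3])             # pi_j: illustrative prior on two states
means, sigma = np.array([0.0, 1.0]), 1.0

def posterior(y):
    """pi_j(y) = pi_j * p_j(y) / p(y), with p(y) = sum_j pi_j p_j(y)."""
    likelihood = norm.pdf(y, loc=means, scale=sigma)
    unnorm = prior * likelihood
    return unnorm / unnorm.sum()

print(posterior(0.8))   # posterior probabilities of H_0 and H_1 given y = 0.8
```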

23 A Bayes Decision Rule Minimizes the Posterior Cost
General expression for a deterministic Bayes decision rule:
$$\delta^{B\pi}(y) = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} \pi_j C_{ij} p_j(y)$$
But $\pi_j p_j(y) = \pi_j(y) p(y)$, hence
$$\delta^{B\pi}(y) = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} C_{ij} \pi_j(y) p(y) = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} C_{ij} \pi_j(y)$$
since $p(y)$ does not affect the minimizer. What does this mean? $\sum_j C_{ij} \pi_j(y)$ is the average cost of choosing hypothesis $H_i$ given $Y = y$, i.e. the posterior cost of choosing $H_i$. The Bayes decision rule chooses the hypothesis that yields the minimum expected posterior cost.

24 Bayesian Hypothesis Testing with UCA: Part 1
The uniform cost assignment (UCA):
$$C_{ij} = \begin{cases} 0 & \text{if } x_j \in H_i \\ 1 & \text{otherwise} \end{cases}$$
The conditional risk $R_j(\rho)$ under the UCA is simply the probability of not choosing the hypothesis that contains $x_j$, i.e. the probability of error when the state is $x_j$. The Bayes risk in this case is
$$r(\rho,\pi) = \sum_{j=0}^{N-1} \pi_j R_j(\rho) = \mathrm{Prob(error)}.$$

25 Bayesian Hypothesis Testing with UCA: Part 2
Under the UCA, a Bayes decision rule can be written in terms of the posterior probabilities as
$$\delta^{B\pi}(y) = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{j=0}^{N-1} C_{ij} \pi_j(y) = \arg\min_{i \in \{0,\dots,M-1\}} \sum_{x_j \notin H_i} \pi_j(y) = \arg\min_{i \in \{0,\dots,M-1\}} \left[ 1 - \sum_{x_j \in H_i} \pi_j(y) \right] = \arg\max_{i \in \{0,\dots,M-1\}} \sum_{x_j \in H_i} \pi_j(y)$$
Hence, for hypothesis tests under the UCA, the Bayes decision rule is the MAP (maximum a posteriori) decision rule. When the hypothesis test is simple, $\delta^{B\pi}(y) = \arg\max_i \pi_i(y)$.

26 Bayesian Hypothesis Testing with UCA and Uniform Prior
Under the UCA and a uniform prior, i.e. $\pi_j = 1/N$ for all $j = 0,\dots,N-1$,
$$\delta^{B\pi}(y) = \arg\max_{i \in \{0,\dots,M-1\}} \sum_{x_j \in H_i} \pi_j(y) = \arg\max_{i \in \{0,\dots,M-1\}} \sum_{x_j \in H_i} \pi_j p_j(y) = \arg\max_{i \in \{0,\dots,M-1\}} \sum_{x_j \in H_i} p_j(y)$$
since $\pi_j = 1/N$ does not affect the maximizer. In this case, $\delta^{B\pi}(y)$ selects the most likely hypothesis, i.e. the hypothesis which best explains the observation $y$. This is called the maximum likelihood (ML) decision rule. Which is better, MAP or ML?
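A sketch contrasting the MAP and ML rules for a simple test with Gaussian likelihoods; the non-uniform prior and the means are illustrative assumptions, not values from the lecture.

```python
import numpy as np
from scipy.stats import norm

prior = np.array([0.9, 0.1])             # illustrative, non-uniform prior
means, sigma = np.array([0.0, 1.5]), 1.0

def map_rule(y):
    """MAP: maximize the posterior, i.e. prior[j] * p_j(y)."""
    return int(np.argmax(prior * norm.pdf(y, loc=means, scale=sigma)))

def ml_rule(y):
    """ML: maximize the likelihood p_j(y) alone (uniform-prior special case)."""
    return int(np.argmax(norm.pdf(y, loc=means, scale=sigma)))

y = 0.9
print("MAP decides H%d, ML decides H%d" % (map_rule(y), ml_rule(y)))
```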

27 Simple Binary Bayesian Hypothesis Testing: Part 1
We have two states $x_0$ and $x_1$ and two hypotheses $H_0$ and $H_1$. For each $y \in \mathcal{Y}$, our problem is to compute $m_y = \arg\min_{i \in \{0,1\}} g_i(y,\pi)$, where
$$g_0(y,\pi) = \pi_0 C_{00} p_0(y) + \pi_1 C_{01} p_1(y)$$
$$g_1(y,\pi) = \pi_0 C_{10} p_0(y) + \pi_1 C_{11} p_1(y)$$
We only have two things to compare. We can simplify this comparison:
$$g_0(y,\pi) \ge g_1(y,\pi)$$
$$\Leftrightarrow\quad \pi_0 C_{00} p_0(y) + \pi_1 C_{01} p_1(y) \ge \pi_0 C_{10} p_0(y) + \pi_1 C_{11} p_1(y)$$
$$\Leftrightarrow\quad p_1(y)\,\pi_1 (C_{01} - C_{11}) \ge p_0(y)\,\pi_0 (C_{10} - C_{00})$$
$$\Leftrightarrow\quad \frac{p_1(y)}{p_0(y)} \ge \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$$
where we have assumed that $C_{01} > C_{11}$ to get the final result. The expression $L(y) := \frac{p_1(y)}{p_0(y)}$ is known as the likelihood ratio.

28 Simple Binary Bayesian Hypothesis Testing: Part 2
Given $L(y) := \frac{p_1(y)}{p_0(y)}$ and $\tau := \frac{\pi_0 (C_{10} - C_{00})}{\pi_1 (C_{01} - C_{11})}$, our Bayes decision rule is then simply
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } L(y) > \tau \\ 0 \text{ or } 1 & \text{if } L(y) = \tau \\ 0 & \text{if } L(y) < \tau. \end{cases}$$
Remark: For any $y \in \mathcal{Y}$ that results in $L(y) = \tau$, the Bayes risk is the same whether we decide $H_0$ or $H_1$. You can deal with this by always deciding $H_0$ in this case, or always deciding $H_1$, or flipping a coin, etc.
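A compact sketch of the threshold test, assuming Gaussian densities $p_0$ and $p_1$ and illustrative costs and prior; the threshold $\tau$ follows the formula on this slide.

```python
import numpy as np
from scipy.stats import norm

# Illustrative setup (not from the lecture).
pi0, pi1 = 0.6, 0.4
C = np.array([[0.0, 2.0],     # C[i, j]: cost of deciding H_i when the state is x_j
              [1.0, 0.0]])
p0 = norm(loc=0.0, scale=1.0)
p1 = norm(loc=1.0, scale=1.0)

tau = (pi0 * (C[1, 0] - C[0, 0])) / (pi1 * (C[0, 1] - C[1, 1]))

def bayes_rule(y):
    """Likelihood ratio test: decide H_1 iff L(y) = p1(y)/p0(y) exceeds tau."""
    L = p1.pdf(y) / p0.pdf(y)
    return 1 if L > tau else 0   # ties broken in favor of H_0 here

print("tau =", tau)
print([bayes_rule(y) for y in (-0.5, 0.5, 1.5)])
```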

29 Performance of Simple Binary Bayesian Hypothesis Testing
Since the observation $Y$ is a random variable, so is the likelihood ratio $L(Y)$. Assume that $L(Y)$ has a conditional pdf $\Lambda_j(t)$ when the state is $x_j$. When the state is $x_0$, the probability of making an error (deciding $H_1$) is
$$P_{\text{false positive}} = \mathrm{Prob}(L(Y) > \tau \mid \text{state is } x_0) = \int_\tau^\infty \Lambda_0(t)\, dt$$
Similarly, when the state is $x_1$, the probability of making an error (deciding $H_0$) is
$$P_{\text{false negative}} = \mathrm{Prob}(L(Y) \le \tau \mid \text{state is } x_1) = \int_{-\infty}^\tau \Lambda_1(t)\, dt$$
What is the overall (unconditional) probability of error?
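A Monte Carlo sketch of these error probabilities for an assumed pair of Gaussian states and an assumed threshold (all numbers illustrative); the unconditional probability of error weights the two conditional error probabilities by the prior.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
pi0, a0, a1, sigma, tau = 0.6, 0.0, 1.0, 1.0, 0.75
n = 200_000

def lr(y):
    """Likelihood ratio L(y) = p1(y)/p0(y) for the two Gaussian states."""
    return norm.pdf(y, a1, sigma) / norm.pdf(y, a0, sigma)

y0 = rng.normal(a0, sigma, n)          # observations drawn under state x_0
y1 = rng.normal(a1, sigma, n)          # observations drawn under state x_1

p_fp = np.mean(lr(y0) > tau)           # Prob(decide H_1 | x_0)
p_fn = np.mean(lr(y1) <= tau)          # Prob(decide H_0 | x_1)
p_err = pi0 * p_fp + (1 - pi0) * p_fn  # overall probability of error

print(p_fp, p_fn, p_err)
```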

30 Simple Binary Bayesian Hypothesis Testing with UCA
Uniform cost assignment: $C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$. In this case, the discriminant functions are simply
$$g_0(y,\pi) = \pi_1 p_1(y) = \pi_1(y) p(y)$$
$$g_1(y,\pi) = \pi_0 p_0(y) = \pi_0(y) p(y)$$
and a Bayes decision rule can be written in terms of the posterior probabilities as
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } \pi_1(y) > \pi_0(y) \\ 0 \text{ or } 1 & \text{if } \pi_1(y) = \pi_0(y) \\ 0 & \text{if } \pi_1(y) < \pi_0(y). \end{cases}$$
In this case, it should be clear that the Bayes decision rule is the MAP (maximum a posteriori) decision rule.

31 Example: Coherent Detection of BPSK
Suppose a transmitter sends one of two scalar signals and the signals arrive at a receiver corrupted by zero-mean additive white Gaussian noise (AWGN) with variance $\sigma^2$. We want to use Bayesian hypothesis testing to determine which signal was sent. Signal model conditioned on state $x_j$: $Y = a_j + \eta$, where $a_j$ is the scalar signal and $\eta \sim \mathcal{N}(0,\sigma^2)$. Hence
$$p_j(y) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - a_j)^2}{2\sigma^2} \right)$$
Hypotheses:
H_0: $a_0$ was sent, or $Y \sim \mathcal{N}(a_0,\sigma^2)$
H_1: $a_1$ was sent, or $Y \sim \mathcal{N}(a_1,\sigma^2)$

32 Example: Coherent Detection of BPSK
This is just a simple binary hypothesis testing problem. Under the UCA, we can write
$$R_0(\rho) = \int_{\mathcal{Y}_1} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - a_0)^2}{2\sigma^2} \right) dy$$
$$R_1(\rho) = \int_{\mathcal{Y}_0} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left( -\frac{(y - a_1)^2}{2\sigma^2} \right) dy$$
where $\mathcal{Y}_j = \{y \in \mathcal{Y} : \rho_j(y) = 1\}$. [Figure: intuitively, the real line splits into a decision region $\mathcal{Y}_0$ around $a_0$ and a region $\mathcal{Y}_1$ around $a_1$.]

33 Example: Coherent Detection of BPSK
Given a prior $\pi_0$, $\pi_1 = 1 - \pi_0$, we can write the Bayes risk as
$$r(\rho,\pi) = \pi_0 R_0(\rho) + (1 - \pi_0) R_1(\rho)$$
The Bayes decision rule can be found by computing the likelihood ratio and comparing it to the threshold $\tau$:
$$\frac{p_1(y)}{p_0(y)} > \tau \;\Leftrightarrow\; \frac{\exp\!\left( -\frac{(y - a_1)^2}{2\sigma^2} \right)}{\exp\!\left( -\frac{(y - a_0)^2}{2\sigma^2} \right)} > \frac{\pi_0}{\pi_1} \;\Leftrightarrow\; \frac{(y - a_0)^2 - (y - a_1)^2}{2\sigma^2} > \ln\frac{\pi_0}{\pi_1} \;\Leftrightarrow\; y > \frac{a_0 + a_1}{2} + \frac{\sigma^2}{a_1 - a_0} \ln\frac{\pi_0}{\pi_1}$$
where the last step assumes $a_1 > a_0$.

34 Example: Coherent Detection of BPSK
Let $\gamma := \frac{a_0 + a_1}{2} + \frac{\sigma^2}{a_1 - a_0} \ln\frac{\pi_0}{\pi_1}$. Our Bayes decision rule is then:
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } y > \gamma \\ 0 \text{ or } 1 & \text{if } y = \gamma \\ 0 & \text{if } y < \gamma. \end{cases}$$
Using this decision rule, the Bayes risk is then
$$r(\delta^{B\pi},\pi) = \pi_0\, Q\!\left( \frac{\gamma - a_0}{\sigma} \right) + (1 - \pi_0)\, Q\!\left( \frac{a_1 - \gamma}{\sigma} \right)$$
where $Q(x) := \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2}\, dt$. Remarks: When $\pi_0 = \pi_1 = \frac{1}{2}$, the decision boundary is simply $\frac{a_0 + a_1}{2}$, i.e. the midpoint between $a_0$ and $a_1$, and $r(\delta^{B\pi},\pi) = Q\!\left( \frac{a_1 - a_0}{2\sigma} \right)$. When $\sigma$ is very small, the prior has little effect on the decision boundary. When $\sigma$ is very large, the prior becomes more important.
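A numerical sketch of the threshold $\gamma$ and the resulting Bayes risk, using scipy's survival function for $Q(x)$ and illustrative values of $a_0$, $a_1$, $\sigma$, and $\pi_0$, with a Monte Carlo sanity check.

```python
import numpy as np
from scipy.stats import norm

a0, a1, sigma, pi0 = -1.0, 1.0, 0.8, 0.3   # illustrative BPSK parameters
pi1 = 1.0 - pi0

# Decision threshold gamma and Bayes risk from the formulas on this slide.
gamma = (a0 + a1) / 2 + sigma**2 / (a1 - a0) * np.log(pi0 / pi1)
Q = norm.sf                                  # Q(x) = 1 - Phi(x)
risk = pi0 * Q((gamma - a0) / sigma) + pi1 * Q((a1 - gamma) / sigma)

print("gamma =", gamma)
print("Bayes risk =", risk)

# Quick Monte Carlo sanity check of the probability of error.
rng = np.random.default_rng(1)
n = 200_000
state = rng.random(n) < pi1                  # True -> state x_1
y = np.where(state, a1, a0) + rng.normal(0.0, sigma, n)
decide1 = y > gamma
print("empirical error =", np.mean(decide1 != state))
```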

35 Composite Binary Bayesian Hypothesis Testing
We have $N > 2$ states $x_0,\dots,x_{N-1}$ and two hypotheses $H_0$ and $H_1$ that form a valid partition of these states. For each $y \in \mathcal{Y}$, we compute $m_y = \arg\min_{i \in \{0,1\}} g_i(y,\pi)$, where $g_i(y,\pi) = \sum_{j=0}^{N-1} \pi_j C_{ij} p_j(y)$. As was the case for simple binary Bayesian hypothesis testing, we only have two things to compare. The comparison boils down to
$$g_0(y,\pi) \ge g_1(y,\pi) \;\Leftrightarrow\; J(y) := \frac{\sum_j \pi_j C_{0j} p_j(y)}{\sum_j \pi_j C_{1j} p_j(y)} \ge 1.$$
Hence, a Bayes decision rule for the composite binary hypothesis testing problem is simply
$$\delta^{B\pi}(y) = \begin{cases} 1 & \text{if } J(y) > 1 \\ 0 \text{ or } 1 & \text{if } J(y) = 1 \\ 0 & \text{if } J(y) < 1. \end{cases}$$

36 Composite Binary Bayesian Hypothesis Testing with UCA
For $i = 0,1$ and $j = 0,\dots,N-1$, the UCA is
$$C_{ij} = \begin{cases} 0 & \text{if } x_j \in H_i \\ 1 & \text{otherwise} \end{cases}$$
Hence, the discriminant functions are
$$g_i(y,\pi) = \sum_{x_j \in H_i^c} \pi_j p_j(y) = \left[ 1 - \sum_{x_j \in H_i} \pi_j(y) \right] p(y)$$
Under the UCA, the hypothesis test comparison boils down to
$$g_0(y,\pi) \ge g_1(y,\pi) \;\Leftrightarrow\; J(y) := \frac{\sum_{x_j \in H_1} \pi_j(y)}{\sum_{x_j \in H_0} \pi_j(y)} \ge 1$$
and the Bayes decision rule $\delta^{B\pi}(y)$ is the same as before. Note that
$$J(y) = \frac{\sum_{x_j \in H_1} \pi_j(y)}{\sum_{x_j \in H_0} \pi_j(y)} = \frac{\mathrm{Prob}(H_1 \mid y)}{\mathrm{Prob}(H_0 \mid y)}$$
so, as before, the Bayes decision rule is the MAP decision rule.
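A sketch of the composite binary MAP test under the UCA, with an assumed partition of three Gaussian states into $H_0 = \{x_0\}$ and $H_1 = \{x_1, x_2\}$ and an illustrative prior (none of these values come from the lecture).

```python
import numpy as np
from scipy.stats import norm

prior = np.array([0.5, 0.3, 0.2])          # illustrative prior over three states
means, sigma = np.array([0.0, 2.0, 3.0]), 1.0
H0, H1 = [0], [1, 2]                       # assumed partition of the states

def J(y):
    """J(y) = Prob(H_1 | y) / Prob(H_0 | y) under the UCA."""
    post = prior * norm.pdf(y, loc=means, scale=sigma)
    post = post / post.sum()
    return post[H1].sum() / post[H0].sum()

def delta(y):
    return 1 if J(y) > 1 else 0            # ties broken toward H_0

for y in (0.2, 1.4, 2.5):
    print(y, "->", "H%d" % delta(y))
```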

37 Final Remarks on Bayesian Hypothesis Testing
1. A Bayes decision rule minimizes the Bayes risk $r(\rho,\pi) = \sum_j \pi_j R_j(\rho)$ under the prior state distribution $\pi$.
2. At least one deterministic decision rule $\delta^{B\pi}(y)$ achieves the minimum Bayes risk for any prior state distribution $\pi$.
3. The prior state distribution $\pi$ can also be thought of as a parameter. By varying $\pi$, we can generate a family of Bayes decision rules $\delta^{B\pi}$ parameterized by $\pi$. This family of Bayes decision rules can achieve any operating point on the optimal tradeoff surface (or ROC curve).
4. In some applications, a prior state distribution is difficult to determine.

38 Bayesian Hypothesis Testing with an Incorrect Prior
[Figure: conditional risk vectors (CRVs) of the deterministic decision rules in the $(R_0, R_1)$ plane, comparing the minimum Bayes risk under our assumed prior, the minimum Bayes risk under the true prior, and the actual Bayes risk incurred (specific prior and risk values omitted).]

39 Conclusions and Motivation for the Minimax Criterion
If the true prior state distribution $\pi_{\text{true}}$ is far from our belief $\pi$, a Bayes detector $\delta^{B\pi}$ may result in a significant loss in performance. One way to select a decision rule that is robust with respect to uncertainty in the prior is to minimize
$$\max_\pi r(\rho,\pi) = \max_j R_j(\rho).$$
The minimax criterion leads to a decision rule that minimizes the Bayes risk for the least favorable prior distribution. This minimax criterion is conservative, but it allows one to guarantee performance over all possible priors.
