S T A T I S T I C S. Jan Ámos Víšek 2008/09. Institute of Economic Studies, Faculty of Social Sciences Charles University, Prague


1 S T A T I S T I C S (The Thirteenth Lecture), 2008/09. Institute of Economic Studies, Faculty of Social Sciences, Charles University, Prague. visek/statistika/

2 Content of lecture
1 Recalling an example - an inspiration for the summer term: the regression model
2 What an estimator is - definition
3 Basic types of estimators - point versus interval ones

3 Regression model. In the second lecture we considered the REGRESSION MODEL. In one of the previous lectures we recalled it and used it as a motivation, e.g. for introducing the Varadarajan theorem. Do you remember what it is?

4 Regression model
REGRESSION MODEL: Y_i = X_i' β^0 + ε_i = Σ_{j=1}^p X_{ij} β^0_j + ε_i, i = 1, 2, ..., n, where (for the i-th object)
Y_i - response variable (vysvětlovaná veličina),
X_i ∈ R^p - explanatory variables (vysvětlující proměnné),
β^0 - regression coefficients (regresní koeficienty),
ε_i - error term (chybový člen).
What do we need to learn?
1 A construction of an estimator β̂ of the regression coefficients β^0.
2 A test of the hypothesis that β^0 = 0.
Galton, F. (1886): Regression towards mediocrity in hereditary stature. Journal of the Anthropological Institute, vol. 15.

5 Regression model. These two tasks - estimating the regression coefficients and testing statistical hypotheses - will be our topics for the rest of the summer term. Keep in mind: To guess is cheap, to guess wrongly is (extremely) expensive. (Chinese proverb)

6 Returning once again to the REGRESSION MODEL
Y_i = X_i' β^0 + ε_i = Σ_{j=1}^p X_{ij} β^0_j + ε_i, i = 1, 2, ..., n.
Let us analyze the situation and decide what to do:
1 Somebody brought data, say
(Y^(n), X^(n)) = (y_1, x_11, ..., x_1p; y_2, x_21, ..., x_2p; ...; y_n, x_n1, ..., x_np).
2 She/he assumes that we shall create a (regression) model. A (very) first step is to estimate β^0 by an estimate β̂ = β̂((Y^(n), X^(n))).

7 Conclusion and further considerations:
1 The estimate is to be a function of the data, i.e. the estimate is a mapping (zobrazení) from the observational space to the parameter space. Formally, β̂((Y^(n), X^(n))): (Y^(n), X^(n)) → R^p.
2 As the data are a realization of some r.v.'s, it would be reasonable to express the estimated knowledge about the true value of the parameter in probabilistic assertions. We shall plug the random variables Y^(n)(ω) and X^(n)(ω) into the estimate β̂((Y^(n), X^(n))). So we obtain a mapping β̂(Y^(n), X^(n)): Ω → R^p. We shall call it an estimator (after adding some details, see the next slide).

8 So: To be able to formulate and to prove probabilistic assertions, the estimator has to be measurable.
DEFINITION: Estimator. An estimator is a random variable.
REMARK 1. The values of the estimator fall into a parameter space which is usually a part of an l-dimensional Euclidean space (for a (general) framework, see the next slide).
REMARK 2. Notice that the (numerical) value of the estimator was called (on the previous slide) an estimate.

9 The English used by statisticians (all over the world) has two words for the Czech "odhad": estimator and estimate. Some authors use the word estimator for β̂(Y^(n)(ω), X^(n)(ω)): Ω → R^p, while the word estimate for the value of the estimator at given data. We shall keep this convention as it facilitates reading and understanding of the text.

10 A (general) framework for establishing an estimator can be:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 We shall assume that the parameter space Θ (we'll use it in what follows) is a part of R^p (for some p ∈ N).
3 Consider a sequence of (i.i.d.) r.v.'s {X_n}_{n=1}^∞ governed by an unknown d.f. from a family {F_θ(x)}_{θ ∈ Θ}. (θ is theta and Θ is capital theta; in Czech, théta and velké théta.)

11 A (general) framework for establishing an estimator can be (continued):
1 By some technique of constructing estimators, establish (for each n ∈ N) an estimator, i.e. a mapping θ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ which is (A, B)-measurable.
2 Prove (all) required properties of the estimator θ̂_n (we'll discuss them later).
3 Find a reliable, sufficiently quick algorithm for evaluating θ̂_n and implement it (or have it implemented).

12 REMARK 3. Let's assume we have data x^(n) = (x_1, x_2, ..., x_n) at hand. We can view it so that somebody selected an ω_0 ∈ Ω and informed us that the values of the first n r.v.'s of the sequence of (i.i.d.) r.v.'s {X_n}_{n=1}^∞ at ω_0 are just x_1, x_2, ..., x_n, i.e. (X_1(ω_0), X_2(ω_0), ..., X_n(ω_0)) = (x_1, x_2, ..., x_n). We may evaluate the estimate as θ̂_n = θ̂_n(x^(n)) = θ̂_n(x_1, x_2, ..., x_n) = θ̂_n(X_1(ω_0), X_2(ω_0), ..., X_n(ω_0)).

13 Consider an example of Normal distributions: {F_θ(x)}_{θ ∈ Θ} = {Φ_{μ,σ²}(x)}, μ ∈ R, σ² ∈ R^+, so θ = (μ, σ²), Θ = R × R^+, and hence θ̂_n = (θ̂_{n1}, θ̂_{n2}) = (μ̂_n, σ̂²_n). If μ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → R and σ̂²_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → R^+, we speak about a point estimator, and we assume that with large probability, for some small δ > 0, μ̂_n ∈ (μ - δ, μ + δ) and σ̂²_n ∈ (σ² - δ, σ² + δ).

14 If μ̂_n is an interval, i.e. μ̂_n: Ω → R × R, and hence μ̂_n(X_1(ω), ..., X_n(ω)) = (μ̂_{n1}(X_1(ω), ..., X_n(ω)), μ̂_{n2}(X_1(ω), ..., X_n(ω))), and similarly σ̂²_n: Ω → R^+ × R^+ with σ̂²_n(X_1(ω), ..., X_n(ω)) = (σ̂²_{n1}(X_1(ω), ..., X_n(ω)), σ̂²_{n2}(X_1(ω), ..., X_n(ω))), we speak about an interval estimator, and we assume that with large probability (μ̂_{n1}(X_1(ω), ..., X_n(ω)), μ̂_{n2}(X_1(ω), ..., X_n(ω))) ∋ μ and (σ̂²_{n1}(X_1(ω), ..., X_n(ω)), σ̂²_{n2}(X_1(ω), ..., X_n(ω))) ∋ σ².

15 What is the history of studying estimation?

16 What is the history of studying estimation?
Pierre Simon Laplace, Adrien Marie Legendre, Carl Friedrich Gauss, Thomas Bayes, Ronald Aylmer Fisher, Edwin James Pitman

17 Types of estimators
1 Method of moments (momentová metoda)
2 Maximum likelihood (maximálně věrohodné odhady)
3 The least squares (nejmenší čtverce)
4 Minimum distance (metoda minimalizující vzdálenost)
5 Many others - minimum volume, weighted, etc.

18 Let us recall - as an inspiring example - THE SECOND KOLMOGOROV THEOREM (alternative conditions for the strong law of large numbers): Let {X_n}_{n=1}^∞ be a sequence of i.i.d. r.v.'s. Then E X_1 = μ exists iff (1/n) Σ_{l=1}^n X_l → μ a.e. as n → ∞.

19 Another example - COROLLARY: Let {X_n}_{n=1}^∞ be a sequence of i.i.d. r.v.'s. Then E X_1² = σ² + μ² exists iff (1/n) Σ_{l=1}^n X_l² → σ² + μ² a.e. as n → ∞. Then also (1/n) Σ_{l=1}^n X_l² - ((1/n) Σ_{l=1}^n X_l)² → σ² a.e. as n → ∞.

20 More generally - LEMMA: Let {X_n}_{n=1}^∞ be a sequence of i.i.d. r.v.'s governed by an unknown d.f. from a family {F_θ(x)}_{θ ∈ Θ}. Assume that for some k ∈ N, E X_1^k exists and that there is a function h such that θ = h(E X_1^k, E X_1^{k-1}, ..., E X_1). Then h((1/n) Σ_{l=1}^n X_l^k, (1/n) Σ_{l=1}^n X_l^{k-1}, ..., (1/n) Σ_{l=1}^n X_l) → θ a.e. as n → ∞. This way of constructing estimators is called the method of moments.
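A small added sketch (not from the slides) of the lemma with k = 1, for the Exponential(λ) family: since E X_1 = 1/λ, we have θ = h(E X_1) with h(t) = 1/t, and the method-of-moments estimator is one over the sample mean. The true λ and the sample size below are arbitrary choices.

```python
import numpy as np

# Method of moments for Exponential(lambda): lambda = h(E X_1) = 1 / E X_1,
# so the estimator plugs the empirical first moment into h.
rng = np.random.default_rng(1)
lam_true = 2.5
x = rng.exponential(scale=1.0 / lam_true, size=100_000)

lam_hat = 1.0 / x.mean()  # h applied to the first empirical moment
print(round(lam_hat, 3))
```

By the lemma, lam_hat converges to the true λ almost everywhere as the sample grows.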

21 EXAMPLE OF ESTIMATORS CONSTRUCTED BY THE METHOD OF MOMENTS. Consider once again the example of Normal distributions {Φ_{μ,σ²}(x)}, μ ∈ R, σ² ∈ R^+. By the Kolmogorov theorem, (1/n) Σ_{l=1}^n X_l → μ a.e. as n → ∞, as well as (1/n) Σ_{l=1}^n X_l² - ((1/n) Σ_{l=1}^n X_l)² → σ² a.e. as n → ∞.

22 Putting μ̂_n = (1/n) Σ_{l=1}^n X_l and σ̂²_n = (1/n) Σ_{l=1}^n X_l² - ((1/n) Σ_{l=1}^n X_l)², we have μ̂_n → μ and σ̂²_n → σ² a.e. as n → ∞. REMARK 4. μ̂_n and σ̂²_n are examples of estimators constructed by the method of moments. Such estimators work for a wide class of families of d.f.'s, namely for all families for which the corresponding moments exist.
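A simulation sketch of the estimators above: compute μ̂_n and σ̂²_n from i.i.d. Normal draws and check that they approach the true μ and σ². The true values and the sample size are arbitrary choices for the illustration.

```python
import numpy as np

# Method-of-moments estimators for the Normal family, as on the slide:
# mu_hat = (1/n) sum X_l,  sigma2_hat = (1/n) sum X_l^2 - mu_hat^2.
rng = np.random.default_rng(0)
mu, sigma = 2.0, 3.0
x = rng.normal(mu, sigma, size=200_000)

mu_hat = x.mean()                           # first empirical moment
sigma2_hat = (x ** 2).mean() - mu_hat ** 2  # second empirical moment minus mean^2
print(round(mu_hat, 2), round(sigma2_hat, 2))
```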

23 Maximum likelihood (MLE) - Pierre Simon Laplace
IT IS REMARKABLE THAT A SCIENCE WHICH BEGAN WITH THE CONSIDERATION OF GAMES OF CHANCE SHOULD HAVE BECOME THE MOST IMPORTANT OBJECT OF HUMAN KNOWLEDGE. P. S. LAPLACE: Théorie Analytique des Probabilités
Laplace, P. S. (1774): Mémoire sur la probabilité des causes par les évènemens. Mémoires de l'Académie royale des sciences presentés par divers savans 6.
Pierre Simon Laplace was the first scientist who began to study likelihood.

24 MLE - framework and definition. A framework for establishing the MLE can be as follows:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 We shall assume that the parameter space is again a part of R^p (for some p ∈ N).
3 Consider a sequence of absolutely continuous i.i.d. r.v.'s {X_n}_{n=1}^∞ governed by an unknown d.f. whose density is from a family {f_θ(x)}_{θ ∈ Θ}.
4 Then define the maximum likelihood estimator by θ̂_n(X^(n)(ω)) = arg max_{θ ∈ Θ} Π_{l=1}^n f_θ(X_l(ω)).

25 MLE - evaluation. Let's rewrite here the last line of the previous slide: θ̂_n(X^(n)(ω)) = arg max_{θ ∈ Θ} Π_{l=1}^n f_θ(X_l(ω)). Due to the monotonicity of ln(x), the point θ ∈ Θ at which Π_{l=1}^n f_θ(X_l(ω)) and ln(Π_{l=1}^n f_θ(X_l(ω))) attain their maximum is the same. Moreover, ln(Π_{l=1}^n f_θ(X_l(ω))) = Σ_{l=1}^n ln(f_θ(X_l(ω))).

26 MLE - evaluation. So, finally, θ̂_n(X^(n)(ω)) = arg max_{θ ∈ Θ} Σ_{l=1}^n ln(f_θ(X_l(ω))). Having at hand data x^(n) = (x_1, x_2, ..., x_n), we can calculate the value of the estimate as θ̂_n(x^(n)) = arg max_{θ ∈ Θ} Σ_{l=1}^n ln(f_θ(x_l)). It is frequently (in many textbooks and monographs) written as θ̂_n = arg max_{θ ∈ Θ} Σ_{l=1}^n ln(f_θ(x_l)).

27 MLE - evaluation. Assuming that ∂f_θ(x)/∂θ exists, one of the (possibly more) solutions of Σ_{l=1}^n (1/f_θ(x_l)) ∂f_θ(x_l)/∂θ = 0 (1) is equal to θ̂_n. REMARK 5. In the case that (1) has more solutions, there are some recommendations what to do. We shall discuss them later.

28 THE FIRST EXAMPLE OF the maximum likelihood estimator. Consider again the family of Normal distributions {Φ_{μ,σ²}(x)}, μ ∈ R, σ² ∈ R^+. Then (μ̂_n, σ̂²_n)^(ML) = arg max_{μ ∈ R, σ ∈ R^+} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²)). We shall see that we can look for μ̂_n^(ML) and σ̂²_n^(ML) separately. So, let us consider at first μ̂_n^(ML) = arg max_{μ ∈ R} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²)).

29
μ̂_n^(ML) = arg max_{μ ∈ R} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²))
= arg max_{μ ∈ R} Σ_{l=1}^n { -log(σ √(2π)) - (X_l - μ)²/(2σ²) }
= arg max_{μ ∈ R} Σ_{l=1}^n { -(X_l - μ)²/(2σ²) }
= arg max_{μ ∈ R} { -Σ_{l=1}^n (X_l - μ)² } = arg min_{μ ∈ R} Σ_{l=1}^n (X_l - μ)²

30 So we have μ̂_n^(ML) = arg min_{μ ∈ R} Σ_{l=1}^n (X_l - μ)². We are going to solve ∂/∂μ Σ_{l=1}^n (X_l - μ)² = 0, i.e. -2 Σ_{l=1}^n (X_l - μ) = 0, hence Σ_{l=1}^n X_l = nμ, and finally μ̂_n^(ML) = (1/n) Σ_{l=1}^n X_l.

31 So we have found that μ̂_n^(ML) = arg min_{μ ∈ R} Σ_{l=1}^n (X_l - μ)² = (1/n) Σ_{l=1}^n X_l, i.e. the sum Σ_{l=1}^n (X_l - μ)² attains its minimum if we put μ = (1/n) Σ_{l=1}^n X_l = x̄ (the last equality is an introduction of notation).
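An added numerical check of the derivation: over a grid of candidate μ values, the sum of squares Σ (X_l - μ)² is smallest (up to grid resolution) exactly at the sample mean x̄. The data and the grid are arbitrary choices.

```python
import numpy as np

# Verify that sum (X_l - mu)^2 is minimized at mu = x_bar by brute force.
rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=1_000)
x_bar = x.mean()

grid = np.linspace(x_bar - 3.0, x_bar + 3.0, 2001)       # candidate mu values
sums = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)   # sum of squares per mu
mu_best = float(grid[int(np.argmin(sums))])
print(round(mu_best - x_bar, 4))
```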

32 THE SECOND EXAMPLE OF the maximum likelihood estimator. Similarly as in the previous example, σ̂²_n^(ML) = arg max_{σ ∈ R^+} Π_{l=1}^n (1/(σ √(2π))) exp(-(X_l - μ)²/(2σ²)).

33 So σ̂²_n^(ML) = arg max_{σ ∈ R^+} Σ_{l=1}^n { -log(σ √(2π)) - (X_l - μ)²/(2σ²) } = arg max_{σ ∈ R^+} { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - μ)²/(2σ²) }. As the sum Σ_{l=1}^n (X_l - μ)² enters the previous expression with a minus sign, we have to minimize it in order to maximize the whole expression, i.e. according to the previous slide we have to put μ = x̄: σ̂²_n^(ML) = arg max_{σ ∈ R^+} { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - x̄)²/(2σ²) }.

34 It means that σ̂²_n^(ML) is among the solutions of the equation ∂/∂σ² { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - x̄)²/(2σ²) } = 0. As the mapping σ → σ² is one-to-one for σ ∈ R^+, we may solve ∂/∂σ { -n log(σ √(2π)) - Σ_{l=1}^n (X_l - x̄)²/(2σ²) } = 0, i.e. -n/σ + (1/σ³) Σ_{l=1}^n (X_l - x̄)² = 0.

35 It gives nσ² = Σ_{l=1}^n (X_l - x̄)² and finally σ̂²_n = (1/n) Σ_{l=1}^n (X_l - x̄)². We shall see later, however, that a better estimator is σ̂²_n = (1/(n-1)) Σ_{l=1}^n (X_l - x̄)².
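An added sketch checking the closed form: maximize the Normal log-likelihood in σ on a grid (with μ fixed at x̄) and compare with (1/n) Σ (X_l - x̄)². The true parameters and the grid are arbitrary choices.

```python
import numpy as np

# Grid-maximize  -n log(sigma sqrt(2 pi)) - sum (X_l - x_bar)^2 / (2 sigma^2)
# over sigma and compare the maximizer's square with the closed-form MLE.
rng = np.random.default_rng(3)
x = rng.normal(0.0, 2.0, size=5_000)
x_bar = x.mean()
n = x.size
ss = ((x - x_bar) ** 2).sum()

sigma_grid = np.linspace(0.5, 4.0, 3501)
loglik = -n * np.log(sigma_grid * np.sqrt(2.0 * np.pi)) - ss / (2.0 * sigma_grid ** 2)
sigma2_grid = float(sigma_grid[int(np.argmax(loglik))] ** 2)
sigma2_closed = ss / n
print(round(sigma2_grid, 3), round(sigma2_closed, 3))
```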

36 The least squares (LS) - Adrien Marie Legendre & Carl Friedrich Gauss
Legendre, A. M. (1805): Nouvelles méthodes pour la détermination des orbites des comètes. Paris, Courcier.
Gauss, C. F. (1809): Theoria motus corporum coelestium. Hamburg, Perthes et Besser.

37 Somebody brought data.

38 Hertzsprung-Russell diagram of the star cluster CYG OB1 (in the direction of Cygnus)

39 r(β) = y - x'β, r²(β) = (y - x'β)²

40 Recalling: Somebody brought data, say
(Y^(n), X^(n)) = (y_1, 1, x_11, ..., x_1p; y_2, 1, x_21, ..., x_2p; ...; y_n, 1, x_n1, ..., x_np).

41 r_i(β) = y_i - x_i'β, r_i²(β) = (y_i - x_i'β)²,
β̂^(LS) = arg min_{β ∈ R^p} Σ_{i=1}^n r_i²(β) = arg min_{β ∈ R^p} Σ_{i=1}^n (y_i - x_i'β)²,
β̂^(LS) = (X'X)^{-1} X'Y

42 DEFINITION: The least squares estimator. The least squares estimator is given as a solution of the extremal problem β̂^(LS) = arg min_{β ∈ R^p} Σ_{i=1}^n r_i²(β) = arg min_{β ∈ R^p} Σ_{i=1}^n (y_i - x_i'β)². REMARK 6. If p = 1 (see the data on the previous-but-one slide), the extremal problem turns into μ̂^(LS) = arg min_{μ ∈ R} Σ_{i=1}^n r_i²(μ) = arg min_{μ ∈ R} Σ_{i=1}^n (y_i - μ)².
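An added sketch of β̂^(LS) = (X'X)^{-1} X'Y: simulate a small regression (the true β^0 and the noise level are arbitrary choices) and compare the normal-equations solution with numpy's least-squares solver.

```python
import numpy as np

# Least squares via the normal equations versus a QR/SVD-based solver.
rng = np.random.default_rng(4)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # intercept column first
beta0 = np.array([1.0, -2.0, 0.5])
y = X @ beta0 + rng.normal(scale=0.3, size=n)

beta_ne = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'Y via a linear solve
beta_ls = np.linalg.lstsq(X, y, rcond=None)[0]  # numpy's built-in LS solver
print(np.round(beta_ne, 3))
```

In practice the solver route is preferred over forming (X'X)^{-1} explicitly, since it is numerically more stable for ill-conditioned X.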


44 Hertzsprung-Russell diagram of the star cluster CYG OB1 (in the direction of Cygnus (Labuť)). Number of stars = 47. Humphreys, R. M. (1978): Studies of luminous stars in nearby galaxies. Supergiants and O stars in the Milky Way. Astrophysical Journal Supplement Ser., 38.


48 Francis Ysidro Edgeworth

49 Francis Ysidro Edgeworth The method of the Least Squares is seen to be our best course when we have thrown overboard a certain portion of our data - a sort of sacrifice which has often to be made by those who sail upon the stormy seas of Probability. F. Y. EDGEWORTH

50 Francis Ysidro Edgeworth

51 A (general) framework for establishing a minimal distance estimator can be:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 Let the parameter space be Θ ⊂ R^p (p ∈ N).
3 Consider a family of d.f.'s {F_θ(x)}_{θ ∈ Θ}.
4 Somebody brought data, say x^(n) = (x_1, x_2, ..., x_n).
5 Let us draw the empirical distribution function (e.d.f.).

52 Empirical d.f.: F_emp(x) = (1/n) Σ_{i=1}^n I{x_i < x}
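A minimal sketch of the e.d.f. above, keeping the slides' strict-inequality convention I{x_i < x}: the function simply counts the fraction of observations below x. The toy data are an arbitrary choice.

```python
import numpy as np

# Empirical distribution function: F_emp(x) = (1/n) * #{ x_i < x }.
def f_emp(data, x):
    data = np.asarray(data, dtype=float)
    return float(np.mean(data < x))

data = [1.0, 2.0, 2.0, 5.0]
print(f_emp(data, 0.5), f_emp(data, 2.0), f_emp(data, 10.0))  # 0.0 0.25 1.0
```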

53 The empirical d.f. and a theoretical d.f., evidently with too large a variance.

54 The empirical d.f. and a theoretical d.f., maybe already with the right variance, but evidently with too small a location (shift) parameter.

55 The empirical d.f. and a theoretical d.f., hopefully with the right variance, but evidently with too large a location parameter.

56 The empirical d.f. and a theoretical d.f., already fitting the e.d.f.

57 DEFINITION: The minimal distance estimator. The estimator given as a solution of the extremal problem θ̂^(MD) = arg min_{θ ∈ Θ} max_{x ∈ R} |F_emp(x) - F_θ(x)| belongs to the class of minimal distance estimators, but very frequently it is called the minimax estimator. REMARK 7. The expression max_{x ∈ R} |F_emp(x) - F_θ(x)| represents a measure of the distance between F_emp(x) and F_θ(x). Let's try to consider this idea in a bit more general way.

58 Let F(x) and G(x) be two d.f.'s. Then d(F, G) is called a (measure of) distance of F(x) and G(x) if
1 d(F, G) ≥ 0,
2 d(F, F) = 0,
3 for any d.f. H(x), d(F, G) ≤ d(F, H) + d(H, G).
The measure of distance max_{x ∈ R} |F(x) - G(x)| is frequently used. It was first studied by Kolmogorov and Smirnov.

59 Kolmogorov-Smirnov distance. DEFINITION: Kolmogorov-Smirnov distance. Let F(x) and G(x) be two d.f.'s. The (measure of) distance given by d_KS(F(x), G(x)) = max_{x ∈ R} |F(x) - G(x)| is called the Kolmogorov-Smirnov distance.
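An added sketch computing the Kolmogorov-Smirnov distance between the e.d.f. of a sample and a continuous theoretical d.f. F. Because the e.d.f. is a step function and F is continuous, the supremum is attained at a jump, so it suffices to compare F with the e.d.f. just below and just above each order statistic. The standard Normal d.f. and the sample size are arbitrary choices.

```python
import math
import numpy as np

def phi(x):  # standard Normal d.f. via the error function
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance(data, F):
    xs = np.sort(np.asarray(data, dtype=float))
    n = xs.size
    Fx = np.array([F(v) for v in xs])
    upper = np.arange(1, n + 1) / n - Fx  # e.d.f. just above each jump minus F
    lower = Fx - np.arange(0, n) / n      # F minus e.d.f. just below each jump
    return float(max(upper.max(), lower.max()))

rng = np.random.default_rng(5)
d = ks_distance(rng.normal(size=2_000), phi)
print(round(d, 3))
```

For a correctly specified F the distance is small, of order 1/√n, which is exactly the scaling in the Kolmogorov-Smirnov limit theorem a few slides below.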

60 Consider a sequence of (i.i.d.) r.v.'s {X_n}_{n=1}^∞ governed by a d.f. F(x) and the sequence of empirical d.f.'s F^(n)_emp(x) = (1/n) Σ_{i=1}^n I{X_i(ω) < x}. Direct computation gives: E I{X_i(ω) < x} = 1 · P(X_i(ω) < x) = F(x). Then the strong law of large numbers implies: (1/n) Σ_{i=1}^n (I{X_i(ω) < x} - F(x)) = F^(n)_emp(x) - F(x) → 0 a.s. as n → ∞.

61 Under the same framework we have: E I²{X_i(ω) < x} = E I{X_i(ω) < x} = F(x), so var(I{X_i(ω) < x}) = F(x)(1 - F(x)) = σ² < ∞. Then the CLT implies: (1/(σ √n)) Σ_{i=1}^n (I{X_i(ω) < x} - F(x)) → N(0, 1) in distribution, i.e. √n (F^(n)_emp(x) - F(x)) → N(0, σ²) in distribution.

62 Under the same framework Kolmogorov and Smirnov proved: P(√n sup_{x ∈ R} |F^(n)_emp(x) - F(x)| ≤ x) → Q(x), where Q(x) = 0 for x ≤ 0 and Q(x) = Σ_{l=-∞}^{∞} (-1)^l exp(-2l²x²) for x > 0.

63 What is the Kolmogorov-Smirnov distance?

64 The answer is: the yellow segment.

65 Consider one fixed red d.f. F(x) and a green sequence of d.f.'s {F_n(x)}_{n=1}^∞. Assume that for n → ∞ the brown segment converges to 0. Does the green sequence of d.f.'s converge to the red d.f.? The answer is (unfortunately, and against a natural feeling) NO (!?).

66 Verify it.

67 That is why Prokhorov proposed the following. DEFINITION: Prokhorov distance. For two d.f.'s F(x) and G(x), the value π(F(x), G(x)) = inf_{ε > 0} {ε : F(x) < G(x + ε) + ε and G(x) < F(x + ε) + ε for all x ∈ R} is called the Prokhorov distance.

68 So, the Prokhorov distance is given by the horizontal (or vertical) yellow segment. Verify it.

69 Now, the green sequence converges to the red one in the Prokhorov distance. Verify it.

70 Cramér-von Mises distance. An alternative to the K-S and Prokhorov distances may be: DEFINITION: Cramér-von Mises distance. Let F(x) and G(x) be two d.f.'s. The (measure of) distance given by d_CM(F(x), G(x)) = ∫ [F(x) - G(x)]² dG(x) is called the Cramér-von Mises distance. REMARK 8. There are many other proposals of distances between d.f.'s. About one of them, the so-called χ²-distance, we shall speak in the next term in Statistics III.
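A numeric sketch of the Cramér-von Mises distance: substituting u = G(x) turns the integral of (F - G)² dG into an integral over (0, 1) of (F(G^{-1}(u)) - u)², which we approximate on a grid. The Normal d.f.'s, the crude bisection quantile routine, and the grid size are all arbitrary choices for the illustration.

```python
import math
import numpy as np

def norm_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def norm_quantile(u, mu=0.0, sigma=1.0):
    # crude bisection inverse of norm_cdf, enough for a sketch
    lo, hi = mu - 10.0 * sigma, mu + 10.0 * sigma
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid, mu, sigma) < u:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def cvm_distance(F, G_quantile, m=2_000):
    # midpoint rule for the integral over (0,1) of (F(G^{-1}(u)) - u)^2 du
    u = (np.arange(m) + 0.5) / m
    diffs = np.array([F(G_quantile(v)) - v for v in u])
    return float((diffs ** 2).mean())

same = cvm_distance(norm_cdf, norm_quantile)
shifted = cvm_distance(lambda x: norm_cdf(x, mu=1.0), norm_quantile)
print(round(same, 6), round(shifted, 4))
```

The distance of a d.f. to itself is (numerically) zero, while shifting the location by one gives a clearly positive value.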

71 So, we may continue a bit more generally. Let d(F_emp(x), F_θ(x)) be a (measure of) distance of F_emp(x) and F_θ(x) (which need not necessarily be a metric). DEFINITION: The minimal distance estimator. The estimator given as a solution of the extremal problem θ̂^(MD) = arg min_{θ ∈ Θ} d(F_emp(x), F_θ(x)) is called a minimal distance estimator.
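An added sketch of the definition with d chosen as the Kolmogorov-Smirnov distance and the Normal location family F_θ(x) = Φ(x - θ): scan a grid of θ values and keep the one whose d.f. is closest to the e.d.f. The true θ, the sample size, and the grid are arbitrary choices.

```python
import math
import numpy as np

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_distance(xs_sorted, F):
    # sup |F_emp - F| for continuous F, checked at the order statistics
    n = xs_sorted.size
    Fx = np.array([F(v) for v in xs_sorted])
    return float(max((np.arange(1, n + 1) / n - Fx).max(),
                     (Fx - np.arange(0, n) / n).max()))

rng = np.random.default_rng(6)
theta_true = 1.5
xs = np.sort(rng.normal(loc=theta_true, size=1_000))

grid = np.linspace(0.0, 3.0, 301)
dists = [ks_distance(xs, lambda x, t=t: norm_cdf(x - t)) for t in grid]
theta_md = float(grid[int(np.argmin(dists))])
print(round(theta_md, 2))
```

With this particular d, the grid minimizer is exactly the minimax estimator of the earlier slide, restricted to the grid.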

72 An alternative way of constructing a minimal distance estimator can be:
1 Let (Ω, A, P) and (R, B) be a probability space and a measurable space, respectively.
2 Let the parameter space be Θ ⊂ R^p (p ∈ N).
3 Consider a family of densities {f_θ(x)}_{θ ∈ Θ}.
4 Somebody brought data, say x^(n) = (x_1, x_2, ..., x_n).
5 Let us draw the histogram, see the next slide.

73 This is a histogram. We may try to fit one density from our family to it.

74 This is a density with too small a variance.

75 This is a density with too large a variance.

76 These are densities with too small and too large location parameters, respectively.

77 Finally, we have found the fit.

78 So, we may continue a bit more generally. Let d(f_emp(x), f_θ(x)) be a (measure of) distance of f_emp(x) and f_θ(x) (which need not necessarily be a metric). DEFINITION: The minimal distance estimator. The estimator given as a solution of the extremal problem θ̂^(MD) = arg min_{θ ∈ Θ} d(f_emp(x), f_θ(x)) is called a minimal distance estimator.

79 EXAMPLE: Let f(x) and g(x) be two densities and α ∈ (1, ∞). Then the value div_α(f, g) = { ∫ |f(x) - g(x)|^α dx }^{1/α} is called the α-divergence of the densities f(x) and g(x).
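A numeric sketch of the α-divergence above, read as an L^α distance between densities, for two Normal densities approximated by a trapezoidal rule on a grid. The choice of α = 2, the Normal densities, and the integration range are arbitrary.

```python
import math
import numpy as np

def norm_pdf(x, mu=0.0):
    return np.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def alpha_divergence(f, g, alpha=2.0, lo=-10.0, hi=10.0, m=20_001):
    # trapezoidal rule for ( integral of |f - g|^alpha dx )^(1/alpha)
    xs = np.linspace(lo, hi, m)
    vals = np.abs(f(xs) - g(xs)) ** alpha
    dx = xs[1] - xs[0]
    integral = float(((vals[:-1] + vals[1:]) / 2.0 * dx).sum())
    return integral ** (1.0 / alpha)

same = alpha_divergence(norm_pdf, norm_pdf)
shifted = alpha_divergence(norm_pdf, lambda x: norm_pdf(x, mu=1.0))
print(round(same, 6), round(shifted, 3))
```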

80 SUMMARY. Notions to keep in mind to understand the next lectures and to pass the exam:
1 point and interval estimators,
2 the method of moments, the least squares, maximum likelihood,
3 minimal distance estimators - examples and a general point of view.

81 SUMMARY (continued). Keep in mind that we have denoted an estimator by the same letter as the estimated parameter, but with a hat above the given letter, e.g.:
1 θ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ,
2 or β̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ,
3 or μ̂_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ,
4 or σ̂²_n(X_1(ω), X_2(ω), ..., X_n(ω)): Ω → Θ.
We shall keep this convention for the rest of the term.

82 SUMMARY (continued). What we are going to do in the next lecture:
1 we'll continue studying point estimation,
2 plausible properties of estimators and their sense.

83 End of the Thirteenth Lecture. Thanks for your attention.


Max. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter

More information

Composite Hypotheses and Generalized Likelihood Ratio Tests

Composite Hypotheses and Generalized Likelihood Ratio Tests Composite Hypotheses and Generalized Likelihood Ratio Tests Rebecca Willett, 06 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve

More information

Central Limit Theorem ( 5.3)

Central Limit Theorem ( 5.3) Central Limit Theorem ( 5.3) Let X 1, X 2,... be a sequence of independent random variables, each having n mean µ and variance σ 2. Then the distribution of the partial sum S n = X i i=1 becomes approximately

More information

COS513 LECTURE 8 STATISTICAL CONCEPTS

COS513 LECTURE 8 STATISTICAL CONCEPTS COS513 LECTURE 8 STATISTICAL CONCEPTS NIKOLAI SLAVOV AND ANKUR PARIKH 1. MAKING MEANINGFUL STATEMENTS FROM JOINT PROBABILITY DISTRIBUTIONS. A graphical model (GM) represents a family of probability distributions

More information

Association studies and regression

Association studies and regression Association studies and regression CM226: Machine Learning for Bioinformatics. Fall 2016 Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar Association studies and regression 1 / 104 Administration

More information

Master s Written Examination

Master s Written Examination Master s Written Examination Option: Statistics and Probability Spring 016 Full points may be obtained for correct answers to eight questions. Each numbered question which may have several parts is worth

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Computer Intensive Methods in Mathematical Statistics

Computer Intensive Methods in Mathematical Statistics Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Chapter 7. Hypothesis Testing

Chapter 7. Hypothesis Testing Chapter 7. Hypothesis Testing Joonpyo Kim June 24, 2017 Joonpyo Kim Ch7 June 24, 2017 1 / 63 Basic Concepts of Testing Suppose that our interest centers on a random variable X which has density function

More information

5.2 Expounding on the Admissibility of Shrinkage Estimators

5.2 Expounding on the Admissibility of Shrinkage Estimators STAT 383C: Statistical Modeling I Fall 2015 Lecture 5 September 15 Lecturer: Purnamrita Sarkar Scribe: Ryan O Donnell Disclaimer: These scribe notes have been slightly proofread and may have typos etc

More information

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn!

Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Parameter estimation! and! forecasting! Cristiano Porciani! AIfA, Uni-Bonn! Questions?! C. Porciani! Estimation & forecasting! 2! Cosmological parameters! A branch of modern cosmological research focuses

More information

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff

IEOR 165 Lecture 7 1 Bias-Variance Tradeoff IEOR 165 Lecture 7 Bias-Variance Tradeoff 1 Bias-Variance Tradeoff Consider the case of parametric regression with β R, and suppose we would like to analyze the error of the estimate ˆβ in comparison to

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Notes on the Multivariate Normal and Related Topics

Notes on the Multivariate Normal and Related Topics Version: July 10, 2013 Notes on the Multivariate Normal and Related Topics Let me refresh your memory about the distinctions between population and sample; parameters and statistics; population distributions

More information

Lecture 4: Probabilistic Learning

Lecture 4: Probabilistic Learning DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11

Econometrics A. Simple linear model (2) Keio University, Faculty of Economics. Simon Clinet (Keio University) Econometrics A October 16, / 11 Econometrics A Keio University, Faculty of Economics Simple linear model (2) Simon Clinet (Keio University) Econometrics A October 16, 2018 1 / 11 Estimation of the noise variance σ 2 In practice σ 2 too

More information

A Brief History of Statistics (Selected Topics)

A Brief History of Statistics (Selected Topics) A Brief History of Statistics (Selected Topics) ALPHA Seminar August 29, 2017 2 Origin of the word Statistics Derived from Latin statisticum collegium ( council of state ) Italian word statista ( statesman

More information

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests)

Dr. Maddah ENMG 617 EM Statistics 10/15/12. Nonparametric Statistics (2) (Goodness of fit tests) Dr. Maddah ENMG 617 EM Statistics 10/15/12 Nonparametric Statistics (2) (Goodness of fit tests) Introduction Probability models used in decision making (Operations Research) and other fields require fitting

More information

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation

Variations. ECE 6540, Lecture 10 Maximum Likelihood Estimation Variations ECE 6540, Lecture 10 Last Time BLUE (Best Linear Unbiased Estimator) Formulation Advantages Disadvantages 2 The BLUE A simplification Assume the estimator is a linear system For a single parameter

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Lecture 2 Machine Learning Review

Lecture 2 Machine Learning Review Lecture 2 Machine Learning Review CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago March 29, 2017 Things we will look at today Formal Setup for Supervised Learning Things

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Parameter Estimation of the Stable GARCH(1,1)-Model

Parameter Estimation of the Stable GARCH(1,1)-Model WDS'09 Proceedings of Contributed Papers, Part I, 137 142, 2009. ISBN 978-80-7378-101-9 MATFYZPRESS Parameter Estimation of the Stable GARCH(1,1)-Model V. Omelchenko Charles University, Faculty of Mathematics

More information

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22

MLE and GMM. Li Zhao, SJTU. Spring, Li Zhao MLE and GMM 1 / 22 MLE and GMM Li Zhao, SJTU Spring, 2017 Li Zhao MLE and GMM 1 / 22 Outline 1 MLE 2 GMM 3 Binary Choice Models Li Zhao MLE and GMM 2 / 22 Maximum Likelihood Estimation - Introduction For a linear model y

More information

Bayesian Linear Regression [DRAFT - In Progress]

Bayesian Linear Regression [DRAFT - In Progress] Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory

More information

1 Review of The Learning Setting

1 Review of The Learning Setting COS 5: Theoretical Machine Learning Lecturer: Rob Schapire Lecture #8 Scribe: Changyan Wang February 28, 208 Review of The Learning Setting Last class, we moved beyond the PAC model: in the PAC model we

More information

Bias Variance Trade-off

Bias Variance Trade-off Bias Variance Trade-off The mean squared error of an estimator MSE(ˆθ) = E([ˆθ θ] 2 ) Can be re-expressed MSE(ˆθ) = Var(ˆθ) + (B(ˆθ) 2 ) MSE = VAR + BIAS 2 Proof MSE(ˆθ) = E((ˆθ θ) 2 ) = E(([ˆθ E(ˆθ)]

More information

Minimum Message Length Analysis of the Behrens Fisher Problem

Minimum Message Length Analysis of the Behrens Fisher Problem Analysis of the Behrens Fisher Problem Enes Makalic and Daniel F Schmidt Centre for MEGA Epidemiology The University of Melbourne Solomonoff 85th Memorial Conference, 2011 Outline Introduction 1 Introduction

More information

Machine Learning Basics: Estimators, Bias and Variance

Machine Learning Basics: Estimators, Bias and Variance Machine Learning Basics: Estiators, Bias and Variance Sargur N. srihari@cedar.buffalo.edu This is part of lecture slides on Deep Learning: http://www.cedar.buffalo.edu/~srihari/cse676 1 Topics in Basics

More information

Statistics & Data Sciences: First Year Prelim Exam May 2018

Statistics & Data Sciences: First Year Prelim Exam May 2018 Statistics & Data Sciences: First Year Prelim Exam May 2018 Instructions: 1. Do not turn this page until instructed to do so. 2. Start each new question on a new sheet of paper. 3. This is a closed book

More information

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA

Hypothesis Testing. Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA Hypothesis Testing Robert L. Wolpert Department of Statistical Science Duke University, Durham, NC, USA An Example Mardia et al. (979, p. ) reprint data from Frets (9) giving the length and breadth (in

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Statistics and Econometrics I

Statistics and Econometrics I Statistics and Econometrics I Point Estimation Shiu-Sheng Chen Department of Economics National Taiwan University September 13, 2016 Shiu-Sheng Chen (NTU Econ) Statistics and Econometrics I September 13,

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5)

Simple Linear Regression. (Chs 12.1, 12.2, 12.4, 12.5) 10 Simple Linear Regression (Chs 12.1, 12.2, 12.4, 12.5) Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 2 Simple Linear Regression Rating 20 40 60 80 0 5 10 15 Sugar 3 Simple Linear Regression

More information

Chapter 11. Hypothesis Testing (II)

Chapter 11. Hypothesis Testing (II) Chapter 11. Hypothesis Testing (II) 11.1 Likelihood Ratio Tests one of the most popular ways of constructing tests when both null and alternative hypotheses are composite (i.e. not a single point). Let

More information

Chapter 4. Theory of Tests. 4.1 Introduction

Chapter 4. Theory of Tests. 4.1 Introduction Chapter 4 Theory of Tests 4.1 Introduction Parametric model: (X, B X, P θ ), P θ P = {P θ θ Θ} where Θ = H 0 +H 1 X = K +A : K: critical region = rejection region / A: acceptance region A decision rule

More information

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b)

LECTURE 5 NOTES. n t. t Γ(a)Γ(b) pt+a 1 (1 p) n t+b 1. The marginal density of t is. Γ(t + a)γ(n t + b) Γ(n + a + b) LECTURE 5 NOTES 1. Bayesian point estimators. In the conventional (frequentist) approach to statistical inference, the parameter θ Θ is considered a fixed quantity. In the Bayesian approach, it is considered

More information

INTERVAL ESTIMATION AND HYPOTHESES TESTING

INTERVAL ESTIMATION AND HYPOTHESES TESTING INTERVAL ESTIMATION AND HYPOTHESES TESTING 1. IDEA An interval rather than a point estimate is often of interest. Confidence intervals are thus important in empirical work. To construct interval estimates,

More information

Lecture 2. Simple linear regression

Lecture 2. Simple linear regression Lecture 2. Simple linear regression Jesper Rydén Department of Mathematics, Uppsala University jesper@math.uu.se Regression and Analysis of Variance autumn 2014 Overview of lecture Introduction, short

More information

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn

Parameter estimation and forecasting. Cristiano Porciani AIfA, Uni-Bonn Parameter estimation and forecasting Cristiano Porciani AIfA, Uni-Bonn Questions? C. Porciani Estimation & forecasting 2 Cosmological parameters A branch of modern cosmological research focuses on measuring

More information

Large Sample Properties & Simulation

Large Sample Properties & Simulation Large Sample Properties & Simulation Quantitative Microeconomics R. Mora Department of Economics Universidad Carlos III de Madrid Outline Large Sample Properties (W App. C3) 1 Large Sample Properties (W

More information

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that

H 2 : otherwise. that is simply the proportion of the sample points below level x. For any fixed point x the law of large numbers gives that Lecture 28 28.1 Kolmogorov-Smirnov test. Suppose that we have an i.i.d. sample X 1,..., X n with some unknown distribution and we would like to test the hypothesis that is equal to a particular distribution

More information

The Expectation-Maximization Algorithm

The Expectation-Maximization Algorithm 1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable

More information

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1

Estimation and Model Selection in Mixed Effects Models Part I. Adeline Samson 1 Estimation and Model Selection in Mixed Effects Models Part I Adeline Samson 1 1 University Paris Descartes Summer school 2009 - Lipari, Italy These slides are based on Marc Lavielle s slides Outline 1

More information

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ).

1 General problem. 2 Terminalogy. Estimation. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Estimation February 3, 206 Debdeep Pati General problem Model: {P θ : θ Θ}. Observe X P θ, θ Θ unknown. Estimate θ. (Pick a plausible distribution from family. ) Or estimate τ = τ(θ). Examples: θ = (µ,

More information

A General Overview of Parametric Estimation and Inference Techniques.

A General Overview of Parametric Estimation and Inference Techniques. A General Overview of Parametric Estimation and Inference Techniques. Moulinath Banerjee University of Michigan September 11, 2012 The object of statistical inference is to glean information about an underlying

More information

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics

Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Chapter 2: Fundamentals of Statistics Lecture 15: Models and statistics Data from one or a series of random experiments are collected. Planning experiments and collecting data (not discussed here). Analysis:

More information

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training

Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Political Science 236 Hypothesis Testing: Review and Bootstrapping

Political Science 236 Hypothesis Testing: Review and Bootstrapping Political Science 236 Hypothesis Testing: Review and Bootstrapping Rocío Titiunik Fall 2007 1 Hypothesis Testing Definition 1.1 Hypothesis. A hypothesis is a statement about a population parameter The

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

CSC321 Lecture 18: Learning Probabilistic Models

CSC321 Lecture 18: Learning Probabilistic Models CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Lecture 2: Statistical Decision Theory (Part I)

Lecture 2: Statistical Decision Theory (Part I) Lecture 2: Statistical Decision Theory (Part I) Hao Helen Zhang Hao Helen Zhang Lecture 2: Statistical Decision Theory (Part I) 1 / 35 Outline of This Note Part I: Statistics Decision Theory (from Statistical

More information

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population

Lecture 3. The Population Variance. The population variance, denoted σ 2, is the sum. of the squared deviations about the population Lecture 5 1 Lecture 3 The Population Variance The population variance, denoted σ 2, is the sum of the squared deviations about the population mean divided by the number of observations in the population,

More information