Three Approaches towards Optimal Property Estimation and Testing
1 Three Approaches towards Optimal Property Estimation and Testing

Jiantao Jiao (Stanford EE)
Joint work with: Yanjun Han, Dmitri Pavlichin, Kartik Venkat, Tsachy Weissman
Frontiers in Distribution Testing Workshop, FOCS 2017, Oct. 14th. 1 / 23
2 Statistical properties

Disclaimer: Throughout this talk, n refers to the number of samples, and S refers to the alphabet size of a distribution.
1. Shannon entropy: H(P) = -\sum_{i=1}^S p_i \ln p_i.
2. F_\alpha(P): F_\alpha(P) = \sum_{i=1}^S p_i^\alpha, \alpha > 0.
3. KL divergence, \chi^2 divergence, L_1 distance, Hellinger distance: F(P, Q) = \sum_{i=1}^S f(p_i, q_i) for f(x, y) = x \ln(x/y), (x - y)^2/x, |x - y|, (\sqrt{x} - \sqrt{y})^2.
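The properties above are all sums of per-symbol terms, which the following minimal Python sketch (function names are mine, not from the talk) makes concrete for a known distribution P:

```python
# Sketch (not from the slides): the statistical properties defined above,
# for distributions given as lists of probabilities summing to one.
import math

def shannon_entropy(p):
    """H(P) = -sum_i p_i ln p_i, with the convention 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def power_sum(p, alpha):
    """F_alpha(P) = sum_i p_i^alpha, alpha > 0."""
    return sum(x ** alpha for x in p if x > 0)

def f_sum(p, q, f):
    """F(P, Q) = sum_i f(p_i, q_i) for a per-symbol function f."""
    return sum(f(x, y) for x, y in zip(p, q))

# Per-symbol functions from the slide (assuming all q_i > 0 where needed).
kl = lambda x, y: x * math.log(x / y) if x > 0 else 0.0
l1 = lambda x, y: abs(x - y)
hellinger_sq = lambda x, y: (math.sqrt(x) - math.sqrt(y)) ** 2
```

The estimation problem discussed in the rest of the talk is exactly that P (and hence these sums) is unknown and must be estimated from samples.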
4 Tolerant testing/learning/estimation

We focus on the question: how many samples are needed to achieve accuracy \epsilon for estimating these properties from empirical data?
Example: L_1(P, U), U = (1/S, 1/S, ..., 1/S); observe n i.i.d. samples from P.
(VV 11): there exists an approach whose error is \sqrt{S/(n \ln n)}; no consistent estimator exists when n \lesssim S/\ln S.
The MLE plug-in L_1(\hat{P}_n, U) achieves error \sqrt{S/n} when n \gtrsim S; it is inconsistent when n \lesssim S.

Effective sample size enlargement: minimax rate-optimal with n samples \approx MLE with n \ln n samples.
Similar results also hold for Shannon entropy (VV 11, VV 13, WY 16, JVHW 15), power sum functionals (JVHW 15), Rényi entropy estimation (AOST 14), \chi^2, Hellinger, and KL-divergence estimation (HJW 16, BZLV 16), L_r norm estimation under the Gaussian white noise model (HJMW 17), L_1 distance estimation (JHW 16), etc., except for support size (WY 16).
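As a sanity check on the plug-in side of this comparison, the following sketch (my own, not from the talk) draws n samples from the uniform P = U over S symbols. Here the true L_1(P, U) is zero, so everything the MLE reports is error, and it concentrates at the \sqrt{S/n} scale:

```python
import math
import random

def mle_l1_to_uniform(n, S, rng):
    """L_1(P_hat_n, U), where P_hat_n is the empirical distribution of
    n i.i.d. samples drawn from the uniform distribution over S symbols."""
    counts = [0] * S
    for _ in range(n):
        counts[rng.randrange(S)] += 1
    return sum(abs(c / n - 1 / S) for c in counts)

rng = random.Random(0)
S, n, trials = 200, 200, 50
avg_err = sum(mle_l1_to_uniform(n, S, rng) for _ in range(trials)) / trials
# E|p_hat_i - 1/S| is on the order of sqrt(p_i / n), so the total
# plug-in error is on the order of sqrt(S / n) even though L_1(P, U) = 0.
ratio = avg_err / math.sqrt(S / n)
```

With n comparable to S the plug-in estimate is badly biased away from the truth; the rate-optimal estimators in this talk remove most of that bias with only n \asymp S/\ln S samples.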
5 Effective sample size enlargement

R_minimax(F, P, n) = inf_{\hat{F}(X_1, ..., X_n)} sup_{P \in M_S} E|\hat{F} - F(P)|,
R_plug-in(F, P, n) = sup_{P \in M_S} E|F(\hat{P}_n) - F(P)|.

F(P) = \sum_{i=1}^S p_i \log(1/p_i): R_minimax \asymp S/(n \log n) + \log S/\sqrt{n}; R_plug-in \asymp S/n + \log S/\sqrt{n}.
F_\alpha(P) = \sum_{i=1}^S p_i^\alpha, 0 < \alpha \le 1/2: R_minimax \asymp S/(n \log n)^\alpha; R_plug-in \asymp S/n^\alpha.
F_\alpha(P), 1/2 < \alpha < 1: R_minimax \asymp S/(n \log n)^\alpha + S^{1-\alpha}/\sqrt{n}; R_plug-in \asymp S/n^\alpha + S^{1-\alpha}/\sqrt{n}.
F_\alpha(P), 1 < \alpha < 3/2: R_minimax \asymp (n \log n)^{-(\alpha - 1)}; R_plug-in \asymp n^{-(\alpha - 1)}.
F_\alpha(P), \alpha \ge 3/2: R_minimax \asymp R_plug-in \asymp n^{-1/2} (the MLE is already rate-optimal).
Support size \sum_{i=1}^S 1(p_i > 0) over \{P : \min_i p_i \ge 1/S\}: R_minimax \asymp S e^{-\Theta(\max\{\sqrt{n \log n / S}, \, n/S\})}; R_plug-in \asymp S e^{-\Theta(n/S)}.
L_1(P, Q) for fixed Q: R_minimax \asymp \sum_{i=1}^S \sqrt{q_i/(n \log n)}; R_plug-in \asymp \sum_{i=1}^S \sqrt{q_i/n}.
6 Effective sample size enlargement

Divergence functionals: here P, Q \in M_S, and we have m samples from P and n samples from Q. For the Kullback-Leibler and \chi^2 divergence estimators we only consider (P, Q) \in \{(P, Q) : P, Q \in M_S, p_i/q_i \le u(S)\}, where u(S) is some function of S.

L_1(P, Q) = \sum_{i=1}^S |p_i - q_i|: R_minimax \asymp \sqrt{S/(\min\{m, n\} \log \min\{m, n\})}; R_plug-in \asymp \sqrt{S/\min\{m, n\}}.
H^2(P, Q) = (1/2) \sum_{i=1}^S (\sqrt{p_i} - \sqrt{q_i})^2: R_minimax \asymp S/(\min\{m, n\} \log \min\{m, n\}); R_plug-in \asymp S/\min\{m, n\}.
D(P||Q) = \sum_{i=1}^S p_i \log(p_i/q_i): R_minimax \asymp S/(m \log m) + S u(S)/(n \log n) + \log u(S)/\sqrt{m} + \sqrt{u(S)/n} \log u(S); the plug-in rate replaces m \log m by m and n \log n by n, with the same \sqrt{m}, \sqrt{n} fluctuation terms.
\chi^2(P||Q) = \sum_{i=1}^S p_i^2/q_i - 1: the same n \to n \log n enlargement in the leading S u(S)^2 term, plus fluctuation terms of order u(S)/\sqrt{m} and u(S)^{3/2}/\sqrt{n} in both rates.
7 Goal of this talk

Understand the mechanism behind the logarithmic sample size enlargement.
1. For what functionals do we have this phenomenon?
2. What concrete algorithms achieve this phenomenon?
3. If there exist multiple approaches, what are their relative advantages and disadvantages?
9 First approach: Approximation methodology

Question: Is the enlargement phenomenon caused by the fact that the functionals are permutation invariant (symmetric)?
Answer: Nope. :)
Literature on the approximation methodology: VV 11 (linear estimators), WY 16, JVHW 15, AOST 14, HJW 16, BZLV 16, HJMW 17, JHW 16.
10 Example: L_1 distance estimation

Given Q = (q_1, q_2, ..., q_S), we estimate L_1(P, Q) given n i.i.d. samples from P.

Theorem (J., Han, Weissman 16): Under mild conditions on n, S, and Q, there exists an estimator \hat{L} such that
sup_{P \in M_S} E_P |\hat{L} - L_1(P, Q)| \lesssim \sum_{i=1}^S \min\{\sqrt{q_i/(n \ln n)}, q_i\}.   (1)
For the MLE, we have
sup_{P \in M_S} E_P |L_1(\hat{P}_n, Q) - L_1(P, Q)| \asymp \sum_{i=1}^S \min\{\sqrt{q_i/n}, q_i\}.   (2)
17 Confidence sets in binomial model: coverage probability

(Figure: \Theta = [0, 1], \hat{p} \sim B(n, p)/n. For \hat{p} < \ln n/n the confidence set U(\hat{p}) has length \asymp \ln n/n; for \hat{p} > \ln n/n it has length \asymp \sqrt{\hat{p} \ln n/n}; each covers p with probability at least 1 - n^{-A}.)

Theorem (J., Han, Weissman 16): Partition [0, 1] into finitely many intervals I_i = [x_i, x_{i+1}], with x_0 = 0, x_1 \asymp \ln n/n, and x_{i+1} - x_i \asymp \sqrt{x_i \ln n/n}. Then:
1. if p \in I_i, then \hat{p} \in 2I_i with probability at least 1 - n^{-A};
2. if \hat{p} \in I_i, then p \in 2I_i with probability at least 1 - n^{-A};
3. these intervals are of the shortest possible length (up to constants).
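The partition in the theorem can be built greedily; the sketch below (my own constants and names, not from the talk) shows that on the order of \sqrt{n/\ln n} intervals suffice to cover [0, 1]:

```python
import math

def build_partition(n, c=1.0):
    """Grid points x_0 = 0, x_1 = c ln(n)/n, and then
    x_{i+1} = x_i + sqrt(x_i * c * ln(n)/n), capped at 1.
    The constant c is an illustrative choice."""
    delta = c * math.log(n) / n
    pts = [0.0, delta]
    while pts[-1] < 1.0:
        x = pts[-1]
        pts.append(min(1.0, x + math.sqrt(x * delta)))
    return pts

pts = build_partition(10 ** 4)
num_intervals = len(pts) - 1
# Heuristically, sum of interval widths ~ integral of sqrt(x ln(n)/n),
# giving about 2 * sqrt(n / ln(n)) intervals in total.
```

Each interval is short enough that a degree-\asymp \ln n polynomial can approximate the target functional on it, which is where the \ln n enlargement comes from.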
18 Algorithmic description of the approximation methodology

First conduct sample splitting to get \hat{p}_i, \hat{p}'_i i.i.d. with distribution (2/n) B(n/2, p_i). Suppose q_i \in I_j. For each i do the following:
1. if \hat{p}'_i \in I_j, compute the best polynomial approximation on 2I_j:
   P_K(x; q_i) = argmin_{P \in Poly_K} max_{z \in 2I_j} | |z - q_i| - P(z) |,   (3)
   and then estimate |p_i - q_i| by the unbiased estimator of P_K(p_i; q_i) using \hat{p}_i;
2. if \hat{p}'_i \notin I_j, estimate |p_i - q_i| by |\hat{p}_i - q_i|;
3. sum everything up.
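The "unbiased estimator of P_K" step relies on the fact that falling factorials of a binomial count are unbiased for powers of p. A sketch of that ingredient (my own, assuming X \sim B(n, p)), checked exactly against the binomial pmf:

```python
import math

def falling(a, k):
    """a (a-1) ... (a-k+1)."""
    out = 1.0
    for j in range(k):
        out *= a - j
    return out

def unbiased_power(x, n, k):
    """Given X ~ Binomial(n, p), E[falling(X, k)] = falling(n, k) * p^k,
    so falling(X, k) / falling(n, k) is an unbiased estimate of p^k.
    Combining these with the coefficients of P_K yields an unbiased
    estimate of P_K(p)."""
    return falling(x, k) / falling(n, k)

def exact_expectation(n, p, k):
    """E[unbiased_power(X, n, k)] computed by summing over the pmf."""
    return sum(
        math.comb(n, x) * p ** x * (1 - p) ** (n - x) * unbiased_power(x, n, k)
        for x in range(n + 1)
    )
```

Because the estimator is exactly unbiased for each monomial, the only remaining bias is the polynomial approximation error on 2I_j.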
19 Why does it work?

1. Suppose \hat{p}'_i \in I_j. No matter what estimator we use, one can always assume that p_i \in 2I_j.
2. The bias of the MLE is approximately (Strukov and Timan 77)
   sup_{p_i \in 2I_j} | |p_i - q_i| - E|\hat{p}_i - q_i| | \asymp \sqrt{q_i/n};   (4)
3. The bias of the approximation methodology is approximately (Ditzian and Totik 87)
   sup_{p_i \in 2I_j} | |p_i - q_i| - P_K(p_i; q_i) | \asymp \sqrt{q_i/(n \ln n)};   (5)
4. Permutation invariance does not play a role, since we are doing symbol-by-symbol bias correction.
5. The bias dominates in high dimensions (measure concentration phenomenon).
20 Properties of the approximation methodology

1. Applies to essentially any functional.
2. Applies to a wide range of statistical models (binomial, Poisson, Gaussian, etc.).
3. Near-linear complexity.
4. Requires an explicit polynomial approximation for each different functional.
5. Needs parameter tuning in practice.
23 Second approach: local moment matching methodology

Motivation: Does there exist a single plug-in estimator that can replace the approximation methodology?
Answer: No. For any plug-in rule \hat{P}, there exists a fixed Q such that L_1(\hat{P}, Q) requires n \gtrsim S samples to consistently estimate L_1(P, Q), while the optimal method requires at most n \asymp S/\ln S samples.
Weakened goal: What if we only consider permutation invariant functionals?
Literature on the local moment matching methodology: VV 11 (linear programming), HJW 17.
24 Local moment matching methodology

Theorem (Han, J., Weissman 17): There exists a single estimator \hat{P}, efficiently computable, that achieves the optimal phase transitions for ALL the permutation invariant functionals mentioned above. In particular, it solves the minimax problem
inf_{\hat{P}} sup_{P \in M_S} E ||\hat{P}^< - P^<||_1 \asymp \sqrt{S/(n \ln n)} (1 + \tilde{O}(n^{-1/3})),   (6)
where P^< = (p_{(1)}, p_{(2)}, ..., p_{(S)}), p_{(i)} \le p_{(i+1)}, denotes the sorted distribution.
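The norm in (6) compares sorted probability vectors; a minimal sketch of this "sorted L_1" distance (the function name is mine):

```python
def sorted_l1(p, q):
    """||P^< - Q^<||_1: the L_1 distance after sorting both vectors,
    i.e. a distance between the multisets of probabilities."""
    return sum(abs(a - b) for a, b in zip(sorted(p), sorted(q)))
```

Any permutation invariant functional of P is a function of P^< alone, which is why estimating P^< well under this norm suffices for all of them at once.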
26 A simple example

Assume for all i that p_i \lesssim \ln n/n and \hat{p}_i \lesssim \ln n/n. Consider the Shannon entropy functional H(P) = \sum_{i=1}^S f(p_i), f(x) = x \ln(1/x).

Theorem (VV 11, Wu and Yang 16, J. et al. 15): The optimal error in estimating H is \asymp S/(n \ln n), while the MLE error is \asymp S/n.

Suppose we use the plug-in rule \sum_{i=1}^S f(q_i) to estimate H(P), where q_i \lesssim \ln n/n. Then, for any P_K(x) \in Poly_K with K \asymp \ln n,
|H - \sum_i f(q_i)| = |\sum_i (f(p_i) - P_K(p_i)) + \sum_i (P_K(p_i) - P_K(q_i)) + \sum_i (P_K(q_i) - f(q_i))|
\le 2S inf_{P_K} max_{x \in [0, \ln n/n]} |f(x) - P_K(x)| + |\sum_i (P_K(p_i) - P_K(q_i))|
\lesssim S/(n \ln n) + |\sum_i (P_K(p_i) - P_K(q_i))|.
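The MLE's error here is driven by bias, and that bias can be computed exactly in the smallest case. The sketch below (my own; note it uses a fixed p bounded away from 0, where the bias is the classical -(S-1)/(2n), rather than the large-alphabet regime p_i \lesssim \ln n/n of the theorem above):

```python
import math

def entropy(p):
    """H(P) = -sum_i p_i ln p_i, with 0 ln 0 = 0."""
    return -sum(x * math.log(x) for x in p if x > 0)

def mle_entropy_bias(p, n):
    """Exact E[H(P_hat_n)] - H(P) for a Bernoulli(p) source (S = 2),
    summing over the Binomial(n, p) distribution of the count."""
    expected = sum(
        math.comb(n, x) * p ** x * (1 - p) ** (n - x) * entropy([x / n, 1 - x / n])
        for x in range(n + 1)
    )
    return expected - entropy([p, 1 - p])

bias = mle_entropy_bias(0.3, 100)
# Classical expansion: bias = -(S - 1)/(2n) + O(1/n^2) = -1/200 here.
```

By concavity of entropy the plug-in is always biased downward, and no amount of averaging removes this; the approximation methodology attacks exactly this bias term.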
27 Local moment matching

We showed that for any plug-in rule Q,
|H - \sum_i f(q_i)| \lesssim S/(n \ln n) + |\sum_i (P_K(p_i) - P_K(q_i))|.   (7)
Why is the MLE bad? The MLE is bad because
|E[\sum_i (P_K(p_i) - P_K(\hat{p}_i))]| \asymp S/n.   (8)
Solution: It suffices to reduce the bias of P_K(q_i) in estimating P_K(p_i).
28 Local moment matching

Ideal situation: Suppose that for each 0 \le k \lesssim \ln n,
\sum_j p_j^k = \sum_j q_j^k;   (9)
then we immediately have
E[\sum_i (P_K(p_i) - P_K(q_i))] = 0.   (10)
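The mechanism in (9)-(10) is just linearity: if the first K power sums of (p_i) and (q_i) agree, then the sums of any polynomial of degree at most K agree. A small numeric sketch (my own numbers, not from the talk) builds a q that is not a rearrangement of p yet matches its first two power sums:

```python
import math

p = [0.1, 0.2, 0.3]
s1 = sum(p)
s2 = sum(x * x for x in p)

# Choose q_1 freely, then solve for q_2 + q_3 and q_2 * q_3 so that
# q matches s1 and s2; q_2, q_3 are the roots of a quadratic.
q1 = 0.12
b = s1 - q1                          # q_2 + q_3
c = (b * b - (s2 - q1 * q1)) / 2     # q_2 * q_3
disc = math.sqrt(b * b - 4 * c)
q = [q1, (b + disc) / 2, (b - disc) / 2]

def poly_sum(xs, coeffs):
    """sum_i P(x_i) for P(x) = sum_k coeffs[k] x^k."""
    return sum(sum(a * x ** k for k, a in enumerate(coeffs)) for x in xs)

quad = [0.7, -1.2, 2.0]        # an arbitrary polynomial of degree <= 2
cubic = [0.0, 0.0, 0.0, 1.0]   # x^3: degree 3, no longer matched
```

Matching degree-2 power sums forces poly_sum(p, quad) = poly_sum(q, quad) for every quadratic, while degree-3 sums can still differ; the algorithm exploits this with K \asymp \ln n moments per interval.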
29 Algorithmic description of local moment matching

For each interval I_j, collect A = {i : \hat{p}_i \in I_j}. Then, for each 0 \le k \lesssim \ln n, we solve for Q such that
|\sum_{i \in A} q_i^k - (unbiased estimate of \sum_{i \in A} p_i^k)| \le \epsilon \sigma_{k,A},   (11)
where
\sigma_{k,A} = standard deviation of the unbiased estimate of \sum_{i \in A} p_i^k.   (12)
Existence of a solution: A solution exists with overwhelming probability, since the true distribution P satisfies these inequalities with overwhelming probability.
30 Properties of the local moment matching methodology

1. Applies only to permutation invariant functionals.
2. Applies to a wide range of statistical models (binomial, Poisson, Gaussian, etc.).
3. Polynomial complexity.
4. Implicit polynomial approximation; just need to compute once for all functionals.
5. Needs parameter tuning in practice.
31 Third approach: the profile maximum likelihood (PML) methodology

Properties             | Approximation | Local MM   | PML
Permutation invariant  | No            | Yes        | Yes
Statistical model      | Broad         | Broad      | (Conjectured) Broad
Complexity             | Near-linear   | Polynomial | Unclear
Functional dependent   | Yes           | No         | No
Parameter tuning       | Yes           | Yes        | No

Thank you!
32 Literature

Jayadev Acharya, Hirakendu Das, Alon Orlitsky, and Ananda Theertha Suresh. A unified maximum likelihood approach for optimal distribution property estimation. Proceedings of ICML, 2017.
Jiantao Jiao, Yanjun Han, and Tsachy Weissman. Minimax estimation of the L_1 distance. arXiv e-prints, May 2017.
Gregory Valiant and Paul Valiant. A CLT and tight lower bounds for estimating entropy. Electronic Colloquium on Computational Complexity (ECCC), 2010.
Gregory Valiant and Paul Valiant. Estimating the unseen: a sublinear-sample canonical estimator of distributions. Electronic Colloquium on Computational Complexity, 2010.
Gregory Valiant and Paul Valiant. Estimating the unseen: an n/log(n)-sample estimator for entropy and support size, shown optimal via new CLTs. Proceedings of STOC, 2011.
Gregory Valiant and Paul Valiant. The power of linear estimators. Proceedings of FOCS, 2011.
33 Literature

Yihong Wu and Pengkun Yang. Minimax rates of entropy estimation on large alphabets via best polynomial approximation. IEEE Transactions on Information Theory 62.6 (2016).
Jiantao Jiao, Kartik Venkat, Yanjun Han, and Tsachy Weissman. Minimax estimation of functionals of discrete distributions. IEEE Transactions on Information Theory 61.5 (2015).
Jayadev Acharya, Alon Orlitsky, Ananda Theertha Suresh, and Himanshu Tyagi. The complexity of estimating Rényi entropy. Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Applied Mathematics, 2015.
Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Minimax rate-optimal estimation of divergences between discrete distributions. arXiv preprint, 2016.
Yuheng Bu, Shaofeng Zou, Yingbin Liang, and Venugopal V. Veeravalli. Estimation of KL divergence: optimal minimax rate. arXiv preprint, 2016.
34 Literature

Yanjun Han, Jiantao Jiao, Rajarshi Mukherjee, and Tsachy Weissman. On estimation of L_r-norms in Gaussian white noise models. arXiv preprint, 2017.
Yihong Wu and Pengkun Yang. Chebyshev polynomials, moment matching, and optimal estimation of the unseen. arXiv preprint, 2015.
Yanjun Han, Jiantao Jiao, and Tsachy Weissman. Local moment matching: a unified methodology for symmetric functional estimation and distribution estimation under Wasserstein distance. In preparation.
DRAFT a fial versio will be posted shortly CS229T/STATS231: Statistical Learig Theory Lecturer: Tegyu Ma Lecture #3 Scribe: Migda Qiao October 1, 2013 1 Review ad Overview I the first half of this course,
More informationDimension-free PAC-Bayesian bounds for the estimation of the mean of a random vector
Dimesio-free PAC-Bayesia bouds for the estimatio of the mea of a radom vector Olivier Catoi CREST CNRS UMR 9194 Uiversité Paris Saclay olivier.catoi@esae.fr Ilaria Giulii Laboratoire de Probabilités et
More information(all terms are scalars).the minimization is clearer in sum notation:
7 Multiple liear regressio: with predictors) Depedet data set: y i i = 1, oe predictad, predictors x i,k i = 1,, k = 1, ' The forecast equatio is ŷ i = b + Use matrix otatio: k =1 b k x ik Y = y 1 y 1
More informationInformation Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n
Information Measure Estimation and Applications: Boosting the Effective Sample Size from n to n ln n Jiantao Jiao (Stanford EE) Joint work with: Kartik Venkat Yanjun Han Tsachy Weissman Stanford EE Tsinghua
More informationLecture 11: Channel Coding Theorem: Converse Part
EE376A/STATS376A Iformatio Theory Lecture - 02/3/208 Lecture : Chael Codig Theorem: Coverse Part Lecturer: Tsachy Weissma Scribe: Erdem Bıyık I this lecture, we will cotiue our discussio o chael codig
More informationL = n i, i=1. dp p n 1
Exchageable sequeces ad probabilities for probabilities 1996; modified 98 5 21 to add material o mutual iformatio; modified 98 7 21 to add Heath-Sudderth proof of de Fietti represetatio; modified 99 11
More informationNotes for Lecture 11
U.C. Berkeley CS78: Computatioal Complexity Hadout N Professor Luca Trevisa 3/4/008 Notes for Lecture Eigevalues, Expasio, ad Radom Walks As usual by ow, let G = (V, E) be a udirected d-regular graph with
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More informationSTATISTICAL INFERENCE
STATISTICAL INFERENCE POPULATION AND SAMPLE Populatio = all elemets of iterest Characterized by a distributio F with some parameter θ Sample = the data X 1,..., X, selected subset of the populatio = sample
More informationLecture 4: April 10, 2013
TTIC/CMSC 1150 Mathematical Toolkit Sprig 01 Madhur Tulsiai Lecture 4: April 10, 01 Scribe: Haris Agelidakis 1 Chebyshev s Iequality recap I the previous lecture, we used Chebyshev s iequality to get a
More informationLECTURE 14 NOTES. A sequence of α-level tests {ϕ n (x)} is consistent if
LECTURE 14 NOTES 1. Asymptotic power of tests. Defiitio 1.1. A sequece of -level tests {ϕ x)} is cosistet if β θ) := E θ [ ϕ x) ] 1 as, for ay θ Θ 1. Just like cosistecy of a sequece of estimators, Defiitio
More informationInterval Estimation (Confidence Interval = C.I.): An interval estimate of some population parameter is an interval of the form (, ),
Cofidece Iterval Estimatio Problems Suppose we have a populatio with some ukow parameter(s). Example: Normal(,) ad are parameters. We eed to draw coclusios (make ifereces) about the ukow parameters. We
More informationMASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013. Large Deviations for i.i.d. Random Variables
MASSACHUSETTS INSTITUTE OF TECHNOLOGY 6.265/15.070J Fall 2013 Lecture 2 9/9/2013 Large Deviatios for i.i.d. Radom Variables Cotet. Cheroff boud usig expoetial momet geeratig fuctios. Properties of a momet
More informationMODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING. University of Illinois at Urbana-Champaign
MODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING Yuheg Bu Jiaxu Lu Veugopal V. Veeravalli Uiversity of Illiois at Urbaa-Champaig Tsighua Uiversity Email: bu3@illiois.edu, lujx4@mails.tsighua.edu.c,
More informationSupplementary Materials for Statistical-Computational Phase Transitions in Planted Models: The High-Dimensional Setting
Supplemetary Materials for Statistical-Computatioal Phase Trasitios i Plated Models: The High-Dimesioal Settig Yudog Che The Uiversity of Califoria, Berkeley yudog.che@eecs.berkeley.edu Jiamig Xu Uiversity
More informationInformation Theory Tutorial Communication over Channels with memory. Chi Zhang Department of Electrical Engineering University of Notre Dame
Iformatio Theory Tutorial Commuicatio over Chaels with memory Chi Zhag Departmet of Electrical Egieerig Uiversity of Notre Dame Abstract A geeral capacity formula C = sup I(; Y ), which is correct for
More informationEfficient GMM LECTURE 12 GMM II
DECEMBER 1 010 LECTURE 1 II Efficiet The estimator depeds o the choice of the weight matrix A. The efficiet estimator is the oe that has the smallest asymptotic variace amog all estimators defied by differet
More informationChi-Squared Tests Math 6070, Spring 2006
Chi-Squared Tests Math 6070, Sprig 2006 Davar Khoshevisa Uiversity of Utah February XXX, 2006 Cotets MLE for Goodess-of Fit 2 2 The Multiomial Distributio 3 3 Applicatio to Goodess-of-Fit 6 3 Testig for
More informationKurskod: TAMS11 Provkod: TENB 21 March 2015, 14:00-18:00. English Version (no Swedish Version)
Kurskod: TAMS Provkod: TENB 2 March 205, 4:00-8:00 Examier: Xiagfeg Yag (Tel: 070 2234765). Please aswer i ENGLISH if you ca. a. You are allowed to use: a calculator; formel -och tabellsamlig i matematisk
More informationEstimation of a population proportion March 23,
1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes
More informationLecture 3: August 31
36-705: Itermediate Statistics Fall 018 Lecturer: Siva Balakrisha Lecture 3: August 31 This lecture will be mostly a summary of other useful expoetial tail bouds We will ot prove ay of these i lecture,
More information