MODEL CHANGE DETECTION WITH APPLICATION TO MACHINE LEARNING. University of Illinois at Urbana-Champaign
Yuheng Bu, Jiaxun Lu, Venugopal V. Veeravalli
University of Illinois at Urbana-Champaign; Tsinghua University
arXiv [stat.ML], Nov 2018

ABSTRACT

Model change detection is studied, in which there are two sets of samples that are independently and identically distributed (i.i.d.) according to a pre-change probabilistic model with parameter $\theta$, and a post-change model with parameter $\theta'$, respectively. The goal is to detect whether the change in the model is significant, i.e., whether the difference between the pre-change parameter and the post-change parameter $\|\theta - \theta'\|_2$ is larger than a pre-determined threshold $\rho$. The problem is considered in a Neyman-Pearson setting, where the goal is to maximize the probability of detection under a false alarm constraint. Since the generalized likelihood ratio test (GLRT) is difficult to compute in this problem, we construct an empirical difference test (EDT), which approximates the GLRT and has low computational complexity. Moreover, we provide an approximation method to set the threshold of the EDT to meet the false alarm constraint. Experiments with linear regression and logistic regression are conducted to validate the proposed algorithms.

Index Terms: Model change detection, generalized likelihood ratio test, Neyman-Pearson setting.

1. INTRODUCTION

We study the model change detection problem, where two sets of samples are independently and identically distributed (i.i.d.) according to a pre-change probabilistic model with parameter $\theta$, and a post-change probabilistic model with parameter $\theta'$, respectively. The goal is to determine whether the change in the model is significant or not. We formulate the problem in a Neyman-Pearson setting, and adopt the $\ell_2$ distance between the parameters to measure the change between the models. More specifically, our goal is to construct a test to detect whether $\|\theta - \theta'\|_2$ is larger than a pre-determined threshold $\rho$, while satisfying a false alarm constraint.

This problem is motivated in part by the recent works on active and adaptive sequential learning [1-3], where the machine learning models learned in previous time-steps are used adaptively to improve the accuracy and data-efficiency in the next time-step. A key step in applying these adaptive sequential learning methods is the detection of an abrupt or large model change, since adapting to the previous model if it is significantly different from the current one could deteriorate performance. A specific application in this context is the detection of a shift of user preferences in personalized recommendation systems [4, 5]. In addition, we believe that our model change detection formulation can be applied in transfer learning [6] to determine whether two machine learning tasks are transferable.

We note that our model change detection problem is different from the quickest change detection problem studied in [7, 8]. There, a linear regression model changes at an unknown point in time, and the goal is to detect the change as soon as possible with streaming data. We are interested in detecting whether the change in the model is significant, given sets of samples from the pre- and post-change models.

A standard method for solving a composite hypothesis testing problem such as the model change detection problem under consideration is the generalized likelihood ratio test (GLRT). However, the maximum likelihood estimates of $\theta$ and $\theta'$ required in the GLRT are difficult to compute under the constraint $\|\theta - \theta'\|_2 \le \rho$ in this case. Our first contribution is to propose an empirical difference test (EDT), which approximates the GLRT and has low computational complexity. Moreover, we provide an approximation method to set the threshold in the proposed EDT, which ensures a bound on the worst-case false alarm probability. We validate our results using experiments involving linear regression and logistic regression.

The work of Y. Bu and V. V. Veeravalli was supported by the Army Research Laboratory under Cooperative Agreement W9NF through the University of Illinois at Urbana-Champaign.

2. PROBLEM MODEL

Throughout this paper, we use lower case letters to denote scalars and vectors, and use upper case letters to denote random variables and matrices. We use $\lambda_{\max}(A)$ and $\lambda_{\min}(A)$ to denote the largest and the smallest eigenvalues of matrix $A$, respectively, and $\mathrm{Tr}(A)$ to denote the trace of a square matrix $A$. All logarithms are the natural ones.

We consider the model change detection problem in the following setting. We are given two datasets $S = \{z_1, \dots, z_n\}$ and $S' = \{z'_1, \dots, z'_{n'}\}$ with samples $z$ drawn from some instance space $\mathcal{Z}$. In addition, we are given a parameterized family of distribution models $\mathcal{M} = \{p(z|\theta),\ \theta \in \mathbb{R}^d\}$. We assume that there exist two unknown parameters $\theta, \theta' \in \mathbb{R}^d$, such that the datasets $S$ and $S'$ are independently generated from the following pre-change and post-change models, respectively,

$$Z_i \sim p(z_i|\theta),\ z_i \in S, \quad \text{and} \quad Z'_j \sim p(z'_j|\theta'),\ z'_j \in S'. \tag{1}$$

Our goal is to construct a computationally efficient test to decide between the following two hypotheses:

$$H_0: (\theta, \theta') \in \chi_0 \triangleq \{(\theta, \theta') : \|\theta - \theta'\|_2 \le \rho\}, \qquad H_1: (\theta, \theta') \in \chi_1 \triangleq \{(\theta, \theta') : \|\theta - \theta'\|_2 > \rho\}, \tag{2}$$

where $\rho$ is a constant determined by the specific application.

Let $\delta : \mathcal{Z}^n \times \mathcal{Z}^{n'} \to \{0, 1\}$ denote the decision rule for the model change detection problem. Then the probabilities of false alarm and correct detection can be written as

$$P_F(\delta, \theta, \theta') \triangleq P_{(\theta, \theta')}\{\delta(S, S') = 1\}, \quad (\theta, \theta') \in \chi_0, \tag{3}$$

$$P_D(\delta, \theta, \theta') \triangleq P_{(\theta, \theta')}\{\delta(S, S') = 1\}, \quad (\theta, \theta') \in \chi_1, \tag{4}$$

where $P_{(\theta, \theta')}$ denotes the probability measure for the data conditioned on the model parameters $(\theta, \theta')$. Note that in (2), both the null hypothesis and the alternative hypothesis are composite. We study the detection problem in the Neyman-Pearson setting:

$$\max_{\delta}\ P_D(\delta, \theta, \theta'),\ \forall (\theta, \theta') \in \chi_1, \qquad \text{s.t.}\ P_F(\delta, \theta, \theta') \le \alpha,\ \forall (\theta, \theta') \in \chi_0. \tag{5}$$

As seen in (5), our goal is to construct a test that maximizes the detection probability for all $(\theta, \theta') \in \chi_1$, and satisfies the false alarm constraint for all $(\theta, \theta') \in \chi_0$. The solution to (5), if it exists, is said to be a uniformly most powerful (UMP) test.

Since $z_i$ and $z'_i$ are drawn i.i.d. from $p(z_i|\theta)$ and $p(z'_i|\theta')$, respectively, we can use

$$L(\theta) \triangleq -\sum_{i=1}^{n} \log p(z_i|\theta), \qquad L'(\theta') \triangleq -\sum_{i=1}^{n'} \log p(z'_i|\theta') \tag{6}$$

to denote the negative log-likelihood functions with the pre-change dataset $S$ and post-change dataset $S'$, respectively. Then, the maximum likelihood estimates (MLE) of $\theta$ and $\theta'$ can be written as

$$\hat{\theta}_{ML} \triangleq \arg\min_{\theta} L(\theta), \qquad \hat{\theta}'_{ML} \triangleq \arg\min_{\theta'} L'(\theta'). \tag{7}$$

In addition, we denote the Hessian matrices of $L(\theta)$ and $L'(\theta')$ as $H(\theta) \triangleq \nabla^2 L(\theta)$ and $H'(\theta') \triangleq \nabla^2 L'(\theta')$.

3. EMPIRICAL DIFFERENCE TEST

3.1. Generalized Likelihood Ratio Test

In general, a UMP solution to the composite hypothesis testing problem in (5) may not exist, and may be difficult to find even if it exists. An alternative approach is to apply the GLRT.
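The generalized log-likelihood ratio (GLR) is given by

$$L_G(S, S') \triangleq \log \frac{\max_{(\theta, \theta') \in \chi_1} \prod_{i=1}^{n} p(z_i|\theta) \prod_{j=1}^{n'} p(z'_j|\theta')}{\max_{(\theta, \theta') \in \chi_0} \prod_{i=1}^{n} p(z_i|\theta) \prod_{j=1}^{n'} p(z'_j|\theta')}. \tag{8}$$

If $L_G(S, S')$ does not have point masses under either $H_0$ or $H_1$, the GLRT has the following structure

$$\delta_{GL}(S, S') = \begin{cases} 1, & \text{if } L_G(S, S') \ge \tau, \\ 0, & \text{if } L_G(S, S') < \tau, \end{cases} \tag{9}$$

where $\tau$ is the threshold for the GLR statistic determined by the false alarm constraint $\alpha$. For conciseness, we define

$$(\hat{\theta}_1, \hat{\theta}'_1) \triangleq \arg\min_{(\theta, \theta') \in \chi_1} L(\theta) + L'(\theta'), \qquad (\hat{\theta}_0, \hat{\theta}'_0) \triangleq \arg\min_{(\theta, \theta') \in \chi_0} L(\theta) + L'(\theta'). \tag{10}$$

Then, the generalized log-likelihood ratio can be written as

$$L_G(S, S') = L(\hat{\theta}_0) + L'(\hat{\theta}'_0) - L(\hat{\theta}_1) - L'(\hat{\theta}'_1). \tag{11}$$

The main difficulty in applying the GLRT is that the minimizers $(\hat{\theta}_1, \hat{\theta}'_1)$ and $(\hat{\theta}_0, \hat{\theta}'_0)$ in (10) are hard to compute. In the following subsection, we propose an empirical difference test which approximates the GLRT and has reduced computational complexity.

3.2. Empirical Difference Test

We need the following conditions to proceed with our analysis and establish the asymptotic normality of the MLEs [9].

Assumption 1 (Regularity conditions for MLE).
1. Smoothness: $L(\theta)$ and $L'(\theta)$ have first, second and third derivatives for all $\theta$.
2. Strong convexity: For all $\theta$, $H(\theta)$ and $H'(\theta)$ are positive definite and invertible.
3. Boundedness: For all $\theta$, the largest eigenvalues of $H(\theta)$ and $H'(\theta)$ are upper bounded by $\lambda_M$.

We note that the MLEs $(\hat{\theta}_{ML}, \hat{\theta}'_{ML})$ belong to either $\chi_0$ or $\chi_1$. If $(\hat{\theta}_{ML}, \hat{\theta}'_{ML}) \in \chi_1$, i.e., $(\hat{\theta}_1, \hat{\theta}'_1) = (\hat{\theta}_{ML}, \hat{\theta}'_{ML})$, we have

$$L_G(S, S') = L(\hat{\theta}_0) - L(\hat{\theta}_{ML}) + L'(\hat{\theta}'_0) - L'(\hat{\theta}'_{ML}) > 0.$$

In addition, the worst-case false alarm probability of the GLRT is given by $\max_{(\theta, \theta') \in \chi_0} P_{(\theta, \theta')}\{L_G(S, S') \ge \tau\}$, which we wish to upper bound by $\alpha$. Note that $L_G(S, S') > 0$ when $(\hat{\theta}_{ML}, \hat{\theta}'_{ML}) \in \chi_1$ holds. In the following, we focus on the case where $\alpha < \max_{(\theta, \theta') \in \chi_0} P_{(\theta, \theta')}\{L_G(S, S') > 0\}$, i.e., a relatively small false alarm constraint $\alpha$. Thus, we just need to study the false alarm probability of the GLRT when $(\hat{\theta}_{ML}, \hat{\theta}'_{ML}) \in \chi_1$ and $\tau > 0$.

Given $(\hat{\theta}_{ML}, \hat{\theta}'_{ML}) \in \chi_1$, it is difficult to solve for $(\hat{\theta}_0, \hat{\theta}'_0)$ in (10) exactly. However, we can construct an upper bound for the GLR by approximating $(\hat{\theta}_0, \hat{\theta}'_0)$ using a linear combination of $(\hat{\theta}_{ML}, \hat{\theta}'_{ML})$. Let $\Delta\hat{\theta} \triangleq \hat{\theta}'_{ML} - \hat{\theta}_{ML}$. Then $\|\Delta\hat{\theta}\|_2 > \rho$, and we define

$$\tilde{\theta}_0 = \hat{\theta}_{ML} + \mu \frac{\Delta\hat{\theta}}{\|\Delta\hat{\theta}\|_2}, \qquad \tilde{\theta}'_0 = \hat{\theta}_{ML} + (\mu + \rho) \frac{\Delta\hat{\theta}}{\|\Delta\hat{\theta}\|_2}, \tag{12}$$

where $\mu \in [0, \|\Delta\hat{\theta}\|_2 - \rho]$ denotes the distance between $\tilde{\theta}_0$ and $\hat{\theta}_{ML}$. It can be verified that $(\tilde{\theta}_0, \tilde{\theta}'_0) \in \chi_0$, since $\|\tilde{\theta}_0 - \tilde{\theta}'_0\|_2 = \rho$. Then, the GLR in (11) can be upper bounded as

$$\begin{aligned} L_G(S, S') &= L(\hat{\theta}_0) + L'(\hat{\theta}'_0) - L(\hat{\theta}_1) - L'(\hat{\theta}'_1) \\ &\le L(\tilde{\theta}_0) + L'(\tilde{\theta}'_0) - L(\hat{\theta}_1) - L'(\hat{\theta}'_1) \\ &\overset{(a)}{=} \tfrac{1}{2} (\hat{\theta}_1 - \tilde{\theta}_0)^\top H(\bar{\theta}) (\hat{\theta}_1 - \tilde{\theta}_0) + \tfrac{1}{2} (\hat{\theta}'_1 - \tilde{\theta}'_0)^\top H'(\bar{\theta}') (\hat{\theta}'_1 - \tilde{\theta}'_0) \\ &= \frac{\mu^2}{2} \frac{\Delta\hat{\theta}^\top}{\|\Delta\hat{\theta}\|_2} H(\bar{\theta}) \frac{\Delta\hat{\theta}}{\|\Delta\hat{\theta}\|_2} + \frac{(\|\Delta\hat{\theta}\|_2 - (\mu + \rho))^2}{2} \frac{\Delta\hat{\theta}^\top}{\|\Delta\hat{\theta}\|_2} H'(\bar{\theta}') \frac{\Delta\hat{\theta}}{\|\Delta\hat{\theta}\|_2} \\ &\overset{(b)}{\le} \frac{\mu^2}{2} \lambda_{\max}(H(\bar{\theta})) + \frac{(\|\Delta\hat{\theta}\|_2 - (\mu + \rho))^2}{2} \lambda_{\max}(H'(\bar{\theta}')), \end{aligned} \tag{13}$$

where (a) follows from Taylor's theorem around the unconstrained minimizers $(\hat{\theta}_1, \hat{\theta}'_1) = (\hat{\theta}_{ML}, \hat{\theta}'_{ML})$, at which the gradients vanish, and $\bar{\theta}$ and $\bar{\theta}'$ denote the parameters in the corresponding remainders; and (b) follows from the fact that $H(\bar{\theta})$ and $H'(\bar{\theta}')$ are positive definite and $\Delta\hat{\theta} / \|\Delta\hat{\theta}\|_2$ is a unit vector. Note that $\lambda_{\max}(H(\bar{\theta}))$ and $\lambda_{\max}(H'(\bar{\theta}'))$ are bounded by $\lambda_M$ in Assumption 1. Hence,

$$P_F(\delta_{GL}) = P_{(\theta, \theta')}\{L_G(S, S') \ge \tau\} \le P_{(\theta, \theta')}\Big\{\frac{\mu^2}{2} \lambda_M + \frac{(\|\Delta\hat{\theta}\|_2 - (\mu + \rho))^2}{2} \lambda_M \ge \tau\Big\} = P_{(\theta, \theta')}\{\|\Delta\hat{\theta}\|_2 \ge \eta\}, \tag{14}$$

for $(\theta, \theta') \in \chi_0$. In other words, the false alarm probability of the GLRT can be upper bounded by the probability that the empirical difference $\|\Delta\hat{\theta}\|_2$ is larger than another threshold $\eta$. Note that the threshold $\eta$ can be set by letting $P_{(\theta, \theta')}\{\|\Delta\hat{\theta}\|_2 \ge \eta\} \le \alpha$ for all $(\theta, \theta') \in \chi_0$, which is independent of the unknown quantities $\mu$ and $\lambda_M$. Thus, we propose the following empirical difference test to approximate the GLRT,

$$\delta_{ED} = \begin{cases} 1, & \text{if } \|\Delta\hat{\theta}\|_2 \ge \eta, \\ 0, & \text{if } \|\Delta\hat{\theta}\|_2 < \eta. \end{cases} \tag{15}$$

The benefits of using $\delta_{ED}$ are two-fold:
1) Instead of constructing the more complicated GLR statistic, our EDT only requires the computation of the empirical difference $\|\Delta\hat{\theta}\|_2$ between the MLEs, which is more tractable in practice.
2) The distribution of the empirical difference $\Delta\hat{\theta}$ is asymptotically Gaussian, which facilitates the setting of the threshold $\eta$ to meet the false alarm constraint $\alpha$.

4. APPROXIMATION FOR SETTING TEST THRESHOLD

In this section, we provide a method based on a $\chi^2$ approximation [10] to set the threshold $\eta$ in the EDT.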
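As a concrete reference point for the decision rule in (15), the following is a minimal sketch of the EDT for a linear model, where the MLE reduces to ordinary least squares. The function names, the data shapes, and all numerical values (including the threshold `eta`) are illustrative assumptions, not the authors' implementation or experimental settings.

```python
import numpy as np


def mle_linear(X, y):
    """Least-squares MLE for y = X^T theta + Gaussian noise; X has shape (d, n)."""
    theta_hat, *_ = np.linalg.lstsq(X.T, y, rcond=None)
    return theta_hat


def empirical_difference_test(X, y, X_new, y_new, eta):
    """Return (decision, statistic); decision is 1 when the statistic is >= eta."""
    delta_hat = mle_linear(X_new, y_new) - mle_linear(X, y)
    statistic = float(np.linalg.norm(delta_hat))
    return int(statistic >= eta), statistic


# Illustrative values only (assumed, not the paper's experimental settings).
rng = np.random.default_rng(0)
d, n, noise_std, eta = 5, 200, 0.1, 1.0
theta = np.zeros(d)
theta_new = theta.copy()
theta_new[0] = 2.0  # true change of size 2 in the first coordinate

X = rng.standard_normal((d, n))
X_new = rng.standard_normal((d, n))
y = X.T @ theta + noise_std * rng.standard_normal(n)
y_new = X_new.T @ theta_new + noise_std * rng.standard_normal(n)

decision, statistic = empirical_difference_test(X, y, X_new, y_new, eta)
print(decision, round(statistic, 3))
```

Only the two unconstrained MLEs are needed here, which is the computational advantage over the constrained minimization in (10). For other models, `mle_linear` would be replaced by whatever solver returns the unconstrained MLE.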
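Since $\hat{\theta}_{ML}$ and $\hat{\theta}'_{ML}$ are the MLEs of $\theta$ and $\theta'$ with $n$ and $n'$ samples, respectively, we have

$$\sqrt{n}\,(\hat{\theta}_{ML} - \theta) \xrightarrow{d} \mathcal{N}(0, I_\theta^{-1}), \qquad \sqrt{n'}\,(\hat{\theta}'_{ML} - \theta') \xrightarrow{d} \mathcal{N}(0, I_{\theta'}^{-1}),$$

from the asymptotic normality of the MLE [9], where $I_\theta$ denotes the Fisher information matrix of the probabilistic model $p(z|\theta)$. Thus, we can approximate the distribution of $\Delta\hat{\theta}$ using a Gaussian distribution $\mathcal{N}(\Delta\theta, \Sigma_{\Delta\theta})$, where $\Delta\theta \triangleq \theta' - \theta$ and $\Sigma_{\Delta\theta} \triangleq I_\theta^{-1}/n + I_{\theta'}^{-1}/n'$. In practice, $I_\theta$ and $I_{\theta'}$ can be estimated by replacing $\theta$ and $\theta'$ with the corresponding MLEs $\hat{\theta}_{ML}$ and $\hat{\theta}'_{ML}$, respectively.

To satisfy the false alarm constraint in (5), we need to set the threshold $\eta_\alpha$ in the EDT based on the following equation,

$$\max_{(\theta, \theta') \in \chi_0} P_{(\theta, \theta')}\{\|\Delta\hat{\theta}\|_2 \ge \eta_\alpha\} = \alpha. \tag{16}$$

The following theorem characterizes the distribution of $\|\Delta\hat{\theta}\|_2^2$ that results from the Gaussian approximation.

Theorem 1. Suppose $\Delta\hat{\theta} \sim \mathcal{N}(\Delta\theta, \Sigma_{\Delta\theta})$, and the covariance matrix $\Sigma_{\Delta\theta}$ has the eigen-decomposition $\Sigma_{\Delta\theta} = P^\top \Lambda P$, where $\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_d)$ contains all the eigenvalues, and $P$ is an orthogonal matrix. Then,

$$\|\Delta\hat{\theta}\|_2^2 \overset{d}{=} \sum_{i=1}^{d} \lambda_i (U_i + b_i)^2, \tag{17}$$

where the $U_i \sim \mathcal{N}(0, 1)$ are independent, and $b = (\sqrt{\Lambda})^{-1} P\, \Delta\theta$.

The distribution of $\|\Delta\hat{\theta}\|_2^2$ is a linear combination of independent non-central chi-squared random variables with one degree of freedom, which does not have a simple closed form [11]. We therefore propose the following approximation method to set the threshold in the EDT. Note that

$$P_F(\delta_{ED}) = P_{(\theta, \theta')}\Big\{\sum_{i=1}^{d} \lambda_i (U_i + b_i)^2 \ge \eta^2\Big\} \le P_{(\theta, \theta')}\Big\{\sum_{i=1}^{d} (U_i + b_i)^2 \ge \eta^2 / \lambda_{\max}(\Sigma_{\Delta\theta})\Big\}, \tag{18}$$

for $(\theta, \theta') \in \chi_0$, and $\sum_{i=1}^{d} (U_i + b_i)^2$ is a non-central chi-squared $\chi^2(k, \gamma)$ random variable with degrees of freedom $k = d$ and non-centrality parameter $\gamma = \sum_{i=1}^{d} b_i^2 \le \rho^2 / \lambda_{\min}(\Sigma_{\Delta\theta})$, where the inequality follows from the fact that $\|\theta - \theta'\|_2 \le \rho$ under $H_0$. Thus,

$$\max_{(\theta, \theta') \in \chi_0} P_{(\theta, \theta')}\{\|\Delta\hat{\theta}\|_2 \ge \eta\} \le \max_{(\theta, \theta') \in \chi_0} P\Big\{\chi^2\Big(d, \sum_{i=1}^{d} b_i^2\Big) \ge \eta^2 / \lambda_{\max}(\Sigma_{\Delta\theta})\Big\}. \tag{19}$$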
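The decomposition in (17) and the relaxation in (18) can be sanity-checked numerically. The sketch below draws Gaussian vectors directly and compares them with the weighted sum of shifted standard normals; the covariance matrix, the mean shift `delta`, and the threshold `eta` are arbitrary illustrative choices, and `P` is taken as the transpose of the eigenvector matrix returned by NumPy so that the covariance equals P^T Lambda P as in the theorem.

```python
import numpy as np

rng = np.random.default_rng(1)
d, num = 3, 200_000

# Arbitrary positive-definite covariance and mean shift (illustrative only).
A = rng.standard_normal((d, d))
Sigma = A @ A.T + 0.5 * np.eye(d)
delta = np.array([0.5, -0.3, 0.2])

# eigh gives Sigma = V diag(lam) V^T, so P = V^T satisfies Sigma = P^T Lambda P.
lam, V = np.linalg.eigh(Sigma)
P = V.T
b = (P @ delta) / np.sqrt(lam)

# Left side of (17): squared norm of direct Gaussian draws.
samples = rng.multivariate_normal(delta, Sigma, size=num)
lhs = np.sum(samples**2, axis=1)

# Right side of (17): weighted sum of squared shifted standard normals.
U = rng.standard_normal((num, d))
rhs = np.sum(lam * (U + b) ** 2, axis=1)

# Relaxation (18): replace every eigenvalue by the largest one.
eta = 2.0
p_edt = np.mean(rhs >= eta**2)
p_bound = np.mean(np.sum((U + b) ** 2, axis=1) >= eta**2 / lam.max())

print(lhs.mean(), rhs.mean(), delta @ delta + np.trace(Sigma))
print(p_edt, p_bound)
```

Both sample means should be close to the exact value, the squared norm of the mean plus the trace of the covariance, and `p_edt` can never exceed `p_bound` because the relaxation holds sample by sample.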
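We can set the threshold $\tilde{\eta}_\alpha$ with the $\chi^2$ approximation [10] using the following equation,

$$P\big\{\chi^2\big(d, \rho^2 / \lambda_{\min}(\Sigma_{\Delta\theta})\big) \ge \tilde{\eta}_\alpha^2 / \lambda_{\max}(\Sigma_{\Delta\theta})\big\} = \alpha, \tag{20}$$

to ensure that the false alarm probability is bounded by $\alpha$.

5. NUMERICAL RESULTS

In this section, we evaluate the performance of the proposed empirical difference test $\delta_{ED}$ in linear regression and logistic regression models.

Linear regression model: The datasets $S$ and $S'$ are generated from the linear model $y = X^\top \theta + \xi$, where $X \in \mathbb{R}^{d \times n}$ denotes the input variable, $y \in \mathbb{R}^{n}$ denotes the response variable and $\theta \in \mathbb{R}^d$ denotes the weight vector. We assume that all the elements in the noise $\xi \in \mathbb{R}^n$ are i.i.d. zero mean Gaussian random variables generated from $\mathcal{N}(0, \sigma^2)$. Then, the Fisher information matrix $I_\theta = XX^\top / \sigma^2$ is independent of $\theta$. In the simulations, we set the dimension d = 0, the number of samples n = n' = 40, σ = and ρ = .

Logistic regression model: The datasets $S$ and $S'$ are generated from the following logistic model

$$p(y_i | x_i, \theta) = \frac{1}{1 + \exp(-y_i \theta^\top x_i)}, \quad (x_i, y_i) \in S, \tag{21}$$

where $x_i \in \mathbb{R}^d$ denotes the feature vector, $y_i \in \{\pm 1\}$ denotes the label, and $\theta \in \mathbb{R}^d$, $\|\theta\|_2 = 1$, is the normalized model parameter vector. Then, the Fisher information matrix is

$$I_\theta = \mathbb{E}_x\Big[\frac{x_i x_i^\top}{(1 + \exp(\theta^\top x_i))(1 + \exp(-\theta^\top x_i))}\Big]. \tag{22}$$

In the simulations, we choose dimension d = 5, the number of samples n = n' = 60, and set $\rho$ such that the angle between $\theta$ and $\theta'$ is $\pi/4$.

Fig. 1. Comparison of the performances of the GLRT and EDT for the linear regression model.

Fig. 2. Comparison of the performance of EDT with the threshold $\eta_\alpha$ and the $\chi^2$ approximation $\tilde{\eta}_\alpha$, for the linear regression model with α = 0.

To illustrate the performance of the proposed algorithms, we plot the probability $P_{(\theta, \theta')}\{\delta = 1\}$ as a function of $\|\theta - \theta'\|_2$ in all three figures, with the normalized model change $\|\theta - \theta'\|_2 / \rho$ on the horizontal axis, starting from 0. Note that when $\|\theta - \theta'\|_2 < \rho$, i.e., $(\theta, \theta') \in \chi_0$, $P_{(\theta, \theta')}\{\delta = 1\}$ denotes the false alarm probability $P_F(\delta)$ (in the left side of the figures). In contrast, when $\|\theta - \theta'\|_2 > \rho$, $(\theta, \theta') \in \chi_1$ and $P_{(\theta, \theta')}\{\delta = 1\}$ denotes the detection probability $P_D(\delta)$ (in the right side of the figures).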
Thus, the plot of $P_{(\theta, \theta')}\{\delta = 1\}$ provides us with an illustration of the test performance under both hypotheses with different model parameters.

To verify the approximation of the GLRT with the proposed EDT, we first compare the performance of these tests for the linear regression model (the GLRT is not computationally feasible for logistic regression) for two values of the false alarm constraint, α = 0. and α = 0.3. The thresholds of these tests $\eta_\alpha$ are set using 000 runs of Monte-Carlo simulations such that the false alarm probabilities are equal to $\alpha$ as in (16). It is shown in Fig. 1 that the difference between the performance of EDT and that of GLRT is negligible with only n = n' = 40 samples, which justifies the use of EDT. We note that when $\|\theta - \theta'\|_2 / \rho = 1$, it is impossible to distinguish $H_0$ and $H_1$ even if the numbers of samples $n$ and $n'$ go to infinity, i.e., the probabilities of false alarm and detection are both equal to $\alpha$ in this case.

Fig. 3. Comparison of the performance of EDT with the threshold $\eta_\alpha$ and the $\chi^2$ approximation $\tilde{\eta}_\alpha$, for the logistic regression model with α = 0.

Fig. 2 and Fig. 3 compare the performance of EDT with the threshold $\eta_\alpha$ computed by 000 runs of Monte-Carlo simulations in (16), and the threshold $\tilde{\eta}_\alpha$ set by the proposed $\chi^2$ approximation in (20), respectively, when α = 0. It can be observed that in both the linear regression and logistic regression cases, the non-central chi-squared approximation in (20) provides conservative estimates of the test thresholds $\eta$, thereby ensuring that the false alarm constraint is met.
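The conservativeness of the threshold from (20) can be reproduced in a small simulation. The sketch below is written under several assumptions that are not taken from the paper: the dimension, sample size, noise level, $\rho$ and $\alpha$ are illustrative; the covariance of the empirical difference uses the exact finite-sample covariance of the two least-squares estimators in place of the asymptotic expression; and the non-central chi-squared quantile is approximated by sampling with NumPy (SciPy's `scipy.stats.ncx2.isf` could be used for it instead).

```python
import numpy as np


def chi2_threshold(Sigma, rho, alpha, rng, num=200_000):
    """Approximate the threshold solving (20) with a Monte Carlo quantile."""
    lam = np.linalg.eigvalsh(Sigma)
    d = Sigma.shape[0]
    draws = rng.noncentral_chisquare(d, rho**2 / lam.min(), size=num)
    quantile = np.quantile(draws, 1.0 - alpha)  # upper-alpha point of chi^2(d, gamma)
    return float(np.sqrt(lam.max() * quantile))


# Illustrative settings (assumed, not the paper's).
rng = np.random.default_rng(0)
d, n, sigma, rho, alpha = 4, 60, 1.0, 1.0, 0.1
X = rng.standard_normal((d, n))
X_new = rng.standard_normal((d, n))

# Exact covariance of the difference of two least-squares estimators.
Sigma = sigma**2 * (np.linalg.inv(X @ X.T) + np.linalg.inv(X_new @ X_new.T))
eta_tilde = chi2_threshold(Sigma, rho, alpha, rng)

# Monte Carlo false alarm rate on the boundary ||theta - theta'||_2 = rho.
theta = np.zeros(d)
theta_new = np.zeros(d)
theta_new[0] = rho
runs = 2000
stats = np.empty(runs)
for r in range(runs):
    y = X.T @ theta + sigma * rng.standard_normal(n)
    y_new = X_new.T @ theta_new + sigma * rng.standard_normal(n)
    th = np.linalg.lstsq(X.T, y, rcond=None)[0]
    th_new = np.linalg.lstsq(X_new.T, y_new, rcond=None)[0]
    stats[r] = np.linalg.norm(th_new - th)

false_alarm = float(np.mean(stats >= eta_tilde))
print(eta_tilde, false_alarm)
```

In this setup the estimated false alarm rate should fall well below `alpha`, which mirrors the observation above that the approximation is conservative: both the eigenvalue relaxation in (18) and the worst-case non-centrality in (19) loosen the bound.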
6. REFERENCES

[1] C. Wilson, V. V. Veeravalli, and A. Nedich, "Adaptive sequential stochastic optimization," IEEE Transactions on Automatic Control, 2018.
[2] C. Wilson and V. V. Veeravalli, "Adaptive sequential optimization with applications to machine learning," in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, 2016.
[3] Y. Bu, J. Lu, and V. V. Veeravalli, "Active and adaptive sequential learning," arXiv preprint arXiv:1805.11710, 2018.
[4] M. Elahi, F. Ricci, and N. Rubens, "A survey of active learning in collaborative filtering recommender systems," Computer Science Review, vol. 20, pp. 29-50, 2016.
[5] N. Rubens, M. Elahi, M. Sugiyama, and D. Kaplan, "Active learning in recommender systems," in Recommender Systems Handbook. Springer, 2015.
[6] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[7] J. Geng, B. Zhang, L. M. Huie, and L. Lai, "Online change detection of linear regression models," in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016.
[8] S. Zou, G. Fellouris, and V. V. Veeravalli, "Quickest change detection under transient dynamics: Theory and asymptotic analysis," arXiv preprint arXiv:1711.02186, 2017.
[9] A. W. van der Vaart, Asymptotic Statistics, Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2000.
[10] A. DasGupta, Asymptotic Theory of Statistics and Probability, Springer, 2008.
[11] S. J. Press, "Linear combinations of non-central chi-square variates," The Annals of Mathematical Statistics, 1966.
Machie Learig Theory Tübige Uiversity, WS 06/07 Lecture Tolstikhi Ilya Abstract I this lecture we derive risk bouds for kerel methods. We will start by showig that Soft Margi kerel SVM correspods to miimizig
More informationTHE KALMAN FILTER RAUL ROJAS
THE KALMAN FILTER RAUL ROJAS Abstract. This paper provides a getle itroductio to the Kalma filter, a umerical method that ca be used for sesor fusio or for calculatio of trajectories. First, we cosider
More informationLet us give one more example of MLE. Example 3. The uniform distribution U[0, θ] on the interval [0, θ] has p.d.f.
Lecture 5 Let us give oe more example of MLE. Example 3. The uiform distributio U[0, ] o the iterval [0, ] has p.d.f. { 1 f(x =, 0 x, 0, otherwise The likelihood fuctio ϕ( = f(x i = 1 I(X 1,..., X [0,
More informationA statistical method to determine sample size to estimate characteristic value of soil parameters
A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig
More informationREGRESSION WITH QUADRATIC LOSS
REGRESSION WITH QUADRATIC LOSS MAXIM RAGINSKY Regressio with quadratic loss is aother basic problem studied i statistical learig theory. We have a radom couple Z = X, Y ), where, as before, X is a R d
More information2 1. The r.s., of size n2, from population 2 will be. 2 and 2. 2) The two populations are independent. This implies that all of the n1 n2
Chapter 8 Comparig Two Treatmets Iferece about Two Populatio Meas We wat to compare the meas of two populatios to see whether they differ. There are two situatios to cosider, as show i the followig examples:
More informationx = Pr ( X (n) βx ) =
Exercise 93 / page 45 The desity of a variable X i i 1 is fx α α a For α kow let say equal to α α > fx α α x α Pr X i x < x < Usig a Pivotal Quatity: x α 1 < x < α > x α 1 ad We solve i a similar way as
More informationThe DOA Estimation of Multiple Signals based on Weighting MUSIC Algorithm
, pp.10-106 http://dx.doi.org/10.1457/astl.016.137.19 The DOA Estimatio of ultiple Sigals based o Weightig USIC Algorithm Chagga Shu a, Yumi Liu State Key Laboratory of IPOC, Beijig Uiversity of Posts
More informationSTA6938-Logistic Regression Model
Dr. Yig Zhag STA6938-Logistic Regressio Model Topic -Simple (Uivariate) Logistic Regressio Model Outlies:. Itroductio. A Example-Does the liear regressio model always work? 3. Maximum Likelihood Curve
More informationOutput Analysis (2, Chapters 10 &11 Law)
B. Maddah ENMG 6 Simulatio Output Aalysis (, Chapters 10 &11 Law) Comparig alterative system cofiguratio Sice the output of a simulatio is radom, the comparig differet systems via simulatio should be doe
More informationMachine Learning for Data Science (CS 4786)
Machie Learig for Data Sciece CS 4786) Lecture & 3: Pricipal Compoet Aalysis The text i black outlies high level ideas. The text i blue provides simple mathematical details to derive or get to the algorithm
More informationECONOMIC OPERATION OF POWER SYSTEMS
ECOOMC OEATO OF OWE SYSTEMS TOUCTO Oe of the earliest applicatios of o-lie cetralized cotrol was to provide a cetral facility, to operate ecoomically, several geeratig plats supplyig the loads of the system.
More informationMonte Carlo Integration
Mote Carlo Itegratio I these otes we first review basic umerical itegratio methods (usig Riema approximatio ad the trapezoidal rule) ad their limitatios for evaluatig multidimesioal itegrals. Next we itroduce
More informationECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
ECE 90 Lecture 4: Maximum Likelihood Estimatio ad Complexity Regularizatio R Nowak 5/7/009 Review : Maximum Likelihood Estimatio We have iid observatios draw from a ukow distributio Y i iid p θ, i,, where
More informationVector Quantization: a Limiting Case of EM
. Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z
More informationECE 901 Lecture 13: Maximum Likelihood Estimation
ECE 90 Lecture 3: Maximum Likelihood Estimatio R. Nowak 5/7/009 The focus of this lecture is to cosider aother approach to learig based o maximum likelihood estimatio. Ulike earlier approaches cosidered
More informationMachine Learning Brett Bernstein
Machie Learig Brett Berstei Week 2 Lecture: Cocept Check Exercises Starred problems are optioal. Excess Risk Decompositio 1. Let X = Y = {1, 2,..., 10}, A = {1,..., 10, 11} ad suppose the data distributio
More informationA goodness-of-fit test based on the empirical characteristic function and a comparison of tests for normality
A goodess-of-fit test based o the empirical characteristic fuctio ad a compariso of tests for ormality J. Marti va Zyl Departmet of Mathematical Statistics ad Actuarial Sciece, Uiversity of the Free State,
More informationMachine Learning Theory (CS 6783)
Machie Learig Theory (CS 6783) Lecture 2 : Learig Frameworks, Examples Settig up learig problems. X : istace space or iput space Examples: Computer Visio: Raw M N image vectorized X = 0, 255 M N, SIFT
More informationBayesian Methods: Introduction to Multi-parameter Models
Bayesia Methods: Itroductio to Multi-parameter Models Parameter: θ = ( θ, θ) Give Likelihood p(y θ) ad prior p(θ ), the posterior p proportioal to p(y θ) x p(θ ) Margial posterior ( θ, θ y) is Iterested
More informationA collocation method for singular integral equations with cosecant kernel via Semi-trigonometric interpolation
Iteratioal Joural of Mathematics Research. ISSN 0976-5840 Volume 9 Number 1 (017) pp. 45-51 Iteratioal Research Publicatio House http://www.irphouse.com A collocatio method for sigular itegral equatios
More informationOn Random Line Segments in the Unit Square
O Radom Lie Segmets i the Uit Square Thomas A. Courtade Departmet of Electrical Egieerig Uiversity of Califoria Los Ageles, Califoria 90095 Email: tacourta@ee.ucla.edu I. INTRODUCTION Let Q = [0, 1] [0,
More informationMixtures of Gaussians and the EM Algorithm
Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity
More informationA Risk Comparison of Ordinary Least Squares vs Ridge Regression
Joural of Machie Learig Research 14 (2013) 1505-1511 Submitted 5/12; Revised 3/13; Published 6/13 A Risk Compariso of Ordiary Least Squares vs Ridge Regressio Paramveer S. Dhillo Departmet of Computer
More informationLecture 7: Properties of Random Samples
Lecture 7: Properties of Radom Samples 1 Cotiued From Last Class Theorem 1.1. Let X 1, X,...X be a radom sample from a populatio with mea µ ad variace σ
More informationHomework Set #3 - Solutions
EE 15 - Applicatios of Covex Optimizatio i Sigal Processig ad Commuicatios Dr. Adre Tkaceko JPL Third Term 11-1 Homework Set #3 - Solutios 1. a) Note that x is closer to x tha to x l i the Euclidea orm
More informationx iu i E(x u) 0. In order to obtain a consistent estimator of β, we find the instrumental variable z which satisfies E(z u) = 0. z iu i E(z u) = 0.
27 However, β MM is icosistet whe E(x u) 0, i.e., β MM = (X X) X y = β + (X X) X u = β + ( X X ) ( X u ) \ β. Note as follows: X u = x iu i E(x u) 0. I order to obtai a cosistet estimator of β, we fid
More informationEconomics 241B Relation to Method of Moments and Maximum Likelihood OLSE as a Maximum Likelihood Estimator
Ecoomics 24B Relatio to Method of Momets ad Maximum Likelihood OLSE as a Maximum Likelihood Estimator Uder Assumptio 5 we have speci ed the distributio of the error, so we ca estimate the model parameters
More informationInvestigating the Significance of a Correlation Coefficient using Jackknife Estimates
Iteratioal Joural of Scieces: Basic ad Applied Research (IJSBAR) ISSN 2307-4531 (Prit & Olie) http://gssrr.org/idex.php?joural=jouralofbasicadapplied ---------------------------------------------------------------------------------------------------------------------------
More informationLecture Note 8 Point Estimators and Point Estimation Methods. MIT Spring 2006 Herman Bennett
Lecture Note 8 Poit Estimators ad Poit Estimatio Methods MIT 14.30 Sprig 2006 Herma Beett Give a parameter with ukow value, the goal of poit estimatio is to use a sample to compute a umber that represets
More information[412] A TEST FOR HOMOGENEITY OF THE MARGINAL DISTRIBUTIONS IN A TWO-WAY CLASSIFICATION
[412] A TEST FOR HOMOGENEITY OF THE MARGINAL DISTRIBUTIONS IN A TWO-WAY CLASSIFICATION BY ALAN STUART Divisio of Research Techiques, Lodo School of Ecoomics 1. INTRODUCTION There are several circumstaces
More informationw (1) ˆx w (1) x (1) /ρ and w (2) ˆx w (2) x (2) /ρ.
2 5. Weighted umber of late jobs 5.1. Release dates ad due dates: maximimizig the weight of o-time jobs Oce we add release dates, miimizig the umber of late jobs becomes a sigificatly harder problem. For
More informationStatistical Inference Based on Extremum Estimators
T. Rotheberg Fall, 2007 Statistical Iferece Based o Extremum Estimators Itroductio Suppose 0, the true value of a p-dimesioal parameter, is kow to lie i some subset S R p : Ofte we choose to estimate 0
More informationComparison of Minimum Initial Capital with Investment and Non-investment Discrete Time Surplus Processes
The 22 d Aual Meetig i Mathematics (AMM 207) Departmet of Mathematics, Faculty of Sciece Chiag Mai Uiversity, Chiag Mai, Thailad Compariso of Miimum Iitial Capital with Ivestmet ad -ivestmet Discrete Time
More informationA quick activity - Central Limit Theorem and Proportions. Lecture 21: Testing Proportions. Results from the GSS. Statistics and the General Population
A quick activity - Cetral Limit Theorem ad Proportios Lecture 21: Testig Proportios Statistics 10 Coli Rudel Flip a coi 30 times this is goig to get loud! Record the umber of heads you obtaied ad calculate
More informationAsymptotic Results for the Linear Regression Model
Asymptotic Results for the Liear Regressio Model C. Fli November 29, 2000 1. Asymptotic Results uder Classical Assumptios The followig results apply to the liear regressio model y = Xβ + ε, where X is
More informationAlgebra of Least Squares
October 19, 2018 Algebra of Least Squares Geometry of Least Squares Recall that out data is like a table [Y X] where Y collects observatios o the depedet variable Y ad X collects observatios o the k-dimesioal
More informationLecture 3. Properties of Summary Statistics: Sampling Distribution
Lecture 3 Properties of Summary Statistics: Samplig Distributio Mai Theme How ca we use math to justify that our umerical summaries from the sample are good summaries of the populatio? Lecture Summary
More information