Lecture 3 January 16
Stats 300b: Theory of Statistics — Winter 2018
Lecture 3: January 16
Lecturer: Yu Bai / John Duchi — Scribes: Shuangning Li, Theodor Misiakiewicz

Warning: these notes may contain factual errors.

Reading: VDV Chapter ; ELST Chapter

Outline of Lecture:
1. Basic consistency and identifiability
2. Asymptotic normality
   (a) Taylor expansions
   (b) Classical log-likelihood and asymptotic normality
   (c) Fisher information

Recap of the Delta Method

Last lecture, we discussed the delta method (a.k.a. Taylor expansions). The basic idea was as follows:

Claim 1. If $r_n(T_n - \theta) \xrightarrow{d} T$, and $\varphi : \mathbb{R}^d \to \mathbb{R}^k$ is smooth, then
$$r_n\big(\varphi(T_n) - \varphi(\theta)\big) \xrightarrow{d} \varphi'(\theta)\, T,$$
provided $\varphi'(\theta)$ exists.

Idea of proof:
$$r_n\big(\varphi(T_n) - \varphi(\theta)\big) = r_n\big(\varphi'(\theta)(T_n - \theta) + o_P(\|T_n - \theta\|)\big) = r_n\,\varphi'(\theta)(T_n - \theta) + o_P\big(r_n(T_n - \theta)\big) = r_n\,\varphi'(\theta)(T_n - \theta) + o_P(1) \xrightarrow{d} \varphi'(\theta)\, T.$$

Notation (from now on): given a distribution $P$ on $\mathcal{X}$ and a function $f : \mathcal{X} \to \mathbb{R}^d$,
$$P f := \int f \, dP = \int_{\mathcal{X}} f(x) \, dP(x) = \mathbb{E}_P[f(X)].$$

Example 1 (Empirical distributions): Consider observations $x_1, x_2, \ldots, x_n \in \mathcal{X}$, and let the empirical distribution be $P_n = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$. For any set $A \subset \mathcal{X}$,
$$P_n(A) = \frac{1}{n} \big|\{i \in [n] : x_i \in A\}\big| = P_n\{x \in A\}.$$
Hence for any function $f$, $P_n f = \frac{1}{n} \sum_{i=1}^n f(x_i)$.
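The empirical-expectation notation $P_n f = \frac{1}{n}\sum_{i=1}^n f(x_i)$ can be checked numerically. A minimal sketch (our own illustration, not from the lecture; the helper name `Pn` is ours):

```python
import numpy as np

# Draw observations x_1, ..., x_n from P = N(2, 1)
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=10_000)

def Pn(f, xs):
    """Empirical expectation P_n f = (1/n) * sum_i f(x_i)."""
    return float(np.mean(f(xs)))

# P_n applied to the identity approximates P f = E[X] = 2
mean_hat = Pn(lambda t: t, x)

# P_n(A) for A = {x : x > 2} is the empirical frequency of the event,
# which approximates P(X > 2) = 1/2 here
mass_above_2 = Pn(lambda t: (t > 2.0).astype(float), x)
```

By the law of large numbers, both quantities converge to their population counterparts as $n \to \infty$.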
Taylor expansions

Real-valued functions

For $f : \mathbb{R}^d \to \mathbb{R}$ differentiable at $x \in \mathbb{R}^d$,
$$f(y) = f(x) + \nabla f(x)^T (y - x) + o(\|y - x\|), \quad \text{(Remainder version)}$$
$$f(y) = f(x) + \nabla f(\tilde{x})^T (y - x) \text{ for some } \tilde{x} \text{ between } x \text{ and } y. \quad \text{(Mean value version)}$$
If $f$ is twice differentiable,
$$f(y) = f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(x)(y - x) + o(\|y - x\|^2), \quad \text{(Remainder version)}$$
$$f(y) = f(x) + \nabla f(x)^T (y - x) + \tfrac{1}{2}(y - x)^T \nabla^2 f(\tilde{x})(y - x). \quad \text{(Mean value version)}$$

Vector-valued functions

Let $f : \mathbb{R}^d \to \mathbb{R}^k$, $f(x) = \big(f_1(x), f_2(x), \ldots, f_k(x)\big)^T$. Define
$$Df(x) = \begin{pmatrix} \nabla f_1(x)^T \\ \nabla f_2(x)^T \\ \vdots \\ \nabla f_k(x)^T \end{pmatrix} \in \mathbb{R}^{k \times d}$$
to be the Jacobian of $f$. Then
$$f(y) = f(x) + Df(x)(y - x) + o(\|y - x\|). \quad \text{(Remainder version)}$$
But for the mean value version, we don't necessarily have $\tilde{x}$ such that $f(y) = f(x) + Df(\tilde{x})(y - x)$.

Example 2 (Failure of the mean value version): Let $f : \mathbb{R} \to \mathbb{R}^k$,
$$f(x) = \begin{pmatrix} x \\ x^2 \\ \vdots \\ x^k \end{pmatrix}, \quad \text{so that} \quad Df(x) = \begin{pmatrix} 1 \\ 2x \\ \vdots \\ k x^{k-1} \end{pmatrix}.$$
Take $x = 0$, $y = 1$; then $f(y) - f(x) = (1, 1, \ldots, 1)^T$. Yet $Df(\tilde{x})(y - x) = (1, 2\tilde{x}, \ldots, k\tilde{x}^{k-1})^T$, and for $k \geq 3$ no single $\tilde{x}$ makes every coordinate equal $1$: the second coordinate forces $\tilde{x} = 1/2$, but then the third is $3/4 \neq 1$.

Example 3 (Quantitative continuity guarantees): Recall that the operator norm of $A$ is $\|A\|_{op} = \sup_{\|u\|_2 = 1} \|Au\|_2$; this implies $\|Ax\|_2 \leq \|A\|_{op} \|x\|_2$. For $f : \mathbb{R}^d \to \mathbb{R}^k$ differentiable, assume that $Df$ is $L$-Lipschitz, i.e.
$$\|Df(x) - Df(y)\|_{op} \leq L \|x - y\|_2.$$
(Roughly, this means that $\|D^2 f(x)\| \leq L$.)

Claim 2. We have $f(y) = f(x) + Df(x)(y - x) + R(y - x)$, where $R$ is a remainder matrix (depending on $x, y$) that satisfies $\|R\|_{op} \leq \frac{L}{2}\|y - x\|$ and $\|R(y - x)\| \leq \frac{L}{2}\|y - x\|^2$.
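The failure of the mean value theorem for vector-valued functions can be seen concretely with $k = 3$. A small numerical sketch (our own illustration, not from the lecture):

```python
import numpy as np

# f(x) = (x, x^2, x^3) has Jacobian Df(x) = (1, 2x, 3x^2)^T
def f(x):
    return np.array([x, x**2, x**3])

def Df(x):
    return np.array([1.0, 2 * x, 3 * x**2])

x0, y0 = 0.0, 1.0
diff = f(y0) - f(x0)  # = (1, 1, 1)

# A mean-value point xt would need Df(xt) * (y0 - x0) == diff. The second
# coordinate forces xt = 1/2, but then the third coordinate is 3/4, not 1.
xt = 0.5
gap = Df(xt) * (y0 - x0) - diff  # = (0, 0, -1/4): no common xt exists
```

Each coordinate function separately has a mean-value point, but the three points differ, which is exactly why the vector-valued mean value version fails.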
Proof. Define $\varphi_i(t) = f_i((1-t)x + ty)$, $\varphi_i : [0,1] \to \mathbb{R}$. Note that $\varphi_i(0) = f_i(x)$, $\varphi_i(1) = f_i(y)$, and $\varphi_i'(t) = \nabla f_i\big((1-t)x + ty\big)^T (y - x)$. Then
$$Df\big((1-t)x + ty\big)(y - x) = \begin{pmatrix} \nabla f_1((1-t)x+ty)^T \\ \nabla f_2((1-t)x+ty)^T \\ \vdots \\ \nabla f_k((1-t)x+ty)^T \end{pmatrix} (y - x) = \begin{pmatrix} \varphi_1'(t) \\ \varphi_2'(t) \\ \vdots \\ \varphi_k'(t) \end{pmatrix}.$$
Since $\varphi_i(1) - \varphi_i(0) = \int_0^1 \varphi_i'(t)\,dt$,
$$f(y) - f(x) = \int_0^1 Df\big((1-t)x + ty\big)(y - x)\,dt = \int_0^1 \Big(Df\big((1-t)x + ty\big) - Df(x)\Big)(y - x)\,dt + Df(x)(y - x).$$
To bound the remainder term,
$$\Big\| \int_0^1 \big(Df((1-t)x+ty) - Df(x)\big)(y-x)\,dt \Big\| \leq \int_0^1 \big\|\big(Df((1-t)x+ty) - Df(x)\big)(y-x)\big\|\,dt \leq \int_0^1 \|Df((1-t)x+ty) - Df(x)\|_{op}\,\|y-x\|\,dt \leq \int_0^1 L\|t(y-x)\|\,\|y-x\|\,dt = \int_0^1 L t\,\|y-x\|^2\,dt = \frac{L}{2}\|y-x\|^2.$$

Consistency and asymptotic distribution

Setting:
1. We have some model family $\{P_\theta\}_{\theta \in \Theta}$ of distributions on $\mathcal{X}$, where $\Theta \subset \mathbb{R}^d$. Also, assume all $P_\theta$ have density $p_\theta$ with respect to a base measure $\mu$ on $\mathcal{X}$, i.e. $p_\theta = \frac{dP_\theta}{d\mu}$.
2. We consider the log-likelihood of the distribution, $\ell_\theta(x) = \log p_\theta(x)$, with
$$\nabla \ell_\theta(x) := \Big[\frac{\partial}{\partial \theta_j} \log p_\theta(x)\Big]_{j=1}^d \in \mathbb{R}^d, \qquad \nabla^2 \ell_\theta(x) := \Big[\frac{\partial^2}{\partial \theta_i \partial \theta_j} \log p_\theta(x)\Big]_{i,j=1}^d \in \mathbb{R}^{d \times d}.$$
For simplicity, we will denote $\dot\ell_\theta \equiv \nabla \ell_\theta(x)$ and $\ddot\ell_\theta \equiv \nabla^2 \ell_\theta(x)$. The gradient of the log-likelihood is often called the score function; we will use this term to refer to $\nabla \ell_\theta(x)$ throughout future lectures.
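Claim 2's quantitative remainder bound can be sanity-checked numerically in a model where the Lipschitz constant of the Jacobian is known. A sketch of our own (not from the lecture), using $f(x) = (\sin x_1, \cos x_2)$, whose diagonal Jacobian is $1$-Lipschitz in operator norm:

```python
import numpy as np

# f(x) = (sin x1, cos x2); Df(x) = diag(cos x1, -sin x2). Since both cos and
# sin are 1-Lipschitz, ||Df(x) - Df(y)||_op <= ||x - y||_2, so L = 1 and
# Claim 2 predicts ||f(y) - f(x) - Df(x)(y - x)|| <= (L/2) ||y - x||^2.
def f(v):
    return np.array([np.sin(v[0]), np.cos(v[1])])

def Df(v):
    return np.diag([np.cos(v[0]), -np.sin(v[1])])

rng = np.random.default_rng(1)
L = 1.0
bound_holds = True
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lhs = np.linalg.norm(f(y) - f(x) - Df(x) @ (y - x))
    rhs = (L / 2) * np.linalg.norm(y - x) ** 2
    bound_holds = bound_holds and (lhs <= rhs + 1e-12)
```

The small `1e-12` slack only guards against floating-point rounding; the inequality itself is the deterministic bound of Claim 2.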
3. We observe $X_i \overset{iid}{\sim} P_{\theta_0}$, where $\theta_0$ is unknown. Our goal is to estimate $\theta_0$.
4. A standard estimator is to choose $\hat\theta_n$ to maximize the likelihood, i.e. the probability of the data:
$$\hat\theta_n \in \operatorname*{argmax}_{\theta \in \Theta} P_n \ell_\theta(x).$$

Main questions:
1. Consistency: does $\hat\theta_n \xrightarrow{p} \theta_0$ as $n \to +\infty$?
2. Asymptotic distribution: does $r_n(\hat\theta_n - \theta_0)$ converge in distribution?
3. Optimality? (in the next lecture)

Consistency

Definition 1.1 (Identifiability). A model $\{P_\theta\}_{\theta \in \Theta}$ is identifiable if $P_{\theta_1} \neq P_{\theta_2}$ for all $\theta_1, \theta_2 \in \Theta$ with $\theta_1 \neq \theta_2$. Equivalently, $D_{kl}(P_{\theta_1} \| P_{\theta_2}) > 0$ when $\theta_1 \neq \theta_2$. Recall that
$$D_{kl}(P_{\theta_1} \| P_{\theta_2}) = \int \log \frac{dP_{\theta_1}}{dP_{\theta_2}} \, dP_{\theta_1}.$$
Note that $P_{\theta_1} \neq P_{\theta_2}$ means that there exists a set $A \subset \mathcal{X}$ such that $P_{\theta_1}(A) \neq P_{\theta_2}(A)$.

Now that we have established what both identifiability and consistency mean, we can prove a basic result regarding the finite-$\Theta$ consistency of the maximum likelihood estimator (MLE).

Proposition 3 (Finite $\Theta$ consistency of the MLE). Suppose $\{P_\theta\}_{\theta \in \Theta}$ is identifiable and $\operatorname{card} \Theta < \infty$. Then, if $\hat\theta_n := \operatorname*{argmax}_{\theta \in \Theta} P_n \ell_\theta(x)$, we have $\hat\theta_n \to \theta_0$ when $X_i \overset{iid}{\sim} P_{\theta_0}$.

Proof of Proposition 3. When $x_i \overset{iid}{\sim} P_{\theta_0}$, by the strong law of large numbers we know that
$$P_n \ell_\theta(x) \xrightarrow{a.s.} P_{\theta_0} \ell_\theta(x), \qquad P_{\theta_0} \ell_{\theta_0}(x) - P_{\theta_0} \ell_\theta(x) = \mathbb{E}_{\theta_0}\Big[\log \frac{p_{\theta_0}(x)}{p_\theta(x)}\Big] = D_{kl}(P_{\theta_0} \| P_\theta).$$
We know that $D_{kl}(P_{\theta_0} \| P_\theta) > 0$ unless $\theta = \theta_0$. Combining this remark with $P_n \ell_{\theta_0}(x) - P_n \ell_\theta(x) \xrightarrow{a.s.} D_{kl}(P_{\theta_0} \| P_\theta)$, we deduce that there exists $N(\theta)$ such that for all $n > N(\theta)$, we have $P_n \ell_{\theta_0}(x) - P_n \ell_\theta(x) > 0$ with probability 1. It follows that for $n > \max_{\theta \in \Theta, \theta \neq \theta_0} N(\theta)$, we have $P_n \ell_{\theta_0}(x) > P_n \ell_\theta(x)$ for all $\theta \neq \theta_0$. Therefore $\hat\theta_n = \theta_0$, and we conclude that, for sufficiently large $n$ and finite $\Theta$, we have $\hat\theta_n = \theta_0$ eventually.

Remark. The above result can fail for infinite $\Theta$, even if $\Theta$ is countable.

Uniform law: One sufficient condition often used for consistency results is a uniform law, i.e. for iid $x_i \sim P$, we have $\sup_{\theta \in \Theta} |P_n \ell_\theta - P \ell_\theta| \xrightarrow{p} 0$. In this case, if $P_{\theta_0} \ell_\theta < P_{\theta_0} \ell_{\theta_0} - 2\epsilon$ and $\sup_\Theta |P_n \ell_\theta - P_{\theta_0} \ell_\theta| \leq \epsilon$, then $\hat\theta_n \neq \theta$. We will have
$$\hat\theta_n \in \{\theta : P_{\theta_0} \ell_\theta \geq P_{\theta_0} \ell_{\theta_0} - 2\epsilon\}.$$
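Proposition 3 is easy to watch in action. A minimal simulation sketch of our own (not from the lecture), using a three-point Bernoulli family:

```python
import numpy as np

# Finite parameter set: Bernoulli(theta) with Theta = {0.2, 0.5, 0.8},
# true parameter theta0 = 0.8. The model is identifiable, so for large n
# the MLE over Theta should equal theta0 exactly (Proposition 3).
Theta = np.array([0.2, 0.5, 0.8])
theta0 = 0.8
rng = np.random.default_rng(2)
x = rng.binomial(1, theta0, size=5000)

def avg_loglik(theta, xs):
    """P_n l_theta = (1/n) sum_i [x_i log theta + (1 - x_i) log(1 - theta)]."""
    return float(np.mean(xs * np.log(theta) + (1 - xs) * np.log(1 - theta)))

theta_hat = Theta[np.argmax([avg_loglik(t, x) for t in Theta])]
```

Because $\Theta$ is finite and each wrong $\theta$ loses by a strictly positive KL gap, the MLE does not merely approach $\theta_0$: it equals $\theta_0$ for all sufficiently large $n$.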
Now that we have established some basic definitions and results regarding the consistency of estimators, we turn our attention to understanding their asymptotic behavior.

Asymptotic normality via Taylor expansions

Definition 1.2 (Operator norm). $\|A\|_{op} := \sup_{\|u\|_2 \leq 1} \|Au\|_2$. Note: for $A \in \mathbb{R}^{k \times d}$ and $u \in \mathbb{R}^d$, we have $\|Ax\|_2 \leq \|A\|_{op} \|x\|_2$.

Before we do anything, we have to make several assumptions.
1. We have a nice, smooth model, i.e. the Hessian is Lipschitz-continuous. To be rigorous, the following must hold:
$$\|\nabla^2 \ell_{\theta_1}(x) - \nabla^2 \ell_{\theta_2}(x)\|_{op} \leq M(x) \|\theta_1 - \theta_2\|_2, \qquad \mathbb{E}_{\theta_0}[M^2(x)] < \infty.$$
2. The MLE, $\hat\theta_n \in \operatorname*{argmax}_{\theta \in \Theta} P_n \ell_\theta(x)$, is consistent, i.e. $\hat\theta_n \xrightarrow{p} \theta_0$ under $P_{\theta_0}$.
3. $\Theta$ is a convex set.

Theorem 4. Let $x_i \overset{iid}{\sim} P_{\theta_0}$, let $\hat\theta_n$ be the MLE (i.e. $P_n \dot\ell_{\hat\theta_n} = 0$), and assume the conditions stated above. Then
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N\Big(0, \; (P_{\theta_0} \nabla^2 \ell_{\theta_0})^{-1} \, P_{\theta_0}\big[\nabla \ell_{\theta_0} \nabla \ell_{\theta_0}^T\big] \, (P_{\theta_0} \nabla^2 \ell_{\theta_0})^{-1}\Big).$$

Remark. Let us rewrite the asymptotic variance. Given that
$$\nabla^2 \ell_\theta = \frac{\nabla^2 p_\theta}{p_\theta} - \frac{\nabla p_\theta \nabla p_\theta^T}{p_\theta^2},$$
we have
$$\mathbb{E}_\theta\Big[\frac{\nabla^2 p_\theta}{p_\theta}\Big] = \int \frac{\nabla^2 p_\theta}{p_\theta} \, p_\theta \, d\mu = \int \nabla^2 p_\theta \, d\mu = \nabla^2 \int p_\theta \, d\mu = 0.$$
As a result,
$$-\mathbb{E}_\theta[\nabla^2 \ell_\theta] = \mathbb{E}_\theta\Big[\Big(\frac{\nabla p_\theta}{p_\theta}\Big)\Big(\frac{\nabla p_\theta}{p_\theta}\Big)^T\Big] = \operatorname{Cov}(\nabla \ell_\theta(x)).$$
We define the Fisher information as
$$I_\theta := \mathbb{E}_\theta\big[\nabla \ell_\theta(x) \nabla \ell_\theta(x)^T\big] = \operatorname{Cov}_\theta(\nabla \ell_\theta),$$
where the final equality holds because $\mathbb{E}_\theta[\nabla \ell_\theta(x)] = 0$ ($\theta$ maximizes $\theta' \mapsto \mathbb{E}_\theta[\ell_{\theta'}(x)]$). To show this, assume that we can swap $\nabla$ and $\mathbb{E}$. Then $\nabla \ell_\theta(x) = \nabla \log p_\theta(x) = \frac{\nabla p_\theta(x)}{p_\theta(x)}$. Using that result, we see that
$$\mathbb{E}_\theta[\nabla \ell_\theta] = \mathbb{E}_\theta\Big[\frac{\nabla p_\theta}{p_\theta}\Big] = \int \frac{\nabla p_\theta}{p_\theta} \, p_\theta \, d\mu = \int \nabla p_\theta \, d\mu = \nabla \int p_\theta \, d\mu = \nabla(1) = 0.$$
We now have a more compact representation of the asymptotic distribution described in Theorem 4:
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N\big(0, I_{\theta_0}^{-1} I_{\theta_0} I_{\theta_0}^{-1}\big) = N\big(0, I_{\theta_0}^{-1}\big).$$
Consider $I_\theta = -\nabla^2 \mathbb{E}[\ell_\theta(x)]$. If the magnitude of the second derivative is large, the log-likelihood is steep around the global maximum (making it easy to find). Alternatively, if the magnitude of $\nabla^2 \mathbb{E}[\ell_\theta(x)]$ is small, we do not have sufficient curvature to find the optimal $\theta$.
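The two Fisher information identities above ($\mathbb{E}_\theta[\nabla \ell_\theta] = 0$ and $I_\theta = \mathbb{E}_\theta[\nabla \ell_\theta \nabla \ell_\theta^T] = -\mathbb{E}_\theta[\nabla^2 \ell_\theta]$) can be verified exactly in a two-point model, where the expectations are finite sums. A sketch of our own (not from the lecture), for the Bernoulli family with $I_\theta = \frac{1}{\theta(1-\theta)}$:

```python
import numpy as np

# Bernoulli(theta): l_theta(x) = x log(theta) + (1 - x) log(1 - theta)
theta = 0.3
score = lambda x: x / theta - (1 - x) / (1 - theta)        # d/dtheta l_theta(x)
hess = lambda x: -x / theta**2 - (1 - x) / (1 - theta)**2  # d^2/dtheta^2 l_theta(x)

# Exact expectations over x in {0, 1}, with P(x = 1) = theta
p = np.array([1 - theta, theta])
xs = np.array([0.0, 1.0])

mean_score = p @ score(xs)         # = 0: the score has mean zero
info_sq = p @ score(xs) ** 2       # E[score^2]
info_hess = -(p @ hess(xs))        # -E[Hessian]
fisher = 1 / (theta * (1 - theta)) # closed form: 1 / (theta (1 - theta))
```

All three computations of the information agree, matching the identity chain in the remark.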
Proof. Let $r(x) \in \mathbb{R}^{d \times d}$ be the remainder matrix in the Taylor expansion of the gradients of the individual log-likelihood terms around $\theta_0$ guaranteed by Taylor's theorem (which certainly depends on $\hat\theta_n - \theta_0$), that is,
$$\nabla \ell_{\hat\theta_n}(x) = \nabla \ell_{\theta_0}(x) + \nabla^2 \ell_{\theta_0}(x)(\hat\theta_n - \theta_0) + r(x)(\hat\theta_n - \theta_0),$$
where by Taylor's theorem $\|r(x)\|_{op} \leq M(x)\|\hat\theta_n - \theta_0\|$. Writing this out using the empirical distribution and using that $\hat\theta_n = \operatorname*{argmax}_\theta P_n \ell_\theta(X)$, we have
$$P_n \nabla \ell_{\hat\theta_n} = 0 = P_n \nabla \ell_{\theta_0} + P_n \nabla^2 \ell_{\theta_0} (\hat\theta_n - \theta_0) + P_n r(x)(\hat\theta_n - \theta_0). \tag{1}$$
But of course, expanding the term $P_n r(x) \in \mathbb{R}^{d \times d}$, we find that $P_n r(x) = \frac{1}{n} \sum_{i=1}^n r(x_i)$ and
$$\|P_n r\|_{op} \leq \underbrace{\frac{1}{n} \sum_{i=1}^n M(X_i)}_{\xrightarrow{a.s.} \; \mathbb{E}_{\theta_0}[M(X)]} \cdot \underbrace{\|\hat\theta_n - \theta_0\|}_{\xrightarrow{p} \; 0} = o_P(1).$$
In particular, revisiting expression (1), we have
$$0 = P_n \nabla \ell_{\theta_0} + P_n \nabla^2 \ell_{\theta_0}(\hat\theta_n - \theta_0) + o_P(1)(\hat\theta_n - \theta_0) = P_n \nabla \ell_{\theta_0} + \big(P_{\theta_0} \nabla^2 \ell_{\theta_0} + (P_n - P_{\theta_0}) \nabla^2 \ell_{\theta_0} + o_P(1)\big)(\hat\theta_n - \theta_0).$$
The strong law of large numbers guarantees that $(P_n - P_{\theta_0}) \nabla^2 \ell_{\theta_0} = o_P(1)$, and multiplying each side by $\sqrt{n}$ yields
$$\sqrt{n}\big(P_{\theta_0} \nabla^2 \ell_{\theta_0} + o_P(1)\big)(\hat\theta_n - \theta_0) = -\sqrt{n}\, P_n \nabla \ell_{\theta_0}.$$
Applying Slutsky's theorem gives the result: indeed, $T_n = \sqrt{n}\, P_n \nabla \ell_{\theta_0}$ satisfies $T_n \xrightarrow{d} N(0, I_{\theta_0})$ by the central limit theorem, and noting that $P_{\theta_0} \nabla^2 \ell_{\theta_0} + o_P(1)$ is eventually invertible gives
$$\sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N\Big(0, \big(P_{\theta_0} \nabla^2 \ell_{\theta_0}\big)^{-1} I_{\theta_0} \big(P_{\theta_0} \nabla^2 \ell_{\theta_0}\big)^{-1}\Big)$$
as desired.
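Theorem 4 can be seen by simulation in a model where the MLE is explicit. A minimal sketch of our own (not from the lecture), using the Bernoulli family, where $\hat\theta_n = \bar X_n$ and $I_{\theta_0}^{-1} = \theta_0(1 - \theta_0)$:

```python
import numpy as np

# Simulate many replications of sqrt(n) (theta_hat - theta0). Theorem 4
# predicts these are approximately N(0, 1/I_theta0), and for Bernoulli
# 1/I_theta0 = theta0 (1 - theta0).
rng = np.random.default_rng(3)
theta0, n, reps = 0.3, 1000, 2000
x = rng.binomial(1, theta0, size=(reps, n))

theta_hat = x.mean(axis=1)              # the MLE in each replication
z = np.sqrt(n) * (theta_hat - theta0)   # rescaled estimation error

emp_var = float(z.var())                # empirical variance across replications
asy_var = theta0 * (1 - theta0)         # inverse Fisher information = 0.21
```

The empirical mean of `z` is near $0$ and its variance is near $\theta_0(1-\theta_0)$, matching the $N(0, I_{\theta_0}^{-1})$ limit.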
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationGeneral Linear Model Introduction, Classes of Linear models and Estimation
Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)
More informationLecture 3 - Linear and Logistic Regression
3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression
More informationLocation of solutions for quasi-linear elliptic equations with general gradient dependence
Electronic Journal of Qualitative Theory of Differential Equations 217, No. 87, 1 1; htts://doi.org/1.14232/ejqtde.217.1.87 www.math.u-szeged.hu/ejqtde/ Location of solutions for quasi-linear ellitic equations
More informationInference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material
Inference for Emirical Wasserstein Distances on Finite Saces: Sulementary Material Max Sommerfeld Axel Munk Keywords: otimal transort, Wasserstein distance, central limit theorem, directional Hadamard
More informationMA102: Multivariable Calculus
MA102: Multivariable Calculus Rupam Barman and Shreemayee Bora Department of Mathematics IIT Guwahati Differentiability of f : U R n R m Definition: Let U R n be open. Then f : U R n R m is differentiable
More informationElements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley
Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the
More information