Lecture 3 January 16

Size: px
Start display at page:

Download "Lecture 3 January 16"

Transcription

1 Stats 3b: Theory of Statistics Winter 28 Lecture 3 January 6 Lecturer: Yu Bai/John Duchi Scribe: Shuangning Li, Theodor Misiakiewicz Warning: these notes may contain factual errors Reading: VDV Chater ; ELST Chater Outline of Lecture 2:. Basic consistency and identifiability 2. Asymtotic Normality (a) Taylor exansions (b) Classical log-likelihood & asymtotic normality (c) Fisher Information Reca of Delta Method Last lecture, we discussed the Delta Method (aka Taylor exansions). The basic idea was as follows: Claim. If r n (T n θ) d T, and φ : R d R k is smooth, then r n (φ(t n ) φ(θ)) φ (θ)t, if φ (θ). Idea of roof: r n (φ(t n ) φ(θ)) = r n (φ (θ)(t n θ) + o (T n θ)) = r n (φ (θ)(t n θ)) + o (r n (T n θ)) = r n (φ (θ)(t n θ)) + o () d φ (θ)t. Notation: (from now on) Given distribution P on X, function f : X R d, P f := fdp = f(x)dp (x) = E P [f(x)] X Examle (Emirical distributions): Consider the observations x, x 2,..., x n X. Let the emirical distribution P n = n n i= x i. For any set A X, P n (A) = n {i [n] : x i A} = P n {x A}. Hence for any function f, P n f = n n i= f(x i).

2 Taylor exansions. Real-valued functions For f : R d R differentiable at x R d, f(y) = f(x) + f(x) T (y x) + o( y x ). (Remainder version) If f is twice differentiable, f(y) = f(x) + f( x) T (y x). (Mean value version) f(y) = f(x) + f(x) T (y x) + 2 (y x)t 2 f(x)(y x) + o( y x 2 ). (Remainder version) f(y) = f(x) + f(x) T (y x) + 2 (y x)t 2 f( x)(y x). (Mean value version) 2. Vector-valued functions f f T Let f : R d R k f 2 (x) f2 T, f(x) =. Define Df(x) = (x).. Rk d to be the Jacobian of f. f k fk T (x) Then, f(y) = f(x) + Df(x)(y x) + o( y x ). (Remainder version) But for mean value version, we don t necessarily have x such that f(y) = f(x) + Df( x)(y x). Examle 2 (Failure of mean value version): Let f : R R k, f(x) = 2x kx k. Take x =, y =, then f(y) f(x) = =. Yet Df( x) =. x x 2, then Df(x) =. x k 2 x. k x k.. Examle 3 (Quantitative continuity guarantees): Recall the oerator norm of A is A o = su Au 2, u 2 = this imlied that Ax 2 A o x 2. For f : R d R k, differentiable, assume that Df is L Lischitz, i.e. Df(x) Df(y) o L x y 2. (Roughly, this means that D 2 f(x) L.) Claim 2. We have f(y) = f(x) + Df(x)(y x) + R(y x), where R is a remainder matrix (deending on x, y) that satisfy R o L 2 y x and R(y x) L 2 y x 2. 2

3 Proof Define φ i (t) = f i (( t)x + ty), φ i : [, ] R. Note that φ i () = f i (x), φ i () = f i (y), and φ i = ( f i (( t)x + ty) ) T (y x). Then f T (( t)x + ty) φ f T 2 (( t)x + ty) (t) φ 2 Df(( t)x + ty)(y x) = (y x) = (t)... (( t)x + ty) φ k (t) Since φ i () φ i () = φ (t)dt, f(y) f(x) = = To bound the remainder term, f T k Df(( t)x + ty)(y x)dt ( Df(( t)x + ty) Df(x) ) (y x)dt + Df(x)(y x). ( ) Df(( t)x + ty) Df(x) (y x)dt ( Df(( t)x + ty) Df(x) ) (y x) dt Df(( t)x + ty) Df(x) o (y x) dt L t(y x) (y x) dt Lt (y x) 2 dt = L 2 (y x) 2. Consistency and asymtotic distribution: Setting:. We have some model family {P θ } θ Θ of distributions on X, where Θ R d. Also, assume all P θ have density with resect to base measure µ on X, i.e. = dp θ dµ. 2. We consider the log-likelihood of the distribution l θ (x) = log (x), with [ ] d l θ (x) := log (x) R d θ j j= [ ] 2 2 d l θ (x) := log (x) R d d θ i θ j i,j= For simlicity, we will denote: lθ l θ (x) and l θ 2 l θ (x). The gradient of the log-likelihood is often called the score function. We will use this term to refer to l θ (x) throughout future lectures. 3

4 3. Observe X i iid Pθ where θ is unknown. Our goal is to estimate θ. 4. A standard estimator is to choose ˆθ n to maximize the likelihood, i.e. the robability of the data. Main questions:. Consistency: does ˆθ n θ as n +? ˆθ n argmax P n l θ (x) θ Θ 2. Asymtotic distribution: does r n (ˆθ n θ ) converge in distribution? 3. Otimality? (in the next lecture) Consistency: Definition. (Identifiability). A model {P θ } θ Θ is identifiable if P θ P θ2 for all θ, θ 2 Θ (θ θ 2 ). Equivalently, D kl (P θ P θ2 ) > when θ θ 2. Recall that D kl (P θ P θ2 ) = log dp θ dp θ. dp θ2 Note that P θ P θ2 means that set A X such that P θ (A) P θ2 (A). Now that we have established what both identifiability and consistency mean, we can rove a basic result regarding the finite consistency of the Maximum Likelihood estimator (MLE). Proosition 3 (Finite Θ consistency of MLE). Suose {P θ } θ Θ is identifiable and card Θ <. Then, if ˆθ n := argmax θ Θ P n l θ (x), ˆθ n θ when X i iid Pθ. Proof of Proosition when x i iid Pθ. By the Strong Law of Large Numbers, we know that P n l θ (x) a.s. P θ l θ (x) [ P θ l θ (x) P θ l θ (x) = E θ log ] θ (x) (x) = D kl (P θ P θ ) We know that D kl (P θ P θ ) > unless θ = θ. Combining this remark with P n l θ (x) P n l θ (x) a.s. D kl (P θ P θ ), we deduce that there exists N(θ) such that for all n > N(θ), we have P n l θ (x) P n l θ (x) > with robability. It follows that for n > max θ Θ,θ θ N(θ), we have P n l θ (x) > P n l θ (x) for all θ θ. Therefore ˆθ n = θ and we conclude that, for sufficiently large n and finite Θ, we have ˆθ n = θ eventually. Remark The above result can fail for Θ infinite even if Θ is countable. Uniform law: One sufficient condition often used for consistency results is a uniform law, i.e. for iid x i P, we have suθ Θ P n l θ P l θ. In this case, if P θ l θ < P θ l θ 2ɛ and su Θ P n l θ P θ l θ ɛ, then ˆθ n θ. We will have: ˆθ n {θ : P θ l θ P θ l θ 2ɛ} 4

5 Now, that we have established some basic definitions and results regarding the consistency of estimators, we turn our attention to understanding their asymtotic behavior. Asymtotic Normality via Taylor Exansions: Definition.2 (Oerator norm). A o := su u 2 Au 2. Note: A R k d, u R d and Ax 2 A o x 2. Before we do anything, we have to make several assumtions.. We have a nice, smooth model, i.e. the Hessian is Lischitz-continuous. To be rigorous, the following must hold: 2 l θ (x) 2 l θ2 (x) o M(x) θ θ 2 2 E θ [M 2 (x)] < 2. The MLE, ˆθ n argmax θ Θ P n l θ (x), is consistent, i.e. ˆθ n θ under P θ. 3. Θ is a convex set. iid Theorem 4. Let x i Pθ, ˆθ n be the MLE (i.e. P n = ) and assume the conditions stated lˆθn above. Then, n(ˆθ n θ ) d N(, (P θ 2 l θ ) P θ l θ l T θ (P θ 2 l θ ) ). ( ) Remark Let us rewrite the asymtotic variance. Given that 2 l θ = θ = 2 T θ : 2 θ [ 2 ] 2 E θ = dµ = 2 dµ = 2 dµ = As a result: [ ( θ ) ( ) ] T E θ [ 2 θ l θ ] = E θ = Cov( l θ (x)) θ We define the Fisher Information as I θ := E θ [ l θ (x) l θ (x) T ] = Cov θ l θ where the final equality holds because E θ [ l θ (x)] = (θ maximizes E θ [l θ (x)]). To show this, assume that we can swa, E. Then, l θ (x) = log (x) = (x) (x). Using that result, we see that: [ ] θ θ E θ [ l θ ] = E = dµ = dµ = dµ = () =. We now have a more comact reresentation of the asymtotic distribution described in the Theorem above. n(ˆθn θ ) d N(, I θ I θ I θ ) = N(, I Consider I θ = 2 E[l θ (x)]. If the magnitude of the second derivative is large, that imlies that the log-likelihood is stee around the global maximum (making it easy to find). Alternatively, if the magnitude of 2 E[l θ (x)] is small, we do not have sufficient curvature to find the otimal θ. θ ) 5

6 Proof Let r(x) R d d be the remainder matrix in Taylor exansion of the gradients of the individual log likelihood terms around θ guaranteed by Taylor s theorem (which certainly deends on θ n θ ), that is, l θn (x) = l θ (x) + 2 l θ (x)( θ n θ ) + r(x)( θ n θ ), where by Taylor s theorem r(x) o M(x) θ n θ. Writing this out using the emirical distribution and that θ n = argmax θ P n l θ (X), we have P n l θn = = P n l θ + P n 2 l θ ( θ n θ ) + P n r(x)( θ n θ ). () But of course, exanding the term P n r(x) R d d, we find that P n r(x) = n n r(x i ) and i= P n r o n M(X i ) θ n θ = o n }{{} P (). i= }{{} a.s. E θ [M(X)] In articular, revisiting exression (), we have = P n l θ + P n 2 l θ ( θ n θ ) + o P ()( θ n θ ). = P n l θ + ( P θ 2 l θ + (P n P θ ) 2 l θ + o P () ) ( θ n θ ). The strong law of large numbers guarantees that (P n P θ ) 2 l θ = o P (), and multilying each side by n yields n(pθ 2 l θ + o P ())( θ n θ ) = np n l θ. Alying Slutsky s theorem gives the result: indeed, we have T n = np n l θ satisfies T n d N(, I θ ) by the central limit theorem, and noting that P θ 2 l θ + o P () is eventually invertible gives n( θn θ ) d N(, (P θ 2 l θ ) I θ (P θ 2 l θ ) ) as desired. 6

1 Extremum Estimators

1 Extremum Estimators FINC 9311-21 Financial Econometrics Handout Jialin Yu 1 Extremum Estimators Let θ 0 be a vector of k 1 unknown arameters. Extremum estimators: estimators obtained by maximizing or minimizing some objective

More information

Lecture 2: Consistency of M-estimators

Lecture 2: Consistency of M-estimators Lecture 2: Instructor: Deartment of Economics Stanford University Preared by Wenbo Zhou, Renmin University References Takeshi Amemiya, 1985, Advanced Econometrics, Harvard University Press Newey and McFadden,

More information

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)]

LECTURE 7 NOTES. x n. d x if. E [g(x n )] E [g(x)] LECTURE 7 NOTES 1. Convergence of random variables. Before delving into the large samle roerties of the MLE, we review some concets from large samle theory. 1. Convergence in robability: x n x if, for

More information

Maximum Likelihood Asymptotic Theory. Eduardo Rossi University of Pavia

Maximum Likelihood Asymptotic Theory. Eduardo Rossi University of Pavia Maximum Likelihood Asymtotic Theory Eduardo Rossi University of Pavia Slutsky s Theorem, Cramer s Theorem Slutsky s Theorem Let {X N } be a random sequence converging in robability to a constant a, and

More information

Lecture 3 September 1

Lecture 3 September 1 STAT 383C: Statistical Modeling I Fall 2016 Lecture 3 September 1 Lecturer: Purnamrita Sarkar Scribe: Giorgio Paulon, Carlos Zanini Disclaimer: These scribe notes have been slightly proofread and may have

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 8 Maximum Likelihood Estimation 8. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

DA Freedman Notes on the MLE Fall 2003

DA Freedman Notes on the MLE Fall 2003 DA Freedman Notes on the MLE Fall 2003 The object here is to provide a sketch of the theory of the MLE. Rigorous presentations can be found in the references cited below. Calculus. Let f be a smooth, scalar

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Chapter 7 Maximum Likelihood Estimation 7. Consistency If X is a random variable (or vector) with density or mass function f θ (x) that depends on a parameter θ, then the function f θ (X) viewed as a function

More information

p-adic Measures and Bernoulli Numbers

p-adic Measures and Bernoulli Numbers -Adic Measures and Bernoulli Numbers Adam Bowers Introduction The constants B k in the Taylor series exansion t e t = t k B k k! k=0 are known as the Bernoulli numbers. The first few are,, 6, 0, 30, 0,

More information

Elementary Analysis in Q p

Elementary Analysis in Q p Elementary Analysis in Q Hannah Hutter, May Szedlák, Phili Wirth November 17, 2011 This reort follows very closely the book of Svetlana Katok 1. 1 Sequences and Series In this section we will see some

More information

Theory of Maximum Likelihood Estimation. Konstantin Kashin

Theory of Maximum Likelihood Estimation. Konstantin Kashin Gov 2001 Section 5: Theory of Maximum Likelihood Estimation Konstantin Kashin February 28, 2013 Outline Introduction Likelihood Examples of MLE Variance of MLE Asymptotic Properties What is Statistical

More information

Lecture 4 September 15

Lecture 4 September 15 IFT 6269: Probabilistic Graphical Models Fall 2017 Lecture 4 September 15 Lecturer: Simon Lacoste-Julien Scribe: Philippe Brouillard & Tristan Deleu 4.1 Maximum Likelihood principle Given a parametric

More information

Lecture 23 Maximum Likelihood Estimation and Bayesian Inference

Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Thais Paiva STA 111 - Summer 2013 Term II August 7, 2013 1 / 31 Thais Paiva STA 111 - Summer 2013 Term II Lecture 23, 08/07/2013 Lecture

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information

MA3H1 TOPICS IN NUMBER THEORY PART III

MA3H1 TOPICS IN NUMBER THEORY PART III MA3H1 TOPICS IN NUMBER THEORY PART III SAMIR SIKSEK 1. Congruences Modulo m In quadratic recirocity we studied congruences of the form x 2 a (mod ). We now turn our attention to situations where is relaced

More information

Problem Set 2 Solution

Problem Set 2 Solution Problem Set 2 Solution Aril 22nd, 29 by Yang. More Generalized Slutsky heorem [Simle and Abstract] su L n γ, β n Lγ, β = su L n γ, β n Lγ, β n + Lγ, β n Lγ, β su L n γ, β n Lγ, β n + su Lγ, β n Lγ, β su

More information

Lecture 10: Hypercontractivity

Lecture 10: Hypercontractivity CS 880: Advanced Comlexity Theory /15/008 Lecture 10: Hyercontractivity Instructor: Dieter van Melkebeek Scribe: Baris Aydinlioglu This is a technical lecture throughout which we rove the hyercontractivity

More information

Department of Mathematics

Department of Mathematics Deartment of Mathematics Ma 3/03 KC Border Introduction to Probability and Statistics Winter 209 Sulement : Series fun, or some sums Comuting the mean and variance of discrete distributions often involves

More information

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students

More information

EE/Stats 376A: Information theory Winter Lecture 5 Jan 24. Lecturer: David Tse Scribe: Michael X, Nima H, Geng Z, Anton J, Vivek B.

EE/Stats 376A: Information theory Winter Lecture 5 Jan 24. Lecturer: David Tse Scribe: Michael X, Nima H, Geng Z, Anton J, Vivek B. EE/Stats 376A: Information theory Winter 207 Lecture 5 Jan 24 Lecturer: David Tse Scribe: Michael X, Nima H, Geng Z, Anton J, Vivek B. 5. Outline Markov chains and stationary distributions Prefix codes

More information

Nonlinear equations. Norms for R n. Convergence orders for iterative methods

Nonlinear equations. Norms for R n. Convergence orders for iterative methods Nonlinear equations Norms for R n Assume that X is a vector space. A norm is a mapping X R with x such that for all x, y X, α R x = = x = αx = α x x + y x + y We define the following norms on the vector

More information

Unconstrained minimization of smooth functions

Unconstrained minimization of smooth functions Unconstrained minimization of smooth functions We want to solve min x R N f(x), where f is convex. In this section, we will assume that f is differentiable (so its gradient exists at every point), and

More information

Graduate Econometrics I: Maximum Likelihood I

Graduate Econometrics I: Maximum Likelihood I Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood

More information

Analysis of some entrance probabilities for killed birth-death processes

Analysis of some entrance probabilities for killed birth-death processes Analysis of some entrance robabilities for killed birth-death rocesses Master s Thesis O.J.G. van der Velde Suervisor: Dr. F.M. Sieksma July 5, 207 Mathematical Institute, Leiden University Contents Introduction

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales

Lecture 6. 2 Recurrence/transience, harmonic functions and martingales Lecture 6 Classification of states We have shown that all states of an irreducible countable state Markov chain must of the same tye. This gives rise to the following classification. Definition. [Classification

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

REVIEW OF DIFFERENTIAL CALCULUS

REVIEW OF DIFFERENTIAL CALCULUS REVIEW OF DIFFERENTIAL CALCULUS DONU ARAPURA 1. Limits and continuity To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be

More information

Sampling distribution of GLM regression coefficients

Sampling distribution of GLM regression coefficients Sampling distribution of GLM regression coefficients Patrick Breheny February 5 Patrick Breheny BST 760: Advanced Regression 1/20 Introduction So far, we ve discussed the basic properties of the score,

More information

DIFFERENTIAL GEOMETRY. LECTURES 9-10,

DIFFERENTIAL GEOMETRY. LECTURES 9-10, DIFFERENTIAL GEOMETRY. LECTURES 9-10, 23-26.06.08 Let us rovide some more details to the definintion of the de Rham differential. Let V, W be two vector bundles and assume we want to define an oerator

More information

Sharp gradient estimate and spectral rigidity for p-laplacian

Sharp gradient estimate and spectral rigidity for p-laplacian Shar gradient estimate and sectral rigidity for -Lalacian Chiung-Jue Anna Sung and Jiaing Wang To aear in ath. Research Letters. Abstract We derive a shar gradient estimate for ositive eigenfunctions of

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS

ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS PROCEEDINGS OF THE AMERICAN MATHEMATICAL SOCIETY Volume 00, Number 0, Pages 000 000 S 000-9939XX)0000-0 ON THE NORM OF AN IDEMPOTENT SCHUR MULTIPLIER ON THE SCHATTEN CLASS WILLIAM D. BANKS AND ASMA HARCHARRAS

More information

Chapter 7: Special Distributions

Chapter 7: Special Distributions This chater first resents some imortant distributions, and then develos the largesamle distribution theory which is crucial in estimation and statistical inference Discrete distributions The Bernoulli

More information

Theoretical Statistics. Lecture 17.

Theoretical Statistics. Lecture 17. Theoretical Statistics. Lecture 17. Peter Bartlett 1. Asymptotic normality of Z-estimators: classical conditions. 2. Asymptotic equicontinuity. 1 Recall: Delta method Theorem: Supposeφ : R k R m is differentiable

More information

THE 2D CASE OF THE BOURGAIN-DEMETER-GUTH ARGUMENT

THE 2D CASE OF THE BOURGAIN-DEMETER-GUTH ARGUMENT THE 2D CASE OF THE BOURGAIN-DEMETER-GUTH ARGUMENT ZANE LI Let e(z) := e 2πiz and for g : [0, ] C and J [0, ], define the extension oerator E J g(x) := g(t)e(tx + t 2 x 2 ) dt. J For a ositive weight ν

More information

Math 701: Secant Method

Math 701: Secant Method Math 701: Secant Method The secant method aroximates solutions to f(x = 0 using an iterative scheme similar to Newton s method in which the derivative has been relace by This results in the two-term recurrence

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

1. Fisher Information

1. Fisher Information 1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score

More information

Lecture 14: October 17

Lecture 14: October 17 1-725/36-725: Convex Optimization Fall 218 Lecture 14: October 17 Lecturer: Lecturer: Ryan Tibshirani Scribes: Pengsheng Guo, Xian Zhou Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer:

More information

Stochastic Convergence, Delta Method & Moment Estimators

Stochastic Convergence, Delta Method & Moment Estimators Stochastic Convergence, Delta Method & Moment Estimators Seminar on Asymptotic Statistics Daniel Hoffmann University of Kaiserslautern Department of Mathematics February 13, 2015 Daniel Hoffmann (TU KL)

More information

Maximum likelihood estimation

Maximum likelihood estimation Maximum likelihood estimation Guillaume Obozinski Ecole des Ponts - ParisTech Master MVA Maximum likelihood estimation 1/26 Outline 1 Statistical concepts 2 A short review of convex analysis and optimization

More information

STAT 730 Chapter 4: Estimation

STAT 730 Chapter 4: Estimation STAT 730 Chapter 4: Estimation Timothy Hanson Department of Statistics, University of South Carolina Stat 730: Multivariate Analysis 1 / 23 The likelihood We have iid data, at least initially. Each datum

More information

3 Properties of Dedekind domains

3 Properties of Dedekind domains 18.785 Number theory I Fall 2016 Lecture #3 09/15/2016 3 Proerties of Dedekind domains In the revious lecture we defined a Dedekind domain as a noetherian domain A that satisfies either of the following

More information

Theoretical Statistics. Lecture 25.

Theoretical Statistics. Lecture 25. Theoretical Statistics. Lecture 25. Peter Bartlett 1. Relative efficiency of tests [vdv14]: Rescaling rates. 2. Likelihood ratio tests [vdv15]. 1 Recall: Relative efficiency of tests Theorem: Suppose that

More information

Elementary theory of L p spaces

Elementary theory of L p spaces CHAPTER 3 Elementary theory of L saces 3.1 Convexity. Jensen, Hölder, Minkowski inequality. We begin with two definitions. A set A R d is said to be convex if, for any x 0, x 1 2 A x = x 0 + (x 1 x 0 )

More information

The Hasse Minkowski Theorem Lee Dicker University of Minnesota, REU Summer 2001

The Hasse Minkowski Theorem Lee Dicker University of Minnesota, REU Summer 2001 The Hasse Minkowski Theorem Lee Dicker University of Minnesota, REU Summer 2001 The Hasse-Minkowski Theorem rovides a characterization of the rational quadratic forms. What follows is a roof of the Hasse-Minkowski

More information

Mathematical Economics: Lecture 9

Mathematical Economics: Lecture 9 Mathematical Economics: Lecture 9 Yu Ren WISE, Xiamen University October 17, 2011 Outline 1 Chapter 14: Calculus of Several Variables New Section Chapter 14: Calculus of Several Variables Partial Derivatives

More information

Theoretical Statistics. Lecture 23.

Theoretical Statistics. Lecture 23. Theoretical Statistics. Lecture 23. Peter Bartlett 1. Recall: QMD and local asymptotic normality. [vdv7] 2. Convergence of experiments, maximum likelihood. 3. Relative efficiency of tests. [vdv14] 1 Local

More information

An Overview of Witt Vectors

An Overview of Witt Vectors An Overview of Witt Vectors Daniel Finkel December 7, 2007 Abstract This aer offers a brief overview of the basics of Witt vectors. As an alication, we summarize work of Bartolo and Falcone to rove that

More information

NONLINEAR OPTIMIZATION WITH CONVEX CONSTRAINTS. The Goldstein-Levitin-Polyak algorithm

NONLINEAR OPTIMIZATION WITH CONVEX CONSTRAINTS. The Goldstein-Levitin-Polyak algorithm - (23) NLP - NONLINEAR OPTIMIZATION WITH CONVEX CONSTRAINTS The Goldstein-Levitin-Polya algorithm We consider an algorithm for solving the otimization roblem under convex constraints. Although the convexity

More information

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak.

P n. This is called the law of large numbers but it comes in two forms: Strong and Weak. Large Sample Theory Large Sample Theory is a name given to the search for approximations to the behaviour of statistical procedures which are derived by computing limits as the sample size, n, tends to

More information

Econometrics I, Estimation

Econometrics I, Estimation Econometrics I, Estimation Department of Economics Stanford University September, 2008 Part I Parameter, Estimator, Estimate A parametric is a feature of the population. An estimator is a function of the

More information

Lecture 1: January 12

Lecture 1: January 12 10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 1: January 12 Scribes: Seo-Jin Bang, Prabhat KC, Josue Orellana 1.1 Review We begin by going through some examples and key

More information

MATH 250: THE DISTRIBUTION OF PRIMES. ζ(s) = n s,

MATH 250: THE DISTRIBUTION OF PRIMES. ζ(s) = n s, MATH 50: THE DISTRIBUTION OF PRIMES ROBERT J. LEMKE OLIVER For s R, define the function ζs) by. Euler s work on rimes ζs) = which converges if s > and diverges if s. In fact, though we will not exloit

More information

Convex Analysis and Economic Theory Winter 2018

Convex Analysis and Economic Theory Winter 2018 Division of the Humanities and Social Sciences Ec 181 KC Border Conve Analysis and Economic Theory Winter 2018 Toic 16: Fenchel conjugates 16.1 Conjugate functions Recall from Proosition 14.1.1 that is

More information

Math 205A - Fall 2015 Homework #4 Solutions

Math 205A - Fall 2015 Homework #4 Solutions Math 25A - Fall 25 Homework #4 Solutions Problem : Let f L and µ(t) = m{x : f(x) > t} the distribution function of f. Show that: (i) µ(t) t f L (). (ii) f L () = t µ(t)dt. (iii) For any increasing differentiable

More information

Theoretical Statistics. Lecture 22.

Theoretical Statistics. Lecture 22. Theoretical Statistics. Lecture 22. Peter Bartlett 1. Recall: Asymptotic testing. 2. Quadratic mean differentiability. 3. Local asymptotic normality. [vdv7] 1 Recall: Asymptotic testing Consider the asymptotics

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Assume X P θ, θ Θ, with joint pdf (or pmf) f(x θ). Suppose we observe X = x. The Likelihood function is L(θ x) = f(x θ) as a function of θ (with the data x held fixed). The

More information

Lecture 1: Introduction

Lecture 1: Introduction Principles of Statistics Part II - Michaelmas 208 Lecturer: Quentin Berthet Lecture : Introduction This course is concerned with presenting some of the mathematical principles of statistical theory. One

More information

LORENZO BRANDOLESE AND MARIA E. SCHONBEK

LORENZO BRANDOLESE AND MARIA E. SCHONBEK LARGE TIME DECAY AND GROWTH FOR SOLUTIONS OF A VISCOUS BOUSSINESQ SYSTEM LORENZO BRANDOLESE AND MARIA E. SCHONBEK Abstract. In this aer we analyze the decay and the growth for large time of weak and strong

More information

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain

f(x θ)dx with respect to θ. Assuming certain smoothness conditions concern differentiating under the integral the integral sign, we first obtain 0.1. INTRODUCTION 1 0.1 Introduction R. A. Fisher, a pioneer in the development of mathematical statistics, introduced a measure of the amount of information contained in an observaton from f(x θ). Fisher

More information

1. Bounded linear maps. A linear map T : E F of real Banach

1. Bounded linear maps. A linear map T : E F of real Banach DIFFERENTIABLE MAPS 1. Bounded linear maps. A linear map T : E F of real Banach spaces E, F is bounded if M > 0 so that for all v E: T v M v. If v r T v C for some positive constants r, C, then T is bounded:

More information

4. Score normalization technical details We now discuss the technical details of the score normalization method.

4. Score normalization technical details We now discuss the technical details of the score normalization method. SMT SCORING SYSTEM This document describes the scoring system for the Stanford Math Tournament We begin by giving an overview of the changes to scoring and a non-technical descrition of the scoring rules

More information

Chapter 2. Discrete Distributions

Chapter 2. Discrete Distributions Chapter. Discrete Distributions Objectives ˆ Basic Concepts & Epectations ˆ Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric Distributions ˆ Introduction to the Maimum Likelihood Estimation

More information

PETER J. GRABNER AND ARNOLD KNOPFMACHER

PETER J. GRABNER AND ARNOLD KNOPFMACHER ARITHMETIC AND METRIC PROPERTIES OF -ADIC ENGEL SERIES EXPANSIONS PETER J. GRABNER AND ARNOLD KNOPFMACHER Abstract. We derive a characterization of rational numbers in terms of their unique -adic Engel

More information

10-725/36-725: Convex Optimization Prerequisite Topics

10-725/36-725: Convex Optimization Prerequisite Topics 10-725/36-725: Convex Optimization Prerequisite Topics February 3, 2015 This is meant to be a brief, informal refresher of some topics that will form building blocks in this course. The content of the

More information

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2)

Almost 4000 years ago, Babylonians had discovered the following approximation to. x 2 dy 2 =1, (5.0.2) Chater 5 Pell s Equation One of the earliest issues graled with in number theory is the fact that geometric quantities are often not rational. For instance, if we take a right triangle with two side lengths

More information

Statistical Inference

Statistical Inference Statistical Inference Robert L. Wolpert Institute of Statistics and Decision Sciences Duke University, Durham, NC, USA. Asymptotic Inference in Exponential Families Let X j be a sequence of independent,

More information

Multivariate Analysis and Likelihood Inference

Multivariate Analysis and Likelihood Inference Multivariate Analysis and Likelihood Inference Outline 1 Joint Distribution of Random Variables 2 Principal Component Analysis (PCA) 3 Multivariate Normal Distribution 4 Likelihood Inference Joint density

More information

1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016

1 Overview. 2 A Characterization of Convex Functions. 2.1 First-order Taylor approximation. AM 221: Advanced Optimization Spring 2016 AM 221: Advanced Optimization Spring 2016 Prof. Yaron Singer Lecture 8 February 22nd 1 Overview In the previous lecture we saw characterizations of optimality in linear optimization, and we reviewed the

More information

Sobolev Spaces with Weights in Domains and Boundary Value Problems for Degenerate Elliptic Equations

Sobolev Spaces with Weights in Domains and Boundary Value Problems for Degenerate Elliptic Equations Sobolev Saces with Weights in Domains and Boundary Value Problems for Degenerate Ellitic Equations S. V. Lototsky Deartment of Mathematics, M.I.T., Room 2-267, 77 Massachusetts Avenue, Cambridge, MA 02139-4307,

More information

Introduction to gradient descent

Introduction to gradient descent 6-1: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction to gradient descent Derivation and intuitions Hessian 6-2: Introduction to gradient descent Prof. J.C. Kao, UCLA Introduction Our

More information

Extension of Minimax to Infinite Matrices

Extension of Minimax to Infinite Matrices Extension of Minimax to Infinite Matrices Chris Calabro June 21, 2004 Abstract Von Neumann s minimax theorem is tyically alied to a finite ayoff matrix A R m n. Here we show that (i) if m, n are both inite,

More information

Lecture 17: Numerical Optimization October 2014

Lecture 17: Numerical Optimization October 2014 Lecture 17: Numerical Optimization 36-350 22 October 2014 Agenda Basics of optimization Gradient descent Newton s method Curve-fitting R: optim, nls Reading: Recipes 13.1 and 13.2 in The R Cookbook Optional

More information

Advanced Calculus I. Part A, for both Section 200 and Section 501

Advanced Calculus I. Part A, for both Section 200 and Section 501 Sring 2 Instructions Please write your solutions on your own aer. These roblems should be treated as essay questions. A roblem that says give an examle requires a suorting exlanation. In all roblems, you

More information

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X.

Optimization. The value x is called a maximizer of f and is written argmax X f. g(λx + (1 λ)y) < λg(x) + (1 λ)g(y) 0 < λ < 1; x, y X. Optimization Background: Problem: given a function f(x) defined on X, find x such that f(x ) f(x) for all x X. The value x is called a maximizer of f and is written argmax X f. In general, argmax X f may

More information

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n

Recall that in order to prove Theorem 8.8, we argued that under certain regularity conditions, the following facts are true under H 0 : 1 n Chapter 9 Hypothesis Testing 9.1 Wald, Rao, and Likelihood Ratio Tests Suppose we wish to test H 0 : θ = θ 0 against H 1 : θ θ 0. The likelihood-based results of Chapter 8 give rise to several possible

More information

f(x) f(z) c x z > 0 1

f(x) f(z) c x z > 0 1 INVERSE AND IMPLICIT FUNCTION THEOREMS I use df x for the linear transformation that is the differential of f at x.. INVERSE FUNCTION THEOREM Definition. Suppose S R n is open, a S, and f : S R n is a

More information

We denote the derivative at x by DF (x) = L. With respect to the standard bases of R n and R m, DF (x) is simply the matrix of partial derivatives,

We denote the derivative at x by DF (x) = L. With respect to the standard bases of R n and R m, DF (x) is simply the matrix of partial derivatives, The derivative Let O be an open subset of R n, and F : O R m a continuous function We say F is differentiable at a point x O, with derivative L, if L : R n R m is a linear transformation such that, for

More information

Sums of independent random variables

Sums of independent random variables 3 Sums of indeendent random variables This lecture collects a number of estimates for sums of indeendent random variables with values in a Banach sace E. We concentrate on sums of the form N γ nx n, where

More information

6.1 Variational representation of f-divergences

6.1 Variational representation of f-divergences ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016

More information

6 Binary Quadratic forms

6 Binary Quadratic forms 6 Binary Quadratic forms 6.1 Fermat-Euler Theorem A binary quadratic form is an exression of the form f(x,y) = ax 2 +bxy +cy 2 where a,b,c Z. Reresentation of an integer by a binary quadratic form has

More information

f(r) = a d n) d + + a0 = 0

f(r) = a d n) d + + a0 = 0 Math 400-00/Foundations of Algebra/Fall 07 Polynomials at the Foundations: Roots Next, we turn to the notion of a root of a olynomial in Q[x]. Definition 8.. r Q is a rational root of fx) Q[x] if fr) 0.

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)

More information

k- price auctions and Combination-auctions

k- price auctions and Combination-auctions k- rice auctions and Combination-auctions Martin Mihelich Yan Shu Walnut Algorithms March 6, 219 arxiv:181.3494v3 [q-fin.mf] 5 Mar 219 Abstract We rovide for the first time an exact analytical solution

More information

Applications to stochastic PDE

Applications to stochastic PDE 15 Alications to stochastic PE In this final lecture we resent some alications of the theory develoed in this course to stochastic artial differential equations. We concentrate on two secific examles:

More information

Difference of Convex Functions Programming for Reinforcement Learning (Supplementary File)

Difference of Convex Functions Programming for Reinforcement Learning (Supplementary File) Difference of Convex Functions Programming for Reinforcement Learning Sulementary File Bilal Piot 1,, Matthieu Geist 1, Olivier Pietquin,3 1 MaLIS research grou SUPELEC - UMI 958 GeorgiaTech-CRS, France

More information

ECE531 Lecture 10b: Maximum Likelihood Estimation

ECE531 Lecture 10b: Maximum Likelihood Estimation ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So

More information

ECE 275A Homework 6 Solutions

ECE 275A Homework 6 Solutions ECE 275A Homework 6 Solutions. The notation used in the solutions for the concentration (hyper) ellipsoid problems is defined in the lecture supplement on concentration ellipsoids. Note that θ T Σ θ =

More information

CHAPTER 2: SMOOTH MAPS. 1. Introduction In this chapter we introduce smooth maps between manifolds, and some important

CHAPTER 2: SMOOTH MAPS. 1. Introduction In this chapter we introduce smooth maps between manifolds, and some important CHAPTER 2: SMOOTH MAPS DAVID GLICKENSTEIN 1. Introduction In this chater we introduce smooth mas between manifolds, and some imortant concets. De nition 1. A function f : M! R k is a smooth function if

More information

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law

On Isoperimetric Functions of Probability Measures Having Log-Concave Densities with Respect to the Standard Normal Law On Isoerimetric Functions of Probability Measures Having Log-Concave Densities with Resect to the Standard Normal Law Sergey G. Bobkov Abstract Isoerimetric inequalities are discussed for one-dimensional

More information

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly.

11 a 12 a 21 a 11 a 22 a 12 a 21. (C.11) A = The determinant of a product of two matrices is given by AB = A B 1 1 = (C.13) and similarly. C PROPERTIES OF MATRICES 697 to whether the permutation i 1 i 2 i N is even or odd, respectively Note that I =1 Thus, for a 2 2 matrix, the determinant takes the form A = a 11 a 12 = a a 21 a 11 a 22 a

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

Lecture 3 - Linear and Logistic Regression

Lecture 3 - Linear and Logistic Regression 3 - Linear and Logistic Regression-1 Machine Learning Course Lecture 3 - Linear and Logistic Regression Lecturer: Haim Permuter Scribe: Ziv Aharoni Throughout this lecture we talk about how to use regression

More information

Location of solutions for quasi-linear elliptic equations with general gradient dependence

Location of solutions for quasi-linear elliptic equations with general gradient dependence Electronic Journal of Qualitative Theory of Differential Equations 217, No. 87, 1 1; htts://doi.org/1.14232/ejqtde.217.1.87 www.math.u-szeged.hu/ejqtde/ Location of solutions for quasi-linear ellitic equations

More information

Inference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material

Inference for Empirical Wasserstein Distances on Finite Spaces: Supplementary Material Inference for Emirical Wasserstein Distances on Finite Saces: Sulementary Material Max Sommerfeld Axel Munk Keywords: otimal transort, Wasserstein distance, central limit theorem, directional Hadamard

More information

MA102: Multivariable Calculus

MA102: Multivariable Calculus MA102: Multivariable Calculus Rupam Barman and Shreemayee Bora Department of Mathematics IIT Guwahati Differentiability of f : U R n R m Definition: Let U R n be open. Then f : U R n R m is differentiable

More information

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley

Elements of Asymptotic Theory. James L. Powell Department of Economics University of California, Berkeley Elements of Asymtotic Theory James L. Powell Deartment of Economics University of California, Berkeley Objectives of Asymtotic Theory While exact results are available for, say, the distribution of the

More information