High-dimensional data analysis, fall
- Prosper Gardner
- 6 years ago
1 High-dimensional data analysis, fall
- Yeast: understanding basic life functions; 11,904 p-values (Blomberg et al. 2003, 2010)
- Arabidopsis thaliana: association mapping; 3,745 p-values (Zhao et al.)
- fMRI brain scans: function of the brain language network; approx. 3 million p-values (Taylor et al. 2006)
2 Slides for B&vdG , 10.7: Stable solutions
Exercises: 10.1
3 B&vdG 10.2: Subsampling, stability and selection
Sometimes the aim is prediction, sometimes variable selection (and sometimes both). Both are important, but selection is harder!
Setting: $Y = X\beta + \varepsilon$; think of the Lasso (but the ideas are more general).
Recall from B&vdG 2 that the regularisation path is the set of $p$ functions of $\lambda$ defined as
$\{\hat\beta_j(\lambda);\ \lambda \in \Lambda,\ j = 1, \dots, p\}$,
where $\Lambda$ typically is some interval $[\lambda_{\min}, \lambda_{\max}]$, and that $\hat S_\lambda = \{j;\ \hat\beta_j(\lambda) \neq 0\}$.
Now write $\hat S_\lambda = \hat S_\lambda(I)$ to indicate dependence on the sample, which above is $I = \{1, \dots, n\}$.
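4 Let $I$ be a random subset of $\{1, \dots, n\}$ of size $m = \lfloor n/2 \rfloor$, selected by drawing without replacement, and for a subset $K$ (typically $K = \{j\}$) of $\{1, \dots, p\}$ let the subsampling probability be
$\Pi_K(\lambda) = P[K \subseteq \hat S_\lambda(I)] = \frac{\#\{\text{size-}m\text{ subsets } I \text{ with } K \subseteq \hat S_\lambda(I)\}}{\binom{n}{m}}$.
Here $\Pi_K(\lambda)$ may be estimated by drawing randomly, without replacement, a large number $B$ of subsets $I_1, \dots, I_B$ and computing
$\Pi_K(\lambda) \approx \frac{1}{B} \sum_{b=1}^{B} 1\{K \subseteq \hat S_\lambda(I_b)\}$.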
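The Monte Carlo estimate of the subsampling probabilities can be sketched in a few lines. This is a minimal illustration, not B&vdG's implementation: `subsampling_probabilities` and the `select` callback are hypothetical names, and `select` stands in for any procedure (for instance the Lasso at a fixed $\lambda$) that maps a subsample of row indices to a set of selected variables. The toy selector is made up so that the true probabilities are known.

```python
import random

def subsampling_probabilities(n, p, select, B=100, seed=0):
    """Estimate Pi_j = P[j in S(I)] for j = 0..p-1 by Monte Carlo.

    select(I) is a user-supplied (hypothetical) function that takes a list
    of row indices I and returns the set of selected variable indices.
    """
    rng = random.Random(seed)
    m = n // 2                       # subsample size m = floor(n/2)
    counts = [0] * p
    for _ in range(B):
        I = rng.sample(range(n), m)  # m indices drawn without replacement
        for j in select(I):
            counts[j] += 1
    return [c / B for c in counts]

# Toy selector, purely for illustration: variable 0 is always selected,
# variable 1 only when row 0 is in the subsample, variable 2 never.
def toy_select(I):
    return {0, 1} if 0 in I else {0}

probs = subsampling_probabilities(n=10, p=3, select=toy_select, B=2000)
print(probs)
```

In the toy example the exact probabilities are 1, 1/2 and 0, so the printed estimates should be close to those values.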
5 B&vdG argue that the stability path
$\{\Pi_{\{j\}}(\lambda);\ \lambda \in \Lambda,\ j = 1, \dots, p\}$
is better for variable selection than the regularization path. Typically they use $\lambda_{\min} = 0$, and take $\lambda_{\max}$ as the smallest value of $\lambda$ for which all $\beta_j$ are estimated as zero (this value can be seen to be $\max_{1 \le j \le p} |2 X_j^\top y| / n$).
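As a small sketch of the $\lambda_{\max}$ formula: the function below computes $\max_j |2 X_j^\top y|/n$ from a design matrix stored as a list of rows. It assumes the Lasso objective is scaled as $\|y - X\beta\|_2^2/n + \lambda\|\beta\|_1$ with no intercept, which is the scaling under which this expression is the smallest all-zero $\lambda$; other software scales the penalty differently. The name `lambda_max` and the data are made up for illustration.

```python
def lambda_max(X, y):
    """max_j |2 X_j^T y| / n, where X is a list of n rows with p entries each
    and y is a list of n responses (assumed objective: ||y - Xb||^2/n + lambda*||b||_1)."""
    n = len(X)
    p = len(X[0])
    return max(
        abs(2.0 * sum(X[i][j] * y[i] for i in range(n))) / n
        for j in range(p)
    )

# Tiny made-up example with n = 4 observations and p = 2 covariates.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0]]
y = [1.0, 2.0, 3.0, 4.0]
print(lambda_max(X, y))
```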
6 B&vdG : Vitamin B2 production using Bacillus subtilis
Numerical experiment using
- n = 115 values of the logarithm of vitamin B2 production
- p = 4088 gene expression values
6 genes were selected at random from the 200 genes with the highest marginal empirical correlation with the log vitamin B2 production response variable. The other genes were subjected to a random permutation of rows, so that their possible connections with the response variable disappeared.
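The permutation step can be sketched as follows. This is an illustration under an assumption: one common random permutation is applied to the rows of all the "other" columns, which breaks their link to the response while keeping the dependence among those columns intact. The function name `permute_other_columns` and the small matrix are hypothetical.

```python
import random

def permute_other_columns(X, keep, seed=0):
    """Return a copy of X (a list of n rows) in which the columns listed in
    `keep` are left untouched and all other columns have their rows reordered
    by one common random permutation."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    perm = list(range(n))
    rng.shuffle(perm)
    keep = set(keep)
    return [
        [X[i][j] if j in keep else X[perm[i]][j] for j in range(p)]
        for i in range(n)
    ]

# Made-up 6 x 3 matrix; column 0 plays the role of a retained gene.
X = [[i, 10 * i, 100 * i] for i in range(6)]
Xp = permute_other_columns(X, keep=[0])
print(Xp)
```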
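7 [Figure: two panels. Left: regularization path, with $\hat\beta_j(\lambda)$ on the y-axis. Right: stability path, with $\Pi_j(\lambda)$ on the y-axis. x-axis in both panels: $\lambda/\lambda_{\max}$ (in reverse ordering). Red lines are the non-permuted genes.]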
8 B&vdG : Motif regression
Heat shock experiment for finding transcription factor binding sites in DNA sequences. Subset containing
- n = 1200 gene expression values
- p = 666 motif scores
Lasso estimates $\hat\beta_j = \hat\beta_j(\lambda_{\mathrm{CV}})$, with $\lambda_{\mathrm{CV}}$ chosen by 10-fold cross-validation, and the corresponding subsampling probabilities $\Pi_j = \Pi_j(\lambda_{\mathrm{CV}})$ for the 9 most promising motifs (table not reproduced here).
Should one use the ordering from $\hat\beta_j$ or from $\Pi_j$?
9 Numerical experiment: choose 5 covariates at random, set the corresponding $\beta$-s to values which lead to a very low signal-to-noise ratio (= 0.1), set all other $\beta$-s to zero, and simulate with i.i.d. $N(0, 1)$ error variables $\varepsilon_t$. This gives the following result:
[Figure: scatter plot with $\Pi_j(\lambda_{\mathrm{CV}})$ on the x-axis and $\hat\beta_j(\lambda_{\mathrm{CV}})$ on the y-axis. Red crosses are the active genes.]
10 B&vdG 10.3: Stability selection
Traditionally: select one element, say $\hat S(\lambda_0)$, from the set of models $\{\hat S_\lambda;\ \lambda \in \Lambda\}$.
Alternatively: select a value $\Pi_{\mathrm{trh}}$ and select the model
$\hat S_{\mathrm{stable}} = \{j;\ \max_{\lambda \in \Lambda} \Pi_j(\lambda) > \Pi_{\mathrm{trh}}\}$
(and then perhaps re-estimate the $\beta$-s in this set with OLS). Often $\Lambda = \{\lambda_{\mathrm{CV}}\}$.
Type 1 error: select a covariate which isn't active, i.e. a $j \notin S_0$.
Type 2 error: not select a covariate which is active, i.e. a $j \in S_0$.
We want to make the probability of both errors small.
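The selection rule itself is a one-liner once the stability path has been estimated. The sketch below assumes the path is stored as one list of $\Pi_j(\lambda)$ values per covariate, evaluated on a common grid $\Lambda$; the function name `stable_set` and the numbers are made up for illustration.

```python
def stable_set(stability_path, pi_trh):
    """stability_path[j] is the list of Pi_j(lambda) over the grid Lambda.
    Returns the indices j whose maximum over lambda exceeds pi_trh."""
    return [j for j, path in enumerate(stability_path) if max(path) > pi_trh]

# Made-up stability path for 3 covariates on a grid of 3 lambda values.
path = [
    [0.10, 0.55, 0.95],
    [0.05, 0.20, 0.40],
    [0.30, 0.92, 0.88],
]
print(stable_set(path, 0.9))  # [0, 2]
```

With $\Lambda = \{\lambda_{\mathrm{CV}}\}$ each inner list has a single entry and the rule reduces to thresholding $\Pi_j(\lambda_{\mathrm{CV}})$.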
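11 Let $\hat S_\Lambda = \bigcup_{\lambda \in \Lambda} \hat S_\lambda$, $q_\Lambda = E|\hat S_\Lambda|$, and $V = |S_0^c \cap \hat S_{\mathrm{stable}}|$ = # type 1 errors.
Thm 10.1 Assume $\{1(j \in \hat S_\Lambda);\ j \in S_0^c\}$ has an exchangeable distribution and that
$\frac{E|S_0 \cap \hat S_\Lambda|}{E|S_0^c \cap \hat S_\Lambda|} \ge \frac{|S_0|}{|S_0^c|}$.
Then, for $\Pi_{\mathrm{trh}} > 1/2$,
$E(V) \le \frac{1}{2\Pi_{\mathrm{trh}} - 1} \cdot \frac{q_\Lambda^2}{p}$.
$E(V)$ := PFER = per-family error rate
$E(V)/p$ = PCER = per-comparison error rate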
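To get a feeling for the size of the bound in Thm 10.1, it can be evaluated directly. The helper `pfer_bound` below is a hypothetical name, and the values of $q_\Lambda$, $p$ and $\Pi_{\mathrm{trh}}$ are made up; the function only evaluates the right-hand side of the inequality and does not check the theorem's assumptions.

```python
def pfer_bound(q, p, pi_trh):
    """Right-hand side of Thm 10.1: q^2 / ((2*pi_trh - 1) * p).
    Requires 1/2 < pi_trh <= 1."""
    if not 0.5 < pi_trh <= 1.0:
        raise ValueError("pi_trh must lie in (1/2, 1]")
    return q * q / ((2.0 * pi_trh - 1.0) * p)

# Made-up values: q = 20 selected covariates on average, p = 1000, threshold 0.9.
bound = pfer_bound(q=20, p=1000, pi_trh=0.9)
print(bound)          # bound on E(V), the PFER
print(bound / 1000)   # corresponding bound on the PCER, E(V)/p
```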
12 Type 1 error control: for a given value $\nu$, choose $\Pi_{\mathrm{trh}}$ such that $E(V) \le \nu$. If $\nu$ is chosen as some suitably small number, say $\nu = \alpha = 0.05$, one then gets Type 1 (or, equivalently, PFER) error control, since
$P(V > 0) \le E(V)$.
However, sometimes bigger $\nu$-values are also of interest, e.g. if one wants to control PCER.
By Thm 10.1, $E(V) \le \nu$ holds if the threshold is chosen as
$\frac{1}{2\Pi_{\mathrm{trh}} - 1} \cdot \frac{q_\Lambda^2}{p} = \nu \iff \Pi_{\mathrm{trh}} = \left(1 + \frac{q_\Lambda^2}{p\nu}\right)/2$
(only useful if $q_\Lambda^2 < p\nu$, so that $\Pi_{\mathrm{trh}} < 1$).
Homework: Problem 10.1
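A short worked example may help to see when the threshold formula is usable. The numbers $p = 1000$ and $q = 20$ (the expected number of selected covariates appearing in Thm 10.1) are made up for illustration.

```latex
% Case 1: \nu = 1 (at most one expected false selection)
\Pi_{\mathrm{trh}} = \frac{1}{2}\left(1 + \frac{q^2}{p\,\nu}\right)
                   = \frac{1}{2}\left(1 + \frac{400}{1000 \cdot 1}\right) = 0.7,
\qquad q^2 = 400 < p\,\nu = 1000 .

% Case 2: \nu = 0.05
\frac{q^2}{p\,\nu} = \frac{400}{1000 \cdot 0.05} = 8
\;\Rightarrow\; \Pi_{\mathrm{trh}} = \frac{1 + 8}{2} = 4.5 > 1 .
```

In the second case no threshold in $(1/2, 1)$ achieves the bound, so one would have to select fewer covariates (smaller $q$) or accept a larger $\nu$.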
13 But here $q_\Lambda = E|\hat S_\Lambda(I)|$ isn't known. One way to handle this is to decide beforehand on a value $q$ and then use a procedure which selects at most $q$ covariates. Then of course $E|\hat S_\Lambda(I)| \le q$. Possible ways of doing this include:
- use the standard Lasso but only select the $q$ covariates with the largest absolute values of the regression coefficients;
- select the $q$ variables which enter first in the regularization path.
This instead leads to the problem of selecting $q$. An alternative is to turn things around and decide on a value of $\Pi_{\mathrm{trh}}$, say $\Pi_{\mathrm{trh}} = 0.9$, and then use $q = \sqrt{\nu p (2\Pi_{\mathrm{trh}} - 1)}$.
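Both steps can be sketched briefly: first solve the bound of Thm 10.1 for $q$ given $\nu$, $p$ and the threshold, then keep only the $q$ coefficients that are largest in absolute value. The function names `max_selected` and `top_q` are hypothetical, rounding $q$ down to an integer is a choice made here, and the coefficient vector is made up.

```python
import math

def max_selected(nu, p, pi_trh):
    """Largest integer q with q^2 / ((2*pi_trh - 1) * p) <= nu,
    i.e. floor(sqrt(nu * p * (2*pi_trh - 1)))."""
    return math.floor(math.sqrt(nu * p * (2.0 * pi_trh - 1.0)))

def top_q(coefs, q):
    """Indices of the q coefficients with the largest absolute value
    (ties broken by index), returned in increasing order."""
    order = sorted(range(len(coefs)), key=lambda j: -abs(coefs[j]))
    return sorted(order[:q])

q = max_selected(nu=2.5, p=1000, pi_trh=0.9)
print(q)                                          # sqrt(2000) is about 44.7
print(top_q([0.0, -3.0, 0.5, 2.0, 0.0], 2))       # [1, 3]
```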
14 B&vdG 10.4: A numerical experiment
[Figure. Red triangles: stability selection, controlled to $E(V) \le 2.5$. Black dots: cross-validated Lasso. Each pair is from a different simulation set-up.]
15 B&vdG 10.7: Proofs Read!