Statistical NLP: Lecture 7. Collocations. (Ch 5) Introduction

Size: px
Start display at page:

Download "Statistical NLP: Lecture 7. Collocations. (Ch 5) Introduction"

Transcription

1 Statistical NLP: Lecture 7 Collocations Ch 5 Introduction Collocations are characterized b limited compositionalit. Large overlap between the concepts of collocations and terms, technical term and terminological phrase. Collocations sometimes reflect interesting attitudes in English towards different tpes of substances: strong cigarettes, tea, coffee versus powerful drug e.g., heroin

2 Definition w.r.t Computational and Statistical Literature [A collocation is defined as] a sequence of two or more consecutive words, that has characteristics of a sntactic and semantic unit, and whose eact and unambiguous meaning or connotation cannot be derived directl from the meaning or connotation of its components. [Chouekra, 988] 3 Other Definitions/Notions w.r.t. Linguistic Literature Collocations are not necessaril adjacent Tpical criteria for collocations: noncompositionalit, non-substitutabilit, nonmodifiabilit. Collocations cannot be translated into other languages. Generalization to weaker cases strong association of words, but not necessaril fied occurrence. 4

3 Linguistic Subclasses of Collocations Light verbs: verbs with little semantic content Verb particle constructions or Phrasal Verbs Proper Nouns/Names Terminological Epressions 5 Overview of the Collocation Detecting Techniques Surveed Selection of Collocations b Frequenc Selection of Collocation based on Mean and Variance of the distance between focal word and collocating word. Hpothesis Testing Mutual Information 6

4 Frequenc Justeson & Katz, 995. Selecting the most frequentl occurring bigrams. Passing the results through a part-ofspeech filter Simple method that works ver well. 7 Mean and Variance I Smadja et al., 993 Frequenc-based search works well for fied phrases. However, man collocations consist of two words in more fleible relationships. The method computes the mean and variance of the offset signed distance between the two words in the corpus. If the offsets are randoml distributed i.e., no collocation, then the variance/sample deviation will be high. 8

5 Mean and Variance II n = number of times two words collocate µ= sample mean d i = the value of each sample Sample deviation: s = n i i= n d µ 9 Eample of fleible collocation knocked on the door 3 knocked at the door 3 knocked on John s door 5 knocked on the metal front door 5 µ = /4 = 4 s = sqrt / 3 =.5 If s is big => no collocation If µ is not zero => fleible collocation 0

6 Hpothesis Testing: Overview High frequenc and low variance can be accidental. We want to determine whether the cooccurrence is random or whether it occurs more often than chance. This is a classical problem in Statistics called Hpothesis Testing. We formulate a null hpothesis H 0 no association - onl chance and calculate the probabilit p that a collocation would occur if H 0 were true, and then reject H 0 if p is too low. Otherwise, retain H 0 as possible. Hpothesis Testing: The t-test The t-test looks at the mean and variance of a sample of measurements, where the null hpothesis is that the sample is drawn from a distribution with mean µ. The test looks at the difference between the observed and epected means, scaled b the variance of the data, and tells us how likel one is to get a sample of that mean and variance assuming that the sample is drawn from a normal distribution with mean µ. To appl the t-test to collocations, we think of the tet corpus as a long sequence of N bigrams.

7 Hpothesis Testing: Formula N = number of bigrams µ = sample mean for H 0 = observed sample mean t µ = p = probabilit that the event would occur if H 0 were true s N Significance level p < 0.05 means 95% confidence p < 0.0 means 99% confidence 3 Eample new companies collocation or not? w = new w new w = companies O = 8 O = 4667 w companies O = 580 O = Pnew = / Pcompanies = / H 0 : Pnew companies = Pnew * Pcompanies = = µ = 8 / = s = p-p p t = = <.576 => We cannot reject null hpothesis

8 Hpothesis testing of differences Church & Hanks, 989 We ma also want to find words whose cooccurrence patterns best distinguish between two words. This application can be useful for leicograph. The t-test is etended to the comparison of the means of two normal populations. Here, the null hpothesis is that the average difference is 0. 5 Hpothesis testing of difs. II t = s s + n n Pw v = Cw v / N Pw v = C w v / N Eample: strong tea vs. powerful tea t = C w v C w v C w v + C w v 6

9 t-test for statistical significance of the difference between two sstems Sstem Sstem scores 7,6,55,60,68,49, 4,7,76,55,64 total n Mean i si ^ = sum ij - i^ df 0 0 4,55,75,45,54,5, 55,36,58,55,67 7 t-test for differences continued Pooled s = / = 3.4 t = s n = = For rejecting the hpothesis that Sstem is better then Sstem with a probabilit level of α = 0.05, the critical value is t=.75 from statistics table We cannot conclude the superiorit of Sstem because of the large variance in scores 8

10 Chi-Square test I: Method Use of the t-test has been criticized because it assumes that probabilities are approimatel normall distributed not true, generall. The Chi-Square test does not make this assumption. The essence of the test is to compare observed frequencies with frequencies epected for independence. If the difference between observed and epected frequencies is large, then we can reject the null hpothesis of independence. 9 Chi-Square test II: Formula X E E X = i, j=, O + O = N =...; E = O O ij + O E O + O N =...; E E ij ij N OO O + O N =... O O O + O O + O 0

11 Chi-Square test III: Applications One of the earl uses of the Chi square test in Statistical NLP was the identification of translation pairs in aligned corpora Church & Gale, 99. A more recent application is to use Chi square as a metric for corpus similarit Kilgariff and Rose, 998 Nevertheless, the Chi-Square test should not be used in small corpora. Eample new companies collocation or not? w = new w new w = companies O = 8 O = 4667 w = companies O = 580 O = E ij = marginal probabilities = totals of row i and column j converted into proportions = epected values for independence X =.55 < 3.84 needed for p < 0.05, one degree of freedom for table

12 Likelihood Ratios I: Within a single corpus Dunning, 993 Likelihood ratios are more appropriate for sparse data than the Chi-Square test. In addition, the are easier to interpret than the Chi-Square statistic. In appling the likelihood ratio test to collocation discover, we eamine the following two alternative eplanations for the occurrence frequenc of a bigram w w: The occurrence of w is independent of the previous occurrence of w The occurrence of w is dependent of the previous occurrence of w 3 Log likelihood H : P w H : P w c p = ; p N w = P w w = p L H b c; c, p b c logλ = log = L H b c ; c, p b c = logl c logl c w = p, c, p + logl c, c, p logl c k wherel k, n, = c = ; p c p = P w n k c c = ; c N c w = C w ; c = C w ; c c; N c, p c ; N c, p c, N c, p and b - binomial distrib. c, N c, p = C ww ; 4

13 Likelihood Ratios II: Between two or more corpora Damerau, 993 Ratios of relative frequencies between two or more different corpora can be used to discover collocations that are characteristic of a corpus when compared to other corpora. This approach is most useful for the discover of subject-specific collocations. 5 Mutual Information I An information-theoretic measure for discovering collocations is pointwise mutual information Church et al., 89, 9 Pointwise Mutual Information is roughl a measure of how much one word tells us about the other. Pointwise mutual information does not work well with sparse data. 6

14 7 Mutual Information II, log,, log,, log,, C C N C PMI P P P PMI P P P Y X P MI = = = PMI = E MI 8 Eample PMInew, companies = = log 8 * / 4675 * 588 =.546 PMIhouse, commons = 4. PMIvideocasette, recorder = 5.94

Collocations. (M&S Ch 5)

Collocations. (M&S Ch 5) Colloations M&S Ch 5 Introdution Colloations are haraterized by limited ompositionality. Large overlap between the onepts of olloations and terms tehnial term and terminologial phrase. Colloations sometimes

More information

The Fusion of Parametric and Non-Parametric Hypothesis Tests

The Fusion of Parametric and Non-Parametric Hypothesis Tests The Fusion of arametric and Non-arametric pothesis Tests aul Frank Singer, h.d. Ratheon Co. EO/E1/A173.O. Bo 902 El Segundo, CA 90245 310-647-2643 FSinger@ratheon.com Abstract: This paper considers the

More information

Mathematical Notation Math Introduction to Applied Statistics

Mathematical Notation Math Introduction to Applied Statistics Mathematical Notation Math 113 - Introduction to Applied Statistics Name : Use Word or WordPerfect to recreate the following documents. Each article is worth 10 points and can be printed and given to the

More information

Ling 289 Contingency Table Statistics

Ling 289 Contingency Table Statistics Ling 289 Contingency Table Statistics Roger Levy and Christopher Manning This is a summary of the material that we ve covered on contingency tables. Contingency tables: introduction Odds ratios Counting,

More information

From the help desk: It s all about the sampling

From the help desk: It s all about the sampling The Stata Journal (2002) 2, Number 2, pp. 90 20 From the help desk: It s all about the sampling Allen McDowell Stata Corporation amcdowell@stata.com Jeff Pitblado Stata Corporation jsp@stata.com Abstract.

More information

Regular Physics - Notes Ch. 1

Regular Physics - Notes Ch. 1 Regular Phsics - Notes Ch. 1 What is Phsics? the stud of matter and energ and their relationships; the stud of the basic phsical laws of nature which are often stated in simple mathematical equations.

More information

Unit #4: Collocation analysis in R Stefan Evert 23 June 2016

Unit #4: Collocation analysis in R Stefan Evert 23 June 2016 Unit #4: Collocation analysis in R Stefan Evert 23 June 2016 Contents Co-occurrence data & contingency tables 1 Notation................................................... 1 Contingency tables in R..........................................

More information

Mathematical Statistics. Gregg Waterman Oregon Institute of Technology

Mathematical Statistics. Gregg Waterman Oregon Institute of Technology Mathematical Statistics Gregg Waterman Oregon Institute of Technolog c Gregg Waterman This work is licensed under the Creative Commons Attribution. International license. The essence of the license is

More information

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Math Review Summer 0 Name MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Classif the hpothesis test as two-tailed, left-tailed, or right-tailed.

More information

Biostatistics in Research Practice - Regression I

Biostatistics in Research Practice - Regression I Biostatistics in Research Practice - Regression I Simon Crouch 30th Januar 2007 In scientific studies, we often wish to model the relationships between observed variables over a sample of different subjects.

More information

Words vs. Terms. Words vs. Terms. Words vs. Terms. Information Retrieval cares about terms You search for em, Google indexes em Query:

Words vs. Terms. Words vs. Terms. Words vs. Terms. Information Retrieval cares about terms You search for em, Google indexes em Query: Words vs. Terms Words vs. Terms Information Retrieval cares about You search for em, Google indexes em Query: What kind of monkeys live in Costa Rica? 600.465 - Intro to NLP - J. Eisner 1 600.465 - Intro

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I,

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I, FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 7 MATRICES II Inverse of a matri Sstems of linear equations Solution of sets of linear equations elimination methods 4

More information

Interaction Network Analysis

Interaction Network Analysis CSI/BIF 5330 Interaction etwork Analsis Young-Rae Cho Associate Professor Department of Computer Science Balor Universit Biological etworks Definition Maps of biochemical reactions, interactions, regulations

More information

Glossary. Also available at BigIdeasMath.com: multi-language glossary vocabulary flash cards

Glossary. Also available at BigIdeasMath.com: multi-language glossary vocabulary flash cards Glossar This student friendl glossar is designed to be a reference for ke vocabular, properties, and mathematical terms. Several of the entries include a short eample to aid our understanding of important

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 3: Finding Distinctive Terms (Jan 29, 2019) David Bamman, UC Berkeley https://www.nytimes.com/interactive/ 2017/11/07/upshot/modern-love-what-wewrite-when-we-write-about-love.html

More information

16.400/453J Human Factors Engineering. Design of Experiments II

16.400/453J Human Factors Engineering. Design of Experiments II J Human Factors Engineering Design of Experiments II Review Experiment Design and Descriptive Statistics Research question, independent and dependent variables, histograms, box plots, etc. Inferential

More information

CONTINUOUS SPATIAL DATA ANALYSIS

CONTINUOUS SPATIAL DATA ANALYSIS CONTINUOUS SPATIAL DATA ANALSIS 1. Overview of Spatial Stochastic Processes The ke difference between continuous spatial data and point patterns is that there is now assumed to be a meaningful value, s

More information

3.0 PROBABILITY, RANDOM VARIABLES AND RANDOM PROCESSES

3.0 PROBABILITY, RANDOM VARIABLES AND RANDOM PROCESSES 3.0 PROBABILITY, RANDOM VARIABLES AND RANDOM PROCESSES 3.1 Introduction In this chapter we will review the concepts of probabilit, rom variables rom processes. We begin b reviewing some of the definitions

More information

Statistical tables are attached Two Hours UNIVERSITY OF MANCHESTER. May 2007 Final Draft

Statistical tables are attached Two Hours UNIVERSITY OF MANCHESTER. May 2007 Final Draft Statistical tables are attached wo Hours UNIVERSIY OF MNHESER Ma 7 Final Draft Medical Statistics M377 Electronic calculators ma be used provided that the cannot store tet nswer LL si questions in SEION

More information

Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D.

Research Design - - Topic 15a Introduction to Multivariate Analyses 2009 R.C. Gardner, Ph.D. Research Design - - Topic 15a Introduction to Multivariate Analses 009 R.C. Gardner, Ph.D. Major Characteristics of Multivariate Procedures Overview of Multivariate Techniques Bivariate Regression and

More information

nm nm

nm nm The Quantum Mechanical Model of the Atom You have seen how Bohr s model of the atom eplains the emission spectrum of hdrogen. The emission spectra of other atoms, however, posed a problem. A mercur atom,

More information

Limits 4: Continuity

Limits 4: Continuity Limits 4: Continuit 55 Limits 4: Continuit Model : Continuit I. II. III. IV. z V. VI. z a VII. VIII. IX. Construct Your Understanding Questions (to do in class). Which is the correct value of f (a) in

More information

Lecture 2: CDF and EDF

Lecture 2: CDF and EDF STAT 425: Introduction to Nonparametric Statistics Winter 2018 Instructor: Yen-Chi Chen Lecture 2: CDF and EDF 2.1 CDF: Cumulative Distribution Function For a random variable X, its CDF F () contains all

More information

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8

CIVL /8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 CIVL - 7904/8904 T R A F F I C F L O W T H E O R Y L E C T U R E - 8 Chi-square Test How to determine the interval from a continuous distribution I = Range 1 + 3.322(logN) I-> Range of the class interval

More information

LINEAR REGRESSION ANALYSIS

LINEAR REGRESSION ANALYSIS LINEAR REGRESSION ANALYSIS MODULE V Lecture - 2 Correcting Model Inadequacies Through Transformation and Weighting Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technolog Kanpur

More information

Answers Investigation 3

Answers Investigation 3 Answers Investigation Applications. a., b. s = n c. The numbers seem to be increasing b a greater amount each time. The square number increases b consecutive odd integers:,, 7,, c X X=. a.,,, b., X 7 X=

More information

Compare several group means: ANOVA

Compare several group means: ANOVA 1 - Introduction Compare several group means: ANOVA Pierre Legendre, Université de Montréal August 009 Objective: compare the means of several () groups for a response variable of interest. The groups

More information

CBA4 is live in practice mode this week exam mode from Saturday!

CBA4 is live in practice mode this week exam mode from Saturday! Announcements CBA4 is live in practice mode this week exam mode from Saturday! Material covered: Confidence intervals (both cases) 1 sample hypothesis tests (both cases) Hypothesis tests for 2 means as

More information

LESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II

LESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS

More information

HÉNON HEILES HAMILTONIAN CHAOS IN 2 D

HÉNON HEILES HAMILTONIAN CHAOS IN 2 D ABSTRACT HÉNON HEILES HAMILTONIAN CHAOS IN D MODELING CHAOS & COMPLEXITY 008 YOUVAL DAR PHYSICS DAR@PHYSICS.UCDAVIS.EDU Chaos in two degrees of freedom, demonstrated b using the Hénon Heiles Hamiltonian

More information

Roger Johnson Structure and Dynamics: The 230 space groups Lecture 3

Roger Johnson Structure and Dynamics: The 230 space groups Lecture 3 Roger Johnson Structure and Dnamics: The 23 space groups Lecture 3 3.1. Summar In the first two lectures we considered the structure and dnamics of single molecules. In this lecture we turn our attention

More information

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS

T.I.H.E. IT 233 Statistics and Probability: Sem. 1: 2013 ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS ESTIMATION AND HYPOTHESIS TESTING OF TWO POPULATIONS In our work on hypothesis testing, we used the value of a sample statistic to challenge an accepted value of a population parameter. We focused only

More information

EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1 UNIT 2 - NQF LEVEL 4 OUTCOME 4 - STATISTICS AND PROBABILITY TUTORIAL 3 LINEAR REGRESSION

EDEXCEL ANALYTICAL METHODS FOR ENGINEERS H1 UNIT 2 - NQF LEVEL 4 OUTCOME 4 - STATISTICS AND PROBABILITY TUTORIAL 3 LINEAR REGRESSION EDEXCEL AALYTICAL METHODS FOR EGIEERS H1 UIT - QF LEVEL 4 OUTCOME 4 - STATISTICS AD PROBABILITY TUTORIAL 3 LIEAR REGRESSIO Tabular and graphical form: data collection methods; histograms; bar charts; line

More information

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies

The t-test: A z-score for a sample mean tells us where in the distribution the particular mean lies The t-test: So Far: Sampling distribution benefit is that even if the original population is not normal, a sampling distribution based on this population will be normal (for sample size > 30). Benefit

More information

STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis

STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis STAT 135 Lab 9 Multiple Testing, One-Way ANOVA and Kruskal-Wallis Rebecca Barter April 6, 2015 Multiple Testing Multiple Testing Recall that when we were doing two sample t-tests, we were testing the equality

More information

Structural Equation Modeling Lab 5 In Class Modification Indices Example

Structural Equation Modeling Lab 5 In Class Modification Indices Example Structural Equation Modeling Lab 5 In Class Modification Indices Example. Model specifications sntax TI Modification Indices DA NI=0 NO=0 MA=CM RA FI='E:\Teaching\SEM S09\Lab 5\jsp6.psf' SE 7 6 5 / MO

More information

Engineering Mathematics I

Engineering Mathematics I Engineering Mathematics I_ 017 Engineering Mathematics I 1. Introduction to Differential Equations Dr. Rami Zakaria Terminolog Differential Equation Ordinar Differential Equations Partial Differential

More information

Unit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents

Unit 3 NOTES Honors Common Core Math 2 1. Day 1: Properties of Exponents Unit NOTES Honors Common Core Math Da : Properties of Eponents Warm-Up: Before we begin toda s lesson, how much do ou remember about eponents? Use epanded form to write the rules for the eponents. OBJECTIVE

More information

INF Introduction to classifiction Anne Solberg

INF Introduction to classifiction Anne Solberg INF 4300 8.09.17 Introduction to classifiction Anne Solberg anne@ifi.uio.no Introduction to classification Based on handout from Pattern Recognition b Theodoridis, available after the lecture INF 4300

More information

Chapter 13 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics

Chapter 13 Student Lecture Notes Department of Quantitative Methods & Information Systems. Business Statistics Chapter 13 Student Lecture Notes 13-1 Department of Quantitative Methods & Information Sstems Business Statistics Chapter 14 Introduction to Linear Regression and Correlation Analsis QMIS 0 Dr. Mohammad

More information

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Copright, 8, RE Kass, EN Brown, and U Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Chapter 6 Random Vectors and Multivariate Distributions 6 Random Vectors In Section?? we etended

More information

6. Vector Random Variables

6. Vector Random Variables 6. Vector Random Variables In the previous chapter we presented methods for dealing with two random variables. In this chapter we etend these methods to the case of n random variables in the following

More information

Ch 3 Alg 2 Note Sheet.doc 3.1 Graphing Systems of Equations

Ch 3 Alg 2 Note Sheet.doc 3.1 Graphing Systems of Equations Ch 3 Alg Note Sheet.doc 3.1 Graphing Sstems of Equations Sstems of Linear Equations A sstem of equations is a set of two or more equations that use the same variables. If the graph of each equation =.4

More information

Inferential Statistics and Methods for Tuning and Configuring

Inferential Statistics and Methods for Tuning and Configuring DM63 HEURISTICS FOR COMBINATORIAL OPTIMIZATION Lecture 13 and Methods for Tuning and Configuring Marco Chiarandini 1 Summar 2 3 Eperimental Designs 4 Sequential Testing (Racing) Summar 1 Summar 2 3 Eperimental

More information

N-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24

N-grams. Motivation. Simple n-grams. Smoothing. Backoff. N-grams L545. Dept. of Linguistics, Indiana University Spring / 24 L545 Dept. of Linguistics, Indiana University Spring 2013 1 / 24 Morphosyntax We just finished talking about morphology (cf. words) And pretty soon we re going to discuss syntax (cf. sentences) In between,

More information

Language Models. Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN

Language Models. Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN Language Models Data Science: Jordan Boyd-Graber University of Maryland SLIDES ADAPTED FROM PHILIP KOEHN Data Science: Jordan Boyd-Graber UMD Language Models 1 / 8 Language models Language models answer

More information

1.6 ELECTRONIC STRUCTURE OF THE HYDROGEN ATOM

1.6 ELECTRONIC STRUCTURE OF THE HYDROGEN ATOM 1.6 ELECTRONIC STRUCTURE OF THE HYDROGEN ATOM 23 How does this wave-particle dualit require us to alter our thinking about the electron? In our everda lives, we re accustomed to a deterministic world.

More information

4452 Mathematical Modeling Lecture 13: Chaos and Fractals

4452 Mathematical Modeling Lecture 13: Chaos and Fractals Math Modeling Lecture 13: Chaos and Fractals Page 1 442 Mathematical Modeling Lecture 13: Chaos and Fractals Introduction In our tetbook, the discussion on chaos and fractals covers less than 2 pages.

More information

Learning From Data Lecture 7 Approximation Versus Generalization

Learning From Data Lecture 7 Approximation Versus Generalization Learning From Data Lecture 7 Approimation Versus Generalization The VC Dimension Approimation Versus Generalization Bias and Variance The Learning Curve M. Magdon-Ismail CSCI 4100/6100 recap: The Vapnik-Chervonenkis

More information

Speech and Language Processing

Speech and Language Processing Speech and Language Processing Lecture 5 Neural network based acoustic and language models Information and Communications Engineering Course Takahiro Shinoaki 08//6 Lecture Plan (Shinoaki s part) I gives

More information

Exponential, Logistic, and Logarithmic Functions

Exponential, Logistic, and Logarithmic Functions CHAPTER 3 Eponential, Logistic, and Logarithmic Functions 3.1 Eponential and Logistic Functions 3.2 Eponential and Logistic Modeling 3.3 Logarithmic Functions and Their Graphs 3.4 Properties of Logarithmic

More information

Physics Lab 1 - Measurements

Physics Lab 1 - Measurements Phsics 203 - Lab 1 - Measurements Introduction An phsical science requires measurement. This lab will involve making several measurements of the fundamental units of length, mass, and time. There is no

More information

LESSON #11 - FORMS OF A LINE COMMON CORE ALGEBRA II

LESSON #11 - FORMS OF A LINE COMMON CORE ALGEBRA II LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS

More information

Count-Min Tree Sketch: Approximate counting for NLP

Count-Min Tree Sketch: Approximate counting for NLP Count-Min Tree Sketch: Approximate counting for NLP Guillaume Pitel, Geoffroy Fouquier, Emmanuel Marchand and Abdul Mouhamadsultane exensa firstname.lastname@exensa.com arxiv:64.5492v [cs.ir] 9 Apr 26

More information

Review. One-way ANOVA, I. What s coming up. Multiple comparisons

Review. One-way ANOVA, I. What s coming up. Multiple comparisons Review One-way ANOVA, I 9.07 /15/00 Earlier in this class, we talked about twosample z- and t-tests for the difference between two conditions of an independent variable Does a trial drug work better than

More information

Trigonometric. equations. Topic: Periodic functions and applications. Simple trigonometric. equations. Equations using radians Further trigonometric

Trigonometric. equations. Topic: Periodic functions and applications. Simple trigonometric. equations. Equations using radians Further trigonometric Trigonometric equations 6 sllabusref eferenceence Topic: Periodic functions and applications In this cha 6A 6B 6C 6D 6E chapter Simple trigonometric equations Equations using radians Further trigonometric

More information

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015 AMS7: WEEK 7. CLASS 1 More on Hypothesis Testing Monday May 11th, 2015 Testing a Claim about a Standard Deviation or a Variance We want to test claims about or 2 Example: Newborn babies from mothers taking

More information

Computer Vision & Digital Image Processing

Computer Vision & Digital Image Processing Computer Vision & Digital Image Processing Image Segmentation Dr. D. J. Jackson Lecture 6- Image segmentation Segmentation divides an image into its constituent parts or objects Level of subdivision depends

More information

Significance tests for the evaluation of ranking methods

Significance tests for the evaluation of ranking methods Significance tests for the evaluation of ranking methods Stefan Evert Institut für maschinelle Sprachverarbeitung Universität Stuttgart Azenbergstr. 12, 70174 Stuttgart, Germany evert@ims.uni-stuttgart.de

More information

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent:

In a one-way ANOVA, the total sums of squares among observations is partitioned into two components: Sums of squares represent: Activity #10: AxS ANOVA (Repeated subjects design) Resources: optimism.sav So far in MATH 300 and 301, we have studied the following hypothesis testing procedures: 1) Binomial test, sign-test, Fisher s

More information

LESSON #48 - INTEGER EXPONENTS COMMON CORE ALGEBRA II

LESSON #48 - INTEGER EXPONENTS COMMON CORE ALGEBRA II LESSON #8 - INTEGER EXPONENTS COMMON CORE ALGEBRA II We just finished our review of linear functions. Linear functions are those that grow b equal differences for equal intervals. In this unit we will

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Suhasini Subba Rao Motivations for the ANOVA We defined the F-distribution, this is mainly used in

More information

Eigenvectors and Eigenvalues 1

Eigenvectors and Eigenvalues 1 Ma 2015 page 1 Eigenvectors and Eigenvalues 1 In this handout, we will eplore eigenvectors and eigenvalues. We will begin with an eploration, then provide some direct eplanation and worked eamples, and

More information

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests

1 Descriptive statistics. 2 Scores and probability distributions. 3 Hypothesis testing and one-sample t-test. 4 More on t-tests Overall Overview INFOWO Statistics lecture S3: Hypothesis testing Peter de Waal Department of Information and Computing Sciences Faculty of Science, Universiteit Utrecht 1 Descriptive statistics 2 Scores

More information

Toda s Theorem: PH P #P

Toda s Theorem: PH P #P CS254: Computational Compleit Theor Prof. Luca Trevisan Final Project Ananth Raghunathan 1 Introduction Toda s Theorem: PH P #P The class NP captures the difficult of finding certificates. However, in

More information

x y plane is the plane in which the stresses act, yy xy xy Figure 3.5.1: non-zero stress components acting in the x y plane

x y plane is the plane in which the stresses act, yy xy xy Figure 3.5.1: non-zero stress components acting in the x y plane 3.5 Plane Stress This section is concerned with a special two-dimensional state of stress called plane stress. It is important for two reasons: () it arises in real components (particularl in thin components

More information

0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work.

0.24 adults 2. (c) Prove that, regardless of the possible values of and, the covariance between X and Y is equal to zero. Show all work. 1 A socioeconomic stud analzes two discrete random variables in a certain population of households = number of adult residents and = number of child residents It is found that their joint probabilit mass

More information

Space frames. b) R z φ z. R x. Figure 1 Sign convention: a) Displacements; b) Reactions

Space frames. b) R z φ z. R x. Figure 1 Sign convention: a) Displacements; b) Reactions Lecture notes: Structural Analsis II Space frames I. asic concepts. The design of a building is generall accomplished b considering the structure as an assemblage of planar frames, each of which is designed

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Consider a slender rod, fixed at one end and stretched, as illustrated in Fig ; the original position of the rod is shown dotted.

Consider a slender rod, fixed at one end and stretched, as illustrated in Fig ; the original position of the rod is shown dotted. 4.1 Strain If an object is placed on a table and then the table is moved, each material particle moves in space. The particles undergo a displacement. The particles have moved in space as a rigid bod.

More information

Methods for Advanced Mathematics (C3) Coursework Numerical Methods

Methods for Advanced Mathematics (C3) Coursework Numerical Methods Woodhouse College 0 Page Introduction... 3 Terminolog... 3 Activit... 4 Wh use numerical methods?... Change of sign... Activit... 6 Interval Bisection... 7 Decimal Search... 8 Coursework Requirements on

More information

H.Algebra 2 Summer Review Packet

H.Algebra 2 Summer Review Packet H.Algebra Summer Review Packet 1 Correlation of Algebra Summer Packet with Algebra 1 Objectives A. Simplifing Polnomial Epressions Objectives: The student will be able to: Use the commutative, associative,

More information

Lecture 7: Hypothesis Testing and ANOVA

Lecture 7: Hypothesis Testing and ANOVA Lecture 7: Hypothesis Testing and ANOVA Goals Overview of key elements of hypothesis testing Review of common one and two sample tests Introduction to ANOVA Hypothesis Testing The intent of hypothesis

More information

Determinants. We said in Section 3.3 that a 2 2 matrix a b. Determinant of an n n Matrix

Determinants. We said in Section 3.3 that a 2 2 matrix a b. Determinant of an n n Matrix 3.6 Determinants We said in Section 3.3 that a 2 2 matri a b is invertible if and onl if its c d erminant, ad bc, is nonzero, and we saw the erminant used in the formula for the inverse of a 2 2 matri.

More information

Embeddings Learned By Matrix Factorization

Embeddings Learned By Matrix Factorization Embeddings Learned By Matrix Factorization Benjamin Roth; Folien von Hinrich Schütze Center for Information and Language Processing, LMU Munich Overview WordSpace limitations LinAlgebra review Input matrix

More information

4 The Cartesian Coordinate System- Pictures of Equations

4 The Cartesian Coordinate System- Pictures of Equations The Cartesian Coordinate Sstem- Pictures of Equations Concepts: The Cartesian Coordinate Sstem Graphs of Equations in Two Variables -intercepts and -intercepts Distance in Two Dimensions and the Pthagorean

More information

Introduction Direct Variation Rates of Change Scatter Plots. Introduction. EXAMPLE 1 A Mathematical Model

Introduction Direct Variation Rates of Change Scatter Plots. Introduction. EXAMPLE 1 A Mathematical Model APPENDIX B Mathematical Modeling B1 Appendi B Mathematical Modeling B.1 Modeling Data with Linear Functions Introduction Direct Variation Rates of Change Scatter Plots Introduction The primar objective

More information

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs)

The One-Way Independent-Samples ANOVA. (For Between-Subjects Designs) The One-Way Independent-Samples ANOVA (For Between-Subjects Designs) Computations for the ANOVA In computing the terms required for the F-statistic, we won t explicitly compute any sample variances or

More information

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.

Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6. Chapter 7 Reading 7.1, 7.2 Questions 3.83, 6.11, 6.12, 6.17, 6.25, 6.29, 6.33, 6.35, 6.50, 6.51, 6.53, 6.55, 6.59, 6.60, 6.65, 6.69, 6.70, 6.77, 6.79, 6.89, 6.112 Introduction In Chapter 5 and 6, we emphasized

More information

14.1 Systems of Linear Equations in Two Variables

14.1 Systems of Linear Equations in Two Variables 86 Chapter 1 Sstems of Equations and Matrices 1.1 Sstems of Linear Equations in Two Variables Use the method of substitution to solve sstems of equations in two variables. Use the method of elimination

More information

Math 123 Summary of Important Algebra & Trigonometry Concepts Chapter 1 & Appendix D, Stewart, Calculus Early Transcendentals

Math 123 Summary of Important Algebra & Trigonometry Concepts Chapter 1 & Appendix D, Stewart, Calculus Early Transcendentals Math Summar of Important Algebra & Trigonometr Concepts Chapter & Appendi D, Stewart, Calculus Earl Transcendentals Function a rule that assigns to each element in a set D eactl one element, called f (

More information

Glossary. Also available at BigIdeasMath.com: multi-language glossary vocabulary flash cards. An equation that contains an absolute value expression

Glossary. Also available at BigIdeasMath.com: multi-language glossary vocabulary flash cards. An equation that contains an absolute value expression Glossar This student friendl glossar is designed to be a reference for ke vocabular, properties, and mathematical terms. Several of the entries include a short eample to aid our understanding of important

More information

Applied Natural Language Processing

Applied Natural Language Processing Applied Natural Language Processing Info 256 Lecture 7: Testing (Feb 12, 2019) David Bamman, UC Berkeley Significance in NLP You develop a new method for text classification; is it better than what comes

More information

Camera calibration. Outline. Pinhole camera. Camera projection models. Nonlinear least square methods A camera calibration tool

Camera calibration. Outline. Pinhole camera. Camera projection models. Nonlinear least square methods A camera calibration tool Outline Camera calibration Camera projection models Camera calibration i Nonlinear least square methods A camera calibration tool Applications Digital Visual Effects Yung-Yu Chuang with slides b Richard

More information

Topic 22 Analysis of Variance

Topic 22 Analysis of Variance Topic 22 Analysis of Variance Comparing Multiple Populations 1 / 14 Outline Overview One Way Analysis of Variance Sample Means Sums of Squares The F Statistic Confidence Intervals 2 / 14 Overview Two-sample

More information

Topic 21 Goodness of Fit

Topic 21 Goodness of Fit Topic 21 Goodness of Fit Contingency Tables 1 / 11 Introduction Two-way Table Smoking Habits The Hypothesis The Test Statistic Degrees of Freedom Outline 2 / 11 Introduction Contingency tables, also known

More information

2 and F Distributions. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006

2 and F Distributions. Barrow, Statistics for Economics, Accounting and Business Studies, 4 th edition Pearson Education Limited 2006 and F Distributions Lecture 9 Distribution The distribution is used to: construct confidence intervals for a variance compare a set of actual frequencies with expected frequencies test for association

More information

Chapter 9 Inferences from Two Samples

Chapter 9 Inferences from Two Samples Chapter 9 Inferences from Two Samples 9-1 Review and Preview 9-2 Two Proportions 9-3 Two Means: Independent Samples 9-4 Two Dependent Samples (Matched Pairs) 9-5 Two Variances or Standard Deviations Review

More information

x. 4. 2x 10 4x. 10 x

x. 4. 2x 10 4x. 10 x CCGPS UNIT Semester 1 COORDINATE ALGEBRA Page 1 of Reasoning with Equations and Quantities Name: Date: Understand solving equations as a process of reasoning and eplain the reasoning MCC9-1.A.REI.1 Eplain

More information

Lab 5 Forces Part 1. Physics 211 Lab. You will be using Newton s 2 nd Law to help you examine the nature of these forces.

Lab 5 Forces Part 1. Physics 211 Lab. You will be using Newton s 2 nd Law to help you examine the nature of these forces. b Lab 5 Forces Part 1 Phsics 211 Lab Introduction This is the first week of a two part lab that deals with forces and related concepts. A force is a push or a pull on an object that can be caused b a variet

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems

Deep Nonlinear Non-Gaussian Filtering for Dynamical Systems Deep Nonlinear Non-Gaussian Filtering for Dnamical Sstems Arash Mehrjou Department of Empirical Inference Ma Planck Institute for Intelligent Sstems arash.mehrjou@tuebingen.mpg.de Bernhard Schölkopf Department

More information

CHAPTER 2: Partial Derivatives. 2.2 Increments and Differential

CHAPTER 2: Partial Derivatives. 2.2 Increments and Differential CHAPTER : Partial Derivatives.1 Definition of a Partial Derivative. Increments and Differential.3 Chain Rules.4 Local Etrema.5 Absolute Etrema 1 Chapter : Partial Derivatives.1 Definition of a Partial

More information

2: Distributions of Several Variables, Error Propagation

2: Distributions of Several Variables, Error Propagation : Distributions of Several Variables, Error Propagation Distribution of several variables. variables The joint probabilit distribution function of two variables and can be genericall written f(, with the

More information

Study/Resource Guide. for Students and Parents. Coordinate Algebra

Study/Resource Guide. for Students and Parents. Coordinate Algebra Georgia Milestones Assessment Sstem Stud/Resource Guide for Students and Parents Coordinate Algebra Stud/Resource Guide The Stud/Resource Guides are intended to serve as a resource for parents and students.

More information

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification INF 4300 151014 Introduction to classifiction Anne Solberg anne@ifiuiono Based on Chapter 1-6 in Duda and Hart: Pattern Classification 151014 INF 4300 1 Introduction to classification One of the most challenging

More information

9.07 Introduction to Probability and Statistics for Brain and Cognitive Sciences Emery N. Brown

9.07 Introduction to Probability and Statistics for Brain and Cognitive Sciences Emery N. Brown 9.07 Introduction to Probabilit and Statistics for Brain and Cognitive Sciences Emer N. Brown I. Objectives Lecture 4: Transformations of Random Variables, Joint Distributions of Random Variables A. Understand

More information

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!! (preferred!)!!

Probability theory and inference statistics! Dr. Paola Grosso! SNE research group!!  (preferred!)!! Probability theory and inference statistics Dr. Paola Grosso SNE research group p.grosso@uva.nl paola.grosso@os3.nl (preferred) Roadmap Lecture 1: Monday Sep. 22nd Collecting data Presenting data Descriptive

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information