Info 2950, Lecture 24


2 May 2017

An additional "bonus" was added to problem 4 on Fri afternoon 28 Apr, plus some typo fixes and slight rewording to resolve questions about problem 2 at the end of the afternoon. [Note: make sure your version has a problem 1C) and a problem 4C)iii) just before the bonus 4D.]

Prob Set 7: due Wed 3 May
Prob Set 8: due 11 May (end of classes)

Case / Deaton (2015) http://www.pnas.org/content/112/49/15078.full.pdf

WONDER = Wide-ranging Online Data for Epidemiologic Research https://wonder.cdc.gov/

Use a dictionary: after screening for white, non-hispanic, F, do fdata[row[10]].append(row[-1]). (Actually use row[10][1:-1], or int(row[10][3:-1]) to parse the HHSn.) Then fdata['HHS1'] gives the list of values, and so on.
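A minimal sketch of this pattern (the filename, delimiter, and column positions are assumptions; adjust them to your own WONDER export, and add the white / non-hispanic / F screening for whichever columns hold those fields):

import csv
from collections import defaultdict

fdata = defaultdict(list)                      # HHS region code -> list of values
with open('wonder_export.txt') as f:           # hypothetical filename for the WONDER export
    for row in csv.reader(f, delimiter='\t'):  # assuming a tab-delimited export
        # screening for the subpopulation of interest would go here
        key = row[10][1:-1]                    # strip surrounding quotes, e.g. '"HHS1"' -> 'HHS1'
        fdata[key].append(row[-1])             # value from the last column, as in the hint above

print(fdata['HHS1'][:5])                       # first few values for HHS region 1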

https://research.googleblog.com/2006/08/all-our-n-gram-are-belong-to-you.html
http://storage.googleapis.com/books/ngrams/books/datasetsv2.html

"unigram" and "bigram" counts:
ucount = Counter(seq)
bcount = Counter([(seq[i], seq[i+1]) for i in range(len(seq)-1)])
and more generally the n-gram counts can be obtained using
ncount = Counter([tuple(seq[i:i+n]) for i in range(len(seq)-n+1)])
Any of these can be converted to probability arrays by dividing by the sum, e.g.:
nprob = np.array(list(ncount.values())) / sum(ncount.values())
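A self-contained version of the above, as a quick check (the toy token sequence is just for illustration):

from collections import Counter
import numpy as np

seq = 'the cat sat on the mat the cat ran'.split()   # toy token sequence

n = 3
ucount = Counter(seq)
bcount = Counter((seq[i], seq[i+1]) for i in range(len(seq)-1))
ncount = Counter(tuple(seq[i:i+n]) for i in range(len(seq)-n+1))

# convert counts to a probability array by dividing by the total
nprob = np.array(list(ncount.values())) / sum(ncount.values())

print(ucount.most_common(3))
print(bcount[('the', 'cat')])    # 2
print(nprob.sum())               # 1.0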

Chomsky (1957): Neither (a) 'colorless green ideas sleep furiously' nor (b) 'furiously sleep ideas green colorless', nor any of their parts, has ever occurred in the past linguistic experience of an English speaker. But (a) is grammatical, while (b) is not.

Pereira (2001): trained a statistical model on newspaper text; (a) is 200,000 times more probable than (b).

http://norvig.com/chomsky.html

[Markov chain transition diagram: states 0 through 7, with transition probabilities including .2, .4, .8, .3 marked on the edges]

The only recurrent states are 1 and 5, each trivially recurrent since it can only go to itself, and hence each forms its own recurrent class. Assume the Markov chain is in state 0 just before the first step; some questions:

(a) What is the probability of being in state 7 after n steps? The unique path is a first step from 0 to 7, then n-1 steps from 7 to itself, hence the probability is (1/2)(.4)^(n-1) (1/2 for the step from 0 to 7, times .4 for each of the n-1 steps from 7 to itself).

(b) What is the probability of reaching state 2 for the first time on the k-th step? The unique path for this is a first step from 0 to 7, then k-2 steps from 7 to itself, then a step from 7 to 2, hence the probability is (1/2)(.4)^(k-2)(.2).

(c) What is the probability of never reaching state 1? The only way to avoid state 1 is to follow the path 0 → 7 → 4. From state 7 it's twice as likely to go to state 4 as to state 2, therefore 2/3 of the time one will eventually end up in the direction of state 4 from state 7. Since the probability of going from state 0 to state 7 is 1/2, the overall probability of going from state 0 to state 4, and never reaching state 1, is (1/2)(2/3) = 1/3.

Sum a geometric series

The infinite sum
S(q) = Σ_{n=0}^∞ q^n = 1 + q + q^2 + q^3 + ...
satisfies the relation
q S(q) = q + q^2 + q^3 + ... = S(q) - 1
so 1 = S(q)(1 - q) and S(q) = 1/(1 - q).

Another way to reproduce the above is to note that the overall probability to reach state 2 from state 7 is given by summing over paths that return k times to state 7, and using
Σ_{n=0}^∞ p^n = 1/(1 - p).
Then
p_72 Σ_{k=0}^∞ (p_77)^k = p_72/(1 - p_77) = .2/.6 = 1/3
Similarly, the probability to eventually reach state 4 from state 7 is
p_74 Σ_{k=0}^∞ (p_77)^k = p_74/(1 - p_77) = .4/.6 = 2/3
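A quick numeric sanity check of these two sums, using the transition probabilities from the slide (p_77 = .4, p_72 = .2, p_74 = .4):

p77, p72, p74 = .4, .2, .4
partial = sum(p77**k for k in range(100))   # partial geometric sum, converges quickly
print(p72 * partial, p72 / (1 - p77))       # both approximately 1/3
print(p74 * partial, p74 / (1 - p77))       # both approximately 2/3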

(d) What is the expected number of steps until hitting state 5?

Waiting time = 1/p

If the probability of leaving a state (and never returning) is p, then the probability of leaving after exactly n steps is (1 - p)^(n-1) p. With q = 1 - p, the expectation value for the number of steps to leave is thus

E[n] = Σ_{n=0}^∞ n q^(n-1) p = p ∂/∂q Σ_{n=0}^∞ q^n = p ∂/∂q [1/(1 - q)] = p/(1 - q)^2 = p/p^2 = 1/p

So for example it takes on average two flips of a fair coin to get the first head, or on average six rolls of a die to get the first 3 (of course in any particular trial, sometimes more and sometimes less). The expected number of steps to leave a state is also called the waiting time.
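A small simulation sketch of this result (the function name and number of trials are arbitrary choices):

import random

def waiting_time(p):
    # number of steps until an event with per-step probability p first occurs
    n = 1
    while random.random() > p:
        n += 1
    return n

trials = 100000
print(sum(waiting_time(1/2) for _ in range(trials)) / trials)   # ~2: flips of a fair coin to first head
print(sum(waiting_time(1/6) for _ in range(trials)) / trials)   # ~6: rolls of a die to first 3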

(d) What is the expected number of steps until hitting state 5? The expected number of steps to leave a state, E[n], is called the waiting time. If the exit probability is p, then the waiting time satisfies E[n] = 1/p. Thus the expected numbers of steps spent at states 7, 4, 6 are respectively 1/.6 = 5/3, 1/.8 = 5/4, 1/.3 = 10/3, and the expected number of steps until hitting state 5 is thus 1 + 5/3 + 5/4 + 10/3 = 29/4 = 7 1/4.

simulation, cell 49: http://nbviewer.jupyter.org/url/courses.cit.cornell.edu/info2950_2017sp/resources/markov.ipynb
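The arithmetic can be checked with exact fractions:

from fractions import Fraction
print(1 + 1/Fraction(6, 10) + 1/Fraction(8, 10) + 1/Fraction(3, 10))   # 29/4 = 7 1/4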

Important enough to prove in a second way, without the derivative (see E/K book, eqs. 20.2-20.4, p. 638).

Let X be a random variable counting the number of steps it takes for a process with probability p to occur for the first time, e.g., the number of flips of a coin to get the first H. It has expectation value
E[X] = 1·Pr[X = 1] + 2·Pr[X = 2] + 3·Pr[X = 3] + ...
Reorganize the j copies of Pr[X = j] using
Pr[X ≥ k] = Pr[X = k] + Pr[X = k + 1] + ...
and the fact that Pr[X ≥ k] is just (1 - p)^(k-1) (the probability that it didn't happen in the first k - 1 steps):
E[X] = 1·Pr[X = 1] + 2·Pr[X = 2] + 3·Pr[X = 3] + ...
= Pr[X ≥ 1] + Pr[X ≥ 2] + Pr[X ≥ 3] + ...
= 1 + (1 - p) + (1 - p)^2 + ... = 1/(1 - (1 - p)) = 1/p.

(e) What is the probability of eventually hitting state 5? Same probability of 1/3 as in (c), since any path that doesn't reach 1 eventually hits 5. More specifically, the probability of going from state 0 to state 7 is 1/2, of eventually going from 7 → 4 is 2/3, and from state 4 one eventually hits state 5, so the overall probability of going from state 0 to state 5 is p(0 → 5) = (1/2)(2/3) = 1/3.

(f) What is the probability of being in state 4 after two steps, given that one is in state 5 after 8 steps? This is given by the joint probability of being in state 4 after two steps and being in state 5 after 8 steps, divided by the overall probability of being in state 5 after 8 steps:

p(state 4 after 2 steps | state 5 after 8 steps) = p(state 4 after 2 steps, state 5 after 8 steps) / p(state 5 after 8 steps)

(By the Markov property, the joint probability in the numerator factors as the probability of reaching state 4 in 2 steps times the probability of going from state 4 to state 5 in the remaining 6 steps.)

probset 8

Nucleobases: A = Adenine, G = Guanine, C = Cytosine, T = Thymine
paired along (double-stranded) backbone: A-T and C-G
https://en.wikipedia.org/wiki/nucleobase

Size of the human genome = 3000 Mbp = 3×10^9 bp
3×10^9 bases × 2 bits/base / 8 bits/byte = 750 MB

https://ghr.nlm.nih.gov/gene/dmd#location
DMD gene (dystrophin), length = 2,241,765 (the biggest!)
Molecular Location: base pairs 31,119,219 to 33,339,609 on the X chromosome

CGTTAAATGCAAACGCTGCTCTGGCTCATGTGTTTGCTCCGAGGTATAGGTTTTGTTCGACTGACGTATC
TATTATAGTACTGCTTTACTGTGTATCTCAATAAAGCACGCAGTTATGTTACAAAAAAGTA

https://ghr.nlm.nih.gov/gene/opn1mw#location
OPN1MW gene (opsin 1, medium wave sensitive), length = 14,266
Molecular Location: base pairs 154,182,596 to 154,196,861 on the X chromosome

CCCACTGGCCGGTATAAAGCACCGTGACCCTCAGGTGACGCACCAGGGCCGGCTGCCGTCGGGGACAGGG
TCTTGAAGGTCTCCGTGATCTCCTGCAGGAGACGAAAATGCACGCACCAGAAGTCA
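As an aside (not part of the prob set 8 statement itself), the Counter-based n-gram counting from earlier in the lecture applies directly to nucleotide strings, e.g. the DMD snippet above:

from collections import Counter

dna = ('CGTTAAATGCAAACGCTGCTCTGGCTCATGTGTTTGCTCCGAGGTATAGGTTTTGTTCGACTGACGTATC'
       'TATTATAGTACTGCTTTACTGTGTATCTCAATAAAGCACGCAGTTATGTTACAAAAAAGTA')

ucount = Counter(dna)                                      # single-base counts
bcount = Counter(dna[i:i+2] for i in range(len(dna)-1))    # dinucleotide counts
print(ucount)
print(bcount.most_common(5))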