Stat 251/551 (Spring 2017): Stochastic Processes
Lecture 1: Introduction to Stochastic Processes
Lecturer: Sahand Negahban  Scribe: Sahand Negahban

1 Organization Issues

We will use Canvas as the course webpage. Homework, solutions, and lecture notes will be linked from Canvas. The course book, written by Prof. Chang, is posted on Classes v2.

1.1 Assignments

There will be roughly eight homework assignments; we have posted the first one online. Homework will typically be assigned on a Monday and due the following Wednesday; exceptions will be noted. Students are also expected to scribe a small number of lectures, preparing a LaTeX version of the lecture notes. I will post the template on Classes v2.

There will be three midterm exams for undergraduates and two for graduate students. Graduate students will complete a final project in lieu of the final midterm. Undergraduates with sufficiently high grades may also request to do a project. Students may complete projects in groups of at most three. The breakdown of how grades will be assigned can be found on the course syllabus.

2 Overview

The course will introduce basic ideas in stochastic processes. We will follow the textbook developed by Prof. Joe Chang, which is posted on Classes v2. Stochastic processes are simply collections of random objects. There are many examples and applications of stochastic processes:

Images: We can think of an image as a collection of pixels. For simplicity, consider a grey-scale image; each pixel then records an intensity of brightness. Furthermore, pixels are related to each other: pixels that are close to each other usually have similar colors. If we consider the image in Figure 1, which is 5 pixels by 5 pixels, we can think of each circle as a pixel organized on a grid.

Stock prices: We can plot the price of an equity as a function of time, as in Figure 2.

Protein folding: In bioinformatics we are frequently interested in understanding the dynamics of a specific protein.
This might be because imaging technology is not sufficiently advanced, or because we wish to develop a new protein and understand how it behaves without needing to create it in the lab. Oftentimes the dynamics of these systems are treated as a Markov chain, which we will discuss in more detail in upcoming lectures.
Figure 1: Cartoon of a five-by-five pixel image of the letter Y

Electromagnetic signals: In electrical engineering applications we often wish to estimate some electromagnetic signal, for example a cell phone, GPS, WiFi, or radio signal. These are frequently modeled as stochastic processes because certain factors are unknown to the receiver. For example, the receiver does not know what the transmitter aims to send, and the transmitter's signal can be corrupted by noise.

Speech recognition: One simple and successful model for speech recognition is known as a Hidden Markov Model, or HMM. We can think of spoken language as a sequence (x_1, x_2, x_3, ...) of idealized phonemes, where we can only hear the speaker's interpretation of what each phoneme should sound like. For speech recognition applications we wish to infer the most likely collection of phonemes.

3 Prerequisites

Before continuing with our discussion of stochastic processes, we lay down some notation from first-year probability that you are all expected to understand.

3.1 Probability

We refer to an experiment as any event whose result is unknown in advance. The result of that experiment is known as the outcome. We denote by Ω the sample space, the set of all possible outcomes.

Example (Flipping two coins). Flipping a coin (usually) has one of two possible outcomes: heads (H) or tails (T). Two coin flips then have four possible outcomes: HH, HT, TH, and TT. Therefore, Ω = {HH, HT, TH, TT}. (Note that we have said nothing about probabilities.)

We use probability to help us understand uncertainty, and probability theory has been a very successful model for doing so. Probability allows us to formally capture that uncertainty: "Probability theory is nothing but common sense reduced to calculation." It lets us quantify the uncertainty of events. An event is any subset of the sample space Ω.
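The two-coin example above can be written out explicitly; the following is a minimal Python sketch of the sample space and one event (the event chosen here, "both flips agree", is just an illustration):

```python
from itertools import product

# Sample space for two coin flips: all ordered pairs of H/T outcomes.
Omega = {"".join(o) for o in product("HT", repeat=2)}
print(sorted(Omega))  # ['HH', 'HT', 'TH', 'TT']

# An event is any subset of the sample space, e.g. "both flips agree".
A = {"HH", "TT"}
assert A.issubset(Omega)
```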
For example, if we flip two coins, we can consider the event that the two coin flips are the same. Formally, we let A = {HH, TT}; clearly A is a subset of the sample space Ω from above. We can also consider another event B = {HH, HT}, the event that the first coin flip is heads. Since A and B are both sets, we can perform the standard set operations on them: intersections, complements, and so on.

Figure 2: Daily stock prices for Yahoo!

Assigning probabilities: Probability is a way of quantifying the level of uncertainty of an event. We use a function P(A) that assigns a number between 0 and 1 to any (technically, measurable) set, or event. Furthermore, we assume that for any countably infinite collection of disjoint subsets A_1, A_2, A_3, ... we have

$$P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i).$$

Recall that two events A and B are independent if

$$P(A \cap B) = P(A)P(B). \tag{1}$$

For example, we usually assume that the probability that a fair coin flip comes up heads is one half; that is, P({H}) = 0.5. Furthermore, we assume that all coin flips from the same coin are independent. Note that we distinguish between a normal P and the probability function P.

3.2 Conditional Probabilities

A conditional probability is, intuitively, the probability that a certain event happens given that we know another one happened. More concretely, given two events A and B, we define the conditional probability of A given B as

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Note that if two events A and B are independent, then P(A | B) = P(A). This point is intuitively clear, since independence should mean that B carries no information about A. An alternative way to consider conditional probabilities is

$$P(A \cap B) = P(A \mid B)P(B).$$

This formulation is a bit more intuitive: it says that the probability of both A and B happening equals the probability that B happens times the probability that A happens given that we know B already happened. Given conditional probabilities we have Bayes' rule:

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B)}.$$

We also have the law of total probability, which states that

$$P(A) = P(A \mid B)P(B) + P(A \mid B^c)P(B^c)$$

(recall that B^c is the set complement of B). Using Bayes' rule and the law of total probability, we have

$$P(A \mid B) = \frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}.$$

3.3 Random Variables and Expectations

For the technically inclined: recall that a random variable X is a measurable real-valued function that maps outcomes ω ∈ Ω to the reals R. An example is a coin flip: we might consider the random variable X that maps an outcome of heads to 1 and an outcome of tails to 0. This is a bit abstract, so we will often just talk about the probability that a random variable takes on a certain value.

3.3.1 Discrete Random Variables

Discrete-valued random variables can take on a finite or countably infinite number of values. To each possible value x of the random variable X we assign some positive probability P(X = x) > 0.

Example (Binomial distribution). Suppose that a basketball player takes n free throws, that each free throw succeeds with probability p, and that all free throws are independent of each other. Let the random variable X_n count the total number of free throws that go in. Then

$$P(X_n = k) = \binom{n}{k} p^k (1-p)^{n-k} \quad \text{for } k \in \{0, 1, \ldots, n\}.$$

The distribution of X_n is known as the binomial distribution with parameters n and p. Clearly, in n free throws the maximum number of successes is n and the minimum is 0, so the total number of values X_n can take is n + 1; this is an example of a discrete random variable that takes on a finite number of values.
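The binomial example can be checked numerically. Here is a short sketch, with n = 10 free throws and success probability p = 0.7 chosen as purely illustrative parameters:

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X_n = k) for X_n ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical shooter: n = 10 free throws, success probability p = 0.7.
n, p = 10, 0.7
pmf = [binomial_pmf(n, p, k) for k in range(n + 1)]

# The n + 1 probabilities sum to 1, and the mean works out to n * p.
total = sum(pmf)
mean = sum(k * pk for k, pk in enumerate(pmf))
print(total)  # approximately 1.0
print(mean)   # approximately 7.0
```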
Next we consider a discrete random variable that can take on an infinite number of values.

Example (Poisson distribution). A random variable X follows the Poisson distribution with parameter λ if

$$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$$

for any integer k ≥ 0. Here, X can take on a countably infinite number of values. Poisson random variables are frequently used to model the number of photons that hit a silicon sensor in extremely low-light situations (like the ones in your digital camera).

Oftentimes we will associate a probability mass function p_X(k), or PMF, with the discrete random variable X such that p_X(k) = P(X = k). We will frequently drop the subscript X when the context is clear.
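A quick numerical sketch of the Poisson PMF, using an illustrative rate λ = 3 (an assumed value, e.g. a mean photon count per exposure):

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 3.0  # illustrative rate parameter
# X takes countably many values; truncating the sum well past the mean
# captures essentially all of the probability mass.
total = sum(poisson_pmf(lam, k) for k in range(100))
mean = sum(k * poisson_pmf(lam, k) for k in range(100))
print(total)  # approximately 1.0
print(mean)   # approximately 3.0 (the Poisson mean equals its parameter)
```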
3.3.2 Continuous Random Variables

Next we consider continuous random variables, which can take on any value on the real line; an example is the total time it takes you to finish your homework. A continuous random variable X has an associated probability density function, or PDF, f_X(t) if for all a ≤ b

$$P(a \le X \le b) = \int_a^b f_X(t)\,dt.$$

We often drop the subscript and write f(x), with the lower-case letter x, to make clear that f is the PDF of X. We require that f(x) ≥ 0 and ∫ f(x) dx = 1. Note that in the continuous case it is not true that P(X = x) = f_X(x).

3.3.3 Joint Distributions

Getting closer to stochastic processes, if we have a collection of random variables X_1, X_2, ..., X_n, we wish to discuss their joint randomness. When the random variables are discrete we can simply write

$$p(x_1, x_2, \ldots, x_n) = P(X_1 = x_1, \ldots, X_n = x_n)$$

and assign probabilities to those joint outcomes. This gives us the joint PMF. For continuous random variables we specify a joint PDF f(x_1, ..., x_n) such that

$$P((X_1, \ldots, X_n) \in A) = \int_{(x_1, \ldots, x_n) \in A} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n.$$

From the joint distribution we can always recover the marginal distributions. In the discrete case we have

$$p_X(x) = P(X = x) = \sum_y p_{X,Y}(x, y),$$

and in the continuous case we have

$$f_X(x) = \int f_{X,Y}(x, y)\,dy.$$

Throughout the course, when we write a sum as $\sum_x$ (or an integral as $\int$), we mean that the sum (respectively, the integral) should be taken over all possible values of the index.

3.3.4 Independence of Random Variables

We say that two random variables X and Y are independent if

$$P(X \in A, Y \in B) = P(X \in A)P(Y \in B).$$

Note that if two random variables X and Y are independent, then for any two functions f and g the random variables f(X) and g(Y) are also independent.

3.3.5 Expected Values

We denote the expected value of a random variable X by E(X). For a discrete-valued random variable we let

$$E(X) = \sum_x x\,P(X = x).$$
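The discrete identities above (marginalization, independence, and expectation) can be verified on a toy joint PMF; the example below uses a hypothetical pair of independent fair-coin indicators (1 = heads):

```python
# A toy joint PMF for two independent fair-coin indicators (1 = heads),
# stored as a dictionary keyed by (x, y).  A hypothetical example.
joint = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

# Marginals: sum the joint PMF over all values of the other variable.
p_X = {x: sum(joint[(x, y)] for y in (0, 1)) for x in (0, 1)}
p_Y = {y: sum(joint[(x, y)] for x in (0, 1)) for y in (0, 1)}
print(p_X)  # {0: 0.5, 1: 0.5}

# Independence: the joint PMF factors as the product of the marginals.
assert all(abs(joint[(x, y)] - p_X[x] * p_Y[y]) < 1e-12
           for x in (0, 1) for y in (0, 1))

# Expected value of the discrete random variable X: E(X) = sum_x x P(X = x).
EX = sum(x * px for x, px in p_X.items())
print(EX)  # 0.5
```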
For a continuous-valued random variable we let

$$E(X) = \int x\,f_X(x)\,dx.$$

We define the variance of a random variable to be var(X) = E((X − E(X))^2), and the mean is simply E(X). Recall that for any two random variables X and Y,

$$E(X + Y) = E(X) + E(Y).$$
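Variance and the linearity of expectation can be illustrated with a small Bernoulli example (p = 0.3 is an arbitrary choice). Notably, linearity of expectation needs no independence assumption, which the simulation below exploits:

```python
import random

random.seed(0)

# Exact mean and variance for X ~ Bernoulli(p): E(X) = p, var(X) = p(1 - p).
p = 0.3  # arbitrary illustrative success probability
EX = 1 * p + 0 * (1 - p)
varX = (1 - EX) ** 2 * p + (0 - EX) ** 2 * (1 - p)
print(EX, varX)  # 0.3 and (approximately) 0.21

# Linearity of expectation holds even for dependent variables: here
# Y = 1 - X is completely determined by X, yet
# E(X + Y) = E(X) + E(Y) = 0.3 + 0.7 = 1.
n = 100_000
total = 0.0
for _ in range(n):
    x = 1 if random.random() < p else 0
    y = 1 - x
    total += x + y
avg = total / n
print(avg)  # 1.0 exactly, since X + Y = 1 for every outcome
```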