BIOE 198MI Biomedical Data Analysis. Spring Semester Lab 4: Introduction to Probability and Random Data


BIOE 198MI Biomedical Data Analysis. Spring Semester 2019. Lab 4: Introduction to Probability and Random Data

A. Random Variables. Randomness is a component of all measurement data, whether from acquisition noise, uncertainties in the measurement process, or randomness in the measured object itself. Let random variable X, with specific values x, represent measurement data that are not exactly predictable. While deterministic variables are known exactly, random variables are known only statistically. This means the likelihood of the next data value is predictable from a frequency distribution. Examples of random variables include the number of people in a crowd with blue eyes and the number of cells undergoing division in a cell culture.

[Figure 1 appears here: the left panel plots events x versus the time sequence; the right panel plots the frequency of events p(x), i.e., the frequency distribution of random variable X found by histogramming 1000 values x.]

Figure 1. Illustration of random variable X having specific values x (left). Each value is drawn from a frequency distribution (right) that is called a probability density function (pdf). This specific discrete random variable is from a Poisson process that will not be discussed in this lab.

B. Random Process. Random process X is a dependent random variable expressed as a function of deterministic independent variables such as time t and position (ξ, γ). A random process of all three variables is written as X(t, ξ, γ). Examples include image data recorded over time; e.g., a movie.

C. Simulating 1-D Deterministic and Random Processes using a Normally Distributed Process. In the following script, we simulate a voltage signal with noise. Let's generate N = 100 noise samples x(t) and plot just the first 51 samples. Then histogram the entire sequence and plot the normal pdf from which those samples were drawn to show that there is a match.
%% Normal pdf describing measurement noise X(t)
clear all; close all;
t = 0:0.01:0.5;              % time axis [s] for plotting (51 samples)
y = 10*sqrt(0.5-t);          % arbitrary deterministic voltage y(t)
subplot(2,1,1); plot(t,y); xlabel('t [s]'); ylabel('y(t) [V]')
N = 100;                     % number of samples for histogram
x = randn(1,N);              % generate standard normal random samples
subplot(2,1,2); stem(t,x(1:51)); xlabel('t [s]'); ylabel('x(t) [V]')
subplot(2,1,1); hold on; plot(t,y+x(1:51),'o')
legend('y(t)','y(t)+x(t)'); hold off

figure; histogram(x,'Normalization','pdf')
xlabel('x [V]'); ylabel('pdf(X)')
hold on; z = pdf('norm',-4:0.1:4,0,1); plot(-4:0.1:4,z); hold off
figure; histfit(x); xlabel('x [V]'); ylabel('hist(X)')

There are many things to note from this script and about random-variable distributions in general. We can generate random-variable samples drawn from a standard normal probability density function (pdf) using randn(N,M). We can also model the normal pdf from which samples are drawn. That pdf has mean µ and standard deviation σ and is found using normpdf(x,m,s) or pdf('norm',x,m,s). The general equation is

    p(x; µ, σ) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)),

but the standard normal equation is

    p(z; 0, 1) = (1/√(2π)) e^(−z²/2),

where x = σz + µ. Also, ∫_{−∞}^{∞} dx p(x) = 1, not only for normal pdfs but for any pdf; e.g., Poisson, binomial. The probability that x falls between values a and b is Pr(a < x < b) = ∫_a^b dx p(x) (Fig. 2). stem(x,y) is how to plot samples. What happens when N in the code is varied? What is the difference between the histogram and histfit functions? (Hint: help histfit.) What are the inputs to pdf? (See the second-to-last line in the code above.) The horizontal axis in a pdf plot often represents measurements and therefore has units.

Exercise 1: (Lab basic skills revisited) (a) Use the equations above for p(z; 0, 1) and p(x; µ = 1, σ = 2) to plot each. Do not use pdf or a similar intrinsic function from Matlab; program the equations. Be sure to label axes and add a legend. (b) Repeat using myplot.

Figure 2. Three normal pdfs, all with mean µ but with three different σ. The probability that samples fall within the range of a and b is Pr(a < x < b). The peaks are different because the areas are equal.

D. Visualization Options. One alternative way to visualize the relationship between random samples and the frequency distribution from which they are drawn (the pdf) is found in Fig. 3. Here we have a noisy time-varying voltage, v(t), measured at an outlet of a house with problems.
Samples are summarized by the histogram, where the histogram axis is aligned with the voltage axis. 100 s of data is sampled every 10 ms (100 s / 0.01 s = 10,000 samples). The mean voltage is v̄(t) = 120 V, and the standard deviation is σ = 10 V.
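Exercise 1(a) above can be sketched as follows. This is a minimal example that programs the pdf equations directly, as the exercise requires; the abscissa range and variable names are my own choices:

```matlab
% Plot p(z;0,1) and p(x;mu=1,sigma=2) from the equations (no pdf() calls)
z  = -6:0.01:6;                                % abscissa for both curves
pz = (1/sqrt(2*pi))*exp(-z.^2/2);              % standard normal p(z;0,1)
mu = 1; sigma = 2;
px = (1/(sigma*sqrt(2*pi)))*exp(-(z-mu).^2/(2*sigma^2));  % p(x;1,2)
figure; plot(z,pz,z,px)
xlabel('x'); ylabel('pdf'); legend('p(z;0,1)','p(x;1,2)')
```

Note the wider pdf (σ = 2) has a lower peak, since both curves must integrate to one.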

Fig. 3. Noisy voltage trace on left. A histogram of 10^4 voltage values on the right is over 100 s of measurements (Δt = 0.01 s).

E. Probability and pdf area: introducing the cumulative distribution function (CDF). The total area under any pdf must equal one. In words, the probability of something occurring is certain. In math terms,

    Pr(−∞ < v < ∞) = ∫_{−∞}^{∞} dv p(v) = (1/(σ√(2π))) ∫_{−∞}^{∞} dv e^(−(v−µ)²/(2σ²)) = 1.0 = CDF(∞).

What is the probability of finding voltages in Fig. 3 at any instant of time with values within ±σ of the mean? A: Pr(−σ < v < σ) = ∫_{−σ}^{σ} dv p(v) ≈ 0.68. I found that number via CDFs. Let's explore.

The cumulative distribution function (CDF) is CDF(a) = Pr(−∞ < v < a) = ∫_{−∞}^{a} dv p(v). See Fig. 4 (left). In Matlab: cdf('norm',0,0,1), normcdf(0,0,1), and cdf('norm',120,120,10) all equal 0.5. Do you see why? Find Pr(−σ < X < σ), Pr(−2σ < X < 2σ), and Pr(−1 < X < 1).

Fig. 4. Standard normal pdfs (converted from the voltage pdf in Fig. 3). Illustrated are (left) Pr(−∞ < X < 0) = 0.5; (center) Pr(−σ < X < σ) ≈ 0.68; (right) Pr(−2σ < X < 2σ) ≈ 0.95.

In contrast, we might have the probability value α and now wish to know the value z = a where CDF(a) = α. For this, we use the Matlab function icdf('norm',0.5,0,1), which we find is zero. Do you see why? Let's plot the results and see why this is true. I hope that you have a clearer sense of what is meant by statistically knowing a random variable. This case assumes either a normal random voltage v(t) or its standard normal equivalent z(t).

F. Mean, variance, and standard deviation of a normal random variable. Note that the expression X ~ p(x; µ, σ) is shorthand to indicate X is a normal random variable that is specified entirely by just two parameters, µ and σ. We can (but won't) show that µ is the population mean or ensemble mean, and σ is the population standard deviation or ensemble standard deviation. Both values are model parameters that are difficult to measure exactly.
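The CDF probabilities discussed above can be checked with a short sketch. This example uses the built-in normcdf and icdf functions; the variable names are my own:

```matlab
% Probabilities as differences of CDF values
p1 = normcdf(1,0,1)  - normcdf(-1,0,1);   % Pr(-sigma < X < sigma), ~0.68
p2 = normcdf(2,0,1)  - normcdf(-2,0,1);   % Pr(-2sigma < X < 2sigma), ~0.95
% The same 0.68 interval expressed in outlet-voltage units (mean 120 V, sigma 10 V):
pv = normcdf(130,120,10) - normcdf(110,120,10);
% Inverse CDF: the value a where CDF(a) = 0.5 is the mean, i.e., zero here
a  = icdf('norm',0.5,0,1);
```

Notice that pv equals p1: converting to the standard normal z = (v − µ)/σ does not change probabilities.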

σ² indicates the ensemble variance for a normal random process. The ensemble mean is the most likely value and the peak of the normal pdf. The ensemble standard deviation is a measure of the normal pdf width.

Estimate of µ: the sample mean from N measurements or samples is x̄ = (1/N) Σ_{n=1}^{N} x_n.

Estimate of σ²: the sample variance from N samples is s² = (1/(N−1)) Σ_{n=1}^{N} (x_n − x̄)². The sample standard deviation is the square root of the sample variance, s = √(s²).

Another Matlab tool of interest is rng, used to seed a random number generator.

rng('default'); c = randn(3,1)   % resets seed to reproduce c with each call
rng('shuffle'); c = randn(3,1)   % seeds randn with clock time so c varies

Assignment (Introduction Section of Lab Report): In this assignment, we apply waveform averaging to noisy measurements to improve data quality. The first stage of the assignment is to simulate M = 100 voltage waveforms, each consisting of the same deterministic signal plus independent, zero-mean noise. The next stage is to average signals and measure the standard deviation of the waveform-averaged noise. Finally, I ask that you report on changes to the signal-to-noise ratio (SNR) as a function of the number of waveforms averaged.

Theory suggests that the sample standard deviation s at each time point in a voltage waveform with additive, normally distributed, independent noise should decrease by a factor of 1/√N, where N is the number of waveforms averaged. I want you to use signal simulations to test that idea. Simulate 100 noisy sinusoidal voltage waveforms, where in each waveform the deterministic signal is the same 1 Hz, 10-cycle sine wave x(t) = sin(2πu₀t). Add an independent, normally distributed noise trace, n(t) ~ p(x; 0, σ), to the same sinusoidal signal to model 100 noisy measurements of a signal. How much noise do we add? That question is specified by the signal-to-noise ratio (SNR). Set SNR = 1 dB.
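As a quick sketch of the estimators above, the sample mean and sample variance can be programmed by hand and compared against the built-ins (mean, var, std use these same definitions; var and std default to the N−1 normalization):

```matlab
rng('default');                  % seed so the result is reproducible
x    = randn(1,1000);            % N = 1000 standard normal samples
xbar = sum(x)/numel(x);          % sample mean, estimates mu = 0
s2   = sum((x - xbar).^2)/(numel(x)-1);   % sample variance, estimates sigma^2 = 1
s    = sqrt(s2);                 % sample standard deviation
% compare: mean(x), var(x), std(x) should match xbar, s2, s
```

With only N = 1000 samples, xbar and s will be near but not exactly 0 and 1, illustrating why µ and σ are difficult to measure exactly.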
Then find the noise amplitude σ by solving for σ in the equation SNR (dB) = 10 log₁₀(MSV/σ²), where MSV = (1/T₀) ∫_0^{T₀} dt sin²(2πu₀t) is the mean-squared value of x(t) measured in volts squared, and u₀ = 1/T₀ is the sinusoidal frequency in Hz.

Simulate the data (Methods Section of Lab Report)
(a) First, compute MSV using your calculus skills.
(b) Second, compute σ by combining the SNR equation and MSV.
(c) Generate a time axis in [s], the sinusoidal signal in [V], and an M × N matrix of standard normal random noise series z(t), where M = 100 and N = 10/Δt + 1. You must also select a sampling interval Δt [s]. Be sure to scale each row of the standard normal random samples in array z to find n(t) = σz(t). Now the noise will correspond to the correct SNR.
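Steps (a) and (b) can be sketched as follows. The MSV value assumes integration of sin² over an integer number of cycles, and the SNR value is the 1 dB set above:

```matlab
SNRdB = 1;                          % SNR chosen for this assignment [dB]
MSV   = 1/2;                        % (1/T0)*int_0^T0 sin(2*pi*u0*t)^2 dt = 1/2
% Invert SNRdB = 10*log10(MSV/sigma^2) for the noise amplitude sigma:
sigma = sqrt(MSV/10^(SNRdB/10));
```

For SNR = 1 dB this gives σ ≈ 0.63 V, i.e., noise nearly as strong as the signal.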

(d) Use a for loop to add signal x(t) to each independent noise trace n(t). You should generate 100 waveforms yᵢ(t) = x(t) + nᵢ(t). Now you have the simulated data! In a four-panel subplot figure, plot one of the recorded waveforms that you just simulated.

Analyze the waveform data (Results Section of Lab Report)
(e) Compute the standard deviation of the noise in the first waveform using std(n; 1) = √((1/N) Σ_{i=1}^{N} (y₁(tᵢ) − x(tᵢ))²). Note the averaging is over time! Recall the additive noise was generated to be zero mean. However, each measured waveform contains the same non-zero-mean deterministic signal x, which must be subtracted; e.g., n₁(t) = y₁(t) − x(t).
(f) Average the first j waveforms; e.g., at j = 2, ȳ(t; 2) = (y₁(t) + y₂(t))/2. Repeat for each j over the range 2 ≤ j ≤ 100 using ȳ(t; j) = (1/j) Σ_{k=1}^{j} y_k(t). For example, ȳ(t; 25) represents the mean temporal waveform from averaging 25 recorded waveforms.
(g) For each j, compute the noise standard deviation, std(n; j) = √((1/N) Σ_{i=1}^{N} (ȳ(tᵢ; j) − x(tᵢ))²).
(h) Plot both ȳ(t; 100) and std(n; j) in the subplot matrix as plots 2 and 3, respectively.
(i) Compute and plot SNR(j) as a function of the number of waveforms averaged, j.

Discuss findings (Discussion Section of Lab Report)
(j) The sample standard deviation of noise for one waveform is s. Theory tells us the standard deviation for an average of j waveforms should be s_x̄ = s/√j. On top of the plot of sample std versus j in part (g) above, plot the theoretical prediction. Plot every 10th point to more clearly show the match.
(k) Summarize the findings and discuss the tradeoffs in deciding how many waveforms to average. How many waveforms must be averaged to increase the SNR by 10 dB?
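The simulation and analysis steps above can be sketched as follows. This is one possible implementation, not the required solution; the sampling interval Δt = 0.01 s and all variable names are my own choices:

```matlab
% (c)-(d): simulate M noisy waveforms of a 1 Hz, 10-cycle sine
dt = 0.01; t = 0:dt:10; N = length(t);   % time axis [s], N = 10/dt + 1
u0 = 1; x = sin(2*pi*u0*t);              % deterministic signal [V]
M = 100; SNRdB = 1;
sigma = sqrt(0.5/10^(SNRdB/10));         % noise amplitude from MSV = 1/2
z = randn(M,N); n = sigma*z;             % scaled independent noise traces
y = zeros(M,N);
for i = 1:M
    y(i,:) = x + n(i,:);                 % y_i(t) = x(t) + n_i(t)
end

% (e)-(g): noise std after averaging the first j waveforms
s = zeros(1,M);
for j = 1:M
    ybar = mean(y(1:j,:),1);             % ybar(t;j)
    s(j) = sqrt(mean((ybar - x).^2));    % std(n;j), averaged over time
end
SNRj = 10*log10(0.5./s.^2);              % (i): SNR(j) in dB

% (d),(h),(i),(j): four-panel summary with theory overlay every 10th point
subplot(4,1,1); plot(t,y(1,:)); ylabel('y_1(t) [V]')
subplot(4,1,2); plot(t,mean(y,1)); ylabel('ybar(t;100) [V]')
subplot(4,1,3); plot(1:M,s); hold on
plot(1:10:M,s(1)./sqrt(1:10:M),'o'); hold off; ylabel('std(n;j)')
subplot(4,1,4); plot(1:M,SNRj); xlabel('j'); ylabel('SNR(j) [dB]')
```

Since SNR grows as 10 log₁₀(j), a 10 dB improvement requires averaging roughly 10× more waveforms, which is the tradeoff to discuss in (k).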