Lecture 2. Random variables: discrete and continuous

Similar documents
d) There is a Web page that includes links to both Web page A and Web page B.

Galileo Galilei ( ) Title page of Galileo's Dialogue concerning the two chief world systems, published in Florence in February 1632.

Conditional expectation and prediction

Lecture 2: Introduction to Probability

课内考试时间? 5/10 5/17 5/24 课内考试? 5/31 课内考试? 6/07 课程论文报告

The dynamic N1-methyladenosine methylome in eukaryotic messenger RNA 报告人 : 沈胤

Integrated Algebra. Simplified Chinese. Problem Solving

On the Quark model based on virtual spacetime and the origin of fractional charge

2012 AP Calculus BC 模拟试卷

Source mechanism solution

Concurrent Engineering Pdf Ebook Download >>> DOWNLOAD

Riemann s Hypothesis and Conjecture of Birch and Swinnerton-Dyer are False

Chapter 2 the z-transform. 2.1 definition 2.2 properties of ROC 2.3 the inverse z-transform 2.4 z-transform properties

系统生物学. (Systems Biology) 马彬广

Molecular weights and Sizes

GRE 精确 完整 数学预测机经 发布适用 2015 年 10 月考试

Atomic & Molecular Clusters / 原子分子团簇 /

USTC SNST 2014 Autumn Semester Lecture Series

通量数据质量控制的理论与方法 理加联合科技有限公司

Chapter 4. Mobile Radio Propagation Large-Scale Path Loss

上海激光电子伽玛源 (SLEGS) 样机的实验介绍

0 0 = 1 0 = 0 1 = = 1 1 = 0 0 = 1

Digital Image Processing. Point Processing( 点处理 )

5. Polymorphism, Selection. and Phylogenetics. 5.1 Population genetics. 5.2 Phylogenetics

Hebei I.T. (Shanghai) Co., Ltd LED SPECIFICATION

Lecture 4-1 Nutrition, Laboratory Culture and Metabolism of Microorganisms

A Tutorial on Variational Bayes

课内考试时间 5/21 5/28 课内考试 6/04 课程论文报告?

A proof of the 3x +1 conjecture

ArcGIS 10.1 for Server OGC 标准支持. Esri 中国信息技术有限公司

Synthesis of PdS Au nanorods with asymmetric tips with improved H2 production efficiency in water splitting and increased photostability

Principia and Design of Heat Exchanger Device 热交换器原理与设计

澳作生态仪器有限公司 叶绿素荧光测量中的 PAR 测量 植物逆境生理生态研究方法专题系列 8 野外进行荧光测量时, 光照和温度条件变异性非常大 如果不进行光照和温度条件的控制或者精确测量, 那么荧光的测量结果将无法科学解释

Z-Factory Physics. Physics for Modernized Z-Factory. Chao-Hsi Chang (Zhao-Xi Zhang) ITP, CAS

生物統計教育訓練 - 課程. Introduction to equivalence, superior, inferior studies in RCT 謝宗成副教授慈濟大學醫學科學研究所. TEL: ext 2015

2. The lattice Boltzmann for porous flow and transport

三类调度问题的复合派遣算法及其在医疗运营管理中的应用

Cooling rate of water

Explainable Recommendation: Theory and Applications

Chapter 2 Bayesian Decision Theory. Pattern Recognition Soochow, Fall Semester 1

1. Length of Daytime (7 points) 白昼长度 (7 分 )

QTM - QUALITY TOOLS' MANUAL.

APPROVAL SHEET 承认书 厚膜晶片电阻承认书 -CR 系列. Approval Specification for Thick Film Chip Resistors - Type CR 厂商 : 丽智电子 ( 昆山 ) 有限公司客户 : 核准 Approved by

NiFe layered double hydroxide nanoparticles for efficiently enhancing performance of BiVO4 photoanode in

西班牙 10.4 米 GTC 望远镜观测时间申请邀请

The preload analysis of screw bolt joints on the first wall graphite tiles in East

Theory of Water-Proton Spin Relaxation in Complex Biological Systems

Easter Traditions 复活节习俗

Halloween 万圣节. Do you believe in ghosts? 你相信有鬼吗? Read the text below and do the activity that follows. 阅读下面的短文, 然后完成练习 :

沙强 / 讲师 随时欢迎对有机化学感兴趣的同学与我交流! 理学院化学系 从事专业 有机化学. 办公室 逸夫楼 6072 实验室 逸夫楼 6081 毕业院校 南京理工大学 电子邮箱 研 究 方 向 催化不对称合成 杂环骨架构建 卡宾化学 生物活性分子设计

Conditional Probability

2018 Mid-Year Examination - Sec 3 Normal (Academic) Editing Skills, Situational Writing Skills, Continuous Writing Skills

偏微分方程及其应用国际会议在数学科学学院举行

( 选出不同类别的单词 ) ( 照样子完成填空 ) e.g. one three

Effect of lengthening alkyl spacer on hydroformylation performance of tethered phosphine modified Rh/SiO2 catalyst

X 射线和 γ 射线天文观测 王俊贤 中国科技大学天体物理中心

Lecture 13 Metabolic Diversity 微生物代谢的多样性

Geomechanical Issues of CO2 Storage in Deep Saline Aquifers 二氧化碳咸水层封存的力学问题

2NA. MAYFLOWER SECONDARY SCHOOL 2018 SEMESTER ONE EXAMINATION Format Topics Comments. Exam Duration. Number. Conducted during lesson time

Design, Development and Application of Northeast Asia Resources and Environment Scientific Expedition Data Platform

能源化学工程专业培养方案. Undergraduate Program for Specialty in Energy Chemical Engineering 专业负责人 : 何平分管院长 : 廖其龙院学术委员会主任 : 李玉香

Chapter 22 Lecture. Essential University Physics Richard Wolfson 2 nd Edition. Electric Potential 電位 Pearson Education, Inc.

A Tableau Algorithm for the Generic Extension of Description Logic

目錄 Contents. Copyright 2008, FengShui BaZi Centre < 2

Key Topic. Body Composition Analysis (BCA) on lab animals with NMR 采用核磁共振分析实验鼠的体内组分. TD-NMR and Body Composition Analysis for Lab Animals

2004 Journal of Software 软件学报

A STUDY OF FACTORS THAT AFFECT CAPILLARY

Mechatronics Engineering Course Introduction

There are only 92 stable elements in nature

Lecture English Term Definition Chinese: Public

Lecture Note on Linear Algebra 14. Linear Independence, Bases and Coordinates

Service Bulletin-04 真空电容的外形尺寸

Revision Booklet Grade 6 June 2018

Random Variables. Definition: A random variable (r.v.) X on the probability space (Ω, F, P) is a mapping

Enhancement of the activity and durability in CO oxidation over silica supported Au nanoparticle catalyst via CeOx modification

Integrating non-precious-metal cocatalyst Ni3N with g-c3n4 for enhanced photocatalytic H2 production in water under visible-light irradiation

Steering plasmonic hot electrons to realize enhanced full spectrum photocatalytic hydrogen evolution

Rigorous back analysis of shear strength parameters of landslide slip

Silver catalyzed three component reaction of phenyldiazoacetate with arylamine and imine

MASTER S DEGREE THESIS. Electrochemical Stability of Pt-Au alloy Nanoparticles and the Effect of Alloying Element (Au) on the Stability of Pt

tan θ(t) = 5 [3 points] And, we are given that d [1 points] Therefore, the velocity of the plane is dx [4 points] (km/min.) [2 points] (The other way)

Numerical Analysis in Geotechnical Engineering

Chapter 20 Cell Division Summary

XING Sheng-Kai LI Yun ZHAO Xue-Zhuang * CAI Zun-Sheng SHANG Zhen-Feng WANG Gui-Chang *

Chapter7 Image Compression

A new operational medium-range numerical weather forecast system of CHINA. NWPD/NMC/CMA (Beijing,CHINA)

暗物质 II 毕效军 中国科学院高能物理研究所 2017 年理论物理前沿暑期讲习班 暗物质 中微子与粒子物理前沿, 2017/7/25

Fabrication of ultrafine Pd nanoparticles on 3D ordered macroporous TiO2 for enhanced catalytic activity during diesel soot combustion

One step synthesis of graphitic carbon nitride nanosheets for efficient catalysis of phenol removal under visible light

A timed communication behaviour model for distributed systems

MILESTONES Let there be light

Preliminary design on the 5 MeV injector and 500 kv DC gun in IHEP

2019 年中国大学生物理学术竞赛 ( 华北赛区 ) 第一轮通知

國立中正大學八十一學年度應用數學研究所 碩士班研究生招生考試試題

Modeling effects of changes in diffuse radiation on light use efficiency in forest ecosystem. Wei Nan

Random Variables Example:

Sichuan Earthquake 四川地震

Effects of Au nanoparticle size and metal support interaction on plasmon induced photocatalytic water oxidation

INTERNATIONAL EXPERIENCE IN INNOVATIONS IN METRO RAIL. Gerald Ollivier Transport Cluster Leader Singapore, World Bank

Transcription:

Lecture 2 Random variables: discrete and continuous

Random variables: discrete Probability theory is concerned with situations in which the outcomes occur randomly. Generically, such situations are called experiments, and the set of all possible outcomes is the sample space corresponding to an experiment. - Tossing a coin 3 times: - A random variable is a function from sample space to real numbers: - Total # (number) of heads - Total # of tails - # of heads minus # of tails - - Discrete random variables can take only finite or countably infinite # of values - Toss a coil until a head turns up, # of tosses is countably infinite - A countably infinite set: one-to-one correspondence with the integers

- Tossing a coin 3 times: Random variables: discrete X=total # of heads - Probability mass function or frequency function - Cumulative distribution function (cdf, non-decreasing) frequency function cdf

Bernoulli Random Variables An experiment is either a success (1, probability p) or a failure (0): - An alternative representation - Indicator random variable - Examples: - Product quality assessment: pass/fail - A new-born: female/male - Computer commands in binary form: 1/0 - Schrödinger's cats??

The Binomial Distribution Perform n (fixed) success/failure experiments. X=how many successes. Example: 10 coin tosses, how many led to a head? - For a given X=k, any particular sequence of k successes occurs with probability - The total # of such sequences is - Finally n = 10 and p = 0.1 n = 10 and p = 0.5

The Binomial Distribution Example. If a single bit (0 or 1) is transmitted over a noisy communications channel, it has probability p of being incorrectly transmitted. To improve the reliability of the transmission, the bit is transmitted n times, where n is odd. A decoder at the receiving end, called a majority decoder, decides that the correct message is that carried by a majority of the received bits. Each bit is independently subject to being corrupted with the same probability p. The number of bits that is in error, X, is thus a binomial random variable with n trials and probability p of success on each trial (here a success is an error). If n=5, p=0.1, what is the probability that the message is correctly received?

The Binomial Distribution Example: If a single bit (0 or 1) is transmitted over a noisy communications channel, it has probability p of being incorrectly transmitted. Yeah!! = 0, 1 or 2 failures, k failures among n: binomial To improve the reliability of the transmission, the bit is transmitted n times, where n is odd. A decoder at the receiving end, called a majority decoder, decides that the correct message is that carried by a majority of the received bits. Much better reliability. Each bit is independently subject to being corrupted with the same probability p. The number of bits that is in error, X, is thus a binomial random variable with n trials and probability p of success on each trial (here a success is an error). If n=5, p=0.1, what is the probability that the message is correctly received?

The Geometric Distribution In independent Bernoulli trials, keep trying until first success. X=how many trials (including the success). For X=k, k-1 failures followed by a success. Due to independence, The Negative Binomial Distribution Keep trying until succeed r times. X=how many trials (including the success). Generalization of geometric distribution. For X=k, r-1 successes assigned to k-1 trails, before the last success. Negative binomial random variable=sum of independent geometric variables: Example: f f f f f s f f s f f f f f f f s f s (f=failure, s=success)

Example A: The probability of winning in a certain state lottery is p=1/9. The distribution of the number of tickets a person must purchase is a geometric random variable: Example B: He/she wants to win twice! Then the distribution is negative binomial with p=1/9, r=2:

The Hypergeometric Distribution Suppose that an urn contains n balls, of which r are black and n-r are white. Let X denote the number of black balls drawn when taking m balls without replacement. Then X follows the hypergeometric distribution: When n approaches infinity, let p=r/n, this distribution approximates a binomial distribution with p: Take m balls one by one, it may succeed (black) or fail (white).

The Poisson Distribution ( 参见北京大学数学系汪仁官 概率论引论 ) Rutherford 和 Geiger 观察放射性物质发射 α 粒子时发现, 在规定的一段时间内 (7.5 秒 ), 放射的粒子数 X 服从 Poisson 分布 为什么? 把体积 V 的放射性物质设想分割为 n 份相同体积的小块 ΔV=V/n. 假定 : 1. 对于每个小块而言, 在 7.5s 内放出一个粒子的概率 p 正比于 ΔV; 2. 在 7.5s 内每个小块放出 >1 个粒子的概率极低, 可认为是 0; 3. 各个小块是否放出粒子, 是相互独立的 在 7.5s 内, 体积为 V 的放射物质放出 k 个粒子这个事件, 可近似看作 n 个独立小块中, 恰有 k 个小块放出粒子, 其他不放, 所以近似服从二项分布 ( 无限细分, 得到精确表达 ):

The Poisson Distribution 记 np=λ( 反映总体而非小块的放射性 ), Poisson 分布是二项分布当 n 趋于无穷,np=λ 时的极限分布 ; 所以,n 较大,p 较小时, 可用 Poisson 分布来作二项分布的近似计算

The Poisson Distribution The # of independent events that happen in a time interval or space volume. - 电话系统 : 交换机服务大量客户, 客户行为独立, 单位时间内交换机被呼叫的次数 ; - 保险建模 : 例如合肥市包河区群众略诡异的事故 ( 淋浴时摔跤, 被雷击中 ) 稀少且独立发生 ; - 智能交通 : 交通不太拥挤, 单位时间内通过街道 / 路口的车辆数 ; - CCD 接收到的光子数 ( 如 X-ray 天文学 )

The Poisson Distribution The # of independent events that happen in a time interval or space volume. - 电话系统 : 交换机服务大量客户, 客户行为独立, 单位时间内交换机被呼叫的次数 ; - 保险建模 : 例如合肥市包河区群众略诡异的事故 ( 淋浴时摔跤, 被雷击中 ) 稀少且独立发生 ; - 智能交通 : 交通不太拥挤, 单位时间内通过街道 / 路口的车辆数 ; - CCD 接收到的光子数 ( 如 X-ray 天文学 )

Poisson and Binomial Distributions But how large is a large n, how small is a small p? Example: Two dice are rolled 100 times, count the # of double sixes as X. The distribution of X is binomial with n=100, p=1/36 (why?). Let s compare the two distributions (λ=np=2.78). Why should we do this? Poisson distribution is easier to parameterize and calculate.

λ=0.1 As n, p 0 Binomial distribution Poisson distribution λ=1 Binomial: n=10, p=0.1 λ=5 λ=10 As λ=np, Poisson distribution Gaussian / Normal distribution

A Glimpse at Poisson Process Poisson distribution often arises from Poisson process for the distribution of random events in a set S (1-d:time, 2-d: a plane, 3-d: a volume of space). If S 1, S 2,, S n are disjoint subsets of S, then the numbers of events in these subsets, N 1, N 2,, N n, are independent random variables that follow Poisson distributions with parameters λ S 1, λ S 2,, λ S n, where S i is the measure of S i (e.g. length, area, volume). Crucial assumptions: 互不相交子集 1.Events in disjoint subset are independent of each other; 2.Poisson parameter for a subset is proportional to the subset s size. More in stochastic processes.

Random variables: continuous Frequency / probability mass function probability density function (pdf) - What is P (X=c)? Differential form: Discrete cdf: Continuous cdf: Probability that X falls in an interval is:

Quantile ( 分位数 ) and its application The pth quantile x p satisfies cdf 50%-th quantile = median.

Quantile ( 分位数 ) and its application The pth quantile x p satisfies cdf 50%-th quantile = median. Example from my research: [O III] 5007Å line Why use quantiles? Quasar outflow physics hidden in the broad wings Need non-parametric measurement, because emission line has freaky profiles Quantile is sensitive to the flux under the broad wings 10% 50%90% of total flux 10% 50%90% of total flux W80 W80 Liu et al. (2013)

Random variables: continuous Uniform density:

Exponential distribution pdf: λ=0.5, 1, 2 cdf: Often used to model lifetimes or waiting times. Why? Assume: at any time, the probability that an electronic component breaks down is a constant λ. ( memoryless, no aging effects) Then: Prove this in your homework. ( 参见陈希孺 概率论与数理统计 2.1.3 节,2009 年版 52 页 )

Reverse question: Exponential distribution 1) An electronic component s lifetime is an exponential random variable, 2) it has lasted time s, what is the probability that it will last at least more time t? Result is independent of s memoryless, no aging effects (therefore inapplicable on human lifetimes).

Exponential distribution Memoryless character follows directly from Poisson process. An event happens at t 0, events occur in time as a Poisson process (with λ for unit time). T=length of time until next event. During time t, Poisson distribution with parameter λt, Don t be confused with the notations here [[ ]] No events, k=0,, an exponential distribution with λ. The distribution of time until the 3 rd event is similar, and is independent of the length of time between events 1 and 2. The lengths of time between events of a Poisson process are independent, identically distributed, exponential random variables.

The Gamma density Two parameters: α, λ, a flexible class for modeling nonnegative random variables, Gamma function, α=0.5, 1, λ=1 α=5, 10, λ=1 Shape parameter Scale parameter

The Gamma density Example: (Udias & Rice, 1975) The observed times separating a sequence of small earthquakes are fitted to a gamma density, and an exponential density. Why does gamma density works better? Because earthquakes are not memoryless! Exponential model: knowing that an earthquake had not occurred in the last t time tells us nothing about the next s time. Exponential model Gamma model: a large likelihood that the next earthquake will immediately follow any given one, and this likelihood decreases monotonically with time. Gamma model

Gaussian/Normal distribution Why is normal distribution everywhere? Because of the central limit theorem. (details later) Roughly, if a random variable is the sum/average of a large number of independent random variable, it is approximately normally distributed. Standard normal density has cdf has no close form.

Example: Gaussian/Normal distribution

Gaussian/Normal distribution Example: Q: why do you almost always find Gaussian noise on your CCD image?

Functions of a Random Variable Q: We know the pdf of particle velocity, what is the pdf of the kinetic energy? f X, F X f Y, F Y. Suppose that Start with a simpler case. and Y=aX+b, a>0 (a<0 similar).

Functions of a Random Variable Q: We know the pdf of particle velocity, what is the pdf of the kinetic energy? Find the density of X=Z 2, where Z~N (0,1). standard normal distribution cdf to pdf Differentiating (we circumvent cdf because it has no closed form), Finally, Gamma density, with α=λ=1/2. Chi-square density with 1 degree of freedom.

Functions of a Random Variable We always go through the same steps: Find the cdf of Y, diffcrentiate it to find pdf, specify in what region it holds. However, for any specific problem, proceeding from scratch is usually easier.

Generating pseudorandom numbers

Generating pseudorandom numbers To generate random variables with cdf F, just apply F -1 to uniform random numbers.

Generating pseudorandom numbers Example: Generating random variables from an exponential distribution. When simulating large queueing networks to assess the performance, one needs to generate random time intervals between customer arrivals, which might be exponentially distributed. cdf: F -1 : So if U is uniform on [0,1], then is the result. Actually is also uniformly distributed on [0, 1], so finally

Review: Conditional probability & Bayes rule

Laplace s law of succession Suppose that the sun has risen n times in succession; what is the probability that it will rise once more? Pierre-Simon Laplace (1749 1827)

Laplace s law of succession Suppose that the sun has risen n times in succession; what is the probability that it will rise once more? Assume the a priori probability for a sunrise on any day is a constant. Due to our total ignorance it will be assumed to take all possible values in [0, 1] with equal likelihood. This probability is treated as a random variable ξ uniformly distributed over [0, 1]. Thus ξ has the density function f such that f(p) = 1 for 0 p 1. Assume sunrises are independent events, if the true value of ξ is p, then the probability of n successive sunrises is p n.

Laplace s law of succession Suppose that the sun has risen n times in succession; what is the probability that it will rise once more? Law of total probability: Pass from the sum into an integral, Apply it for both n and n+1, taking the ratio,

Problem set #1 1. Do you agree with Laplace s result? No matter your answer is yes or no, elaborate your comments (e.g. meaning, assumptions, derivation, interpretation, Bayesian spirit,, if any). Use your critical thinking! 2. Find the distribution of Y=X-r, where X is a negative binomial random variable. Then write down the Taylor expansion of (1-x) -r. Now, do you understand why this distribution is called negative binomial? 3. Verify that Bayes rule indeed renders 7.5% in the cancer research mentioned in Lecture 1 (see slide pp. 27). 4. Statistics are there for decision against a background. How do you understand the word background? Provide a real example to validate your explanation. 5. Prove that under the memoryless assumption, the lifetime of electronic components follow the exponential distribution. 6. Create an exponential distribution using the F -1 technique. Plot a histogram of the data you generate overlaid with the expected pdf to show you are successful. Do the same thing on another 1-2 of any continuous distributions.