Entropy for complex systems and its use in path-dependent non-ergodic processes
1 Entropy for complex systems and its use in path-dependent non-ergodic processes (Stefan Thurner)
2 With Rudolf Hanel and Bernat Corominas-Murtra. BCM, RH, ST, PNAS 112 (2015)
3 I will not talk about...
4 Remember the Shannon-Khinchin axioms:
SK1: S depends continuously on p → g is continuous
SK2: S is maximal for the equi-distribution p_i = 1/W → g is concave
SK3: S(p_1, p_2, ..., p_W) = S(p_1, p_2, ..., p_W, 0) → g(0) = 0
SK4: S(A+B) = S(A) + S(B|A)
Note: write the entropy as S[p] = Σ_{i=1}^W g(p_i). If SK1-SK4 hold → g(x) = −k x ln x
5 Shannon-Khinchin axiom 4 is nonsense for CS: SK4 corresponds to Markovian and ergodic processes; SK4 is violated for non-ergodic systems → nuke SK4
6 The Complex Systems axioms: SK1 holds, SK2 holds, SK3 holds, S_g = Σ_{i=1}^W g(p_i), W ≫ 1.
Theorem: All systems for which these axioms hold (1) can be uniquely classified by 2 numbers, c and d, and (2) have the entropy
S_{c,d} = e/(1 − c + cd) [ Σ_{i=1}^W Γ(1 + d, 1 − c ln p_i) − c/e ]
(e is Euler's constant, Γ the upper incomplete gamma function)
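A quick numerical sketch (my code, not from the talk): S_{c,d} can be evaluated with SciPy via Γ(a, x) = gammaincc(a, x)·Γ(a); at (c, d) = (1, 1) it reproduces the Shannon entropy up to an additive constant of 1.

```python
import numpy as np
from scipy.special import gamma, gammaincc

def S_cd(p, c, d):
    """Hanel-Thurner entropy S_{c,d} = e/(1-c+cd) * [sum_i Gamma(1+d, 1-c*ln p_i) - c/e].
    Gamma(a, x) is the upper incomplete gamma; gammaincc is its regularized form."""
    p = np.asarray(p, dtype=float)
    upper = gammaincc(1 + d, 1 - c * np.log(p)) * gamma(1 + d)
    return np.e / (1 - c + c * d) * (upper.sum() - c / np.e)

p = np.full(8, 1 / 8)          # uniform distribution, W = 8
print(S_cd(p, c=1, d=1))       # ~ 1 + ln 8
print(1 + np.log(8))           # Shannon entropy plus the additive constant 1
```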
7 Classification of entropies: order in the zoo
- S_BG = Σ_i p_i ln(1/p_i): (c, d) = (1, 1)
- S_{q<1} = (1 − Σ_i p_i^q)/(q − 1), q < 1: (c, d) = (q, 0)
- S_κ = −Σ_i p_i (p_i^κ − p_i^{−κ})/(2κ), 0 < κ ≤ 1: (c, d) = (1 − κ, 0)
- S_{q>1} = (1 − Σ_i p_i^q)/(q − 1), q > 1: (c, d) = (1, 0)
- S_b = Σ_i (1 − e^{−b p_i}) + e^{−b} − 1, b > 0: (c, d) = (1, 0)
- S_E = Σ_i p_i (1 − e^{(p_i − 1)/p_i}): (c, d) = (1, 0)
- S_η = Σ_i [Γ((η+1)/η, −ln p_i) − p_i Γ((η+1)/η)], η > 0: (c, d) = (1, 1/η)
- S_γ = Σ_i p_i ln^{1/γ}(1/p_i): (c, d) = (1, 1/γ)
- S_β = Σ_i p_i^β ln(1/p_i): (c, d) = (β, 1)
- S_{c,d} = Σ_i e r Γ(d + 1, 1 − c ln p_i) − c r: (c, d)
8 Distribution functions of CS, if used in the max-ent principle:
- p^{(1,1)}: exponentials (Boltzmann distribution), p ∝ e^{−ax}
- p^{(q,0)}: power laws (q-exponentials), p ∝ (a + x)^{−b}
- p^{(1,d>0)}: stretched exponentials, p ∝ e^{−a x^b}
- p^{(c,d)}: all others, Lambert-W exponentials, p ∝ e^{−a W(x^b)}
NO OTHER POSSIBILITIES if SK4 is violated
9 [Figure: distribution functions p(x), q-exponentials (power laws) vs Lambert-W exponentials; panels (b) d = 0.025, r = 0.9/(1 − c) and (c) r = exp(−d/2)/(1 − c), curves for several values of c and (c, d).]
10 [Figure: the (c, d) phase diagram of admissible entropies. (1, 1): BG entropy; (c, 0): q-entropy with 0 < q < 1; (c, d) with d > 0: Lambert-W_0 exponentials; (c, d) with d < 0: Lambert-W_{−1} exponentials with compact support of the distribution function; stretched exponentials are asymptotically stable; the boundary regions, including (1, 0) and (0, 0), violate K2 and K3.]
11 I will talk about... the origin of power laws
12 Power laws are pests
13 They are everywhere, it's hard to control them, and you never get rid of them.
14 Classical routes to scaling are limited: statistical mechanics at phase transitions; self-organised criticality; multiplicative processes with constraints; preferential processes; generalized entropies.
15 Routes to scaling I: phase transitions. In statistical physics, power laws emerge right at the phase transition; this happens at the critical point. Power laws appear in various quantities: critical exponents. Various materials have the same critical exponents and behave identically: universality.
16 Routes to scaling II: self-organised criticality. Systems find critical points by themselves, without tuning; since they sit at the critical point: power laws. Examples: sandpiles, cascading failures, earthquakes, systemic risk (Wiesenfeld).
17 Routes to scaling III: multiplicative processes with constraints. Gaussian distribution: add random numbers from the same source (CLT). Power laws: multiply random numbers and impose some constraint, for example that the minimum cannot fall below X. Examples: wealth distribution, city sizes, double Pareto, language, ...
18 Routes to scaling IV: preferential processes. A process repeats in proportion to how many times it occurred before; the linking probability of a new node to the network is proportional to the degree of the existing nodes. Examples: network growth, language formation.
19 Routes to scaling V: other mechanisms. Lévy-stable processes, constraint optimisation, extreme value statistics, return times.
20 A few examples
21 City size (MEJ Newman 2005): multiplicative
22 Rainfall: SOC
23 Landslides: SOC
24 Hurricane damages: secondary (multiplicative)???
25 Financial interbank loans: multiplicative / preferential
26 Forest fires in various regions: SOC?
27 Moon crater diameters (MEJ Newman 2005): ???
28 Gamma rays from solar wind (MEJ Newman 2005)
29 Movie sales: SOC
30 Healthcare costs: multiplicative???
31 Words in books (MEJ Newman 2005): preferential / random / optimization
32 Citations of scientific articles (MEJ Newman 2005): preferential
33 Website hits (MEJ Newman 2005): preferential
34 Book sales (MEJ Newman 2005): preferential
35 Telephone calls (MEJ Newman 2005): preferential
36 Earthquake magnitude (MEJ Newman 2005): SOC
37 Seismic events: SOC
38 War intensity (MEJ Newman 2005): ???
39 Killings in wars: ???
40 Size of wars: ???
41 Wealth distribution (MEJ Newman 2005): multiplicative
42 Family names (MEJ Newman 2005): ???
43 More power laws... networks: literally thousands of scale-free networks; allometric scaling in biology; terrorist attacks; particle physics; dynamics in cities; fragmentation processes; random walks; crackling noise; growth with random times of observation; blackouts; the fossil record; bird sightings; fluvial discharge; contact processes; anomalous diffusion; ...
44 One phenomenon, many explanations: the Zipf law in word frequencies. Simon: preferential attachment. Mandelbrot: constraint optimization. Miller: monkeys produce random texts. (Solé: information-theoretic, sender and receiver.)
45 Did we miss something?
46 Many processes are history-dependent. History-dependence: future actions depend on the history of past actions. Possible outcomes: Ω = {1, 2, 3, ..., N}. Probability for the next outcome: p(x = i, t) = f(histogram(ω) up to time t). Often past actions constrain the possibilities for future ones: the sample space of these processes reduces as they unfold.
47 Example: history-dependent processes. [Figure: six panels 1)-6) illustrating a sample-space-reducing process.]
48 The sample space reduces and has a nested structure: Ω_1 ⊂ Ω_2 ⊂ ... ⊂ Ω_N ⊂ Ω
49 [Figure: staircase picture of the SSR process and the visiting distributions p(i), p(n) over sites i.]
50 This gives exact power laws! The probability to visit site i is p(i) = 1/i
51 Proof by induction. Let N = 2. There are two sequences φ: either φ directly generates a 1 with p = 1/2, or it first generates a 2 with p = 1/2 and then a 1 with certainty. Both sequences visit 1, but only one visits 2. As a consequence, P_2(2) = 1/2 and P_2(1) = 1. Now suppose P_{N−1}(i) = 1/i holds. The process starts with die N, and the probability to hit i in the first step is 1/N; likewise any other j with N ≥ j > i is reached with probability 1/N. If we get j > i, we reach i in a later step with probability P_{j−1}(i), which leads to the recursive scheme, for i < N:
P_N(i) = (1/N) (1 + Σ_{i<j≤N} P_{j−1}(i)).
Since by assumption P_{j−1}(i) = 1/i holds for i < j ≤ N, some algebra yields P_N(i) = 1/i.
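The result is easy to check by simulation; the following sketch (my own code, illustrative parameters) estimates the visit probabilities P_N(i) and compares them to 1/i.

```python
import numpy as np

rng = np.random.default_rng(0)

def ssr_visits(N, rng):
    """One SSR run: throw the N-faced die, then a (j-1)-faced die, ... until 1."""
    visits = []
    i = int(rng.integers(1, N + 1))   # first throw: uniform on {1, ..., N}
    visits.append(i)
    while i > 1:
        i = int(rng.integers(1, i))   # next die: uniform on {1, ..., i-1}
        visits.append(i)
    return visits

N, runs = 1000, 50_000
counts = np.zeros(N + 1)
for _ in range(runs):
    for i in ssr_visits(N, rng):
        counts[i] += 1

p_visit = counts / runs               # empirical P_N(i)
for i in (1, 2, 10, 100):
    print(i, round(p_visit[i], 3), 1 / i)
```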
52 The role of noise. φ: sample-space-reducing process (SSR); φ_R: random walk. Mix both processes: Φ^{(λ)} = λφ + (1 − λ)φ_R, λ ∈ [0, 1]. Adding noise of strength (1 − λ) makes any power possible: p(i) ∝ i^{−λ}. The noise level (1 − λ) is a surprise factor for the SSR process.
53 The role of noise: the result is exact, too. Clearly p^{(λ)}(i) = Σ_{j=1}^N P(i|j) p^{(λ)}(j) holds, with
P(i|j) = λ/(j − 1) + (1 − λ)/N for i < j (SSR), (1 − λ)/N for i ≥ j > 1 (RW), and 1/N for j = 1 (restart).
We get p^{(λ)}(i) = (1 − λ)/N + (λ/N) p^{(λ)}(1) + Σ_{j=i+1}^N [λ/(j − 1)] p^{(λ)}(j),
leading to the recursive relation p^{(λ)}(i + 1) − p^{(λ)}(i) = −(λ/i) p^{(λ)}(i + 1). Therefore
p^{(λ)}(i)/p^{(λ)}(1) = Π_{j=1}^{i−1} (1 + λ/j)^{−1} = exp[ −Σ_{j=1}^{i−1} log(1 + λ/j) ] ≈ exp( −λ Σ_{j=1}^{i−1} 1/j ) ≈ exp(−λ log i) = i^{−λ}
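The mixed process Φ^{(λ)} can be simulated the same way (again my sketch with illustrative parameters): with probability λ jump uniformly below the current state, otherwise jump uniformly anywhere; state 1 triggers a uniform restart. The visit frequencies approach i^{−λ}.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_ssr(N, lam, steps, rng):
    """Stationary state-visit frequencies of the mixed process Phi^(lambda)."""
    counts = np.zeros(N + 1)
    i = N
    for _ in range(steps):
        if i == 1 or rng.random() >= lam:
            i = int(rng.integers(1, N + 1))   # random-walk move / restart
        else:
            i = int(rng.integers(1, i))       # SSR move: uniform below i
        counts[i] += 1
    return counts / steps

N, lam = 1000, 0.5
p = noisy_ssr(N, lam, steps=1_000_000, rng=rng)
print(p[1] / p[100], 100 ** lam)   # ratio approximately (100/1)^lambda
```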
54 History-dependent processes with noise. [Figure: (a) rank distribution of the number of site visits for λ = 0.0, 0.5, 1.0, simulation vs prediction; (b) distance to the limiting distribution vs the number of jumps T.] How fast does the power law converge to its limiting distribution? Answer: with the same convergence speed as the central limit theorem for iid processes (Berry-Esseen theorem).
55 The SSR-based Zipf law is extremely robust. [Figure: panels (a)-(c): staircase pictures, the visiting distributions p(i) and p(n), and a continuous SSR cascade x → x′ → x″ on (0, 1], respectively on (0, N].]
56 The Zipf law is remarkably robust: it survives even for accelerated SSR processes.
57 The SSR-based Zipf law is extremely robust:
- if priors are power laws q(x) ∝ x^α with α > −1 → ZIPF
- if priors are polynomials → ZIPF
- if priors are exponentials q(x) ∝ e^{βx} with β > 0 → UNIFORM
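This robustness can be checked numerically; the sketch below (my code, with an illustrative prior exponent α = 2) runs SSR with jump weights given by a power-law prior and recovers p(i) ≈ 1/i.

```python
import numpy as np

rng = np.random.default_rng(2)

def ssr_with_prior(N, q, runs, rng):
    """SSR where a jump from state i lands on j < i with probability q_j / sum_{k<i} q_k."""
    counts = np.zeros(N + 1)
    for _ in range(runs):
        i = N
        while i > 1:
            w = q[1:i]                                      # weights of states 1..i-1
            i = int(rng.choice(np.arange(1, i), p=w / w.sum()))
            counts[i] += 1
    return counts / runs

N = 200
q = np.arange(N + 1, dtype=float) ** 2.0    # power-law prior, alpha = 2 > -1
p = ssr_with_prior(N, q, runs=20_000, rng=rng)
print(p[1], p[2], p[10])                    # ~ 1, 1/2, 1/10: Zipf
```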
58 Conjecture: SSR has 2 attractors for limit distributions, Zipf and uniform. Any others? With noise, Zipf turns into a power law p(i) ∝ i^{−λ}.
59 Language and word frequencies: the rank-ordered distribution of word frequencies follows an approximate power law f(r) ∝ r^{−α} with α ≈ 1. A closer look: α varies book by book. The quest for understanding spans almost half a century: Zipf, Simon, Mandelbrot, Miller, Solé, ... and none of them can explain the variation in α.
60 The origin of species
61 How is a sentence formed? It's history-dependent! Start with any word in the dictionary; the second word is constrained by grammar; the second word is constrained by context. To convey meaning, a sentence must reduce its sample space: come to the point! If a sentence is not SSR, it sounds crazy!
62 [Figure: panels (a)-(d): a toy vocabulary (wolf, howls, bites, runs, eats, moon, night, ...) in which each word used successively shrinks the set of grammatically and contextually admissible next words.]
63 Which word follows which? [Figure: (a) word-transition matrix: an entry marks that a given word follows word i somewhere in the text; (b) the sample-space volume |Ω_i| of words that can follow word i.]
64 Model for sentence formation: random books. Given: the word-transition matrix M of a real book, a vocabulary of N words, and sentence length L:
- pick one of the N words, say i; write i into the word list W = {i}
- jump to line i of M and pick a word from Ω_i, say k → W = {i, k}
- jump to line k and pick a word from Ω_k, say j → W = {i, k, j}
- repeat L times → a random sentence is formed
- repeat the process to produce N_sent sentences
The random book has the same word-transition matrix as the actual book.
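A sketch of this generator (my code; the random toy matrix stands in for a transition matrix M extracted from a real book):

```python
import numpy as np

rng = np.random.default_rng(3)

def random_book(M, n_sentences, L, rng):
    """Generate word indices with the same transition structure as M:
    M[i, j] = True iff word j may follow word i (Omega_i = row i of M)."""
    N = M.shape[0]
    words = []
    for _ in range(n_sentences):
        i = int(rng.integers(N))              # first word: uniform over the vocabulary
        words.append(i)
        for _ in range(L - 1):
            followers = np.flatnonzero(M[i])  # the sample space Omega_i
            if followers.size == 0:           # dead end: sentence stops early
                break
            i = int(rng.choice(followers))
            words.append(i)
    return np.array(words)

M = rng.random((500, 500)) < 0.05             # toy M; a real one comes from a book
book = random_book(M, n_sentences=2000, L=10, rng=rng)
freq = np.sort(np.bincount(book))[::-1]       # rank-frequency list, used to fit alpha
```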
65 Zipf exponent for random books. [Figure: model exponent α_model vs measured exponent α for ten books: Hamlet, Henry V, Romeo and Juliet, Different Forms of Flowers, Origin of Species, Descent of Man, A Tale of Two Cities, David Copperfield, An American Tragedy, Ulysses.]
66 Surprise factor: how nested is a book? Strict nestedness is clearly unrealistic; to what extent are there surprises? We want a number that quantifies to what extent sample-space reduction is present in a text: the nestedness
n(M) = ⟨ |Ω_i ∩ Ω_j| / min(|Ω_i|, |Ω_j|) ⟩_{(i,j)}
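A direct implementation of this measure (my sketch; Ω_i is read off row i of the transition matrix M):

```python
import numpy as np

def nestedness(M):
    """n(M): average pairwise overlap |Omega_i & Omega_j| / min(|Omega_i|, |Omega_j|)."""
    sets = [set(np.flatnonzero(row)) for row in M]
    vals = [
        len(sets[a] & sets[b]) / min(len(sets[a]), len(sets[b]))
        for a in range(len(sets))
        for b in range(a + 1, len(sets))
        if sets[a] and sets[b]          # skip words with empty follower sets
    ]
    return float(np.mean(vals))
```

Applied to the word-transition matrices of real books, this is the quantity plotted against α on the next slide.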
67 The surprise factor explains the Zipf exponent of individual books. [Figure: nestedness n(M) vs Zipf exponent α for Hamlet, Henry V, Romeo and Juliet, Different Forms of Flowers, Origin of Species, Descent of Man, A Tale of Two Cities, David Copperfield, An American Tragedy, Ulysses.]
68 Zipf exponent and sample space. [Figure: (c) model exponent α_model vs nestedness κ for vocabulary sizes N = 1000, 10000, ...]
69 SSR random walks on networks: imagine a random network (directed, ordered, random); pick a start node and an end node; perform a random walk from the start node to the end node. Prediction: the visiting frequency of nodes follows a Zipf law.
70 A random walk on a directed, ordered network is SSR. [Figure: (a) a small fully connected, ordered network with jump probabilities 1/5, 1/4, 1/3, 1/2, an exit probability p_exit, and a Stop node; (b) node occupation probabilities (1/2, 1/4, 1/8, ... in the acyclic case, p_exit = 0.3) vs node rank, and path probability vs path rank. Note: fully connected.]
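A sketch of the prediction on a sparse random DAG (my construction, not the talk's exact topology: node i links to each lower-indexed node independently with probability p_edge):

```python
import numpy as np

rng = np.random.default_rng(4)

def dag_walk_visits(N, p_edge, walks, rng):
    """Random walks on a random directed acyclic graph, from node N toward node 1."""
    # successors of node i are a random subset of {1, ..., i-1}
    succ = {i: np.flatnonzero(rng.random(i - 1) < p_edge) + 1 for i in range(2, N + 1)}
    counts = np.zeros(N + 1)
    for _ in range(walks):
        i = N
        while i > 1 and succ[i].size > 0:   # stop at node 1 or at a dead end
            i = int(rng.choice(succ[i]))
            counts[i] += 1
    return counts / walks

p = dag_walk_visits(N=500, p_edge=0.1, walks=20_000, rng=rng)
print(p[1], p[2], p[10])    # approximately Zipf: p(i) ~ 1/i
```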
71 Random walks on not fully connected ER graphs are SSR, too.
72 Random walks on various network topologies. [Figure]
73 SSR processes and fragmentation: take a stick of initial length L (a spaghetti) and mark it at some point. Break the stick at a random position; keep the fragment with the mark and record its length. Break this fragment again at a random position, keep the fragment with the mark, and record its length. Repeat until the marked fragment reaches a minimal length, say the diameter of a spaghetti atom. Note: the sizes of all fragments are more complicated than SSR (Krapivsky 1994).
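A sketch of the marked-fragment cascade (my code; the minimal length lmin stands in for the spaghetti-atom diameter):

```python
import numpy as np

rng = np.random.default_rng(5)

def marked_fragment_lengths(L0, lmin, rng):
    """Repeatedly break the current fragment at a uniform point, keep the
    piece that contains the mark, and record the kept lengths until < lmin."""
    left, length = 0.0, L0
    mark = rng.random() * L0                  # position of the mark on [0, L0]
    lengths = []
    while length > lmin:
        cut = left + rng.random() * length    # uniform cut inside the fragment
        if cut > mark:
            length = cut - left               # mark is in the left piece
        else:
            left, length = cut, left + length - cut   # mark is in the right piece
        lengths.append(length)
    return lengths

# pool many cascades; the recorded lengths realize an SSR process on (0, L0]
all_lengths = [x for _ in range(10_000) for x in marked_fragment_lengths(1.0, 1e-6, rng)]
```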
74 What is the entropy of an SSR process? Entropy is used to talk about three very different things:
(1) Entropy has to do with quantifying the information production of a source
(2) Entropy has to do with thermodynamics: for example dU = T dS − p dV
(3) Entropy has to do with statistical inference: maximal for a system in equilibrium
For ergodic systems these topics are related. In all cases entropy looks something like S = −k Σ_{i=1}^W p_i log p_i. This is not necessarily true for complex, non-ergodic systems!
75 Information production rate of SSR: S_IP = (1/2) log N + 1
76 Phase-space growth of SSR: S_thermo = S_{c,d} with (c, d) = (1 − 1/8, 0)
77 Max-ent of SSR:
S_maxent = −Σ_{i=2}^W [ p_i log(p_i/p_1) + (p_1 − p_i) log(1 − p_i/p_1) ]
78 Conclusions: strict SSR processes produce exact Zipf laws; the noise contribution determines the power exponent; applications in language formation, diffusion on networks, fragmentation processes, human behaviour?; SSR processes are a new and independent route to understanding scaling; SSR processes are non-ergodic and have 3 entropies.
79 The MEP works for Pólya urns
80 Pólya urns: a paradigmatic class of stochastic processes with path dependence; self-reinforcement breaks the multinomial structure of random processes.
81 Pólya urns generalize hypergeometric models (sampling without replacement) and Bernoulli models (sampling with replacement); they are self-reinforcing: the rich get richer, or the winner takes all.
82 Pólya urns are related to... the beta-binomial distribution; Dirichlet processes; the Chinese restaurant problem; models in population genetics; generalizations of the central limit theorem; applications in adaptive clinical trials; modeling tissue growth; path-dependent institutional development; computer data structures; reform resistance of sectors of EU politics; aging of alleles and Ewens's sampling formula; image segmentation and labeling; evolutionary dynamics; ...
83 Pólya urn processes: initial condition: a_i balls of color i = 1, ..., W; initial prior probabilities to draw a ball: q_i = a_i/A_0; total number of balls initially in the urn: A_0 = Σ_i a_i. Drawing without replacement → hypergeometric process; drawing with replacement → multinomial process; replace a drawn ball of color i by δ balls of the same color → Pólya urn. Priors... θ = (q_1, ..., q_W; A_0, δ, N); γ = δ/A_0
84 Pólya urn processes: let δ > 0. After N trials, balls of type i: a_i(x) = a_i + δ k_i; total number A(N) = Σ_i a_i(x) = A_0 + δN. Conditional probability of drawing a ball of type i at step N + 1:
p(i|k, θ) = a_i(x)/A(N) = (a_i + δ k_i)/(A_0 + δN).
Start with the empty sequence x(0) = []; compute the probability of a sequence and then the probability of a histogram.
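A minimal simulation of the urn (my sketch, illustrative parameters), following the update a_i → a_i + δ after each draw of colour i:

```python
import numpy as np

rng = np.random.default_rng(6)

def polya_histogram(a0, delta, N, rng):
    """Draw colour i with probability a_i/A, then add delta balls of colour i;
    return the histogram k = (k_1, ..., k_W) after N draws."""
    a = np.asarray(a0, dtype=float).copy()
    k = np.zeros(len(a), dtype=int)
    for _ in range(N):
        i = int(rng.choice(len(a), p=a / a.sum()))
        k[i] += 1
        a[i] += delta                 # self-reinforcement
    return k

W = 10
print(polya_histogram(np.ones(W), delta=2, N=10_000, rng=rng))
# typically far from uniform: a few colours dominate (rich-get-richer)
```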
85 ... some work later: the probability to observe histogram k after N trials is
P(k|θ) = (N choose k) Π_{i=1}^W a_i^{(δ,k_i)} / A_0^{(δ,N)}, with m^{(δ,r)} ≡ m(m + δ)(m + 2δ) ··· (m + (r − 1)δ).
In the limit N → ∞ we get
log P(k|θ) ≃ Σ_{i=1}^W log p_i + (1/γ) Σ_{i=1}^W q_i log p_i
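For numerical work the rising factorial can be rewritten as m^{(δ,r)} = δ^r Γ(m/δ + r)/Γ(m/δ), valid for δ > 0; this identity-based sketch of the exact likelihood is mine, not from the slides:

```python
import numpy as np
from scipy.special import gammaln

def log_polya_likelihood(k, a, delta):
    """log P(k|theta) = log[ multinom(N; k) * prod_i a_i^(delta,k_i) / A_0^(delta,N) ]."""
    k = np.asarray(k, dtype=float)
    a = np.asarray(a, dtype=float)
    N, A0 = k.sum(), a.sum()

    def log_rising(m, r):   # log of m (m+delta) ... (m+(r-1)delta)
        return r * np.log(delta) + gammaln(m / delta + r) - gammaln(m / delta)

    log_multinomial = gammaln(N + 1) - gammaln(k + 1).sum()
    return log_multinomial + sum(log_rising(m, r) for m, r in zip(a, k)) - log_rising(A0, N)

# tiny hand-checkable case: W = 2, a = (1, 1), delta = 1, k = (3, 1) gives P = 0.2
print(log_polya_likelihood(k=[3, 1], a=[1.0, 1.0], delta=1.0), np.log(0.2))
```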
86 Pólya violates practically everything:
S_Polya(p) = −Σ_{i=1}^W log p_i is of class (c, d) = (0, 1)
c = 0 violates SK4; c = 0 violates SK3
S_Polya(p) violates SK2: p_i = 1/W yields a minimum! (−log x is convex, so the uniform distribution minimizes S_Polya)
OK with intuition: self-reinforcing mechanisms drive the system away from the uniform distribution into rich-get-richer scenarios.
87 The maximum entropy principle still works! Violation of axioms SK2, SK3, SK4 does not invalidate the principle. Minimizing the Pólya divergence under the usual normalization constraint means
ψ = S_Polya(p) − α Σ_{i=1}^W p_i − β Σ_{i=1}^W q_i log p_i,
with the negative inverse temperature β = −1/γ < 0. Solving the MEP leads either to the solution p_i = (1/ζ)(q_i − γ) for 0 < p_i < 1 or, if this cannot be satisfied, to p_i = 0.
88 Note! For γ < min(q), the MEP solution p_i = (q_i − γ)/ζ holds for all i. If max(q) > γ > min(q), it holds for those i with q_i > γ; for the others p_i = 0. If γ > max(q), then for one i, p_i = 1 while all others p_j = 0.
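These three cases fall out of a small solver (my sketch; ζ follows from normalization over the active states):

```python
import numpy as np

def polya_mep(q, gamma):
    """MEP solution: p_i = (q_i - gamma)/zeta where positive, else p_i = 0;
    if gamma > max(q), all probability concentrates on the largest prior."""
    q = np.asarray(q, dtype=float)
    p = np.clip(q - gamma, 0.0, None)
    if p.sum() == 0.0:                # gamma above max(q): winner takes all
        p[np.argmax(q)] = 1.0
        return p
    return p / p.sum()                # zeta = sum of the positive parts

q = np.array([0.5, 0.3, 0.2])
print(polya_mep(q, gamma=0.1))        # all states active
print(polya_mep(q, gamma=0.25))       # the state with q_i = 0.2 is frozen out
print(polya_mep(q, gamma=0.6))        # winner takes all
```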
89 Frequency distributions of Pólya processes: assume q_i = 1/W, so the MEP prediction is uniform. How is p_i distributed around 1/W? Look at the frequency and rank distributions of Pólya processes. New variables: n_z(k) = Σ_{i=1}^W χ(k_i = z), where χ is the characteristic function (1 if the argument is true, 0 if false); n_z(k) is the number of states that occur with frequency z (π_z = n_z/W).
90 Frequency distributions of Pólya processes: the probability of observing some n = (n_1, ..., n_N) is P(n|θ) = Σ_{k: n = n(k)} P(k|θ). Maximizing S_Polya(π|θ) gives the frequency distribution for uniform priors:
π_z = n_z/W = (1/ζ) φ^z (z + 1/(γW))^{−(1/(Wγ) + 1)}   (1)
where φ = exp(−β) and ζ = exp(1 + α) N^{1/(Wγ) − 1}.
Rank distribution: for r = 1, ..., W define intervals [t_{r+1}, t_r] with 1/W = ∫_{t_{r+1}}^{t_r} dz π_z; then f(r) = (W/N) ∫_{t_{r+1}}^{t_r} dz π_z z.
91 Frequency distribution of a Pólya urn process: W = 100, δ = 2, uniform initial conditions a_i = 1, N = 10^5 trials. [Figure: simulated frequency distribution vs the MEP prediction.] Note: δ = 1 recovers the Poissonian distribution of the multinomial process.
92 Summary Pólya MEP: the entropy is of the (0, d)-class; it violates SK2, SK3, SK4, but the MEP still works; no power laws, but something really different.
93 Conclusions I: Complex systems are non-ergodic by nature. Interpret CS as those where Shannon-Khinchin axioms 1-3 hold. All statistical systems are uniquely classified by 2 scaling exponents (c, d). A single entropy covers all systems: S_{c,d} = r e Σ_i Γ(1 + d, 1 − c ln p_i) − rc. All known entropies of SK1-SK3 systems are special cases. The distribution functions of all systems are Lambert-W exponentials; there are no other options. Phase-space growth determines the entropy. A maximum entropy principle exists for path-dependent processes.
94 Conclusions II: The information-theoretic approach leads to (c, d)-entropies. If (c, d)-entropies are thermodynamically reasonable, there is a deep connection to phase-space volume (it tells you which systems have which c and d). The extensive and the max-ent entropy are both (c, d)-entropies, but NOT the same. SSR is Markovian yet non-ergodic → exact power laws, whose slope is determined by the noise. SSR is a new route to scaling → huge applicability. Pólya urns are toy systems for self-reinforcing dynamics and history dependence. Pólya urns violate SK2, SK3, SK4, but the MEP holds → no power laws, no stretched exponentials, something very different.
95 A note on Rényi entropy: it is not sooo relevant for CS. Why? Relax Khinchin axiom 4: S(A+B) = S(A) + S(B|A) → S(A+B) = S(A) + S(B). The Rényi entropy S_R = 1/(1 − α) ln Σ_i p_i^α violates our trace form S = Σ_i g(p_i). But: our above argument also holds for Rényi-type entropies S = G(Σ_{i=1}^W g(p_i))! The limit lim_{W→∞} S(λW)/S(W) is then governed by f_G(z) = lim_{R→∞} G(z G^{−1}(R))/R, which for G = ln equals 1.
96 The Lambert-W: a reminder. W solves x = W(x) e^{W(x)}, i.e. it is the inverse of x e^x; correspondingly, the inverse of p ln p is x/W(x) = e^{W(x)}. Delayed differential equations: ẋ(t) = α x(t − τ) is solved by x(t) = e^{(1/τ) W(ατ) t}. Ansatz: x(t) = x_0 exp[(1/τ) f(ατ) t], with f some function.
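Both identities are quick to verify with scipy.special.lambertw (a check I added, not part of the slides):

```python
import numpy as np
from scipy.special import lambertw

x = 2.5
w = float(np.real(lambertw(x)))
print(w * np.exp(w))                        # 2.5: W inverts x * exp(x)

# delay equation x'(t) = alpha * x(t - tau): growth rate is W(alpha*tau)/tau
alpha, tau = 0.8, 1.0
lam = float(np.real(lambertw(alpha * tau))) / tau
print(lam - alpha * np.exp(-lam * tau))     # ~ 0: lam solves lam = alpha * e^(-lam*tau)
```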
97 Amount of information production in a process: consider a Markov chain with states A_1, A_2, ..., A_n and transition probabilities p_ik; the probability of being in state l is P_l = Σ_k P_k p_kl. If the system is in state A_i we have the scheme (A_1, A_2, ..., A_n; p_i1, p_i2, ..., p_in). Then H_i = −Σ_k p_ik log p_ik is the information obtained when the Markov chain moves from A_i one step to the next. Averaging this over all initial states, H = Σ_i P_i H_i = −Σ_i Σ_k P_i p_ik log p_ik; this is the information production as the Markov chain moves one step ahead.
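A compact implementation of this average (my sketch; the stationary distribution P is computed as the leading left eigenvector of the transition matrix):

```python
import numpy as np

def information_production(P):
    """H = sum_i P_i H_i with H_i = -sum_k p_ik log p_ik, P the stationary distribution."""
    vals, vecs = np.linalg.eig(P.T)
    stat = np.real(vecs[:, np.argmax(np.real(vals))])   # left eigenvector, eigenvalue 1
    stat = stat / stat.sum()
    logP = np.where(P > 0, np.log(np.where(P > 0, P, 1.0)), 0.0)  # 0*log(0) -> 0
    H_rows = -(P * logP).sum(axis=1)                    # H_i for each state
    return float(stat @ H_rows)

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
print(information_production(P))   # nats per step
```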