Nearly Maximally Predictive Features and Their Dimensions


Nearly Maximally Predictive Features and Their Dimensions

Sarah E. Marzen
James P. Crutchfield

SFI WORKING PAPER

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

SANTA FE INSTITUTE

Santa Fe Institute Working Paper 17-02-XXX
arxiv.org:1702.xxxx [physics.gen-ph]

Nearly Maximally Predictive Features and Their Dimensions

Sarah E. Marzen^{1,2,*} and James P. Crutchfield^{3,†}
^1 Physics of Living Systems Group, Department of Physics, Massachusetts Institute of Technology, Cambridge, MA 02139
^2 Department of Physics, University of California at Berkeley, Berkeley, CA
^3 Complexity Sciences Center, Department of Physics, University of California at Davis, One Shields Avenue, Davis, CA
(Dated: February 7, 2017)
* semarzen@mit.edu
† chaos@ucdavis.edu

Scientific explanation often requires inferring maximally predictive features from a given data set. Unfortunately, the collection of minimal maximally predictive features for most stochastic processes is uncountably infinite. In such cases, one compromises and instead seeks nearly maximally predictive features. Here, we derive upper bounds on the rates at which the number and the coding cost of nearly maximally predictive features scale with desired predictive power. The rates are determined by the fractal dimensions of a process's mixed-state distribution. These results, in turn, show how widely used finite-order Markov models can fail as predictors and that mixed-state predictive features offer a substantial improvement.

PACS numbers: 02.50.-r 89.70.+c 05.45.Tp 02.50.Ga
Keywords: hidden Markov models, entropy rate, dimension, resource-prediction trade-off, mixed-state simplex

Often, we wish to find a minimal maximally predictive model consistent with available data. Perhaps we are designing interactive agents that reap greater rewards by developing a predictive model of their environment [1-6] or, perhaps, we wish to build a predictive model of experimental data because we believe that the resultant model gives insight into the underlying mechanisms of the system [7, 8]. Either way, we are almost always faced with constraints that force us to efficiently compress our data [9].

Ideally, we would compress information about the past without sacrificing any predictive power. For stochastic processes generated by finite unifilar hidden Markov models (HMMs), one need only store a finite number of predictive features. The minimal such features are called causal states, their coding cost is the statistical complexity $C_\mu$ [10], and the implied unifilar HMM is the $\epsilon$-machine [10, 11]. However, most processes require an infinite number of causal states [7] and so cannot be described by finite unifilar HMMs. In these cases, we can only attain some maximal level of predictive power given constraints on the number of predictive features or their coding cost. Equivalently, from finite data we can only infer a finite predictive model. Thus, we need to know how our predictive power grows with available resources.

Recent work elucidated the tradeoffs between resource constraints and predictive power for stochastic processes generated by countable unifilar HMMs or, equivalently, described by a finite or countably infinite number of causal states [12-14]. Few, though, studied this tradeoff or provided bounds thereof more generally. Here, we place new bounds on resource-prediction tradeoffs in the limit of nearly maximal predictive power for processes with either a countable or an uncountable infinity of causal states by coarse-graining the mixed-state simplex [15]. These bounds give a novel operational interpretation to the fractal dimension of the mixed-state simplex and suggest new routes towards quantifying the memory stored in a stochastic process when, as is typical, the statistical complexity diverges.
Background. We consider a discrete-time, discrete-state stochastic process $\mathcal{P}$ generated by an HMM $\mathcal{G}$, which comes equipped with underlying states $g$ and labeled transition matrices $T^{(x)}_{g,g'} = \Pr(\mathcal{G}_{t+1} = g', X_{t+1} = x \mid \mathcal{G}_t = g)$ [16]. There is an infinite number of alternate HMMs that generate $\mathcal{P}$ [17-20], so we specify here that $\mathcal{G}$ is the minimal generative model, that is, the generative model with the minimal number of hidden states consistent with the observed process [21].

For reasons that become clear shortly, we are interested in the block entropy $H(L) = H[X_{0:L}]$, where $X_{a:b} = X_a, X_{a+1}, \ldots, X_{b-1}$ is a contiguous block of random variables generated by $\mathcal{G}$. In particular, its growth rate, the entropy rate $h_\mu = \lim_{L\to\infty} H(L)/L$, quantifies a process's intrinsic randomness, while the excess entropy $\mathbf{E} = \lim_{L\to\infty} (H(L) - h_\mu L)$ quantifies how much is predictable: how much future information can be predicted from the past [22]. Finite-length entropy-rate estimates $h_\mu(L) = H[X_0 \mid X_{-L:0}]$ provide increasingly better approximations to the true entropy rate $h_\mu$ as $L$ grows large.
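To make these quantities concrete, the following minimal Python sketch (ours, not part of the paper) forms plug-in estimates of $H(L)$, $h_\mu(L)$, and $E(L)$ from a single long sample path; the biased-coin example and the choice of estimator are illustrative assumptions only.

```python
import numpy as np
from collections import Counter

def block_entropy(seq, L):
    """Plug-in estimate of the block entropy H(L) = H[X_{0:L}] in nats."""
    if L == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
    p = np.array(list(counts.values()), dtype=float)
    p /= p.sum()
    return float(-np.sum(p * np.log(p)))

def entropy_rate_estimates(seq, L_max):
    """h_mu(L) = H(L+1) - H(L) and E(L) = H(L) - h_mu * L, using h_mu(L_max) as
    a crude stand-in for the true entropy rate h_mu."""
    H = np.array([block_entropy(seq, L) for L in range(L_max + 2)])
    h = np.diff(H)                                    # h_mu(L) for L = 0, ..., L_max
    E = H[:L_max + 1] - h[-1] * np.arange(L_max + 1)  # finite-L excess-entropy estimates
    return H[:L_max + 1], h, E

# Example: a biased coin has h_mu = H_b(0.3) ~ 0.611 nats and E = 0.
rng = np.random.default_rng(0)
seq = rng.choice([0, 1], size=200_000, p=[0.7, 0.3])
H, h, E = entropy_rate_estimates(seq, L_max=8)
print(h[-1], E[-1])
```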

Similarly, the finite-length excess-entropy estimates

$E(L) = H(L) - h_\mu L$   (1)
$\;\;\;\;\;\;\;\;\;\; = \sum_{l=0}^{L-1} \left( h_\mu(l) - h_\mu \right)$   (2)

tend to the true excess entropy $\mathbf{E}$ as $L$ grows large [23]. As there, we consider only finitary processes, those with finite $\mathbf{E}$ [24].

Predictive features $R$, drawn from some alphabet $\mathcal{F}$, are formed by compressing the process's past $X_{-\infty:0}$ in ways that implicitly retain information about the future $X_{0:\infty}$. (From here on, our block notation suppresses infinite indices.) A predictive distortion quantifies the predictability lost after such a coarse-graining: $d(R) = I[X_{-\infty:0}; X_{0:\infty} \mid R] = \mathbf{E} - I[R; X_{0:\infty}]$. On the one hand, distortion achieves its maximum value $\mathbf{E}$ when $R$ captures no information from the past that could be used for prediction. On the other, it can be made to vanish trivially by taking the predictive features $R$ to be all of the possible histories $X_{-\infty:0}$. Here, we quantify the cost of a larger feature space in two ways: by the number $|\mathcal{F}|$ of predictive features and by their coding cost $H[R]$ [25].

Results. Ideally, we would identify the minimal number $|\mathcal{F}|$ of predictive features or the minimal coding cost $H[R]$ required to achieve a given level $d$ of predictive distortion. This is almost always a difficult optimization problem. However, we can place upper bounds on $|\mathcal{F}|$ and $H[R]$ by constructing suboptimal predictive feature sets that achieve predictive distortion $d$.

We start by reminding ourselves of the optimal solution in the limit that $d = 0$: the causal states $\mathcal{S}$ [10, 11, 13]. Causal states can be defined by calling two pasts $x_{-\infty:0}$ and $x'_{-\infty:0}$ equivalent when the predictions of the future made using them are the same: $\Pr(X_{0:\infty} \mid X_{-\infty:0} = x_{-\infty:0}) = \Pr(X_{0:\infty} \mid X_{-\infty:0} = x'_{-\infty:0})$. (These conditional distributions are referred to as future morphs.) One can then form a model, the $\epsilon$-machine, from the set $\mathcal{S}$ of causal-state equivalence classes and their transition operators. The Shannon entropy of the causal-state distribution is the statistical complexity: $C_\mu = H[\mathcal{S}]$. A process's $\epsilon$-machine can also be viewed as the minimal unifilar HMM capable of generating the process [26], and $C_\mu$ as the amount of historical information the process stores in its causal states. Though the mechanics of working with causal states can become rather sophisticated, the essential idea is that causal states are designed to capture everything about the past relevant to predicting the future and only that information.

In the lossless limit, when $d = 0$, one cannot find predictive representations that achieve $|\mathcal{F}|$ smaller than $|\mathcal{S}|$ or that achieve $H[R]$ smaller than $C_\mu$. Similarly, no optimal lossy predictive representation will ever have $|\mathcal{F}| > |\mathcal{S}|$ or $H[R] > C_\mu$. When the number of causal states is infinite or the statistical complexity diverges, as is typical, these bounds are of little use; but otherwise, they provide a useful calibration for the feature sets proposed below.

Markov Features. Several familiar predictive models use pasts of length $L$ as predictive features. This feature set can be thought of as constructing an order-$L$ Markov model of a process [27]. The implied predictive distortion is $d(L) = I[X_{-\infty:0}; X_{0:\infty} \mid X_{-L:0}] = \mathbf{E} - E(L)$ [28], while the number of features is the number of length-$L$ words with nonzero probability and their entropy is $H[R] = H(L)$.

Generally, $h_\mu(L)$ converges exponentially quickly to the true entropy rate $h_\mu$ for stochastic processes generated by finite-state HMMs, unifilar or not; see Ref. [29] and references therein. Then, $h_\mu(L) - h_\mu \approx K e^{-\lambda L}$ in the large-$L$ limit. From Eq. (2), we see that the convergence rate $\lambda$ also implies an exponential rate of decay, $\mathbf{E} - E(L) \approx K' e^{-\lambda L}$.
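Continuing the sketch above, one can tabulate the order-$L$ Markov feature set directly: the number of observed length-$L$ words (a stand-in for $|\mathcal{F}|$), the coding cost $H[R] = H(L)$, and the distortion $d(L) = \mathbf{E} - E(L)$, with $\mathbf{E}$ crudely approximated by $E(L_{\max})$. Again, these estimator choices are ours, not the paper's.

```python
def markov_feature_tradeoff(seq, L_max):
    """Order-L Markov features (length-L pasts): for each L, return the feature
    count |F|, the coding cost H[R] = H(L), and the distortion d(L) = E - E(L),
    with E approximated by E(L_max).  Reuses the helpers from the sketch above."""
    H, h, E = entropy_rate_estimates(seq, L_max)
    E_inf = E[-1]
    rows = []
    for L in range(1, L_max + 1):
        words = Counter(tuple(seq[i:i + L]) for i in range(len(seq) - L + 1))
        rows.append((L, len(words), H[L], max(E_inf - E[L], 0.0)))
    return rows

for L, n_feat, cost, dist in markov_feature_tradeoff(seq, 8):
    print(f"L={L}: |F|={n_feat}  H[R]={cost:.3f} nats  d(L)={dist:.4f} nats")
```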
Additionally, according to the asymptotic equipartition property [25], when $L$ is large, $|\mathcal{F}| \approx e^{h_0 L}$ and $H(L) \approx h_\mu L$, where $h_0$ is the topological entropy and $h_\mu$ the entropy rate, both in nats. In sum, this first set of predictive features (effectively, the construction of order-$L$ Markov models) yields an algebraic tradeoff between the size of the feature set $|\mathcal{F}|$ and the predictive distortion $d$:

$|\mathcal{F}| \sim \left(\frac{1}{d}\right)^{h_0/\lambda}$,   (3)

and a logarithmic tradeoff between the entropy of the features and the distortion:

$H[R] \sim \frac{h_\mu}{\lambda} \log\frac{1}{d}$.   (4)

In principle, $\lambda$ can be arbitrarily small, and so $h_0/\lambda$ and $h_\mu/\lambda$ arbitrarily large, even for processes generated by finite-state HMMs. This can be true even for finite unifilar HMMs ($\epsilon$-machines). To see this, let $W$ be the transition matrix of a process's mixed-state presentation, defined shortly. From Ref. [28], when $W$'s spectral gap $\gamma$ is small, we have $\mathbf{E} - E(L) \sim (1-\gamma)^L$, so that $\lambda = \log\frac{1}{1-\gamma}$ can be quite small. In short, when the process in question has only a finite number of causal states, $|\mathcal{F}|$ optimally saturates at $|\mathcal{S}|$ and $H[R]$ optimally saturates at $C_\mu$; but from Eqs. (3) and (4), the size of this Markov feature set can grow without bound when we attempt to achieve zero predictive distortion.

Mixed-State Features. A different predictive feature set comes from coarse-graining the mixed-state simplex. As first described by Blackwell [15], mixed states $Y$ are probability distributions $\Pr(\mathcal{G}_0 \mid X_{-L:0} = x_{-L:0})$ over the internal states $\mathcal{G}$ of a generative model, given the generated sequences. Transient mixed states are those at finite $L$, while recurrent mixed states are those remaining with positive probability in the limit that $L \to \infty$. Recurrent mixed states exactly correspond to the causal states $\mathcal{S}$ [30]. When $C_\mu$ diverges, recurrent mixed states often lie on a Cantor set in the simplex; see Fig. 1. In this circumstance, one examines the various dimensions that describe the scaling of such sets. Here, for reasons that will become clear, we use the box-counting dimension $\dim_0(Y)$ and the information dimension $\dim_1(Y)$ [31].

More concretely, we partition the simplex into cubes of side length $\epsilon$, and each nonempty cube is taken to be a predictive feature in our representation. When there are only a finite number of causal states, this feature set consists of the causal states $\mathcal{S}$ for some nonzero $\epsilon$. There is no such $\epsilon$ if, instead, the process has a countable infinity of causal states. Sometimes, as in the finite-$\mathcal{S}$ case, the information dimension and perhaps even the box-counting dimension of the mixed states will vanish. When there is an uncountable infinity of causal states and $\epsilon$ is sufficiently small, the corresponding number of features scales as $|\mathcal{F}| \sim (1/\epsilon)^{\dim_0(Y)}$, where $\dim_0(Y)$ denotes the box-counting dimension [32], and the coding cost scales as $H[R] \sim \dim_1(Y) \log(1/\epsilon)$, where $\dim_1(Y)$ is the information dimension. These dimensions can be approximated numerically; see Fig. 1 and Supplementary Materials App. A.

For processes generated by infinite $\epsilon$-machines, finding the better feature set requires analyzing the scaling of predictive distortion with $\epsilon$. Supplementary Materials App. B shows that $d(R)$ at coarse-graining $\epsilon$ scales at most as $\epsilon$ in the limit of asymptotically small $\epsilon$. Our upper bound relies on the fact that two nearby mixed states have similar future morphs, i.e., they have similar conditional probability distributions over future trajectories given one's mixed state. From this, we conclude that:

$|\mathcal{F}| \lesssim \left(\frac{1}{d}\right)^{\dim_0(Y)}$,   (5)

and

$H[R] \lesssim \dim_1(Y) \log\frac{1}{d}$.   (6)

In general, both $\dim_0(Y)$ and $\dim_1(Y)$ are bounded above by $|\mathcal{G}|$, the number of states in the minimal generative model.

Resource-Prediction Tradeoff. Finally, putting Eqs. (3)-(6) together, we find that for a process with an uncountably infinite $\epsilon$-machine, the necessary number of predictive features $|\mathcal{F}|$ scales no faster than:

$|\mathcal{F}| \sim \left(\frac{1}{d}\right)^{\min(h_0/\lambda,\, \dim_0(Y))}$   (7)

and the requisite coding cost $H[R]$ scales no faster than:

$H[R] \sim \min(h_\mu/\lambda,\, \dim_1(Y)) \log\frac{1}{d}$,   (8)

where $d$ is the predictive distortion.

These bounds have a practical interpretation. From finite data, one can only justify inferring finite-state minimal maximally predictive models. Indeed, the criteria of Ref. [33], applied as in Ref. [13], suggest that the maximum number of inferred states should not yield predictive distortions below the noise in our estimate of the predictive distortion $d$ from $T$ data points. This noise scales as $1/\sqrt{T}$, since from $T$ data points we have approximately $T$ separate measurements of predictive distortion. This, in turn, sets upper bounds of $|\mathcal{F}| \lesssim T^{\frac{1}{2}\min(h_0/\lambda,\, \dim_0(Y))}$ and $H[R] \lesssim \frac{1}{2}\min(h_\mu/\lambda,\, \dim_1(Y)) \log T$ when the process in question has an uncountable infinity of causal states.
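The coarse-grained mixed-state feature set of Eqs. (5) and (6) is equally easy to sketch numerically. Given a sample of mixed states and their weights (however obtained), the following assigns each to an $\epsilon$-cube and reports the feature count $|\mathcal{F}|$ and coding cost $H[R]$; sweeping $\epsilon$ and regressing against $\log(1/\epsilon)$ then estimates $\dim_0(Y)$ and $\dim_1(Y)$. This is a sketch of ours, not the authors' code.

```python
import numpy as np

def coarse_grain_mixed_states(mixed_states, weights, eps):
    """Bin mixed states (rows of an (n, |G|) array, each a distribution over the
    generator's states) into cubes of side eps.  Returns the number of nonempty
    cubes |F| and the coding cost H[R] in nats (entropy of the per-cube weight)."""
    boxes = {}
    for y, w in zip(mixed_states, weights):
        key = tuple(np.floor(y / eps).astype(int))
        boxes[key] = boxes.get(key, 0.0) + w
    p = np.array(list(boxes.values()))
    p = p / p.sum()
    return len(boxes), float(-np.sum(p * np.log(p)))

# Sweeping eps and regressing log|F| on log(1/eps) estimates dim_0(Y);
# regressing H[R] on log(1/eps) estimates dim_1(Y).
```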
We can test this prediction directly using the Bayesian Structural Inference (BSI) algorithm [34] applied to the processes described in Fig. 1. The major difficulty in employing BSI is that one must specify a list of $\epsilon$-machine topologies to search over. Since the number of such topologies grows super-exponentially with the number of states [35], experimentally probing the scaling behavior of the number of inferred states with the amount of available data is, with nothing else said, impossible. However, expert knowledge can cull the number of $\epsilon$-machine topologies that one should search over. In this spirit, we focus on the Simple Nonunifilar Source (SNS), since $\epsilon$-machines for renewal processes have been characterized in detail [36-39]. The SNS's generative HMM is given in Fig. 2 (top). This process has a mixed-state presentation similar to that of nond in Fig. 1(left). We choose to search only over the $\epsilon$-machine topologies that correspond to the class of eventually Poisson renewal processes.

FIG. 1. Mixed-state presentations $Y$ for three different generative models, given in Supplementary Materials App. A. To visualize the mixed-state simplex, the diagrams plot iterates $y_t$ such that the left vertex corresponds to the pure state $\Pr(A, B, C) = (1, 0, 0)$, or HMM state A; the right, to pure state $(0, 1, 0)$, or HMM state B; and the top, to pure state $(0, 0, 1)$, or HMM state C. The left plot shows 55 mixed states in a bin histogram; the middle, 391,484 mixed states; and the right, 1,253,360 mixed states. Bin-cell coloring is the negative logarithm of the normalized bin counts. The box-counting dimensions are, respectively, $\dim_0(Y) \approx 0.3$ at left, $\dim_0(Y) \approx 1.8$ in the middle, and $\dim_0(Y) \approx 1.9$ at right. The box-counting dimension is calculated by estimating the slope of $\log N_\epsilon$ versus $\log(1/\epsilon)$, as described in Supplementary Materials App. A.

We must first revisit the upper bound in Eq. (7), as the SNS has a countable (not uncountable) infinity of causal states. When a process's $\epsilon$-machine is countable instead of uncountable, we can improve upon Eq. (7). However, the magnitude of improvement is process dependent and not easily characterized in general, for two main reasons. First, predictive distortion can decrease faster than $\epsilon$ for processes generated by countably infinite $\epsilon$-machines, since there might be only one mixed state in an $\epsilon$-hypercube. (See the lefthand factor in Supplementary Materials Eq. (B5), $\sum_{r \in \mathcal{R}^{(\epsilon)}: H[Y \mid R^{(\epsilon)} = r] \neq 0} \pi(r)$. We are guaranteed that $\lim_{\epsilon \to 0} \sum_{r \in \mathcal{R}^{(\epsilon)}: H[Y \mid R^{(\epsilon)} = r] \neq 0} \pi(r) = 0$, but the rate of convergence to zero is highly process dependent.) Second, the scaling relation between $N_\epsilon$ and $1/\epsilon$ may be sub-power-law. Given a specific process with a countably infinite $\epsilon$-machine, though (here, the SNS), we can derive expected scaling relations; see Supplementary Materials App. C. We argue that, roughly speaking, we should expect predictive distortion to decay exponentially with $N_\epsilon$, so that $d(R_\epsilon) \sim \lambda^{N_\epsilon}$ for some $\lambda < 1$. Since our uncertainty in $d$ scales as $1/\sqrt{T}$, where $T$ is the amount of data, we expect that the number of inferred states $N_T$ should scale as $\log T$.

These scaling relationships are confirmed by BSI applied to data generated from the SNS, where the set of $\epsilon$-machine topologies selected from are the eventually Poisson $\epsilon$-machine topologies characterized by Ref. [36]. Given a set of machines $\mathcal{M}$ and data $x_{0:T}$, BSI returns an easily calculable posterior $\Pr(M \mid x_{0:T})$ for $M \in \mathcal{M}$ with (at least) two hyperparameters: (i) the concentration parameter $\alpha$ for our Dirichlet prior on the transition probabilities and (ii) our prior on the likelihood of a model $M$ with number of states $|M|$, taken to be proportional to $e^{-\beta |M|}$ for a user-specified $\beta$. From this posterior, we calculate an average model size $\langle |M| \rangle(T) = \sum_{M \in \mathcal{M}} |M| \Pr(M \mid x_{0:T})$ as a function of $T$ for multiple data strings $x_{0:T}$. The result is that the scaling of $\langle |M| \rangle(T)$ with $T$ is proportional to $\log T$ in the large-$T$ limit, where the proportionality constant and the initial value depend on the hyperparameters $\alpha$ and $\beta$.
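The BSI evidence calculation itself is given in Ref. [34]; the model-averaging step used above is simple enough to sketch. In the following, `log_evidence` is assumed to hold per-topology log marginal likelihoods computed elsewhere (a hypothetical input), and the size prior $\propto e^{-\beta|M|}$ is combined with it to produce $\langle |M| \rangle$.

```python
import numpy as np

def posterior_average_size(log_evidence, sizes, beta):
    """Combine per-topology log evidences (assumed computed elsewhere, e.g. by BSI's
    Dirichlet marginal likelihood with concentration alpha) with the size prior
    Pr(M) ~ exp(-beta * |M|), and return the posterior-average state count <|M|>."""
    log_post = np.asarray(log_evidence, dtype=float) - beta * np.asarray(sizes, dtype=float)
    log_post -= log_post.max()            # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()
    return float(np.dot(post, sizes))

# Repeating this for growing data lengths T and plotting <|M|>(T) against log T
# probes the N_T ~ log T scaling discussed above.
```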
Conclusion. We now better understand the tradeoff between memory and prediction in processes generated by a finite-state hidden Markov model with infinite statistical complexity. Importantly, and perhaps still underappreciated, such processes are more typical than the finite-statistical-complexity and Gaussian processes that have received more attention in the related literature [12, 13]. We proposed a new method for obtaining predictive features, coarse-graining the mixed states, and used this feature set to bound the number of features and their coding cost in the limit of small predictive distortion. These bounds were compared to those obtained from the more familiar and widely used (Markov) predictive feature set, that of memorizing all pasts of a fixed length. The general results here suggest that the new feature set can outperform the standard set in many situations.

Practically speaking, the new bounds presented have three potential uses. First, they give weight to Ref. [7]'s suggestion to replace the statistical complexity with the information dimension of mixed-state space as a complexity measure when the statistical complexity diverges.

FIG. 2. Model-size scaling for the parametrized Simple Nonunifilar Source (SNS). (Top) Generative HMM for the SNS, with transition probabilities p and q. (Bottom) Scaling of $\langle |M| \rangle = \sum_{M \in \mathcal{M}} |M| \Pr(M \mid x_{0:T})$ with $T$, for $\beta = 1, 4, 8$, with the posterior $\Pr(M \mid x_{0:T})$ calculated using formulae from BSI [34] with $\alpha = 1$ and the set of model topologies being the eventually Poisson $\epsilon$-machines described in Ref. [36]. Data were generated from the SNS with $p = q = 1/2$. Linear regression reveals a slope of 1 for $\log\log T$ versus $\log \langle |M| \rangle$, confirming the expected $N_T \sim \log T$, where the proportionality constants depend on $\beta$.

Second, they suggest a route to an improved, process-dependent tradeoff between complexity and precision for estimating the entropy rate of processes generated by nonunifilar HMMs [40]. (Admittedly, space kept us from addressing here how to estimate the probability distribution over $\epsilon$-boxes.) Finally, and perhaps most importantly, our upper bounds provide a first attempt to calculate the expected scaling of inferred model size with available data.

It is commonly accepted that inferring time-series models from finite data typically has two components, parameter estimation and model selection, though these two can be done simultaneously. Our focus above was on model selection, as we monitored how model size increased with the amount of available data. One can view the results as an effort to characterize the posterior distribution of $\epsilon$-machine topologies given data or as the growth rate of the optimum model cost [41] in the asymptotic limit. Posterior distributions of estimated parameters (transition probabilities) are almost always asymptotically normal, with standard deviation decreasing as the square root of the amount of available data [42-44]. However, asymptotic normality does not typically hold for this posterior distribution, since using finite $\epsilon$-machine topologies almost always implies out-of-class modeling. Rather, we showed that the particular way in which the mode of this posterior distribution increases with data typically depends on a gross process statistic: the box-counting dimension of the mixed-state presentation.

Stepping back, we only tackled the lower parts of the infinitary-process hierarchy identified in Ref. [45]. In particular, complex processes in the sense of Ref. [46] have different resource-prediction tradeoffs than analyzed here, since processes generated by finite-state HMMs (as assumed here) cannot produce processes with infinite excess entropy. To do so, at the very least, predictive distortion must be more carefully defined. We conjecture that success will be achieved by instead focusing on a one-step predictive distortion, equivalent to a self-information loss function, as is typically done [41, 47]. Luckily, the derivation in Supplementary Materials App. B easily extends to this case, simultaneously suggesting improvements to related entropy-rate-approximating algorithms. We hope these introductory results inspire future study of resource-prediction tradeoffs for infinitary processes.

The authors thank the Santa Fe Institute for its hospitality during visits and A. Boyd, C. Hillar, and D. Upper for useful discussions. JPC is an SFI External Faculty member. This material is based upon work supported by, or in part by, the U.S. Army Research Laboratory and the U.S. Army Research Office under contracts W911NF and W911NF. S.E.M. was funded by a National Science Foundation Graduate Student Research Fellowship, a U.C. Berkeley Chancellor's Fellowship, and the MIT Physics of Living Systems Fellowship.

[1] S. Still. EuroPhys. Lett., 85:28005, 2009.
[2] M. L. Littman, R. S. Sutton, and S. P. Singh. In NIPS, volume 14, 2001.
[3] N. Tishby and D. Polani. In Perception-Action Cycle, Springer, 2011.
[4] D. Little and F. Sommer. Learning and exploration in action-perception loops. Frontiers in Neural Circuits, 7:37, 2013.
[5] S. Still and D. Precup. Theory in Biosciences, 131(3), 2012.
[6] N. Brodu. Adv. Complex Sys., 14(05), 2011.
[7] J. P. Crutchfield. Physica D, 75:11-54, 1994.
[8] S. Still and J. P. Crutchfield. arXiv:0708.0654.
[9] T. Berger. Rate Distortion Theory. Prentice-Hall, New York, 1971.
[10] J. P. Crutchfield and K. Young. Phys. Rev. Lett., 63:105-108, 1989.
[11] C. R. Shalizi and J. P. Crutchfield. J. Stat. Phys., 104:817-879, 2001.
[12] F. Creutzig, A. Globerson, and N. Tishby. Phys. Rev. E, 79(4):041925, 2009.
[13] S. Still, J. P. Crutchfield, and C. J. Ellison. CHAOS, 20(3):037111, 2010.
[14] S. Marzen and J. P. Crutchfield. J. Stat. Phys., 163(6), 2014.
[15] D. Blackwell. In Transactions of the First Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pages 13-20, 1957. Held at Liblice near Prague, November 28 to 30, 1956.
[16] L. R. Rabiner and B. H. Juang. IEEE ASSP Magazine, January 1986.
[17] E. J. Gilbert. Ann. Math. Stat., 30:688-697, 1959.
[18] H. Ito, S.-I. Amari, and K. Kobayashi. IEEE Info. Th., 38:324, 1992.
[19] V. Balasubramanian. Neural Computation, 9:349-368, 1997.
[20] D. R. Upper. Theory and Algorithms for Hidden Markov Models and Generalized Hidden Markov Models. PhD thesis, University of California, Berkeley, 1997. Published by University Microfilms Intl, Ann Arbor, Michigan.
[21] This is not necessarily the same as the generative model with minimal generative complexity [48], since a nonunifilar generator with a large number of sparsely used states might have smaller state entropy than a nonunifilar generator with a small number of equally used states. And, typically, per our main result, the minimal generative model is not the same as the minimal prescient (unifilar) model.
[22] Cf. Ref. [23]'s discussion of terminology and Ref. [46].
[23] J. P. Crutchfield and D. P. Feldman. CHAOS, 13(1):25-54, 2003.
[24] Finite HMMs generate finitary processes, as E is bounded from above by the logarithm of the number of HMM states [48].
[25] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, New York, second edition, 2006.
[26] N. Travers and J. P. Crutchfield. Equivalence of history and generator ε-machines. arXiv:1111.4500.
[27] One can construct a better feature set simply by applying the causal-state equivalence relation to these length-L pasts. We avoid this here since the size of that feature set is more difficult to analyze generically and since the causal-state equivalence relation applied to length-L pasts often induces no coarse-graining.
[28] P. M. Ara, R. G. James, and J. P. Crutchfield. Phys. Rev. E, 93(2):022143, 2016.
[29] N. F. Travers. Stochastic Proc. Appln., 124(1), 2014.
[30] C. J. Ellison, J. R. Mahoney, and J. P. Crutchfield. J. Stat. Phys., 136(6):1005-1034, 2009.
[31] Ya. B. Pesin. Dimension Theory in Dynamical Systems. University of Chicago Press, 1997.
[32] We assume here that the lower box-counting dimension and upper box-counting dimension are equivalent.
[33] S. Still and W. Bialek. Neural Comp., 16(12):2483-2506, 2004.
[34] C. C. Strelioff and J. P. Crutchfield. Phys. Rev. E, 89:042119, 2014.
[35] B. D. Johnson, J. P. Crutchfield, C. J. Ellison, and C. S. McTague. arXiv:1011.0036.
[36] S. Marzen and J. P. Crutchfield. Entropy, 17(7):4891-4917, 2015.
[37] S. Marzen and J. P. Crutchfield. Phys. Lett. A, 380(17), 2016.
[38] S. Marzen, M. R. DeWeese, and J. P. Crutchfield. Front. Comput. Neurosci., 9:109, 2015.
[39] S. Marzen and J. P. Crutchfield. arXiv preprint.
[40] E. Ordentlich and T. Weissman. In Entropy of Hidden Markov Processes and Connections to Dynamical Systems, London Math. Soc. Lecture Notes, 385, 2011.
[41] J. Rissanen. IEEE Trans. Info. Th., IT-30:629, 1984.
[42] A. M. Walker. J. Roy. Stat. Soc. Series B, pages 80-88, 1969.
[43] C. C. Heyde and I. M. Johnstone. J. Roy. Stat. Soc. Series B, pages 184-189, 1979.
[44] T. J. Sweeting. Bayesian Statistics, 4:825-835, 1992.
[45] J. P. Crutchfield and S. Marzen. Phys. Rev. E, 91(5):050106, 2015.
[46] W. Bialek, I. Nemenman, and N. Tishby. Neural Comp., 13:2409-2463, 2001.
[47] N. Merhav and M. Feder. IEEE Trans. Info. Th., 44(6):2124-2147, 1998.
[48] W. Löhr. Models of discrete-time stochastic processes and associated complexity measures. PhD thesis, University of Leipzig, May 2009.
[49] S.-W. Ho and R. W. Yeung. The interplay between entropy and variational distance. IEEE Trans. Info. Th., 56(12):5906-5929, 2010.
[50] T. Tao. An Introduction to Measure Theory, volume 126. American Mathematical Society, 2011.
[51] This upper bound on E was noted in Ref. [48].

Supplementary Materials for "Nearly Maximally Predictive Features and Their Dimensions"
Sarah E. Marzen and James P. Crutchfield

Appendix A: Estimating fractal dimensions

We start by describing the generative models whose mixed-state presentations are shown in Fig. 1. Figure 1(left) shows the mixed-state presentation of the nond process, whose labeled transition matrices $T^{(0)}$ and $T^{(1)}$ are those of the three-state minimal generative model drawn in Fig. 3(left). Figure 1(middle) and Fig. 1(right) show the mess3 process, whose labeled transition matrices are parametrized by $x$ and $\alpha$:

$T^{(0)} = \begin{pmatrix} \alpha y & \beta x & \beta x \\ \alpha x & \beta y & \beta x \\ \alpha x & \beta x & \beta y \end{pmatrix}, \quad T^{(1)} = \begin{pmatrix} \beta y & \alpha x & \beta x \\ \beta x & \alpha y & \beta x \\ \beta x & \alpha x & \beta y \end{pmatrix}, \quad T^{(2)} = \begin{pmatrix} \beta y & \beta x & \alpha x \\ \beta x & \beta y & \alpha x \\ \beta x & \beta x & \alpha y \end{pmatrix},$

with $\beta = (1-\alpha)/2$ and $y = 1 - 2x$. Figure 1(middle) corresponds to $x = 0.15$ and $\alpha = 0.6$, while Fig. 1(right) corresponds to $x = 0.05$ and $\alpha = 0.6$. The probabilities of emitting a 0, 1, and 2 given the current state are, respectively:

$p(x \mid A) = \begin{cases} \alpha & x = 0 \\ \frac{1-\alpha}{2} & x = 1, 2 \end{cases}, \quad p(x \mid B) = \begin{cases} \alpha & x = 1 \\ \frac{1-\alpha}{2} & x = 0, 2 \end{cases}, \quad \text{and} \quad p(x \mid C) = \begin{cases} \alpha & x = 2 \\ \frac{1-\alpha}{2} & x = 0, 1. \end{cases}$

FIG. 3. (Left) Minimal generative model for the nond process, whose mixed-state presentation is shown in Fig. 1(left). (Right) Parametrized minimal generative model for the mess3 process, with self-transition probability $1 - 2x$ and probability $x$ of hopping to each of the other two states. For Fig. 1(middle), $x = 0.15$ and $\alpha = 0.6$; for Fig. 1(right), $x = 0.05$ and $\alpha = 0.6$.

The box-counting dimension is approximated by forming the mixed-state presentation from the generative model, as described by Ref. [30], down to the desired tree depth. For Fig. 1(left), the depth was sufficient to accrue 10,000 mixed states; for Fig. 1(middle) and Fig. 1(right), the depth was sufficient to accrue 30,000 mixed states. The box-counting dimension was estimated by partitioning the simplex into boxes of side length $\epsilon = 1/n$ for $n \in \{1, \ldots, 300\}$, calculating the number $N_\epsilon$ of nonempty boxes, and using linear regression on $\log N_\epsilon$ versus $\log(1/\epsilon)$ to find the slope. The information dimension could be estimated as well by estimating the probabilities of the words that lead to each mixed state.
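For reproducibility of these estimates, here is a minimal Python sketch (ours) that enumerates mess3 mixed states by the standard Blackwell recursion, using the labeled matrices as given above; the tree depth and the grid of box sizes are illustrative choices.

```python
import numpy as np

def mess3_matrices(x=0.15, a=0.6):
    """Labeled transition matrices T^(s)[i, j] = Pr(G_{t+1}=j, X_{t+1}=s | G_t=i)
    for mess3, with b = (1 - a)/2 and y = 1 - 2x as in the text."""
    b, y = (1 - a) / 2, 1 - 2 * x
    P = np.full((3, 3), x) + (y - x) * np.eye(3)           # state-to-state transitions
    emit = [np.array([a if j == s else b for j in range(3)]) for s in range(3)]
    return [P * emit[s] for s in range(3)]                 # symbol s emitted from the destination state

def mixed_states(T, depth):
    """Enumerate mixed states via the Blackwell recursion down to a given tree depth."""
    n = T[0].shape[0]
    evals, evecs = np.linalg.eig(sum(T).T)
    pi = np.real(evecs[:, np.argmax(np.real(evals))])
    pi = pi / pi.sum()                                     # stationary state distribution
    states, frontier = [pi], [(pi, 1.0)]
    for _ in range(depth):
        nxt = []
        for y, w in frontier:
            for Ts in T:
                ps = float(y @ Ts @ np.ones(n))            # Pr(next symbol | mixed state y)
                if ps > 1e-12:
                    nxt.append((y @ Ts / ps, w * ps))
        frontier = nxt
        states.extend(y for y, _ in frontier)
    return np.array(states)

def box_dimension(states, grid=range(10, 300, 10)):
    """Estimate dim_0(Y): regress log N_eps on log(1/eps) over box sizes eps = 1/n."""
    pts = [(np.log(n), np.log(len({tuple(np.floor(y * n).astype(int)) for y in states})))
           for n in grid]
    X, Y = np.array(pts).T
    return np.polyfit(X, Y, 1)[0]

Y = mixed_states(mess3_matrices(x=0.15, a=0.6), depth=9)   # about 30,000 mixed states
print("estimated dim_0(Y):", box_dimension(Y))
```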

Appendix B: Predictive distortion scaling for coarse-grained mixed states

Recall that the second feature set (alphabet $\mathcal{F}$, random variable $R$) is constructed by partitioning the simplex into boxes of side length $\epsilon$. This section finds the scaling of $d(R)$ with $\epsilon$ in the limit of asymptotically small $\epsilon$. For reasons that become clear shortly, we define:

$d_L(R) = I[X_{-\infty:0}; \vec{X}^L \mid R]$   (B1)
$\;\;\;\;\;\;\;\;\;\;\;\; = H[\vec{X}^L \mid R] - H[\vec{X}^L \mid X_{-\infty:0}]$,   (B2)

where $\lim_{L\to\infty} d_L(R) = d(R)$ and $\vec{X}^L \equiv X_{0:L}$. Let $\pi(r)$ be the invariant probability distribution over $\epsilon$-boxes, and let $\pi(y \mid r)$ be the probability measure over mixed states $y$ in the $\epsilon$-box $r$. Then:

$d_L(R) = \sum_r \pi(r) \left( H[\vec{X}^L \mid R = r] - \int d\pi(y \mid r)\, H[\vec{X}^L \mid Y = y] \right)$,   (B3)

where:

$\Pr(\vec{X}^L \mid R = r) = \int d\pi(y \mid r)\, \Pr(\vec{X}^L \mid Y = y)$.   (B4)

To place an upper bound on $d_L(R)$ in terms of $\epsilon$, we note that any two mixed states $y$ and $y'$ in the same partition element satisfy $\|y - y'\|_1 \leq |\mathcal{G}|\,\epsilon$, the ($\ell_1$) length of the longest diagonal of a hypercube of side $\epsilon$ in dimension $|\mathcal{G}|$, by construction. Hence:

$\left\| \Pr(\vec{X}^L \mid Y = y) - \Pr(\vec{X}^L \mid Y = y') \right\|_{TV} = \left\| \sum_{i=1}^{|\mathcal{G}|} \Pr(\vec{X}^L \mid \mathcal{G}_0 = i)\,(y_i - y'_i) \right\|_{TV} \leq \sum_{i=1}^{|\mathcal{G}|} \left\| \Pr(\vec{X}^L \mid \mathcal{G}_0 = i) \right\|_{TV} |y_i - y'_i| = \sum_{i=1}^{|\mathcal{G}|} |y_i - y'_i|.$

From the earlier observation, though, this is simply:

$\left\| \Pr(\vec{X}^L \mid Y = y) - \Pr(\vec{X}^L \mid Y = y') \right\|_{TV} \leq |\mathcal{G}|\,\epsilon.$

(Often the literature defines $\|\cdot\|_{TV}$ as 1/2 of the quantity used here.) From this, we can similarly conclude that:

$\left\| \Pr(\vec{X}^L \mid R = r) - \Pr(\vec{X}^L \mid Y = y) \right\|_{TV} = \left\| \int d\pi(y' \mid r) \left( \Pr(\vec{X}^L \mid Y = y') - \Pr(\vec{X}^L \mid Y = y) \right) \right\|_{TV} \leq \int d\pi(y' \mid r) \left\| \Pr(\vec{X}^L \mid Y = y') - \Pr(\vec{X}^L \mid Y = y) \right\|_{TV} \leq \int d\pi(y' \mid r)\, |\mathcal{G}|\,\epsilon = |\mathcal{G}|\,\epsilon$

for any mixed state $y$ in partition $r$. Reference [49] gives an upper bound on differences in entropy in terms of the total variation distance.
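A quick numerical sanity check of this total-variation bound is possible with the mess3 helper from the App. A sketch above; the particular mixed states, word length, and comparison below are arbitrary choices of ours.

```python
import numpy as np
from itertools import product

def word_distribution(T, y, L):
    """Pr(x_{0:L} | mixed state y): push y through the labeled matrices along each word."""
    n = T[0].shape[0]
    return {w: float(np.linalg.multi_dot([y] + [T[s] for s in w]) @ np.ones(n))
            for w in product(range(len(T)), repeat=L)}

# Two mixed states a small distance apart should have future word distributions
# within ||y - y'||_1 (itself at most |G|*eps for states in one eps-box) in
# unhalved total variation.  Uses mess3_matrices() from the App. A sketch.
T = mess3_matrices()
y1 = np.array([0.4, 0.3, 0.3])
y2 = y1 + np.array([0.01, -0.005, -0.005])
P1, P2 = word_distribution(T, y1, 5), word_distribution(T, y2, 5)
tv = sum(abs(P1[w] - P2[w]) for w in P1)
print(tv, np.abs(y1 - y2).sum())          # the first number should not exceed the second
```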

Applied here, that bound implies:

$\left| H[\vec{X}^L \mid R = r] - H[\vec{X}^L \mid Y = y] \right| \leq H_b\!\left(\frac{|\mathcal{G}|\,\epsilon}{2}\right) + \frac{|\mathcal{G}|\,\epsilon}{2}\, L \log|\mathcal{A}|.$

This, in turn, gives:

$\left| H[\vec{X}^L \mid R = r] - \int d\pi(y \mid r)\, H[\vec{X}^L \mid Y = y] \right| \leq \int d\pi(y \mid r) \left| H[\vec{X}^L \mid R = r] - H[\vec{X}^L \mid Y = y] \right| \leq H_b\!\left(\frac{|\mathcal{G}|\,\epsilon}{2}\right) + \frac{|\mathcal{G}|\,\epsilon}{2}\, L \log|\mathcal{A}|.$

That said, if there is only one mixed state in some $\epsilon$-cube $r$, the quantity above vanishes:

$H[\vec{X}^L \mid R = r] - \int d\pi(y \mid r)\, H[\vec{X}^L \mid Y = y] = 0.$

Denote the probability of a nonempty $\epsilon$-cube having only one mixed state in it as $\sum_{r \in \mathcal{F}: H[Y \mid R = r] = 0} \pi(r)$. Then substitution into Eq. (B3) gives:

$d_L(R) \leq \left[ \sum_{r \in \mathcal{F}: H[Y \mid R = r] \neq 0} \pi(r) \right] \left( H_b\!\left(\frac{|\mathcal{G}|\,\epsilon}{2}\right) + \frac{|\mathcal{G}|\,\epsilon}{2}\, L \log|\mathcal{A}| \right).$   (B5)

Note that this implies the entropy rate can be approximated with error no greater than $H_b\!\left(\frac{|\mathcal{G}|\,\epsilon}{2}\right) + \frac{|\mathcal{G}|\,\epsilon}{2} \log|\mathcal{A}|$ if one knows the distribution $\pi(r)$ over $\epsilon$-boxes.

The left factor, $\sum_{r \in \mathcal{F}: H[Y \mid R = r] \neq 0} \pi(r)$, tends to zero as the partitions shrink for a process generated by a countable $\epsilon$-machine and not so otherwise. For ease of presentation, we first study the case of uncountably infinite $\epsilon$-machines and then comment on the case of processes produced by countable $\epsilon$-machines. For clarity, we now introduce the notation $R^{(\epsilon)}$ to indicate the predictive features obtained from an $\epsilon$-partition of the mixed-state simplex. From Eq. (B5), we can readily conclude that for any finite $L$:

$\lim_{\epsilon \to 0} \frac{d_L(R^{(\epsilon)})}{\epsilon^\gamma} = 0$, when $\gamma < 1$.

We would like to extend this conclusion to $d(R^{(\epsilon)})$, but inspection of Eq. (B5) suggests that concluding anything similar for $d(R^{(\epsilon)})$ requires care. In particular, we would like to show that $\lim_{\epsilon \to 0} d(R^{(\epsilon)})/\epsilon^\gamma = 0$ for any $\gamma < 1$. Recall that $d(R) = \lim_{L \to \infty} d_L(R)$. If we can exchange the limits $\lim_{\epsilon \to 0}$ and $\lim_{L \to \infty}$, then:

$\lim_{\epsilon \to 0} \frac{d(R^{(\epsilon)})}{\epsilon^\gamma} = \lim_{\epsilon \to 0} \lim_{L \to \infty} \frac{d_L(R^{(\epsilon)})}{\epsilon^\gamma} = \lim_{L \to \infty} \lim_{\epsilon \to 0} \frac{d_L(R^{(\epsilon)})}{\epsilon^\gamma} = 0.$

To justify exchanging limits, we appeal to the Moore-Osgood Theorem [50]. Consider any monotone decreasing sequence $\{\epsilon_i\}_{i=1}^\infty$, so that $a_{i,L} := d_L(R^{(\epsilon_i)})/\epsilon_i^\gamma$ is a doubly infinite sequence. When $\gamma < 1$, we have $\lim_{i \to \infty} a_{i,L} = 0$. We provide a plausibility argument for uniform convergence of $a_{i,L}$ to 0. Recall from Eq. (B5) that:

$|a_{i,L} - 0| = a_{i,L} \leq \frac{1}{\epsilon_i^\gamma} \left( H_b\!\left(\frac{|\mathcal{G}|\,\epsilon_i}{2}\right) + \frac{|\mathcal{G}|\,\epsilon_i}{2}\, L \log|\mathcal{A}| \right).$

And, since $H_b(x) \leq x \log(1/x) + x$, this expands to:

$a_{i,L} \leq \frac{|\mathcal{G}|}{2}\, \epsilon_i^{1-\gamma} \log(1/\epsilon_i) + \left( \frac{|\mathcal{G}|}{2} \log\frac{2}{|\mathcal{G}|} + \frac{|\mathcal{G}|}{2} + \frac{|\mathcal{G}|}{2}\, L \log|\mathcal{A}| \right) \epsilon_i^{1-\gamma}.$

At small enough $\epsilon_i$, the first term dominates, implying that $a_{i,L} \lesssim \frac{|\mathcal{G}|}{2}\, \epsilon_i^{1-\gamma} \log\frac{1}{\epsilon_i}$. By making $i$ sufficiently large (i.e., by making $\epsilon_i$ sufficiently small), we can make this upper bound as small as desired. Finally, the Data Processing Inequality implies that $I[X_{-\infty:0}; \vec{X}^L \mid R]$ is monotone increasing in $L$ and upper bounded by $\mathbf{E} \leq \log|\mathcal{G}| < \infty$ [51]. And so, $a_{i,L}$ is also monotone increasing in $L$ and upper bounded by $\mathbf{E}/\epsilon_i^\gamma$. Hence, the monotone convergence theorem implies that $\lim_{L\to\infty} a_{i,L}$ exists for any $i$. Therefore, the conditions of the Moore-Osgood Theorem are satisfied and $\lim_{i\to\infty} \lim_{L\to\infty} d_L(R^{(\epsilon_i)})/\epsilon_i^\gamma = 0$, so $\lim_{\epsilon\to 0} d(R^{(\epsilon)})/\epsilon^\gamma = 0$ for any $\gamma < 1$, as desired. Loosely speaking, as in the main text, we say simply that $d(R^{(\epsilon)})$ scales as $\epsilon$ for small $\epsilon$.

Appendix C: Countably infinite ε-machine example: the Simple Nonunifilar Source

Consider the parametrized Simple Nonunifilar Source (SNS), whose generative HMM is shown in Fig. 2(top) and whose mixed-state scaling is summarized in Fig. 4. It is straightforward to show that the mixed states lie on a line in the simplex, $\{(v_n, 1 - v_n)\}_{n=0}^\infty$, with:

$v_n = \frac{(1-p)^n (q - p)}{(1-p)^n q - (1-q)^n p}.$

If $p < q$, then $\lim_{n\to\infty} v_n = 1 - p/q$; if $q < p$, then $\lim_{n\to\infty} v_n = 0$. It is also straightforward to show that $v_n$ converges exponentially fast to its limit when $p \neq q$:

$v_n - \lim_{n\to\infty} v_n \approx \begin{cases} \left(1 - \frac{p}{q}\right) \frac{p}{q} \left(\frac{1-q}{1-p}\right)^n & p < q \\[4pt] \left(1 - \frac{q}{p}\right) \left(\frac{1-p}{1-q}\right)^n & q < p. \end{cases}$

However, when $p = q$, then $v_n = 1 - \frac{\frac{1}{q} - 1}{n + \left(\frac{1}{p} - 1\right)}$, with $\lim_{n\to\infty} v_n = 1$; for any $p$, then, $v_n - \lim_{n\to\infty} v_n \sim \left(\frac{1}{p} - 1\right)\frac{1}{n}$. This is a much slower rate of convergence, and it greatly affects the box-counting dimension of the parametrized SNS's mixed-state presentation. In fact, $\dim_0(Y)$ is nonzero (and roughly 1/2) only when $p = q$; see Fig. 4.

Additionally, for the parametrized SNS, the causal states act as a counter of the number of 0's since the last 1. So, we can think of the predictive distortion at coarse-graining $\epsilon$ as $d(R_\epsilon) \approx \mathbf{E} - E(N_\epsilon)$, which decreases exponentially quickly with $N_\epsilon$, rather than algebraically, under weak conditions. (See, for example, Ref. [29] and references therein.) Alternatively, from Eq. (B5), we note that the tail probability $\sum_{i=n}^{\infty} \pi(i)$ decreases exponentially with $n$ for the parametrized SNS; see Ref. [36] for details. From Fig. 4, we see that:

$N_\epsilon \sim \begin{cases} (1/\epsilon)^{1/2} & p = q \\[4pt] \left(\log\frac{1}{\epsilon}\right)^2 & p \neq q. \end{cases}$
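As a numerical illustration (ours, using the closed forms quoted above as reconstructed), one can tabulate $N_\epsilon$ directly from the $v_n$:

```python
import numpy as np

def sns_mixed_states(p, q, n_max):
    """Mixed-state coordinates v_n of the parametrized SNS (closed forms from App. C)."""
    n = np.arange(n_max, dtype=float)
    if p == q:
        return 1.0 - (1.0 / q - 1.0) / (n + (1.0 / p - 1.0))
    r = (1.0 - q) / (1.0 - p)
    return (q - p) / (q - p * r**n)      # numerically stabler, equivalent form of v_n

def n_eps(v, eps):
    """Number of nonempty eps-cells covering the mixed states, which lie on a line."""
    return len(set(np.floor(v / eps).astype(int)))

for p, q in [(0.5, 0.5), (0.5, 0.4)]:
    v = sns_mixed_states(p, q, n_max=2000)
    print(p, q, [n_eps(v, eps) for eps in (1e-2, 1e-3, 1e-4)])
```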

FIG. 4. Feature-set scaling of SNS mixed states: number $N_\epsilon$ of nonempty $\epsilon$-cubes as a function of $1/\epsilon$ on a log-log plot, for $p = q = 0.5$ and for $q = 0.4 < p = 0.5$. When $p = q$, the scaling relation appears to be a power law: $N_\epsilon \sim (1/\epsilon)^{1/2}$. When $p \neq q$, $N_\epsilon$ scales more slowly than a power law in $1/\epsilon$, seemingly at a rate $N_\epsilon \sim \left(\log\frac{1}{\epsilon}\right)^2$.


More information

Identification of Nonlinear Dynamic Systems with Multiple Inputs and Single Output using discrete-time Volterra Type Equations

Identification of Nonlinear Dynamic Systems with Multiple Inputs and Single Output using discrete-time Volterra Type Equations Identification of Nonlinear Dnamic Sstems with Multiple Inputs and Single Output using discrete-time Volterra Tpe Equations Thomas Treichl, Stefan Hofmann, Dierk Schröder Institute for Electrical Drive

More information

Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics )

Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) Learning the Linear Dynamical System with ASOS ( Approximated Second-Order Statistics ) James Martens University of Toronto June 24, 2010 Computer Science UNIVERSITY OF TORONTO James Martens (U of T) Learning

More information

Orientations of digraphs almost preserving diameter

Orientations of digraphs almost preserving diameter Orientations of digraphs almost preserving diameter Gregor Gutin Department of Computer Science Roal Hollowa, Universit of London Egham, Surre TW20 0EX, UK, gutin@dcs.rhbnc.ac.uk Anders Yeo BRICS, Department

More information

THE UNIVERSITY OF MICHIGAN. Timing Analysis of Domino Logic

THE UNIVERSITY OF MICHIGAN. Timing Analysis of Domino Logic Timing Analsis of Domino Logic b David Van Campenhout, Trevor Mudge and Karem Sakallah CSE-T-296-96 THE UNIVESITY O MICHIGAN Computer Science and Engineering Division Department of Electrical Engineering

More information

LESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II

LESSON #12 - FORMS OF A LINE COMMON CORE ALGEBRA II LESSON # - FORMS OF A LINE COMMON CORE ALGEBRA II Linear functions come in a variet of forms. The two shown below have been introduced in Common Core Algebra I and Common Core Geometr. TWO COMMON FORMS

More information

Symmetry Arguments and the Role They Play in Using Gauss Law

Symmetry Arguments and the Role They Play in Using Gauss Law Smmetr Arguments and the Role The la in Using Gauss Law K. M. Westerberg (9/2005) Smmetr plas a ver important role in science in general, and phsics in particular. Arguments based on smmetr can often simplif

More information

CCSSM Algebra: Equations

CCSSM Algebra: Equations CCSSM Algebra: Equations. Reasoning with Equations and Inequalities (A-REI) Eplain each step in solving a simple equation as following from the equalit of numbers asserted at the previous step, starting

More information

Nonlinear regression

Nonlinear regression 7//6 Nonlinear regression What is nonlinear model? In general nonlinearit in regression model can be seen as:. Nonlinearit of dependent variable ( ) for eample is binar (logit model), nominal (multivariate

More information

Information Theory. Mark van Rossum. January 24, School of Informatics, University of Edinburgh 1 / 35

Information Theory. Mark van Rossum. January 24, School of Informatics, University of Edinburgh 1 / 35 1 / 35 Information Theory Mark van Rossum School of Informatics, University of Edinburgh January 24, 2018 0 Version: January 24, 2018 Why information theory 2 / 35 Understanding the neural code. Encoding

More information

Trusses - Method of Sections

Trusses - Method of Sections Trusses - Method of Sections ME 202 Methods of Truss Analsis Method of joints (previous notes) Method of sections (these notes) 2 MOS - Concepts Separate the structure into two parts (sections) b cutting

More information

1/f spectral trend and frequency power law of lossy media

1/f spectral trend and frequency power law of lossy media 1/f spectral trend and frequenc power law of loss media W. Chen Simula Research Laborator, P. O. Box. 134, 135 Lsaker, Norwa (5 Ma 3) The dissipation of acoustic wave propagation has long been found to

More information

Using Expectation-Maximization for Reinforcement Learning

Using Expectation-Maximization for Reinforcement Learning NOTE Communicated by Andrew Barto and Michael Jordan Using Expectation-Maximization for Reinforcement Learning Peter Dayan Department of Brain and Cognitive Sciences, Center for Biological and Computational

More information

EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES

EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES 1 EQUIVALENT FORMULATIONS OF HYPERCONTRACTIVITY USING INFORMATION MEASURES CHANDRA NAIR 1 1 Dept. of Information Engineering, The Chinese Universit of Hong Kong Abstract We derive alternate characterizations

More information

Spotlight on the Extended Method of Frobenius

Spotlight on the Extended Method of Frobenius 113 Spotlight on the Extended Method of Frobenius See Sections 11.1 and 11.2 for the model of an aging spring. Reference: Section 11.4 and SPOTLIGHT ON BESSEL FUNCTIONS. Bessel functions of the first kind

More information

MAT 127: Calculus C, Fall 2010 Solutions to Midterm I

MAT 127: Calculus C, Fall 2010 Solutions to Midterm I MAT 7: Calculus C, Fall 00 Solutions to Midterm I Problem (0pts) Consider the four differential equations for = (): (a) = ( + ) (b) = ( + ) (c) = e + (d) = e. Each of the four diagrams below shows a solution

More information

Why is Deep Learning so effective?

Why is Deep Learning so effective? Ma191b Winter 2017 Geometry of Neuroscience The unreasonable effectiveness of deep learning This lecture is based entirely on the paper: Reference: Henry W. Lin and Max Tegmark, Why does deep and cheap

More information

Quantum versus Classical Measures of Complexity on Classical Information

Quantum versus Classical Measures of Complexity on Classical Information Quantum versus Classical Measures of Complexity on Classical Information Lock Yue Chew, PhD Division of Physics and Applied Physics School of Physical & Mathematical Sciences & Complexity Institute Nanyang

More information

BAYESIAN STRUCTURAL INFERENCE AND THE BAUM-WELCH ALGORITHM

BAYESIAN STRUCTURAL INFERENCE AND THE BAUM-WELCH ALGORITHM BAYESIAN STRUCTURAL INFERENCE AND THE BAUM-WELCH ALGORITHM DMITRY SHEMETOV UNIVERSITY OF CALIFORNIA AT DAVIS Department of Mathematics dshemetov@ucdavis.edu Abstract. We perform a preliminary comparative

More information

11.4 Polar Coordinates

11.4 Polar Coordinates 11. Polar Coordinates 917 11. Polar Coordinates In Section 1.1, we introduced the Cartesian coordinates of a point in the plane as a means of assigning ordered pairs of numbers to points in the plane.

More information

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS

Copyright, 2008, R.E. Kass, E.N. Brown, and U. Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Copright, 8, RE Kass, EN Brown, and U Eden REPRODUCTION OR CIRCULATION REQUIRES PERMISSION OF THE AUTHORS Chapter 6 Random Vectors and Multivariate Distributions 6 Random Vectors In Section?? we etended

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I,

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures AB = BA = I, FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 7 MATRICES II Inverse of a matri Sstems of linear equations Solution of sets of linear equations elimination methods 4

More information

Strain Transformation and Rosette Gage Theory

Strain Transformation and Rosette Gage Theory Strain Transformation and Rosette Gage Theor It is often desired to measure the full state of strain on the surface of a part, that is to measure not onl the two etensional strains, and, but also the shear

More information

Fault Detection and Diagnosis Using Information Measures

Fault Detection and Diagnosis Using Information Measures Fault Detection and Diagnosis Using Information Measures Rudolf Kulhavý Honewell Technolog Center & Institute of Information Theor and Automation Prague, Cech Republic Outline Probabilit-based inference

More information

Answer Explanations. The SAT Subject Tests. Mathematics Level 1 & 2 TO PRACTICE QUESTIONS FROM THE SAT SUBJECT TESTS STUDENT GUIDE

Answer Explanations. The SAT Subject Tests. Mathematics Level 1 & 2 TO PRACTICE QUESTIONS FROM THE SAT SUBJECT TESTS STUDENT GUIDE The SAT Subject Tests Answer Eplanations TO PRACTICE QUESTIONS FROM THE SAT SUBJECT TESTS STUDENT GUIDE Mathematics Level & Visit sat.org/stpractice to get more practice and stud tips for the Subject Test

More information

Characterization of the Skew-Normal Distribution Via Order Statistics and Record Values

Characterization of the Skew-Normal Distribution Via Order Statistics and Record Values International Journal of Statistics and Probabilit; Vol. 4, No. 1; 2015 ISSN 1927-7032 E-ISSN 1927-7040 Published b Canadian Center of Science and Education Characterization of the Sew-Normal Distribution

More information

10.5 Graphs of the Trigonometric Functions

10.5 Graphs of the Trigonometric Functions 790 Foundations of Trigonometr 0.5 Graphs of the Trigonometric Functions In this section, we return to our discussion of the circular (trigonometric functions as functions of real numbers and pick up where

More information