A framework for the timing analysis of dynamic branch predictors

Size: px
Start display at page:

Download "A framework for the timing analysis of dynamic branch predictors"

Transcription

1 A framework for the timing analysis of dynamic ranch predictors Claire Maïza INP Grenole, Verimag Grenole, France Christine Rochange IRIT - CNRS Université de Toulouse, France rochange@irit.fr Astract Real-time systems require predictale processor ehavior such that tight upper ounds on the worst-case execution times (WCETs) of their critical tasks can e derived. Methods for estimating these upper ounds received much attention in the last fifteen years. However, the use of high-performance processors to meet ever growing performance requirements demands the development of more and more complex models. This is also the case for dynamic ranch prediction schemes that are implemented in processors used in emedded systems. In this paper we focus on the static timing analysis of dynamic ranch prediction ased on the implicit path enumeration technique. The goal of this paper is to introduce a framework that allow to generate the IPET model of dynamic ranch prediction mechanisms. With this framework we generated the models of some ranch predictors and compare them. 1. Introduction Real-time systems differ from other computer systems y the fact that respecting timing constraints is as important as providing correct results. In other words, the system not only has to deliver correct results, ut must also deliver them in time. This must e proved y timing analysis. The ojective is to assess that the longest execution time of each task makes it possile to uild a task schedule that meets the constraints. When time-critical applications are considered, the timing analysis is expected to produce safe upper ounds of the execution times, also called upper ounds of the WCETs, for Worst-Case Execution Times. Techniques for WCET analysis have een investigated for more than twenty years ut they always need improvement and extensions to support increasingly complex hardware (needed to fulfill ever growing performance requirements). For example, recent emedded processors like ARM 11 and Cortex-A8 or PowerPC 750 implement dynamic ranch prediction schemes. Their ehavior is to e taken into account when computing WCETs. Several solutions have een proposed in the literature and will e descried elow. Our contriution in this paper is a general framework to derive models for dynamic ranch predictors the most frequently availale in emedded cores. The static analysis of dynamic ranch prediction can e local, i.e., each ranch is considered in isolation and its ehavior is determined y the algorithmic structure it implements, or gloal, i.e., all ranches are analyzed conjunctly so that their possile interactions can e accounted for. Local approaches [5, 3, 2] aim at computing the numer of mispredictions for each ranch y only considering its previous executions. Then, they are limited to simple dynamic ranch predictors such as imodal predictors (see Section 2) that do not rely on a gloal history of ranches. Furthermore, except for [5] that is ased on the source code, the local approach is only used under the very optimistic assumption that the entries of the prediction tale are not shared: The effect of sharing needs to e analysed separately, in a similar way of cache analysis, and in the case of conflicts, the ranches are considered as mispredicted. Gloal approaches descrie the ehavior of the ranch predictor as a set of constraints [7, 4] that extend the ILP formulation used to estimate the WCET of the program with the IPET method [8]. The solution of the ILP prolem includes the execution count of each ranch and its numer of mispredictions along oth directions. Previous work using the gloal approach introduce a system of constraints to model one specific ranch predictor: one ased on local histories and 1-it counters [7] and one ased on a gloal history and 2-it counters [4]. In this paper, we consider the gloal approach. The main contriution of the paper is a framework to model ranch predictors in a generic way, which extends previous solutions [7, 4]. This framework allows modeling any ranch predictor that uses a ranch history tale (BHT), 1-it or 2-it counters to compute the prediction and various ways to index the BHT (address of the ranch, local or gloal history, or any comination). It is structured in three parts, each one eing related to a view of the BHT ehavior: computation of the BHT index, computation of

2 the interdependencies etween ranches sharing the same BHT entry, and computation of the prediction y prediction counters. The framework allows to model a large numer of BHT configurations. It is generic on: - the size of the BHT, - the ranch history (local or gloal), - the size of the ranch history, - the matching function used to index the BHT (for instance: exclusive-or), - the size of prediction counters (1 it or 2 its). Varying these parameters, we generate some models of ranch predictors. The second contriution of the paper is an evaluation of the gloal approach ased on the generated models. We use the modeling complexity notion [4] to compare the size of the models. From this comparison we conclude aout the gloal approach and the ranch predictors that fit etter to e modeled y this approach. Note that the timing analyses of ranch prediction using these two approaches are limited to architectures that do not exhiit timing anomalies [?], i.e., a misprediction is the worst case and can e accounted for y a fixed constant penalty. The paper is organized as follows. Section 2 gives a short overview of dynamic ranch prediction and lists the schemes that are implemented in some emedded processors. In Section 4 we introduce our framework and show how it can e used to generate the models of some dynamic ranch predictors. The approach is evaluated in Section 5 and we give some concluding remarks in section Dynamic ranch prediction Branch prediction enhances the pipeline performance y allowing the speculative fetching of instructions along the predicted path after a conditional ranch has een encountered and until it is resolved. If the ranch was mispredicted, the pipeline is flushed and the correct path is fetched and executed. The BTB (Branch Target Buffer) is used to (i) detect the presence of a ranch instruction in the flow of fetched instructions efore it has een decoded and (ii) to provide the ranch target address in case the ranch is predicted as taken. The BTB is indexed y the lower its of the program counter (PC) and each entry is tagged with the ranch address (PC) for verification. Possile directions of a ranch instructions are taken (usually coded y 1 ) or not taken (coded y 0 ). The dynamic prediction of a ranch direction is generally ased on a counter that reflects the ranch history. A 1- it counter stores the last outcome of the ranch. More often, a 2-it saturating counter is used. Its four possile states are show in Figure 1: strongly taken (T), weakly taken (t), weakly not taken (nt) and strongly not taken (NT). The counter is incremented when the ranch Figure 1. Behavior of a 2-it ranch prediction counter is taken and decremented otherwise. The next instance of the ranch is predicted as taken whenever the upper it is set [11] and not taken when it it cleared. Several prediction counters(to e used y different ranches) are maintained in a BHT (Branch History Tale). Dynamic ranch predictors differ y the way they index the BHT: the imodal ranch predictor uses the ranch instruction address (PC) [11], while gloal ranch predictors use an n-it register that stores the outcomes of the n last executed ranches [12]. For these schemes, the BHT can e directly indexed y the gloal history register (see Figure 2) or y a function (generally an exclusive- OR) of the history and the ranch PC. This last scheme is known under the name gshare. A local history, private to each ranch, can also e used. Note that more complex schemes exist ut we have not found them implemented in any emedded processor. Therefore, we do not consider them in this paper. Unlike the Branch Target Buffer, the Branch History Tale is usually not tagged to reduce costs. Two ranches reaching the same entry of the BHT use the same prediction counter. Shared entries are more frequent when the BHT is indexed y a gloal history, even when this phenomenon known as aliasing is reduced y XOR-ing the history with the ranch PC. Sharing can e advantageous (i.e. a ranch takes advantage from the fact that its 2-it counter has een updated y another ranch) ut it is more often destructive. Tale 1 shows the ranch prediction schemes implemented in four emedded processors. The BHT is indexed y the ranch PC in most of the cases. However, one processor implements a gloal-history ranch predictor. Core BHT size BHT Index Prediction ARM-CortexA history and PC a 2-it ARM PC 2-it PowerPC PC 2-it MIPS-34K 512 PC 2-it a 10 its of history and 4 its of PC used to compute the index unified BTB and BHT Tale 1. Branch predictors implemented in five emedded processors 2

3 Figure 2. A gloal-history dynamic ranch predictor In this paper, we focus on these types of dynamic predictors. The prediction is ased on 2-its counters, and the BHT is indexed y a function of the ranch instruction address and the gloal history. In the remainder of this paper, we focus on the analysis of the BHT ehavior. Modeling the BTB (that contains the target addresses for ranches that are predicted taken) is out of the scope of this work. For this reason, we assume that each ranch always hits in the BTB. As future work, we plan to comine our analysis to a BTB analysis such as [5, 6]. 3. Modeling dynamic ranch prediction y IPET Throughout this paper, we use the following notation. A control flow graph is given y: CFG = (X,E,entry,exit) where X = { 0,..., n } denotes the set of asic locks i of the program and E X X the corresponding set of edges connecting them. The entry node is denoted y entry and the exit node y exit. Furthermore, we denote the set of asic locks containing a conditional ranch y: B X. B = { X x 1,x 2 X,(,x 1 ),(,x 2 ) E and x 1 x 2 } We also use the set of predecessors of a lock and the set of successors: i X,S i = {s X (i,s) E} i X,P i = {p X (p,i) E} Finally, we use a partial function lael on the edges to mark the direction of the conditional ranches: B,lael((,s 1 )) lael((,s 2 )) where s 1 and s 2 are the successors of (S = {s 1,s 2 }) and the direction of a conditional ranch is taken (1) or nottaken (0): D = {0,1} ( s S,lael((,s)) D) Figure 3 shows a CFG that is used as an example in this paper. For this CFG, we have X = { }, entry = 0, exit = 2, B = { 1, 3, 5 }, S 3 = { 4, 5 }, lael(( 3, 4 )) = 1, lael(( 3, 5 )) = WCET analysis with the IPET The IPET method [8] aims at determining the longest execution path and computing an upper ound on the WCET. It considers the control-flow graph of the task under analysis. A set of integer variales stands for the execution counts of the nodes (asic locks) and edges. The Figure 3. Example CFG execution time of the task can e expressed y T = x i t i, i X where x i and t i are the execution count and the worst-case execution time of asic lock i, respectively. Estimating an upper ound on the WCET consists of maximizing this expression under a set of linear constraints. Flow constraints express the control flow structure of the task, i.e., the relations that must exist etween the execution counts of nodes (x i for asic lock i ) and edges (x i d j for the edge from i to j laeled with direction d in some equations, we will use the equivalent notation x i d ). The flow constraints are: i X,x i = x l + init i p P p i i i X,x i = x l + end i s S i s i end i = 1 i X i X init i = 1 Note that end exit = 1 and i X,i exit end i = 0. Similarly, init entry = 1 and i X,i entry init i = 0. From the CFG illustrated in Figure 3, the flow constraints for lock 3 are: x 3 = x x 8 u 3 + init 3 x 3 = x x end 3 3

4 To ound the execution counts, the system needs some additional constraints which we call semantic constraints. They express loop ounds, infeasile paths, etc [?]. For our example CFG, the system should least include the following semantic constraints: x MAX 1 x 0 u 1 x MAX 3 x where MAX n is a loop ound, i.e. the maximum numer of iterations of a loop, that depends on the semantic of the program. An upper ound on the WCET can e otained using Integer Linear Program solvers like lp solve 1. The solution is returned as the set of values of the execution counts that maximize the task execution time and satisfy all constraints Integrated analysis of dynamic ranch prediction The IPET method computes the overall execution time of a program y adding the execution times (costs) of the asics locks weighted y their respective execution count. To take the impact of ranch prediction into account, the expression of the execution time is extended with an additional cost for each lock that is the target of a conditional ranch whenever it is mispredicted. This cost includes all the effects of ranch prediction on the pipeline state. In this paper, we ignore the impact of ranch prediction on cache memories (considered as perfect 2 ). Note that a conditional ranch is always the last instruction of a asic lock. In the rest of the paper, for the sake of simplicity, B is used to denote oth a asic lock and the conditional ranch it ends with (if any). Variales m 0 and m 1 denote the numer of mispredictions for ranch B when it follows direction d {0,1} (0 is for not-taken, 1 for taken) while v d is the related cost. The overall execution-time can e expressed y: T = x i t i + i X j B d {0,1} m j d v j d Besides the addition of ranch misprediction penalties to the ojective function, additional linear constraints should express the ehavior of the ranch predictor. These constraints can e generated in a methodical way through the generic framework we introduce in the next section. They ound the worst-case numer of mispredictions for oth directions of each ranch B. 4. A framework to model dynamic ranch predictor using IPET In this section, we descrie our framework. We show how the ehaviour of any ranch predictor is modeled y the IPET using the gloal approach Perfect cache means that each access to the cache is a hit. When the ehaviour of the ranch predictor at a given point in the program is to e determined, three questions should e answered. The first one is: Which entry of the BHT is indexed to predict the ranch? Formally, the set of all indexes is denoted y Λ = [0,2 n ] where n is the size of the BHT. For each ranch B, Λ is the set of possile indexes of the BHT that may e used to predict this ranch. The index λ Λ may depend on the state π of the gloal history and on the ranch instruction Λ = {λ π Π,λ = f (@,π )} where Π Π is the set of possile values of the gloal history at (Π is the set of all values of the gloal history). Note is usually not the whole ranch instruction address ut only a significant part of it. For a imodal predictor, the index is computed directly from the ranch instruction address: Λ = f (@,π ) = {@}. In the case of gshare, the mapping function f is an exclusive-or: Λ = {λ π Π,λ π }. The set of equations related to this first question is given in Section 4.1. Note that these equations are not derived for predictors that do not use a history register (e.g. imodal). The second question is: Which other ranches may access the same BHT entry? Rememer that the capacity of the BHT is limited and it is not tagged with ranch addresses: some ranches may share the same BHT entry. Such ranches may update the 2-it counter according to their own ehaviour and thus influence the prediction. Equations that express the aliasing to the BHT and the order in which competing ranches access to the shared BHT entry are given in Section 4.2. The third question is: What may e the 2-it counter s state and thus the predicted direction? Some equations are required to express the possile ehaviours of the 2-it counter stored in the indexed BHT entry. They are derived in Section 4.3. Finally, all these equations must e related to uild the whole dynamic ranch prediction model. This is explained in Section 4.4 The framework eing generic, the size of the BHT, the size of the history, and the index function are formally descried BHT index In this Section, we show how the evolution of the gloal history can e modeled y a system of constraints. A gloal history is a n-it register. The gloal history is updated when the ranch issue is known ( 1 for taken, 0 for not-taken). The update consists of a left shift of the n 1 rightmost its and an insertion of the issue of the last executed ranch to the rightmost position. For instance π = 0100 is the new value of the gloal history when π = 1010 was the previous value and the last ranch is solved as not-taken. The update of the history is denoted y π = π.d : the state π is the successor of the state π (left shift and insert d to the rightmost position). The value of the gloal history at a given point p in a program depends on the initial value of the gloal history and on 4

5 the issue of each executed ranch along the path from the entry node to p. We introduce a graph to model the evolution of the gloal history. It depicts the value of the gloal history depending on the initial value and the issue of previous ranches. B, π Π, x π = x π + p d initπ p P π π Π where π = π.d and p d B, π Π, x π = xπ + 0 xπ + 1 endπ end π 1 B B init π 1 In [7] the value of gloal history is computed for each asic lock of the CFG. As this history may only evolve after a conditional ranch, we have enhanced this part of the gloal approach y only taking into account the asic locks that contain a conditional ranch Execution context Figure 4. Example: BHG. The Branch History Graph (BHG) expresses the overall evolution of the gloal history for a program. Each node of the BHG fits with a pair ( i,π) where i is a asic lock of the CFG containing a conditional ranch and π is one possile value of the gloal history at this ranch. The nodes laelled init target the ranches that may e the first ranch executed. For the sake of simplicity, end nodes are not shown on this graph and on the following ones. The BHG can e automatically uilt from the CFG. Figure 4 shows the BHG uilt for the CFG of Figure 3. From the BHG, we derive a set of equations that will e included in the IPET formulation. The value of the gloal history depends on the issue of the last executed ranches: we use p d to denote that p B is the last ranch to e executed efore B and d was the direction of ranch p (used to update the gloal history). One execution of the program under analysis consists of one path in the BHG from a node (,π), where is the first executed ranch and π the initial value of the ranch history, to a node (,π ) where is the last executed ranch and π the last value of the history. Flow constraints are derived for each node of the BHG: they link the executions counts of locks with a given history register value (x π ) to the execution counts of their in- and out-edges (x π ). Furthermore, if is the first executed d ranch and π the initial value of the gloal history then init π = 1. Similarly, endπ = 1 if is the last executed ranch. The set of predecessors of a node (,π) in the BHG is denoted y P π. The BHT is a Λ -entry tale. Each ranch may access Λ 2-it counters during the execution of the program. For a given λ Λ, the prediction of the ranch depends on all ranches sharing this entry. B λ denotes the set of ranches that may access entry λ of the BHT: B λ = { B λ Λ }. The prediction counter at λ is updated at each execution of one ranch accessing it. Then, the prediction depends on the access sequence to the entry λ. In other words, the prediction of a ranch accessing the entry λ depends on the execution sequence of ranches in B λ. To model the sharing, we introduce a new graph. It depicts the possile execution sequences etween ranches in B λ. A Branch Conflict Graph (BCG λ ) can automatically e derived for each BHT entry and contains as many locks as the numer of ranches sharing this entry ( B λ ). Figure 5 shows BCG λ=00 for the example program (B λ=00 = {B 1,B 3,B 5 }). The edge from 5 to 3 states that there is a path in the BHG from ( 5,00) to ( 3,00) and 3 is the first lock with history 00 along this path. Note that the lael on this edge represents the direction of ranch 5: 0 or 1 is used when oth directions may lead to the target lock. Each BCG is descried y IPET: One execution of the program under analysis may consist of at most one path in a BCG from an initial node to a final node. The init edges of a BCG λ target the possile initial nodes: Branches that may e the first executed with the index value λ (init λ = 1 if is the first executed ranch with λ). Similarly, the end edges (not shown) start on the final nodes: Branches that may e the last executed with this history value (end λ =1 if is the last executed ranch with λ). The set of constraints corresponding to BCG λ links the execution order etween ranches sharing the λ-th entry of the BHT. P λ (resp. S λ ) denote the set of predecessors (resp. successors) of in BCG λ. 5

6 B λ, x λ = (x λ + s S λ s 0 xλ ) + s 1 endλ B λ, x λ = (x λ + p P λ p 0 xλ ) + p 1 initλ end λ 1 B λ init λ 1 B λ 4.3. Prediction computation All ranches in B λ may e predicted y the counter in the λ-th entry of the BHT. The prediction is given y the state c of the counter (c C and C = {00,01,10,11}) and the update of the counter is done according to the update of a saturating 2-it counter. We introduce a new graph representing the evolution of the 2-it counter depending on the initial state and the possile transitions. A Prediction Graph (PG λ ) is uilt for each prediction counter (each entry of the BHT) and expresses the possile evolution of this counter. Figure 6 shows PG λ=00 for the example program in case the BHT is indexed y the gloal history. As in the previous steps, this graph is descried y a system of constraints. Note that the set of possile transitions to reach (resp. leave) a state depends on the set of possile successors (resp. predecessors) in the related BCG λ. One execution of the program under analysis consists of at most one path in a PG: from the initial state to the final state. B λ, x λ,00 = (x λ,00 + p P λ p 0 xλ,01 ) + p 0 initλ,00 x λ,00 = (x λ,00 + s S λ s 0 xλ,00) + s 1 endλ,00 x λ,01 = (x λ,00 + p P λ p 1 xλ,10 ) + p 0 initλ,01 x λ,01 = (x λ,01 + s S λ s 0 xλ,01) + s 1 endλ,01 x λ,10 = (x λ,01 + p P λ p 1 xλ,11 ) + p 0 initλ,10 x λ,10 = (x λ,10 + s S λ s 0 xλ,10) + s 1 endλ,10 x λ,11 = (x λ,10 + p P λ p 1 xλ,11 ) + p 1 initλ,11 x λ,11 = (x λ,11 + s S λ s 0 xλ,11) + s 1 endλ,11 B λ c C B λ c C end λ,c 1 init λ,c 1 A misprediction occurs each time a fired transition does not match to the prediction given y its source state, e.g., leaving the state 00 y a transition laeled 1. m λ = (x λ, s S λ s 1 xλ,01) s 1 m λ = (x λ, s S λ s 0 xλ,11) s 0 Figure 5. Example: Execution order etween ranches sharing the entry 00 of the BHT (Branch Conflict Graph) In [7] Li et al. consider a 1-it counter. In this case, the evolution does not need to e modeled y additional constraints ecause the counter state is the issue of the last executed ranch. The numer of mispredictions is given y: m λ = x 0 1 s S λ s 1 m λ = x 1 0 s S λ s 0 In the remainder of this paper, we only focus on 2-it counters which are the most frequently used due to their highest accuracy Model of ranch predictor In this susection we descrie the whole framework. First, we focus on the generation of the graphs: ranch history graph, ranch conflict graphs and prediction graphs. Then, we show how to uild the whole system of constraints: it is ased on the three susets of constraints descried in previous susections and on some additional constraints that are necessary to create a link etween the three susets and the IPET from the CFG. Finally, we summarize how to use the framework: what are the parameters and an example of configuration. Graph generation All the graphs on which the generation of constraints is ased on are generated from the CFG. The BHG is given y a partition of the CFG lock: The BHG contains only the asic locks with two outgoing edges (conditional ranch). Each BCG is a partition of the BHG or of the CFG. BCG λ is uilt from: - all BHG locks (,π) with λ = f (π), in case a gloal history is used, - all CFG locks ending y a conditional ranch which BHT index = λ for the imodal predictor. From each BCG λ, one PG λ is uilt: for each edge in BCG λ the corresponding transitions are generated in the related PG λ. 6

7 Figure 6. Example: 2-it counter evolution for λ = 00 (PG 00, history-indexed BHT) x x π x λ x λ,c CFG BHG BCG λ PG λ Tale 2. Variale used for the numer of occurences of a lock in the different graphs. The whole framework In the rest of this section, we explain how to uild the whole system of constraints and list the additional constraints generated for that purpose. Tale 2 reminds the notation for the numer of occurences of a lock depending on the corresponding graph. A BHG is needed when a history is used to index the BHT. If the histories are local, then a BHG is uilt for each history register 3. In case of a gloal history, one BHG is generated. In a BHG, more than one lock (,π) may refer to the same lock in the CFG. The set of constraints generated to link the susystems related to the CFG and the BHG is: B, x = π Π x π B, d {0,1}, x d = π Π x π d + endπ d B, π Π, end π = endπ 0 + endπ 1 Note that accounting for the last ranch in the BHG does not include the last issue of this ranch. However, in the CFG this last issue is needed to otain the correct numer of occurences of the related edge in the CFG. One BCG λ is generated for each λ Λ. The Λ susets of constraints derived from those BCGs are linked to the whole system of constraints y: 3 Note that in case of pattern history tale (PHT) containing local histories, some ranches may share a PHT entry: A BHG is derived for each entry of the PHT. B, λ Λ x λ = xπ where λ = f (π,@) B, d {0,1}, x d = x λ + s d endλ d λ Λ s S λ B, λ Λ, end λ = endλ 0 + endλ 1 Note that in case of a imodal ranch predictor, x π = x and f (π,@) For each BCG λ one PG λ is generated. B, λ Λ x λ = x λ,c c C B, λ Λ, s S λ, B, λ Λ, B, λ Λ, end λ = c C init λ = c C xλ = x λ,c s d c C s d end λ,c init λ,c B, d {0,1}, m d = λ Λ m λ d Note that if B λ is the first (resp. last) executed with λ then init λ = init λ,c = 1 (resp. end λ = end λ,c = 1). c C c C Configuration of the framework The gloal approach was introduced y Li et al. [7]. The first model was for a 1-it gloal predictor whose BHT is indexed y the ranch history. Their model accounts for interferences due to BHT entries sharing. This model corresponds to the following configuration of our framework: - ranch history: gloal - index of the BHT: λ = f (π,@) = π - size of the prediction counter: 1 it In [4], a model of a ranch predictor ased on 2-it counters was introduced and we used it to uild the models of two 2-it ranch predictors: a imodal predictor (PCased indexing) and a gloal 2-it predictor whose BHT 7

8 is indexed y a gloal ranch history. This model corresponds to the following configuration of our framework: - ranch history: gloal - index of the BHT: λ = f (π,@) = π or λ = f (π,@) - size of the prediction counter: 2 its In this paper, we present the model in a generic way that eases the understanding of the set of constraints. Furthermore, this framework allows to model any ranch predictor which would use the BHT. The configuration of the framework depends on: - size of the BHT - size of the history - ranch history: gloal or local - index of the BHT: λ = f (π,@) - size of the prediction counter: 2 its or 1 it Note that the index may e any index that could e generated from the ranch instruction address and a ranch history. Furthermore, they may e more than one history if a PHT is used. The framework is implemented in OTAWA: a WCET computation tool [1]. For a given ranch predictor, this tool may compute the WCET y integrating the constraints in the whole ILP system of constraints. It may also generate separately the system of constraints for the ranch predictor model that could e added in any other WCET tool. This only requires to use the same variales for the CFG nodes. 5. Experimental results Rememer the models uilt y our framework are to e included in the IPET formulation of WCET computation to take into account the modeled ranch predictor. In this section, our goal is to give an insight into how complex the generated models can e. In other words, we evaluate the gloal approach y comparing some models generated y our framework. With this information the end-user may retain or reject a possile target core with dynamic ranch prediction, depending on the resources (including time) he is ready to allocate to timing analysis. This may e useful when the development processes rely on very short steps for new code generation and analysis etween two test phases. Before this, we report some ounds on the WCET that we evaluated for some enchmark codes. They show that taking dynamic ranch prediction into account when performing WCET analysis is really needed to avoid the overestimation of considering every conditional ranch as mispredicted. Using the framework, we have generated models for three ranch predictors ased on 2-it counters. They differ in how the BHT is indexed. The first one ) is the so-called imodal predictor that selects a BHT entry from the ranch address (direct-mapping scheme: λ = f (π,@) The second one (BP H ) uses the gloal history (uilt from the outcomes of the last executed ranches) as an index to the BHT (λ = f (π,@) = π). The BP H H ficall -27.7% -25.0% -27.7% insertsort -10.9% -10.1% -9.1% jfdctint -4.5% -3.7% -3.5% matmul -8.1% -7.8% -6.9% qurt -11.2% -5.4% -9.8% Tale 4. Impact on the estimated WCET third one H ), known as gshare, computes the index to the BHT y XOR-ing some its of the gloal history and some its of the ranch PC (λ = f (π,@) = For the three of them, we assume a 16-entry Branch History Tale (choosing such a small size was dictated y the small size of our enchmarks). For the two last ones, we consider a 4-it gloal history. Bits 4 to 7 of the ranch instrction address are used to index the BHT when the ranch predictor is imodal or gshare. Our experiments consider a processor with a 2-way superscalar 4-stage pipeline that implements out-of-order execution. We assume that the system includes SRAM memories for instructions and data, with a short (1-cycle) latency. The models for dynamic ranch predictors have een implemented in OTAWA, our WCET computation tool [1]. Implementing a ranch predictor model consists in adding specific constraints to the asic IPET formulation. We consider a set of enchmarks that elong to the SNU-RT collection 4 and are frequently considered for the evaluation of WCET analysis techniques. They are listed in Tale Impact of ranch prediction modeling on the WCET Tale 4 shows how modeling dynamic ranch prediction improves the accuracy of WCET estimates: taking into account the ehavior of the ranch predictor instead of considering each ranch as mispredicted. Results are provided for each of the three predictors considered in this paper. For most of the enchmarks, taking dynamic ranch prediction into account improves the estimated WCET y 5% to 10%, and the improvement even reaches more than 25% for the ficall program. The weakest results are for the jfdctint program that contains only three ranches executed in sequence. These results show how important it is to model dynamic ranch prediction when the WCET accuracy is critical. However, modeling ranch prediction has a cost in terms of WCET analysis time. This will e shown elow and should e considered when selecting a processor with dynamic ranch prediction Modeling complexity of dynamic ranch predictors In an earlier paper [4], we argued that the ILP modeling complexity of a particular scheme may e quantified y the numer of constraints (C), numer of variales (V ) 4 8

9 Benchmark # s # chs Function ficall 7 1 Summing the Fionacci series insertsort 8 2 Insertion sort for 10 integer numers jfdctint 13 3 JPEG slow-ut-accurate integer implementation of the forward DCT (Discrete Cosine Transform) matmul 19 5 Matrix multiplication qurt Root computation of quadratic equations Tale 3. Benchmarks Benchmark # cons. (C) # vars (V ) arity (A) ficall insertsort jfdctint matmul qurt BP H H ficall insertsort jfdctint matmul qurt Tale 5. Initial complexity of the ILP formulation Tale 7. Aliasing to the BHT (# ranches sharing an entry) and the arity (A) which is defined as the maximum numer of variales per constraint in the system. In [4], on a numer of enchmarks, we oserved that the time to solve the ILP prolem (i.e. to determine an upper ound of the WCET considering dynamic ranch prediction) was closely related to these three values. The modeling complexity helps in getting information aout the complexity of the models independently of the solver that may e used. In [4], we provided formulas that express the three complexity values as a function of the program characteristics (e.g. numer of conditional ranches, numer of possile history values for each ranch, etc.). Further details, including the modeling complexity equations of the three ranch predictors compared in this section, are given in a technical report [10] which forms an extended version of this paper. In the following, we use (C,V,A) triplet to evaluate the complexity of the model for a given dynamic ranch predictor. Tale 6 shows the the modeling complexity of three dynamic ranch predictors. For all the enchmarks, the imodal predictor ) is the one that exhiits the lowest complexity y far. This is due to the asence of constraints that model the changes on the gloal history (since this predictor does not use it). On the contrary, the history indexed predictor (BP H ) has the largest complexity due to the fact that several ranches may share the same BHT entry. This occurs less often with the gshare H ) predictor that enhances the distriution of ranches over the tale entries. As a result, the complexity of H is generally lower than that of BP H. An exception is for ficall that has a very small numer of ranches. Tale 7 shows the mean numer of ranches that share a BHT entry, which quantifies the so-called phenomenon of aliasing. The imodal ) and gshare H ) predictors generally exhiit less conflicts than the pure gloal history-ased predictor (BP H ). It appears that the program # constraints (C) # variales (V ) arity (A) ficall insertsort jfdctint matmul qurt Tale 8. Impact of the history length on the complexity (4-it vs 2-it) with the highest rate of conflicts (qurt) is the one that exhiit the highest complexity y far. Our measurements also show that this program is the one that necessitates the longest time to solve the ILP prolem ( 200 ms while the other programs needed 0.2 ms for the imodal predictor). Tale 8 gives the modeling complexity of a gloal ranch predictor (BP H ) when considering two sizes for the prediction history register: 2 its and 4 its. Whatever the quantifier (C, V or A), the complexity significantly increases with the history register length. The results presented aove highlight the factors that influence the modeling complexity: the indexing scheme to the BHT, the length of the history register, the average numer of ranches sharing an entry in the BHT. Indexing the BHT with the gloal history register (BP H ) generates a high complexity, and this is expanded y a larger history register. Using an XOR operator H ) to comine the history and the ranch PC reduces the aliasing phenomenon and clearly reduces the complexity. We claim that these results should e kept in mind when selecting a processor with dynamic ranch prediction, at least when the computational complexity of WCET analysis is an issue. 9

10 # constraints (C) # variales (V ) arity (A) H BP H H BP H H BP H ficall insertsort jfdctint matmul qurt average Tale 6. Modeling complexity of 2-it dynamic ranch predictors 6. Conclusion In hard real-time systems, the execution of programs must meet deadlines. An upper ound on the worst-case execution time is computed to check whether timing constraints can e guaranteed. Static WCET analysis requires a detailed modeling of the execution of asic locks in the processor, which has long een restricted to very simple cores. However, performance requirements now lead to use more complex processors that implement sophisticated schemes, such as dynamic ranch predictors which we focus on in this paper. Developing a model for a dynamic ranch predictor is a huge work. Several solutions have reported in the literature as mentioned in Section 1 ut each of them is applicale to a given scheme (i.e. with 1-it counters and local history registers) and cannot easily e extended to other ones. In this paper, we introduced a framework that accounts for the most common functionalities of dynamic ranch predictors and makes it possile to generate a model of a new predictor in an automatic way. We descried each step of the framework in sufficient details so that the reader can easily derive its own models. In the evaluation section, we use our framework to generate models for three kinds of dynamic ranch predictors ased on 2-it counters (y far the most frequently used) and, for some of them, on a gloal history register, which exacerates the interactions etween ranches. We show that modeling the ranch predictor instead of considering each ranch as mispredicted leads to tighter WCET estimates. However, our results also show that the complexity of the models is high, especially for the most sophisticated predictors. It generally overcomes the complexity of the integer linear program used to compute the WCET. Since taking ranch prediction into account when determining WCETs may not e affordale, our recommendation is to take this study into consideration when selecting a processor to run time-critical software. As future work, we plan to extend this framework to complementary solutions that focus on the analysis of the Branch Target Buffer, such as [5, 6]. References Software Technologies for Emedded and Uiquitous Systems, volume 6399 of Lecture Notes in Computer Science, pages Springer Berlin / Heidelerg, [2] I. Bate and R. Reutemann. Efficient integration of imodal prediction and pipeline analysis. In 11 th international conference on emedded and Real-Time Computing Systems and Applications (RTCSA 05), pages 39 44, august [3] C. Burguière and C. Rochange. A Case for Static Branch Prediction Modeling in Real-Time Systems. In Proceedings of the 11 th IEEE International Conference on Emedded and Real-Time Computing Systems and Applications(RTCSA 05), pages 33 38, august [4] C. Burguière and C. Rochange. On the Complexity of Modeling Dynamic Branch Predictors when Computing Worst-Case Execution Time. In Proceedings of the ERCIM/DECOS Workshop On Dependale Emedded Systems, august [5] A. Colin and I. Puaut. Worst-case execution time analysis for a processors with ranch prediction. Journal on Real- Time Systems, 18(2-3): , may [6] D. Grund, J. Reineke, and G. Gehard. Branch target uffers: WCET analysis framework and timing predictaility. Journal of Systems Architecture, [7] X. Li, T. Mitra, and A. Roychoudhury. Modeling Control Speculation for Timing Analysis. Journal on Real-Time Systems, 29(1):27 58, january [8] Y.-T. Li and S. Malik. Performance analysis of emedded software using implicit path enumeration. In Proceedings of the ACM/SIGPLAN Workshop on Languages, Compilers and Tools for Emedded Systems (LCTES 95), volume 30, pages 88 98, [9] T. Lundqvist and P. Stenström. A method to improve the estimated worst-case performance of data caching. In 6 th International Conference on Real-Time Computing Systems and Applications (RTCSA 99), decemer [10] C. Maïza and C. Rochange. A framework for the timing analysis of dynamic ranch predictors. Technical Report IRIT/RR FR, IRIT, [11] J. Smith. A Study of Branch Prediction Strategies. In Proceedings of the 8 th ACM/IEEE International Symposium on Computer Architecture (ISCA 1981), pages , [12] T.-Y. Yeh and Y. N. Patt. Alternative Implementation of the Two-Level Adaptative Branch Prediction. In Proceedings of the 19 th ACM/IEEE International Symposium on Computer Architecture (ISCA 1992), pages , [1] C. Ballariga, H. Cassé, C. Rochange, and P. Sainrat. OTAWA: An Open Toolox for Adaptive WCET Analysis. In S. Min, R. Pettit, P. Puschner, and T. Ungerer, editors, 10

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 02, 03 May 2016 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 53 Most Essential Assumptions for Real-Time Systems Upper

More information

Portland State University ECE 587/687. Branch Prediction

Portland State University ECE 587/687. Branch Prediction Portland State University ECE 587/687 Branch Prediction Copyright by Alaa Alameldeen and Haitham Akkary 2015 Branch Penalty Example: Comparing perfect branch prediction to 90%, 95%, 99% prediction accuracy,

More information

Worst-Case Execution Time Analysis. LS 12, TU Dortmund

Worst-Case Execution Time Analysis. LS 12, TU Dortmund Worst-Case Execution Time Analysis Prof. Dr. Jian-Jia Chen LS 12, TU Dortmund 09/10, Jan., 2018 Prof. Dr. Jian-Jia Chen (LS 12, TU Dortmund) 1 / 43 Most Essential Assumptions for Real-Time Systems Upper

More information

Preemption Delay Analysis for Floating Non-Preemptive Region Scheduling

Preemption Delay Analysis for Floating Non-Preemptive Region Scheduling Preemption Delay Analysis for Floating Non-Preemptive Region Scheduling José Marinho, Vincent Nélis, Stefan M. Petters, Isaelle Puaut To cite this version: José Marinho, Vincent Nélis, Stefan M. Petters,

More information

Timing analysis and predictability of architectures

Timing analysis and predictability of architectures Timing analysis and predictability of architectures Cache analysis Claire Maiza Verimag/INP 01/12/2010 Claire Maiza Synchron 2010 01/12/2010 1 / 18 Timing Analysis Frequency Analysis-guaranteed timing

More information

Branch Prediction using Advanced Neural Methods

Branch Prediction using Advanced Neural Methods Branch Prediction using Advanced Neural Methods Sunghoon Kim Department of Mechanical Engineering University of California, Berkeley shkim@newton.berkeley.edu Abstract Among the hardware techniques, two-level

More information

Unit 6: Branch Prediction

Unit 6: Branch Prediction CIS 501: Computer Architecture Unit 6: Branch Prediction Slides developed by Joe Devie/, Milo Mar4n & Amir Roth at Upenn with sources that included University of Wisconsin slides by Mark Hill, Guri Sohi,

More information

SUFFIX TREE. SYNONYMS Compact suffix trie

SUFFIX TREE. SYNONYMS Compact suffix trie SUFFIX TREE Maxime Crochemore King s College London and Université Paris-Est, http://www.dcs.kcl.ac.uk/staff/mac/ Thierry Lecroq Université de Rouen, http://monge.univ-mlv.fr/~lecroq SYNONYMS Compact suffix

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT CSE 560 Practice Problem Set 4 Solution 1. In this question, you will examine several different schemes for branch prediction, using the following code sequence for a simple load store ISA with no branch

More information

Pattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits

Pattern History Table. Global History Register. Pattern History Table. Branch History Pattern Pattern History Bits An Enhanced Two-Level Adaptive Multiple Branch Prediction for Superscalar Processors Jong-bok Lee, Soo-Mook Moon and Wonyong Sung fjblee@mpeg,smoon@altair,wysung@dspg.snu.ac.kr School of Electrical Engineering,

More information

CPSC 3300 Spring 2017 Exam 2

CPSC 3300 Spring 2017 Exam 2 CPSC 3300 Spring 2017 Exam 2 Name: 1. Matching. Write the correct term from the list into each blank. (2 pts. each) structural hazard EPIC forwarding precise exception hardwired load-use data hazard VLIW

More information

Program Analysis. Lecture 5. Rayna Dimitrova WS 2016/2017

Program Analysis. Lecture 5. Rayna Dimitrova WS 2016/2017 Program Analysis Lecture 5 Rayna Dimitrova WS 2016/2017 2/21 Recap: Constant propagation analysis Goal: For each program point, determine whether a variale has a constant value whenever an execution reaches

More information

Caches in WCET Analysis

Caches in WCET Analysis Caches in WCET Analysis Jan Reineke Department of Computer Science Saarland University Saarbrücken, Germany ARTIST Summer School in Europe 2009 Autrans, France September 7-11, 2009 Jan Reineke Caches in

More information

Fall 2011 Prof. Hyesoon Kim

Fall 2011 Prof. Hyesoon Kim Fall 2011 Prof. Hyesoon Kim Add: 2 cycles FE_stage add r1, r2, r3 FE L ID L EX L MEM L WB L add add sub r4, r1, r3 sub sub add add mul r5, r2, r3 mul sub sub add add mul sub sub add add mul sub sub add

More information

Accurate Estimation of Cache-Related Preemption Delay

Accurate Estimation of Cache-Related Preemption Delay Accurate Estimation of Cache-Related Preemption Delay Hemendra Singh Negi Tulika Mitra Abhik Roychoudhury School of Computing National University of Singapore Republic of Singapore 117543. [hemendra,tulika,abhik]@comp.nus.edu.sg

More information

Chaos and Dynamical Systems

Chaos and Dynamical Systems Chaos and Dynamical Systems y Megan Richards Astract: In this paper, we will discuss the notion of chaos. We will start y introducing certain mathematical concepts needed in the understanding of chaos,

More information

/ : Computer Architecture and Design

/ : Computer Architecture and Design 16.482 / 16.561: Computer Architecture and Design Summer 2015 Homework #5 Solution 1. Dynamic scheduling (30 points) Given the loop below: DADDI R3, R0, #4 outer: DADDI R2, R1, #32 inner: L.D F0, 0(R1)

More information

Special Nodes for Interface

Special Nodes for Interface fi fi Special Nodes for Interface SW on processors Chip-level HW Board-level HW fi fi C code VHDL VHDL code retargetable compilation high-level synthesis SW costs HW costs partitioning (solve ILP) cluster

More information

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables

Minimizing a convex separable exponential function subject to linear equality constraint and bounded variables Minimizing a convex separale exponential function suect to linear equality constraint and ounded variales Stefan M. Stefanov Department of Mathematics Neofit Rilski South-Western University 2700 Blagoevgrad

More information

Robot Position from Wheel Odometry

Robot Position from Wheel Odometry Root Position from Wheel Odometry Christopher Marshall 26 Fe 2008 Astract This document develops equations of motion for root position as a function of the distance traveled y each wheel as a function

More information

Lecture 6 January 15, 2014

Lecture 6 January 15, 2014 Advanced Graph Algorithms Jan-Apr 2014 Lecture 6 January 15, 2014 Lecturer: Saket Sourah Scrie: Prafullkumar P Tale 1 Overview In the last lecture we defined simple tree decomposition and stated that for

More information

Timing analysis and timing predictability

Timing analysis and timing predictability Timing analysis and timing predictability Caches in WCET Analysis Reinhard Wilhelm 1 Jan Reineke 2 1 Saarland University, Saarbrücken, Germany 2 University of California, Berkeley, USA ArtistDesign Summer

More information

Structuring Unreliable Radio Networks

Structuring Unreliable Radio Networks Structuring Unreliale Radio Networks Keren Censor-Hillel Seth Gilert Faian Kuhn Nancy Lynch Calvin Newport March 29, 2011 Astract In this paper we study the prolem of uilding a connected dominating set

More information

Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources

Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources Design and Analysis of Time-Critical Systems Response-time Analysis with a Focus on Shared Resources Jan Reineke @ saarland university ACACES Summer School 2017 Fiuggi, Italy computer science Fixed-Priority

More information

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES

#A50 INTEGERS 14 (2014) ON RATS SEQUENCES IN GENERAL BASES #A50 INTEGERS 14 (014) ON RATS SEQUENCES IN GENERAL BASES Johann Thiel Dept. of Mathematics, New York City College of Technology, Brooklyn, New York jthiel@citytech.cuny.edu Received: 6/11/13, Revised:

More information

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model

Luis Manuel Santana Gallego 100 Investigation and simulation of the clock skew in modern integrated circuits. Clock Skew Model Luis Manuel Santana Gallego 100 Appendix 3 Clock Skew Model Xiaohong Jiang and Susumu Horiguchi [JIA-01] 1. Introduction The evolution of VLSI chips toward larger die sizes and faster clock speeds makes

More information

Beyond Loose LP-relaxations: Optimizing MRFs by Repairing Cycles

Beyond Loose LP-relaxations: Optimizing MRFs by Repairing Cycles Beyond Loose LP-relaxations: Optimizing MRFs y Repairing Cycles Nikos Komodakis 1 and Nikos Paragios 2 1 University of Crete, komod@csd.uoc.gr 2 Ecole Centrale de Paris, nikos.paragios@ecp.fr Astract.

More information

Embedded Systems 23 BF - ES

Embedded Systems 23 BF - ES Embedded Systems 23-1 - Measurement vs. Analysis REVIEW Probability Best Case Execution Time Unsafe: Execution Time Measurement Worst Case Execution Time Upper bound Execution Time typically huge variations

More information

A Novel Meta Predictor Design for Hybrid Branch Prediction

A Novel Meta Predictor Design for Hybrid Branch Prediction A Novel Meta Predictor Design for Hybrid Branch Prediction YOUNG JUNG AHN, DAE YON HWANG, YONG SUK LEE, JIN-YOUNG CHOI AND GYUNGHO LEE The Dept. of Computer Science & Engineering Korea University Anam-dong

More information

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I.

Department of Electrical and Computer Engineering University of Wisconsin - Madison. ECE/CS 752 Advanced Computer Architecture I. Last (family) name: Solution First (given) name: Student I.D. #: Department of Electrical and Computer Engineering University of Wisconsin - Madison ECE/CS 752 Advanced Computer Architecture I Midterm

More information

Upper Bounds for Stern s Diatomic Sequence and Related Sequences

Upper Bounds for Stern s Diatomic Sequence and Related Sequences Upper Bounds for Stern s Diatomic Sequence and Related Sequences Colin Defant Department of Mathematics University of Florida, U.S.A. cdefant@ufl.edu Sumitted: Jun 18, 01; Accepted: Oct, 016; Pulished:

More information

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line

The Mean Version One way to write the One True Regression Line is: Equation 1 - The One True Line Chapter 27: Inferences for Regression And so, there is one more thing which might vary one more thing aout which we might want to make some inference: the slope of the least squares regression line. The

More information

Modifying Shor s algorithm to compute short discrete logarithms

Modifying Shor s algorithm to compute short discrete logarithms Modifying Shor s algorithm to compute short discrete logarithms Martin Ekerå Decemer 7, 06 Astract We revisit Shor s algorithm for computing discrete logarithms in F p on a quantum computer and modify

More information

CSCI-564 Advanced Computer Architecture

CSCI-564 Advanced Computer Architecture CSCI-564 Advanced Computer Architecture Lecture 8: Handling Exceptions and Interrupts / Superscalar Bo Wu Colorado School of Mines Branch Delay Slots (expose control hazard to software) Change the ISA

More information

1Number ONLINE PAGE PROOFS. systems: real and complex. 1.1 Kick off with CAS

1Number ONLINE PAGE PROOFS. systems: real and complex. 1.1 Kick off with CAS 1Numer systems: real and complex 1.1 Kick off with CAS 1. Review of set notation 1.3 Properties of surds 1. The set of complex numers 1.5 Multiplication and division of complex numers 1.6 Representing

More information

Energy-Efficient Scheduling of Real-Time Tasks on Cluster-Based Multicores

Energy-Efficient Scheduling of Real-Time Tasks on Cluster-Based Multicores Energy-Efficient Scheduling of Real-Time Tasks on Cluster-Based Multicores Fanxin Kong Wang Yi, Qingxu Deng Northeastern University, China Uppsala University, Sweden kongfx@ise.neu.edu.cn, yi@it.uu.se,

More information

A Detailed Study on Phase Predictors

A Detailed Study on Phase Predictors A Detailed Study on Phase Predictors Frederik Vandeputte, Lieven Eeckhout, and Koen De Bosschere Ghent University, Electronics and Information Systems Department Sint-Pietersnieuwstraat 41, B-9000 Gent,

More information

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide)

Issue = Select + Wakeup. Out-of-order Pipeline. Issue. Issue = Select + Wakeup. OOO execution (2-wide) OOO execution (2-wide) Out-of-order Pipeline Buffer of instructions Issue = Select + Wakeup Select N oldest, read instructions N=, xor N=, xor and sub Note: ma have execution resource constraints: i.e., load/store/fp Fetch Decode

More information

Fast inverse for big numbers: Picarte s iteration

Fast inverse for big numbers: Picarte s iteration Fast inverse for ig numers: Picarte s iteration Claudio Gutierrez and Mauricio Monsalve Computer Science Department, Universidad de Chile cgutierr,mnmonsal@dcc.uchile.cl Astract. This paper presents an

More information

Divide-and-Conquer. Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, CSE 6331 Algorithms Steve Lai

Divide-and-Conquer. Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, CSE 6331 Algorithms Steve Lai Divide-and-Conquer Reading: CLRS Sections 2.3, 4.1, 4.2, 4.3, 28.2, 33.4. CSE 6331 Algorithms Steve Lai Divide and Conquer Given an instance x of a prolem, the method works as follows: divide-and-conquer

More information

Shannon-Fano-Elias coding

Shannon-Fano-Elias coding Shannon-Fano-Elias coding Suppose that we have a memoryless source X t taking values in the alphabet {1, 2,..., L}. Suppose that the probabilities for all symbols are strictly positive: p(i) > 0, i. The

More information

Branching Bisimilarity with Explicit Divergence

Branching Bisimilarity with Explicit Divergence Branching Bisimilarity with Explicit Divergence Ro van Glaeek National ICT Australia, Sydney, Australia School of Computer Science and Engineering, University of New South Wales, Sydney, Australia Bas

More information

Polynomial Degree and Finite Differences

Polynomial Degree and Finite Differences CONDENSED LESSON 7.1 Polynomial Degree and Finite Differences In this lesson, you Learn the terminology associated with polynomials Use the finite differences method to determine the degree of a polynomial

More information

A Definition and Classification of Timing Anomalies

A Definition and Classification of Timing Anomalies Definition and Classification of Timing nomalies Jan Reineke 1, Björn Wachter 1, Stephan Thesing 1, Reinhard Wilhelm 1, Ilia Polian 2, Jochen Eisinger 2, and Bernd Becker 2 1 Saarland University Im Stadtwald

More information

Evaluation and Validation

Evaluation and Validation Evaluation and Validation Jian-Jia Chen (Slides are based on Peter Marwedel) TU Dortmund, Informatik 12 Germany Springer, 2010 2016 年 01 月 05 日 These slides use Microsoft clip arts. Microsoft copyright

More information

Predicting globally-coherent temporal structures from texts via endpoint inference and graph decomposition

Predicting globally-coherent temporal structures from texts via endpoint inference and graph decomposition Predicting gloally-coherent temporal structures from texts via endpoint inference and graph decomposition Pascal Denis, Philippe Muller To cite this version: Pascal Denis, Philippe Muller. Predicting gloally-coherent

More information

Simple Examples. Let s look at a few simple examples of OI analysis.

Simple Examples. Let s look at a few simple examples of OI analysis. Simple Examples Let s look at a few simple examples of OI analysis. Example 1: Consider a scalar prolem. We have one oservation y which is located at the analysis point. We also have a ackground estimate

More information

Weak Keys of the Full MISTY1 Block Cipher for Related-Key Cryptanalysis

Weak Keys of the Full MISTY1 Block Cipher for Related-Key Cryptanalysis Weak eys of the Full MISTY1 Block Cipher for Related-ey Cryptanalysis Jiqiang Lu 1, Wun-She Yap 1,2, and Yongzhuang Wei 3,4 1 Institute for Infocomm Research, Agency for Science, Technology and Research

More information

IN this paper we study a discrete optimization problem. Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach

IN this paper we study a discrete optimization problem. Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach Constrained Shortest Link-Disjoint Paths Selection: A Network Programming Based Approach Ying Xiao, Student Memer, IEEE, Krishnaiyan Thulasiraman, Fellow, IEEE, and Guoliang Xue, Senior Memer, IEEE Astract

More information

This is a repository copy of Attributed Graph Transformation via Rule Schemata : Church-Rosser Theorem.

This is a repository copy of Attributed Graph Transformation via Rule Schemata : Church-Rosser Theorem. This is a repository copy of Attriuted Graph Transformation via Rule Schemata : Church-Rosser Theorem. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/9/ Version: Accepted

More information

Fall 2008 CSE Qualifying Exam. September 13, 2008

Fall 2008 CSE Qualifying Exam. September 13, 2008 Fall 2008 CSE Qualifying Exam September 13, 2008 1 Architecture 1. (Quan, Fall 2008) Your company has just bought a new dual Pentium processor, and you have been tasked with optimizing your software for

More information

Regression-based Statistical Bounds on Software Execution Time

Regression-based Statistical Bounds on Software Execution Time Regression-based Statistical Bounds on Software Execution Time Peter Poplavko, Lefteris Angelis, Ayoub Nouri, Alexandros Zerzelidis, Saddek Bensalem, Panagiotis Verimag Research Report n o TR-2016-7 November

More information

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam Symmetry and Quantum Information Feruary 6, 018 Representation theory of S(), density operators, purification Lecture 7 Michael Walter, niversity of Amsterdam Last week, we learned the asic concepts of

More information

ENEE350 Lecture Notes-Weeks 14 and 15

ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining & Amdahl s Law ENEE350 Lecture Notes-Weeks 14 and 15 Pipelining is a method of processing in which a problem is divided into a number of sub problems and solved and the solu8ons of the sub problems

More information

Merging and splitting endowments in object assignment problems

Merging and splitting endowments in object assignment problems Merging and splitting endowments in oject assignment prolems Nanyang Bu, Siwei Chen, and William Thomson April 26, 2012 1 Introduction We consider a group of agents, each endowed with a set of indivisile

More information

Transposition Mechanism for Sparse Matrices on Vector Processors

Transposition Mechanism for Sparse Matrices on Vector Processors Transposition Mechanism for Sparse Matrices on Vector Processors Pyrrhos Stathis Stamatis Vassiliadis Sorin Cotofana Electrical Engineering Department, Delft University of Technology, Delft, The Netherlands

More information

Performance, Power & Energy

Performance, Power & Energy Recall: Goal of this class Performance, Power & Energy ELE8106/ELE6102 Performance Reconfiguration Power/ Energy Spring 2010 Hayden Kwok-Hay So H. So, Sp10 Lecture 3 - ELE8106/6102 2 What is good performance?

More information

QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS

QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS QUALITY CONTROL OF WINDS FROM METEOSAT 8 AT METEO FRANCE : SOME RESULTS Christophe Payan Météo France, Centre National de Recherches Météorologiques, Toulouse, France Astract The quality of a 30-days sample

More information

Automata, Logic and Games: Theory and Application

Automata, Logic and Games: Theory and Application Automata, Logic and Games: Theory and Application 2 Parity Games, Tree Automata, and S2S Luke Ong University of Oxford TACL Summer School University of Salerno, 14-19 June 2015 Luke Ong S2S 14-19 June

More information

The Incomplete Perfect Phylogeny Haplotype Problem

The Incomplete Perfect Phylogeny Haplotype Problem The Incomplete Perfect Phylogeny Haplotype Prolem Gad Kimmel 1 and Ron Shamir 1 School of Computer Science, Tel-Aviv University, Tel-Aviv 69978, Israel. Email: kgad@tau.ac.il, rshamir@tau.ac.il. Astract.

More information

The WHILE Hierarchy of Program Schemes is Infinite

The WHILE Hierarchy of Program Schemes is Infinite The WHILE Hierarchy of Program Schemes is Infinite Can Adam Alayrak and Thomas Noll RWTH Aachen Ahornstr. 55, 52056 Aachen, Germany alayrak@informatik.rwth-aachen.de and noll@informatik.rwth-aachen.de

More information

FinQuiz Notes

FinQuiz Notes Reading 9 A time series is any series of data that varies over time e.g. the quarterly sales for a company during the past five years or daily returns of a security. When assumptions of the regression

More information

Lecture 12: Grover s Algorithm

Lecture 12: Grover s Algorithm CPSC 519/619: Quantum Computation John Watrous, University of Calgary Lecture 12: Grover s Algorithm March 7, 2006 We have completed our study of Shor s factoring algorithm. The asic technique ehind Shor

More information

Section 8.5. z(t) = be ix(t). (8.5.1) Figure A pendulum. ż = ibẋe ix (8.5.2) (8.5.3) = ( bẋ 2 cos(x) bẍ sin(x)) + i( bẋ 2 sin(x) + bẍ cos(x)).

Section 8.5. z(t) = be ix(t). (8.5.1) Figure A pendulum. ż = ibẋe ix (8.5.2) (8.5.3) = ( bẋ 2 cos(x) bẍ sin(x)) + i( bẋ 2 sin(x) + bẍ cos(x)). Difference Equations to Differential Equations Section 8.5 Applications: Pendulums Mass-Spring Systems In this section we will investigate two applications of our work in Section 8.4. First, we will consider

More information

Depth versus Breadth in Convolutional Polar Codes

Depth versus Breadth in Convolutional Polar Codes Depth versus Breadth in Convolutional Polar Codes Maxime Tremlay, Benjamin Bourassa and David Poulin,2 Département de physique & Institut quantique, Université de Sherrooke, Sherrooke, Quéec, Canada JK

More information

Ahmed Nazeem and Spyros Reveliotis

Ahmed Nazeem and Spyros Reveliotis Designing compact and maximally permissive deadlock avoidance policies for complex resource allocation systems through classification theory: the non-linear case Ahmed Nazeem and Spyros Reveliotis Astract

More information

CS 4120 Lecture 3 Automating lexical analysis 29 August 2011 Lecturer: Andrew Myers. 1 DFAs

CS 4120 Lecture 3 Automating lexical analysis 29 August 2011 Lecturer: Andrew Myers. 1 DFAs CS 42 Lecture 3 Automating lexical analysis 29 August 2 Lecturer: Andrew Myers A lexer generator converts a lexical specification consisting of a list of regular expressions and corresponding actions into

More information

CONNECTOR ALGEBRAS FOR C/E AND P/T NETS INTERACTIONS

CONNECTOR ALGEBRAS FOR C/E AND P/T NETS INTERACTIONS Logical Methods in Computer Science Vol. 9(3:6)203, pp. 65 www.lmcs-online.org Sumitted Apr. 5, 202 Pulished Sep. 7, 203 CONNECTOR ALGEBRAS FOR C/E AND P/T NETS INTERACTIONS ROBERTO BRUNI a, HERNÁN MELGRATTI,

More information

A Multi-Periodic Synchronous Data-Flow Language

A Multi-Periodic Synchronous Data-Flow Language Julien Forget 1 Frédéric Boniol 1 David Lesens 2 Claire Pagetti 1 firstname.lastname@onera.fr 1 ONERA - Toulouse, FRANCE 2 EADS Astrium Space Transportation - Les Mureaux, FRANCE November 19, 2008 1 /

More information

Online Supplementary Appendix B

Online Supplementary Appendix B Online Supplementary Appendix B Uniqueness of the Solution of Lemma and the Properties of λ ( K) We prove the uniqueness y the following steps: () (A8) uniquely determines q as a function of λ () (A) uniquely

More information

Model-Based Design of Game AI

Model-Based Design of Game AI Model-Based Design of Game AI Alexandre Denault, Jörg Kienzle and Hans Vangheluwe School of Computer Science, McGill University Montréal, Canada, H3A 2A7 email: {adenau,joerg,hv}@cs.mcgill.ca KEYORDS Modern

More information

Pipeline no Prediction. Branch Delay Slots A. From before branch B. From branch target C. From fall through. Branch Prediction

Pipeline no Prediction. Branch Delay Slots A. From before branch B. From branch target C. From fall through. Branch Prediction Pipeline no Prediction Branching completes in 2 cycles We know the target address after the second stage? PC fetch Instruction Memory Decode Check the condition Calculate the branch target address PC+4

More information

Copyrighted by Gabriel Tang B.Ed., B.Sc. Page 111.

Copyrighted by Gabriel Tang B.Ed., B.Sc. Page 111. Algera Chapter : Polnomial and Rational Functions Chapter : Polnomial and Rational Functions - Polnomial Functions and Their Graphs Polnomial Functions: - a function that consists of a polnomial epression

More information

The Capacity Region of 2-Receiver Multiple-Input Broadcast Packet Erasure Channels with Channel Output Feedback

The Capacity Region of 2-Receiver Multiple-Input Broadcast Packet Erasure Channels with Channel Output Feedback IEEE TRANSACTIONS ON INFORMATION THEORY, ONLINE PREPRINT 2014 1 The Capacity Region of 2-Receiver Multiple-Input Broadcast Packet Erasure Channels with Channel Output Feedack Chih-Chun Wang, Memer, IEEE,

More information

3.5 Solving Quadratic Equations by the

3.5 Solving Quadratic Equations by the www.ck1.org Chapter 3. Quadratic Equations and Quadratic Functions 3.5 Solving Quadratic Equations y the Quadratic Formula Learning ojectives Solve quadratic equations using the quadratic formula. Identify

More information

Distributed Timed Automata with Independently Evolving Clocks

Distributed Timed Automata with Independently Evolving Clocks Distriuted Timed Automata with Independently Evolving Clocks S. Akshay,3, Benedikt Bollig, Paul Gastin, Madhavan Mukund 2, and K. Narayan Kumar 2 LSV, ENS Cachan, CNRS, France {akshay,ollig,gastin}@lsv.ens-cachan.fr

More information

CMP N 301 Computer Architecture. Appendix C

CMP N 301 Computer Architecture. Appendix C CMP N 301 Computer Architecture Appendix C Outline Introduction Pipelining Hazards Pipelining Implementation Exception Handling Advanced Issues (Dynamic Scheduling, Out of order Issue, Superscalar, etc)

More information

arxiv: v1 [cs.os] 6 Jun 2013

arxiv: v1 [cs.os] 6 Jun 2013 Partitioned scheduling of multimode multiprocessor real-time systems with temporal isolation Joël Goossens Pascal Richard arxiv:1306.1316v1 [cs.os] 6 Jun 2013 Abstract We consider the partitioned scheduling

More information

An Efficient Parallel Algorithm to Solve Block Toeplitz Systems

An Efficient Parallel Algorithm to Solve Block Toeplitz Systems The Journal of Supercomputing, 32, 251 278, 2005 C 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands. An Efficient Parallel Algorithm to Solve Block Toeplitz Systems P. ALONSO

More information

Vertical Implementation

Vertical Implementation Information and Computation 70, 95 33 (00) doi:0.006/inco.00.967, availale online at http://www.idealirary.com on Vertical Implementation Arend Rensink Faculty of Informatics, University of Twente, Postus

More information

Topological structures and phases. in U(1) gauge theory. Abstract. We show that topological properties of minimal Dirac sheets as well as of

Topological structures and phases. in U(1) gauge theory. Abstract. We show that topological properties of minimal Dirac sheets as well as of BUHEP-94-35 Decemer 1994 Topological structures and phases in U(1) gauge theory Werner Kerler a, Claudio Rei and Andreas Weer a a Fachereich Physik, Universitat Marurg, D-35032 Marurg, Germany Department

More information

H-infinity Optimal Distributed Control in Discrete Time

H-infinity Optimal Distributed Control in Discrete Time H-infinity Optimal Distriuted Control in Discrete Time Lidström, Carolina; Pates, Richard; Rantzer, Anders Pulished in: 2017 IEEE 56th Conference on Decision and Control, CDC 2017 DOI: 10.1109/CDC.2017.8264176

More information

Real-Time Issues in Live Migration of Virtual Machines

Real-Time Issues in Live Migration of Virtual Machines Real-Time Issues in Live Migration of Virtual Machines Faio Checconi 1, Tommaso Cucinotta 1, and Manuel Stein 2 1 Scuola Superiore Sant Anna, Pisa, Italy 2 Alcatel-Lucent Bell Las, Stuttgart, Germany Astract.

More information

Design of Distributed Systems Melinda Tóth, Zoltán Horváth

Design of Distributed Systems Melinda Tóth, Zoltán Horváth Design of Distributed Systems Melinda Tóth, Zoltán Horváth Design of Distributed Systems Melinda Tóth, Zoltán Horváth Publication date 2014 Copyright 2014 Melinda Tóth, Zoltán Horváth Supported by TÁMOP-412A/1-11/1-2011-0052

More information

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1

NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 NCU EE -- DSP VLSI Design. Tsung-Han Tsai 1 Multi-processor vs. Multi-computer architecture µp vs. DSP RISC vs. DSP RISC Reduced-instruction-set Register-to-register operation Higher throughput by using

More information

Module 9: Further Numbers and Equations. Numbers and Indices. The aim of this lesson is to enable you to: work with rational and irrational numbers

Module 9: Further Numbers and Equations. Numbers and Indices. The aim of this lesson is to enable you to: work with rational and irrational numbers Module 9: Further Numers and Equations Lesson Aims The aim of this lesson is to enale you to: wor with rational and irrational numers wor with surds to rationalise the denominator when calculating interest,

More information

Plan for Today and Beginning Next week (Lexical Analysis)

Plan for Today and Beginning Next week (Lexical Analysis) Plan for Today and Beginning Next week (Lexical Analysis) Regular Expressions Finite State Machines DFAs: Deterministic Finite Automata Complications NFAs: Non Deterministic Finite State Automata From

More information

LR-KFNN: Logistic Regression-Kernel Function Neural Networks and the GFR-NN Model for Renal Function Evaluation

LR-KFNN: Logistic Regression-Kernel Function Neural Networks and the GFR-NN Model for Renal Function Evaluation LR-KF: Logistic Regression-Kernel Function eural etworks and the GFR- Model for Renal Function Evaluation Qun Song, Tianmin Ma and ikola Kasaov Knowledge Engineering & Discovery Research Institute Auckland

More information

STEP Support Programme. Hints and Partial Solutions for Assignment 2

STEP Support Programme. Hints and Partial Solutions for Assignment 2 STEP Support Programme Hints and Partial Solutions for Assignment 2 Warm-up 1 (i) You could either expand the rackets and then factorise, or use the difference of two squares to get [(2x 3) + (x 1)][(2x

More information

Mathematics Background

Mathematics Background UNIT OVERVIEW GOALS AND STANDARDS MATHEMATICS BACKGROUND UNIT INTRODUCTION Patterns of Change and Relationships The introduction to this Unit points out to students that throughout their study of Connected

More information

2 discretized variales approach those of the original continuous variales. Such an assumption is valid when continuous variales are represented as oat

2 discretized variales approach those of the original continuous variales. Such an assumption is valid when continuous variales are represented as oat Chapter 1 CONSTRAINED GENETIC ALGORITHMS AND THEIR APPLICATIONS IN NONLINEAR CONSTRAINED OPTIMIZATION Benjamin W. Wah and Yi-Xin Chen Department of Electrical and Computer Engineering and the Coordinated

More information

N-Synchronous Kahn Networks A Relaxed Model of Synchrony for Real-Time Systems

N-Synchronous Kahn Networks A Relaxed Model of Synchrony for Real-Time Systems N-Synchronous Kahn Networks A Relaxed Model of Synchrony for Real-Time Systems Albert Cohen 1, Marc Duranton 2, Christine Eisenbeis 1, Claire Pagetti 1,4, Florence Plateau 3 and Marc Pouzet 3 POPL, Charleston

More information

Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Problem with Stochastic Demand and Duration Limits

Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Problem with Stochastic Demand and Duration Limits Rollout Policies for Dynamic Solutions to the Multi-Vehicle Routing Prolem with Stochastic Demand and Duration Limits Justin C. Goodson Jeffrey W. Ohlmann Barrett W. Thomas Octoer 2, 2012 Astract We develop

More information

Attributed Graph Transformation via Rule Schemata: Church-Rosser Theorem

Attributed Graph Transformation via Rule Schemata: Church-Rosser Theorem Attriuted Graph Transformation via Rule Schemata: Church-Rosser Theorem Ivaylo Hristakiev and Detlef Plump (B) University of York, York, UK detlef.plump@york.ac.uk Astract. We present an approach to attriuted

More information

New Constructions of Sonar Sequences

New Constructions of Sonar Sequences INTERNATIONAL JOURNAL OF BASIC & APPLIED SCIENCES IJBAS-IJENS VOL.:14 NO.:01 12 New Constructions of Sonar Sequences Diego F. Ruiz 1, Carlos A. Trujillo 1, and Yadira Caicedo 2 1 Department of Mathematics,

More information

Exploiting Bias in the Hysteresis Bit of 2-bit Saturating Counters in Branch Predictors

Exploiting Bias in the Hysteresis Bit of 2-bit Saturating Counters in Branch Predictors Journal of Instruction-Level Parallelism 5(23) -32 Submitted 2/2; published 6/3 Exploiting Bias in the Hysteresis Bit of 2-bit Saturating Counters in Branch Predictors Gabriel H. Loh Dana S. Henry Arvind

More information

Pseudo-automata for generalized regular expressions

Pseudo-automata for generalized regular expressions Pseudo-automata for generalized regular expressions B. F. Melnikov A. A. Melnikova Astract In this paper we introduce a new formalism which is intended for representing a special extensions of finite automata.

More information

An Adaptive Clarke Transform Based Estimator for the Frequency of Balanced and Unbalanced Three-Phase Power Systems

An Adaptive Clarke Transform Based Estimator for the Frequency of Balanced and Unbalanced Three-Phase Power Systems 07 5th European Signal Processing Conference EUSIPCO) An Adaptive Clarke Transform Based Estimator for the Frequency of Balanced and Unalanced Three-Phase Power Systems Elias Aoutanios School of Electrical

More information

Regression-based Statistical Bounds on Software Execution Time

Regression-based Statistical Bounds on Software Execution Time ecos 17, Montreal Regression-based Statistical Bounds on Software Execution Time Ayoub Nouri P. Poplavko, L. Angelis, A. Zerzelidis, S. Bensalem, P. Katsaros University of Grenoble-Alpes - France August

More information