EE M216A.:. Fall Lecture 4. Speed Optimization. Prof. Dejan Marković Speed Optimization via Gate Sizing

EE M216A .:. Fall 2010
Lecture 4: Speed Optimization
Prof. Dejan Marković
ee216a@gmail.com

Speed Optimization via Gate Sizing
- Gate sizing basics
  - P:N ratio
  - Complex gates
  - Velocity saturation
  - Tapering
- Developing intuition
  - Number of stages vs. fanout
  - Popular inverter chain example
- Formal approach: logical effort
  - Sizing optimization for speed

Basic Gate Sizing Relationships
- Rise and fall delays are determined by the pull-up and pull-down strength
  - Besides the dimensions, strength depends on $\mu$, $C_{OX}$, $V_T$
  - PMOS is weaker because of lower $\mu_P$
  - Larger P network than N network
- Increasing the size of a gate can reduce its delay
  - Inverse ($1/W$) relationship with resistance (and hence delay)
- BUT it can slow down the gate driving it
  - Proportional ($\propto W$) relationship with capacitance. So be careful!

P:N Ratio for Equal Rise and Fall Delay
- Good to have roughly equal delays for different transitions
  - Don't need to worry about a worst-case sequence
- Size the P devices to compensate for mobility
  - $C_{OX}$, $V_T$, $L$ are roughly the same
  - $R_{DRV} \propto 1/I \propto 1/(\mu W)$
- Make the pull-up and pull-down resistances equal
  - $R_N/R_P = 1 = \mu_P W_P / (\mu_N W_N) = k\beta$, where $k = \mu_P/\mu_N$ is the mobility ratio and $\beta = W_P/W_N$ is the P:N ratio
  - $W_P/W_N = \mu_N/\mu_P$
- Approximately the same as making $V_{THL} = V_{DD}/2$
- Easy for an inverter. What about more complex gates?

Complex Gate Sizing
- N-stack series devices need N times lower resistance
  - N times the width
- Make the worst-case strength of each path equal
  - A multi-input transition can result in a stronger network
- Long series stacking is VERY bad
[Figure: complex gate with inputs A, B, C sized for $\beta = 2$; device widths 6, 6, 6 and 2, 2, 2]

Accounting for Velocity Saturation
- Series stacking is actually less velocity saturated
- If we use $R_{no\_stack} = (4/3) R_{stack}$
  - Adjust the single device size to account for velocity saturation
[Figure: the same gate with $\beta = 2$; device widths 6, 6, 6 and 2, 2, 2, with the unstacked devices marked 4/3]

P:N Ratio for Minimum Delay
- Delay of an inverter chain (2 inverters), to include both $t_{pLH}$ and $t_{pHL}$
[Figure: two-inverter chain from in to out, each inverter with device widths $W_P$ and $W_N$]
- Let $R_{PDRV} \sim R_0/(W_P \mu_P)$, $R_{NDRV} \sim R_0/(W_N \mu_N)$, $C_G \sim C_0 (1 + W_P/W_N)$
- $t_{PD} = t_{D1} + t_{D2} = R_0 \left( \frac{1}{W_P \mu_P} + \frac{1}{W_N \mu_N} \right) C_0 (1 + W_P/W_N) = \tau_N (1 + 1/(k\beta)) (1 + \beta)$, with $k = \mu_P/\mu_N$ and $\beta = W_P/W_N$
- Min($t_{PD}$): $dt_{PD}/d\beta = 0 = \tau_N (1 - 1/(k\beta^2))$
- So $\beta = W_P/W_N = \sqrt{\mu_N/\mu_P}$
- Intuition: since NMOS has more drive for a given size, it is better to use more NMOS

FO4 Inverter Delay vs. P:N Ratio $\beta$
- Optimal $\beta = \sqrt{\mu}$ for minimum delay
- The curve is relatively flat, so delay is not a strong function of $\beta$ near the optimum
[Figure: FO4 inverter delay ($\tau$) versus $\beta$]
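A minimal sketch of the minimum-delay P:N ratio, assuming the normalized two-inverter delay $(1 + 1/(k\beta))(1 + \beta)$ from the slide with $k = \mu_P/\mu_N$. The function names and the value $k = 0.5$ are illustrative assumptions, not part of the lecture.

```python
import math


def two_inverter_delay(beta: float, k: float) -> float:
    """Normalized delay (1 + 1/(k*beta)) * (1 + beta), in units of tau_N."""
    return (1.0 + 1.0 / (k * beta)) * (1.0 + beta)


def optimal_beta(k: float) -> float:
    """Closed-form minimizer: beta = sqrt(1/k) = sqrt(mu_N/mu_P)."""
    return math.sqrt(1.0 / k)


def sweep_beta(k: float, lo: float = 0.5, hi: float = 4.0, steps: int = 3501) -> float:
    """Brute-force check: the beta on a uniform grid with the lowest delay."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=lambda b: two_inverter_delay(b, k))


K_ASSUMED = 0.5  # assumption for illustration: mu_P/mu_N = 1/2

if __name__ == "__main__":
    best = optimal_beta(K_ASSUMED)
    for beta in (1.0, best, 2.0, 3.0):
        delay = two_inverter_delay(beta, K_ASSUMED)
        print(f"beta = {beta:.3f}: delay = {delay:.3f} tau_N")
```

Under this assumed $k$, the equal-rise/fall choice $\beta = 2$ is only a few percent slower than the optimum, which is the flatness the FO4 plot refers to.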

Tapering
- One observation from Elmore delay is that capacitance closer to the voltage source has less effect on delay
  - $\tau_{delay} = R_1 C_1 + (R_1 + R_2) C_2$
  - $C_1$ has less effect on delay than $C_2$
- So taper stacked devices to speed them up
  - Make the bottom ones bigger
  - $R_1$ (many occurrences) has less resistance
  - $C_3$ (multiplying larger R) has smaller capacitance
- In reality, tapering doesn't win as much because the layout is less compact when stacking unequal-sized transistors (causing more C)
[Figure: series stack from out to GND with resistances $R_1$, $R_2$, $R_3$ and node capacitances $C_1$, $C_2$, $C_3$; $R_1$ and $C_1$ are nearest to GND]

Example (1/5): Delay of an N-Input AND Function
- Series stacking: larger devices without improving drive strength
  - Greater self-loading capacitance
- We expect that with a large number of inputs, it is no longer better to build bigger gates
- Comparison (approximate)
  - 1 N-input NAND gate driving an inverter ($t_{PD1}$)
  - 2 N/2-input NAND gates driving a NOR gate to combine ($t_{PD2}$)
  - Both drive the same output load
[Figure: the two implementations. $t_{PD1}$: N-input NAND (PMOS $\beta W_N/f$, NMOS $N W_N/f$) driving an inverter ($\beta W_N$, $W_N$) loaded by $\beta f W_N$, $f W_N$. $t_{PD2}$: N/2-input NANDs (PMOS $\beta W_N/f$, NMOS $(N/2) W_N/f$) driving a NOR ($2\beta W_N$, $W_N$) with the same load $\beta f W_N$, $f W_N$.]
- Let's analyze the building blocks: NAND, NOR, INV
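To illustrate the Elmore-delay observation behind tapering, here is a small sketch for an RC ladder. The function name and the unit R and C values are assumptions for illustration; index 0 is the element nearest the voltage source (GND for a pull-down stack).

```python
from typing import Sequence


def elmore_delay(resistances: Sequence[float], capacitances: Sequence[float]) -> float:
    """Elmore delay of an RC ladder.

    resistances[i] connects node i-1 (or the source, for i = 0) to node i,
    and capacitances[i] hangs on node i. Each capacitor is multiplied by
    the total resistance between it and the source.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance shared by this node and all later ones
        total += upstream_r * c
    return total


if __name__ == "__main__":
    base = elmore_delay([1.0, 1.0], [1.0, 1.0])
    extra_near_source = elmore_delay([1.0, 1.0], [2.0, 1.0])  # add C at node 1
    extra_near_output = elmore_delay([1.0, 1.0], [1.0, 2.0])  # add C at node 2
    print(base, extra_near_source, extra_near_output)
```

Adding the same capacitance near the source costs less delay than adding it near the output, which is why the devices near the source can be made larger.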

Example (2/5): Delay of an N-Input NAND
- Assume $C_{GN/P} = C_{DN/P} = C_0$ fF/µm
- NMOS resistance = $R_0$ Ω·µm
- NAND: NMOS size is $N W_N / f$
- For N inputs
  - $R_1 = R_2 = \dots = R_0 / (N W_N / f)$
  - $C_1 = C_2 = \dots = C_N = N C_0 W_N / f$
- Let $\beta = 2$ ($t_{pLH} = t_{pHL}$ for simplified analysis)
[Figure: N-input NMOS stack (NMOS width = $N W_N / f$) drawn as an RC ladder with $R_1 \dots R_N$, internal node capacitances up to $C_{N-1}$, and $C_{LOAD}$ at out]
[Equation not recovered: NAND delay, with one group of terms labelled "NMOS devices" and another labelled "PMOS and output"]

Example (3/5): Delay of Inverter and NOR
- Inverter
  - $R_{INV} = R_0 / W_N$
  - $C_{L\_INV} = C_{DIFF} + C_{GATE} = C_0 (W_N (1 + \beta) + f W_N (1 + \beta))$
  - For $\beta = 2$, $t_{INV} = R_0 C_0 (3 + 3f)$
  - $C_{gate\_inv} = C_0 W_N (1 + \beta)$ (input capacitance of the inverter)
- NOR2
  - $R_{NOR} = R_0 / W_N$
  - $C_{L\_NOR} = C_{DIFF} + C_{GATE} = C_0 (W_N (2 + 2\beta) + f W_N (1 + \beta))$
  - For $\beta = 2$, $t_{NOR} = R_0 C_0 (6 + 3f)$
  - $C_{gate\_nor} = C_0 W_N (1 + 2\beta)$ (input capacitance of the NOR2)

Example (4/5): Comparison
- N-input NAND and inverter
  [Equation not recovered]
- N/2-input NAND and NOR
  [Equation not recovered]
  - N/2-input NAND: NMOS width = $(N/2) W_N / f$
- Crossover at N = 5 with f = 4 (note the unequal C)

Example (5/5): Table of Comparison

  N    t_p1                     t_p2
  4    21 + 6f (45 for f = 4)   13 + 8f (45 for f = 4)
  6    36 + 6f (60 for f = 4)   21 + 8f (53 for f = 4)
  8    55 + 6f (79 for f = 4)   30 + 8f (62 for f = 4)

- In terms of delay, it does not make sense to build static CMOS gates with fan-in greater than 4!
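The table can be evaluated for other values of f with a few lines of code. This is only a sketch: the coefficients are copied from the table above, while the dictionary layout and function names are assumptions made for illustration.

```python
# (constant term, coefficient of f), copied from the comparison table.
T_P1 = {4: (21, 6), 6: (36, 6), 8: (55, 6)}   # N-input NAND + inverter
T_P2 = {4: (13, 8), 6: (21, 8), 8: (30, 8)}   # two N/2-input NANDs + NOR


def evaluate(coeffs: tuple, f: float) -> float:
    """Evaluate a delay expression of the form const + slope * f."""
    const, slope = coeffs
    return const + slope * f


def compare(f: float) -> dict:
    """Return {N: (t_p1, t_p2)} for every fan-in listed in the table."""
    return {n: (evaluate(T_P1[n], f), evaluate(T_P2[n], f)) for n in sorted(T_P1)}


if __name__ == "__main__":
    for n, (tp1, tp2) in compare(4).items():
        better = "single gate" if tp1 < tp2 else "split gates" if tp2 < tp1 else "tie"
        print(f"N = {n}: t_p1 = {tp1}, t_p2 = {tp2} -> {better}")
```

Because t_p2 has the larger coefficient of f, a heavier load (larger f) pushes the break-even fan-in upward.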

Transmission Gate Sizing
- Attempt to make a T-gate have equal pull-up and pull-down resistance
- A P:N ratio of k is not good for delay:
  - NMOS still has some significant pull-up strength (even if not all the way to $V_{DD}$)
  - PMOS has some pull-down (but very weak)
- Using some common numbers
  - $R_{N\_DN} = R_0$ kΩ·µm, $R_{N\_UP} = 2R_0$ kΩ·µm (2x penalty, weak transition)
  - $R_{P\_UP} = 2.5R_0$ kΩ·µm, $R_{P\_DN} = 5R_0$ kΩ·µm (2x penalty, weak transition)
- Let's try $W_P = W_N$
  - Parallel up: $R_{TGUP} = R_{N\_UP} \parallel R_{P\_UP} = 1.1R_0$
  - Parallel down: $R_{TGDN} = R_{N\_DN} \parallel R_{P\_DN} = 0.83R_0$
- So, using $W_P/W_N = 1$ is fairly reasonable
  - Actual size may depend on the process technology

Delay Analysis (So Far) Summary
- The capacitance and resistance of the devices determine the performance of the circuit
- The Elmore delay approximation gives initial insight into a design
  - It is a step response and does not account for signal slopes
- The sizing of the transistors (a first glimpse)
  - Determines the logical threshold
  - Determines the drive strength of the gate as well as the load it presents to the preceding gate, which affects the delay
  - Determines the capacitance and hence the power dissipated by the gate
- Large fan-in gates imply large self-loading and gate loading to the preceding gate
  - Better to split into 2 gates when fan-in is greater than 4
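The parallel-combination numbers on the transmission gate slide can be reproduced with a short check. Resistances are expressed in units of $R_0$; the helper name `parallel` is an assumption for illustration.

```python
def parallel(r1: float, r2: float) -> float:
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


# Per-device resistances in units of R_0, for W_P = W_N (values from the slide).
R_N_DN, R_N_UP = 1.0, 2.0
R_P_UP, R_P_DN = 2.5, 5.0

r_tg_up = parallel(R_N_UP, R_P_UP)   # both devices help pull the output up
r_tg_dn = parallel(R_N_DN, R_P_DN)   # both devices help pull the output down

if __name__ == "__main__":
    print(f"R_TGUP = {r_tg_up:.2f} R_0, R_TGDN = {r_tg_dn:.2f} R_0")
```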

Simplified Problem: Buffering
[Figure: chain of N inverters between $C_{in}$ and $C_{out}$; stage 1 has $W_N = W_0$, and the following stages have sizes $\alpha_1 W_0$, $\alpha_2 W_0$, ..., $\alpha_{N-1} W_0$]
- Assume $\beta$ (P:N ratio) = $\mu$ (mobility ratio), so $I_{PSAT} = I_{NSAT}$
- $R_0$ = pull-down resistance for an NMOS of size $W_0$, or pull-up resistance for a PMOS of size $\beta W_0$
- $C_0$ = gate capacitance of the NMOS plus PMOS of size $W_0$, $\beta W_0$
- Ignore source/drain and wire capacitance
- $\tau_0 = R_0 C_0$
- Goal: size each of the N stages for minimum delay
  - Delay for stage 1: $\alpha_1 C_0 R_0 = \alpha_1 \tau_0$
  - Delay for stage 2: $\alpha_2 C_0 R_0 / \alpha_1 = (\alpha_2/\alpha_1) \tau_0$

Optimal Fanout
- Fanout of each stage of the inverter chain
  - Stage 1 = $\alpha_1$, Stage 2 = $\alpha_2/\alpha_1$
- Assuming that the fanout of each stage is equal, $\alpha_0$
  - Let $\alpha_1 = \alpha_0$, $\alpha_2 = \alpha_0^2$, $\alpha_3 = \alpha_0^3$
  - Let $C_{out} = C_0 \alpha_0^N$
- Total delay = sum of the delays of stages 1 to N
  - Delay = $\tau_0 N \alpha_0$
- Since $C_{in} = C_0$ (remember: both $C_{in}$ and $C_{out}$ are given)
  - $C_{out}/C_{in} = \alpha_0^N$
- For a given N, the optimal $\alpha_0 = (C_{out}/C_{in})^{1/N}$

Optimum Number of Stages
- For an arbitrary N: $N = \ln(C_{out}/C_{in}) / \ln \alpha_0$, so Delay $= \tau_0 N \alpha_0 = \tau_0 \ln(C_{out}/C_{in}) \, \alpha_0 / \ln \alpha_0$
[Figure: Delay versus Fanout; x-axis: fanout (1 to 6), y-axis: delay; the minimum delay is at fanout = e]
- Optimum buffer fanout is e (2.718) when the self-loading is neglected

Constant Fanout Per Stage?
- Intuition: what if we increase the size of one stage by $(1 + \Delta)$?
  - $R_{drv}$ reduces by $1/(1 + \Delta)$
  - $C_{load}$ (of the previous stage) increases by $(1 + \Delta)$
  - Delay is summed, and the stage's own delay reduces less quickly than the previous stage's delay increases
  - So delay would increase if we deviate
- Mathematically:
  - Delay $= \tau_0 (\alpha_1 + \alpha_2/\alpha_1 + \alpha_3/\alpha_2 + \alpha_4/\alpha_3 + \alpha_5/\alpha_4 + \dots)$
  - $d\,\mathrm{Delay}/d\alpha_1 = 0$: $d\,\mathrm{Delay}/d\alpha_1 = \tau_0 (1 - \alpha_2/\alpha_1^2)$, so $\alpha_1^2 = \alpha_2$
  - $d\,\mathrm{Delay}/d\alpha_2 = 0$: $d\,\mathrm{Delay}/d\alpha_2 = \tau_0 (1/\alpha_1 - \alpha_3/\alpha_2^2)$, so $\alpha_2^2 = \alpha_1 \alpha_3$, thus $\alpha_1^3 = \alpha_3$
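A short sketch of the equal-fanout inverter chain without self-loading, using Delay $= \tau_0 N \alpha_0$ and $\alpha_0 = (C_{out}/C_{in})^{1/N}$ from the slides. The function names and the load ratio of 1000 are assumptions for illustration.

```python
import math


def stage_fanout(c_ratio: float, n_stages: int) -> float:
    """Per-stage fanout alpha_0 = (C_out/C_in)^(1/N) for N equal-fanout stages."""
    return c_ratio ** (1.0 / n_stages)


def chain_delay(c_ratio: float, n_stages: int) -> float:
    """Total delay N * alpha_0 in units of tau_0 (self-loading neglected)."""
    return n_stages * stage_fanout(c_ratio, n_stages)


def best_stage_count(c_ratio: float, max_stages: int = 20) -> int:
    """Integer number of stages that minimizes the chain delay."""
    return min(range(1, max_stages + 1), key=lambda n: chain_delay(c_ratio, n))


if __name__ == "__main__":
    ratio = 1000.0  # assumed C_out/C_in
    n = best_stage_count(ratio)
    print(f"continuous optimum: N = ln(ratio) = {math.log(ratio):.2f}, fanout = e")
    print(f"best integer N = {n}, fanout = {stage_fanout(ratio, n):.2f}, "
          f"delay = {chain_delay(ratio, n):.2f} tau_0")
```

The integer optimum sits next to the continuous one, and the per-stage fanout it yields stays close to e.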

Optimal Buffering with Self-Loading
- Intuition: without self-loading
  - Delay decreases proportionally with decreasing the number of stages
  - But increasing fanout increases delay proportionally
  - The two are equal at the optimum number of stages and fanout
- Intuition: with self-loading
  - Increasing fanout no longer increases delay proportionally
  - Stage delay $= R_0 (\alpha C_0 + C_{sd})$
  - The new optimum has fewer stages and a bigger fanout
- All equations remain the same except Delay: with $p = C_{sd}/C_0$, Delay $= \tau_0 N (\alpha_0 + p)$

Optimal Buffering as a Function of Self-Loading
- The optimum changes with self-loading
- A reasonable number to use for optimal delay is a fanout of 4
[Figure: Delay versus Fanout; x-axis: fanout (2 to 6), y-axis: delay; one curve for each of p = 0, 1, 2, 3, 4, 5, where $p = C_{sd}/C_0$]
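A hedged sketch of how the optimum fanout moves with self-loading. It assumes the chain delay $\tau_0 N (\alpha + p)$ with $N = \ln(C_{out}/C_{in})/\ln\alpha$; setting the derivative with respect to $\alpha$ to zero gives $\alpha(\ln\alpha - 1) = p$, which is solved here by bisection. The function names are illustrative.

```python
import math


def chain_delay(alpha: float, p: float, c_ratio: float) -> float:
    """Delay in units of tau_0: (ln(C_out/C_in) / ln(alpha)) * (alpha + p)."""
    return math.log(c_ratio) / math.log(alpha) * (alpha + p)


def optimal_fanout(p: float, iterations: int = 200) -> float:
    """Solve alpha * (ln(alpha) - 1) = p by bisection on [1, 100].

    The left-hand side is increasing for alpha > 1 (its derivative is
    ln(alpha)), so there is a single root in the bracket for moderate p.
    """
    lo, hi = 1.0, 100.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if mid * (math.log(mid) - 1.0) - p > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for p in range(6):
        print(f"p = {p}: optimal fanout = {optimal_fanout(float(p)):.2f}")
```

With p = 0 this recovers e, and the optimum grows with p, consistent with the rule of thumb of a fanout of about 4.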

Buffer Optimization for Energy-Delay
- Optimizing for energy (power) alone doesn't make sense because the optimum will be the smallest possible device size
- Instead, optimize for the best energy-delay tradeoff
  - Assuming constant fanout
[Figure: Energy-Delay versus Fanout; x-axis: fanout (2 to 10), y-axis: energy-delay (x 10^4); one curve for each of p = 0, 1, 2, 3, 4, 5]
- Assuming FO is constant, $\alpha_0$
  - Results in a larger FO
  - FO = 5 is pretty reasonable

Issue with Optimal Energy-Delay
- Constant fanout is not a good assumption
- Intuition:
  - Reduce a lot of power by reducing the size of the final driver
  - Large fanout at the last stage
  - Reduce the fanout of prior stages to compensate
- Example: $C_{in} = 1$, $C_{out} = 1000$
  - Equal fanout result: 4 stages

                            Stage 1   Stage 2     Stage 3      Stage 4       EDP
  Equal FO = 5.62           1         5.62        31.6         177.8         32200
  Unequal FO (tapered FO)   1         4.8 (4.8)   23.1 (4.9)   124.5 (5.4)   31100
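The slides do not state the energy and delay model behind the EDP column. As a sketch only, assume energy is proportional to the total switched capacitance (all stage sizes plus $C_{out}$) and delay is the sum over stages of fanout plus a parasitic term p = 1. Under these assumptions the stage sizes from the table give EDP values within about 1% of the tabulated ones.

```python
from typing import Sequence


def energy_delay_product(sizes: Sequence[float], c_out: float, p: float = 1.0) -> float:
    """EDP under an ASSUMED model (not stated in the slides).

    energy ~ sum of all stage input capacitances plus the final load
    delay  ~ sum over stages of (load / size + p), in units of tau_0
    """
    loads = list(sizes[1:]) + [c_out]          # each stage drives the next one
    delay = sum(load / size + p for size, load in zip(sizes, loads))
    energy = sum(sizes) + c_out
    return energy * delay


EQUAL_FO = [1.0, 5.62, 31.6, 177.8]    # stage sizes from the table
TAPERED_FO = [1.0, 4.8, 23.1, 124.5]   # stage sizes from the table

if __name__ == "__main__":
    print(f"equal FO:   EDP = {energy_delay_product(EQUAL_FO, 1000.0):.0f}")
    print(f"tapered FO: EDP = {energy_delay_product(TAPERED_FO, 1000.0):.0f}")
```

The tapered chain leaves a larger fanout for the last stage and still comes out slightly ahead, matching the intuition on the slide.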

Ultimately, We Will Get Here (~Lecture 8)
- Energy-delay optimization
  - Gate size, supply voltage, threshold voltage
  - $W$, $V_{DD}$, $V_{th}$ optimization
[Figure: energy versus delay tradeoff curve with three marked points: $E^0 D$ (min D), $ED$ (min EDP), and $E D^0$ (min E); general form: $E^{\alpha} D^{\beta}$]

Application of Fanout to Logic?
- When logic needs to drive a large capacitive load: fanout ~ 4
- What is fanout?
  - Effective load capacitance driven by the gate (normalized to $C_{inverter}$)
  - Example: NAND gate with $W_P = 5$, $W_N = 5$ driving 5 equal NAND gates
    - Equivalent inverter: $W_P = 5$, $W_N = 2.5$; total gate width = 7.5
    - Total load gate width = 5 * 10 = 50
    - Fanout = 50/7.5, about 6.7
  - Try to reorganize logic and add inverters so that fanout ~ 4
- When logic has a large number of stages N, so that each stage drives a small fanout:
  - Delay is logic limited, so reduce N
  - Balance the fanouts so that they are equal
- OK, but not very systematic
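The NAND fanout example can be written out as a small calculation. This sketch assumes a 2-input NAND (the slide does not state the input count, but the equivalent $W_N = 2.5$ implies two series NMOS devices); the function names are illustrative.

```python
def nand_equivalent_inverter_width(w_p: float, w_n: float, n_inputs: int) -> float:
    """Total width of the inverter with the same drive as the NAND.

    The series NMOS stack behaves like one device of width w_n / n_inputs;
    a single PMOS conducts in the worst case, so its width is unchanged.
    """
    return w_p + w_n / n_inputs


def fanout(load_width: float, equivalent_driver_width: float) -> float:
    """Effective fanout: load gate width over equivalent inverter width."""
    return load_width / equivalent_driver_width


if __name__ == "__main__":
    w_p, w_n = 5.0, 5.0                 # driver NAND device widths from the slide
    driver = nand_equivalent_inverter_width(w_p, w_n, n_inputs=2)  # assumed 2-input
    load = 5 * (w_p + w_n)              # 5 identical gates, one input each
    print(f"equivalent inverter width = {driver}, load = {load}, "
          f"fanout = {fanout(load, driver):.2f}")
```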

Speed Optimization via Gate Sizing
- Gate sizing basics
  - P:N ratio
  - Complex gates
  - Velocity saturation
  - Tapering
- Developing intuition
  - Number of stages vs. fanout
  - Popular inverter chain example
- Formal approach: logical effort
  - Sizing optimization for speed

Concept of Logical Effort
- Instead of running lots of simulations
  - Simplified: (almost) back-of-envelope calculations of delay
- Basic concept: Delay $= R_{gate} (C_{load} + C_{self}) = R_{gate} C_{load} + R_{gate} C_{self}$
- Logical effort basic equation: $d = f + p$
  - $d$ is the delay (normalized)
  - $f$ is known as the effort delay
  - $p$ is known as the parasitic delay
- $d = \mathrm{Delay}/\tau = (R_{gate} C_{load} + R_{gate} C_{self}) / (R_0 C_0)$
  - Normalized to the delay of an FO1 inverter (no self-load)
  - With $R_0 = R_{gate}$, $d$ = fanout + normalized parasitic
  - So $f$ is essentially equivalent to fanout
- $d$ is a measure that is independent of process, voltage, and temperature

The Logical Effort Way of Thinking
- Gate delay we used up to now:
  [Equation not recovered]
- Another way to write this formula is:
  [Equation not recovered]

Now Normalize the Delay
- Strategy: normalize to the time constant of an inverter
  - Approach 1: normalize to a fictitious technology time constant
  - Approach 2: normalize to the intrinsic delay of an inverter
- Both formulations exist in the literature
  - We use approach 1 (as in the original logical effort theory)
  - It doesn't really matter; it's just a constant

Normalized Delay
- Strategy: normalize to a time constant of an inverter
  - Approach 1: normalize to a fictitious technology time constant
- Normalized delay:
  [Equation not recovered]
- Even simpler: $d = gh + p$
- Logical effort terms
  - Logical effort (g)
  - Electrical fanout (h)
  - Parasitic delay (p)

The Meaning of Logical Effort Terms
- Logical effort (g)
  - $R_{on}$ ratio for equal $C_{in}$
  - $C_{in}$ ratio for equal $R_{on}$
- Electrical fanout (h)
  - $C_{out}/C_{in}$ ratio (gate capacitance only; diffusion counts in the p term)
- Parasitic delay (p)
  - Ratio of parasitic capacitances for equal $R_{on}$

Calibrating the Model
- The values for g and p can be extracted from simulation
  - Because $d = g \cdot h + p$
- Simulate the delay of the gate for different loads
  - Drive copies of itself with different multiplication factors
- Extract $\tau$ using an inverter with no self-loading (AS, AD, PS, PD = 0)
- Vary the inputs (and rise/fall) for different g and p
[Figure: Delay/$\tau$ versus $C_{load}/C_{in}$ for a gate; the slope is g and the intercept is p]

Logical and Electrical Effort
- Instead of just $d = f + p$, let $f = gh$
  - g = logical effort (of a gate): cost of implementing logic
  - h = electrical effort: cost of driving a load
- $f = R_{gate} C_{load} / (R_0 C_0)$, $p = R_{gate} C_{self} / (R_0 C_0)$
  - Let $R_0 = R_{inv}$ where $R_{inv} = R_{gate}$, and $C_0 = C_{inv}$
  - $p = C_{self}/C_{inv}$, $f = (C_{in}/C_{inv}) (C_{load}/C_{in})$
  - $C_{in}$ is the gate's input capacitance (for the particular input)
- $g = C_{in}/C_{inv}$
  - Each gate (and each input of every gate) has a different value
- $h = C_{load}/C_{in}$
  - Output-to-input capacitance ratio
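The calibration step (slope is g, intercept is p) amounts to a straight-line fit of normalized delay against electrical effort. Below is a minimal least-squares sketch; the data points are synthetic, generated from assumed values g = 4/3 and p = 2 rather than taken from any simulation.

```python
from typing import Sequence, Tuple


def fit_g_and_p(h_values: Sequence[float], d_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of d = g*h + p; returns (g, p)."""
    if len(h_values) != len(d_values) or len(h_values) < 2:
        raise ValueError("need at least two (h, d) pairs")
    n = len(h_values)
    mean_h = sum(h_values) / n
    mean_d = sum(d_values) / n
    s_hh = sum((h - mean_h) ** 2 for h in h_values)
    s_hd = sum((h - mean_h) * (d - mean_d) for h, d in zip(h_values, d_values))
    g = s_hd / s_hh              # slope: logical effort
    p = mean_d - g * mean_h      # intercept: parasitic delay
    return g, p


if __name__ == "__main__":
    # Synthetic "measurements": delay already divided by tau, for h = 1..5.
    h_sweep = [1.0, 2.0, 3.0, 4.0, 5.0]
    d_sweep = [(4.0 / 3.0) * h + 2.0 for h in h_sweep]
    g, p = fit_g_and_p(h_sweep, d_sweep)
    print(f"g = {g:.3f}, p = {p:.3f}")
```

In practice the d values would come from simulated delays divided by the extracted $\tau$, with one fit per input and per transition direction.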

Typical Simulation Data (*)
[Figure: normalized delay d (1 to 6) versus electrical effort $h = C_{out}/C_{in}$ (1 to 5), with two lines: g = 4/3, p = 2, giving $d = (4/3)h + 2$; and g = 1, p = 1, giving $d = h + 1$. The slope contribution is the effort delay and the intercept is the parasitic delay.]
- $d_{gate} = g \cdot h + p$ = effort delay + parasitic delay
- (*) assumes $g_{INV} = 1$

Computing Logical Effort: g
- g is a unitless, inherent characteristic of the gate
  - Not a function of gate size
  - It is a function of the construction of the gate (connection and relative size between transistors)
  - An indication of the cost of implementing the function
- Procedure:
  1. Choose an input; find the total device width driven by that input
  2. Find $W_P$, the width of a single pull-up device that has the same drive strength as the gate's pull-up for that input
  3. For a reference inverter with equal rise/fall ($\beta = \mu$) and $W_P$ from Step 2, determine the total gate width of the inverter devices
  4. Divide the result of Step 1 by the result of Step 3 to determine $g_{up}$
  5. Repeat Steps 2 to 4 for the pull-down device to get $g_{down}$
- The two g's would only be different if $\beta$ of the gate is not $\mu$

Example: Calculating Logical Effort
- Def: Logical effort is the ratio of the gate's input capacitance to the input capacitance of an inverter delivering the same output current
  - Inverter (reference): $C_{in} = 3$, LE = 1 (by definition)
  - NAND2: $C_{in} = 4$, LE = 4/3
  - NOR2: $C_{in} = 5$, LE = 5/3

Example: NOR Gate with $\beta = 3$
- Common assumptions
  - $C_{gate}$ is proportional to device width
  - $R_{gate}$ is inversely proportional to device width
- For a NOR gate
  - $\beta = \mu = 3$
  - Units are not so important
- Equivalent inverter
  - $W_P : W_N = 6:2$
  - $C_{G\_INV} = 8$
- NOR gate input capacitance
  - $C_{G\_NOR} = 14$
- Logical effort = 7/4
[Figure: NOR2 with two series PMOS devices of width 12 (inputs A and B) and two parallel NMOS devices of width 2]
- Caveat: don't get confused with absolute transistor sizing!
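The width bookkeeping in these examples generalizes to n-input NAND and NOR gates. The sketch below assumes the same sizing convention as the slides (each series stack widened n times to match a unit inverter with NMOS width 1 and PMOS width $\beta$); the function names are illustrative.

```python
from fractions import Fraction


def nand_logical_effort(n_inputs: int, beta: int = 2) -> Fraction:
    """g of an n-input NAND: series NMOS of width n, parallel PMOS of width beta."""
    return Fraction(n_inputs + beta, 1 + beta)


def nor_logical_effort(n_inputs: int, beta: int = 2) -> Fraction:
    """g of an n-input NOR: parallel NMOS of width 1, series PMOS of width n*beta."""
    return Fraction(1 + n_inputs * beta, 1 + beta)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"n = {n}: NAND g = {nand_logical_effort(n)}, NOR g = {nor_logical_effort(n)}")
    print(f"NOR2 with beta = 3: g = {nor_logical_effort(2, beta=3)}")
```

Using exact fractions keeps values such as 4/3 and 7/4 in the same form as they appear on the slides.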

Example: Calculating Parasitic Delay
- Def: Parasitic delay is the ratio of the intrinsic capacitance at the gate output to the intrinsic capacitance at the output of an equivalent inverter
  - Inverter (reference): $C_{int} = 3$, p = 1 (by definition)
  - NAND2: $C_{int} = 6$, p = 2
  - NOR2: $C_{int} = 6$, p = 2

Calculating Parasitic Delay: p
- Typically given, since it depends on $C_{diffusion}$ of a gate
- Example: assume $C_{S/D} = 0.5 C_G = 0.5 C_o$
  - For an inverter, $C_{self}/C_{inv} = p_{INV} = 0.5$
- A higher $C_{S/D}/C_G$ results in a larger p (penalizing delay more); $C_{S/D}/C_G$ is often close to 1
[Figure: inverter with PMOS width 6 and NMOS width 2; NOR2 with two series PMOS devices of width 12 (inputs A and B) and two parallel NMOS devices of width 2]
- NOR gate
  - $C_{S/D,NOR}$: 12 + 2 + 2 = 16 width units, i.e. $8 C_o$
  - $C_{INV} = 4 C_o$
  - $p_{NOR} = 2$ (i.e. $2 p_{INV}$)
- Caveat: $\gamma_{INV}$ is included in $p_{INV}$ (it is not always 1)!

Calculating p Including Series Stacking
- What about the intermediate nodes?
  - One way to account for them is to use an effective p
- For example: NOR pull-up of the B input
  - $R_{NOR} = 2 R_{PMOS}$
  - Delay $= (R_{NOR}/2) C_1 + R_{NOR} C_2 + R_{NOR} C_{load}$
  - Self-loading: $p_{BUP} = [(R_{NOR}/2) C_1 + R_{NOR} C_2] / (R_{inv} C_{inv})$, where $R_{inv} = R_{gate}$
  - $p_{BUP} = (C_1/2 + C_2) / C_{inv}$
- Using $C_{S/D} = 0.5 C_G$
  - $C_1 = 6 C_o$ (shared)
  - $C_2 = 8 C_o$
  - $p_{BUP} = 11/4$ (with $C_{inv} = 4 C_o$)
[Figure: NOR2 with two series PMOS devices of width 12 (inputs B and A), intermediate node capacitance $C_1$, two parallel NMOS devices of width 2, and output node capacitance $C_2$]
- Note: this increased accuracy requires different p's for different inputs AND for pull-up/pull-down. Simplify by ignoring these nodes (unless otherwise specified)

Generalize: N-Input NAND
[Figure: N-input NAND with N parallel PMOS devices of size 2 and N series NMOS devices of size N; $C_1 = (2N + N) C_o$ at the output, $C_2 = N C_o$ and $C_3 = N C_o$ at intermediate nodes]
- Output load = 3N
  - N size-2 PMOS = 2N
  - 1 size-N NMOS = N
- Intermediate load = N (shared)
- Total pull-down delay: $T = R (3N C_o) + \sum_{i=1}^{N-1} (iR/N) \, N C_o$
  - $d$ (normalized) $= 3N + (N^2/2 - N/2)$
  - $p = (N^2/2 - N/2)$
- Proportional to $N^2$!!!
  - This is bad news for large series stacking
  - Even worse for PMOS (NOR)
- Reality is even worse, since $C_{GS}$ makes each intermediate node capacitance > $N C_o$
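The quadratic growth of the stack term can be checked by summing the Elmore terms directly and comparing against the closed form $3N + (N^2/2 - N/2)$. This sketch works in units of $R C_o$; the function names are assumptions for illustration.

```python
def nand_pulldown_delay(n: int) -> int:
    """Normalized pull-down delay of an N-input NAND, in units of R*C_o.

    Output node: full stack resistance R times the output load 3N*C_o.
    Intermediate node i (i = 1..N-1): resistance i*R/N times N*C_o, i.e. i.
    """
    output_term = 3 * n
    internal_terms = sum(range(1, n))
    return output_term + internal_terms


def closed_form(n: int) -> int:
    """3N + (N^2/2 - N/2); N^2 - N is always even, so integer division is exact."""
    return 3 * n + (n * n - n) // 2


if __name__ == "__main__":
    for n in (2, 3, 4, 6, 8):
        print(f"N = {n}: summed = {nand_pulldown_delay(n)}, closed form = {closed_form(n)}")
```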

A Catalog of Gates

Logical effort g for different numbers of inputs:

  Gate type     1    2     3     4     5      n
  Inverter      1
  NAND               4/3   5/3   6/3   7/3    (n+2)/3
  NOR                5/3   7/3   9/3   11/3   (2n+1)/3
  Multiplexer        2     2     2     2      2
  XOR, XNOR          4     12    32

Parasitic delay:

  Gate type                   Parasitic delay
  Inverter                    p_inv
  n-input NAND                n p_inv
  n-input NOR                 n p_inv
  n-way multiplexer           2n p_inv
  2-input XOR, XNOR (sym)     n 2^(n-1) p_inv

- $\beta = \mu = 2$
- Mux is tri-state inverters shorted together
- XOR assumes that the input is bundled (a, a')
- $p_{INV} \sim 1$
- $p_{GATE}$ in this table does not include intermediate nodes

Example #1: Ring Oscillator
- Estimate the frequency of an N-stage ring oscillator:
  - Logical effort: g = 1
  - Electrical effort: $h = C_{out}/C_{in} = 1$
  - Parasitic delay: $p = p_{inv} = 1$
  - Stage delay: $d = gh + p = 2$
  - Oscillator frequency: $f_{OSC} = \frac{1}{2 N d \tau} = \frac{1}{4 N \tau}$
- gpdk090: $t_{stage}$ = 13 ps (TT)
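As a sketch of the ring oscillator estimate, the code below evaluates $f_{OSC} = 1/(2 N d \tau)$. The value of $\tau$ is inferred from the quoted stage delay ($t_{stage} = d\tau$ = 13 ps with d = 2), and the 31-stage example is a hypothetical choice, not from the slides.

```python
def stage_delay(g: float, h: float, p: float) -> float:
    """Normalized stage delay d = g*h + p."""
    return g * h + p


def ring_oscillator_frequency(n_stages: int, tau: float,
                              g: float = 1.0, h: float = 1.0, p: float = 1.0) -> float:
    """f_osc = 1 / (2 * N * d * tau): each node rises and falls once per period."""
    return 1.0 / (2 * n_stages * stage_delay(g, h, p) * tau)


TAU_ASSUMED = 13e-12 / 2   # seconds; inferred from t_stage = 13 ps and d = 2
N_ASSUMED = 31             # hypothetical number of stages

if __name__ == "__main__":
    f_osc = ring_oscillator_frequency(N_ASSUMED, TAU_ASSUMED)
    print(f"N = {N_ASSUMED}: f_osc = {f_osc / 1e9:.3f} GHz")
```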

Example #2: Fanout-of-4 Inverter
- Estimate the delay of a fanout-of-4 (FO4) inverter:
  - Logical effort: g = 1
  - Electrical effort: $h = C_{out}/C_{in} = 4$
  - Parasitic delay: $p = p_{inv} = 1$
  - Stage delay: $d = gh + p = 5$
- gpdk090: $t_{FO4}$ = 33 ps (TT)

Example #3: Gate Delays
- Delay of the path from A to B, where $\beta = \mu = 2$ and $p_{INV} = 1$
[Figure: path from A through G1 and G2 to B; G3 is an additional load on the output of G1 and G4 is the load on the output of G2. Device ratios $W_P : W_N$ shown in the figure: 4:4, 12:3, 20:10, 10:5]
  - $g_{G1} = 4/3$, $p_{G1} = 2$, $C_{IN\_G1} = 8$
  - $g_{G2} = 5/3$, $p_{G2} = 2$, $C_{IN\_G2} = 15$
  - $g_{G3} = 4$, $p_{G3} = 4$, $C_{IN\_G3} = 30$
  - $C_{IN\_G4} = 15$
- $h_{G1} = (C_{IN\_G2} + C_{IN\_G3}) / C_{IN\_G1} = 5.625$, $h_{G2} = C_{IN\_G4} / C_{IN\_G2} = 1$
- $d_{G1} = g_{G1} h_{G1} + p_{G1} = 9.5$
- $d_{G2} = g_{G2} h_{G2} + p_{G2} \approx 3.67$
- Normalized delay $\approx 13.17$
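The arithmetic of Examples #2 and #3 can be reproduced exactly with rational numbers. The g, p, and input capacitance values are those listed on the slide; the helper name and dictionary layout are assumptions for illustration.

```python
from fractions import Fraction


def gate_delay(g: Fraction, h: Fraction, p: Fraction) -> Fraction:
    """Normalized gate delay d = g*h + p."""
    return g * h + p


# Input capacitances from Example #3 (arbitrary width units).
C_IN = {"G1": 8, "G2": 15, "G3": 30, "G4": 15}

# G1 drives both G2 and G3; G2 drives G4.
h_g1 = Fraction(C_IN["G2"] + C_IN["G3"], C_IN["G1"])
h_g2 = Fraction(C_IN["G4"], C_IN["G2"])

d_g1 = gate_delay(Fraction(4, 3), h_g1, Fraction(2))
d_g2 = gate_delay(Fraction(5, 3), h_g2, Fraction(2))
path_delay = d_g1 + d_g2

# Example #2: FO4 inverter with g = 1, h = 4, p = 1.
d_fo4 = gate_delay(Fraction(1), Fraction(4), Fraction(1))

if __name__ == "__main__":
    print(f"h_G1 = {float(h_g1)}, d_G1 = {float(d_g1)}, d_G2 = {float(d_g2):.3f}")
    print(f"path delay = {float(path_delay):.3f}, FO4 delay = {float(d_fo4)}")
```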

Summary
- Delay and/or power of a logic network depend significantly on the relative sizes of logic gates (not of transistors within a gate)
- Inverter buffering is a simple example of the analysis
  - The analysis leads to ~FO4 as being the optimal fanout for driving larger capacitive loads
- To generalize the analysis of delay, we introduce logical effort
  - Delay normalized by inverter delay, $d = gh + p$
  - g and p are characteristics of a logic gate that depend on its structure and do not depend on gate size
  - There may be different g's and p's for different inputs and for pull-up / pull-down
  - Simplify by using $g_{AVG}$ and ignoring the capacitances of intermediate nodes
  - Once a table of g's and p's is created for the catalog of gates, delay can be calculated quickly and easily
- Next we will look at how to size a network instead of just analyzing it