LCA14-206: Scheduler tooling and benchmarking
Tue-4-Mar, 11:15am, Zoran Markovic, Vincent Guittot
Scheduler Tools and Benchmarking
  From the Energy Aware mini-summit @ Ksummit 2013, extract from [1]:
    Ingo Molnar came in with a complaint: none of the power-management work starts with measurements of the system's power behavior. Without a coherent approach to measuring the effects of a patch, there is no real way to judge these patches to decide which ones should go in. We cannot, he said, merge scheduler patches on faith, hoping that they somehow make things better.
  [1] https://lwn.net/articles/571414/
  Tools need to:
    Generate repetitive, deterministic load patterns
    Evaluate performance and/or power consumption
    Check for violation of scheduling constraints
  [Diagram labels: load, scheduler, energy model, schedule, idlestat, hardware, estimated power, power]
www.linaro.org
Load Generation
Load Generation: Cyclictest
  Maintained by Clark Williams of Red Hat
  Devised to measure real-time performance
  Measures latencies in response times
    Performs timer sleep, followed by clock_gettime()
    Compares requested sleep time to measured time
    Difference is the latency
  Starts a number of threads whose sleep can be staggered in time
  Linaro version also busy-loops a specified number of iterations after wakeup
    Together with sleep time, this represents the periodic load pattern
  Auto load calibration
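The measurement loop described on this slide can be sketched in a few lines of Python. This is only an illustration of the sleep/timestamp/compare idea, not cyclictest's actual code; the function name `measure_sleep_latencies` and its parameters are hypothetical.

```python
import time

def measure_sleep_latencies(interval_us, loops):
    """Return one wakeup latency (in microseconds) per timed sleep.

    Hypothetical helper: sleep for the requested interval, read a
    monotonic clock afterwards, and report measured minus requested time.
    """
    latencies = []
    for _ in range(loops):
        start_ns = time.monotonic_ns()
        time.sleep(interval_us / 1_000_000)          # requested sleep
        elapsed_us = (time.monotonic_ns() - start_ns) / 1000
        latencies.append(elapsed_us - interval_us)   # difference is the latency
    return latencies

samples = measure_sleep_latencies(interval_us=1000, loops=20)
print(f"min {min(samples):.1f} us, "
      f"avg {sum(samples) / len(samples):.1f} us, "
      f"max {max(samples):.1f} us")
```

An interpreted loop adds its own overhead, so the numbers are far coarser than what a C tool reports; the point is only the structure of the measurement.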
Load Generation: rt-app
  Maintained by Giacomo Bagnoli and Juri Lelli
  Used to test deadline scheduler
  Use json file to describe scenario
  Locking scenario for thread dependency
  Task priority setting
  Runtime, Period and deadline
  Generate stats and trace events for debugging and analyses
Load Generation: New Development
  Ability to generate custom load sequences
    Period, Load & Deadline
    Non-periodic load
  Auto load calibration
  Task priority setting
  Task dependency with lock scenario
  Load configuration file
  Generate ftrace event and statistic
  Shared object library for linking into other scheduler tools
  Other?
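As a rough illustration of "period plus load" with automatic calibration, the sketch below first estimates how many busy-loop iterations fit in a microsecond, then runs a fixed number of periods that each busy-loop for a fraction of the period and sleep for the rest. All names (`busy_loop`, `calibrate`, `run_periodic_load`) and default values are hypothetical, and the slides do not describe how the planned tool implements calibration.

```python
import time

def busy_loop(iterations):
    """Burn CPU for a given number of loop iterations."""
    acc = 0
    for i in range(iterations):
        acc += i
    return acc

def calibrate(iterations=200_000):
    """Estimate busy-loop iterations per microsecond on this machine."""
    start_ns = time.perf_counter_ns()
    busy_loop(iterations)
    elapsed_us = (time.perf_counter_ns() - start_ns) / 1000
    return iterations / elapsed_us

def run_periodic_load(period_us, load_pct, cycles, iters_per_us):
    """Run `cycles` periods; return how many periods overran their slot."""
    busy_iters = int(period_us * load_pct / 100 * iters_per_us)
    next_wakeup = time.monotonic()
    overruns = 0
    for _ in range(cycles):
        busy_loop(busy_iters)                 # the "load" part of the period
        next_wakeup += period_us / 1_000_000
        remaining = next_wakeup - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)             # the idle part of the period
        else:
            overruns += 1                     # busy phase exceeded the period
    return overruns

iters_per_us = calibrate()
overruns = run_periodic_load(period_us=10_000, load_pct=20, cycles=10,
                             iters_per_us=iters_per_us)
print(f"{iters_per_us:.1f} iterations/us, {overruns} overrun(s)")
```

Scheduling against an absolute `next_wakeup` rather than sleeping a fixed amount keeps the period from drifting when a busy phase runs long.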
Estimating Power Consumption
Energy Model
Energy Model
  Each platform has different power consumption parameters
  As a consequence, schedules are platform-specific, i.e.
    Each platform may have its own view of what's most efficient
  There is no apples-to-apples comparison of efficiency across platforms, but
    We can compare multiple scheduler solutions on a single platform
  It would be prudent to characterize scheduler implementation on a multitude of platforms
Benchmarking
Benchmarking
  Given an energy model, evaluate a schedule
    Capture the time spent in each C-state and P-state
    Run it through the energy model to assess power consumption
  Also, verify constraints
    How long did it take to complete processing?
    Were any of the deadlines missed?
    Were tasks properly prioritized?
    Was it done within thermal budget?
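A minimal sketch of "run it through the energy model", assuming the simplest possible model: one constant power figure per C-state and P-state, multiplied by the time spent in that state. The state names, power values and the helpers `estimate_energy_uj` and `count_missed_deadlines` are all hypothetical; the slides do not specify the model's form.

```python
def estimate_energy_uj(residency_us, power_mw):
    """Energy in microjoules: sum of time (us) * power (mW) / 1000 per state."""
    total_nj = 0.0
    for state, time_us in residency_us.items():
        total_nj += time_us * power_mw[state]   # us * mW = nanojoules
    return total_nj / 1000.0

def count_missed_deadlines(completion_us, deadline_us):
    """Count jobs whose completion time exceeded their deadline."""
    return sum(1 for done, limit in zip(completion_us, deadline_us)
               if done > limit)

# Hypothetical residencies (us) and per-state power (mW) for one CPU.
residency_us = {"C1": 1000.0, "700MHz": 2000.0}
power_mw = {"C1": 10.0, "700MHz": 100.0}

print(estimate_energy_uj(residency_us, power_mw), "uJ")
print(count_missed_deadlines([900, 1200, 800], [1000, 1000, 1000]),
      "missed deadline(s)")
```

Keeping the constraint check separate from the energy estimate mirrors the slide: a schedule that saves energy but misses deadlines should still be flagged.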
Idlestat
  Helps to assess how much energy was spent for a particular schedule
  Documentation RFC pending
  Makes no assumption about the energy model
  Uses kernel FTRACE function to capture:
    Entry and exit times for each C-state
    Entry and exit times for each P-state
    Raised IRQs
  idlestat is non-intrusive to C-state and P-state transitions:
    Sleeps while traces are captured
    Parses/analyzes traces after the acquisition is complete
Idlestat
  clustera@state  hits   total(us)    avg(us)    min(us)   max(us)
  C1              10821  5879554.00   543.35     0.00      23163.00
  C2              0      0.00         0.00       0.00      0.00
  C3              78     2929290.00   37555.00   0.00      101441.00

  cpu0@state      hits   total(us)    avg(us)    min(us)   max(us)
  C1              6744   6407808.00   950.15     0.00      23194.00
  C2              3      8819.00      2939.67    549.00    5310.00
  C3              75     2960110.00   39468.13   213.00    101441.00
  350             1047   204490.00    195.31     0.00      4578.00
  700             5628   396247.00    70.41      0.00      1465.00
  920             0      0.00         0.00       0.00      0.00

  cpu0 wakeups    name            count
  irq109          ehci_hcd:usb1   1727
  irq029          twd             4524

  cpu1@state      hits   total(us)    avg(us)    min(us)   max(us)
  C1              6544   6398931.00   977.83     0.00      36255.00
  C2              1      1129.00      1129.00    1129.00   1129.00
  C3              77     2955293.00   38380.43   122.00    101471.00
  350             1124   212428.00    188.99     0.00      18677.00
  700             5366   408782.00    76.18      0.00      946.00
  920             0      0.00         0.00       0.00      0.00

  cpu1 wakeups    name            count
  irq029          twd             4737
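One way to read this output is to sum the per-state totals for a CPU and express each as a share of the capture. The sketch below does that for the cpu0 rows, on the assumption (not stated on the slide) that the 350/700/920 rows are P-states in MHz and that C-state and P-state residencies together cover the whole trace. The helper name `residency_shares` is hypothetical.

```python
# cpu0 "total(us)" column copied from the idlestat output on this slide.
cpu0_total_us = {
    "C1": 6407808.00,
    "C2": 8819.00,
    "C3": 2960110.00,
    "350": 204490.00,   # assumed to be a P-state at 350 MHz
    "700": 396247.00,   # assumed to be a P-state at 700 MHz
    "920": 0.00,        # assumed to be a P-state at 920 MHz
}

def residency_shares(total_us):
    """Return each state's share of the summed residency time (0.0 to 1.0)."""
    grand_total = sum(total_us.values())
    return {state: t / grand_total for state, t in total_us.items()}

trace_us = sum(cpu0_total_us.values())
print(f"cpu0 accounted time: {trace_us / 1e6:.2f} s")
for state, share in residency_shares(cpu0_total_us).items():
    print(f"  {state:>3}: {share * 100:5.1f} %")
```

Under those assumptions the states add up to just under 10 seconds for cpu0, most of it spent idle in C1 and C3.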
Idlestat
  How do we get power parameters?
    Device tree/manufacturer's data?
    Linear fitting?
  Method evaluated on TC2
    Work in progress
    Has registers to measure per-cluster power consumption
    Has 2 clusters, 5 cores, 2 C-states and 8 P-states each
    Large solution space (~50 power parameters)
    Linear fitting simplified to solving 6x6 equation system for A7s only, single P-state (350MHz)
    ~2.6% error* (cyclictest -t 10 -L c0p15 --latency 100000 -q)
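The fitting step can be illustrated under a simple assumed model: measured average power for a run equals the sum over states of (fraction of time in that state) times (that state's unknown power). With as many independent runs as unknowns this becomes a square linear system, solved here by Gaussian elimination with partial pivoting. The 3x3 data below is synthetic and purely hypothetical; it is not the 6x6 TC2 system from the talk, whose coefficients the slides do not give.

```python
def solve_linear_system(a, b):
    """Solve a*x = b for square `a` by Gaussian elimination with pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: runs do not separate the states")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):            # back substitution
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x

# Synthetic example: time fractions per run for three hypothetical states.
fractions = [
    [0.6, 0.3, 0.1],
    [0.2, 0.1, 0.7],
    [0.1, 0.8, 0.1],
]
true_power_mw = [5.0, 1.0, 200.0]               # hypothetical ground truth
measured_mw = [sum(f * p for f, p in zip(row, true_power_mw))
               for row in fractions]

fitted_mw = solve_linear_system(fractions, measured_mw)
print([round(p, 3) for p in fitted_mw])
```

With more runs than unknowns, the same model would normally be fitted by least squares instead, which also gives an error estimate of the kind quoted on the slide.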
Benchmarking: Enhancements
  Related to verification of scheduling constraints
  Assessment of performance:
    Time execution: launch executable and wait to complete
    Add synchronization points to load generation tool
    Time between synchronization points represents a measure of performance
  Assessment of processing latencies
    How do we do this in a non-intrusive manner?
    Inside load generation utility?
  Thermal assessment
    Energy consumed over time
    Overshooting/undershooting target
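The synchronization-point idea can be sketched as a small recorder that timestamps named points and reports the time between consecutive ones. The class `SyncPoints` and its methods are hypothetical; the slides propose the feature but do not define an interface for it.

```python
import time

class SyncPoints:
    """Record named synchronization points and the time between them."""

    def __init__(self):
        self.marks = []                      # list of (name, timestamp_ns)

    def mark(self, name):
        """Timestamp a synchronization point using a monotonic clock."""
        self.marks.append((name, time.monotonic_ns()))

    def intervals_us(self):
        """Return (label, microseconds) for each pair of consecutive points."""
        pairs = zip(self.marks, self.marks[1:])
        return [(f"{a}->{b}", (tb - ta) / 1000)
                for (a, ta), (b, tb) in pairs]

points = SyncPoints()
points.mark("start")
sum(range(100_000))                          # stand-in for a phase of the load
points.mark("phase1_done")
time.sleep(0.005)                            # stand-in for an idle phase
points.mark("end")

for label, micros in points.intervals_us():
    print(f"{label}: {micros:.0f} us")
```

Recording only a timestamp at each point and deferring all analysis keeps the measurement cheap, in the same spirit as idlestat parsing its traces after acquisition.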
More about Linaro Connect: http://connect.linaro.org
More about Linaro: http://www.linaro.org/about/
More about Linaro engineering: http://www.linaro.org/engineering/
Linaro members: www.linaro.org/members