Runtime Analysis of 4 VA HiCuM Versions with and without Internal Solver


Didier Céli, Jean Remy
28th ArbeitsKreis Bipolar - Letter Session
Unterpremstaetten, Austria, November 5/6, 2015
dm23a.15

Outline
- Purpose
- HiCuM versions and simulator used
- Results
- Comments
- Summary
- Acknowledgement
- References

Purpose
- Evaluation of HiCuM/L2 VA codes with and without an internal loop for solving the transfer current
- DC, AC and CML ring-oscillator simulations
  - Accuracy
  - Runtime

HiCuM revision and simulator
Tested HiCuM revisions (VA code*)

HiCuM revision   Comments
HiCuM/L2         Production version
HiCuM/L2         Last HiCuM revision. Beta version under evaluation [1]
HiCuM/L2 v2.4    HiCuM without internal solver, proposed by TUD [2] with the help of 2 additional internal nodes. Not approved by the CMC HiCuM subcommittee. No runtime improvement.
HiCuM/L2         HiCuM without internal solver, proposed by Z. Huszka (AMS) [3] with the help of 1 additional internal node.

* .OP variables computation not activated in VA code (in fact low impact on runtime)
Simulations with ELDO ams15.3
Notation: best case in blue, worst case in red

DC and AC simulations (1/2)
Comparison between two HiCuM/L2 versions, labelled V2.33 and V2.4Z in the plot legends (SH = 1 and NQS = 1)
Gummel plot @ 27 °C

V BE (V)               V BC (V)                 Number of points   CPU time   CPU time
-1 to 1.1V step .1V    .6, .25, 0, -.25, -.8    15                 16s 5ms    15s 43ms

Figure: Gummel plots, I C, I B and I SUB [A] versus V BE [V], for V2.33 and V2.4Z at V BC = .6V, V BC = 0V and V BC = -.8V.

Some discrepancies on the collector current between the two versions in the breakdown region (V BC = -.8V)

DC and AC simulations (2/2)
f T characteristics @ 27 °C (SH = 1 and NQS = 1)

V BE (V)               V BC (V)                 Number of points   CPU time   CPU time
.6 to 1.1V step .1V    .6, .25, 0, -.25, -.5    25                 1s 62ms    1s 47ms

Figure: f T [GHz] versus V BE [V] at T = 27 °C, for V2.33 and V2.4Z at V BC = .6V, V BC = 0V and V BC = -.5V.

Comments
- Same accuracy between the 2 versions (except near and after BV CE)
- CPU time too small to see a real difference between the 2 versions
- Test on ring oscillator for more realistic results

CML ring oscillator
- Simulations at 25 °C of a CML ring oscillator of 21 gates at 3 current densities, keeping the logic swing constant and equal to 5mV
  - Before the f T peak (I C = 1.4 mA)
  - At the f T peak (I C = 8.5 mA)
  - After the f T peak (I C = 23 mA)
- Simulations done with and without self-heating (SH)
- Simulations done with and without non-quasi-static effects (NQS)
- Simulations executed using single threading (32-bit) and on the same operating system (Linux): RedHat 5.9 32-bit 3.3 GHz
- Simulations executed using the default simulator options
- Netlists are available on request (salim.elghouli@st.com)

Figure: f T versus I C [mA].
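To relate the ring-oscillator periods reported in the results to a per-gate figure, one can use the textbook approximation T = 2 * N * t_d for an N-stage ring, where t_d is the average delay per gate. The sketch below applies it to the 21-gate ring. The relation is a general rule of thumb and is not stated on the slides, the function name is hypothetical, and 572.28 ps is the period listed in the results for I C = 1.4 mA.

```python
def gate_delay_ps(period_ps: float, n_gates: int) -> float:
    """Average per-gate delay from the textbook relation T = 2 * N * t_d."""
    if n_gates <= 0:
        raise ValueError("n_gates must be positive")
    if period_ps <= 0:
        raise ValueError("period_ps must be positive")
    return period_ps / (2 * n_gates)


if __name__ == "__main__":
    N_GATES = 21     # number of CML gates in the ring, from the slide
    period = 572.28  # ps, period reported in the results at I C = 1.4 mA
    print(f"average gate delay: {gate_delay_ps(period, N_GATES):.2f} ps")
```

With these inputs the average delay comes out at roughly 13.6 ps per gate.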

SH = 0 and NQS = 0: Results

Tail current  Parameter                       HiCuM/L2      HiCuM/L2      HiCuM/L2 v2.4   HiCuM/L2
I C = 1.4 mA  Number of Newton Iterations     1828          11176         11695           17235
              Number of accepted Time Steps   14538         14538         14538           14538
              Elapsed CPU time                5mn 3s        6mn 19s       4mn 54s         4mn 18s
              Period of the CML Ring (ps)     572.28        572.28        572.28          572.28
I C = 8.5 mA  Number of Newton Iterations     289544        29785         62395           555514
              Number of accepted Time Steps   26344         26414         5948            728
              Elapsed CPU time                15mn 43s      16mn 53s      29mn 49s        43mn 19s
              Period of the CML Ring (ps)     163.42        166.12        14.81           141.3
I C = 23 mA   Number of Newton Iterations     249172        252481        586196          944537
              Number of accepted Time Steps   25443         25663         67853           13623
              Elapsed CPU time                18mn 1s       18mn 46s      27mn 34s        58mn 36s
              Period of the CML Ring (ps)     22.3          2.86          191.88          183.43

Version labels on the slide: V2.34, V2.33, V2.4

SH = 0 and NQS = 0: Waveforms
- I C = 1.4 mA: all versions give the same results.
- I C = 8.5 mA: two versions give similar results; v2.4 and the remaining version give similar results, but different from the first two. Explanations?
- I C = 23 mA: two versions give similar results; v2.4 and the remaining version give different results, which also differ from the first two. Explanations?

Figure: ring-oscillator output waveforms of the four versions (v2.4 among them) at I C = 1.4 mA, I C = 8.5 mA and I C = 23 mA.

SH = 0 and NQS = 1: Results

Tail current  Parameter                       HiCuM/L2      HiCuM/L2      HiCuM/L2 v2.4   HiCuM/L2
I C = 1.4 mA  Number of Newton Iterations     11238         11223         11253           1517
              Number of accepted Time Steps   14496         14493         14494           14493
              Elapsed CPU time                6mn 28s       6mn 2s        4mn 21s         6mn 2s
              Period of the CML Ring (ps)     574.6         574.6         574.6           574.6
I C = 8.5 mA  Number of Newton Iterations     5616          56182         697655          699376
              Number of accepted Time Steps   56795         5685          6894            971
              Elapsed CPU time                51mn 35s      31mn 36s      32mn 15s        1h 14mn 52s
              Period of the CML Ring (ps)     15.1          146.96        143.75          142.9
I C = 23 mA   Number of Newton Iterations     67312         677572        14411           1269862
              Number of accepted Time Steps   75291         75622         113434          18375
              Elapsed CPU time                1h 12mn 36s   49mn 25s      5mn 27s         2h 23mn 35s
              Period of the CML Ring (ps)     22.56         189.7         193.68          187.79

Version labels on the slide: V2.4, V2.34, V2.33

SH = 0 and NQS = 1: Waveforms
- I C = 1.4 mA: all versions give the same results.
- I C = 8.5 mA: two versions give similar results; v2.4 and the remaining version give similar results, but different from the first two. Explanations?
- I C = 23 mA: all versions give different results. Explanations?

Figure: ring-oscillator output voltage [V] waveforms of the four versions (v2.4 among them) at I C = 1.4 mA, I C = 8.5 mA and I C = 23 mA.

SH = 1 and NQS = 0: Results

Tail current  Parameter                       HiCuM/L2      HiCuM/L2      HiCuM/L2 v2.4   HiCuM/L2
I C = 1.4 mA  Number of Newton Iterations     1828          11176         11693           17224
              Number of accepted Time Steps   14538         14538         14538           14538
              Elapsed CPU time                6mn 17s       7mn 38s       5mn 0s          4mn 17s
              Period of the CML Ring (ps)     572.28        572.28        572.28          572.28
I C = 8.5 mA  Number of Newton Iterations     2896          29787         621179          518328
              Number of accepted Time Steps   26363         2644          59118           67176
              Elapsed CPU time                17mn 1s       16mn 43s      28mn 46s        43mn 18s
              Period of the CML Ring (ps)     163.61        164.41        14.81           14.81
I C = 23 mA   Number of Newton Iterations     251166        25283         58677           952814
              Number of accepted Time Steps   25658         25765         71314           136742
              Elapsed CPU time                17mn 56s      18mn 25s      24mn 23s        1h 3mn 57s
              Period of the CML Ring (ps)     22.29         24.45         19.52           184.1

Version labels on the slide: V2.34, V2.33, V2.4

SH = 1 and NQS = 0: Waveforms
- I C = 1.4 mA: all versions give the same results.
- I C = 8.5 mA: two versions give similar results; v2.4 and the remaining version give different results, which also differ from the first two. Explanations?
- I C = 23 mA: two versions give similar results; v2.4 and the remaining version give different results, which also differ from the first two. Explanations?

Figure: ring-oscillator output waveforms of the four versions (v2.4 among them), one panel per tail current.

SH = 1 and NQS = 1: Results

Tail current  Parameter                       HiCuM/L2      HiCuM/L2      HiCuM/L2 v2.4   HiCuM/L2
I C = 1.4 mA  Number of Newton Iterations     11221         11221         11253           152
              Number of accepted Time Steps   14493         14493         14494           14493
              Elapsed CPU time                6mn 14s       5mn 58s       5mn 12s         6mn 33s
              Period of the CML Ring (ps)     574.6         574.6         574.6           574.6
I C = 8.5 mA  Number of Newton Iterations     561688        562262        698591          696174
              Number of accepted Time Steps   56696         56792         69765           89451
              Elapsed CPU time                33mn 47s      31mn 13s      33mn 1s         1h 13mn 21s
              Period of the CML Ring (ps)     147.82        15.3          143.75          142.44
I C = 23 mA   Number of Newton Iterations     673661        673392        143992          1331
              Number of accepted Time Steps   75212         75137         115816          18771
              Elapsed CPU time                57mn 28s      44mn 41s      5mn 23s         2h 3mn 4s
              Period of the CML Ring (ps)     19.93         196.63        188.51          184.16

Version labels on the slide: V2.33, V2.34, V2.4
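The elapsed CPU times in the results are written as strings such as "33mn 47s" or "1h 13mn 21s". To compare versions numerically, a small helper can convert them to seconds and form a ratio. This is a minimal sketch: the names cpu_time_seconds and slowdown are hypothetical, and the two sample strings are taken from the I C = 8.5 mA rows of the SH = 1, NQS = 1 results.

```python
import re

# Optional hours, minutes ("mn") and seconds fields, in that order.
_CPU_TIME = re.compile(r"\s*(?:(\d+)\s*h)?\s*(?:(\d+)\s*mn)?\s*(?:(\d+)\s*s)?\s*")


def cpu_time_seconds(text: str) -> int:
    """Convert a string like '1h 13mn 21s' or '31mn 13s' to whole seconds."""
    match = _CPU_TIME.fullmatch(text)
    if match is None or not any(match.groups()):
        # Rejects empty strings and fields with missing digits.
        raise ValueError(f"unrecognised CPU time: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 3600 * hours + 60 * minutes + seconds


def slowdown(slow: str, fast: str) -> float:
    """Ratio of two elapsed CPU times given in the slide notation."""
    return cpu_time_seconds(slow) / cpu_time_seconds(fast)


if __name__ == "__main__":
    print(cpu_time_seconds("1h 13mn 21s"))
    print(f"{slowdown('1h 13mn 21s', '31mn 13s'):.2f}x")
```

Strings whose digits were lost in transcription (a seconds field with no number, for example) raise ValueError instead of being silently read as zero.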

SH = 1 and NQS = 1: Waveforms
- I C = 1.4 mA: all versions give the same results.
- I C = 8.5 mA: all versions give different results. Explanations?
- I C = 23 mA: all versions give different results. Explanations?

Figure: ring-oscillator output waveforms of the four versions (v2.4 among them) at I C = 1.4 mA, I C = 8.5 mA and I C = 23 mA.

Impact of SH and NQS on waveforms
HiCuM/L2
- I C = 1.4 mA: no impact of SH and NQS on the waveforms.
- I C = 8.5 mA: no impact of SH on the waveforms; NQS effects impact the waveforms.
- I C = 23 mA: no impact of SH on the waveforms; NQS effects impact the waveforms.

Figure: ring-oscillator output waveforms at I C = 1.4 mA, I C = 8.5 mA and I C = 23 mA for the four settings SH0NQS0, SH1NQS0, SH0NQS1 and SH1NQS1.

Comments (1/2)
- CML ring oscillator waveforms are not strictly identical for all versions. Their differences depend on the current density and are worse if NQS effects are on. Explanation?
- Best results (runtime) are obtained with the transfer current solver coded inside the HiCuM VA code.
  - Similar runtime between the two versions with internal solver.
- Although the version proposed by Z. Huszka [3] uses one node fewer than v2.4 to let the simulator solve the transfer current, it shows no improvement in CPU time or in number of iterations compared with v2.4. Why?
- At low current (I C = 1.4 mA), all HiCuM versions have the same runtime whatever SH (0 or 1) and NQS (0 or 1).

Comments (2/2)
- At medium current (I C = 8.5 mA) and high current (I C = 23 mA), the runtime increases strongly for all HiCuM versions, and more so when the solver is outside the model, the worst case being the HiCuM/L2 version in the last column of the results tables. Why? Is it due to the internal solver or to the complexity of the HiCuM equations and derivatives?
- At medium current (I C = 8.5 mA) and high current (I C = 23 mA), NQS effects have a strong impact on the runtime (increase) for all HiCuM revisions. In contrast, SH has a lower or negligible effect. Why? Number of external nodes needed to solve the NQS effects? Complexity of the HiCuM formulations and derivatives?
- What is responsible for the increase of HiCuM runtime at high currents?
  - Is it due to the internal solver?
  - Is it due to the HiCuM equations and derivatives?
  - Is it due to the non-optimized C code generated by VA compilers?
- How to explain the impact of NQS effects
  - on runtime?
  - on the difference of the waveforms between model versions?
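The "internal solver" discussed in these comments is a loop inside the model code that solves an implicit equation for the transfer current at every model evaluation. As a rough illustration only, the sketch below runs a scalar Newton loop on a toy implicit equation, i * (1 + i / i_k) = i_s * exp(v_be / VT), and counts the iterations. The toy equation, the parameter values (i_s, i_k, VT) and the function names are assumptions made for this example; they are not the HiCuM/L2 formulation.

```python
import math

VT = 0.02585  # thermal voltage in volts near room temperature (assumed value)


def ideal_current(v_be: float, i_s: float = 1e-17) -> float:
    """Ideal exponential current, used as right-hand side and initial guess."""
    return i_s * math.exp(v_be / VT)


def solve_transfer_current(v_be, i_s=1e-17, i_k=1e-2, rel_tol=1e-12, max_iter=50):
    """Newton loop on the toy equation i * (1 + i / i_k) = i_s * exp(v_be / VT).

    Returns (current, number_of_iterations).
    """
    i0 = ideal_current(v_be, i_s)
    i = i0  # the ideal current is an upper bound of the root
    for n in range(1, max_iter + 1):
        f = i * (1.0 + i / i_k) - i0   # residual
        df = 1.0 + 2.0 * i / i_k       # derivative of the residual
        step = f / df
        i -= step
        if abs(step) <= rel_tol * abs(i):
            return i, n
    raise RuntimeError("internal Newton loop did not converge")


if __name__ == "__main__":
    for v_be in (0.5, 0.7, 0.9):
        current, iterations = solve_transfer_current(v_be)
        print(f"V BE = {v_be:.1f} V: i = {current:.3e} A after {iterations} iterations")
```

In this toy setting the loop needs more iterations once the current approaches and exceeds i_k. Whether a comparable mechanism contributes to the runtimes reported above is not established by this sketch.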

Summary
- Whatever the proposed solution (TUD or AMS) to remove the internal loop for solving the transfer current, there is no improvement of the runtime.
- Using a CML ring oscillator as test bench shows that the runtime is strongly degraded at current densities around and after the f T peak, and by the activation of NQS effects.
  - Explanation?
  - Possible improvements?
- Any suggestions for further investigation (if needed)?

Acknowledgement
To Zoltan Huszka (AMS) for providing the VA code with a new proposal for removing the internal loop of HiCuM/L2 [3]

References
[1] M. Schröter, A. Pawlak, HiCuM/L2 - Release Notes, August 2015.
[2] M. Schröter, A. Pawlak, HiCuM/L2 - Productization and Support, Q2 CMC Meeting, June 2015.
[3] Z. Huszka, Reduction the Computational Cost of HiCuM/L2 at Invariant Node Count, 28th ArbeitsKreis Bipolar, AMS, November 2015.