Applications of Memristors in ANNs
Outline
- Brief intro to ANNs
- Firing-rate networks
- Single-layer perceptron experiment
- Other (simulation) examples
- Spiking networks and STDP
ANNs
An ANN is a bio-inspired, massively parallel network, i.e. a directed graph with nodes acting as neurons and edges acting as synapses. The functionality is learned during a training phase by changing the weights of the synapses. ANNs are commonly classified:
- By topology
- By learning paradigm
- By coding of neural information
Very good review
Applications
Complexity of the human brain
- ~10^11 neurons, ~10^15 synapses
- Connectivity ~1:10,000
- Massive parallelism: neurons fire at a few to several hundred hertz, yet face recognition takes ~100 ms (the "100-step rule")
- Challenges for emulation: the cortex is only 2-3 mm thick, with an area of ~2200 cm^2
McCulloch-Pitts neuron (1943); different activation functions
By topology
By learning paradigm Key questions: Capacity, Sample complexity, Computational complexity
By information coding: firing-rate vs. spiking models
Perceptron: Main idea
Single-layer perceptron with bias input x_0 and inputs x_1, ..., x_9:
y = sgn[ Σ_{i=0}^{9} w_i x_i ]
Hebbian rule: learning using only local information; example application: orientation selectivity
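The Hebbian idea (weights strengthen with correlated pre- and post-synaptic activity) can be sketched in a few lines. This is a minimal illustration, not the slides' exact rule; the normalization step is an added assumption to keep the weights bounded:

```python
import math

def hebbian_step(w, x, eta=0.1):
    """Plain Hebbian rule: each weight grows with the product of local
    pre-synaptic activity x[i] and post-synaptic activity y; the
    normalization (an added assumption) keeps the weights bounded."""
    y = sum(wi * xi for wi, xi in zip(w, x))        # linear neuron output
    w = [wi + eta * xi * y for wi, xi in zip(w, x)]  # local Hebbian update
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Repeated presentation of one dominant input direction aligns w with it,
# the mechanism behind, e.g., learned orientation selectivity
w = [0.6, 0.8]
for _ in range(50):
    w = hebbian_step(w, [1.0, 0.0])
```

After enough presentations the weight vector points along the dominant input direction, using only information local to each synapse.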
Multilayer perceptron Key questions: number of layers, number of hidden neurons
Backpropagation: gradient-descent method to minimize a cost function
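Backpropagation computes the gradient of the cost analytically, layer by layer, via the chain rule. As an illustration of the underlying gradient-descent step, the sketch below uses a finite-difference gradient (a stand-in for the analytic backprop pass) on a tiny 2-2-1 sigmoid network; the weight layout and values are hypothetical:

```python
import math

def forward(w, x):
    """Tiny 2-2-1 sigmoid network; w packs all 9 parameters
    (hypothetical layout chosen for this sketch)."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    h1 = sig(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = sig(w[3] * x[0] + w[4] * x[1] + w[5])
    return sig(w[6] * h1 + w[7] * h2 + w[8])

def cost(w, x, d):
    """Squared-error cost for a single pattern."""
    y = forward(w, x)
    return 0.5 * (y - d) ** 2

def numeric_grad(w, x, d, eps=1e-6):
    """Finite-difference gradient of the cost -- the same quantity that
    backpropagation computes analytically via the chain rule."""
    g = []
    for i in range(len(w)):
        wp = list(w); wp[i] += eps
        wm = list(w); wm[i] -= eps
        g.append((cost(wp, x, d) - cost(wm, x, d)) / (2 * eps))
    return g

# One gradient-descent step along the negative gradient reduces the cost
w = [0.5, -0.4, 0.1, 0.3, 0.8, -0.2, 0.7, -0.6, 0.0]
x, d = [1.0, 0.0], 1.0
g = numeric_grad(w, x, d)
w2 = [wi - 1.0 * gi for wi, gi in zip(w, g)]
```

In practice the analytic backward pass replaces `numeric_grad`, since it costs one network evaluation instead of one per weight.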
Competitive learning
Learning binary patterns with a competitive network
Instar learning law. Questions:
- What happens if more than four unique patterns are presented?
- What happens when an all-white pattern is presented?
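A winner-take-all competitive network with an instar-style update can be sketched as follows. This is a generic textbook form (only the winning unit moves its weight vector toward the input), not necessarily the slides' exact law:

```python
def competitive_step(W, x, eta=0.5):
    """Winner-take-all competitive learning with an instar-style update:
    only the winning unit moves its weight vector toward the input,
    dw_win = eta * (x - w_win)."""
    sims = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    win = sims.index(max(sims))
    W[win] = [wi + eta * (xi - wi) for wi, xi in zip(W[win], x)]
    return win

# Two units learn two distinct binary patterns.
# Note: an all-zero ("all-white") input gives zero similarity to every
# unit -- the "no signal" problem that complementary coding resolves.
W = [[0.2, 0.1, 0.0], [0.0, 0.1, 0.2]]
p1, p2 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(10):
    win1 = competitive_step(W, p1)
    win2 = competitive_step(W, p2)
```

With more unique patterns than units, some unit must be "recycled" between patterns, i.e. previously learned patterns are forgotten.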
Complementary coding
Resolves the no-signal issue for a particular (instar) learning law. How to learn invariance (translation, size, angle, etc.)?
With added complex cells
With added complex cells: AND in the bottom layer, OR in the top layer; present one-hot patterns to the top layer
Perceptron: Main idea
Single-layer perceptron (the synapses are the hardware bottleneck):
y = sgn[ Σ_{i=0}^{9} w_i x_i ]
Inputs: bias x_0 plus a binary 3x3 pixel array x_1 ... x_9, with x = +1 (black) and x = -1 (white)
Considered training/test patterns: pattern X (class d = +1) and pattern T (class d = -1)
Perceptron training rule: Δw_i = α x_i^(p) (d^(p) - y^(p))
Crossbar implementation: each weight is a pair of conductances, w_i = G_i^+ - G_i^-; input voltages V_0 ... V_9 drive the rows, and the output y = sgn[I^+ - I^-] is measured with a parameter analyzer
Alibart et al., submitted, 2012
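The differential-pair weight scheme and the perceptron training rule above can be sketched in software. This is an idealized model: the `CrossbarPerceptron` class and its conductance updates are illustrative, ignoring device nonlinearity, variability, and half-select effects:

```python
def sgn(z):
    return 1 if z >= 0 else -1

class CrossbarPerceptron:
    """Idealized differential-pair scheme: each signed weight
    w_i = gp[i] - gm[i] is stored as a pair of conductances."""
    def __init__(self, n, g0=0.5):
        self.gp = [g0] * n   # G_i^+ column
        self.gm = [g0] * n   # G_i^- column

    def output(self, x):
        # The crossbar computes I+ and I- as dot products of input
        # voltages and conductances (Ohm's and Kirchhoff's laws)
        ip = sum(g * xi for g, xi in zip(self.gp, x))
        im = sum(g * xi for g, xi in zip(self.gm, x))
        return sgn(ip - im)

    def train_step(self, x, d, alpha=0.05):
        # Perceptron rule dw_i = alpha * x_i * (d - y), split over the pair
        y = self.output(x)
        for i, xi in enumerate(x):
            dw = alpha * xi * (d - y)
            self.gp[i] += dw / 2
            self.gm[i] -= dw / 2

# Bias input x_0 = 1 plus 3x3 binary pixel arrays (+1/-1), as on the slide
X = [1,  1, -1, 1,  -1, 1, -1,  1, -1, 1]   # "X" pattern, class d = +1
T = [1,  1, 1, 1,  -1, 1, -1,  -1, 1, -1]   # "T" pattern, class d = -1
net = CrossbarPerceptron(10)
for _ in range(10):
    net.train_step(X, +1)
    net.train_step(T, -1)
```

Splitting each update symmetrically over the G+/G- pair keeps both conductances near mid-range, which is convenient for real devices with a limited conductance window.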
Widrow's memistor
ADALINE concept and hardware implementation (Bernard Widrow, Marcian Hoff)
B. Widrow and M. E. Hoff, Jr., IRE WESCON Convention Record, 4:96, 1960
Pt/TiO2-x/Pt devices
Device stack: 25 nm Au / 15 nm Pt top electrode; 30 nm TiO2-x; e-beam-patterned Pt protrusion (~20 nm); 5 nm Ti / 25 nm Pt bottom electrode
Conductance defined as g = I(0.2 V)/0.2 V
[Plot: I-V curve, current (mA) vs. voltage (V), with switching thresholds at ±V_switch]
- Any state between ON and OFF can be set
- In principle a dynamic system with frequency-dependent loop size, but switching dynamics are strongly (super-exponentially) nonlinear
- Gray area = no change; the state is defined within the gray area
Alibart et al., submitted, 2012
Switching dynamics
Pulse protocol: initialize to R_OFF, then apply set pulses and read (SET: R_0 = R_OFF); initialize to R_ON, then apply reset pulses and read (RESET: R_0 = R_ON)
- Small pulse amplitude: finer state change, but may require exponentially long times
- Large pulse amplitude: faster, but at cruder steps
[Plots: R/R_0 vs. pulse voltage (-1.5 to +1.5 V); current @ -200 mV vs. time for pulse amplitudes from -0.5 V to -1.3 V]
F. Alibart et al., Nanotechnology 23, 075201 (2012)
Nonlinear switching dynamics
The effective barrier U_A for ion hopping (hop distance a) is modulated by:
1. Heating: ~k_B T
2. Electric field: ~E·a·q/2
3. Phase transition or redox reaction (oxidation at one electrode, reduction at the other)
J. Yang et al., submitted, 2012
Speed vs. retention
Linear ionic transport: drift (v ∝ V) and diffusion (v = 0) are linked through the Einstein relation, so the store-to-write time ratio is only ~V/V_T, where V_T = k_B T/q
Nonlinear effects due to temperature and/or electric field exponentially improve this trade-off; e.g., for temperature only:
τ_store / τ_write ~ exp(U_A/k_B T_store) / exp(U_A/k_B T_write)
D. Strukov et al., Appl. Phys. A 94, 515 (2009)
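Assuming Arrhenius-activated ion hopping (consistent with the barrier picture above), the temperature-only case of the store-to-write trade-off can be written as:

```latex
% Voltage-time dilemma sketch (assumption: Arrhenius-activated hopping,
% with the write pulse raising the local temperature to T_write)
\frac{\tau_{\text{store}}}{\tau_{\text{write}}}
  \sim \exp\!\left(\frac{U_A}{k_B T_{\text{store}}}
                 - \frac{U_A}{k_B T_{\text{write}}}\right)
```

Because the barrier U_A enters exponentially, even a modest write-induced temperature (or field) increase yields many orders of magnitude between write speed and retention time.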
Switching statistics
[Plots: cumulative switching time vs. current @ 200 mV for SET pulses (0.6-1.4 V) and RESET pulses (-0.6 to -1.4 V), across 10 TiO2-x devices]
Large dispersion in switching dynamics!
Alibart et al., submitted, 2012
Variations in switching behavior
Conductance defined as g = I(0.2 V)/0.2 V; continuous state change via write/tune/read pulses
[Plots: I-V curves; synaptic-weight change g_AFTER/g_INITIAL vs. pulse voltage for SET and RESET, over the 0.1-1 mS range]
Alibart et al., submitted, 2012
Tuning algorithm
Start. Inputs: desired state I_desired, desired accuracy A_desired. Initialize the write voltage to a small non-disturbing value V_WRITE = 200 mV, voltage step V_STEP = 10 mV.
1. Read: apply V_READ = 200 mV and read the current I_current
2. Processing: is the state reached within the required precision, i.e. |I_desired - I_current| / I_desired < A_desired? If yes, finish
3. Processing: check for overshoot and set the sign of the increment, i.e. sign = sign(I_current - I_desired); if V_WRITE != V_READ and sign != oldsign, re-initialize V_WRITE = 200 mV
4. Write: V_WRITE = V_WRITE + sign × V_STEP; oldsign = sign; apply a pulse of amplitude V_WRITE; go to step 1
Intuitive algorithm: alternate set/reset pulses with reads. Implemented algorithm: non-disturbing read pulses interleaved with write pulses of ramped amplitude.
F. Alibart et al., Nanotechnology 23, 075201 (2012)
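The feedback tuning loop can be sketched as follows. The `read`/`write` callbacks and the `MockMemristor` toy model are hypothetical stand-ins for the instrument interface; the ramp-and-restart-on-overshoot logic follows the algorithm described above:

```python
def tune(read, write, i_desired, a_desired=0.01,
         v_read=0.2, v_step=0.01, v_max=1.5, max_pulses=1000):
    """Ramp the write-pulse amplitude from a non-disturbing level;
    on overshoot, flip polarity and restart the ramp with fine steps."""
    v_write, old_sign = v_read, 0
    for _ in range(max_pulses):
        i_current = read(v_read)
        if abs(i_desired - i_current) / i_desired < a_desired:
            return True                      # state reached within precision
        sign = 1 if i_current < i_desired else -1
        if v_write != v_read and sign != old_sign:
            v_write = v_read                 # overshoot: restart the ramp
        v_write = min(v_write + v_step, v_max)
        write(sign * v_write)                # pulse of growing amplitude
        old_sign = sign
    return False

class MockMemristor:
    """Toy device: conductance changes only above a 0.5 V threshold,
    with a step that grows with overdrive (crude nonlinearity)."""
    def __init__(self, g=1.0):
        self.g = g
    def read(self, v):
        return self.g * v                    # ohmic read
    def write(self, v):
        if abs(v) > 0.5:
            step = 0.05 * (abs(v) - 0.5)
            self.g = max(0.1, self.g + (step if v > 0 else -step))

dev = MockMemristor(g=1.0)
ok = tune(dev.read, dev.write, i_desired=0.3)  # target I(0.2 V) = 0.3
```

Because small pulses barely move the state, the ramp finds the pulse amplitude that produces usefully sized steps, and the restart-on-overshoot rule approaches the target from the other side with fine steps again.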
High-precision tuning
[Plot: current @ -200 mV vs. pulse number, alternating decrease-weight, increase-weight, and stand-by (read-only) phases, for target currents of 120, 60, 30, 15, and 7 µA]
TiO2-x devices (w/o protrusion): 100·|g_des - g_act|/g_des < 1%, i.e. ~8-bit precision
F. Alibart et al., Nanotechnology 23, 075201 (2012)
Limitation to tuning accuracy: random telegraph noise
[Plots: current traces vs. time for 0.5-5 kΩ states; PSD/I^2 (Hz^-1) vs. frequency; ΔR/R (%) vs. resistance]
- Solid-state electrolyte (electrochemical) devices are noisier
- The higher the resistance, the larger the noise
- For a-Si devices, noise limits tuning to ~5-6-bit precision (though without optimization)
Ligang Gao et al., VLSI-SoC, 2012
Perceptron experimental setup
- Agilent B1500 mainframe: arbitrary waveform generator (B1530), fast-IV current measurement (B1530), ground unit (GNDU)
- Switching matrix (Agilent E5250A)
- Wires implementing the crossbar circuit; packaged, wire-bonded memristive chip
Alibart et al., submitted, 2012
Perceptron: Ex situ training
Weights are computed off-line and imported by sequentially tuning each conductance pair g_i^+, g_i^- with read/write pulse sequences
[Plot: evolution of synaptic conductance g (mS) upon sequential tuning; final weights after programming; weight-import accuracy ~10%]
Crossbar half-select trick: half-selected devices see only a fraction of ±V_switch, so their weights are only slightly affected (>5-bit precision preserved)
Alibart et al., submitted, 2012
Perceptron: In situ training
Parallel weight update Δg_i^± = ±α x_i (d^(p) - y^(p)), implemented in four steps: training pulses of ±V_train/2 are applied simultaneously from the rows (encoding the inputs x_i) and the columns (encoding the desired output d), so that only devices seeing the full ±V_train are updated, with effective learning rate α = α(V, g)
[Plots: voltage waveforms at s_1 ... s_4 and at g_1^±, g_4^±; evolution of the synaptic conductances g (mS) over 16 training epochs, for V_train = 0.9 V and V_train = 1 V]
Alibart et al., submitted, 2012
Results
[Histograms: number of patterns vs. output current difference I^+ - I^- (A) for the X and T classes]
- Ex situ: starting from initial (random) weights, for weight-import accuracies of ~40%, ~10%, and ~2%
- In situ: starting from initial (random) weights, after 10 epochs with V_train = 0.9 V, then after 7 more epochs with V_train = 1 V
Conclusion: ~3-bit weight precision is enough for the considered task
Alibart et al., submitted, 2012
Big picture
Memristive crossbar stack added on top of CMOS, tightly integrated with CMOS logic (CMOL): a multi-layer perceptron in which each weight w_ji is a memristor conductance g_ji and the neurons (computing y_j from Σ_i w_ji x_i) are CMOS cells
Spiking Networks and Spike Timing Dependent Plasticity (STDP)
Spiking vs. firing-rate neural networks
- Firing rate: average frequency matters (high frequency = level 1, low frequency = level 0)
- Spiking networks: relative timing of the spikes matters; delays between neurons matter; this enriches the functionality
Spiking neural networks
Spatiotemporal processing; known to happen in biology, e.g. detecting the direction of a sound with two sensors and two neurons
Polychronization: Computation with Spikes
According to Izhikevich, accounting for the timing of spikes increases the capacity of the network beyond that of Hopfield networks
Hopfield Networks
Binary Hopfield network update rule:
v_j(t+1) = sgn[ Σ_i w_ji v_i(t) ]
Capacity: p_max = N / log N
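The update rule above, together with Hebbian storage of patterns, can be sketched directly (a minimal, self-contained illustration):

```python
def sgn(z):
    return 1 if z >= 0 else -1

def hopfield_weights(patterns):
    """Hebbian storage: w_ji = sum over patterns of v_j * v_i,
    with no self-coupling (w_jj = 0)."""
    n = len(patterns[0])
    return [[0 if i == j else sum(p[j] * p[i] for p in patterns)
             for i in range(n)] for j in range(n)]

def recall(W, v, steps=10):
    """Synchronous update v_j(t+1) = sgn(sum_i w_ji v_i(t))."""
    n = len(v)
    for _ in range(steps):
        v = [sgn(sum(W[j][i] * v[i] for i in range(n))) for j in range(n)]
    return v

# Store one 8-bit pattern and recall it from a copy with two flipped bits
p = [1, -1, 1, 1, -1, -1, 1, -1]
W = hopfield_weights([p])
noisy = list(p); noisy[0] = -noisy[0]; noisy[3] = -noisy[3]
recalled = recall(W, noisy)
```

The stored pattern acts as an attractor: states within its basin are pulled back to it, which is what bounds the capacity to roughly N/log N patterns.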
Polychronization: Computation with Spikes
Due to STDP, the system can self-organize to activate various polychronous groups
Spike Timing Dependent Plasticity
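A common exponential STDP window (a generic textbook form; the slides do not specify the exact window) maps the spike-time difference to a weight change:

```python
import math

def stdp_dw(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Exponential STDP window: potentiation when the pre-synaptic spike
    precedes the post-synaptic one (dt = t_post - t_pre > 0), depression
    otherwise; the magnitude decays with |dt| on a timescale tau (ms).
    The parameter values are illustrative assumptions."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)
```

In the memristive implementations discussed next, this window is realized by shaping the pre- and post-synaptic pulses so that their overlap across the device encodes the timing difference.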
STDP Implementation (first attempt)
We have implemented a CMOS neuron circuit to convert the relative timing of the neuron spikes into the pulse-width information seen by the memristor synapse
STDP Implementation Proposal for Memristors
Assumes the rate of conductance change is a function of the applied voltage
STDP Implementation with PCM
Long-Term Depression and Short-Term Potentiation
Electronic Pavlov s Dog
Snider's Spiking Networks
Example: Network Self-Organization (Spatial Orientation Filter Array)
Adaptive recurrent network with inputs x_i and filtered outputs
G. Snider, Nanotechnology 18, 365202 (2007)