Applications of Memristors in ANNs
Outline
- Brief intro to ANNs
- Firing-rate networks
- Single-layer perceptron experiment
- Other (simulation) examples
- Spiking networks and STDP
ANNs
An ANN is a bio-inspired, massively parallel network, i.e. a directed graph with nodes acting as neurons and edges acting as synapses. The functionality is learned during a training phase by changing the weights of the synapses. ANNs are commonly classified:
- By topology
- By learning paradigm
- By coding of neural information
Very good review
Applications
Complexity of the human brain
- ~10^11 neurons, ~10^15 synapses
- Connectivity ~1:10,000
- Massive parallelism: neurons fire at a few to several hundred hertz, yet face recognition takes ~100 ms (the "100-step rule")
- Challenges for emulation: the cortex is only 2-3 mm thick, with an area of ~2200 cm^2
McCulloch-Pitts neuron (1943); different activation functions
By topology
By learning paradigm Key questions: Capacity, Sample complexity, Computational complexity
By information coding: firing-rate vs. spiking models
Perceptron: Main idea
Single-layer perceptron with bias input x_0 and inputs x_1, ..., x_9:
y = sgn[ Σ_{i=0}^{9} w_i x_i ]
Hebbian rule: learning using only local information; example application: orientation selectivity
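The Hebbian idea (weights strengthen with correlated pre- and post-synaptic activity) can be sketched in a few lines. This is a minimal illustration, not the slides' exact rule; the normalization step is an added assumption to keep the weights bounded:

```python
import math

def hebbian_step(w, x, eta=0.1):
    """Plain Hebbian rule: each weight grows with the product of local
    pre-synaptic activity x[i] and post-synaptic activity y; the
    normalization (an added assumption) keeps the weights bounded."""
    y = sum(wi * xi for wi, xi in zip(w, x))        # linear neuron output
    w = [wi + eta * xi * y for wi, xi in zip(w, x)]  # local Hebbian update
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]

# Repeated presentation of one dominant input direction aligns w with it,
# the mechanism behind, e.g., learned orientation selectivity
w = [0.6, 0.8]
for _ in range(50):
    w = hebbian_step(w, [1.0, 0.0])
```

After enough presentations the weight vector points along the dominant input direction, using only information local to each synapse.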
Multilayer perceptron Key questions: number of layers, number of hidden neurons
Backpropagation: gradient-descent method to minimize a cost function
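Backpropagation computes the gradient of the cost analytically, layer by layer, via the chain rule. As an illustration of the underlying gradient-descent step, the sketch below uses a finite-difference gradient (a stand-in for the analytic backprop pass) on a tiny 2-2-1 sigmoid network; the weight layout and values are hypothetical:

```python
import math

def forward(w, x):
    """Tiny 2-2-1 sigmoid network; w packs all 9 parameters
    (hypothetical layout chosen for this sketch)."""
    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    h1 = sig(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = sig(w[3] * x[0] + w[4] * x[1] + w[5])
    return sig(w[6] * h1 + w[7] * h2 + w[8])

def cost(w, x, d):
    """Squared-error cost for a single pattern."""
    y = forward(w, x)
    return 0.5 * (y - d) ** 2

def numeric_grad(w, x, d, eps=1e-6):
    """Finite-difference gradient of the cost -- the same quantity that
    backpropagation computes analytically via the chain rule."""
    g = []
    for i in range(len(w)):
        wp = list(w); wp[i] += eps
        wm = list(w); wm[i] -= eps
        g.append((cost(wp, x, d) - cost(wm, x, d)) / (2 * eps))
    return g

# One gradient-descent step along the negative gradient reduces the cost
w = [0.5, -0.4, 0.1, 0.3, 0.8, -0.2, 0.7, -0.6, 0.0]
x, d = [1.0, 0.0], 1.0
g = numeric_grad(w, x, d)
w2 = [wi - 1.0 * gi for wi, gi in zip(w, g)]
```

In practice the analytic backward pass replaces `numeric_grad`, since it costs one network evaluation instead of one per weight.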
Competitive learning
Learning binary patterns with a competitive network
Instar learning law. Questions:
- What happens if more than four unique patterns are presented?
- What happens when an all-white pattern is presented?
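A winner-take-all competitive network with an instar-style update can be sketched as follows. This is a generic textbook form (only the winning unit moves its weight vector toward the input), not necessarily the slides' exact law:

```python
def competitive_step(W, x, eta=0.5):
    """Winner-take-all competitive learning with an instar-style update:
    only the winning unit moves its weight vector toward the input,
    dw_win = eta * (x - w_win)."""
    sims = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    win = sims.index(max(sims))
    W[win] = [wi + eta * (xi - wi) for wi, xi in zip(W[win], x)]
    return win

# Two units learn two distinct binary patterns.
# Note: an all-zero ("all-white") input gives zero similarity to every
# unit -- the "no signal" problem that complementary coding resolves.
W = [[0.2, 0.1, 0.0], [0.0, 0.1, 0.2]]
p1, p2 = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]
for _ in range(10):
    win1 = competitive_step(W, p1)
    win2 = competitive_step(W, p2)
```

With more unique patterns than units, some unit must be "recycled" between patterns, i.e. previously learned patterns are forgotten.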
Complementary coding
Resolves the no-signal issue for a particular (instar) learning law. How to learn invariance (translation, size, angle, etc.)?
With added complex cells
With added complex cells: AND in the bottom layer, OR in the top layer; present one-hot patterns to the top layer
Perceptron: Main idea
Single-layer perceptron (the synapses are the hardware bottleneck):
y = sgn[ Σ_{i=0}^{9} w_i x_i ]
Inputs: bias x_0 plus a binary 3x3 pixel array x_1 ... x_9, with x = +1 (black) and x = -1 (white)
Considered training/test patterns: pattern X (class d = +1) and pattern T (class d = -1)
Perceptron training rule: Δw_i = α x_i^(p) (d^(p) - y^(p))
Crossbar implementation: each weight is a pair of conductances, w_i = G_i^+ - G_i^-; input voltages V_0 ... V_9 drive the rows, and the output y = sgn[I^+ - I^-] is measured with a parameter analyzer
Alibart et al., submitted, 2012
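The differential-pair weight scheme and the perceptron training rule above can be sketched in software. This is an idealized model: the `CrossbarPerceptron` class and its conductance updates are illustrative, ignoring device nonlinearity, variability, and half-select effects:

```python
def sgn(z):
    return 1 if z >= 0 else -1

class CrossbarPerceptron:
    """Idealized differential-pair scheme: each signed weight
    w_i = gp[i] - gm[i] is stored as a pair of conductances."""
    def __init__(self, n, g0=0.5):
        self.gp = [g0] * n   # G_i^+ column
        self.gm = [g0] * n   # G_i^- column

    def output(self, x):
        # The crossbar computes I+ and I- as dot products of input
        # voltages and conductances (Ohm's and Kirchhoff's laws)
        ip = sum(g * xi for g, xi in zip(self.gp, x))
        im = sum(g * xi for g, xi in zip(self.gm, x))
        return sgn(ip - im)

    def train_step(self, x, d, alpha=0.05):
        # Perceptron rule dw_i = alpha * x_i * (d - y), split over the pair
        y = self.output(x)
        for i, xi in enumerate(x):
            dw = alpha * xi * (d - y)
            self.gp[i] += dw / 2
            self.gm[i] -= dw / 2

# Bias input x_0 = 1 plus 3x3 binary pixel arrays (+1/-1), as on the slide
X = [1,  1, -1, 1,  -1, 1, -1,  1, -1, 1]   # "X" pattern, class d = +1
T = [1,  1, 1, 1,  -1, 1, -1,  -1, 1, -1]   # "T" pattern, class d = -1
net = CrossbarPerceptron(10)
for _ in range(10):
    net.train_step(X, +1)
    net.train_step(T, -1)
```

Splitting each update symmetrically over the G+/G- pair keeps both conductances near mid-range, which is convenient for real devices with a limited conductance window.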
Widrow's memistor
ADALINE concept and hardware implementation (Bernard Widrow, Marcian Hoff)
B. Widrow and M. E. Hoff, Jr., IRE WESCON Convention Record, 4:96, 1960
Pt/TiO2-x/Pt devices
Device stack: 25 nm Au / 15 nm Pt top electrode; 30 nm TiO2-x; e-beam-patterned Pt protrusion (~20 nm); 5 nm Ti / 25 nm Pt bottom electrode
Conductance defined as g = I(0.2 V)/0.2 V
[Plot: I-V curve, current (mA) vs. voltage (V), with switching thresholds at ±V_switch]
- Any state between ON and OFF can be set
- In principle a dynamic system with frequency-dependent loop size, but switching dynamics are strongly (super-exponentially) nonlinear
- Gray area = no change; the state is defined within the gray area
Alibart et al., submitted, 2012
Switching dynamics
Pulse protocol: initialize to R_OFF, then apply set pulses and read (SET: R_0 = R_OFF); initialize to R_ON, then apply reset pulses and read (RESET: R_0 = R_ON)
- Small pulse amplitude: finer state change, but may require exponentially long times
- Large pulse amplitude: faster, but at cruder steps
[Plots: R/R_0 vs. pulse voltage (-1.5 to +1.5 V); current @ -200 mV vs. time for pulse amplitudes from -0.5 V to -1.3 V]
F. Alibart et al., Nanotechnology 23, 075201 (2012)
Nonlinear switching dynamics
The effective barrier U_A for ion hopping (hop distance a) is modulated by:
1. Heating: ~k_B T
2. Electric field: ~E·a·q/2
3. Phase transition or redox reaction (oxidation at one electrode, reduction at the other)
J. Yang et al., submitted, 2012
Speed vs. retention
Linear ionic transport: drift (v ∝ V) and diffusion (v = 0) are linked through the Einstein relation, so the store-to-write time ratio is only ~V/V_T, where V_T = k_B T/q
Nonlinear effects due to temperature and/or electric field exponentially improve this trade-off; e.g., for temperature only:
τ_store / τ_write ~ exp(U_A/k_B T_store) / exp(U_A/k_B T_write)
D. Strukov et al., Appl. Phys. A 94, 515 (2009)
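Assuming Arrhenius-activated ion hopping (consistent with the barrier picture above), the temperature-only case of the store-to-write trade-off can be written as:

```latex
% Voltage-time dilemma sketch (assumption: Arrhenius-activated hopping,
% with the write pulse raising the local temperature to T_write)
\frac{\tau_{\text{store}}}{\tau_{\text{write}}}
  \sim \exp\!\left(\frac{U_A}{k_B T_{\text{store}}}
                 - \frac{U_A}{k_B T_{\text{write}}}\right)
```

Because the barrier U_A enters exponentially, even a modest write-induced temperature (or field) increase yields many orders of magnitude between write speed and retention time.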
Switching statistics
[Plots: cumulative switching time vs. current @ 200 mV for SET pulses (0.6-1.4 V) and RESET pulses (-0.6 to -1.4 V), across 10 TiO2-x devices]
Large dispersion in switching dynamics!
Alibart et al., submitted, 2012
Variations in switching behavior
Conductance defined as g = I(0.2 V)/0.2 V; continuous state change via write/tune/read pulses
[Plots: I-V curves; synaptic-weight change g_AFTER/g_INITIAL vs. pulse voltage for SET and RESET, over the 0.1-1 mS range]
Alibart et al., submitted, 2012
Tuning algorithm
Start. Inputs: desired state I_desired, desired accuracy A_desired. Initialize the write voltage to a small non-disturbing value V_WRITE = 200 mV, voltage step V_STEP = 10 mV.
1. Read: apply V_READ = 200 mV and read the current I_current
2. Processing: is the state reached within the required precision, i.e. |I_desired - I_current| / I_desired < A_desired? If yes, finish
3. Processing: check for overshoot and set the sign of the increment, i.e. sign = sign(I_current - I_desired); if V_WRITE != V_READ and sign != oldsign, re-initialize V_WRITE = 200 mV
4. Write: V_WRITE = V_WRITE + sign × V_STEP; oldsign = sign; apply a pulse of amplitude V_WRITE; go to step 1
Intuitive algorithm: alternate set/reset pulses with reads. Implemented algorithm: non-disturbing read pulses interleaved with write pulses of ramped amplitude.
F. Alibart et al., Nanotechnology 23, 075201 (2012)
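The feedback tuning loop can be sketched as follows. The `read`/`write` callbacks and the `MockMemristor` toy model are hypothetical stand-ins for the instrument interface; the ramp-and-restart-on-overshoot logic follows the algorithm described above:

```python
def tune(read, write, i_desired, a_desired=0.01,
         v_read=0.2, v_step=0.01, v_max=1.5, max_pulses=1000):
    """Ramp the write-pulse amplitude from a non-disturbing level;
    on overshoot, flip polarity and restart the ramp with fine steps."""
    v_write, old_sign = v_read, 0
    for _ in range(max_pulses):
        i_current = read(v_read)
        if abs(i_desired - i_current) / i_desired < a_desired:
            return True                      # state reached within precision
        sign = 1 if i_current < i_desired else -1
        if v_write != v_read and sign != old_sign:
            v_write = v_read                 # overshoot: restart the ramp
        v_write = min(v_write + v_step, v_max)
        write(sign * v_write)                # pulse of growing amplitude
        old_sign = sign
    return False

class MockMemristor:
    """Toy device: conductance changes only above a 0.5 V threshold,
    with a step that grows with overdrive (crude nonlinearity)."""
    def __init__(self, g=1.0):
        self.g = g
    def read(self, v):
        return self.g * v                    # ohmic read
    def write(self, v):
        if abs(v) > 0.5:
            step = 0.05 * (abs(v) - 0.5)
            self.g = max(0.1, self.g + (step if v > 0 else -step))

dev = MockMemristor(g=1.0)
ok = tune(dev.read, dev.write, i_desired=0.3)  # target I(0.2 V) = 0.3
```

Because small pulses barely move the state, the ramp finds the pulse amplitude that produces usefully sized steps, and the restart-on-overshoot rule approaches the target from the other side with fine steps again.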
High-precision tuning
[Plot: current @ -200 mV vs. pulse number, alternating decrease-weight, increase-weight, and stand-by (read-only) phases, for target currents of 120, 60, 30, 15, and 7 µA]
TiO2-x devices (w/o protrusion): 100·|g_des - g_act|/g_des < 1%, i.e. ~8-bit precision
F. Alibart et al., Nanotechnology 23, 075201 (2012)
Limitation to tuning accuracy: random telegraph noise
[Plots: current traces vs. time for 0.5-5 kΩ states; PSD/I^2 (Hz^-1) vs. frequency; ΔR/R (%) vs. resistance]
- Solid-state electrolyte (electrochemical) devices are noisier
- The higher the resistance, the larger the noise
- For a-Si devices, noise limits tuning to ~5-6-bit precision (though without optimization)
Ligang Gao et al., VLSI-SoC, 2012
Perceptron experimental setup
- Agilent B1500 mainframe: arbitrary waveform generator (B1530), fast-IV current measurement (B1530), ground unit (GNDU)
- Switching matrix (Agilent E5250A)
- Wires implementing the crossbar circuit; packaged, wire-bonded memristive chip
Alibart et al., submitted, 2012
Perceptron: Ex situ training
Weights are computed off-line and imported by sequentially tuning each conductance pair g_i^+, g_i^- with read/write pulse sequences
[Plot: evolution of synaptic conductance g (mS) upon sequential tuning; final weights after programming; weight-import accuracy ~10%]
Crossbar half-select trick: half-selected devices see only a fraction of ±V_switch, so their weights are only slightly affected (>5-bit precision preserved)
Alibart et al., submitted, 2012
Perceptron: In situ training
Parallel weight update Δg_i^± = ±α x_i (d^(p) - y^(p)), implemented in four steps: training pulses of ±V_train/2 are applied simultaneously from the rows (encoding the inputs x_i) and the columns (encoding the desired output d), so that only devices seeing the full ±V_train are updated, with effective learning rate α = α(V, g)
[Plots: voltage waveforms at s_1 ... s_4 and at g_1^±, g_4^±; evolution of the synaptic conductances g (mS) over 16 training epochs, for V_train = 0.9 V and V_train = 1 V]
Alibart et al., submitted, 2012
Results
[Histograms: number of patterns vs. output current difference I^+ - I^- (A) for the X and T classes]
- Ex situ: starting from initial (random) weights, for weight-import accuracies of ~40%, ~10%, and ~2%
- In situ: starting from initial (random) weights, after 10 epochs with V_train = 0.9 V, then after 7 more epochs with V_train = 1 V
Conclusion: ~3-bit weight precision is enough for the considered task
Alibart et al., submitted, 2012
Big picture
Memristive crossbar stack added on top of CMOS, tightly integrated with CMOS logic (CMOL): a multi-layer perceptron in which each weight w_ji is a memristor conductance g_ji and the neurons (computing y_j from Σ_i w_ji x_i) are CMOS cells
Spiking Networks and Spike Timing Dependent Plasticity (STDP)
Spiking vs. firing-rate neural networks
- Firing rate: average frequency matters (high frequency = level 1, low frequency = level 0)
- Spiking networks: relative timing of the spikes matters; delays between neurons matter; this enriches the functionality
Spiking neural networks
Spatiotemporal processing; known to happen in biology, e.g. detecting the direction of a sound with two sensors and two neurons
Polychronization: Computation with Spikes
According to Izhikevich, accounting for the timing of spikes increases the capacity of the network beyond that of Hopfield networks
Hopfield Networks
Binary Hopfield network update rule:
v_j(t+1) = sgn[ Σ_i w_ji v_i(t) ]
Capacity: p_max = N / log N
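The update rule above, together with Hebbian storage of patterns, can be sketched directly (a minimal, self-contained illustration):

```python
def sgn(z):
    return 1 if z >= 0 else -1

def hopfield_weights(patterns):
    """Hebbian storage: w_ji = sum over patterns of v_j * v_i,
    with no self-coupling (w_jj = 0)."""
    n = len(patterns[0])
    return [[0 if i == j else sum(p[j] * p[i] for p in patterns)
             for i in range(n)] for j in range(n)]

def recall(W, v, steps=10):
    """Synchronous update v_j(t+1) = sgn(sum_i w_ji v_i(t))."""
    n = len(v)
    for _ in range(steps):
        v = [sgn(sum(W[j][i] * v[i] for i in range(n))) for j in range(n)]
    return v

# Store one 8-bit pattern and recall it from a copy with two flipped bits
p = [1, -1, 1, 1, -1, -1, 1, -1]
W = hopfield_weights([p])
noisy = list(p); noisy[0] = -noisy[0]; noisy[3] = -noisy[3]
recalled = recall(W, noisy)
```

The stored pattern acts as an attractor: states within its basin are pulled back to it, which is what bounds the capacity to roughly N/log N patterns.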
Polychronization: Computation with Spikes
Due to STDP, the system can self-organize to activate various polychronous groups
Spike Timing Dependent Plasticity
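A common exponential STDP window (a generic textbook form; the slides do not specify the exact window) maps the spike-time difference to a weight change:

```python
import math

def stdp_dw(dt, a_plus=0.1, a_minus=0.12, tau=20.0):
    """Exponential STDP window: potentiation when the pre-synaptic spike
    precedes the post-synaptic one (dt = t_post - t_pre > 0), depression
    otherwise; the magnitude decays with |dt| on a timescale tau (ms).
    The parameter values are illustrative assumptions."""
    if dt > 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)
```

In the memristive implementations discussed next, this window is realized by shaping the pre- and post-synaptic pulses so that their overlap across the device encodes the timing difference.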
STDP Implementation (first attempt)
We have implemented a CMOS neuron circuit to convert the relative timing of the neuron spikes into the pulse-width information seen by the memristor synapse
STDP Implementation Proposal for Memristors
Assumes the rate of conductance change is a function of the applied voltage
STDP Implementation with PCM
Long-Term Depression and Short-Term Potentiation
Electronic Pavlov s Dog
Snider's Spiking Networks
Example: Network Self-Organization (Spatial Orientation Filter Array)
Adaptive recurrent network with inputs x_i and filtered outputs
G. Snider, Nanotechnology 18, 365202 (2007)