Scalable Front End Designs for Communication and Learning. Aseem Wadhwa, Department of ECE UCSB PhD Defense

1 Scalable Front End Designs for Communication and Learning. Aseem Wadhwa, Department of ECE, UCSB. PhD Defense

2 Estimation/Detection Problem: artificial/natural distortions, followed by a receiver that recovers the underlying variable of interest. e.g. Communication System: inter-symbol interference. e.g. Computer Vision: labels = {cars, trucks, planes, etc.}, observations = raw images

3 Receiver Design: Front End (system-specific) + Back End (generic principles). Example, Machine Learning: Front End = feature extractor, e.g. bag-of-words (text), SIFT (images); Back End = SVM (support vector machine)

4 Receiver Design: Front End (system-specific) + Back End (generic principles). Example, Analog-to-Digital Interfaces: Front End = Nyquist sampling (preserve waveform information), matched filtering and symbol-rate sampling (communication); Back End = generic MAP detector (maximum a posteriori probability)

5 Scalability Challenges. In this talk, we discuss front end designs for systems that face scalability issues. Settings: (1) Communication Systems: high-bandwidth (GHz-scale) links ---> power bottleneck with high-resolution ADC; need to adapt to low resolution and high quantization error. (2) Machine Vision: computational complexity, ease of implementation, power considerations, e.g. handheld devices

6 3 Canonical Problems. We revisit the system design for 3 specific problems and demonstrate how some of these scalability issues can be addressed. 1. Phase/Frequency Synchronization: importance of dither (communication system) 2. Equalization: importance of adapting quantization thresholds to the channel (communication system) 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering (machine vision)

7 1. Phase/Frequency Synchronization: importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

8 Modern Communication Receiver Architecture: digitize early, high-resolution samples. Received analog signal -> Analog Frontend (downconvert to baseband, receive filter) -> ADC (faithful analog-to-digital conversion) -> DSP (leveraging Moore's law: synchronization, equalization, error decoding etc.). ADC bottleneck for high-speed links (multi-Gbps, e.g. chip-to-chip, mm-wave, optical). (Figure: effective number of bits (ENOB) vs. sampling frequency f_s; data from B. Murmann, "ADC Performance Survey" [Online].) Challenge: adapting systems to low-resolution ADCs

9 Modified Architecture: Bayesian Mixed-Signal Processing. TX -> Channel -> Analog Frontend (preprocessing) -> coarse A-to-D conversion -> Bayesian inference (DSP, non-linear algorithms), with digitally controlled feedback (phase rotation, quantization thresholds etc.). Circuit implementations: Chan, Minwei, Brodersen et al (2011). Revisiting classical problems: synchronization, equalization

10 Blind Phase/Frequency Synchronization: System Model. TX sends complex symbols (unknown); unknown channel phase, unknown frequency offset, complex AWGN. Objective: estimate φ_c and Δf, decode b_k. Simplifying assumptions: QPSK, non-dispersive channel, perfect timing sync, Nyquist-rate symbols
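A plausible write-up of the system model sketched on this slide, assuming standard notation (z_k the k-th received sample, b_k the QPSK symbol, T_s the symbol period, n_k complex AWGN); the dissertation's exact notation may differ:

    z_k = b_k \, e^{j\left(\phi_c + 2\pi \Delta f\, k T_s\right)} + n_k,
    \qquad b_k \in \left\{ e^{j\pi/4}, e^{j3\pi/4}, e^{j5\pi/4}, e^{j7\pi/4} \right\},
    \qquad n_k \sim \mathcal{CN}(0, \sigma^2).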

11 Phase-only Quantization using 1-bit ADCs (AGC-free quantization). Received passband waveform -> downconversion to I and Q -> pass linear combinations of I & Q through 1-bit ADCs. M = 8 bins (4 ADCs), M = 12 bins (6 ADCs)

12 Feedback architecture: Transmitter (QPSK) -> Channel -> Phase Quantization -> DSP for Bayesian estimation -> decoded symbols. Quantized phase measurements (1, 2, ..., M); digitally controlled derotation phase: dither signal (feedback)

13 Break into sub-problems. Phase changes slowly, over a few 100s of symbols. Example: T_s = (2 GHz)^-1, f_c = 20 GHz, Δf = 10 ppm of f_c = 10^-5 f_c, so η = 2π x 10^-4 radians = 0.036 degrees per symbol period. 1. Blind phase estimation: estimate φ given z_k 2. Once the phase is "locked in", start tracking and also decoding

14 Bayesian Estimation of φ. Compute the observation density (pmf); use recursive Bayes to update the posterior. Can choosing θ_k (dither) cleverly improve estimation? What if we set θ_k = constant (say 0 degrees)?
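A sketch of the recursive update referred to above, under assumed notation (z_k the quantized phase bin observed at time k, θ_k the dither/derotation phase applied before quantization):

    p(\phi \mid z_{1:k}) \;\propto\; p(z_k \mid \phi, \theta_k)\, p(\phi \mid z_{1:k-1}),
    \qquad
    p(z_k = m \mid \phi, \theta_k) \;=\; \frac{1}{4}\sum_{b \in \mathrm{QPSK}} \Pr\!\left[\arg\!\left(b\, e^{j(\phi+\theta_k)} + n\right) \in \text{bin } m\right].

One natural implementation keeps the posterior over φ on a discrete grid, so the update is a pointwise multiply-and-normalize.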

15 To dither or not (1). Example 1: M = 8 bins, symmetric bins and equiprobable QPSK symbols. (Figure: posterior of φ after 150 symbols, SNR = 5 dB, for θ_k = 0 vs. θ_k random; compared via KL divergence.)

16 To dither or not (2). Example: asymmetric bins (M = 12), θ_k = 0. (Figure: posterior of φ, SNR = 5 dB.)

17 To dither or not (3). Example 3: at high SNR the posterior stays flat without dither. (Figure: posterior of φ after 30 symbols, SNR = 35 dB, for θ_k = 0 vs. θ_k random.) 1. Dithering is required 2. Random dithering is a robust choice 3. Can we do better than random dither?

18 What is the Best Dither? Estimation is best when the Fisher information is maximized. (Figure: best dither at high SNR (15 dB) and at low SNR (5 dB), relative to bin edge and bin center.)

19 Policies. We don't know the true φ, though. Can use our best guess of φ: the Maximizing Fisher Information policy (MFI). Literature (Atia 2013): optimal if φ_MAP is close to the true value. What if the uncertainty in φ is high? Greedy Entropy Policy (GEP): choose the next action to minimize the expected entropy of the next posterior (averaged over the observation density)
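Writing the two policies out explicitly, with assumed notation (I(φ; θ) the Fisher information of a single quantized observation taken with dither θ, p_k the current posterior, H(·) entropy):

    \theta_{k+1}^{\mathrm{MFI}} = \arg\max_{\theta}\; I\!\left(\hat{\phi}^{\mathrm{MAP}}_{k};\, \theta\right),
    \qquad
    \theta_{k+1}^{\mathrm{GEP}} = \arg\min_{\theta}\; \mathbb{E}_{z_{k+1}}\!\left[\, H\!\left(p_{k+1}(\cdot \mid z_{k+1}, \theta)\right)\right],

where the expectation over z_{k+1} uses the averaged observation density \sum_{\phi} p_k(\phi)\, p(z_{k+1} \mid \phi, \theta).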

20 Greedy Entropy Policy: Properties. 1. It can be shown that as the number of observations increases, GEP --> MFI, which is optimal for large N 2. Zero-noise case: the posterior is flat over its support, and the entropy of a flat distribution is the log of the support size; GEP is optimal here, reducing the support by half at every step

21 Numerical Results: Coarser Quantization (M = 8). (Figures: RMSE in degrees vs. #symbols for CRLB, GEP, MFI and random dither, at SNR = 5 dB (low) and 15 dB (high).) At low SNR, GEP is slightly better than MFI. Random dithering becomes more sub-optimal as SNR increases

22 Numerical Results: Finer Quantization (M = 12). (Figures: RMSE in degrees vs. #symbols for CRLB, GEP, MFI, random and constant dither, at SNR = 5 dB (low) and 15 dB (high).) At low SNR, the noise provides enough dither, so a constant action is also fine

23 Literature: Adaptive Control for Estimation. Related literature: sequential design of experiments, active multihypothesis testing etc. Optimal policy: equivalent to solving a POMDP (partially observable Markov decision process); high complexity: the set of policy trees can be very large, worst case requiring gridding of the belief space (posterior). Asymptotically optimal policies have been proposed, but are not applicable to our problem directly. Multihypothesis testing (Chernoff 1959, Naghshvar 2013, Nitinawarat 2013 etc.); ideas similar to GEP used to derive bounds (Naghshvar 2013). Continuous parameter estimation (Atia 2013) shows MFI is asymptotically optimal

24 Frequency Tracking (very slowly varying phase). 1. Sliding MAP estimate of the phase --> coarse estimate 2. Feed to an EKF (extended Kalman filter) for tracking both phase and frequency
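A minimal sketch of the second stage, assuming the coarse sliding-window MAP phase estimates are already available as noisy, unwrapped measurements. For brevity a linear Kalman filter with state [phase, frequency offset] stands in for the EKF on the slide; all names, step sizes and noise variances below are illustrative, not the dissertation's values.

    import numpy as np

    def track_phase_freq(phase_meas, dt=1e-6, meas_std_deg=5.0,
                         q_phase=1e-6, q_freq=1e2):
        """Kalman filter over x = [phase (rad), frequency offset (Hz)].
        phase_meas: coarse unwrapped phase estimates, one per block of symbols;
        dt: time between successive estimates (one block)."""
        F = np.array([[1.0, 2 * np.pi * dt],   # phase advances by 2*pi*df*dt per block
                      [0.0, 1.0]])
        H = np.array([[1.0, 0.0]])             # only the phase is measured
        Q = np.diag([q_phase, q_freq])         # process noise (illustrative values)
        R = np.array([[np.deg2rad(meas_std_deg) ** 2]])
        x = np.array([[phase_meas[0]], [0.0]])
        P = np.diag([1.0, 1e6])                # initially very uncertain about frequency
        phases, freqs = [], []
        for z in phase_meas:
            x = F @ x                          # predict
            P = F @ P @ F.T + Q
            S = H @ P @ H.T + R                # update with the coarse phase estimate
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (np.array([[z]]) - H @ x)
            P = (np.eye(2) - K @ H) @ P
            phases.append(x[0, 0])
            freqs.append(x[1, 0])
        return np.array(phases), np.array(freqs)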

25 Numerical Results

26 Take Away. 1. Phase/Frequency Synchronization: mixed-signal front end, importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

27 1. Phase/Frequency Synchronization: importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

28 Setup: TX -> dispersive channel -> Front End (Rx filter + ADC Q(.)) -> DSP. ADC with just enough precision to preserve information. Goal: a fundamental investigation to characterize BER with as few slicers as possible while avoiding error floors. Optimal MAP processing: BCJR. Setting: BPSK, high SNR, uncoded, static channel
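A hedged reconstruction of the observation model this setup implies (h_l the sampled channel taps with memory L, b_n the BPSK symbols, w_n AWGN, Q(·) the slicer bank); notation assumed:

    z_n = Q(r_n),
    \qquad
    r_n = \sum_{l=0}^{L} h_l\, b_{n-l} + w_n,
    \qquad b_n \in \{-1, +1\},\; w_n \sim \mathcal{N}(0, \sigma^2).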

29 Examples: Dispersive Channels, memory of 5-6 symbol periods. (Figures: channel impulse responses h_A(t), h_B(t), h_C(t).) Channel A (maximum phase), Channel B (mixed phase), Channel C (broad peak, FR4 backplane trace). Backplane channels (wireline chip-to-chip high-speed links)

30 Standard Flash ADC: N slicers correspond to a log2(N+1)-bit ADC (e.g. a 3-bit ADC has 7 slicers); same sampling phase for all slicers, thresholds spread uniformly in amplitude. (Figure: BER vs. SNR for channel A (δ=0), h = [...], with unquantized, 15, 5, 3 and 2 slicers.) N_min = minimum #slicers to avoid error floors; hard to find analytically

31 Standard Flash ADC: N_min lower & upper bounds. Channel A (maximum phase): N_min = 3. Channel B (mixed phase): N_min = 5. Channel C (maximum phase, strong peak). Bounds depend on: relative strength of the strongest tap (energy spread), and position of the strongest tap

32 The Problem with a Uniform ADC: slicer thresholds are fixed in amplitude, so BER is sensitive to the channel. (Figures: BER vs. SNR. Channel B: 3-bit ADC (7 slicers), 4-bit ADC (15 slicers), unquantized -- large gap to unquantized. Channel C: 2-bit ADC (3 slicers), 3-bit ADC (7 slicers), unquantized.) TSE (symbol-spaced equalization) vs. FSE (fractionally spaced equalization) --> sampling at a rate higher than Nyquist is more robust. Cannot simply double the number of slicers

33 Generalizing the ADC: from the standard ADC architecture (waveform-shape preservation) to a generalized space-time slicer architecture (information preservation). Each slicer is characterized by a threshold t_i and a sampling phase δ_i --> expansion of the optimization space. Special cases: 1-bit architecture --> all δ_i different; uniform TSE --> t_i uniform, δ_i the same; non-uniform TSE --> t_i non-uniform, δ_i the same; FSE T/2 --> δ_i in {0, T/2}
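Writing out the generalized slicer parameterization described above, with assumed notation (r(t) the received waveform after the receive filter, t_i the threshold and δ_i the sampling-phase offset of slicer i):

    q_i[n] = \mathrm{sign}\!\left( r\!\left(nT_s + \delta_i\right) - t_i \right),
    \qquad i = 1, \ldots, M,

so each special case on the slide corresponds to a constraint on the pairs (t_i, δ_i).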

34 Key Results. Randomly dispersed 1-bit slicers can preserve information (proof uses ideas from compressive sensing). Channel-optimized slicer thresholds: TSE case (non-uniform ADC, fixed sampling phase) and FSE T/2 case (two fixed sampling phases)

35 Algorithm for Finding TSE Thresholds: Idea. Problem: find M thresholds [t_1, t_2, ..., t_M] to minimize BER. Union bound (MLSE) over pairwise error probabilities; e.g. e = {-1, 0}, b = {+1, +1}, b' = {-1, +1}. At high SNR, the dominant error events have weights of 1 or 2 bits
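The union bound being invoked, in its usual form for MLSE at high SNR (e ranging over error events, w(e) the number of bit errors in e, and the sum truncated to the dominant low-weight events):

    P_b \;\le\; \sum_{e} w(e)\, \Pr\!\left[\hat{\mathbf{b}} = \mathbf{b} + e \mid \mathbf{b}\right]
    \;\approx\; \sum_{e:\, w(e) \le 2} w(e)\, \Pr\!\left[\hat{\mathbf{b}} = \mathbf{b} + e \mid \mathbf{b}\right].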

36 Binary Hypothesis Testing. For each error event, an upper bound on the pairwise error probability can be computed as a function of a single threshold parameter t. (Figure: upper bound vs. t for error events i = 1, ..., 7.) Choose the set of M thresholds that minimizes the sum over all pairs of sequences. We end up minimizing a loose upper bound, using K-means
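A minimal sketch of the threshold-selection step as described: each dominant error event contributes a scalar cost that is minimized at some threshold location, and weighted 1-D K-means over those locations picks the M slicer thresholds. How the per-event candidate thresholds are formed (midpoints of the two noiseless hypotheses) and all names here are illustrative assumptions, not the exact cost function from the dissertation.

    import numpy as np

    def choose_thresholds(candidates, weights, M, iters=50, seed=0):
        """Weighted 1-D K-means over per-error-event candidate thresholds
        (e.g. midpoints 0.5*(X0(i)+X1(i)) of the two noiseless hypotheses)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(candidates, dtype=float)
        w = np.asarray(weights, dtype=float)
        centers = rng.choice(x, size=M, replace=False)
        for _ in range(iters):
            # assign each candidate to the nearest current threshold
            assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            for m in range(M):
                mask = assign == m
                if mask.any():
                    centers[m] = np.average(x[mask], weights=w[mask])
        return np.sort(centers)

    # illustrative usage: 7 thresholds from the dominant error events
    # thresholds = choose_thresholds(midpoints, event_weights, M=7)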

37 Numerical Results (TSE). (Figures: BER vs. SNR. Channel B: unquantized, uniform ADC (7 slicers), non-uniform thresholds (7). Channel A: unquantized, uniform ADC (5 slicers), non-uniform thresholds (5).)

38 Numerical Results (FSE T/2), Channel C. (Figure: BER vs. SNR for uniform ADC (0.5), uniform ADC (0), non-uniform ADC (0.5), non-uniform ADC (0), and space-time slicers.)

39 Take Away. 1. Phase/Frequency Synchronization 2. Equalization: no need for waveform preservation; almost any configuration of slicers can preserve information; benefit of tailoring to the channel 3. Object Classification: reducing the number of tunable parameters via neuro-mimetic design and clustering

40 1. Phase/Frequency Synchronization: importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

41 Significant recent progress in machine vision. Current records held by supervised deep neural nets: loosely inspired by the spatial organization of the visual cortex, hierarchical structure (Krizhevsky et al 2012, Ciresan et al 2011)

42 Supervised Deep Nets: Issues. Large number of tunable parameters (learning rates, weight decay, momentum etc.). Tricks and clever engineering required to make them work (DropOut, DropConnect etc.). Not clear what information is extracted by the different layers: lower layers? higher layers? Can we put hierarchical feature extraction in an understandable framework?

43 Ideas: 1. Neuro-mimetic design 2. Use standard clustering building blocks + neuro-inspiration

44 Loose neuro-inspiration already plays a key role. Convolutional and hierarchical architecture: local spatial organization of cells in the visual cortex (figure from LeCun et al, 1998). Rectification: neurons fire when inputs exceed a threshold. Local contrast normalization (LCN): local inhibition and competition between neighboring neurons (figure from Brady et al, 2000). Pooling: complex cells, translation invariance. Much less work on neuro-mimetic computational models

45 How much supervision do we need? We can see things even if we don't know their labels; perhaps our visual system is extracting a universal set of features. The unsupervised approach has been tried in the literature (e.g. transfer learning), but further effort is needed to improve performance. 1. Can we leverage everything we know for sure about mammalian vision? 2. Can most of the learning be unsupervised?

46 Architecture: N x N raw image -> RGC layer -> simple cell layer -> N x N x f feature maps. 1. Neuro-mimetic frontend 2. Unsupervised feature extraction: clustering and pooling 3. Supervised classifier (e.g. SVM or neural net)

47 Neuro-Mimetic Frontend (most of the modeling work done by Emre Akbas, post-doc with Prof. Eckstein in the Department of Psychology and Brain Sciences, UCSB)

48 How well do we understand the visual pathway? (Graphic from Bengio, LeCun ICML Workshop 2009, Montreal.) Scene -> RGC/LGN (retinal ganglion cells / lateral geniculate nucleus, in the retina) -> optic nerve -> V1 simple cells -> complex cells (in the cortex)

49 Retinal Ganglion Cells (RGCs). Retina -> RGC/LGN -> V1 simple cells -> complex cells. What the RGC stage does: 1. Luminance gain control 2. Center-surround filtering 3. Local contrast normalization (LCN) and rectification. Center-ON and center-OFF channels: relevant parts of the image light up (>0). References: Wandell 1995, Croner et al 1995, Carandini et al 2005
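A rough sketch of these RGC-style stages (crude luminance gain control, center-surround filtering as a difference of Gaussians, then ON/OFF rectification); filter scales and constants are illustrative, not the values used in the dissertation.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def rgc_frontend(img, sigma_center=1.0, sigma_surround=2.0, eps=1e-6):
        """Center-surround filtering + ON/OFF rectification of a grayscale image."""
        img = img.astype(float)
        img = img / (img.mean() + eps)               # crude luminance gain control
        dog = gaussian_filter(img, sigma_center) - gaussian_filter(img, sigma_surround)
        on = np.maximum(dog, 0.0)                    # center-ON channel
        off = np.maximum(-dog, 0.0)                  # center-OFF channel
        return np.stack([on, off], axis=-1)          # (N, N, 2) response before LCN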

50 V1 Simple Cells. Retina -> RGC/LGN -> V1 simple cells -> complex cells. Simple cells sum RGC outputs, yielding edge-oriented filters etc. Output: feature maps. Reference: Hubel and Wiesel 1962

51 RGC + Simple Cells: N x N raw image -> N x N x f feature maps. Each spatial location has a vector of neuron activations a(x, y). Viewing distance is the only tunable parameter; we set it guided by the resolution of the image. Receptive field sizes: roughly 7x7 pixels in the original image space

52 Clustering: a natural candidate for finding patterns in data. The encoding operation is similar to a layer in a neural network (cluster center == neuron). Normalize the input and the cluster centers and perform K-means (spherical K-means). Once the centers are learned, apply a non-linear function of the soft encoding -- the same as neural-layer processing (layer i neurons -> layer i+1 neurons)

53 Clustering Simple Cell Outputs. Input data: simple-cell activation vectors; learn K_1 centers using the online spherical K-means algorithm (Shi Zhong 2005). K-means and encode: N x N raw image -> N x N x f simple cell outputs -> N x N x K_1
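A compact batch version of spherical K-means (unit-normalize data and centers, assign by largest dot product, recompute and renormalize centers); the slide cites an online variant (Shi Zhong 2005), so treat this as a simplified sketch.

    import numpy as np

    def spherical_kmeans(X, K, iters=30, seed=0):
        """X: (N, d) activation vectors; returns (K, d) unit-norm cluster centers."""
        rng = np.random.default_rng(seed)
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
        C = Xn[rng.choice(len(Xn), K, replace=False)].copy()
        for _ in range(iters):
            assign = (Xn @ C.T).argmax(axis=1)       # cosine-similarity assignment
            for k in range(K):
                members = Xn[assign == k]
                if len(members):
                    c = members.sum(axis=0)
                    C[k] = c / (np.linalg.norm(c) + 1e-8)
        return C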

54 Encoding Function (Activations): Sparsity Level. What is the choice of f? We choose a soft threshold, picking T to hit a target sparsity level (e.g. 90% of activations set to zero). (Figure: a patch and its activations after thresholding.)
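A sketch of the encoding described above: activations are similarities to the learned centers, soft-thresholded so that a target fraction of them is zero. Choosing T as a quantile of the activations is an assumption about how the sparsity level is enforced.

    import numpy as np

    def encode(X, C, sparsity=0.90):
        """Soft-threshold encoding of activation vectors X against unit-norm centers C."""
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
        A = Xn @ C.T                          # raw activations (cosine similarities)
        T = np.quantile(A, sparsity)          # keep roughly the top (1 - sparsity) fraction
        return np.maximum(A - T, 0.0)         # soft threshold: sparse, non-negative codes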

55 Cluster centers (correspond to 7x7 patches): mostly edges. Interpretation: orientation-sensitive neurons

56 N x N raw image -> N x N x f simple cell outputs -> N x N x K_1. Next layer? 1. Either feed to the classifier, OR 2. zoom out via pooling and cluster larger patches

57 2nd-layer cluster centers: curves, t-junctions

58 The Datasets (standard for image classification tests). MNIST: 10 digits, 28 x 28 images, 60K training, 10K testing. NORB (uniform dataset): 5 objects (truck, car, human, plane, animal), 96 x 96 dimensions, varying illumination, elevation and rotation; 24,300 training, 24,300 testing

59 Experimental Results Summary: decent performance on MNIST; beats state of the art on NORB; can do it with very sparse encoding

60 Test Scenarios. 1. One layer of clustering 2. One layer of clustering with more centers K_1 3. Two layers of clustering (K_1, K_2), concatenating layer-1 and layer-2 activations to form the feature vector (consistent with current neuroscientific understanding). Cases 2 and 3: similar lengths of feature vectors. No augmentation using affine distortions (translation, rotation etc.) of the data. RBF (radial basis function) SVM as the final supervised classifier

61 MNIST error rates. At the lower sparsity level: Layer 1: 0.73%; Layer 1 (more centers): 0.7%; Layer 1 + 2: 0.%. At 95% sparsity: Layer 1: 0.7%; Layer 1 (more centers): 0.7%; Layer 1 + 2: 0.%. State of the art (without distortions): 0.39%, Chen-Yu Lee et al 2014, Deeply-Supervised Nets. Works that use unsupervised layers followed by supervised classifiers: 0.% - Lee et al 2009; Ranzato et al 2007 (uses a 2-layer neural net for supervised training on top of the unsupervised layers); 0.59% - Labusch et al 2009, sparsenet algorithm + RBF SVM

62 NORB error rates. At the lower sparsity level: Layer 1: 3.9%; Layer 1 (more centers): 3.71%; Layer 1 + 2: .9%. At 95% sparsity: Layer 1: .5%; Layer 1 (more centers): .5%; Layer 1 + 2: .90%. Performance with a single layer greatly improves with more sparsity: better than state of the art. State of the art (with translations): .53% - Ciresan et al 2011 (without translation: 3.9%, supervised deep net). State of the art (without translation): .7% - Uetz & Behnke 2009, supervised deep net; 3.0% - Coates et al, unsupervised + SVM, using K-means with K in the thousands; 5.% - Jarrett et al 2009, unsupervised pretraining + fine tuning

63 A promising start. Neuromimetic frontend + clustering is a promising approach to universal feature extraction. Easy to implement, very few tunable parameters (viewing distance, #cluster centers, sparsity level). Potential for interpreting cluster centers as successively more abstract and zoomed-out representations

64 Many unanswered questions. How can we tell if we are capturing all the information? Is there an alternative metric to classification performance? The impact of design choices on classification performance appears to be dataset dependent: how many layers? what sparsity level? layer-dependent sparsity? What are the best approaches for low-power hardware implementations? Power savings from sparsity, backend complexity

65 List of Publications. E. Akbas, A. Wadhwa, M. Eckstein and U. Madhow, "A Framework for Machine Vision based on Neuro-Mimetic Front End Processing and Clustering," Proc. of 52nd Allerton Conference on Communication, Control and Computing, October 2014. A. Wadhwa, U. Madhow and N. Shanbhag, "Space-time Slicer Architectures for Analog-to-Information Conversion in Channel Equalizers," Proc. of IEEE International Conference on Communications (ICC'14), Sydney, Australia, June 2014. A. Wadhwa and U. Madhow, "Blind phase/frequency synchronization with low-precision ADC: a Bayesian approach," Proc. of 51st Allerton Conference on Communication, Control and Computing, Oct 2013. A. Wadhwa, U. Madhow, J. Hespanha and B. Sadler, "Following an RF trail to its source," Proc. of 49th Allerton Conference on Communication, Control and Computing, Sept 2011. Two journal submissions in preparation

66 Acknowledgements. Advisor: Prof. Upamanyu Madhow. Committee members. Funding sources: Institute for Collaborative Biotechnologies (ICB), Army Research Office (ARO), SONIC, MARCO, DARPA. Collaborators: Prof. Hespanha, Prof. Shanbhag, Prof. Ashby and Prof. Eckstein; students: Jason, Yingyan, Erick, Ben & Emre; WCSL colleagues. Questions?


68 Back-up Slides


70 Another way of looking at it: maximize the information utility, the KL divergence between successive posteriors. (Theorem and supporting equations stated on the slide.)

71 Literature: Adaptive Control for Estimation. Sequential design of experiments, active hypothesis testing etc. (Chernoff 1959, ..., Naghshvar 2013): multihypothesis testing, trading off the average number of samples against the probability of error. The value function / optimal cost satisfies a fixed-point / dynamic-programming equation involving the expected value function on taking action a when the belief is ρ

72 Idea (Algorithm). 1. MLSE (maximum likelihood sequence estimation): performance similar to BCJR for uncoded systems 2. P_e < P_u (union bound) 3. High SNR ---> truncate P_u to a few dominant terms 4. Truncated sum: a further upper bound, simple to evaluate 5. Final cost function: an approximate upper bound, where Ω_n represents a pair of bit sequences and f(t; Ω_n) is a function of a scalar variable t given Ω_n; the final optimization can be solved using K-means with M cluster centers

73 Union Bound. e.g. e = {-1, 0}, b = {+1, +1}, b' = {-1, +1}. Truncated to the dominant error events, each term being the pairwise error probability of two sequences

74 Reduces to binary hypothesis testing between vectors: H_0: X_0 vs. H_1: X_1. The element-wise probability of error is easy to evaluate, e.g. comparing X_0(i) and X_1(i) against a threshold t. (Figure: per-element error bound vs. t for i = 1, ..., 7.)

75 Questions. 1. Can a dispersed slicer architecture guarantee channel inversion? It can be shown that even randomly chosen t_i work; intuitively it should work with enough slicers. Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees no error floor if n is large enough -- quantization preserves pairwise L1-norm distances

76 Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees the absence of an error floor if n is large enough. A reassuring result; discussion on the next few slides

77 Received signal and its samples (LT_s: length of the channel); lower bound on the information rate.* (*Zeitler, G.; Singer, A.C.; Kramer, G., "Low-Precision A/D Conversion for Maximum Information Rate in Channels with Memory," IEEE Transactions on Communications, September 2012)

78 Same setup: the information-rate lower bound considers the observations affected by b_i, conditioned on the past bits. (*Zeitler, G.; Singer, A.C.; Kramer, G., "Low-Precision A/D Conversion for Maximum Information Rate in Channels with Memory," IEEE Transactions on Communications, September 2012)

79 W.l.o.g. consider i = 0, i.e. decoding b_0. Fix the past bits {b_-(L+1), ..., b_-1} and let s(t) := {r(t), t in [0, LT_s]}. Define S_0 = {s(t) : b_0 = -1} and S_1 = {s(t) : b_0 = +1}; |S_0| = |S_1| = 2^(L-1) as the future bits are varied. Sample s(t) n times to get x; the sets of sampled signals are S_0 -> X_0, S_1 -> X_1. Condition for no error floor: the information-rate lower bound equals 1

80 Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees no error floor if n is large enough -- quantization preserves pairwise L1-norm distances

81 Sketch of Proof: similar to the Johnson-Lindenstrauss (JL) lemma (compressive sensing literature): for u, v in R^d, the mapped points f(u), f(v) in R^k (random subspace, k much smaller than d) approximately preserve distances

82 Sketch of Proof. Note: q(x) is a binary vector. Steps (for each pair x_0 and x_1, conditioned on the past bits): 1. Relate the expected Hamming distance between q(x_0) and q(x_1) to the L1 distance between x_0 and x_1 2. Bound the probability of large deviations (Chernoff bound) 3. Invoke the union bound to cover all sets of past bits

83 Does Unsupervised Learning Make Sense? It should, even though most state-of-the-art nets are purely supervised; transfer learning has been shown to help. Intuitively: extract low-level features like edges, corners, t-junctions etc. independent of labels -- low-level representative features (unsupervised) followed by higher-level discriminative features (supervised). Not entirely a new idea: Lee et al 2009, Jarrett et al 2009, Kavukcuoglu et al 2010, Zeiler et al 2011, Labusch et al 2009 etc. Our approach: clustering to extract low-level features

84 Related Work. Unsupervised layers followed by supervised layers: Lee et al 2009, Jarrett et al 2009, Kavukcuoglu et al 2010, Zeiler et al 2011, Labusch et al 2009; all of them utilize some form of reconstruction + sparsity cost function for unsupervised training, whereas we investigate a much simpler alternative (K-means clustering). Using K-means clustering to build features: Coates and Ng 2011, 2012 use K-means directly on raw images, with a very large number of learned cluster centers (a few thousand); we build clustering on top of neuro-mimetic preprocessing and get competitive performance with far fewer cluster centers. Benefits of tuning the pre-processing step: Ciresan et al 2011, Fukushima 2003 obtain gains using contrast normalization and center-surround filtering

85 RBF SVM as Supervised Classifier. Since we are using clustering to organize patterns, we expect a classifier that uses a Gaussian mixture model (GMM) to do well: the radial basis function classifier (Sung 1996). RBF classifier: like a GMM but with mixing probabilities trained discriminatively. RBF + SVM is superior to GMM-based classification (Scholkopf et al 1997). The scale parameter of the RBF kernel is set via cross-validation on the training set

86 Area of Active Research. Systems-level research (dealing with the quantization non-linearity): capacity of the AWGN channel - Singh et al (2009); capacity of block non-coherent communication - Singh et al (2009); fading channel capacity - Krone et al; equalization information rate - Zeitler et al; channel estimation - Dabeer et al, Zeitler et al (2011); time-interleaved ADC mismatch - Ponnuru et al (2011). Circuit-level research: analog signal processing, circuit implementations (>1 Gbps): carrier phase recovery - Brodersen et al (2011); continuous-time equalizers - Wang, Alon et al (2011), Chan; and many more

87 Bayesian Estimation of φ, blind mode (unknown symbol sequence). The net phase rotation combines the channel phase and the QPSK symbol phase ({1,3,5,7} x π/4). Steps: 1. Derive the conditional distribution of u given β 2. Compute the observation density (pmf) 3. Use Bayes' rule to get the recursive update equation

88 1. The conditional distribution of u has a closed-form expression for β = 0 and shifts circularly with β

89 Setup: TX -> dispersive channel (A, B, C) -> Front End (Rx filter + A/D, Q(.)) -> DSP. Objective: design Q(.) to optimize equalization performance (BER) while keeping the bits of precision as low as possible

90 Supervised Deep Nets: Issues. Large number of tunable parameters; long training times; overfitting is an issue; large amounts of labeled training data required. Tricks and clever engineering required to make them work: DropOut, DropConnect etc., setting correct values of learning rates, weight decay, momentum. Not clear what information is extracted by the different layers: lower layers? higher layers? Are there simpler ways of implementing hierarchical feature extraction?

91 RGC + Simple Cells: N x N raw image -> N x N x f feature maps. At each spatial location there is a set of neurons whose activations represent 7x7 patches. (Figure: an example 7x7 patch, its simple-cell response, and the response prior to LCN.)

92 LCN (local contrast normalization) operation: i is the feature index, j the spatial index; the normalizer sums over features and over a spatial neighborhood. It normalizes activations and inhibits weaker responses. (Figure: an example 7x7 patch, its simple-cell response, and the response prior to LCN.)
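A standard form of the LCN operation consistent with this description (the exact constants and neighborhood used in the dissertation may differ):

    y_{i,j} \;=\; \frac{x_{i,j}}{c + \sqrt{\sum_{i'} \sum_{j' \in \mathcal{N}(j)} x_{i',j'}^{2}}},

where i indexes features, j spatial locations, N(j) is a spatial neighborhood of j, and c is a small constant.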

93 N x N raw image -> N x N x f simple cell outputs -> N x N x K_1. Next layer? 1. Either feed to the classifier, OR 2. zoom out via pooling and cluster larger patches

94 Next layer, Option 1: pool and feed to a supervised classifier. Pooling gives local translation invariance (an edge anywhere within a cell). E.g. MNIST, K_1 = 200: before pooling 28 x 28 x 200; after pooling over a 4 x 4 grid, 4 x 4 x 200 = 3200 is the length of the feature vector (see the sketch below)
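A sketch of the grid pooling step, assuming max pooling over an even partition of the feature map (the dissertation's pooling operator and grid size may differ):

    import numpy as np

    def grid_pool(fmap, grid=4, mode="max"):
        """Pool an (N, N, K) feature map over a grid x grid partition into a feature vector."""
        N, _, K = fmap.shape
        edges = np.linspace(0, N, grid + 1, dtype=int)
        out = np.zeros((grid, grid, K))
        for i in range(grid):
            for j in range(grid):
                cell = fmap[edges[i]:edges[i+1], edges[j]:edges[j+1], :]
                out[i, j] = cell.max(axis=(0, 1)) if mode == "max" else cell.sum(axis=(0, 1))
        return out.reshape(-1)                # e.g. a 4*4*K-dimensional feature vector

For example, grid_pool(encoded_map, grid=4) would give the flattened grid x grid x K_1 vector that is fed to the SVM.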

95 Next layer, Option 2: one more layer of unsupervised feature extraction before discriminative learning. Zoom out and cluster: receptive fields now larger than 7x7. Local pooling and concatenation of neighboring activation vectors; the result represents larger patches (K_1 = #first-layer centers)

96 2nd-Layer Clustering. Similarity metric for clustering: the average matching score over the quadrants of the larger patch (individual matching scores in [0, 1]); concatenation + pooling
