Scalable Front End Designs for Communication and Learning. Aseem Wadhwa, Department of ECE UCSB PhD Defense

1 Scalable Front End Designs for Communication and Learning. Aseem Wadhwa, Department of ECE, UCSB. PhD Defense

2 Estimation/Detection Problem: artificial/natural distortions, followed by a receiver that recovers the underlying variable of interest. e.g. Communication System: inter-symbol interference. e.g. Computer Vision: labels = {cars, trucks, planes, etc.}, observations = raw images

3 Receiver Design: Front End (system-specific) + Back End (generic principles). Example, Machine Learning: Front End = feature extractor, e.g. bag-of-words (text), SIFT (images); Back End = SVM (support vector machine)

4 Receiver Design: Front End (system-specific) + Back End (generic principles). Example, Analog-to-Digital Interfaces: Front End = Nyquist sampling (preserve waveform information), matched filtering and symbol-rate sampling (communication); Back End = generic MAP detector (maximum a posteriori probability)

5 Scalability Challenges. In this talk, we discuss front end designs for systems that face scalability issues. Settings: (1) Communication Systems: high-bandwidth (GHz-scale) links ---> power bottleneck with high-resolution ADC; need to adapt to low resolution and high quantization error. (2) Machine Vision: computational complexity, ease of implementation, power considerations, e.g. handheld devices

6 3 Canonical Problems. We revisit the system design for 3 specific problems and demonstrate how some of these scalability issues can be addressed. 1. Phase/Frequency Synchronization: importance of dither (communication system) 2. Equalization: importance of adapting quantization thresholds to the channel (communication system) 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering (machine vision)

7 1. Phase/Frequency Synchronization: importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

8 Modern Communication Receiver Architecture: digitize early, high-resolution samples. Received analog signal -> Analog Frontend (downconvert to baseband, receive filter) -> ADC (faithful analog-to-digital conversion) -> DSP (leveraging Moore's law: synchronization, equalization, error decoding etc.). ADC bottleneck for high-speed links (multi-Gbps, e.g. chip-to-chip, mm-wave, optical). (Figure: effective number of bits (ENOB) vs. sampling frequency f_s; data from B. Murmann, "ADC Performance Survey" [Online].) Challenge: adapting systems to low-resolution ADCs

9 Modified Architecture: Bayesian Mixed-Signal Processing. TX -> Channel -> Analog Frontend (preprocessing) -> coarse A-to-D conversion -> Bayesian inference (DSP, non-linear algorithms), with digitally controlled feedback (phase rotation, quantization thresholds etc.). Circuit implementations: Chan, Minwei, Brodersen et al (2011). Revisiting classical problems: synchronization, equalization

10 Blind Phase/Frequency Synchronization: System Model. TX sends complex symbols (unknown); unknown channel phase, unknown frequency offset, complex AWGN. Objective: estimate φ_c and Δf, decode b_k. Simplifying assumptions: QPSK, non-dispersive channel, perfect timing sync, Nyquist-rate symbols
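A plausible write-up of the system model sketched on this slide, assuming standard notation (z_k the k-th received sample, b_k the QPSK symbol, T_s the symbol period, n_k complex AWGN); the dissertation's exact notation may differ:

    z_k = b_k \, e^{j\left(\phi_c + 2\pi \Delta f\, k T_s\right)} + n_k,
    \qquad b_k \in \left\{ e^{j\pi/4}, e^{j3\pi/4}, e^{j5\pi/4}, e^{j7\pi/4} \right\},
    \qquad n_k \sim \mathcal{CN}(0, \sigma^2).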

11 Phase-only Quantization using 1-bit ADCs (AGC-free quantization). Received passband waveform -> downconversion to I and Q -> pass linear combinations of I & Q through 1-bit ADCs. M = 8 bins (4 ADCs), M = 12 bins (6 ADCs)

12 Feedback architecture: Transmitter (QPSK) -> Channel -> Phase Quantization -> DSP for Bayesian estimation -> decoded symbols. Quantized phase measurements (1, 2, ..., M); digitally controlled derotation phase: dither signal (feedback)

13 Break into sub-problems. Phase changes slowly, over a few 100s of symbols. Example: T_s = (2 GHz)^-1, f_c = 20 GHz, Δf = 10 ppm of f_c = 10^-5 f_c, so η = 2π x 10^-4 radians = 0.036 degrees per symbol period. 1. Blind phase estimation: estimate φ given z_k 2. Once the phase is "locked in", start tracking and also decoding

14 Bayesian Estimation of φ. Compute the observation density (pmf); use recursive Bayes to update the posterior. Can choosing θ_k (dither) cleverly improve estimation? What if we set θ_k = constant (say 0 degrees)?
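A sketch of the recursive update referred to above, under assumed notation (z_k the quantized phase bin observed at time k, θ_k the dither/derotation phase applied before quantization):

    p(\phi \mid z_{1:k}) \;\propto\; p(z_k \mid \phi, \theta_k)\, p(\phi \mid z_{1:k-1}),
    \qquad
    p(z_k = m \mid \phi, \theta_k) \;=\; \frac{1}{4}\sum_{b \in \mathrm{QPSK}} \Pr\!\left[\arg\!\left(b\, e^{j(\phi+\theta_k)} + n\right) \in \text{bin } m\right].

One natural implementation keeps the posterior over φ on a discrete grid, so the update is a pointwise multiply-and-normalize.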

15 To dither or not (1). Example 1: M = 8 bins, symmetric bins and equiprobable QPSK symbols. (Figure: posterior of φ after 150 symbols, SNR = 5 dB, for θ_k = 0 vs. θ_k random; compared via KL divergence.)

16 To dither or not (2). Example: asymmetric bins (M = 12), θ_k = 0. (Figure: posterior of φ, SNR = 5 dB.)

17 To dither or not (3). Example 3: at high SNR the posterior stays flat without dither. (Figure: posterior of φ after 30 symbols, SNR = 35 dB, for θ_k = 0 vs. θ_k random.) 1. Dithering is required 2. Random dithering is a robust choice 3. Can we do better than random dither?

18 What is the Best Dither? Estimation is best when the Fisher information is maximized. (Figure: best dither at high SNR (15 dB) and at low SNR (5 dB), relative to bin edge and bin center.)

19 Policies. We don't know the true φ, though. Can use our best guess of φ: the Maximizing Fisher Information policy (MFI). Literature (Atia 2013): optimal if φ_MAP is close to the true value. What if the uncertainty in φ is high? Greedy Entropy Policy (GEP): choose the next action to minimize the expected entropy of the next posterior (averaged over the observation density)
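Writing the two policies out explicitly, with assumed notation (I(φ; θ) the Fisher information of a single quantized observation taken with dither θ, p_k the current posterior, H(·) entropy):

    \theta_{k+1}^{\mathrm{MFI}} = \arg\max_{\theta}\; I\!\left(\hat{\phi}^{\mathrm{MAP}}_{k};\, \theta\right),
    \qquad
    \theta_{k+1}^{\mathrm{GEP}} = \arg\min_{\theta}\; \mathbb{E}_{z_{k+1}}\!\left[\, H\!\left(p_{k+1}(\cdot \mid z_{k+1}, \theta)\right)\right],

where the expectation over z_{k+1} uses the averaged observation density \sum_{\phi} p_k(\phi)\, p(z_{k+1} \mid \phi, \theta).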

20 Greedy Entropy Policy: Properties. 1. It can be shown that as the number of observations increases, GEP --> MFI, which is optimal for large N 2. Zero-noise case: the posterior is flat over its support, and the entropy of a flat distribution is the log of the support size; GEP is optimal here, reducing the support by half at every step

21 Numerical Results: Coarser Quantization (M = 8). (Figures: RMSE in degrees vs. #symbols for CRLB, GEP, MFI and random dither, at SNR = 5 dB (low) and 15 dB (high).) At low SNR, GEP is slightly better than MFI. Random dithering becomes more sub-optimal as SNR increases

22 Numerical Results: Finer Quantization (M = 12). (Figures: RMSE in degrees vs. #symbols for CRLB, GEP, MFI, random and constant dither, at SNR = 5 dB (low) and 15 dB (high).) At low SNR, the noise provides enough dither, so a constant action is also fine

23 Literature: Adaptive Control for Estimation. Related literature: sequential design of experiments, active multihypothesis testing etc. Optimal policy: equivalent to solving a POMDP (partially observable Markov decision process); high complexity: the set of policy trees can be very large, worst case requiring gridding of the belief space (posterior). Asymptotically optimal policies have been proposed, but are not applicable to our problem directly. Multihypothesis testing (Chernoff 1959, Naghshvar 2013, Nitinawarat 2013 etc.); ideas similar to GEP used to derive bounds (Naghshvar 2013). Continuous parameter estimation (Atia 2013) shows MFI is asymptotically optimal

24 Frequency Tracking (very slowly varying phase). 1. Sliding MAP estimate of the phase --> coarse estimate 2. Feed to an EKF (extended Kalman filter) for tracking both phase and frequency
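A minimal sketch of the second stage, assuming the coarse sliding-window MAP phase estimates are already available as noisy, unwrapped measurements. For brevity a linear Kalman filter with state [phase, frequency offset] stands in for the EKF on the slide; all names, step sizes and noise variances below are illustrative, not the dissertation's values.

    import numpy as np

    def track_phase_freq(phase_meas, dt=1e-6, meas_std_deg=5.0,
                         q_phase=1e-6, q_freq=1e2):
        """Kalman filter over x = [phase (rad), frequency offset (Hz)].
        phase_meas: coarse unwrapped phase estimates, one per block of symbols;
        dt: time between successive estimates (one block)."""
        F = np.array([[1.0, 2 * np.pi * dt],   # phase advances by 2*pi*df*dt per block
                      [0.0, 1.0]])
        H = np.array([[1.0, 0.0]])             # only the phase is measured
        Q = np.diag([q_phase, q_freq])         # process noise (illustrative values)
        R = np.array([[np.deg2rad(meas_std_deg) ** 2]])
        x = np.array([[phase_meas[0]], [0.0]])
        P = np.diag([1.0, 1e6])                # initially very uncertain about frequency
        phases, freqs = [], []
        for z in phase_meas:
            x = F @ x                          # predict
            P = F @ P @ F.T + Q
            S = H @ P @ H.T + R                # update with the coarse phase estimate
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (np.array([[z]]) - H @ x)
            P = (np.eye(2) - K @ H) @ P
            phases.append(x[0, 0])
            freqs.append(x[1, 0])
        return np.array(phases), np.array(freqs)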

25 Numerical Results

26 Take Away. 1. Phase/Frequency Synchronization: mixed-signal front end, importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

27 1. Phase/Frequency Synchronization: importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

28 Setup: TX -> dispersive channel -> Front End (Rx filter + ADC Q(.)) -> DSP. ADC with just enough precision to preserve information. Goal: a fundamental investigation to characterize BER with as few slicers as possible while avoiding error floors. Optimal MAP processing: BCJR. Setting: BPSK, high SNR, uncoded, static channel
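A hedged reconstruction of the observation model this setup implies (h_l the sampled channel taps with memory L, b_n the BPSK symbols, w_n AWGN, Q(·) the slicer bank); notation assumed:

    z_n = Q(r_n),
    \qquad
    r_n = \sum_{l=0}^{L} h_l\, b_{n-l} + w_n,
    \qquad b_n \in \{-1, +1\},\; w_n \sim \mathcal{N}(0, \sigma^2).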

29 Examples: Dispersive Channels, memory of 5-6 symbol periods. (Figures: channel impulse responses h_A(t), h_B(t), h_C(t).) Channel A (maximum phase), Channel B (mixed phase), Channel C (broad peak, FR4 backplane trace). Backplane channels (wireline chip-to-chip high-speed links)

30 Standard Flash ADC: N slicers correspond to a log2(N+1)-bit ADC (e.g. a 3-bit ADC has 7 slicers); same sampling phase for all slicers, thresholds spread uniformly in amplitude. (Figure: BER vs. SNR for channel A (δ=0), h = [...], with unquantized, 15, 5, 3 and 2 slicers.) N_min = minimum #slicers to avoid error floors; hard to find analytically

31 Standard Flash ADC: N_min lower & upper bounds. Channel A (maximum phase): N_min = 3. Channel B (mixed phase): N_min = 5. Channel C (maximum phase, strong peak). Bounds depend on: relative strength of the strongest tap (energy spread), and position of the strongest tap

32 The Problem with a Uniform ADC: slicer thresholds are fixed in amplitude, so BER is sensitive to the channel. (Figures: BER vs. SNR. Channel B: 3-bit ADC (7 slicers), 4-bit ADC (15 slicers), unquantized -- large gap to unquantized. Channel C: 2-bit ADC (3 slicers), 3-bit ADC (7 slicers), unquantized.) TSE (symbol-spaced equalization) vs. FSE (fractionally spaced equalization) --> sampling at a rate higher than Nyquist is more robust. Cannot simply double the number of slicers

33 Generalizing the ADC: from the standard ADC architecture (waveform-shape preservation) to a generalized space-time slicer architecture (information preservation). Each slicer is characterized by a threshold t_i and a sampling phase δ_i --> expansion of the optimization space. Special cases: 1-bit architecture --> all δ_i different; uniform TSE --> t_i uniform, δ_i the same; non-uniform TSE --> t_i non-uniform, δ_i the same; FSE T/2 --> δ_i in {0, T/2}
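Writing out the generalized slicer parameterization described above, with assumed notation (r(t) the received waveform after the receive filter, t_i the threshold and δ_i the sampling-phase offset of slicer i):

    q_i[n] = \mathrm{sign}\!\left( r\!\left(nT_s + \delta_i\right) - t_i \right),
    \qquad i = 1, \ldots, M,

so each special case on the slide corresponds to a constraint on the pairs (t_i, δ_i).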

34 Key Results. Randomly dispersed 1-bit slicers can preserve information (proof uses ideas from compressive sensing). Channel-optimized slicer thresholds: TSE case (non-uniform ADC, fixed sampling phase) and FSE T/2 case (two fixed sampling phases)

35 Algorithm for Finding TSE Thresholds: Idea. Problem: find M thresholds [t_1, t_2, ..., t_M] to minimize BER. Union bound (MLSE) over pairwise error probabilities; e.g. e = {-1, 0}, b = {+1, +1}, b' = {-1, +1}. At high SNR, the dominant error events have weights of 1 or 2 bits
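The union bound being invoked, in its usual form for MLSE at high SNR (e ranging over error events, w(e) the number of bit errors in e, and the sum truncated to the dominant low-weight events):

    P_b \;\le\; \sum_{e} w(e)\, \Pr\!\left[\hat{\mathbf{b}} = \mathbf{b} + e \mid \mathbf{b}\right]
    \;\approx\; \sum_{e:\, w(e) \le 2} w(e)\, \Pr\!\left[\hat{\mathbf{b}} = \mathbf{b} + e \mid \mathbf{b}\right].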

36 Binary Hypothesis Testing. For each error event, an upper bound on the pairwise error probability can be computed as a function of a single threshold parameter t. (Figure: upper bound vs. t for error events i = 1, ..., 7.) Choose the set of M thresholds that minimizes the sum over all pairs of sequences. We end up minimizing a loose upper bound, using K-means
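A minimal sketch of the threshold-selection step as described: each dominant error event contributes a scalar cost that is minimized at some threshold location, and weighted 1-D K-means over those locations picks the M slicer thresholds. How the per-event candidate thresholds are formed (midpoints of the two noiseless hypotheses) and all names here are illustrative assumptions, not the exact cost function from the dissertation.

    import numpy as np

    def choose_thresholds(candidates, weights, M, iters=50, seed=0):
        """Weighted 1-D K-means over per-error-event candidate thresholds
        (e.g. midpoints 0.5*(X0(i)+X1(i)) of the two noiseless hypotheses)."""
        rng = np.random.default_rng(seed)
        x = np.asarray(candidates, dtype=float)
        w = np.asarray(weights, dtype=float)
        centers = rng.choice(x, size=M, replace=False)
        for _ in range(iters):
            # assign each candidate to the nearest current threshold
            assign = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
            for m in range(M):
                mask = assign == m
                if mask.any():
                    centers[m] = np.average(x[mask], weights=w[mask])
        return np.sort(centers)

    # illustrative usage: 7 thresholds from the dominant error events
    # thresholds = choose_thresholds(midpoints, event_weights, M=7)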

37 Numerical Results (TSE). (Figures: BER vs. SNR. Channel B: unquantized, uniform ADC (7 slicers), non-uniform thresholds (7). Channel A: unquantized, uniform ADC (5 slicers), non-uniform thresholds (5).)

38 Numerical Results (FSE T/2), Channel C. (Figure: BER vs. SNR for uniform ADC (0.5), uniform ADC (0), non-uniform ADC (0.5), non-uniform ADC (0), and space-time slicers.)

39 Take Away. 1. Phase/Frequency Synchronization 2. Equalization: no need for waveform preservation; almost any configuration of slicers can preserve information; benefit of tailoring to the channel 3. Object Classification: reducing the number of tunable parameters via neuro-mimetic design and clustering

40 1. Phase/Frequency Synchronization: importance of dither 2. Equalization: importance of adapting quantization thresholds to the channel 3. Object Classification: reducing the number of tunable parameters via neuromimetic design and clustering

41 Significant recent progress in machine vision. Current records held by supervised deep neural nets: loosely inspired by the spatial organization of the visual cortex, hierarchical structure (Krizhevsky et al 2012, Ciresan et al 2011)

42 Supervised Deep Nets: Issues. Large number of tunable parameters (learning rates, weight decay, momentum etc.). Tricks and clever engineering required to make them work (DropOut, DropConnect etc.). Not clear what information is extracted by the different layers: lower layers? higher layers? Can we put hierarchical feature extraction in an understandable framework?

43 Ideas: 1. Neuro-mimetic design 2. Use standard clustering building blocks + neuro-inspiration

44 Loose neuro-inspiration already plays a key role. Convolutional and hierarchical architecture: local spatial organization of cells in the visual cortex (figure from LeCun et al, 1998). Rectification: neurons fire when inputs exceed a threshold. Local contrast normalization (LCN): local inhibition and competition between neighboring neurons (figure from Brady et al, 2000). Pooling: complex cells, translation invariance. Much less work on neuro-mimetic computational models

45 How much supervision do we need? We can see things even if we don't know their labels; perhaps our visual system is extracting a universal set of features. The unsupervised approach has been tried in the literature (e.g. transfer learning), but further effort is needed to improve performance. 1. Can we leverage everything we know for sure about mammalian vision? 2. Can most of the learning be unsupervised?

46 Architecture: N x N raw image -> RGC layer -> simple cell layer -> N x N x f feature maps. 1. Neuro-mimetic frontend 2. Unsupervised feature extraction: clustering and pooling 3. Supervised classifier (e.g. SVM or neural net)

47 Neuro-Mimetic Frontend (most of the modeling work done by Emre Akbas, post-doc with Prof. Eckstein in the Department of Psychology and Brain Sciences, UCSB)

48 How well do we understand the visual pathway? (Graphic from Bengio, LeCun ICML Workshop 2009, Montreal.) Scene -> RGC/LGN (retinal ganglion cells / lateral geniculate nucleus, in the retina) -> optic nerve -> V1 simple cells -> complex cells (in the cortex)

49 Retinal Ganglion Cells (RGCs). Retina -> RGC/LGN -> V1 simple cells -> complex cells. What the RGC stage does: 1. Luminance gain control 2. Center-surround filtering 3. Local contrast normalization (LCN) and rectification. Center-ON and center-OFF channels: relevant parts of the image light up (>0). References: Wandell 1995, Croner et al 1995, Carandini et al 2005
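A rough sketch of these RGC-style stages (crude luminance gain control, center-surround filtering as a difference of Gaussians, then ON/OFF rectification); filter scales and constants are illustrative, not the values used in the dissertation.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def rgc_frontend(img, sigma_center=1.0, sigma_surround=2.0, eps=1e-6):
        """Center-surround filtering + ON/OFF rectification of a grayscale image."""
        img = img.astype(float)
        img = img / (img.mean() + eps)               # crude luminance gain control
        dog = gaussian_filter(img, sigma_center) - gaussian_filter(img, sigma_surround)
        on = np.maximum(dog, 0.0)                    # center-ON channel
        off = np.maximum(-dog, 0.0)                  # center-OFF channel
        return np.stack([on, off], axis=-1)          # (N, N, 2) response before LCN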

50 V1 Simple Cells. Retina -> RGC/LGN -> V1 simple cells -> complex cells. Simple cells sum RGC outputs, yielding edge-oriented filters etc. Output: feature maps. Reference: Hubel and Wiesel 1962

51 RGC + Simple Cells: N x N raw image -> N x N x f feature maps. Each spatial location has a vector of neuron activations a(x, y). Viewing distance is the only tunable parameter; we set it guided by the resolution of the image. Receptive field sizes: roughly 7x7 pixels in the original image space

52 Clustering: a natural candidate for finding patterns in data. The encoding operation is similar to a layer in a neural network (cluster center == neuron). Normalize the input and the cluster centers and perform K-means (spherical K-means). Once the centers are learned, apply a non-linear function of the soft encoding -- the same as neural-layer processing (layer i neurons -> layer i+1 neurons)

53 Clustering Simple Cell Outputs. Input data: simple-cell activation vectors; learn K_1 centers using the online spherical K-means algorithm (Shi Zhong 2005). K-means and encode: N x N raw image -> N x N x f simple cell outputs -> N x N x K_1
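A compact batch version of spherical K-means (unit-normalize data and centers, assign by largest dot product, recompute and renormalize centers); the slide cites an online variant (Shi Zhong 2005), so treat this as a simplified sketch.

    import numpy as np

    def spherical_kmeans(X, K, iters=30, seed=0):
        """X: (N, d) activation vectors; returns (K, d) unit-norm cluster centers."""
        rng = np.random.default_rng(seed)
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
        C = Xn[rng.choice(len(Xn), K, replace=False)].copy()
        for _ in range(iters):
            assign = (Xn @ C.T).argmax(axis=1)       # cosine-similarity assignment
            for k in range(K):
                members = Xn[assign == k]
                if len(members):
                    c = members.sum(axis=0)
                    C[k] = c / (np.linalg.norm(c) + 1e-8)
        return C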

54 Encoding Function (Activations): Sparsity Level. What is the choice of f? We choose a soft threshold, picking T to hit a target sparsity level (e.g. 90% of activations set to zero). (Figure: a patch and its activations after thresholding.)
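A sketch of the encoding described above: activations are similarities to the learned centers, soft-thresholded so that a target fraction of them is zero. Choosing T as a quantile of the activations is an assumption about how the sparsity level is enforced.

    import numpy as np

    def encode(X, C, sparsity=0.90):
        """Soft-threshold encoding of activation vectors X against unit-norm centers C."""
        Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-8)
        A = Xn @ C.T                          # raw activations (cosine similarities)
        T = np.quantile(A, sparsity)          # keep roughly the top (1 - sparsity) fraction
        return np.maximum(A - T, 0.0)         # soft threshold: sparse, non-negative codes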

55 Cluster centers (correspond to 7x7 patches): mostly edges. Interpretation: orientation-sensitive neurons

56 N x N raw image -> N x N x f simple cell outputs -> N x N x K_1. Next layer? 1. Either feed to the classifier, OR 2. zoom out via pooling and cluster larger patches

57 2nd-layer cluster centers: curves, t-junctions

58 The Datasets (standard for image classification tests). MNIST: 10 digits, 28 x 28 images, 60K training, 10K testing. NORB (uniform dataset): 5 objects (truck, car, human, plane, animal), 96 x 96 dimensions, varying illumination, elevation and rotation; 24,300 training, 24,300 testing

59 Experimental Results Summary: decent performance on MNIST; beats state of the art on NORB; can do it with very sparse encoding

60 Test Scenarios. 1. One layer of clustering 2. One layer of clustering with more centers K_1 3. Two layers of clustering (K_1, K_2), concatenating layer-1 and layer-2 activations to form the feature vector (consistent with current neuroscientific understanding). Cases 2 and 3: similar lengths of feature vectors. No augmentation using affine distortions (translation, rotation etc.) of the data. RBF (radial basis function) SVM as the final supervised classifier

61 MNIST error rates. At the lower sparsity level: Layer 1: 0.73%; Layer 1 (more centers): 0.7%; Layer 1 + 2: 0.%. At 95% sparsity: Layer 1: 0.7%; Layer 1 (more centers): 0.7%; Layer 1 + 2: 0.%. State of the art (without distortions): 0.39%, Chen-Yu Lee et al 2014, Deeply-Supervised Nets. Works that use unsupervised layers followed by supervised classifiers: 0.% - Lee et al 2009; Ranzato et al 2007 (uses a 2-layer neural net for supervised training on top of the unsupervised layers); 0.59% - Labusch et al 2009, sparsenet algorithm + RBF SVM

62 NORB error rates. At the lower sparsity level: Layer 1: 3.9%; Layer 1 (more centers): 3.71%; Layer 1 + 2: .9%. At 95% sparsity: Layer 1: .5%; Layer 1 (more centers): .5%; Layer 1 + 2: .90%. Performance with a single layer greatly improves with more sparsity: better than state of the art. State of the art (with translations): .53% - Ciresan et al 2011 (without translation: 3.9%, supervised deep net). State of the art (without translation): .7% - Uetz & Behnke 2009, supervised deep net; 3.0% - Coates et al, unsupervised + SVM, using K-means with K in the thousands; 5.% - Jarrett et al 2009, unsupervised pretraining + fine tuning

63 A promising start. Neuromimetic frontend + clustering is a promising approach to universal feature extraction. Easy to implement, very few tunable parameters (viewing distance, #cluster centers, sparsity level). Potential for interpreting cluster centers as successively more abstract and zoomed-out representations

64 Many unanswered questions. How can we tell if we are capturing all the information? Is there an alternative metric to classification performance? The impact of design choices on classification performance appears to be dataset dependent: how many layers? what sparsity level? layer-dependent sparsity? What are the best approaches for low-power hardware implementations? Power savings from sparsity, backend complexity

65 List of Publications. E. Akbas, A. Wadhwa, M. Eckstein and U. Madhow, "A Framework for Machine Vision based on Neuro-Mimetic Front End Processing and Clustering," Proc. of 52nd Allerton Conference on Communication, Control and Computing, October 2014. A. Wadhwa, U. Madhow and N. Shanbhag, "Space-time Slicer Architectures for Analog-to-Information Conversion in Channel Equalizers," Proc. of IEEE International Conference on Communications (ICC'14), Sydney, Australia, June 2014. A. Wadhwa and U. Madhow, "Blind phase/frequency synchronization with low-precision ADC: a Bayesian approach," Proc. of 51st Allerton Conference on Communication, Control and Computing, Oct 2013. A. Wadhwa, U. Madhow, J. Hespanha and B. Sadler, "Following an RF trail to its source," Proc. of 49th Allerton Conference on Communication, Control and Computing, Sept 2011. Two journal submissions in preparation

66 Acknowledgements. Advisor: Prof. Upamanyu Madhow. Committee members. Funding sources: Institute for Collaborative Biotechnologies (ICB), Army Research Office (ARO), SONIC, MARCO, DARPA. Collaborators: Prof. Hespanha, Prof. Shanbhag, Prof. Ashby and Prof. Eckstein; students: Jason, Yingyan, Erick, Ben & Emre; WCSL colleagues. Questions?


68 Back-up Slides


70 Another way of looking at it: maximize the information utility, the KL divergence between successive posteriors. (Theorem and supporting equations stated on the slide.)

71 Literature: Adaptive Control for Estimation. Sequential design of experiments, active hypothesis testing etc. (Chernoff 1959, ..., Naghshvar 2013): multihypothesis testing, trading off the average number of samples against the probability of error. The value function / optimal cost satisfies a fixed-point / dynamic-programming equation involving the expected value function on taking action a when the belief is ρ

72 Idea (Algorithm). 1. MLSE (maximum likelihood sequence estimation): performance similar to BCJR for uncoded systems 2. P_e < P_u (union bound) 3. High SNR ---> truncate P_u to a few dominant terms 4. Truncated sum: a further upper bound, simple to evaluate 5. Final cost function: an approximate upper bound, where Ω_n represents a pair of bit sequences and f(t; Ω_n) is a function of a scalar variable t given Ω_n; the final optimization can be solved using K-means with M cluster centers

73 Union Bound. e.g. e = {-1, 0}, b = {+1, +1}, b' = {-1, +1}. Truncated to the dominant error events, each term being the pairwise error probability of two sequences

74 Reduces to binary hypothesis testing between vectors: H_0: X_0 vs. H_1: X_1. The element-wise probability of error is easy to evaluate, e.g. comparing X_0(i) and X_1(i) against a threshold t. (Figure: per-element error bound vs. t for i = 1, ..., 7.)

75 Questions. 1. Can a dispersed slicer architecture guarantee channel inversion? It can be shown that even randomly chosen t_i work; intuitively it should work with enough slicers. Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees no error floor if n is large enough -- quantization preserves pairwise L1-norm distances

76 Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees the absence of an error floor if n is large enough. A reassuring result; discussion on the next few slides

77 Received signal and its samples (LT_s: length of the channel); lower bound on the information rate.* (*Zeitler, G.; Singer, A.C.; Kramer, G., "Low-Precision A/D Conversion for Maximum Information Rate in Channels with Memory," IEEE Transactions on Communications, September 2012)

78 Same setup: the information-rate lower bound considers the observations affected by b_i, conditioned on the past bits. (*Zeitler, G.; Singer, A.C.; Kramer, G., "Low-Precision A/D Conversion for Maximum Information Rate in Channels with Memory," IEEE Transactions on Communications, September 2012)

79 W.l.o.g. consider i = 0, i.e. decoding b_0. Fix the past bits {b_-(L+1), ..., b_-1} and let s(t) := {r(t), t in [0, LT_s]}. Define S_0 = {s(t) : b_0 = -1} and S_1 = {s(t) : b_0 = +1}; |S_0| = |S_1| = 2^(L-1) as the future bits are varied. Sample s(t) n times to get x; the sets of sampled signals are S_0 -> X_0, S_1 -> X_1. Condition for no error floor: the information-rate lower bound equals 1

80 Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees no error floor if n is large enough -- quantization preserves pairwise L1-norm distances

81 Sketch of Proof: similar to the Johnson-Lindenstrauss (JL) lemma (compressive sensing literature): for u, v in R^d, the mapped points f(u), f(v) in R^k (random subspace, k much smaller than d) approximately preserve distances

82 Sketch of Proof. Note: q(x) is a binary vector. Steps (for each pair x_0 and x_1, conditioned on the past bits): 1. Relate the expected Hamming distance between q(x_0) and q(x_1) to the L1 distance between x_0 and x_1 2. Bound the probability of large deviations (Chernoff bound) 3. Invoke the union bound to cover all sets of past bits

83 Does Unsupervised Learning Make Sense? It should, even though most state-of-the-art nets are purely supervised; transfer learning has been shown to help. Intuitively: extract low-level features like edges, corners, t-junctions etc. independent of labels -- low-level representative features (unsupervised) followed by higher-level discriminative features (supervised). Not entirely a new idea: Lee et al 2009, Jarrett et al 2009, Kavukcuoglu et al 2010, Zeiler et al 2011, Labusch et al 2009 etc. Our approach: clustering to extract low-level features

84 Related Work. Unsupervised layers followed by supervised layers: Lee et al 2009, Jarrett et al 2009, Kavukcuoglu et al 2010, Zeiler et al 2011, Labusch et al 2009; all of them utilize some form of reconstruction + sparsity cost function for unsupervised training, whereas we investigate a much simpler alternative (K-means clustering). Using K-means clustering to build features: Coates and Ng 2011, 2012 use K-means directly on raw images, with a very large number of learned cluster centers (a few thousand); we build clustering on top of neuro-mimetic preprocessing and get competitive performance with far fewer cluster centers. Benefits of tuning the pre-processing step: Ciresan et al 2011, Fukushima 2003 obtain gains using contrast normalization and center-surround filtering

85 RBF SVM as Supervised Classifier. Since we are using clustering to organize patterns, we expect a classifier that uses a Gaussian mixture model (GMM) to do well: the radial basis function classifier (Sung 1996). RBF classifier: like a GMM but with mixing probabilities trained discriminatively. RBF + SVM is superior to GMM-based classification (Scholkopf et al 1997). The scale parameter of the RBF kernel is set via cross-validation on the training set

86 Area of Active Research. Systems-level research (dealing with the quantization non-linearity): capacity of the AWGN channel - Singh et al (2009); capacity of block non-coherent communication - Singh et al (2009); fading channel capacity - Krone et al; equalization information rate - Zeitler et al; channel estimation - Dabeer et al, Zeitler et al (2011); time-interleaved ADC mismatch - Ponnuru et al (2011). Circuit-level research: analog signal processing, circuit implementations (>1 Gbps): carrier phase recovery - Brodersen et al (2011); continuous-time equalizers - Wang, Alon et al (2011), Chan; and many more

87 Bayesian Estimation of φ, blind mode (unknown symbol sequence). The net phase rotation combines the channel phase and the QPSK symbol phase ({1,3,5,7} x π/4). Steps: 1. Derive the conditional distribution of u given β 2. Compute the observation density (pmf) 3. Use Bayes' rule to get the recursive update equation

88 1. The conditional distribution of u has a closed-form expression for β = 0 and shifts circularly with β

89 Setup: TX -> dispersive channel (A, B, C) -> Front End (Rx filter + A/D, Q(.)) -> DSP. Objective: design Q(.) to optimize equalization performance (BER) while keeping the bits of precision as low as possible

90 Supervised Deep Nets: Issues. Large number of tunable parameters; long training times; overfitting is an issue; large amounts of labeled training data required. Tricks and clever engineering required to make them work: DropOut, DropConnect etc., setting correct values of learning rates, weight decay, momentum. Not clear what information is extracted by the different layers: lower layers? higher layers? Are there simpler ways of implementing hierarchical feature extraction?

91 RGC + Simple Cells: N x N raw image -> N x N x f feature maps. At each spatial location there is a set of neurons whose activations represent 7x7 patches. (Figure: an example 7x7 patch, its simple-cell response, and the response prior to LCN.)

92 LCN (local contrast normalization) operation: i is the feature index, j the spatial index; the normalizer sums over features and over a spatial neighborhood. It normalizes activations and inhibits weaker responses. (Figure: an example 7x7 patch, its simple-cell response, and the response prior to LCN.)
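A standard form of the LCN operation consistent with this description (the exact constants and neighborhood used in the dissertation may differ):

    y_{i,j} \;=\; \frac{x_{i,j}}{c + \sqrt{\sum_{i'} \sum_{j' \in \mathcal{N}(j)} x_{i',j'}^{2}}},

where i indexes features, j spatial locations, N(j) is a spatial neighborhood of j, and c is a small constant.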

93 N x N raw image -> N x N x f simple cell outputs -> N x N x K_1. Next layer? 1. Either feed to the classifier, OR 2. zoom out via pooling and cluster larger patches

94 Next layer, Option 1: pool and feed to a supervised classifier. Pooling gives local translation invariance (an edge anywhere within a cell). E.g. MNIST, K_1 = 200: before pooling 28 x 28 x 200; after pooling over a 4 x 4 grid, 4 x 4 x 200 = 3200 is the length of the feature vector (see the sketch below)
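A sketch of the grid pooling step, assuming max pooling over an even partition of the feature map (the dissertation's pooling operator and grid size may differ):

    import numpy as np

    def grid_pool(fmap, grid=4, mode="max"):
        """Pool an (N, N, K) feature map over a grid x grid partition into a feature vector."""
        N, _, K = fmap.shape
        edges = np.linspace(0, N, grid + 1, dtype=int)
        out = np.zeros((grid, grid, K))
        for i in range(grid):
            for j in range(grid):
                cell = fmap[edges[i]:edges[i+1], edges[j]:edges[j+1], :]
                out[i, j] = cell.max(axis=(0, 1)) if mode == "max" else cell.sum(axis=(0, 1))
        return out.reshape(-1)                # e.g. a 4*4*K-dimensional feature vector

For example, grid_pool(encoded_map, grid=4) would give the flattened grid x grid x K_1 vector that is fed to the SVM.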

95 Next layer, Option 2: one more layer of unsupervised feature extraction before discriminative learning. Zoom out and cluster: receptive fields now larger than 7x7. Local pooling and concatenation of neighboring activation vectors; the result represents larger patches (K_1 = #first-layer centers)

96 2nd-Layer Clustering. Similarity metric for clustering: the average matching score over the quadrants of the larger patch (individual matching scores in [0, 1]); concatenation + pooling
