Scalable Front End Designs for Communication and Learning. Aseem Wadhwa, Department of ECE UCSB PhD Defense
1 Scalable Front End Designs for Communication and Learning. Aseem Wadhwa, Department of ECE, UCSB. PhD Defense
2 Estimation/Detection Problem. Artificial/natural distortions → receiver. e.g. communication system: inter-symbol interference. e.g. computer vision: labels = {cars, trucks, planes, etc.}, inputs = raw images
3 Receiver Design. Front End (system-specific) → Back End (generic principles). Example: machine learning. Front end: feature extractor — bag-of-words (text), SIFT (images). Back end: SVM (support vector machine)
4 Receiver Design. Front End (system-specific) → Back End (generic principles). Example: analog-to-digital interfaces. Front end: Nyquist sampling (preserve waveform information); matched filtering, symbol-rate sampling (communication). Back end: generic MAP detector (maximum a posteriori probability)
5 Scalability Challenges. In this talk, we discuss front-end designs for systems that face scalability issues. Settings: Communication systems — high-bandwidth (multi-GHz) links → power bottleneck with high-resolution ADCs; need to adapt to low resolution and high quantization error. Machine vision — computational complexity, ease of implementation, power considerations (e.g. handheld devices)
6 3 Canonical Problems. We revisit the system design for 3 specific problems and demonstrate how some of these scalability issues can be addressed. 1. Phase/Frequency Synchronization — importance of dither. 2. Equalization — importance of adapting quantization thresholds to the channel. 3. Object Classification — reducing the number of tunable parameters via neuromimetic design and clustering. (1 and 2: communication systems; 3: machine vision)
7 1. Phase/Frequency Synchronization — importance of dither. 2. Equalization — importance of adapting quantization thresholds to the channel. 3. Object Classification — reducing the number of tunable parameters via neuromimetic design and clustering
8 Modern Communication Receiver Architecture. Digitize early, high-resolution samples. Received analog signal → analog front end (downconvert to baseband, receive filter) → ADC (faithful conversion) → DSP (leveraging Moore's Law: synchronization, equalization, error decoding, etc.). ADC bottleneck: high-speed (multi-Gbps) links (e.g. chip-to-chip, mm-wave, optical). [Figure: effective number of bits (ENOB) vs. sampling frequency f_s; data from B. Murmann, "ADC Performance Survey" [Online].] Challenge: adapting systems to low-resolution ADCs
9 Modified Architecture: Bayesian Mixed-Signal Processing. TX → channel → analog front end (preprocessing) → coarse A-to-D conversion → Bayesian inference (DSP, non-linear algorithms), with digitally controlled feedback (phase rotation, quantization thresholds, etc.). Circuit implementations: Chan, Minwei, Brodersen et al. (2011). Revisiting classical problems: synchronization, equalization
10 Blind Phase/Frequency Synchronization: System Model. TX: complex symbols (unknown), unknown channel phase φ_c, unknown frequency offset Δf, complex AWGN. Objective: estimate φ_c and Δf, decode b_k. Simplifying assumptions: QPSK, non-dispersive channel, perfect timing sync, Nyquist-rate symbols
11 Phase-only Quantization using 1-bit ADCs (AGC-free quantization). Received passband waveform → downconversion → I, Q. Pass linear combinations of I & Q through 1-bit ADCs: M = 8 bins (4 ADCs), M = 12 bins (6 ADCs)
12 Feedback. Transmitter → phase-quantization channel (QPSK) → DSP for Bayesian estimation → decoded symbols. Quantized phase measurements z_k ∈ {1, 2, ..., M}; digitally controlled derotation phase: dither signal θ_k
13 Break into sub-problems. Phase changes slowly, over 100s of symbols. Example: T_s = (2 GHz)^-1; f_c = 60 GHz; a frequency offset of a few ppm of f_c gives a rotation of roughly 0.03° per symbol period. 1. Blind phase estimation: estimate φ given z_k. 2. Once the phase is locked in, start tracking and also decoding
14 Bayesian Estimation of φ. Compute the observation density (pmf); use a recursive Bayes rule to update the posterior. Can choosing θ_k (the dither) cleverly improve estimation? What if we set θ_k = constant (say 0°)?
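The recursive update on this slide can be sketched numerically: grid the unknown phase, build the observation pmf, and fold each quantized measurement into the posterior. This is an illustrative sketch, not the talk's implementation — `bin_pmf` is a Monte-Carlo stand-in for the closed-form pmf, and the grid size, SNR, and M = 8 bins are assumptions.

```python
import numpy as np

GRID = np.linspace(0.0, 2 * np.pi, 180, endpoint=False)  # candidate phases

def bin_pmf(phi, theta, M=8, snr_db=5.0, n_mc=1000, seed=0):
    # Monte-Carlo stand-in for p(z | phi, theta): random QPSK symbol plus
    # AWGN, rotated by channel phase phi and dither theta, then the angle
    # is quantized into M uniform phase bins.
    rng = np.random.default_rng(seed)
    sym = np.exp(1j * (np.pi / 4 + (np.pi / 2) * rng.integers(0, 4, n_mc)))
    sigma = np.sqrt(0.5 / 10 ** (snr_db / 10))
    noise = sigma * (rng.standard_normal(n_mc) + 1j * rng.standard_normal(n_mc))
    ang = np.angle(sym * np.exp(1j * (phi + theta)) + noise) % (2 * np.pi)
    z = np.minimum((ang / (2 * np.pi) * M).astype(int), M - 1)
    return np.bincount(z, minlength=M) / n_mc

def bayes_update(prior, z, theta, M=8):
    # One step of the recursive Bayes rule: posterior ∝ prior * likelihood.
    like = np.array([bin_pmf(phi, theta, M)[z] for phi in GRID])
    post = prior * like
    return post / post.sum()
```

With θ_k held constant, the posterior inherits the symmetry of the bins and the QPSK constellation, which is exactly what the next few slides probe.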
15 To Dither or Not (1). Example 1: M = 8 bins; θ_k = 0° vs. θ_k random. [Figure: posterior of φ after 150 symbols (SNR = 5 dB).] Symmetric bins and equiprobable QPSK symbols (KL divergence)
16 To Dither or Not (2). Example 2: asymmetric bins (M = 12), θ_k = 0°. [Figure: posterior of φ after a few tens of symbols (SNR = 5 dB).]
17 To Dither or Not (3). Example 3: high SNR, flat posterior; θ_k = 0° vs. θ_k random. [Figure: posterior of φ after 30 symbols (SNR = 35 dB).] 1. Dithering is required. 2. Random dithering is a robust choice. 3. Can we do better than random dither?
18 What is the Best Dither? Estimation is best when the Fisher information is maximized. [Figure: Fisher information vs. phase location relative to bin edge and bin center, at high SNR (15 dB) and low SNR (5 dB).]
19 Policies. We don't know the true φ, though. Can use our best guess of φ: Maximizing Fisher Information policy (MFI). Literature: (Atia 13) — optimal if φ_MAP is close to the true value. What if the uncertainty in φ is high? Greedy Entropy Policy (GEP): choose the next action to minimize the entropy of the next posterior (averaged over the observation density)
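The greedy entropy policy can be sketched directly: for each candidate dither, compute the predictive distribution of the next observation and the expected entropy of the resulting posterior, then pick the minimizer. `like_fn` is an assumed interface (it would return the observation pmf p(z | φ_i, θ) as a matrix over the phase grid); the rest follows the definition on this slide.

```python
import numpy as np

def posterior_entropy(p):
    # Shannon entropy in nats, skipping zero-probability entries.
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def greedy_entropy_action(prior, like_fn, actions, n_obs):
    """Pick the dither minimizing the expected entropy of the next posterior.
    like_fn(theta) returns a (n_grid, n_obs) matrix of p(z | phi_i, theta)."""
    best, best_h = None, np.inf
    for theta in actions:
        L = like_fn(theta)
        pz = prior @ L                        # predictive pmf of the next z
        h = 0.0
        for z in range(n_obs):
            post = prior * L[:, z]
            s = post.sum()
            if s > 0:
                h += pz[z] * posterior_entropy(post / s)
        if h < best_h:
            best, best_h = theta, h
    return best
```

An informative action (one whose observation pmf actually varies with φ) always beats an uninformative one under this criterion, which matches the intuition behind GEP.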
20 Greedy Entropy Policy: Properties. 1. It can be shown that as the number of observations increases, GEP → MFI, which is optimal for large N. 2. Zero-noise case: the posterior is flat over its support, and the entropy of a flat distribution is the log of its support size. GEP is optimal here: it reduces the support by half every step
21 Numerical Results: Coarser Quantization (M = 8). [Figure: RMSE (in degrees) vs. #symbols for CRLB, GEP, MFI, and random dither, at SNR = 5 dB (low) and SNR = 15 dB (high).] At low SNR, GEP is slightly better than MFI. Random dithering becomes more sub-optimal as SNR increases
22 Numerical Results: Finer Quantization (M = 12). [Figure: RMSE (in degrees) vs. #symbols for CRLB, GEP, MFI, random, and constant dither, at SNR = 5 dB (low) and SNR = 15 dB (high).] At low SNR, the noise provides enough dither, so the constant action is also fine
23 Literature: Adaptive Control for Estimation. Literature: sequential design of experiments, active multihypothesis testing, etc. Optimal policy: equivalent to solving a POMDP (partially observable Markov decision process). High complexity: the set of policy trees can be very large; worst case requires gridding of the belief space (the posterior). Asymptotically optimal policies have been proposed, but are not applicable to our problem directly. Multihypothesis testing (Chernoff 1959, Naghshvar 2013, Nitinawarat 2013, etc.): ideas similar to GEP used to derive bounds (Naghshvar 2013). Continuous parameter estimation (Atia 13): shows MFI is asymptotically optimal
24 Frequency Tracking. Very slowly varying phase. 1. Sliding MAP estimate of phase → coarse estimate. 2. Feed to an EKF (extended Kalman filter) for tracking both phase and frequency
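The second step can be sketched as a small Kalman tracker on the state [phase, frequency offset], fed by the coarse phase estimates; the noise covariances below are illustrative assumptions, not the talk's values, and the wrapped innovation is the one EKF-specific detail.

```python
import numpy as np

def ekf_phase_freq(meas, Ts=1.0, q_phi=1e-6, q_f=1e-8, r=1e-2):
    """Kalman tracker (EKF-style, with a wrapped phase innovation) for the
    state x = [phase, frequency offset] driven by coarse phase estimates."""
    F = np.array([[1.0, 2 * np.pi * Ts], [0.0, 1.0]])   # phi += 2*pi*df*Ts
    Q = np.diag([q_phi, q_f])
    H = np.array([[1.0, 0.0]])                          # we observe phase only
    x, P = np.zeros(2), np.eye(2)
    est = []
    for z in meas:
        x = F @ x                                       # predict
        P = F @ P @ F.T + Q
        innov = np.angle(np.exp(1j * (z - x[0])))       # wrap to (-pi, pi]
        S = (H @ P @ H.T)[0, 0] + r
        K = (P @ H.T) / S                               # Kalman gain (2x1)
        x = x + K[:, 0] * innov                         # update
        P = (np.eye(2) - K @ H) @ P
        est.append(x.copy())
    return np.array(est)
```

On a clean phase ramp, the frequency state locks onto the true offset within a few tens of symbols.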
25 Numerical Results [figure]
26 Take Away. 1. Phase/Frequency Synchronization — mixed-signal front end, importance of dither. 2. Equalization — importance of adapting quantization thresholds to the channel. 3. Object Classification — reducing the number of tunable parameters via neuromimetic design and clustering
27 1. Phase/Frequency Synchronization — importance of dither. 2. Equalization — importance of adapting quantization thresholds to the channel. 3. Object Classification — reducing the number of tunable parameters via neuromimetic design and clustering
28 Setup. TX → channel (dispersive) → front end (Rx filter + ADC, Q(·)) → DSP. ADC: just enough precision to preserve information. Goal: a fundamental investigation to characterize BER with as few slicers as possible while avoiding error floors. Optimal MAP processing: BCJR. Setting: BPSK, high SNR, uncoded, static channel
29 Examples: Dispersive Channels. Memory: 5+ symbol periods. Channel impulse responses: h_A(t) — Channel A (maximum phase); h_B(t) — Channel B (mixed phase); h_C(t) — Channel C (broad peak, FR4 backplane trace). Backplane channels (wireline chip-to-chip high-speed links)
30 Standard Flash ADC. N slicers ↔ log2(N+1)-bit ADC (e.g. 3-bit ADC: 7 slicers). Same sampling phase; thresholds spread uniformly in amplitude. [Figure: BER vs. SNR for channel A (δ = 0), unquantized and with 15, 5, 3, and fewer slicers.] N_min = minimum #slicers to avoid error floors; hard to find analytically
31 Standard Flash ADC: Lower & Upper Bounds on N_min. Channel A (maximum phase): N_min = 3. Channel B (mixed phase): N_min = 5. Channel C (maximum phase, strong peak): N_min = … The bounds depend on: the relative strength of the strongest tap (energy spread) and the position of the strongest tap
32 The Problem with a Uniform ADC. Slicer thresholds fixed: BER is sensitive to the channel. [Figure: BER vs. SNR. Channel B: 3-bit ADC (7 slicers) vs. 4-bit ADC (15 slicers) vs. unquantized. Channel C: 2-bit ADC (3 slicers) vs. 3-bit ADC (7 slicers) vs. unquantized — large gap.] TSE (symbol-spaced equalization) vs. FSE (fractionally spaced equalization): sampling at a rate higher than Nyquist is more robust. Cannot simply double the number of slicers
33 Generalizing the ADC. Standard ADC architecture: waveform-shape preservation. Generalized space-time slicer architecture: information preservation. Each slicer is characterized by a threshold and a sampling-phase offset (t_i, δ_i) — an expansion of the optimization space. Special cases: 1-bit architecture → all δ_i different; TSE uniform → t_i uniform, δ_i same; TSE non-uniform → t_i non-uniform, δ_i same; FSE T/2 → δ_i ∈ {0, T/2}
34 Key Results. Randomly dispersed 1-bit slicers can preserve information; the proof uses ideas from compressive sensing. Channel-optimized slicer thresholds: TSE case (non-uniform ADC, fixed sampling phase) and FSE T/2 case (two fixed sampling phases)
35 Algorithm for Finding TSE Thresholds: Idea. Problem: find M thresholds [t_1, t_2, ..., t_M] to minimize BER. Union bound (MLSE) over pairwise error probabilities; e.g. error event e = {-1, 0}: b = {+1, +1} vs. b̂ = {-1, +1}. High SNR: error events with weights of 1 or 2 bits dominate
36 Binary Hypothesis Testing. An upper bound on the pairwise error probability can be computed as a function of a single threshold parameter t. [Figure: per-component error bounds for i = 1, ..., 7 vs. t, and their minimum.] Find the set of M thresholds that minimizes the sum over all pairs of sequences. We end up minimizing a loose upper bound, using K-means
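The final step — collapsing many per-error-event threshold minimizers into M slicer thresholds — is plain one-dimensional K-means. Below is a generic weighted sketch; the candidate points and weights would come from the pairwise error bounds, which are not reproduced here, so the inputs are illustrative.

```python
import numpy as np

def kmeans_1d(points, weights, M, iters=50, seed=0):
    """Weighted one-dimensional K-means over candidate thresholds; the
    per-point weights would come from the pairwise error bounds."""
    rng = np.random.default_rng(seed)
    centers = rng.choice(points, size=M, replace=False)
    for _ in range(iters):
        assign = np.argmin(np.abs(points[:, None] - centers[None, :]), axis=1)
        for m in range(M):
            mask = assign == m
            if mask.any():                      # guard against empty clusters
                centers[m] = np.average(points[mask], weights=weights[mask])
    return np.sort(centers)
```

With well-separated groups of candidate thresholds, the M centers land on the group means, which is exactly the "loose upper bound" minimization the slide describes.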
37 Numerical Results (TSE). [Figure: BER vs. SNR. Channel B: unquantized, uniform ADC (7 slicers), non-uniform thresholds (7). Channel A: unquantized, uniform ADC (5 slicers), non-uniform thresholds (5).]
38 Numerical Results (FSE T/2). [Figure: BER vs. SNR for channel C: uniform ADC (δ = 0.5), uniform ADC (δ = 0), non-uniform ADC (δ = 0.5), non-uniform ADC (δ = 0), space-time slicers.]
39 Take Away. 1. Phase/Frequency Synchronization. 2. Equalization — no need for waveform preservation: almost any configuration of slicers can preserve information; benefit of tailoring to the channel. 3. Object Classification — reducing the number of tunable parameters via neuro-mimetic design and clustering
40 1. Phase/Frequency Synchronization — importance of dither. 2. Equalization — importance of adapting quantization thresholds to the channel. 3. Object Classification — reducing the number of tunable parameters via neuromimetic design and clustering
41 Significant recent progress in machine vision. Current records held by supervised deep neural nets: loosely inspired by the spatial organization of the visual cortex, hierarchical structure (Krizhevsky et al. 2012, Ciresan et al. 2011)
42 Supervised Deep Nets: Issues. Large number of tunable parameters: learning rates, weight decay, momentum, etc. Tricks and clever engineering required to make them work: DropOut, DropConnect, etc. Not clear what information is extracted by the different layers — lower layers? higher layers? Can we put hierarchical feature extraction in an understandable framework?
43 Ideas. 1. Neuro-mimetic design. 2. Use standard clustering building blocks + neuro-inspiration
44 Loose neuro-inspiration already plays a key role. Convolutional and hierarchical architecture — local spatial organization of cells in the visual cortex (figure from LeCun et al., 1998). Rectification — neurons fire when inputs exceed a threshold. Local contrast normalization (LCN) — local inhibition and competition between neighboring neurons (figure from Brady et al., 2000). Pooling — complex cells, translation invariance. Much less work on neuro-mimetic computational models
45 How much supervision do we need? We can see things even if we don't know their labels. Perhaps our visual system is extracting a universal set of features? The unsupervised approach has been tried in the literature, e.g. transfer learning; further effort is needed to improve performance. 1. Can we leverage everything we know for sure about mammalian vision? 2. Can most of the learning be unsupervised?
46 Architecture. N X N (raw image) → RGC layer → simple cell layer → N X N X f feature maps. 1. Neuro-mimetic front end. 2. Unsupervised feature extraction: clustering and pooling. 3. Supervised classifier (e.g. SVM or neural net)
47 Neuro-Mimetic Frontend (most of the modeling work done by Emre Akbas, Post-Doc with Prof. Eckstein in the Department of Psychology and Brain Sciences, UCSB)
48 How well do we understand the visual pathway? Scene → retina (RGC: retinal ganglion cells / LGN: lateral geniculate nucleus) → optic nerve → V1 simple cells → complex cells (in the cortex). (Graphic from Bengio & LeCun, ICML Workshop 2009, Montreal)
49 Retinal Ganglion Cells (RGCs). Retina → RGC/LGN → V1 simple cells → complex cells. What it does: 1. luminance gain control; 2. center-surround filtering (center-ON and center-OFF filters); 3. local contrast normalization (LCN) and rectification. Relevant parts of the image light up (>0). References: Wandell 1995, Croner et al. 1995, Carandini et al. 2005
50 V1 Simple Cells. Retina → RGC/LGN → V1 simple cells → complex cells. Simple cells sum RGC outputs: edge-oriented filters, etc. Output: feature maps. References: Hubel and Wiesel 1962
51 RGC + Simple Cells. N X N (raw image) → N X N X f feature maps; a bank of neurons at each spatial location (x, y). Viewing distance: the only tunable parameter; we set it guided by the resolution of the image. Receptive field sizes: roughly 7 X 7 pixels in the original image space
52 Clustering. A natural candidate to find patterns in data; a simple encoding operation similar to a layer in a neural network (cluster center == neuron). Normalize the input and the cluster centers and perform K-means (spherical K-means). Once the centers are learned, apply a non-linear function of the soft encoding — the same as neural-layer processing
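The clustering step can be sketched as batch spherical K-means (the talk uses the online variant of Shi Zhong 2005; this simplified batch version only illustrates the normalize-and-cosine-match loop, and the initialization is an assumption).

```python
import numpy as np

def spherical_kmeans(X, K, iters=20, seed=0):
    """Batch spherical K-means: unit-normalize data and centers, assign by
    cosine similarity, recompute centers as normalized cluster means."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = X[rng.choice(len(X), K, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(X @ C.T, axis=1)       # nearest center by cosine
        for k in range(K):
            pts = X[assign == k]
            if len(pts):                          # guard empty clusters
                c = pts.sum(axis=0)
                C[k] = c / np.linalg.norm(c)
    return C, np.argmax(X @ C.T, axis=1)
```

Because everything lives on the unit sphere, the dot product X @ C.T plays the role of both the assignment rule during learning and the soft encoding afterwards.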
53 Clustering Simple Cell Outputs. Input data: simple-cell activation vectors. Learn K_1 centers using the online spherical K-means algorithm (Shi Zhong 2005). N X N (raw image) → N X N X f simple cell outputs → (K-means and encode) → N X N X K_1
54 Encoding Function (Activations): Sparsity Level. Choice of f? We choose a soft threshold: choose T to keep a given sparsity level, e.g. 80% / 90%. [Figure: patch and its activations after thresholding.]
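The threshold choice can be sketched directly: pick T as the empirical quantile that zeroes out the desired fraction of activations, then soft-threshold. The quantile rule is my assumption for "choose T to keep a sparsity level"; the talk's exact rule may differ.

```python
import numpy as np

def soft_threshold_encode(sims, sparsity=0.8):
    """Soft-threshold encoding: T is the empirical `sparsity`-quantile of
    the similarity scores, so roughly that fraction of entries become 0."""
    T = np.quantile(sims, sparsity)
    return np.maximum(sims - T, 0.0)
```

Raising `sparsity` from 0.8 to 0.95 reproduces the two regimes compared in the experiments: fewer, stronger activations survive.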
55 Cluster Centers (correspond to 7 X 7 patches): mostly edges. Interpretation: orientation-sensitive neurons
56 N X N (raw image) → N X N X f simple cell outputs → N X N X K_1. Next layer? 1. Either feed to the classifier, OR 2. zoom out via pooling and cluster larger patches
57 2nd-layer cluster centers: curves, t-junctions
58 The Datasets (standard for image classification tests). MNIST: digits, 28 X 28 images, 60K training, 10K testing. NORB (uniform dataset): 5 objects (truck, car, human, plane, animal), 96 X 96 dimensions, varying illumination, elevation, rotation; 24300 training, 24300 testing
59 Experimental Results: Summary. Decent performance on MNIST; beats state of the art on NORB; can do it with very sparse encoding
60 Test Scenarios. 1. 1 layer of clustering (smaller K_1). 2. 1 layer of clustering (larger K_1). 3. 2 layers of clustering (K_1, K_2); concatenate layer-1 and layer-2 activations to form the feature vector (consistent with current neuroscientific understanding). Cases 2 and 3: similar feature-vector lengths. No augmentation using affine distortions (translation, rotation, etc.) of the data. RBF (radial basis function) SVM as the final supervised classifier
61 MNIST. Sparsity level 80%: layer 1 (case 1): 0.73%; layer 1 (case 2): 0.7%; layers 1+2 (case 3): 0.%. Sparsity level 95%: layer 1 (case 1): 0.7%; layer 1 (case 2): 0.7%; layers 1+2 (case 3): 0.%. State of the art (without distortions): 0.39% — Chen-Yu Lee et al. 2014, deeply-supervised nets. Works that use unsupervised layers followed by supervised classifiers: 0.% — Lee et al.; 0.% — Ranzato et al. 2007 (uses a 2-layer neural net for supervised training on top); 0.59% — Labusch et al. 2009, sparsenet algorithm + RBF SVM
62 NORB. Sparsity level 80%: layer 1 (case 1): 3.9%; layer 1 (case 2): 3.71%; layers 1+2 (case 3): .9%. Sparsity level 95%: layer 1 (case 1): .5%; layer 1 (case 2): .5%; layers 1+2 (case 3): .90%. Performance with a single layer greatly improves with more sparsity: better than state of the art. State of the art (with translations): 2.53% — Ciresan et al. 2011; (without translation) 3.9% — supervised deep net. State of the art (without translation): .7% — Uetz & Behnke 2009, supervised deep net; 3.0% — Coates et al., unsupervised + SVM, K-means; 5.% — Jarrett et al. 2009, unsupervised pretraining + fine-tuning
63 A Promising Start. A neuromimetic front end + clustering is a promising approach to universal feature extraction. Easy to implement, with very few tunable parameters (viewing distance, #cluster centers, sparsity level). Potential for interpreting cluster centers as successively more abstract and zoomed-out representations
64 Many Unanswered Questions. How to tell if we are capturing all the information — is there an alternative metric to classification performance? The impact of design choices on classification performance appears to be dataset-dependent: how many layers? what sparsity level? layer-dependent sparsity? What are the best approaches for low-power hardware implementations? Power savings from sparsity; back-end complexity
65 List of Publications. E. Akbas, A. Wadhwa, M. Eckstein and U. Madhow, "A Framework for Machine Vision based on Neuro-Mimetic Front End Processing and Clustering," Proc. 52nd Allerton Conference on Communication, Control and Computing, October 2014. A. Wadhwa, U. Madhow and N. Shanbhag, "Space-time Slicer Architectures for Analog-to-Information Conversion in Channel Equalizers," Proc. IEEE International Conference on Communications (ICC'14), Sydney, Australia, June 2014. A. Wadhwa and U. Madhow, "Blind phase/frequency synchronization with low-precision ADC: a Bayesian approach," Proc. 51st Allerton Conference on Communication, Control and Computing, October 2013. A. Wadhwa, U. Madhow, J. Hespanha and B. Sadler, "Following an RF trail to its source," Proc. 49th Allerton Conference on Communication, Control and Computing, September 2011. Two journal submissions in preparation
66 Acknowledgements Advisor: Prof. Upamanyu Madhow Committee members Funding Sources: Institute of Collaborative Biotechnologies (ICB), Army Research Office (ARO), SONIC, MARCO, DARPA Collaborators Prof. Hespanha, Prof. Shanbhag, Prof. Ashby and Prof. Eckstein Students: Jason, Yingyan, Erick, Ben & Emre WCSL Colleagues Questions?
68 Back-up Slides
70 Another Way of Looking at It: Maximize Information Utility — the KL divergence between prior and posterior. [Theorem and equations lost in transcription.]
71 Literature: Adaptive Control for Estimation. Literature: sequential design of experiments, active hypothesis testing, etc. (Chernoff 1959, ..., Naghshvar 2013): multihypothesis testing, trading off the avg. number of samples against the prob. of error. The value function / optimal cost satisfies a fixed-point / DP equation involving the expected value function on taking action a when the belief is ρ
72 Idea (Algorithm). 1. MLSE (maximum likelihood sequence estimation): performance similar to BCJR for uncoded systems. 2. P_e < P_u (union bound). 3. High SNR → truncate P_u to a few dominant terms. 4. Truncated sum: a further upper bound, simple to evaluate. 5. Final cost function: an approximate upper bound, where Ω_n represents a pair of bit sequences and f(·) is a function of a scalar variable t given Ω_n. The final optimization can be solved using K-means with M cluster centers
73 Union Bound. e.g. error event e = {-1, 0}: b = {+1, +1} vs. b̂ = {-1, +1}. Truncated error events; pairwise error probability of sequences
74 Reduces to binary hypothesis testing between vectors: H_0: X_0 vs. H_1: X_1. The element-wise probability of error is easy to evaluate, e.g. by comparing X_0(i) and X_1(i) against a threshold t. [Figure: per-component error probabilities for i = 1, ..., 7 as functions of t.]
75 Questions. 1. Can a dispersed slicer architecture guarantee channel inversion? It can be shown that even randomly chosen t_i work; intuitively this should work with enough slicers. Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees no error floor if n is large enough — quantization preserves pairwise L1-norm distances
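The distance-preservation claim is easy to check empirically: with n thresholds drawn uniformly on [0, 1] and coordinates in [0, 1], the expected Hamming distance between the quantized versions of u and v is n times their L1 distance. A sketch, with the range and sizes chosen purely for illustration:

```python
import numpy as np

def one_bit_quantize(x, thresholds):
    # q(x) compares every coordinate of x against every threshold:
    # a binary matrix with entries 1{x[i] > t[j]}.
    return (x[:, None] > thresholds[None, :]).astype(int)

rng = np.random.default_rng(0)
u, v = rng.random(5), rng.random(5)
t = rng.random(20000)                       # uniform thresholds on [0, 1]
hamming = np.abs(one_bit_quantize(u, t) - one_bit_quantize(v, t)).sum()
# Empirically, hamming / n_thresholds ≈ ||u - v||_1, since a threshold
# separates u[i] from v[i] exactly when it falls between them.
```

This is the elementary observation behind the JL-style proof sketched on the backup slides.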
76 Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees the absence of an error floor if n is large enough. A reassuring result; discussion on the next few slides
77 Received signal and samples; lower bound on the information rate* (LT_s: length of the channel). *Zeitler, G., Singer, A.C., Kramer, G., "Low-Precision A/D Conversion for Maximum Information Rate in Channels with Memory," IEEE Transactions on Communications, September 2012
78 Received signal and samples; lower bound on the information rate (LT_s: length of the channel). Observations affected by b_i, conditioned on past bits
79 Wlog, consider i = 0, i.e., decoding b_0. Fix the past bits {b_-(L+1), ..., b_-1}. Let s(t) := {r(t), t ∈ [0, LT_s]}, S_0 = {s(t); b_0 = -1}, S_1 = {s(t); b_0 = +1}. |S_0| = |S_1| = 2^(L-1) as the future bits are varied. Sample s(t) n times to get x; the sets of sampled signals are S_0 → X_0, S_1 → X_1. Condition for no error floor: the information-rate lower bound equals 1
80 Result: 1-bit quantization with uniformly distributed thresholds T = [t_1, ..., t_n] guarantees no error floor if n is large enough — quantization preserves pairwise L1-norm distances
81 Sketch of Proof. Similar to the Johnson-Lindenstrauss (JL) lemma (compressive sensing literature): u, v ∈ R^d mapped to f(u), f(v) ∈ R^k, a random subspace
82 Sketch of Proof. Note: q(x) is a binary vector. Steps (for each pair x_0 and x_1, conditioned on the past bits): 1. relate the expected Hamming distance between q(x_0) and q(x_1) to their L1 distance; 2. bound the probability of large deviations (Chernoff bound); 3. invoke the union bound to cover all sets of past bits
83 Does Unsupervised Learning Make Sense? It should, even though most state-of-the-art nets are purely supervised; transfer learning has been shown to help. Intuitively: extract low-level features like edges, corners, t-junctions, etc. independent of labels. Low-level representative features (unsupervised) followed by higher-level discriminative features (supervised). Not entirely a new idea: Lee et al. 2009, Jarrett et al. 2009, Kavukcuoglu et al., Zeiler et al. 2011, Labusch et al. 2009, etc. Our approach: clustering to extract low-level features
84 Related Work. Unsupervised layers followed by supervised layers: Lee et al. 2009, Jarrett et al. 2009, Kavukcuoglu et al., Zeiler et al. 2011, Labusch et al. 2009 — all utilize some form of reconstruction + sparsity cost function for unsupervised training; we investigate a much simpler alternative (K-means clustering). Using K-means clustering to build features: Coates and Ng 2011, 2012 — use K-means directly on raw images, with a very large number of cluster centers (a few thousand); we build clustering on top of neuro-mimetic preprocessing and get competitive performance with few cluster centers. Benefits of tuning the pre-processing step: Ciresan et al. 2011, Fukushima 2003 — gains obtained using contrast normalization and center-surround filtering
85 RBF SVM as the Supervised Classifier. Since we are using clustering to organize patterns, we expect a classifier that uses a Gaussian mixture model (GMM) to do well: the radial basis function classifier (Sung 1996). An RBF classifier is like a GMM, but with the mixing probabilities trained discriminatively; RBF + SVM is superior to GMM-based classification (Schölkopf et al. 1996). The scale parameter of the RBF kernel is set via cross-validation on the training set
86 Area of Active Research. Systems-level research (dealing with the quantization non-linearity): capacity of the AWGN channel — Singh et al. (2009); capacity of block non-coherent communication — Singh et al. (2009); fading channel capacity — Krone et al.; equalization information rate — Zeitler et al.; channel estimation — Dabeer et al., Zeitler et al. (2011); time-interleaved ADC mismatch — Ponnuru et al. (2011). Circuit-level research: analog signal processing, circuit implementations (>1 Gbps): carrier phase recovery — Brodersen et al. (2011); continuous-time equalizers — Wang, Alon et al. (2011), Chan, and many more
87 Bayesian Estimation of φ (blind mode: unknown symbol sequence). QPSK symbol phases: odd multiples of π/4 ({1, 3, 5, 7} × π/4). Net phase rotation; observations: 1. derive the conditional distribution of u given β; 2. compute the observation density (pmf); 3. use Bayes' rule to get the recursive update equation
88 1. The conditional distribution of u has a closed-form expression for β = 0 and shifts circularly with β
89 Setup. TX → channel (dispersive) → front end (Rx filter + A/D, Q(·)) → DSP. Objective: design Q(·) to optimize equalization performance (BER) while keeping the bits of precision as low as possible
90 Supervised Deep Nets: Issues. Large number of tunable parameters: long training times; overfitting is an issue; large amounts of labeled training data required. Tricks and clever engineering required to make them work: DropOut, DropConnect, etc.; setting correct values of learning rates, weight decay, momentum. Not clear what information is extracted by the different layers — lower layers? higher layers? Are there simpler ways of implementing hierarchical feature extraction?
91 RGC + Simple Cells. N X N (raw image) → N X N X f feature maps; a bank of neurons at each spatial location whose activations represent 7 X 7 patches. [Figure: example 7 X 7 patch, simple cell response, and response prior to LCN.]
92 LCN (Local Contrast Normalization). The operation (i: feature index, j: spatial index) divides each activation by a sum over features and over a spatial neighborhood: it normalizes activations and inhibits weaker responses. [Figure: example 7 X 7 patch, simple cell response, and response prior to LCN.]
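The operation can be sketched as divisive normalization; the 3x3 box neighborhood and plain energy pooling below are simplifying assumptions, since the actual model's weighting is not reproduced in the transcription.

```python
import numpy as np

def lcn(fmap, eps=1e-8):
    """Simplified divisive LCN on an (H, W, F) feature map: divide each
    activation by the root energy pooled over all features and a 3x3
    spatial neighborhood, so strong local responses suppress weak ones."""
    H, W, F = fmap.shape
    energy = (fmap ** 2).sum(axis=2)                 # sum over features
    pad = np.pad(energy, 1)                          # zero-pad the borders
    local = sum(pad[i:i + H, j:j + W] for i in range(3) for j in range(3))
    return fmap / (np.sqrt(local)[..., None] + eps)
```

A flat input comes out uniformly scaled (less so at the borders, where the neighborhood is smaller), while isolated strong activations keep relatively more of their magnitude — the inhibition effect the slide describes.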
93 N X N (raw image) → N X N X f simple cell outputs → N X N X K_1. Next layer? 1. Either feed to the classifier, OR 2. zoom out via pooling and cluster larger patches
94 Next Layer: Option 1. Pool and feed to a supervised classifier. Pooling: local translation invariance — an edge anywhere within a cell counts. Pooling over a 4 X 4 grid. e.g. MNIST, K_1 = 200: before pooling 28 X 28 X 200, after pooling 4 X 4 X 200 = 3200, the length of the feature vector
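The pooling step can be sketched as a reshape-and-reduce over non-overlapping cells; max pooling is an assumption here, since the slide does not pin down max vs. average.

```python
import numpy as np

def pool(fmap, cell):
    """Non-overlapping max pooling of an (H, W, F) map over cell x cell
    spatial blocks; rows/columns that don't fill a block are trimmed."""
    H, W, F = fmap.shape
    h, w = H // cell, W // cell
    trimmed = fmap[:h * cell, :w * cell]
    return trimmed.reshape(h, cell, w, cell, F).max(axis=(1, 3))
```

For instance, pooling a 28 X 28 X 200 map with cell = 7 yields the 4 X 4 X 200 grid (3200 features) described above.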
95 Next Layer: Option 2. One more layer of unsupervised feature extraction before discriminative learning. Zoom out and cluster: receptive fields are now larger than 7 X 7. Local 2 X 2 pooling and 2 X 2 concatenation: 28 X 28 X 200 → 7 X 7 X 800, representing larger patches (K_1: #first-layer centers)
96 2nd-Layer Clustering. Each concatenated vector represents a larger patch, with an individual matching score (0-1) per quadrant. Similarity metric for clustering: the average matching score over the quadrants
More informationImproved Bayesian Compression
Improved Bayesian Compression Marco Federici University of Amsterdam marco.federici@student.uva.nl Karen Ullrich University of Amsterdam karen.ullrich@uva.nl Max Welling University of Amsterdam Canadian
More informationExpectation propagation for signal detection in flat-fading channels
Expectation propagation for signal detection in flat-fading channels Yuan Qi MIT Media Lab Cambridge, MA, 02139 USA yuanqi@media.mit.edu Thomas Minka CMU Statistics Department Pittsburgh, PA 15213 USA
More informationConvolutional Neural Networks. Srikumar Ramalingam
Convolutional Neural Networks Srikumar Ramalingam Reference Many of the slides are prepared using the following resources: neuralnetworksanddeeplearning.com (mainly Chapter 6) http://cs231n.github.io/convolutional-networks/
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationSerDes_Channel_Impulse_Modeling_with_Rambus
SerDes_Channel_Impulse_Modeling_with_Rambus Author: John Baprawski; John Baprawski Inc. (JB) Email: John.baprawski@gmail.com Web sites: https://www.johnbaprawski.com; https://www.serdesdesign.com Date:
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative: Live Questions We ll use Zoom to take questions from remote students live-streaming the lecture Check Piazza for instructions and
More informationBeyond Spatial Pyramids
Beyond Spatial Pyramids Receptive Field Learning for Pooled Image Features Yangqing Jia 1 Chang Huang 2 Trevor Darrell 1 1 UC Berkeley EECS 2 NEC Labs America Goal coding pooling Bear Analysis of the pooling
More informationTiming Recovery at Low SNR Cramer-Rao bound, and outperforming the PLL
T F T I G E O R G A I N S T I T U T E O H E O F E A L P R O G R ESS S A N D 1 8 8 5 S E R V L O G Y I C E E C H N O Timing Recovery at Low SNR Cramer-Rao bound, and outperforming the PLL Aravind R. Nayak
More information9/12/17. Types of learning. Modeling data. Supervised learning: Classification. Supervised learning: Regression. Unsupervised learning: Clustering
Types of learning Modeling data Supervised: we know input and targets Goal is to learn a model that, given input data, accurately predicts target data Unsupervised: we know the input only and want to make
More informationMachine Learning. CUNY Graduate Center, Spring Lectures 11-12: Unsupervised Learning 1. Professor Liang Huang.
Machine Learning CUNY Graduate Center, Spring 2013 Lectures 11-12: Unsupervised Learning 1 (Clustering: k-means, EM, mixture models) Professor Liang Huang huang@cs.qc.cuny.edu http://acl.cs.qc.edu/~lhuang/teaching/machine-learning
More informationBrief Introduction of Machine Learning Techniques for Content Analysis
1 Brief Introduction of Machine Learning Techniques for Content Analysis Wei-Ta Chu 2008/11/20 Outline 2 Overview Gaussian Mixture Model (GMM) Hidden Markov Model (HMM) Support Vector Machine (SVM) Overview
More informationAdvanced Introduction to Machine Learning
10-715 Advanced Introduction to Machine Learning Homework Due Oct 15, 10.30 am Rules Please follow these guidelines. Failure to do so, will result in loss of credit. 1. Homework is due on the due date
More informationSGD and Deep Learning
SGD and Deep Learning Subgradients Lets make the gradient cheating more formal. Recall that the gradient is the slope of the tangent. f(w 1 )+rf(w 1 ) (w w 1 ) Non differentiable case? w 1 Subgradients
More informationNeed for Deep Networks Perceptron. Can only model linear functions. Kernel Machines. Non-linearity provided by kernels
Need for Deep Networks Perceptron Can only model linear functions Kernel Machines Non-linearity provided by kernels Need to design appropriate kernels (possibly selecting from a set, i.e. kernel learning)
More informationDirect-Sequence Spread-Spectrum
Chapter 3 Direct-Sequence Spread-Spectrum In this chapter we consider direct-sequence spread-spectrum systems. Unlike frequency-hopping, a direct-sequence signal occupies the entire bandwidth continuously.
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write
More information+ + ( + ) = Linear recurrent networks. Simpler, much more amenable to analytic treatment E.g. by choosing
Linear recurrent networks Simpler, much more amenable to analytic treatment E.g. by choosing + ( + ) = Firing rates can be negative Approximates dynamics around fixed point Approximation often reasonable
More information> DEPARTMENT OF MATHEMATICS AND COMPUTER SCIENCE GRAVIS 2016 BASEL. Logistic Regression. Pattern Recognition 2016 Sandro Schönborn University of Basel
Logistic Regression Pattern Recognition 2016 Sandro Schönborn University of Basel Two Worlds: Probabilistic & Algorithmic We have seen two conceptual approaches to classification: data class density estimation
More informationDynamic Data Modeling, Recognition, and Synthesis. Rui Zhao Thesis Defense Advisor: Professor Qiang Ji
Dynamic Data Modeling, Recognition, and Synthesis Rui Zhao Thesis Defense Advisor: Professor Qiang Ji Contents Introduction Related Work Dynamic Data Modeling & Analysis Temporal localization Insufficient
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationCONVOLUTIONAL DEEP BELIEF NETWORKS
CONVOLUTIONAL DEEP BELIEF NETWORKS Talk by Emanuele Coviello Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations Honglak Lee Roger Grosse Rajesh Ranganath
More informationSpatial Transformer Networks
BIL722 - Deep Learning for Computer Vision Spatial Transformer Networks Max Jaderberg Andrew Zisserman Karen Simonyan Koray Kavukcuoglu Contents Introduction to Spatial Transformers Related Works Spatial
More informationIntroduction to Machine Learning Midterm Exam Solutions
10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,
More informationUnsupervised Learning of Hierarchical Models. in collaboration with Josh Susskind and Vlad Mnih
Unsupervised Learning of Hierarchical Models Marc'Aurelio Ranzato Geoff Hinton in collaboration with Josh Susskind and Vlad Mnih Advanced Machine Learning, 9 March 2011 Example: facial expression recognition
More informationArtificial Neural Networks. Introduction to Computational Neuroscience Tambet Matiisen
Artificial Neural Networks Introduction to Computational Neuroscience Tambet Matiisen 2.04.2018 Artificial neural network NB! Inspired by biology, not based on biology! Applications Automatic speech recognition
More informationDerived Distance: Beyond a model, towards a theory
Derived Distance: Beyond a model, towards a theory 9.520 April 23 2008 Jake Bouvrie work with Steve Smale, Tomaso Poggio, Andrea Caponnetto and Lorenzo Rosasco Reference: Smale, S., T. Poggio, A. Caponnetto,
More informationScale-Invariance of Support Vector Machines based on the Triangular Kernel. Abstract
Scale-Invariance of Support Vector Machines based on the Triangular Kernel François Fleuret Hichem Sahbi IMEDIA Research Group INRIA Domaine de Voluceau 78150 Le Chesnay, France Abstract This paper focuses
More informationCapacity of the Discrete-Time AWGN Channel Under Output Quantization
Capacity of the Discrete-Time AWGN Channel Under Output Quantization Jaspreet Singh, Onkar Dabeer and Upamanyu Madhow Abstract We investigate the limits of communication over the discrete-time Additive
More informationBASICS OF DETECTION AND ESTIMATION THEORY
BASICS OF DETECTION AND ESTIMATION THEORY 83050E/158 In this chapter we discuss how the transmitted symbols are detected optimally from a noisy received signal (observation). Based on these results, optimal
More informationA graph contains a set of nodes (vertices) connected by links (edges or arcs)
BOLTZMANN MACHINES Generative Models Graphical Models A graph contains a set of nodes (vertices) connected by links (edges or arcs) In a probabilistic graphical model, each node represents a random variable,
More informationMachine Learning, Midterm Exam
10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have
More informationLecture 14: Deep Generative Learning
Generative Modeling CSED703R: Deep Learning for Visual Recognition (2017F) Lecture 14: Deep Generative Learning Density estimation Reconstructing probability density function using samples Bohyung Han
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Max-margin learning of GM Eric Xing Lecture 28, Apr 28, 2014 b r a c e Reading: 1 Classical Predictive Models Input and output space: Predictive
More informationLearning Deep Architectures for AI. Part II - Vijay Chakilam
Learning Deep Architectures for AI - Yoshua Bengio Part II - Vijay Chakilam Limitations of Perceptron x1 W, b 0,1 1,1 y x2 weight plane output =1 output =0 There is no value for W and b such that the model
More informationLow Resolution Adaptive Compressed Sensing for mmwave MIMO receivers
Low Resolution Adaptive Compressed Sensing for mmwave MIMO receivers Cristian Rusu, Nuria González-Prelcic and Robert W. Heath Motivation 2 Power consumption at mmwave in a MIMO receiver Power at 60 GHz
More informationNeural networks and optimization
Neural networks and optimization Nicolas Le Roux Criteo 18/05/15 Nicolas Le Roux (Criteo) Neural networks and optimization 18/05/15 1 / 85 1 Introduction 2 Deep networks 3 Optimization 4 Convolutional
More informationThe connection of dropout and Bayesian statistics
The connection of dropout and Bayesian statistics Interpretation of dropout as approximate Bayesian modelling of NN http://mlg.eng.cam.ac.uk/yarin/thesis/thesis.pdf Dropout Geoffrey Hinton Google, University
More informationNovel spectrum sensing schemes for Cognitive Radio Networks
Novel spectrum sensing schemes for Cognitive Radio Networks Cantabria University Santander, May, 2015 Supélec, SCEE Rennes, France 1 The Advanced Signal Processing Group http://gtas.unican.es The Advanced
More informationDeep Learning Autoencoder Models
Deep Learning Autoencoder Models Davide Bacciu Dipartimento di Informatica Università di Pisa Intelligent Systems for Pattern Recognition (ISPR) Generative Models Wrap-up Deep Learning Module Lecture Generative
More informationNeural Networks. Single-layer neural network. CSE 446: Machine Learning Emily Fox University of Washington March 10, /9/17
3/9/7 Neural Networks Emily Fox University of Washington March 0, 207 Slides adapted from Ali Farhadi (via Carlos Guestrin and Luke Zettlemoyer) Single-layer neural network 3/9/7 Perceptron as a neural
More informationLearning Deep Architectures
Learning Deep Architectures Yoshua Bengio, U. Montreal CIFAR NCAP Summer School 2009 August 6th, 2009, Montreal Main reference: Learning Deep Architectures for AI, Y. Bengio, to appear in Foundations and
More informationOn the Limits of Communication with Low-Precision Analog-to-Digital Conversion at the Receiver
1 On the Limits of Communication with Low-Precision Analog-to-Digital Conversion at the Receiver Jaspreet Singh, Onkar Dabeer, and Upamanyu Madhow, Abstract As communication systems scale up in speed and
More informationApprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning
Apprentissage, réseaux de neurones et modèles graphiques (RCP209) Neural Networks and Deep Learning Nicolas Thome Prenom.Nom@cnam.fr http://cedric.cnam.fr/vertigo/cours/ml2/ Département Informatique Conservatoire
More informationMathematical Formulation of Our Example
Mathematical Formulation of Our Example We define two binary random variables: open and, where is light on or light off. Our question is: What is? Computer Vision 1 Combining Evidence Suppose our robot
More informationDeep Generative Models. (Unsupervised Learning)
Deep Generative Models (Unsupervised Learning) CEng 783 Deep Learning Fall 2017 Emre Akbaş Reminders Next week: project progress demos in class Describe your problem/goal What you have done so far What
More informationDeep Feedforward Networks. Sargur N. Srihari
Deep Feedforward Networks Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation and Other Differentiation
More informationNatural Image Statistics and Neural Representations
Natural Image Statistics and Neural Representations Michael Lewicki Center for the Neural Basis of Cognition & Department of Computer Science Carnegie Mellon University? 1 Outline 1. Information theory
More informationLarge-Scale Feature Learning with Spike-and-Slab Sparse Coding
Large-Scale Feature Learning with Spike-and-Slab Sparse Coding Ian J. Goodfellow, Aaron Courville, Yoshua Bengio ICML 2012 Presented by Xin Yuan January 17, 2013 1 Outline Contributions Spike-and-Slab
More informationMobile Communications (KECE425) Lecture Note Prof. Young-Chai Ko
Mobile Communications (KECE425) Lecture Note 20 5-19-2014 Prof Young-Chai Ko Summary Complexity issues of diversity systems ADC and Nyquist sampling theorem Transmit diversity Channel is known at the transmitter
More informationFeature Design. Feature Design. Feature Design. & Deep Learning
Artificial Intelligence and its applications Lecture 9 & Deep Learning Professor Daniel Yeung danyeung@ieee.org Dr. Patrick Chan patrickchan@ieee.org South China University of Technology, China Appropriately
More informationThe Origin of Deep Learning. Lili Mou Jan, 2015
The Origin of Deep Learning Lili Mou Jan, 2015 Acknowledgment Most of the materials come from G. E. Hinton s online course. Outline Introduction Preliminary Boltzmann Machines and RBMs Deep Belief Nets
More informationTutorial on Methods for Interpreting and Understanding Deep Neural Networks. Part 3: Applications & Discussion
Tutorial on Methods for Interpreting and Understanding Deep Neural Networks W. Samek, G. Montavon, K.-R. Müller Part 3: Applications & Discussion ICASSP 2017 Tutorial W. Samek, G. Montavon & K.-R. Müller
More informationMeasuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information
Measuring the Usefulness of Hidden Units in Boltzmann Machines with Mutual Information Mathias Berglund, Tapani Raiko, and KyungHyun Cho Department of Information and Computer Science Aalto University
More informationWHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY,
WHY ARE DEEP NETS REVERSIBLE: A SIMPLE THEORY, WITH IMPLICATIONS FOR TRAINING Sanjeev Arora, Yingyu Liang & Tengyu Ma Department of Computer Science Princeton University Princeton, NJ 08540, USA {arora,yingyul,tengyu}@cs.princeton.edu
More informationNeural Networks with Applications to Vision and Language. Feedforward Networks. Marco Kuhlmann
Neural Networks with Applications to Vision and Language Feedforward Networks Marco Kuhlmann Feedforward networks Linear separability x 2 x 2 0 1 0 1 0 0 x 1 1 0 x 1 linearly separable not linearly separable
More informationDetermining the Optimal Decision Delay Parameter for a Linear Equalizer
International Journal of Automation and Computing 1 (2005) 20-24 Determining the Optimal Decision Delay Parameter for a Linear Equalizer Eng Siong Chng School of Computer Engineering, Nanyang Technological
More informationCS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS
CS 179: LECTURE 16 MODEL COMPLEXITY, REGULARIZATION, AND CONVOLUTIONAL NETS LAST TIME Intro to cudnn Deep neural nets using cublas and cudnn TODAY Building a better model for image classification Overfitting
More informationIntroduction to Support Vector Machines
Introduction to Support Vector Machines Hsuan-Tien Lin Learning Systems Group, California Institute of Technology Talk in NTU EE/CS Speech Lab, November 16, 2005 H.-T. Lin (Learning Systems Group) Introduction
More informationCSC321 Lecture 9: Generalization
CSC321 Lecture 9: Generalization Roger Grosse Roger Grosse CSC321 Lecture 9: Generalization 1 / 27 Overview We ve focused so far on how to optimize neural nets how to get them to make good predictions
More informationUnderstanding How ConvNets See
Understanding How ConvNets See Slides from Andrej Karpathy Springerberg et al, Striving for Simplicity: The All Convolutional Net (ICLR 2015 workshops) CSC321: Intro to Machine Learning and Neural Networks,
More informationLoss Functions and Optimization. Lecture 3-1
Lecture 3: Loss Functions and Optimization Lecture 3-1 Administrative Assignment 1 is released: http://cs231n.github.io/assignments2017/assignment1/ Due Thursday April 20, 11:59pm on Canvas (Extending
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Category Representation 2012-2013 Jakob Verbeek, ovember 23, 2012 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.12.13
More informationMachine Learning. Neural Networks. (slides from Domingos, Pardo, others)
Machine Learning Neural Networks (slides from Domingos, Pardo, others) For this week, Reading Chapter 4: Neural Networks (Mitchell, 1997) See Canvas For subsequent weeks: Scaling Learning Algorithms toward
More informationProbabilistic Reasoning in Deep Learning
Probabilistic Reasoning in Deep Learning Dr Konstantina Palla, PhD palla@stats.ox.ac.uk September 2017 Deep Learning Indaba, Johannesburgh Konstantina Palla 1 / 39 OVERVIEW OF THE TALK Basics of Bayesian
More informationClustering with k-means and Gaussian mixture distributions
Clustering with k-means and Gaussian mixture distributions Machine Learning and Object Recognition 2017-2018 Jakob Verbeek Clustering Finding a group structure in the data Data in one cluster similar to
More informationComputation of Bit-Error Rate of Coherent and Non-Coherent Detection M-Ary PSK With Gray Code in BFWA Systems
Computation of Bit-Error Rate of Coherent and Non-Coherent Detection M-Ary PSK With Gray Code in BFWA Systems Department of Electrical Engineering, College of Engineering, Basrah University Basrah Iraq,
More informationLecture 12. Block Diagram
Lecture 12 Goals Be able to encode using a linear block code Be able to decode a linear block code received over a binary symmetric channel or an additive white Gaussian channel XII-1 Block Diagram Data
More informationArtificial Neural Networks Examination, June 2005
Artificial Neural Networks Examination, June 2005 Instructions There are SIXTY questions. (The pass mark is 30 out of 60). For each question, please select a maximum of ONE of the given answers (either
More informationNeural Networks 2. 2 Receptive fields and dealing with image inputs
CS 446 Machine Learning Fall 2016 Oct 04, 2016 Neural Networks 2 Professor: Dan Roth Scribe: C. Cheng, C. Cervantes Overview Convolutional Neural Networks Recurrent Neural Networks 1 Introduction There
More informationMachine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function.
Bayesian learning: Machine learning comes from Bayesian decision theory in statistics. There we want to minimize the expected value of the loss function. Let y be the true label and y be the predicted
More informationDeep Feedforward Networks
Deep Feedforward Networks Yongjin Park 1 Goal of Feedforward Networks Deep Feedforward Networks are also called as Feedforward neural networks or Multilayer Perceptrons Their Goal: approximate some function
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More informationRegression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)
Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features
More information