Course in Data Science

About the Course:
This course introduces the main tools and ideas required of a Data Scientist, Business Analyst or Data Analyst. It gives an overview of the data, questions, and tools that data analysts and data scientists work with. There are two components to the course. The first is a conceptual introduction to the ideas behind turning data into actionable knowledge. The second is a practical introduction to the tools used in the program, such as R Programming, SAS, MINITAB and Excel.

Course features:
Exclusive doubt-clarification session every weekend
Real-time, case-study-driven approach
Live project
Placement assistance

Qualification: Any graduate. No programming or statistics knowledge or skills required.

Duration of the course: 4 months (one hour on working days, two hours on weekends).

Mode of course delivery: Classroom/Online training.

Faculty details: A team of faculty with an average of 20+ years of experience in data analysis across various industries and in training.

Module 1 - Descriptive & Inferential Statistics (35 Hrs)
1. Turning Data into Information: Data Visualization; Measures of Central Tendency; Measures of Variability; Measures of Shape; Covariance, Correlation
2. Probability Distributions: Discrete Random Variables; Mean, Expected Value; Binomial Random Variable; Poisson Random Variable; Continuous Random Variable; Normal Distribution
3. Sampling Distributions: Central Limit Theorem; Sampling Distribution of the Sample Proportion, p-hat; Sampling Distribution of the Sample Mean, x-bar
4. Confidence Intervals: Statistical Inference; Constructing Confidence Intervals to Estimate a Population Mean, Variance, Proportion
5. Hypothesis Testing: Hypothesis Testing; Type I and Type II Errors; Decision Making in Hypothesis Testing; Hypothesis Testing for a Mean, Variance, Proportion; Power in Hypothesis Testing
6. Comparing Two Groups: Comparing Two Groups; Comparing Two Independent Means, Proportions; Pairwise Testing for Means; Two Variances Test (F-Test)
7. Analysis of Variance (ANOVA): One-Way and Two-Way ANOVA; ANOVA Assumptions; Multiple Comparisons (Tukey, Dunnett)
8. Association Between Categorical Variables: Two Categorical Variables Relation; Statistical Significance of Observed Relationship / Chi-Square Test; Calculating the Chi-Square Test Statistic; Contingency Table
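
As an illustration of the "Calculating the Chi-Square Test Statistic" topic in Module 1, here is a minimal sketch in Python. It is not taken from the course material: the function name `chi_square_statistic` and the sample counts are made up for this example, and the course itself lists R, SAS, MINITAB and Excel as its tools. The sketch computes the Pearson statistic, the sum of (observed - expected)^2 / expected over all cells, for a contingency table of counts.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for a contingency table of counts.

    `table` is a list of equal-length rows, e.g. [[10, 20], [20, 10]].
    """
    if not table or any(len(row) != len(table[0]) for row in table):
        raise ValueError("table must be a non-empty rectangular list of rows")

    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    if min(row_totals) == 0 or min(col_totals) == 0:
        raise ValueError("every row and column needs at least one observation")

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    degrees_of_freedom = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, degrees_of_freedom


# Hypothetical 2x2 table: rows are two groups, columns are two outcomes
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared against a chi-square distribution with the returned degrees of freedom to judge significance; that lookup is left out of this sketch.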

Module 2 - Prediction Analytics (35 Hrs)
1. Simple Linear Regression: Simple Linear Regression Model; Least-Squares Estimation of the Parameters; Hypothesis Testing on the Slope and Intercept; Coefficient of Determination; Using Software (Real Time)
2. Multiple Linear Regression: Multiple Regression Models; Estimation of Model Parameters; Hypothesis Testing in Multiple Linear Regression; Multicollinearity
3. Model Adequacy Checking: Residual Analysis; The PRESS Statistic; Detection and Treatment of Outliers; Lack of Fit of the Regression Model
4. Transformations: Variance-Stabilizing Transformations; Transformations to Linearize the Model; Box-Cox and Box-Tidwell Transformations; Generalized and Weighted Least Squares
5. Diagnostics for Leverage and Influence: Leverage / Cook's D / DFFITS / DFBETAS; Treatment of Influential Observations
6. Polynomial Regression: Polynomial Model in One / Two / More Variables
7. Dummy Variables: The General Concept of Indicator Variables
8. Variable Selection and Model Building: Forward Selection / Backward Elimination; Stepwise Regression
9. Generalized Linear Models: Concept of GLM; Logistic Regression; Poisson Regression; Negative Binomial Regression; Exponential Regression
10. Autocorrelation: Regression Models with Autocorrelated Errors
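
As an illustration of the least-squares estimation and coefficient of determination topics in Module 2, here is a minimal sketch in Python. It is not taken from the course material: the function name `fit_simple_linear_regression` and the data are made up for this example. It uses the standard closed-form estimates, slope = Sxy / Sxx and intercept = y-bar - slope * x-bar.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope, r_squared) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Corrected sums of squares and cross-products
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x must not be constant")

    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar

    # Coefficient of determination: 1 - SS_residual / SS_total
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return intercept, slope, r_squared


# Made-up illustrative data
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1, r2 = fit_simple_linear_regression(hours, score)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}, R^2 = {r2:.4f}")
```

In practice a statistics package would also report standard errors and the hypothesis tests on the slope and intercept; the sketch covers only the point estimates and R^2.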

Module 3 - Applied Multivariate Analysis (20 Hrs)
1. Measures of Central Tendency, Dispersion and Association: Measures of Central Tendency / Measures of Dispersion
2. Multivariate Normal Distribution: Exponent of Multivariate Normal Distribution; Multivariate Normality and Outliers; Eigenvalues and Eigenvectors; Spectral Decomposition; Singular Value Decomposition
3. Sample Mean Vector and Sample Correlation: Distribution of Sample Mean Vector; Interval Estimate of Population Mean; Inferences for Correlations
4. Principal Components Analysis (PCA): Principal Component Analysis (PCA) Procedure
5. Factor Analysis: Principal Component Method; Communalities; Factor Rotations; Varimax Rotation
6. Discriminant Analysis: Discriminant Analysis (Linear/Quadratic); Estimating Misclassification Probabilities
7. MANOVA: MANOVA; Test Statistics for MANOVA; Hypothesis Tests; MANOVA Table

Module 4 - Machine Learning (30 Hrs)
1. Introduction: Application Examples; Supervised Learning; Unsupervised Learning
2. Regression: Shrinkage Methods; Ridge Regression; Lasso Regression
3. Classification: Variance-Bias Tradeoff; Gradient Descent/Ascent Procedure; Maximum Likelihood Method; Logistic Regression; Bayes' Law; Naïve Bayes; Nearest-Neighbor Methods (K-NN Classifier)
4. Tree-based Methods: The Basics of Decision Trees; Regression Trees; Classification Trees; Ensemble Methods; Bagging, Bootstrap, Random Forests, Boosting
5. Neural Networks: Introduction; Single-Layer Perceptron; Multi-Layer Perceptron; Feed-Forward and Backpropagation
6. Support Vector Machine: Maximal Margin Classifier; Support Vector Classifier; Kernel Trick; Support Vector Machine; SVMs with More than Two Classes
7. Cluster Analysis: Agglomerative Hierarchical Clustering; K-Means Procedure; Medoid Cluster Analysis
8. Dimensionality Reduction: Principal Component Analysis
9. Association Rules: Market Basket Analysis; Apriori / Support / Confidence / Lift
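
As an illustration of the support, confidence and lift measures listed under association rules in Module 4, here is a minimal sketch in Python. It is not taken from the course material: the function name `rule_metrics` and the baskets are made up for this example. It only scores a single rule; it does not implement the Apriori search for frequent itemsets.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    lift       = confidence / P(consequent)
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("at least one transaction is required")

    # Count baskets containing each side of the rule, and both sides together
    count_a = sum(1 for t in transactions if antecedent <= set(t))
    count_c = sum(1 for t in transactions if consequent <= set(t))
    count_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    if count_a == 0 or count_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")

    support = count_both / n
    confidence = count_both / count_a
    lift = confidence / (count_c / n)
    return support, confidence, lift


# Made-up market baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "eggs"},
]
support, confidence, lift = rule_metrics(baskets, {"bread"}, {"milk"})
print(f"support = {support:.2f}, confidence = {confidence:.2f}, lift = {lift:.4f}")
```

A lift below 1, as in this made-up data, suggests the two items appear together slightly less often than they would if they were independent.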

Module 5 - R Programming (20 Hrs)
1. R Programming: R Basics; Numbers, Attributes; Creating Vectors; Mixing Objects; Explicit Coercion; Formatting Data Values; Matrices, Lists, Factors, Data Frames, Missing Values, Names; Reading and Writing Data; Using dput/dump; Interfaces to the Outside World; Subsetting R Objects; Vectorized Operations; Dates and Times; Managing Data Frames with the dplyr Package; Control Structures; Functions; Lexical/Dynamic Scoping; Loop Functions; Debugging
2. Data Analytics Using R: Modules 1-4 demonstrated using R programming