EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 11

Department of Electrical and Computer Engineering
Cleveland State University
(based on Dr. Raj Jain's lecture notes)

Outline
- Move 2nd midterm to Nov.?
- Review of lecture 10
- Other regression models
- Experimental design

Midterm #1
[Table: scores by problem (P1 to P8) and total]

Workload Characterization Techniques
- Workload characterization: the process of studying real-user environments, observing the key characteristics, and developing a workload model that can be used repeatedly
- The measured workload data consist of the services requested or the resource demands of a number of users on the system
- The term "user" denotes the entity that makes the service requests at the SUT (system under test) interface

Workload Characterization Techniques
- In the workload characterization literature, the term "workload component" or "workload unit" is used instead of "user"
- The workload component should be at the SUT interface
- Workload parameters or workload features: measured quantities, service requests, or resource demands
  - For example: transaction types, instructions, packet sizes, source-destinations of a packet, and page reference pattern
- Each component should represent as homogeneous a group as possible

Workload Characterization Techniques
- Averaging
- Single-parameter histograms
- Multiparameter histograms
- Principal component analysis
- Markov models
- Clustering

Principal Component Analysis
- Key idea: use a weighted sum of parameters to classify the components
- Let $x_{ij}$ denote the $i$th parameter for the $j$th component:
  $$y_j = \sum_{i=1}^{n} w_i x_{ij}$$
- Principal component analysis assigns weights $w_i$ such that the $y_j$ provide the maximum discrimination among the components
- The quantity $y_j$ is called the principal factor
- The factors are ordered: the first factor explains the highest percentage of the variance

Finding Principal Factors
- Find the correlation matrix
- Find the eigenvalues of the matrix and sort them in order of decreasing magnitude
- Find the corresponding eigenvectors; these give the required loadings
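The three steps for finding principal factors can be sketched with NumPy. This is only an illustration: the function name `principal_factors` and the six-component workload (packets sent and received) are hypothetical, and the parameters are standardized before weighting, which is an assumption the slide does not state.

```python
import numpy as np

def principal_factors(data):
    """Return eigenvalues, loadings, and factor scores for a
    (components x parameters) array of workload measurements."""
    data = np.asarray(data, dtype=float)
    # Assumption: standardize each parameter to zero mean, unit sample variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Step 1: correlation matrix of the parameters.
    corr = np.corrcoef(data, rowvar=False)
    # Step 2: eigenvalues in decreasing order (eigh returns them ascending).
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 3: eigenvector columns are the loadings w_i; y_j = sum_i w_i * x_ij.
    scores = z @ eigvecs
    return eigvals, eigvecs, scores

# Hypothetical workload: (packets sent, packets received) for six components.
workload = [[7718, 7258], [6958, 7232], [8551, 7062],
            [6924, 6526], [6298, 5251], [7769, 7481]]
eigvals, loadings, scores = principal_factors(workload)
explained = eigvals / eigvals.sum()  # fraction of variance per factor
```

Because the eigenvalues of a correlation matrix sum to the number of parameters, `explained` gives the share of variance attributed to each factor, with the first factor taking the largest share.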

Markov Models
- Markov => the next request depends only on the last request
- Described by a transition matrix
- Given the same relative frequency of requests of different types, it is possible to realize that frequency with several different transition matrices

Clustering
1. Take a sample, that is, a subset of workload components
2. Select workload parameters
3. Select a distance measure
4. Remove outliers
5. Scale all observations
6. Perform clustering
7. Interpret results
8. Change parameters, or number of clusters, and repeat steps 3-7
9. Select representative components from each cluster
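The remark that one set of relative frequencies can come from several transition matrices can be checked numerically. In this sketch the two 2-state matrices (say, CPU and disk requests) and the helper `stationary_distribution` are hypothetical; the helper solves $\pi = \pi P$ with $\sum \pi = 1$ by least squares, which assumes an irreducible chain.

```python
import numpy as np

def stationary_distribution(P):
    """Long-run relative frequency of each request type for transition matrix P."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    # Stack the balance equations pi (P - I) = 0 with the constraint sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Hypothetical chains over two request types.
P_independent = np.array([[0.5, 0.5],
                          [0.5, 0.5]])   # next request ignores the last one
P_alternating = np.array([[0.2, 0.8],
                          [0.8, 0.2]])   # requests tend to alternate

pi_independent = stationary_distribution(P_independent)
pi_alternating = stationary_distribution(P_alternating)
```

Both chains issue each request type half of the time, yet the order of requests differs markedly, which is why relative frequencies alone do not determine the transition matrix.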

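Multiple Linear Regression Models
- A multiple linear regression model allows one to predict a response variable $y$ as a function of $k$ predictor variables $x_1, x_2, \dots, x_k$ using a linear model:
  $$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e$$
- Given a sample $\{(x_{11}, x_{21}, \dots, x_{k1}, y_1), \dots, (x_{1n}, x_{2n}, \dots, x_{kn}, y_n)\}$ of $n$ observations, the model consists of the following $n$ equations:
  $$\begin{aligned}
  y_1 &= b_0 + b_1 x_{11} + b_2 x_{21} + \cdots + b_k x_{k1} + e_1 \\
  y_2 &= b_0 + b_1 x_{12} + b_2 x_{22} + \cdots + b_k x_{k2} + e_2 \\
  &\ \vdots \\
  y_n &= b_0 + b_1 x_{1n} + b_2 x_{2n} + \cdots + b_k x_{kn} + e_n
  \end{aligned}$$

Multiple Linear Regression Models
- In vector notation, we have:
  $$\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
  \begin{bmatrix} 1 & x_{11} & x_{21} & \cdots & x_{k1} \\ 1 & x_{12} & x_{22} & \cdots & x_{k2} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{1n} & x_{2n} & \cdots & x_{kn} \end{bmatrix}
  \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix} +
  \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}$$
  or $y = Xb + e$
- Parameter estimation: $b = (X^T X)^{-1} (X^T y)$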
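A minimal sketch of the estimate $b = (X^T X)^{-1}(X^T y)$ follows. The data are hypothetical and generated without noise from $y = 1 + 2x_1 + 3x_2$, so the fitted parameters can be checked by eye; `fit_linear_model` is an illustrative name. The sketch uses `numpy.linalg.lstsq`, which solves the same least-squares problem without forming the inverse explicitly.

```python
import numpy as np

def fit_linear_model(x, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk.
    x has one row per observation and one column per predictor."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix X: a column of ones for b0, then the predictors.
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ b  # estimates of the errors e_i
    return b, residuals

# Hypothetical, noise-free observations of y = 1 + 2*x1 + 3*x2.
x = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x]
b, residuals = fit_linear_model(x, y)
```

With measured data the residuals would not vanish; inspecting them is the usual way to check the model assumptions.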

Regression with Categorical Predictors
- Examples of categorical variables: CPU types
- To represent a categorical variable that can take only 2 values, we can define a binary variable $x_j$ that takes 2 levels, $+1$ and $-1$, i.e.,
  $$x_j = \begin{cases} -1 & \text{first value} \\ +1 & \text{second value} \end{cases}$$
- For a categorical variable that takes 3 values, we cannot simply define a three-valued variable, because that implies an order:
  $$x = \begin{cases} 1 & \text{type A} \\ 2 & \text{type B} \\ 3 & \text{type C} \end{cases}$$

Regression with Categorical Predictors
- The recommended coding is to use 2 predictor variables:
  $$x_1 = \begin{cases} 1 & \text{if type A} \\ 0 & \text{otherwise} \end{cases} \qquad
  x_2 = \begin{cases} 1 & \text{if type B} \\ 0 & \text{otherwise} \end{cases}$$
- The 3 types can be represented by $(x_1, x_2)$ pairs:
  - $(x_1, x_2) = (1, 0)$: type A
  - $(x_1, x_2) = (0, 1)$: type B
  - $(x_1, x_2) = (0, 0)$: type C

Regression with Categorical Predictors
- The regression model for a 3-level categorical variable: $y = b_0 + b_1 x_1 + b_2 x_2 + e$
- The average responses for the three types are:
  $$\bar{y}_A = b_0 + b_1 \qquad \bar{y}_B = b_0 + b_2 \qquad \bar{y}_C = b_0$$
- Parameter $b_1$ represents the difference between the average responses with types A and C
- Parameter $b_2$ represents the difference between the average responses with types B and C
- Parameter $b_0$ represents the average response with type C

Regression with Categorical Predictors
- To represent a categorical variable with $k$ levels (or $k$ categories), we need to define $k-1$ binary variables as follows:
  $$x_j = \begin{cases} 1 & \text{if } j\text{th value} \\ 0 & \text{otherwise} \end{cases}$$
- The $k$th value is defined by $x_1 = x_2 = \cdots = x_{k-1} = 0$
- The regression parameter $b_0$ represents the average response with the $k$th alternative
- The parameter $b_j$ represents the difference between the average responses with alternatives $j$ and $k$
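The 0/1 coding for a 3-level categorical predictor can be illustrated as follows. The six responses are hypothetical (two measurements per CPU type), chosen so that the type averages are 13, 21, and 10; the fit should then return $b_0 = \bar{y}_C$, $b_1 = \bar{y}_A - \bar{y}_C$, and $b_2 = \bar{y}_B - \bar{y}_C$.

```python
import numpy as np

# Hypothetical measurements: two responses for each of three CPU types.
cpu_type = ["A", "A", "B", "B", "C", "C"]
response = np.array([12.0, 14.0, 20.0, 22.0, 9.0, 11.0])

# Two binary predictors; type C is the reference level (x1 = x2 = 0).
x1 = np.array([1.0 if t == "A" else 0.0 for t in cpu_type])
x2 = np.array([1.0 if t == "B" else 0.0 for t in cpu_type])

X = np.column_stack([np.ones(len(cpu_type)), x1, x2])
b, *_ = np.linalg.lstsq(X, response, rcond=None)
b0, b1, b2 = b
# b0 is the type C average; b1 and b2 are the A-minus-C and B-minus-C differences.
```

Choosing a different reference level changes the individual parameters but not the fitted averages for the three types.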

Curvilinear Regression
- Curvilinear regression: if the nonlinear function can be converted into a linear form, then the regression can be carried out using the simple or multiple linear regression techniques

  Nonlinear              Linear
  $y = a + b/x$          $y = a + b(1/x)$
  $y = 1/(a + bx)$       $(1/y) = a + bx$
  $y = x/(a + bx)$       $(x/y) = a + bx$
  $y = ab^x$             $\ln y = \ln a + (\ln b)x$
  $y = a + bx^n$         $y = a + b(x^n)$
  $y = bx^a$             $\ln y = \ln b + a \ln x$

Transformations
- Transformation: when some function of the measured response variable $y$ is used in place of $y$ in a model
- E.g., square root transformation: $\sqrt{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_k x_k + e$
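As a small worked case of linearization, the power law $y = bx^a$ becomes a straight line in $\ln x$ and $\ln y$. The sketch below fits it with `numpy.polyfit`; the data are hypothetical and generated exactly from $b = 3$, $a = 1.5$, and `fit_power_law` is an illustrative name.

```python
import numpy as np

def fit_power_law(x, y):
    """Fit y = b * x**a by simple linear regression on (ln x, ln y)."""
    # polyfit with degree 1 returns [slope, intercept].
    slope, intercept = np.polyfit(np.log(x), np.log(y), 1)
    a = float(slope)              # exponent
    b = float(np.exp(intercept))  # multiplier, since intercept = ln b
    return b, a

# Hypothetical noise-free data from y = 3 * x**1.5.
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * x ** 1.5
b, a = fit_power_law(x, y)
```

Note that least squares on the transformed data minimizes errors in $\ln y$, not in $y$, so with noisy data the result can differ from a direct nonlinear fit.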

When Transformations Are Needed
- If it is known from physical considerations of the system that a function of the response, rather than the response itself, is a better variable to use in the model
  - E.g., the interarrival times $y$ for requests are measured, and it is known that the number of requests per unit time ($1/y$) has a linear relationship to a certain predictor
- If the range of the data covers several orders of magnitude and the sample size is small => use a transformation to reduce the range of variability
- If the homogeneous variance assumption of the residuals is violated

Experimental Design and Analysis
- Design a proper set of experiments for measurement or simulation
- Develop a model that best describes the data obtained
- Estimate the contribution of each alternative to the performance
- Isolate the measurement errors
- Estimate confidence intervals for model parameters
- Check if the alternatives are significantly different
- Check if the model is adequate

Terminology
- Response variable: the outcome
  - E.g., throughput, response time
- Factors: variables that affect the response variable
  - E.g., CPU type, memory size, number of disk drives, workload used, and user's educational level
  - Also called predictor variables or predictors
- Levels: the values that a factor can assume
  - E.g., the CPU type has three levels: 68000, 8080, Z80
  - Also called treatments

Terminology
- Primary factors: the factors whose effects need to be quantified
  - E.g., only CPU type, memory size, and number of disk drives
- Secondary factors: factors whose impact need not be quantified
- Replication: repetition of all or some experiments
- Design: the number of experiments, the factor levels, and the number of replications for each experiment
  - E.g., full factorial design with 5 replications: $3 \times 3 \times 4 \times 3 \times 3$ or 324 experiments, each repeated five times

Terminology
- Experimental unit: any entity that is used for experiments
  - Usually only those experimental units that are considered as one of the factors are of interest
  - E.g., users hired to use the workstation while measurements are being performed can be considered the experimental units
  - Generally, there is no interest in comparing the units
  - Goal: minimize the impact of variation among the units

Terminology
- Interaction: the effect of one factor depends upon the level of the other
  - Example: two factors A and B, each with two levels

Types of Experimental Designs
- Given $k$ factors, with the $i$th factor having $n_i$ levels
- Simple designs: vary one factor at a time
  - Number of experiments: $n = 1 + \sum_{i=1}^{k} (n_i - 1)$
  - Not statistically efficient
  - Wrong conclusions if the factors have interaction
  - Not recommended

Types of Experimental Designs
- Full factorial design: all combinations
  - Number of experiments: $n = \prod_{i=1}^{k} n_i$
  - Can find the effect of all factors
  - Too much time and money
- Ways to reduce the number of experiments
  - Reduce the number of levels for each factor, e.g., 2 levels per factor
  - Reduce the number of factors
  - Use fractional factorial designs
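The two experiment counts can be compared with a short sketch. The CPU levels come from the terminology slide; the memory and disk levels, and all function names, are hypothetical and serve only to make the arithmetic concrete.

```python
from itertools import product
from math import prod

def simple_design_count(level_counts):
    """n = 1 + sum(n_i - 1): one baseline run plus one run per extra level."""
    return 1 + sum(n - 1 for n in level_counts)

def full_factorial_count(level_counts):
    """n = product of n_i: every combination of levels."""
    return prod(level_counts)

def full_factorial(levels):
    """Enumerate every experiment of a full factorial design as a dict."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]

levels = {
    "cpu": ["68000", "8080", "Z80"],
    "memory": ["small", "medium", "large"],  # hypothetical levels
    "disks": [1, 2, 3, 4],                   # hypothetical levels
}
counts = [len(v) for v in levels.values()]
n_simple = simple_design_count(counts)  # 1 + 2 + 2 + 3 = 8
n_full = full_factorial_count(counts)   # 3 * 3 * 4 = 36
experiments = full_factorial(levels)
```

The simple design needs far fewer runs, but because it never varies two factors together it cannot reveal interactions.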

Types of Experimental Designs
- Fractional factorial designs: save time and expense
  - Less information
  - May not get all interactions
  - Not a problem if the interactions are negligible

Types of Experimental Designs
- A sample fractional factorial design: 9 experiments ($3^{4-2}$ design) instead of 81 ($3^4$ design)

$2^k$ Factorial Designs
- $k$ factors, each at two levels
- Easy to analyze
- Helps in sorting out the impact of factors
- Good at the beginning of a study
- Valid only if the effect of a factor is unidirectional, i.e., the performance either continuously decreases or continuously increases as the factor is increased from min to max
  - E.g., memory size, the number of disk drives

$2^2$ Factorial Designs: Example
- Two factors, each at two levels

$2^2$ Factorial Designs: Model
$$y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$$
$$\begin{aligned}
15 &= q_0 - q_A - q_B + q_{AB} \\
45 &= q_0 + q_A - q_B - q_{AB} \\
25 &= q_0 - q_A + q_B - q_{AB} \\
75 &= q_0 + q_A + q_B + q_{AB}
\end{aligned}$$
- Unique solution for $q_0$, $q_A$, $q_B$, and $q_{AB}$: $y = 40 + 20 x_A + 10 x_B + 5 x_A x_B$
- Interpretation:
  - Mean performance = 40 MIPS
  - Effect of memory = 20 MIPS
  - Effect of cache = 10 MIPS
  - Interaction between memory and cache = 5 MIPS

Computation of Effects
- Model: $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$
- Substitution:
  $$\begin{aligned}
  y_1 &= q_0 - q_A - q_B + q_{AB} \\
  y_2 &= q_0 + q_A - q_B - q_{AB} \\
  y_3 &= q_0 - q_A + q_B - q_{AB} \\
  y_4 &= q_0 + q_A + q_B + q_{AB}
  \end{aligned}$$

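Computation of Effects
- Solution:
  $$\begin{aligned}
  q_0 &= \tfrac{1}{4}(y_1 + y_2 + y_3 + y_4) \\
  q_A &= \tfrac{1}{4}(-y_1 + y_2 - y_3 + y_4) \\
  q_B &= \tfrac{1}{4}(-y_1 - y_2 + y_3 + y_4) \\
  q_{AB} &= \tfrac{1}{4}(y_1 - y_2 - y_3 + y_4)
  \end{aligned}$$
- Notice that the effects are linear combinations of the responses. For $q_A$, $q_B$, and $q_{AB}$ the sum of the coefficients is zero => contrasts
- Notice that each effect is one quarter of a column inner product:
  - $q_A = \tfrac{1}{4}$ (Column A $\times$ Column y)
  - $q_B = \tfrac{1}{4}$ (Column B $\times$ Column y)
  - $q_{AB} = \tfrac{1}{4}$ (Column A $\times$ Column B $\times$ Column y)

Sign Table Method
- For a $2^2$ design, the effects can be computed easily by preparing a $4 \times 4$ sign matrix:

      I     A     B    AB     y
      1    -1    -1     1    15
      1     1    -1    -1    45
      1    -1     1    -1    25
      1     1     1     1    75
    160    80    40    20    Total
     40    20    10     5    Total/4

- Next, multiply the entries in column I by those in column y and put their sum under column I; perform a similar calculation for the other columns
- The sums under each column are divided by 4 to give the corresponding coefficients of the regression model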
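The sign table method amounts to four inner products, which the following sketch computes with NumPy. It assumes the memory-cache responses are 15, 45, 25, and 75 MIPS in the standard order (A low/B low, A high/B low, A low/B high, A high/B high); the variable names are illustrative.

```python
import numpy as np

# Assumed responses of the memory-cache study, in standard order.
y = np.array([15.0, 45.0, 25.0, 75.0])

# Sign table columns: I (all ones), A, B, and their product AB.
A = np.array([-1, 1, -1, 1])
B = np.array([-1, -1, 1, 1])
sign_table = np.column_stack([np.ones(4), A, B, A * B])

# Multiply each column by column y, sum, and divide by 4.
column_totals = sign_table.T @ y
q0, qA, qB, qAB = column_totals / 4
```

Here `column_totals` holds the "Total" row of the sign table, and dividing by 4 yields the coefficients of $y = q_0 + q_A x_A + q_B x_B + q_{AB} x_A x_B$.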

Allocation of Variation
- Importance of a factor = proportion of the variation explained
- Sample variance of $y$:
  $$s_y^2 = \frac{\sum_{i=1}^{2^2} (y_i - \bar{y})^2}{2^2 - 1}$$
- Variation of $y$ = numerator = $\sum_{i=1}^{2^2} (y_i - \bar{y})^2$ = sum of squares total (SST)

Allocation of Variation
- For a $2^2$ design: $SST = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2$
- Variation due to A = SSA = $2^2 q_A^2$
- Variation due to B = SSB = $2^2 q_B^2$
- Variation due to interaction AB = SSAB = $2^2 q_{AB}^2$
- $SST = SSA + SSB + SSAB$
- Fraction explained by A = SSA / SST
- Variation $\neq$ variance

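Derivation
- Model: $y_i = q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi}$
- Notice:
  1. The sum of entries in each column is zero:
     $$\sum_i x_{Ai} = 0; \quad \sum_i x_{Bi} = 0; \quad \sum_i x_{Ai} x_{Bi} = 0$$
  2. The sum of the squares of entries in each column is 4:
     $$\sum_i x_{Ai}^2 = 4; \quad \sum_i x_{Bi}^2 = 4; \quad \sum_i (x_{Ai} x_{Bi})^2 = 4$$

Derivation
- Notice (continued):
  3. The columns are orthogonal (inner product of any two columns is zero):
     $$\sum_i x_{Ai} x_{Bi} = 0; \quad \sum_i x_{Ai} (x_{Ai} x_{Bi}) = 0; \quad \sum_i x_{Bi} (x_{Ai} x_{Bi}) = 0$$
- Sample mean:
  $$\begin{aligned}
  \bar{y} &= \frac{1}{4} \sum_i y_i = \frac{1}{4} \sum_i (q_0 + q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi}) \\
  &= \frac{1}{4} \sum_i q_0 + \frac{1}{4} q_A \sum_i x_{Ai} + \frac{1}{4} q_B \sum_i x_{Bi} + \frac{1}{4} q_{AB} \sum_i x_{Ai} x_{Bi} \\
  &= q_0
  \end{aligned}$$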

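Derivation
- Variation of $y$:
  $$\begin{aligned}
  \sum_i (y_i - \bar{y})^2 &= \sum_i (q_A x_{Ai} + q_B x_{Bi} + q_{AB} x_{Ai} x_{Bi})^2 \\
  &= q_A^2 \sum_i x_{Ai}^2 + q_B^2 \sum_i x_{Bi}^2 + q_{AB}^2 \sum_i (x_{Ai} x_{Bi})^2 + \text{product terms} \\
  &= 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2 + 0
  \end{aligned}$$

Example
- Memory-cache study:
  $$\bar{y} = \tfrac{1}{4}(15 + 45 + 25 + 75) = 40$$
  $$\text{Total variation} = \sum_i (y_i - \bar{y})^2 = 25^2 + 5^2 + 15^2 + 35^2 = 2100 = 4 \times 20^2 + 4 \times 10^2 + 4 \times 5^2$$
- Total variation = 2100
  - Variation due to memory = 1600 (76%)
  - Variation due to cache = 400 (19%)
  - Variation due to interaction = 100 (5%)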
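The identity $SST = 2^2 q_A^2 + 2^2 q_B^2 + 2^2 q_{AB}^2$ can be checked numerically. This sketch assumes the memory-cache responses are 15, 45, 25, and 75 MIPS in standard order, with A as memory and B as cache; it computes SST both directly and from the effects.

```python
import numpy as np

# Assumed responses of the memory-cache study, in standard order.
y = np.array([15.0, 45.0, 25.0, 75.0])
x_a = np.array([-1, 1, -1, 1])   # memory size (A)
x_b = np.array([-1, -1, 1, 1])   # cache size (B)

# Effects from the sign table columns.
q_a = float(x_a @ y) / 4
q_b = float(x_b @ y) / 4
q_ab = float((x_a * x_b) @ y) / 4

# Total variation computed directly from the responses.
sst = float(np.sum((y - y.mean()) ** 2))

# Variation attributed to each factor and to the interaction.
ssa, ssb, ssab = 4 * q_a**2, 4 * q_b**2, 4 * q_ab**2
percent = {name: round(100 * ss / sst)
           for name, ss in [("memory", ssa), ("cache", ssb), ("interaction", ssab)]}
```

The three components add up to the directly computed SST because the sign table columns are orthogonal, which is what makes the product terms vanish in the derivation.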

Case Study: Interconnection Nets
- Memory interconnection networks: Omega and Crossbar
- Memory reference patterns:
  - Random (with uniform probability)
  - Matrix multiplication problem in which each process is doing a part of the multiplication

Case Study: Interconnection Nets
- Fixed factors:
  - Number of processors was fixed at 16
  - Queued requests were not buffered but blocked
  - Circuit switching instead of packet switching
  - Random arbitration instead of round robin
  - Infinite interleaving of memory => no memory bank contention

Design for Interconnection Networks
[Table: $2^2$ design and measured responses for the interconnection network study]

Interpretation of Results
- Average throughput = 0.5725
- Most effective factor = B = reference pattern => the address patterns chosen are very different
- Reference pattern explains ±0.1257 (77%) of variation
- Effect of network type = 0.0595
  - Omega networks = average + 0.0595
  - Crossbar networks = average - 0.0595
  - Difference between the two = 0.119
- Slight interaction (0.0346) between reference pattern and network type

General $2^k$ Factorial Designs
- $k$ factors at two levels each
- $2^k$ experiments
- $2^k - 1$ effects:
  - $k$ main effects
  - $\binom{k}{2}$ two-factor interactions
  - $\binom{k}{3}$ three-factor interactions
  - ...

$2^k$ Design Example
- Three factors in designing a machine:
  - Cache size
  - Memory size
  - Number of processors

  Factor                      Level -1    Level 1
  A  Memory size              4 MB        16 MB
  B  Cache size               1 kB        2 kB
  C  Number of processors     1           2
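The effect counts for a general $2^k$ design follow from binomial coefficients, and the experiments are the rows of a $\pm 1$ table. The sketch below shows both for $k = 3$; the helper names are illustrative.

```python
from itertools import product
from math import comb

def effect_counts(k):
    """Number of effects of each order: k main effects, C(k,2) two-factor
    interactions, C(k,3) three-factor interactions, and so on."""
    return {order: comb(k, order) for order in range(1, k + 1)}

def design_rows(k):
    """All 2**k level combinations, each level coded as -1 or +1."""
    return list(product((-1, 1), repeat=k))

k = 3
counts = effect_counts(k)             # {1: 3, 2: 3, 3: 1}
total_effects = sum(counts.values())  # 7, which equals 2**3 - 1
experiments = design_rows(k)          # 8 rows for factors A, B, C
```

The total is $2^k - 1$ because the binomial coefficients for orders 0 through $k$ sum to $2^k$, and order 0 corresponds to the mean rather than to an effect.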

24 k Design Eample 7 k Design Eample - Analysis 8 3 SST = ( qa + qb + qc + qab + qbc + qac + qabc ) = 8( ) = = 8% + % + 7% + % + % + % + 0% = 00% Number of processors (C) is the most important factor

2 k Factorial Designs Raj Jain

2 k Factorial Designs Raj Jain 2 k Factorial Designs Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-06/ 17-1 Overview!

More information

2 k Factorial Designs Raj Jain Washington University in Saint Louis Saint Louis, MO These slides are available on-line at:

2 k Factorial Designs Raj Jain Washington University in Saint Louis Saint Louis, MO These slides are available on-line at: 2 k Factorial Designs Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: 17-1 Overview 2 2 Factorial Designs Model Computation

More information

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19

EEC 686/785 Modeling & Performance Evaluation of Computer Systems. Lecture 19 EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 19 Department of Electrical and Computer Engineering Cleveland State University wenbing@ieee.org (based on Dr. Raj Jain s lecture

More information

Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems.

Outline. Simulation of a Single-Server Queueing System. EEC 686/785 Modeling & Performance Evaluation of Computer Systems. EEC 686/785 Modeling & Performance Evaluation of Computer Systems Lecture 19 Outline Simulation of a Single-Server Queueing System Review of midterm # Department of Electrical and Computer Engineering

More information

2 k, 2 k r and 2 k-p Factorial Designs

2 k, 2 k r and 2 k-p Factorial Designs 2 k, 2 k r and 2 k-p Factorial Designs 1 Types of Experimental Designs! Full Factorial Design: " Uses all possible combinations of all levels of all factors. n=3*2*2=12 Too costly! 2 Types of Experimental

More information

CS 5014: Research Methods in Computer Science. Experimental Design. Potential Pitfalls. One-Factor (Again) Clifford A. Shaffer.

CS 5014: Research Methods in Computer Science. Experimental Design. Potential Pitfalls. One-Factor (Again) Clifford A. Shaffer. Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright c 2015 by Clifford A. Shaffer Computer Science Title page Computer Science Clifford A. Shaffer Fall 2015 Clifford A. Shaffer

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 254 Experimental

More information

Summarizing Measured Data

Summarizing Measured Data Summarizing Measured Data 12-1 Overview Basic Probability and Statistics Concepts: CDF, PDF, PMF, Mean, Variance, CoV, Normal Distribution Summarizing Data by a Single Number: Mean, Median, and Mode, Arithmetic,

More information

Analysis of Software Artifacts

Analysis of Software Artifacts Analysis of Software Artifacts System Performance I Shu-Ngai Yeung (with edits by Jeannette Wing) Department of Statistics Carnegie Mellon University Pittsburgh, PA 15213 2001 by Carnegie Mellon University

More information

Convolution Algorithm

Convolution Algorithm Convolution Algorithm Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

Summarizing Measured Data

Summarizing Measured Data Performance Evaluation: Summarizing Measured Data Hongwei Zhang http://www.cs.wayne.edu/~hzhang The object of statistics is to discover methods of condensing information concerning large groups of allied

More information

Fractional Factorial Designs

Fractional Factorial Designs k-p Fractional Factorial Designs Fractional Factorial Designs If we have 7 factors, a 7 factorial design will require 8 experiments How much information can we obtain from fewer experiments, e.g. 7-4 =

More information

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu

Performance Metrics for Computer Systems. CASS 2018 Lavanya Ramapantulu Performance Metrics for Computer Systems CASS 2018 Lavanya Ramapantulu Eight Great Ideas in Computer Architecture Design for Moore s Law Use abstraction to simplify design Make the common case fast Performance

More information

Two Factor Full Factorial Design with Replications

Two Factor Full Factorial Design with Replications Two Factor Full Factorial Design with Replications Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-08/

More information

CS 700: Quantitative Methods & Experimental Design in Computer Science

CS 700: Quantitative Methods & Experimental Design in Computer Science CS 700: Quantitative Methods & Experimental Design in Computer Science Sanjeev Setia Dept of Computer Science George Mason University Logistics Grade: 35% project, 25% Homework assignments 20% midterm,

More information

One Factor Experiments

One Factor Experiments One Factor Experiments Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu These slides are available on-line at: http://www.cse.wustl.edu/~jain/cse567-06/ 20-1 Overview!

More information

Summarizing Measured Data

Summarizing Measured Data Summarizing Measured Data Dr. John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu COMP 528 Lecture 7 3 February 2005 Goals for Today Finish discussion of Normal Distribution

More information

CS 5014: Research Methods in Computer Science

CS 5014: Research Methods in Computer Science Computer Science Clifford A. Shaffer Department of Computer Science Virginia Tech Blacksburg, Virginia Fall 2010 Copyright c 2010 by Clifford A. Shaffer Computer Science Fall 2010 1 / 207 Correlation and

More information

Two-Factor Full Factorial Design with Replications

Two-Factor Full Factorial Design with Replications Two-Factor Full Factorial Design with Replications Dr. John Mellor-Crummey Department of Computer Science Rice University johnmc@cs.rice.edu COMP 58 Lecture 17 March 005 Goals for Today Understand Two-factor

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis Advanced Regression Techniques CS 147: Computer Systems Performance Analysis Advanced Regression Techniques 1 / 31 Overview Overview Overview Common Transformations

More information

Module 5: CPU Scheduling

Module 5: CPU Scheduling Module 5: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 5.1 Basic Concepts Maximum CPU utilization obtained

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Correlation and simple linear regression S5

Correlation and simple linear regression S5 Basic medical statistics for clinical and eperimental research Correlation and simple linear regression S5 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/41 Introduction Eample: Brain size and

More information

CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis CS 147: Computer Systems Performance Analysis 1 / 34 Overview Overview Overview Adding Replications Adding Replications 2 / 34 Two-Factor Design Without Replications

More information

Categorical Predictor Variables

Categorical Predictor Variables Categorical Predictor Variables We often wish to use categorical (or qualitative) variables as covariates in a regression model. For binary variables (taking on only 2 values, e.g. sex), it is relatively

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

Machine learning for pervasive systems Classification in high-dimensional spaces

Machine learning for pervasive systems Classification in high-dimensional spaces Machine learning for pervasive systems Classification in high-dimensional spaces Department of Communications and Networking Aalto University, School of Electrical Engineering stephan.sigg@aalto.fi Version

More information

TDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts.

TDDI04, K. Arvidsson, IDA, Linköpings universitet CPU Scheduling. Overview: CPU Scheduling. [SGG7] Chapter 5. Basic Concepts. TDDI4 Concurrent Programming, Operating Systems, and Real-time Operating Systems CPU Scheduling Overview: CPU Scheduling CPU bursts and I/O bursts Scheduling Criteria Scheduling Algorithms Multiprocessor

More information

ww.padasalai.net

ww.padasalai.net t w w ADHITHYA TRB- TET COACHING CENTRE KANCHIPURAM SUNDER MATRIC SCHOOL - 9786851468 TEST - 2 COMPUTER SCIENC PG - TRB DATE : 17. 03. 2019 t et t et t t t t UNIT 1 COMPUTER SYSTEM ARCHITECTURE t t t t

More information

Factorial designs. Experiments

Factorial designs. Experiments Chapter 5: Factorial designs Petter Mostad mostad@chalmers.se Experiments Actively making changes and observing the result, to find causal relationships. Many types of experimental plans Measuring response

More information

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University

Che-Wei Chang Department of Computer Science and Information Engineering, Chang Gung University Che-Wei Chang chewei@mail.cgu.edu.tw Department of Computer Science and Information Engineering, Chang Gung University } 2017/11/15 Midterm } 2017/11/22 Final Project Announcement 2 1. Introduction 2.

More information

Chapter 6: CPU Scheduling

Chapter 6: CPU Scheduling Chapter 6: CPU Scheduling Basic Concepts Scheduling Criteria Scheduling Algorithms Multiple-Processor Scheduling Real-Time Scheduling Algorithm Evaluation 6.1 Basic Concepts Maximum CPU utilization obtained

More information

= main diagonal, in the order in which their corresponding eigenvectors appear as columns of E.

= main diagonal, in the order in which their corresponding eigenvectors appear as columns of E. 3.3 Diagonalization Let A = 4. Then and are eigenvectors of A, with corresponding eigenvalues 2 and 6 respectively (check). This means 4 = 2, 4 = 6. 2 2 2 2 Thus 4 = 2 2 6 2 = 2 6 4 2 We have 4 = 2 0 0

More information

3. Factorial Experiments (Ch.5. Factorial Experiments)

3. Factorial Experiments (Ch.5. Factorial Experiments) 3. Factorial Experiments (Ch.5. Factorial Experiments) Hae-Jin Choi School of Mechanical Engineering, Chung-Ang University DOE and Optimization 1 Introduction to Factorials Most experiments for process

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Unbalanced Data in Factorials Types I, II, III SS Part 1

Unbalanced Data in Factorials Types I, II, III SS Part 1 Unbalanced Data in Factorials Types I, II, III SS Part 1 Chapter 10 in Oehlert STAT:5201 Week 9 - Lecture 2 1 / 14 When we perform an ANOVA, we try to quantify the amount of variability in the data accounted

More information

ICS 233 Computer Architecture & Assembly Language

ICS 233 Computer Architecture & Assembly Language ICS 233 Computer Architecture & Assembly Language Assignment 6 Solution 1. Identify all of the RAW data dependencies in the following code. Which dependencies are data hazards that will be resolved by

More information

14 Random Variables and Simulation

14 Random Variables and Simulation 14 Random Variables and Simulation In this lecture note we consider the relationship between random variables and simulation models. Random variables play two important roles in simulation models. We assume

More information

Embedded Systems 23 BF - ES

Embedded Systems 23 BF - ES Embedded Systems 23-1 - Measurement vs. Analysis REVIEW Probability Best Case Execution Time Unsafe: Execution Time Measurement Worst Case Execution Time Upper bound Execution Time typically huge variations

More information

Operational Laws Raj Jain

Operational Laws Raj Jain Operational Laws 33-1 Overview What is an Operational Law? 1. Utilization Law 2. Forced Flow Law 3. Little s Law 4. General Response Time Law 5. Interactive Response Time Law 6. Bottleneck Analysis 33-2

More information

Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals.

Chapter 9 Regression. 9.1 Simple linear regression Linear models Least squares Predictions and residuals. 9.1 Simple linear regression 9.1.1 Linear models Response and eplanatory variables Chapter 9 Regression With bivariate data, it is often useful to predict the value of one variable (the response variable,

More information

Lecture 5: Linear Regression

Lecture 5: Linear Regression EAS31136/B9036: Statistics in Earth & Atmospheric Sciences Lecture 5: Linear Regression Instructor: Prof. Johnny Luo www.sci.ccny.cuny.edu/~luo Dates Topic Reading (Based on the 2 nd Edition of Wilks book)

More information

Linear Dimensionality Reduction

Linear Dimensionality Reduction Outline Hong Chang Institute of Computing Technology, Chinese Academy of Sciences Machine Learning Methods (Fall 2012) Outline Outline I 1 Introduction 2 Principal Component Analysis 3 Factor Analysis

More information

7. Variable extraction and dimensionality reduction

7. Variable extraction and dimensionality reduction 7. Variable extraction and dimensionality reduction The goal of the variable selection in the preceding chapter was to find least useful variables so that it would be possible to reduce the dimensionality

More information

EIE 240 Electrical and Electronic Measurements Class 2: January 16, 2015 Werapon Chiracharit. Measurement

EIE 240 Electrical and Electronic Measurements Class 2: January 16, 2015 Werapon Chiracharit. Measurement EIE 240 Electrical and Electronic Measurements Class 2: January 16, 2015 Werapon Chiracharit Measurement Measurement is to determine the value or size of some quantity, e.g. a voltage or a current. Analogue

More information

Multiple Predictor Variables: ANOVA

Multiple Predictor Variables: ANOVA Multiple Predictor Variables: ANOVA 1/32 Linear Models with Many Predictors Multiple regression has many predictors BUT - so did 1-way ANOVA if treatments had 2 levels What if there are multiple treatment

More information

Queueing systems. Renato Lo Cigno. Simulation and Performance Evaluation Queueing systems - Renato Lo Cigno 1

Queueing systems. Renato Lo Cigno. Simulation and Performance Evaluation Queueing systems - Renato Lo Cigno 1 Queueing systems Renato Lo Cigno Simulation and Performance Evaluation 2014-15 Queueing systems - Renato Lo Cigno 1 Queues A Birth-Death process is well modeled by a queue Indeed queues can be used to

More information

COMP9334: Capacity Planning of Computer Systems and Networks

COMP9334: Capacity Planning of Computer Systems and Networks COMP9334: Capacity Planning of Computer Systems and Networks Week 2: Operational analysis Lecturer: Prof. Sanjay Jha NETWORKS RESEARCH GROUP, CSE, UNSW Operational analysis Operational: Collect performance

More information

If we have many sets of populations, we may compare the means of populations in each set with one experiment.

If we have many sets of populations, we may compare the means of populations in each set with one experiment. Statistical Methods in Business Lecture 3. Factorial Design: If we have many sets of populations we may compare the means of populations in each set with one experiment. Assume we have two factors with

More information

Analysis of Variance and Design of Experiments-I

Analysis of Variance and Design of Experiments-I Analysis of Variance and Design of Experiments-I MODULE VIII LECTURE - 35 ANALYSIS OF VARIANCE IN RANDOM-EFFECTS MODEL AND MIXED-EFFECTS MODEL Dr. Shalabh Department of Mathematics and Statistics Indian

More information

Queuing Networks. - Outline of queuing networks. - Mean Value Analisys (MVA) for open and closed queuing networks

Queuing Networks. - Outline of queuing networks. - Mean Value Analisys (MVA) for open and closed queuing networks Queuing Networks - Outline of queuing networks - Mean Value Analisys (MVA) for open and closed queuing networks 1 incoming requests Open queuing networks DISK CPU CD outgoing requests Closed queuing networks

More information

Machine Learning 2nd Edition

Machine Learning 2nd Edition INTRODUCTION TO Lecture Slides for Machine Learning 2nd Edition ETHEM ALPAYDIN, modified by Leonardo Bobadilla and some parts from http://www.cs.tau.ac.il/~apartzin/machinelearning/ The MIT Press, 2010

More information

Statistics Toolbox 6. Apply statistical algorithms and probability models

Statistics Toolbox 6. Apply statistical algorithms and probability models Statistics Toolbox 6 Apply statistical algorithms and probability models Statistics Toolbox provides engineers, scientists, researchers, financial analysts, and statisticians with a comprehensive set of

More information

STAT 350: Geometry of Least Squares

STAT 350: Geometry of Least Squares The Geometry of Least Squares Mathematical Basics Inner / dot product: a and b column vectors a b = a T b = a i b i a b a T b = 0 Matrix Product: A is r s B is s t (AB) rt = s A rs B st Partitioned Matrices

More information

Chapter 11: Factorial Designs

Chapter 11: Factorial Designs Chapter : Factorial Designs. Two factor factorial designs ( levels factors ) This situation is similar to the randomized block design from the previous chapter. However, in addition to the effects within

More information

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN

I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN Introduction Edps/Psych/Stat/ 584 Applied Multivariate Statistics Carolyn J Anderson Department of Educational Psychology I L L I N O I S UNIVERSITY OF ILLINOIS AT URBANA-CHAMPAIGN c Board of Trustees,

More information

Introduction to Queueing Theory

Introduction to Queueing Theory Introduction to Queueing Theory Raj Jain Washington University in Saint Louis Saint Louis, MO 63130 Jain@cse.wustl.edu Audio/Video recordings of this lecture are available at: http://www.cse.wustl.edu/~jain/cse567-11/

High Performance Computing

Master Degree Program in Computer Science and Networking, 2014-15 High Performance Computing 2nd appello February 11, 2015 Write your name, surname, student identification number (numero di matricola),

Review of Linear Algebra

Review of Linear Algebra Definitions An m × n (read "m by n") matrix is a rectangular array of entries, where m is the number of rows and n the number of columns. Definitions (Con't) A is square if m=

Principal Component Analysis

Principal Component Analysis Motivation Principal Component Analysis (PCA) is a multivariate statistical technique that is often useful in reducing dimensionality of a collection of unstructured random

Introduction to Machine Learning

10-701 Introduction to Machine Learning PCA Slides based on 18-661 Fall 2018 PCA Raw data can be Complex, High-dimensional To understand a phenomenon we measure various related quantities If we knew what

Factor Analysis. Qian-Li Xue

Factor Analysis Qian-Li Xue Biostatistics Program Harvard Catalyst The Harvard Clinical & Translational Science Center Short course, October 7, 06 Well-used latent variable models Latent variable scale

Bayesian Analysis of Massive Datasets Via Particle Filters

Bayesian Analysis of Massive Datasets Via Particle Filters Bayesian Analysis Use Bayes' theorem to learn about model parameters from data Examples: Clustered data: hospitals, schools Spatial models: public

CSC 411 Lecture 12: Principal Component Analysis

CSC 411 Lecture 12: Principal Component Analysis Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 12-PCA 1 / 23 Overview Today we'll cover the first unsupervised

Lecture #11: Classification & Logistic Regression

Lecture #11: Classification & Logistic Regression CS 109A, STAT 121A, AC 209A: Data Science Weiwei Pan, Pavlos Protopapas, Kevin Rader Fall 2016 Harvard University 1 Announcements Midterm: will be graded

Linear Algebraic Equations

Linear Algebraic Equations Linear Equations: $a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + a_{14}x_4 + \ldots + a_{1n}x_n = c_1$, $a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + a_{24}x_4 + \ldots + a_{2n}x_n = c_2$, $a_{31}x_1 + a_{32}x_2 + a_{33}x_3 + a_{34}x_4 + \ldots + a_{3n}x_n$

CE 601: Numerical Methods Lecture 7. Course Coordinator: Dr. Suresh A. Kartha, Associate Professor, Department of Civil Engineering, IIT Guwahati.

CE 601: Numerical Methods Lecture 7 Course Coordinator: Dr. Suresh A. Kartha, Associate Professor, Department of Civil Engineering, IIT Guwahati. Drawback in Elimination Methods There are various drawbacks

FAQ: Linear and Multiple Regression Analysis: Coefficients

Question 1: How do I calculate a least squares regression line? Answer 1: Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable

QUESTION ONE Let TC = Total Cost MC = Marginal Cost AC = Average Cost

ANSWER QUESTION ONE Let TC = Total Cost MC = Marginal Cost AC = Average Cost Q = Number of units $AC = TC/Q$, $MC = dTC/dQ$. Derivation of average cost with respect to quantity is different from marginal

Weighted Least Squares

Weighted Least Squares The standard linear model assumes that $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for $i = 1, \ldots, n$. As we have seen, however, there are instances where $\mathrm{Var}(Y \mid X = x_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2 / w_i$. Here $w_1, \ldots, w$

Glossary availability cellular manufacturing closed queueing network coefficient of variation (CV) conditional probability CONWIP

Glossary availability The long-run average fraction of time that the processor is available for processing jobs, denoted by a (p. 113). cellular manufacturing The concept of organizing the factory into

Remedial Measures, Brown-Forsythe test, F test

Remedial Measures, Brown-Forsythe test, F test Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 7, Slide 1 Remedial Measures How do we know that the regression function

CSE 140 Lecture 11 Standard Combinational Modules. CK Cheng and Diba Mirza CSE Dept. UC San Diego

CSE 140 Lecture 11 Standard Combinational Modules CK Cheng and Diba Mirza CSE Dept. UC San Diego Part III - Standard Combinational Modules (Harris: 2.8, 5) Signal Transport Decoder: Decode address Encoder:

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall 2016 11/1/2016 1/46

BIO5312 Biostatistics Lecture 10: Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes?

Online Supplement to Are Call Center and Hospital Arrivals Well Modeled by Nonhomogeneous Poisson Processes? Song-Hee Kim and Ward Whitt Industrial Engineering and Operations Research Columbia University

Correlation Preserving Unsupervised Discretization. Outline

Correlation Preserving Unsupervised Discretization Jee Vang Outline Paper References What is discretization? Motivation Principal Component Analysis (PCA) Association Mining Correlation Preserving Discretization

Pharmaceutical Experimental Design and Interpretation

Pharmaceutical Experimental Design and Interpretation N. ANTHONY ARMSTRONG, B. Pharm., Ph.D., F.R.Pharm.S., MCPP. KENNETH C. JAMES, M. Pharm., Ph.D., D.Sc, FRSC, F.R.Pharm.S., C.Chem. Welsh School of Pharmacy,

Applied Statistics Qualifier Examination (Part II of the STAT AREA EXAM) January 25, 2017; 11:00AM-1:00PM

Instructions: Applied Statistics Qualifier Examination (Part II of the STAT AREA EXAM) January 25, 2017; 11:00AM-1:00PM (1) The examination contains 4 Questions. You are to answer 3 out of 4 of them. ***

NEC PerforCache. Influence on M-Series Disk Array Behavior and Performance. Version 1.0

NEC PerforCache Influence on M-Series Disk Array Behavior and Performance. Version 1.0 Preface This document describes L2 (Level 2) Cache Technology which is a feature of NEC M-Series Disk Array implemented

Uniform random numbers generators

Uniform random numbers generators Lecturer: Dmitri A. Moltchanov E-mail: moltchan@cs.tut.fi http://www.cs.tut.fi/kurssit/tlt-2707/ OUTLINE: The need for random numbers; Basic steps in generation; Uniformly

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as $y = X\beta + e$ y = vector

Unsupervised Learning: Dimensionality Reduction

Unsupervised Learning: Dimensionality Reduction CMPSCI 689 Fall 2015 Sridhar Mahadevan Lecture 3 Outline In this lecture, we set about to solve the problem posed in the previous lecture Given a dataset,

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1

MATH 56A: STOCHASTIC PROCESSES CHAPTER 1. Finite Markov chains For the sake of completeness of these notes I decided to write a summary of the basic concepts of finite Markov chains. The topics in this chapter

Stochastic calculus for summable processes 1

Stochastic calculus for summable processes 1 Lecture I Definition 1. Statistics is the science of collecting, organizing, summarizing and analyzing the information in order to draw conclusions. It is a

Conjugate Gradient (CG) Method

Conjugate Gradient (CG) Method by K. Ozawa 1 Introduction In the series of this lecture, I will introduce the conjugate gradient method, which solves efficiently large scale sparse linear simultaneous

Principal component analysis

Principal component analysis Motivation for PCA came from major-axis regression. Strong assumption: single homogeneous sample. Free of assumptions when used for exploration. Classical tests of significance

CHAPTER 6 A STUDY ON DISC BRAKE SQUEAL USING DESIGN OF EXPERIMENTS

CHAPTER 6 A STUDY ON DISC BRAKE SQUEAL USING DESIGN OF EXPERIMENTS 6.1 INTRODUCTION In spite of the large amount of research work that has been carried out to solve the squeal problem during the last

Linear Algebra Practice Problems

Math 7, Professor Ramras Linear Algebra Practice Problems () Consider the following system of linear equations in the variables x, y, and z, in which the constants a and b are real numbers. x y + z = a

Scheduling I. Today. Next Time. ! Introduction to scheduling! Classical algorithms. ! Advanced topics on scheduling

Scheduling I Today! Introduction to scheduling! Classical algorithms Next Time! Advanced topics on scheduling Scheduling out there! You are the manager of a supermarket (ok, things don't always turn out

Linear Methods for Regression. Lijun Zhang

Linear Methods for Regression Lijun Zhang zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Linear Regression Models and Least Squares Subset Selection Shrinkage Methods Methods Using Derived

STAT 512 MidTerm I (2/21/2013) Spring 2013 INSTRUCTIONS

STAT 512 MidTerm I (2/21/2013) Spring 2013 Name: Key INSTRUCTIONS 1. This exam is open book/open notes. All papers (but no electronic devices except for calculators) are allowed. 2. There are 5 pages in

Theorems. Least squares regression

Theorems In this assignment we are trying to classify AML and ALL samples by use of penalized logistic regression. Before we indulge on the adventure of classification we should first explain the most

8 Matrices and operations on matrices

AAC - Business Mathematics I Lecture #8, December 1, 007 Katarína Kálovcová 8 Matrices and operations on matrices Matrices: In mathematics, a matrix (plural matrices) is a rectangular table of elements

CS 5014: Research Methods in Computer Science. Statistics: The Basic Idea. Statistics Questions (1) Statistics Questions (2) Clifford A.

Department of Computer Science Virginia Tech Blacksburg, Virginia Copyright © 2015 by Clifford A. Shaffer Computer Science Title page Computer Science Clifford A. Shaffer Fall 2015 Clifford A. Shaffer

Principal Component Analysis

I.T. Jolliffe Principal Component Analysis Second Edition With 28 Illustrations Springer Contents Preface to the Second Edition Preface to the First Edition Acknowledgments List of Figures List of Tables

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis

Intensity Analysis of Spatial Point Patterns Geog 210C Introduction to Spatial Data Analysis Chris Funk Lecture 5 Topic Overview 1) Introduction/Univariate Statistics 2) Bootstrapping/Monte Carlo Simulation/Kernel

Multivariate Statistical Analysis

Multivariate Statistical Analysis Fall 2011 C. L. Williams, Ph.D. Lecture 4 for Applied Multivariate Analysis Outline 1 Eigen values and eigen vectors Characteristic equation Some properties of eigendecompositions

Computer Systems Modelling

Computer Systems Modelling Computer Laboratory Computer Science Tripos, Part II Michaelmas Term 2003 R. J. Gibbens Problem sheet William Gates Building JJ Thomson Avenue Cambridge CB3 0FD http://www.cl.cam.ac.uk/

Revenue Maximization in a Cloud Federation

Revenue Maximization in a Cloud Federation Makhlouf Hadji and Djamal Zeghlache September 14th, 2015 IRT SystemX/ Telecom SudParis Makhlouf Hadji Outline of the presentation 01 Introduction 02 03 04 05
