MCM3C BUSINESS MATHEMATICS AND STATISTICS Unit : 1-5

Donald Jordan
5 years ago
2 Unit I: Syllabus
1. Introduction
2. Meaning and definition of statistics
3. Collection of statistical data
4. Tabulation of statistical data
5. Presentation of statistical data
6. Graphs and diagrams
7. Measures of central tendency
8. Arithmetic mean, median, mode, harmonic mean and geometric mean

3 Introduction
1. Data are everywhere.
2. Statistical techniques are used to make many decisions that affect our lives.
3. No matter what your career, you will make professional decisions that involve data. An understanding of statistical methods will help you make these decisions effectively.
Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions. Statistical analysis is used to manipulate, summarize, and investigate data so that useful decision-making information results.

4 Meaning and Definition of Statistics
Definition: The term statistics has been defined in two senses: A. the plural sense and B. the singular sense.
A. In the Plural Sense:
"Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." (A.L. Bowley)
"The classified facts respecting the condition of the people in a state, especially those facts which can be stated in numbers or in tables of numbers or in any tabular or classified arrangement." (Webster)
In the plural sense, statistics is a systematic collection of numerical facts; in the singular sense, it is the science of collecting, classifying and using statistics.

5 Meaning Continued
"By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose, and placed in relation to each other." (Horace Secrist)
B. In the Singular Sense:
"Statistics refers to the body of technique or methodology, which has been developed for the collection, presentation and analysis of quantitative data and for the use of such data in decision making." (Neter and Wasserman)
"Statistics may rightly be called the science of averages." (Bowley)
"Statistics may be defined as the collection, presentation, analysis, and interpretation of numerical data." (Croxton and Cowden)
The definitions given above give a narrow meaning to statistics, as they do not indicate the various aspects witnessed in its practical applications.

6 Sources of Statistical Data
Statistical data, as we have seen, can be either primary or secondary. Primary data are those which are collected for the first time and so are in crude form, whereas secondary data are those which have already been collected. Primary data are always collected from the source, either by the investigator himself or through his agents. There are different methods of collecting primary data; the choice depends to a large extent on the preliminaries to data collection. Some of the commonly used methods are given below.
1. Direct personal observation
2. Indirect oral interviews
3. Mailed questionnaire method
4. Schedule method
5. From local agents

7 Secondary data
Secondary data are data that have already been collected by, and are readily available from, other sources. Such data are cheaper and more quickly obtainable than primary data, and may also be available when primary data cannot be obtained at all.
Advantages of Secondary data
1. It is economical. It saves effort and expense.
2. It is time saving.
3. It helps to make primary data collection more specific, since with the help of secondary data we are able to make out what the gaps and deficiencies are and what additional information needs to be collected.
4. It helps to improve the understanding of the problem.
5. It provides a basis for comparison for the data that is collected by the researcher.

8 Disadvantages of Secondary Data
Secondary data seldom fit the framework of the marketing research problem at hand. Reasons for the poor fit include:
Unit of secondary data collection: suppose you want information on disposable income, but the data are available on gross income. The information may not be the same as what we require.
Class boundaries may be different even when the units are the same.

9 Types of statistics: Descriptive & Inferential
Descriptive statistics: methods of organizing, summarizing, and presenting data in an informative way.
Inferential statistics: the methods used to determine something about a population on the basis of a sample.
Population: the entire set of individuals or objects of interest, or the measurements obtained from all individuals or objects of interest.
Sample: a portion, or part, of the population of interest.

10 Descriptive Statistics
Collect data, e.g., a survey
Present data, e.g., tables and graphs
Summarize data, e.g., the sample mean $\bar{X} = \frac{\sum X_i}{n}$
Statistical data are usually obtained by counting or measuring items. Most data can be put into the following categories:
Qualitative: data are measurements that each fall into one of several categories (hair color, ethnic groups and other attributes of the population).
Quantitative: data are observations that are measured on a numerical scale (distance traveled to college, number of children in a family, etc.).

11 Inferential Statistics We have seen that descriptive statistics provide information about our immediate group of data. For example, we could calculate the mean and standard deviation of the exam marks for the 100 students and this could provide valuable information about this group of 100 students. Any group of data like this, which includes all the data you are interested in, is called a population. A population can be small or large, as long as it includes all the data you are interested in. For example, if you were only interested in the exam marks of 100 students, the 100 students would represent your population. Descriptive statistics are applied to populations, and the properties of populations, like the mean or standard deviation, are called parameters as they represent the whole population(i.e., everybody) 11

12 Inferential Statistics - Continued You might be interested in the exam marks of all students in the UK. It is not feasible to measure all exam marks of all students in the whole of the UK so you have to measure a smaller sample of students (e.g., 100 students), which are used to represent the larger population of all UK students. Properties of samples, such as the mean or standard deviation, are not called parameters, but statistics. Inferential statistics are techniques that allow us to use these samples to make generalizations about the populations from which the samples were drawn. It is, therefore, important that the sample accurately represents the population. The process of achieving this is called sampling. Inferential statistics arise out of the fact that sampling naturally incurs sampling error and thus a sample is not expected to perfectly represent the population. The methods of inferential statistics are (1) the estimation of parameter(s) and (2) testing of statistical hypotheses. 12

13 Quantitative data
Quantitative data are always numbers and are the result of counting or measuring attributes of a population. Quantitative data can be separated into two subgroups:
discrete, if it is the result of counting (the number of students of a given ethnic group in a class, the number of books on a shelf, ...)
continuous, if it is the result of measuring (distance traveled, weight of luggage, ...)
Numerical presentation of qualitative data
pivot table (qualitative dichotomous statistical attributes)
contingency table (qualitative statistical attributes, at least one of which is polytomous)
You should know how to convert absolute values to relative ones (%).

14 Frequency distributions (quantitative data)
A frequency distribution shows the frequency, or number of occurrences, in each of several categories. Frequency distributions are used to summarize large volumes of data values. When the raw data are measured on a quantitative scale, either interval or ratio, categories or classes must be designed for the data values before a frequency distribution can be formulated.
Steps for constructing a frequency distribution
1. Determine the number of classes: $m = \sqrt{n}$
2. Determine the size of each class: $h = \frac{\max - \min}{m}$
3. Determine the starting point for the first class
4. Tally the number of values that occur in each class
5. Prepare a table of the distribution using actual counts and/or percentages (relative frequencies)
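The five steps can be sketched in a few lines of Python. This is only an illustration: it assumes the number of classes is taken as roughly the square root of n and that all classes have equal width, and the function name `frequency_distribution` is hypothetical.

```python
import math

def frequency_distribution(values):
    """Return (lower, upper, count, relative frequency) for each class."""
    n = len(values)
    m = max(1, round(math.sqrt(n)))   # step 1: number of classes (assumed rule m ~ sqrt(n))
    lo, hi = min(values), max(values)
    h = (hi - lo) / m                 # step 2: class width
    counts = [0] * m
    for v in values:                  # step 4: tally values into classes
        # step 3: the first class starts at the minimum; the maximum goes in the last class
        k = min(int((v - lo) / h), m - 1) if h > 0 else 0
        counts[k] += 1
    # step 5: table with actual counts and relative frequencies
    return [(lo + k * h, lo + (k + 1) * h, c, c / n) for k, c in enumerate(counts)]

for lower, upper, count, rel in frequency_distribution([1, 2, 3, 4, 5, 6, 7, 8, 9]):
    print(f"{lower:5.2f} to {upper:5.2f}: {count} ({rel:.0%})")
```

Other rules for choosing the number of classes exist; swapping one in only changes the line that computes `m`.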

15 Diagrammatic Representation Bar Diagrams are those diagrams in which data are presented in the form of bars or rectangles. Types of bar diagrams :- 1. Simple bar diagrams : These are those bar diagrams which are based on a single set of numerical data. The different items or values are represented by different bars. 2. Multiple bar diagrams : These are those bar diagrams which show two or more sets of data simultaneously. E.g. Multiple and Stacked bar diagram. 15

16 Pie Chart
The pie chart is an effective way of displaying the percentage breakdown of data by category. It is useful if the relative sizes of the data components are to be emphasized. Pie charts also provide an effective way of presenting ratio- or interval-scaled data after they have been organized into categories.

Source of borrowing    Amount of loan
USA                    1800
USSR                   1200
UK                     800
Japan                  600
Germany

17 Histogram & Frequency Polygon
Histogram
Frequently used to graphically present interval and ratio data. The adjacent bars indicate that a numerical range is being summarized by indicating the frequencies in arbitrarily chosen classes.
Frequency polygon
Another common method for graphically presenting interval and ratio data. To construct a frequency polygon, mark the frequencies on the vertical axis and the values of the variable being measured on the horizontal axis, as with the histogram. If the purpose of the presentation is comparison with other distributions, the frequency polygon provides a good summary of the data.

18 Objectives of Data Tabulation
1. To carry out investigations
2. To do comparisons
3. To locate omissions and errors in the data
4. To use space economically
5. To study the trends
6. To simplify data
7. To use it as future reference
Parts of an Ideal Table
Table number
Title
Stubs or row designations
Column headings or captions
Body of the table
Unit of measurement
Source
Footnotes and references

19 Rules of Tabulation
The following general rules should be observed while tabulating statistical data.
1. The table should suit the size of the paper and, therefore, the width of the columns should be decided beforehand.
2. The number of columns and rows should be neither too large nor too small.
3. As far as possible, figures should be approximated before tabulation. This reduces unnecessary detail.
4. Items should be arranged in alphabetical, chronological or geographical order, or according to size.
5. The sub-totals and total of the items of the table must be written.
6. Percentages are given in the table if necessary.
7. Ditto marks should not be used in a table because they sometimes create confusion.
8. The table should be simple and attractive.
9. A table should be logical and well-balanced in length and breadth, and comparable columns should be placed side by side.
10. Light, heavy, thick or double rulings may be used to distinguish sub-columns, main columns and totals.
11. For large data, more than one table may be used.

20 Methods of Tabulation
1. Simple tabulation: the data are tabulated according to one characteristic.
2. Two-way table (double tabulation): two characteristics of the data are tabulated.
3. Complex table: the tabulation includes more than two characteristics.
Advantages of Tabulation
Tabulation makes complex data simple, and as a result the data become easy to understand.
Tabulation makes the data brief, so they can easily be presented in the form of graphs.
It is helpful in finding mistakes.
It is useful in condensing the collected data.
It presents the numerical figures in an attractive form.
A table with a title and a number can be easily identified and used for the required purpose.
The relationship between different parts can be easily seen.

21 OGIVE
An ogive is a graph of a cumulative frequency distribution. It is used when one wants to determine how many observations lie above or below a certain value in a distribution. First, a cumulative frequency distribution is constructed; the cumulative frequencies are then plotted at the upper class limit of each category. An ogive can also be constructed for a relative frequency distribution.
[Figure: ogive of number of workers against wages]

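22 Mean, Median and Mode
i) Mean by direct method: $\bar{X} = \frac{\sum f x}{\sum f}$, where f is the frequency of the class and x is the size or midpoint of the class
ii) Median: $Md = L + \frac{N/2 - cf}{f} \times h$, where L is the lower limit of the median class, N the total frequency, cf the cumulative frequency before the median class, f the frequency of the median class and h the class width
iii) Mode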

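23 Harmonic Mean
The harmonic mean can be expressed as the reciprocal of the arithmetic mean of the reciprocals of the given set of observations: $HM = \frac{n}{\sum (1/x)}$. HM for raw and discrete data is calculated by tabulating x and 1/x and totalling the reciprocals. For the five observations in the example, $\sum (1/x) = 0.3417$, so $HM = 5 / 0.3417 \approx 14.63$.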
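A small Python sketch of the reciprocal-of-reciprocals definition, for raw data and for discrete data with frequencies. The function names and the sample values are illustrative assumptions, not the figures from the slide.

```python
def harmonic_mean(values):
    """HM of raw data: n divided by the sum of reciprocals."""
    return len(values) / sum(1 / x for x in values)

def weighted_harmonic_mean(values, freqs):
    """HM of discrete data: total frequency divided by the sum of f/x."""
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

print(harmonic_mean([1, 2, 4]))               # 3 / 1.75
print(weighted_harmonic_mean([2, 4], [1, 2])) # 3 / (0.5 + 0.5)
```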

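24 Mean, Median and Mode
Calculation of mean, median and mode for a continuous frequency distribution (total frequency N = 390)
i) Mean by direct method: $\bar{X} = \frac{\sum f x}{N} = \frac{59920}{390} \approx 153.64$
ii) Median: N/2 = 390/2 = 195, so with L = 150, cf = 151, class width 10 and f = 116, $Md = 150 + \frac{(195 - 151) \times 10}{116} \approx 153.79$
[Table: Class, x, f, xf, cf]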
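The grouped-data arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and the median call assumes the slide's numbers are read as L = 150, N = 390, cf = 151, f = 116 and class width 10.

```python
def grouped_mean(midpoints, freqs):
    """Direct method: sum of f*x divided by total frequency."""
    return sum(f * x for x, f in zip(midpoints, freqs)) / sum(freqs)

def grouped_median(L, N, cf, f, h):
    """Median of a continuous frequency distribution.

    L  = lower limit of the median class
    N  = total frequency
    cf = cumulative frequency before the median class
    f  = frequency of the median class
    h  = class width
    """
    return L + ((N / 2 - cf) / f) * h

print(round(59920 / 390, 2))                            # mean from the slide's totals
print(round(grouped_median(150, 390, 151, 116, 10), 2)) # median from the slide's values
```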

25 Mode Using Graphical Method
[Figure: histogram of frequency against class, used to locate the mode graphically]

26 Unit II: Syllabus
1. Measures of variation
2. Mean deviation
3. Standard deviation
4. Quartile deviation
5. Skewness and kurtosis
6. Lorenz curve
7. Simple correlation
8. Scatter diagram
9. Karl Pearson's correlation
10. Rank correlation and regression

27 Measures of Dispersion Central tendency measures do not reveal the variability present in the data. 1. Dispersion is the scatteredness of the data series around its averages. 2. It is the extent to which values in a distribution differ from the average of the distribution. Importance of dispersion: 1. Determine the reliability of an average 2. Serve as a basis for the control of the variability 3. To compare the variability of two or more series and 4. Facilitate the use of other statistical measures. 27

28 How are the dispersions measured?
The following measures of dispersion are used to study the variation.
1. Absolute measures of dispersion
2. Relative measures of dispersion
Absolute measures of dispersion are expressed in the same units in which the original data are presented, but these measures cannot be used to compare the variation between two series. Relative measures are not expressed in units; they are pure numbers. A relative measure is the ratio of an absolute dispersion to an appropriate average, such as the coefficient of standard deviation or the coefficient of mean deviation.
Absolute measures: Range, Quartile Deviation, Mean Deviation, Standard Deviation
Relative measures: Coefficient of Range, Coefficient of Quartile Deviation, Coefficient of Mean Deviation, Coefficient of Variation

29 Mean Deviation
The mean deviation is the average of the absolute differences (differences expressed without plus or minus sign) between each value in a set of values and the average of all values of that set. For example, the average (arithmetic mean) of the set of values 1, 2, 3, 4, and 5 is (15 ÷ 5) or 3. The differences between this average (3) and the values in the set are 2, 1, 0, -1, and -2, the absolute differences being 2, 1, 0, 1, and 2. The average of these numbers (6 ÷ 5) is 1.2, which is the mean deviation. Also called the mean absolute deviation, it is used as a measure of dispersion where the number of values or quantities is small; otherwise the standard deviation is used.
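The worked example with the values 1 to 5 can be reproduced in Python. This is a minimal sketch of the mean deviation about the arithmetic mean; the function name is an illustrative choice.

```python
def mean_deviation(values):
    """Average absolute difference between each value and the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

print(mean_deviation([1, 2, 3, 4, 5]))  # absolute differences 2, 1, 0, 1, 2 average to 6 / 5
```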

30 Standard deviation
Standard deviation is a measure of the dispersion of a set of data from its mean. It is calculated as the square root of the variance, which is obtained from the deviation of each data point from the mean. The further the data points are from the mean, the higher the deviation within the data set. For example, in finance, standard deviation is a statistical measurement that, when applied to the annual rate of return of an investment, sheds light on the historical volatility of that investment. The greater the standard deviation of a security, the greater the variance between each price and the mean, indicating a larger price range.

31 Quartile Deviation
Quartile deviation is based on the lower quartile Q1 and the upper quartile Q3. The difference Q3 − Q1 is called the inter-quartile range. The difference Q3 − Q1 divided by 2 is called the semi-inter-quartile range or the quartile deviation. The quartile deviation is a slightly better measure of absolute dispersion than the range, but it ignores the observations in the tails. If we take different samples from a population and calculate their quartile deviations, the values are quite likely to differ considerably. This is called sampling fluctuation, and because of it the quartile deviation is not a popular measure of dispersion. The quartile deviation calculated from sample data does not help us to draw any conclusion (inference) about the quartile deviation in the population.
Coefficient of Quartile Deviation
A relative measure of dispersion based on the quartile deviation is called the coefficient of quartile deviation. It is defined as $\frac{Q_3 - Q_1}{Q_3 + Q_1}$.
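A brief Python sketch of both measures follows. Conventions for locating quartiles differ between textbooks; this example assumes the default ("exclusive") method of the standard library's `statistics.quantiles`, which may not match the convention used in the course.

```python
import statistics

def quartile_deviation(values):
    """Return (quartile deviation, coefficient of quartile deviation)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # Q1, median, Q3
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

qd, coef = quartile_deviation([1, 2, 3, 4, 5, 6, 7])
print(qd, coef)  # Q1 = 2 and Q3 = 6 under this quartile convention
```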

32 Skewness and Kurtosis
The histogram can give you a general idea of the shape, but two numerical measures of shape give a more precise evaluation: skewness tells you the amount and direction of skew (departure from horizontal symmetry), and kurtosis tells you how tall and sharp the central peak is, relative to a standard bell curve. One application is testing for normality: many statistical inferences require that a distribution be normal or nearly normal. A normal distribution has skewness and excess kurtosis of 0, so if your distribution is close to those values then it is probably close to normal. The other common measure of shape is called the kurtosis. As skewness involves the third moment of the distribution, kurtosis involves the fourth moment. The outliers in a sample, therefore, have even more effect on the kurtosis than they do on the skewness, and in a symmetric distribution both tails increase the kurtosis, unlike skewness, where they offset each other.

33 Kurtosis Continued
The mean and standard deviation have the same units as the original data, and the variance has the square of those units. However, the kurtosis, like skewness, has no units: it's a pure number, like a z-score. Traditionally, kurtosis has been explained in terms of the central peak. You'll see statements like this one: "Higher values indicate a higher, sharper peak; lower values indicate a lower, less distinct peak." However, increasing kurtosis is associated with the movement of probability mass from the shoulders of a distribution into its center and tails. Higher kurtosis means more of the variance is the result of infrequent extreme deviations, as opposed to frequent modestly sized deviations. In other words, it's the tails that mostly account for kurtosis, not the central peak.
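The slides do not give formulas, so the following Python sketch assumes the usual moment-based definitions: skewness as the third central moment divided by the 1.5 power of the second, and excess kurtosis as the fourth central moment divided by the square of the second, minus 3. Population (divide-by-n) moments are used; sample-corrected versions differ slightly.

```python
def moment_shape(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

skew, kurt = moment_shape([1, 2, 3, 4, 5])
print(skew, kurt)  # a symmetric set has zero skewness; a flat set has negative excess kurtosis
```

Both results are unit-free, which matches the remark that kurtosis and skewness are pure numbers.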

34 Mean Absolute Deviation (MAD)
For a series of n observations, the mean deviation about the mean is computed by
$MD = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n}$
For a discrete frequency distribution, the mean deviation about the mean and about the median are computed respectively by
$MD = \frac{\sum_{i=1}^{n} f_i |X_i - \bar{X}|}{\sum_{i=1}^{n} f_i}$ and $MD = \frac{\sum_{i=1}^{n} f_i |X_i - Md|}{\sum_{i=1}^{n} f_i}$
The coefficient of MD = MD / Average

35 Lorenz curve
The Lorenz curve is a graphical representation of income inequality or wealth inequality developed by the American economist Max Lorenz. The graph plots percentiles of the population according to income or wealth on the horizontal axis. It plots cumulative income or wealth on the vertical axis, so that an x-value of 45 and a y-value of 14.2 would mean that the bottom 45% of the population controls 14.2% of the total income or wealth. The Lorenz curve is often accompanied by a straight diagonal line with a slope of 1, which represents perfect equality in income or wealth distribution; the Lorenz curve lies beneath it, showing the actual distribution. The area between the straight line and the curved line, expressed as a ratio of the area under the straight line, is the Gini coefficient, a measurement of inequality.
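As a rough illustration, the Lorenz curve points and the Gini coefficient can be computed from a list of incomes. The sketch below assumes each person is one equal-sized step on the horizontal axis and approximates the area under the curve with trapezoids; the function names and sample data are hypothetical.

```python
def lorenz_points(incomes):
    """Cumulative population share vs. cumulative income share, poorest first."""
    xs = sorted(incomes)
    total = sum(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / len(xs), running / total))
    return points

def gini(incomes):
    """Area between the equality line and the Lorenz curve, over the area under the line."""
    pts = lorenz_points(incomes)
    # trapezoid rule for the area under the Lorenz curve
    area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    return (0.5 - area) / 0.5

print(gini([1, 1, 1, 1]))  # equal incomes: the curve coincides with the diagonal
print(gini([0, 0, 0, 4]))  # one person holds everything
```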

36 CORRELATION
The correlation coefficient is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another. When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there's often a tendency to think that the change in one causes the change in the other.
Properties of Correlation Coefficient
The coefficient of correlation lies between -1 and +1.
The coefficient of correlation is independent of change of origin.
The coefficient of correlation possesses the property of symmetry.
The coefficient of correlation is independent of change of scale.
The coefficient of correlation measures only linear correlation between X and Y.

37 Types of correlation
Correlation is of three types: positive correlation, negative correlation and no correlation.
When the values of one variable increase with an increase in another variable, the correlation is positive. On the other hand, if the values of one variable decrease with an increase in another variable, the correlation is negative. There might be the case when there is no change in a variable with any change in another variable; in this case there is no correlation between the two.
Methods of studying Correlation
Scatter diagram
Karl Pearson's coefficient of correlation
Spearman's rank correlation coefficient

38 Scatter Diagram
The scatter diagram graphs pairs of numerical data, with one variable on each axis, to look for a relationship between them. If the variables are correlated, the points will fall along a line or curve. The better the correlation, the tighter the points will hug the line.

39 Karl Pearson's coefficient of correlation
The formula to calculate Karl Pearson's coefficient of correlation is as follows:
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$
Spearman's Rank correlation co-efficient
Rank correlation is any of several statistics that measure an ordinal association: the relationship between rankings of different ordinal variables or different rankings of the same variable.
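Both coefficients can be sketched in Python. The Pearson function follows the raw-sums form of the formula; the Spearman function assumes the common formula 1 − 6Σd² / (n(n² − 1)), which is not stated on the slide and holds only when there are no tied ranks. All names are illustrative.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's coefficient from raw sums."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

def ranks(values):
    """Rank 1 for the smallest value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(x, y):
    """Spearman's rank correlation for untied data."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(pearson_r([1, 2, 3], [2, 4, 6]))        # perfectly linear, increasing
print(spearman_rho([10, 20, 30], [3, 2, 1]))  # rankings exactly reversed
```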

40 Regression
In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships among variables. It includes many techniques for modeling and analysing several variables, when the focus is on the relationship between a dependent variable and one or more independent variables (or 'predictors').
Uses of Regression
Regression is useful to anyone who wants to make predictions or inferences based on a particular data set. Companies, for example, use regression to determine their profit, revenue, and cost functions, and scientists use regression to analyze and predict all sorts of time series phenomena, including the population of certain bacteria, crop data, planetary orbits, and projectile tracking. Regression is among the most widely used methods of data analysis.

41 Formula for Regression lines

42 Unit III: Syllabus 1. Analysis of Time series 2. Methods of measuring trend 3. Seasonal variations. 42

43 Components of Time Series
Seasonal effect (Seasonal Variation or Seasonal Fluctuations)
Many time series, such as sales and temperature readings, exhibit a seasonal variation with an annual period. It is easy to understand and can be easily measured or removed from the data to give de-seasonalized data. Seasonal fluctuation describes any regular variation with a period of less than one year: for example, the cost of various types of fruits and vegetables, clothes, unemployment figures, average daily rainfall, the increase in the sale of tea in winter and the increase in the sale of ice cream in summer all show seasonal variations. Changes which repeat themselves within a fixed period are also called seasonal variations, for example traffic on roads in the morning and evening hours, sales at festivals like Eid, and the increase in the number of passengers at weekends. Seasonal variations are caused by climate, social customs, religious activities, etc.

44 Cyclical Variation
Cyclical variation is variation at a fixed period due to some other physical cause, such as daily variation in temperature. It is a non-seasonal component which varies in a recognizable cycle. Some time series exhibit oscillations which do not have a fixed period but are predictable to some extent, for example economic data affected by business cycles with a period varying between about 5 and 7 years. In weekly or monthly data, the cyclical component may describe any regular variation (fluctuation) in the time series data. Such variations are periodic in nature and repeat themselves like the business cycle, which has four phases: (i) Peak (ii) Recession (iii) Trough/Depression (iv) Expansion.
Trend (Secular Trend or Long Term Variation)
Trend is a longer-term change. Here we take into account the number of observations available and make a subjective assessment of what is long term. For example, climate variables sometimes exhibit cyclic variation over a very long time period, such as 50 years.

45 These movements are systematic in nature: they are broad and steady, showing a slow rise or fall in the same direction. The trend may be linear or non-linear (curvilinear). Some examples of secular trend are: increase in prices, increase in pollution, increase in the need for wheat, increase in literacy rate, and decrease in deaths due to advances in science. Taking averages over a certain period is a simple way of detecting trend in seasonal data. A change in the averages with time is evidence of a trend in the given series, though there are more formal tests for detecting trend in time series.
Irregular Variation (Irregular Fluctuations)
When trend and cyclical variations are removed from a set of time series data, a residual is left, which may or may not be random. Various techniques for analyzing series of this type examine whether the irregular variation may be explained in terms of probability models such as moving average or autoregressive models, i.e. we can see if any cyclical variation is still left in the residuals.

46 Variations that occur due to sudden causes are called residual variation (irregular variation, or accidental or erratic fluctuations) and are unpredictable, for example a rise in the price of steel due to a strike in the factory, an accident due to brake failure, flood, earthquake, war, etc.
Methods of Measuring Trend
Trend can be determined by: (i) the freehand curve method; (ii) the moving averages method; (iii) the semi-averages method; and (iv) the least-squares method. Each of these methods is described below.
(i) Freehand Curve Method: The term freehand is applied to any non-mathematical curve in statistical analysis, even if it is drawn with the aid of drafting instruments. This is the simplest method of studying the trend of a time series.

47 Procedure for Free Hand Curve
The procedure for drawing a freehand curve is as follows:
(i) The original data are first plotted on graph paper.
(ii) The direction of the plotted data is carefully observed.
(iii) A smooth line is drawn through the plotted points.
While fitting a trend line by the freehand method, an attempt should be made to ensure that the fitted curve conforms to these conditions:
(i) The curve should be smooth: either a straight line or a combination of long gradual curves.
(ii) The trend line or curve should be drawn through the graph of the data in such a way that the areas below and above the trend line are equal to each other.
(iii) The vertical deviations of the data above the trend line must equal the deviations below the line.
(iv) The sum of the squares of the vertical deviations of the observations from the trend should be a minimum.

48 Trend by the Method of Semi-averages
This method can be used if a straight-line trend is to be obtained. Since the location of only two points is necessary to obtain a straight-line equation, we may select two representative points and connect them by a straight line. The data are divided into two halves and an average is obtained for each half. Each such average is plotted against the mid-point of its half period, which gives two points on graph paper. By joining these points, a straight-line trend is obtained. The method is to be commended for its simplicity and is used to some extent in practical work. It is also flexible, for it is permissible to select representative periods to determine the two points. Unrepresentative years may be ignored.
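The semi-average procedure can be sketched in Python as follows. With an odd number of observations this sketch drops the middle one, which is a common convention but an assumption here; the function name and the sample data are illustrative.

```python
def semi_average_trend(years, values):
    """Return (a, b) of the trend line value = a + b * year through the two half averages."""
    half = len(values) // 2          # middle observation is ignored when the count is odd
    x1 = sum(years[:half]) / half    # mid-point of the first half period
    y1 = sum(values[:half]) / half   # average of the first half
    x2 = sum(years[-half:]) / half   # mid-point of the second half period
    y2 = sum(values[-half:]) / half  # average of the second half
    b = (y2 - y1) / (x2 - x1)        # slope of the line joining the two points
    a = y1 - b * x1                  # intercept
    return a, b

a, b = semi_average_trend([2001, 2002, 2003, 2004], [10, 12, 14, 16])
print(a + b * 2005)  # trend value projected one year ahead
```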

49 Method of Least Squares
If a straight line fitted to the data will serve as a satisfactory trend, perhaps the most accurate method of fitting it is that of least squares. This method is designed to accomplish two results:
(i) The sum of the vertical deviations from the straight line must equal zero.
(ii) The sum of the squares of all deviations must be less than the sum of the squares for any other conceivable straight line.
There will be many straight lines which can meet the first condition. Among all these lines, only one will satisfy the second condition. It is because of this second condition that the method is known as the method of least squares. It may be mentioned that a line fitted to satisfy the second condition will automatically satisfy the first condition.
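A minimal Python sketch of a least-squares straight line follows, using the closed-form solution of the two normal equations ΣY = na + bΣX and ΣXY = aΣX + bΣX². The data are made up for illustration; the final line checks the first condition, that the vertical deviations sum to zero.

```python
def least_squares_trend(x, y):
    """Return (a, b) for the line Y = a + b * X that minimises squared vertical deviations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)  # slope
    a = (sy - b * sx) / n                          # intercept, from the first normal equation
    return a, b

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_trend(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(a, b, sum(residuals))  # the residual sum is zero up to rounding error
```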

50 The formula for a straight-line trend can most simply be expressed as $Y_c = a + bX$, where X represents the time variable, $Y_c$ is the dependent variable for which trend values are to be calculated, and a and b are the constants of the straight line to be found by the method of least squares. Constant a is the Y-intercept: the distance between the origin (O) and the point where the trend line intersects the Y-axis. It shows the value of Y when X = 0. Constant b indicates the slope, which is the change in Y for each unit change in X. Let us assume that we are given observations of Y for n years. We wish to find the values of the constants a and b in such a manner that the two conditions laid down above are satisfied by the fitted equation.

51 Mathematical reasoning suggests that, to obtain the values of constants a and b according to the Principle of Least Squares, we have to solve simultaneously the following two equations:
$\sum Y = na + b\sum X$ ...(i)
$\sum XY = a\sum X + b\sum X^2$ ...(ii)
On solving (i) and (ii), we get the values of a and b.
Methods of Measuring Seasonal Variations
1. Method of Simple Averages (Weekly, Monthly or Quarterly)
2. Ratio-to-Trend Method
3. Ratio-to-Moving-Average Method
4. Link Relatives Method
Method of Simple Averages
This is the simplest method of obtaining a seasonal index. The following steps are necessary for calculating the index:
(i) Arrange the unadjusted data by years and months, or by quarters if quarterly data are given.
(ii) Find the totals for January, February, etc.

52 (iii) Divide each total by the number of years for which data are given. For example, if we are given monthly data for five years, we first obtain the total for each month over the five years and divide each total by 5 to obtain an average.
(iv) Obtain an average of the monthly averages by dividing the total of the monthly averages by 12.
(v) Taking the average of the monthly averages as 100, compute the percentage of each monthly average as follows: Seasonal Index for January = (average for January / overall average) × 100.
Ratio-to-moving-average method
The method of monthly totals or monthly averages does not give any consideration to the trend which may be present in the data. The ratio-to-moving-average method is one of the simplest of the commonly used devices for measuring seasonal variation.
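Steps (i) to (v) of the method of simple averages can be sketched in Python. The input layout (one list of seasonal values per year) and the quarterly sample figures are assumptions made for illustration; monthly data work the same way with 12 values per year.

```python
def seasonal_index_simple(data):
    """data: one list per year, each holding the values for every season in order."""
    n_years = len(data)
    n_seasons = len(data[0])
    # steps (ii) and (iii): total each season across years and divide by the number of years
    season_avgs = [sum(year[s] for year in data) / n_years for s in range(n_seasons)]
    # step (iv): average of the seasonal averages
    overall = sum(season_avgs) / n_seasons
    # step (v): express each seasonal average as a percentage of the overall average
    return [avg / overall * 100 for avg in season_avgs]

quarterly = [[10, 20, 30, 40],
             [30, 40, 50, 60]]
print([round(i, 2) for i in seasonal_index_simple(quarterly)])
```

By construction the indices average 100, so for quarterly data they sum to 400.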

53 STEPS
(i) Arrange the unadjusted data by years and months.
(ii) Compute the trend values by the method of moving averages. For this purpose take a 12-month moving average followed by a two-month moving average to recentre the trend values.
(iii) Express the data for each month as a percentage ratio of the corresponding moving-average trend value.
(iv) Arrange these ratios by months and years.
(v) Aggregate the ratios for January, February, etc.
(vi) Find the average ratio for each month.
(vii) Adjust the average monthly ratios found in step (vi) so that they will themselves average 100 per cent. These adjusted ratios will be the seasonal indices for the various months.
A seasonal index computed by the ratio-to-moving-average method ordinarily does not fluctuate so much as the index based on straight-line trends. This is because the 12-month moving average follows the cyclical course of the actual data quite closely. Therefore the index ratios obtained by this method are often more representative of the data.
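The steps above can be sketched in a few lines. The function below is a hypothetical illustration, not a reference implementation: it assumes an even number of seasons per year (12 for months, 4 for quarters), data that start at the first season of a year, and at least two full years of observations.

```python
def ratio_to_moving_average(y, period=12):
    """Seasonal indices from a series y with `period` seasons per year."""
    half = period // 2
    ratios = {s: [] for s in range(period)}
    for t in range(half, len(y) - half):
        # (ii) two adjacent period-length averages, then their 2-term
        # average, which centres the trend value on observation t
        first = sum(y[t - half:t + half]) / period
        second = sum(y[t - half + 1:t + half + 1]) / period
        trend = (first + second) / 2
        # (iii) actual value as a percentage of the trend value
        ratios[t % period].append(y[t] / trend * 100)
    # (v)-(vi) average ratio for each season
    averages = [sum(ratios[s]) / len(ratios[s]) for s in range(period)]
    # (vii) rescale so that the indices average 100
    mean_avg = sum(averages) / period
    return [avg / mean_avg * 100 for avg in averages]


# Hypothetical quarterly series with a fixed seasonal pattern and no trend.
series = [80, 100, 120, 100] * 3
indices = ratio_to_moving_average(series, period=4)
print(indices)
```

With a purely seasonal series such as this one, the centred moving average is constant, so the indices simply reproduce the seasonal pattern.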

54 Ratio-to-trend method The ratio-to-trend method is similar to the ratio-to-moving-average method. The only difference is the way of obtaining the trend values: in the ratio-to-moving-average method the trend values are obtained by the method of moving averages, whereas in the ratio-to-trend method they are obtained by the method of least squares. The steps in the calculation of seasonal variation are as follows:
(i) Arrange the unadjusted data by years and months.
(ii) Compute the trend values for each month with the help of the least-squares equation.
(iii) Express the data for each month as a percentage ratio of the corresponding trend value.
(iv) Aggregate the January ratios, February ratios, etc., computed in step (iii).
(v) Find the average ratio for each month.
(vi) Adjust the average ratios found in step (v) so that they will themselves average 100 per cent.

55 Link Relatives Method Among all the methods of measuring seasonal variation, the link relatives method is the most difficult one. When this method is adopted, the following steps are taken to calculate the seasonal variation indices:
(i) Calculate the link relatives of the seasonal figures, i.e. each figure as a percentage of the figure for the preceding season.
(ii) Calculate the average of the link relatives for each season.
(iii) Convert these averages into chain relatives on the base of the first season.
(iv) Calculate the chain relative of the first season on the basis of the last season. There will be some difference between this chain relative of the first season and the one obtained in step (iii). This difference is due to long-term changes. It is therefore necessary to correct the chain relatives.

56 (v) For correction, the chain relative of the first season calculated by the first method is deducted from the chain relative of the first season calculated by the second method. The difference is divided by the number of seasons. The resulting figure, multiplied by 1, 2, 3 (and so on), is deducted respectively from the chain relatives of the 2nd, 3rd, 4th (and so on) seasons. These are the corrected chain relatives.
(vi) Express the corrected chain relatives as percentages of their average. These provide the required seasonal indices by the method of link relatives.
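One possible reading of steps (i) to (vi) is sketched below. The function name `link_relatives_index` and the quarterly data are hypothetical. The sketch assumes that a link relative is the current figure as a percentage of the preceding season's figure, and that the chain relative of the first season is 100.

```python
def link_relatives_index(data):
    """data: one list per year, each holding the figures for every season."""
    n_seasons = len(data[0])
    flat = [v for year in data for v in year]
    # (i) link relative = current season / previous season * 100
    links = {s: [] for s in range(n_seasons)}
    for t in range(1, len(flat)):
        links[t % n_seasons].append(flat[t] / flat[t - 1] * 100)
    # (ii) average link relative for each season
    avg_link = [sum(links[s]) / len(links[s]) for s in range(n_seasons)]
    # (iii) chain relatives on the base of the first season (= 100)
    chain = [100.0]
    for s in range(1, n_seasons):
        chain.append(avg_link[s] * chain[s - 1] / 100)
    # (iv) chain relative of the first season based on the last season
    second_first = avg_link[0] * chain[-1] / 100
    # (v) spread the discrepancy evenly over the seasons
    d = (second_first - 100.0) / n_seasons
    corrected = [chain[s] - s * d for s in range(n_seasons)]
    # (vi) corrected chain relatives as percentages of their average
    mean_corrected = sum(corrected) / n_seasons
    return [c / mean_corrected * 100 for c in corrected]


# Hypothetical quarterly data with a fixed seasonal pattern and no trend.
quarters = [[80, 100, 120, 100] for _ in range(3)]
indices = link_relatives_index(quarters)
print(indices)
```

Because this series has no long-term change, the two chain relatives of the first season agree, the correction d is zero, and the indices reproduce the seasonal pattern.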

57 Unit IV: Syllabus 1. Index numbers 2. Consumer's price index 3. Cost of living indices 4. Statistical quality control.

58 Index Numbers Index numbers are statistical devices designed to measure the relative change in the level of a variable or group of variables with respect to time, geographical location, etc. In other words, they are numbers which express the value of a variable at any given period, called the current period, as a percentage of the value of that variable at some standard period, called the base period.

59 Uses of Index Numbers Index numbers are used in the fields of commerce, meteorology, labour, industry, etc. Index numbers measure fluctuations during intervals of time, group differences of geographical position or degree, etc. They are used to compare the total variations in the prices of different commodities whose units of measurement differ. They measure the purchasing power of money. They are helpful in forecasting future economic trends.

60 Types of Index Numbers The following types of index numbers are usually used: price index numbers and quantity index numbers. Price index numbers measure the relative changes in the price of a commodity between two periods. Prices can be either retail or wholesale. Quantity index numbers measure changes in the physical quantity of goods produced, consumed or sold for an item or a group of items.

61 Methods of Index Numbers

62 Simple aggregative method This is the simplest method of constructing index numbers. When this method is used to construct a price index number, the total of current year prices for the various commodities in question is divided by the total of the base year prices and the quotient is multiplied by 100. Symbolically,
$P_{01} = \frac{\sum P_1}{\sum P_0} \times 100$
where $P_0$ are the base year prices, $P_1$ are the current year prices, and $P_{01}$ is the price index number for the current year with reference to the base year.
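A minimal sketch of the simple aggregative formula follows. The function name and the five commodity prices are hypothetical and are used only to show the arithmetic.

```python
def simple_aggregative_index(base_prices, current_prices):
    """P01 = (sum of current-year prices / sum of base-year prices) * 100."""
    return sum(current_prices) / sum(base_prices) * 100


# Hypothetical prices of five commodities in the base and current years.
p0 = [100, 80, 160, 220, 40]
p1 = [140, 120, 180, 240, 40]

index = simple_aggregative_index(p0, p1)
print(round(index, 2))
```

An index above 100 indicates a net rise in the aggregate price level relative to the base year. Note that the result depends on the units in which the prices are quoted, since the prices are simply added.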

63 Calculate the index number for 1995 taking 1991 as the base for the following data using the simple aggregative method. The data give, for each commodity, its unit, its 1991 price ($P_0$) and its 1995 price ($P_1$), with column totals: A (kilogram), B (dozen), C (metre), D (quintal), E (litre).
Price index number: $P_{01} = \frac{\sum P_1}{\sum P_0} \times 100 = 132.93$
There is a net increase of 32.93% in 1995 as compared to 1991.

64 Simple average of relatives When this method is used to construct a price index number, price relatives are first obtained for the various items included in the index, and then the average of these relatives is obtained using any one of the averages, i.e. mean, median, etc. When the A.M. is used for averaging the relatives, the formula for computing the index is
$P_{01} = \frac{1}{n} \sum \left( \frac{P_1}{P_0} \times 100 \right)$
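The calculation can be sketched as below, with the choice of average passed in as a parameter. The function name and the prices are hypothetical. `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median


def average_of_relatives_index(base_prices, current_prices, average=mean):
    """Average the price relatives (P1 / P0) * 100 with the given average."""
    relatives = [c / b * 100 for b, c in zip(base_prices, current_prices)]
    return average(relatives)


# Hypothetical prices of five commodities in the base and current years.
p0 = [100, 80, 160, 220, 40]
p1 = [140, 120, 180, 240, 40]

index_am = average_of_relatives_index(p0, p1)               # arithmetic mean
index_median = average_of_relatives_index(p0, p1, median)   # median
print(round(index_am, 2), round(index_median, 2))
```

Unlike the simple aggregative index, this index is unaffected by the units in which individual prices are quoted, because each relative is a pure ratio.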

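65 Calculate the index number for 1995 taking 1991 as the base for the following data using the simple average of relatives. The data give, for each commodity, its unit, its base year price ($P_0$), its current year price ($P_1$) and the price relative $(P_1/P_0) \times 100$, with the total of the relatives: A (kilogram, relative = 140), B (dozen), C (metre), D (quintal), E (litre).
Price index number: $P_{01} = \frac{1}{n} \sum \left( \frac{P_1}{P_0} \times 100 \right) = \frac{1}{5} \times 612 = 122.4$
There is a net increase of 22.4% in 1995 as compared to 1991.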

66 There are various methods of assigning weights, and consequently a large number of formulae for constructing weighted index numbers have been designed. Some important methods are as follows:
Laspeyres's method: $P_{01}^{La} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$
Paasche's method: $P_{01}^{Pa} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$
Dorbish-Bowley's method: $P_{01}^{DB} = \frac{P_{01}^{La} + P_{01}^{Pa}}{2}$
Fisher's ideal method: $P_{01}^{F} = \sqrt{P_{01}^{La} \times P_{01}^{Pa}}$
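The four weighted formulae can be compared on one data set with a short sketch. The function names and the price and quantity figures are hypothetical. Here p0, q0 denote base year prices and quantities, and p1, q1 denote current year prices and quantities.

```python
import math


def laspeyres(p0, p1, q0, q1):
    """Base-year quantities as weights."""
    return (sum(p * q for p, q in zip(p1, q0))
            / sum(p * q for p, q in zip(p0, q0)) * 100)


def paasche(p0, p1, q0, q1):
    """Current-year quantities as weights."""
    return (sum(p * q for p, q in zip(p1, q1))
            / sum(p * q for p, q in zip(p0, q1)) * 100)


def dorbish_bowley(p0, p1, q0, q1):
    """Arithmetic mean of the Laspeyres and Paasche indices."""
    return (laspeyres(p0, p1, q0, q1) + paasche(p0, p1, q0, q1)) / 2


def fisher(p0, p1, q0, q1):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return math.sqrt(laspeyres(p0, p1, q0, q1) * paasche(p0, p1, q0, q1))


# Hypothetical data for four commodities.
p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]

la = laspeyres(p0, p1, q0, q1)
pa = paasche(p0, p1, q0, q1)
db = dorbish_bowley(p0, p1, q0, q1)
fi = fisher(p0, p1, q0, q1)
print(round(la, 2), round(pa, 2), round(db, 2), round(fi, 2))
```

Both the Dorbish-Bowley and the Fisher index always lie between the Laspeyres and Paasche values, since they are respectively the arithmetic and geometric mean of the two.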

67 Example: Compute price index numbers for the following data by (i) Laspeyres's method, (ii) Paasche's method, (iii) Dorbish-Bowley's method, (iv) Fisher's ideal method.

69 Weighted average of Price Relatives The weighted average of price relatives can be calculated by taking the values of the base year ($p_0 q_0$) as the weights. The formula is given by
When the A.M. is used: $P_{01} = \frac{\sum PV}{\sum V}$
When the G.M. is used: $P_{01} = \text{Antilog} \left( \frac{\sum V \log P}{\sum V} \right)$
where $P = \frac{p_1}{p_0} \times 100$ and $V = p_0 q_0$, i.e. the base year value.
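Both versions of the weighted average of price relatives can be sketched as follows. The function name and data are hypothetical. Base-10 logarithms are assumed for the G.M. version, so the antilog is a power of 10.

```python
import math


def weighted_relatives_index(p0, p1, q0, use_gm=False):
    """Weighted average of price relatives with base-year values as weights."""
    P = [b / a * 100 for a, b in zip(p0, p1)]   # price relatives
    V = [a * q for a, q in zip(p0, q0)]         # base-year values p0 * q0
    if use_gm:
        # Antilog of the weighted mean of log P
        log_mean = sum(v * math.log10(p) for v, p in zip(V, P)) / sum(V)
        return 10 ** log_mean
    return sum(p * v for p, v in zip(P, V)) / sum(V)


# Hypothetical data for four commodities.
p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]
p1 = [4, 6, 5, 2]

index_am = weighted_relatives_index(p0, p1, q0)
index_gm = weighted_relatives_index(p0, p1, q0, use_gm=True)
print(round(index_am, 2), round(index_gm, 2))
```

With base-year values as weights, the A.M. version reduces algebraically to $\frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$, and the G.M. version can never exceed the A.M. version.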

70 Fisher's Time reversal test When the data for any two years are treated by the same method, but with the bases reversed, the two index numbers secured should be reciprocals of each other, so that their product is unity (with the factor 100 omitted). Symbolically,
$P_{01} \times P_{10} = 1$
Fisher's Factor reversal test This test holds that the product of a price index number and the quantity index number should be equal to the corresponding value index. In other words, the test is that the change in price multiplied by the change in quantity should be equal to the change in value. Symbolically,
$P_{01} \times Q_{01} = V_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$
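Both tests can be checked numerically for Fisher's ideal index. In the sketch below the indices are expressed as plain ratios (without the factor 100), and the function name and data are hypothetical.

```python
import math


def fisher_ratio(pa, pb, qa, qb):
    """Fisher's ideal index of period b on base a, as a ratio (not x 100)."""
    lasp = (sum(p * q for p, q in zip(pb, qa))
            / sum(p * q for p, q in zip(pa, qa)))
    paas = (sum(p * q for p, q in zip(pb, qb))
            / sum(p * q for p, q in zip(pa, qb)))
    return math.sqrt(lasp * paas)


# Hypothetical data for four commodities.
p0, q0 = [2, 5, 4, 2], [8, 10, 14, 19]
p1, q1 = [4, 6, 5, 2], [6, 5, 10, 13]

p01 = fisher_ratio(p0, p1, q0, q1)   # price index, base 0
p10 = fisher_ratio(p1, p0, q1, q0)   # price index with the bases reversed
q01 = fisher_ratio(q0, q1, p0, p1)   # quantity index: prices and quantities swap roles
v01 = (sum(p * q for p, q in zip(p1, q1))
       / sum(p * q for p, q in zip(p0, q0)))

print(p01 * p10)        # time reversal test: should be 1
print(p01 * q01, v01)   # factor reversal test: the two should agree
```

Fisher's index passes both tests for any data, which is one reason it is called the ideal index. The Laspeyres and Paasche indices taken individually do not, in general, satisfy either test.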

71 Cost Of Living Index Numbers Cost of living index numbers measure the changes in the level of prices of commodities which directly affect the cost of living of a specified group of persons at a specified place.

72 Statistical quality control Statistical quality control (SQC) is the term used to describe the set of statistical tools used by quality professionals. SQC encompasses three broad categories:
1. Statistical process control (SPC): involves inspecting the output from a process. Quality characteristics are measured and charted, which helps identify in-process variations.
2. Descriptive statistics: include the mean, standard deviation, and range.
3. Acceptance sampling: used to randomly inspect a batch of goods to determine acceptance or rejection. It does not help to catch in-process problems.

73 Sources of Variation Variation exists in all processes. Variation can be categorized as either:
Common (random) causes of variation: random causes that we cannot identify. They are unavoidable, e.g. slight differences in process variables like diameter, weight, service time, temperature.
Assignable causes of variation: causes that can be identified and eliminated, e.g. poor employee training, a worn tool, a machine needing repair.

74 Important uses of the control chart Most processes do not operate in a state of statistical control. Consequently, the routine and attentive use of control charts will identify assignable causes. If these causes can be eliminated from the process, variability will be reduced and the process will be improved. The control chart only detects assignable causes. Management, operator, and engineering action will be necessary to eliminate the assignable causes.

75 SPC Methods: Developing Control Charts Control charts (also known as process or QC charts) show sample data plotted on a graph with a centre line (CL), an upper control limit (UCL), and a lower control limit (LCL). Control charts for variables are used to monitor characteristics that can be measured, e.g. length, weight, diameter, time. Control charts for attributes are used to monitor characteristics that have discrete values and can be counted, e.g. % defective, number of flaws in a shirt, etc.
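As an illustration of CL, UCL and LCL for an attributes chart, the sketch below computes limits for the proportion defective (a p-chart). The three-sigma limits used here are a common convention and an assumption of this sketch, not something stated above. The function name and the inspection counts are hypothetical.

```python
import math


def p_chart_limits(defectives, sample_size):
    """Return (LCL, CL, UCL) for the proportion defective, 3-sigma limits."""
    p_bar = sum(defectives) / (len(defectives) * sample_size)  # centre line
    sigma = math.sqrt(p_bar * (1 - p_bar) / sample_size)
    ucl = p_bar + 3 * sigma
    lcl = max(0.0, p_bar - 3 * sigma)   # a proportion cannot be negative
    return lcl, p_bar, ucl


# Hypothetical inspection results: defectives found in 10 samples of 100 items.
defectives = [4, 6, 5, 3, 7, 5, 18, 4, 6, 2]
n = 100

lcl, cl, ucl = p_chart_limits(defectives, n)
out_of_control = [i for i, d in enumerate(defectives, start=1)
                  if not lcl <= d / n <= ucl]
print(lcl, cl, ucl, out_of_control)
```

A sample whose proportion falls outside the limits signals a possible assignable cause. The chart only flags it; finding and removing the cause remains a management and engineering task.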

76 Unit V: Syllabus 1. Sampling procedures 2. Simple Random Sampling 3. Systematic Random Sampling 4. Stratified Random Sampling 5. Hypothesis test and its Fundamental ideas 6. Large sample Test 7. Small sample test 8. t test 9. F test 10. Chi Square Test

77 Sampling Procedures Sampling is a process or technique of choosing a sub-group from a population to participate in the study; it is the process of selecting a number of individuals for a study in such a way that the individuals selected represent the larger group from which they were selected. There are two major sampling procedures in research: probability and non-probability sampling.
Probability Sampling Procedures In probability sampling, every unit in the population has a chance (greater than zero) of being selected in the sample. There are four basic types of sampling procedures associated with probability samples: simple random, systematic, stratified and cluster sampling.
Simple Random Sampling Procedure Simple random sampling provides the base from which the other, more complex sampling methodologies are derived. To conduct a simple random sample, the researcher must first prepare an exhaustive list (sampling frame) of all members of the population of interest.

78 SRS - continued From this list, the sample is drawn so that each person or item has an equal chance of being drawn during each selection round. To draw a simple random sample without introducing researcher bias, computerized sampling programs and random number tables are used to impartially select the members of the population to be sampled. Subjects in the population are sampled by a random process, using either a random number generator or a random number table, so that each person remaining in the population has the same probability of being selected for the sample.
Systematic Sampling Procedure The systematic sampling procedure is often used in place of simple random sampling. In systematic sampling, the researcher selects every kth member after randomly choosing a starting point among the first k elements. For example, if the researcher decides to sample 20 respondents from a population of 100, every 5th member of the population will systematically be selected.
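Both procedures can be sketched with Python's standard `random` module. The function names are hypothetical, the sampling frame is simply the numbers 1 to 100, and the population size is assumed to be a multiple of the sample size.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members; every remaining member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def systematic_sample(frame, n, seed=None):
    """Pick a random start among the first k members, then every kth one."""
    rng = random.Random(seed)
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random position 0 .. k-1
    return [frame[start + i * k] for i in range(n)]


population = list(range(1, 101))     # hypothetical sampling frame of 100 members

srs = simple_random_sample(population, 20, seed=1)
sys_sample = systematic_sample(population, 20, seed=1)
print(sorted(srs))
print(sys_sample)
```

With 100 members and a sample of 20, the interval k is 5, which matches the example of selecting every 5th member. Passing a seed makes the draw reproducible.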

79 Systematic Sampling A researcher may choose to conduct a systematic sample instead of a simple random sample for several reasons. Firstly, systematic samples tend to be easier to draw and execute. Secondly, the researcher does not have to go back and forth through the sampling frame to draw the members to be sampled. Thirdly, a systematic sample may spread the members selected for measurement more evenly across the entire population than simple random sampling. Therefore, in some cases, systematic sampling may be more representative of the population and more precise.
Stratified Sampling Procedure The stratified sampling procedure is the most effective method of sampling when a researcher wants to get a representative sample of a population. It involves categorizing the members of the population into mutually exclusive and collectively exhaustive groups. An independent simple random sample is then drawn from each group. Stratified sampling techniques can provide more precise estimates if the population being surveyed is more heterogeneous than the categorized groups.
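A minimal sketch of stratified sampling is given below. It assumes proportional allocation, i.e. the same sampling fraction in every group, which is only one of several possible allocation rules. The function name, the strata and the population are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw an independent simple random sample from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)    # mutually exclusive groups
    sample = []
    for key in sorted(strata):
        members = strata[key]
        n = max(1, round(len(members) * fraction))  # proportional allocation
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical population: 60 urban and 40 rural households.
households = ([("urban", i) for i in range(60)]
              + [("rural", i) for i in range(40)])

sample = stratified_sample(households, lambda h: h[0], fraction=0.1, seed=0)
urban = sum(1 for h in sample if h[0] == "urban")
rural = sum(1 for h in sample if h[0] == "rural")
print(urban, rural)
```

Each stratum is represented in the sample in the same proportion as in the population, which a single simple random sample of 10 households would not guarantee.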

80 Stratified Sampling - Continued This technique can enable the researcher to determine desired levels of sampling precision for each group, and can provide administrative efficiency. The main advantage of the approach is that it is able to give the most representative sample of a population.
Hypothesis test and its Fundamental ideas A population is any large collection of objects or individuals, such as Americans, students, or trees, about which information is desired. A parameter is any summary number, like an average or percentage, that describes the entire population. The general idea of hypothesis testing involves:
1. Making an initial assumption.
2. Collecting evidence (data).
3. Based on the available evidence (data), deciding whether to reject or not reject the initial assumption.

81 Hypothesis testing - continued A sample is a representative group drawn from the population. A statistic is any summary number, like an average or percentage, that describes the sample. Example: Is normal body temperature really 98.6 degrees F? Consider the population of many, many adults. A researcher hypothesizes that the average adult body temperature is lower than the often-advertised 98.6 degrees F. That is, the researcher wants an answer to the question: "Is the average adult body temperature 98.6 degrees? Or is it lower?" To answer his research question, the researcher starts by assuming that the average adult body temperature is 98.6 degrees F. Then, the researcher goes out and tries to find evidence that refutes his initial assumption. In doing so, he selects a random sample of 130 adults and computes their average body temperature. Then, the researcher uses the data he collected to make a decision about his initial assumption.

82 Hypothesis testing - continued It is either likely or unlikely that the researcher would collect the evidence he did given his initial assumption that the average adult body temperature is 98.6 degrees:
If it is likely, then the researcher does not reject his initial assumption that the average adult body temperature is 98.6 degrees. There is not enough evidence to do otherwise.
If it is unlikely, then either the researcher's initial assumption is correct and he experienced a very unusual event, or the researcher's initial assumption is incorrect.
A null hypothesis is a hypothesis that states there is no statistically significant difference or relationship. It is usually the hypothesis a researcher or experimenter will try to disprove or discredit. E.g. in the previous example, the average adult body temperature is 98.6 degrees. An alternative hypothesis is one that states there is a statistically significant difference or relationship. E.g. the average adult body temperature is not 98.6 degrees.

83 Hypothesis testing - continued Let's imagine that μ is the average grade point average of all students in Chennai who major in mathematics. We can test hypotheses about the population mean μ, and the steps involved in testing a hypothesis are given below.
1. Make assumptions and meet test requirements.
2. State the null hypothesis ($H_0$).
3. Select the sampling distribution and determine the critical region.
4. Calculate the test statistic.
5. Make a decision and interpret the results.

84 Large Sample Test In this section we describe and demonstrate the procedure for conducting a test of hypotheses about the mean of a population in the case that the sample size n is at least 30. The test statistic has the standard normal distribution.

85 Example: Large sample test The critical value is 2.576. The data do not provide sufficient evidence, at the 1% level of significance, to conclude that the average amount of product dispensed is different from 8.1 ounces. We conclude that the machine does not need to be recalibrated.
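A two-tailed large-sample test of this kind can be sketched as follows. The hypothesized mean of 8.1 ounces and the 1% level come from the example, while the sample size, sample mean and sample standard deviation are hypothetical values chosen for illustration. The test statistic is assumed to be $Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$, and the critical value is taken from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def z_test_mean(sample_mean, mu0, s, n, alpha=0.01):
    """Two-tailed large-sample test of H0: mu = mu0 (n at least 30)."""
    z = (sample_mean - mu0) / (s / math.sqrt(n))
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 for alpha = 0.01
    reject = abs(z) > critical
    return z, critical, reject


# Hypothetical sample: 30 fills with mean 8.2 ounces and s = 0.25 ounces.
z, critical, reject = z_test_mean(sample_mean=8.2, mu0=8.1, s=0.25, n=30)
print(round(z, 3), round(critical, 3), reject)
```

Here the test statistic falls inside the interval from -2.576 to 2.576, so the null hypothesis is not rejected at the 1% level.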

86 Hypothesis Testing in the Two Sample Case
$H_0: \mu_1 = \mu_2$ The null hypothesis asserts there is no significant difference between the populations.
$H_1: \mu_1 \neq \mu_2$ The research hypothesis contradicts $H_0$ and asserts there is a significant difference between the populations.
If the obtained test statistic falls in the critical region, then reject the null hypothesis: the difference between the sample means is so large that we can conclude (at α = 0.05) that a difference exists between the populations represented by the samples.
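The two-sample decision rule can be sketched for large independent samples. The statistic assumed here is $Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, which is a standard large-sample form and not given explicitly above. The function name and the summary figures are hypothetical.

```python
import math
from statistics import NormalDist


def two_sample_z_test(mean1, s1, n1, mean2, s2, n2, alpha=0.05):
    """Two-tailed test of H0: mu1 = mu2 for two large independent samples."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    return z, critical, abs(z) > critical


# Hypothetical summary statistics for two groups.
z, critical, reject = two_sample_z_test(mean1=72, s1=8, n1=50,
                                        mean2=68, s2=7, n2=60)
print(round(z, 3), round(critical, 3), reject)
```

In this illustration the statistic falls in the critical region, so the null hypothesis of equal population means would be rejected at α = 0.05.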

87 Small Sample test The statistical validity of the large-sample tests was ensured by the Central Limit Theorem, with essentially no assumptions on the distribution of the population. When sample sizes are small, as is often the case in practice, the Central Limit Theorem does not apply. One must then impose stricter assumptions on the population to give statistical validity to the test procedure. One common assumption is that the population from which the sample is taken has a normal probability distribution to begin with. Under such circumstances, if σ is known, the test statistic Z has the standard normal distribution. If σ is unknown and is approximated by the sample standard deviation s, then the resulting test statistic follows Student's t-distribution with n - 1 degrees of freedom.
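A small-sample t test can be sketched with the standard library alone. The statistic assumed is $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with n - 1 degrees of freedom. The ten temperature readings are hypothetical, and the critical value 2.262 is the usual t-table entry for 9 degrees of freedom at the 5% level (two-tailed).

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(sample)
    s = stdev(sample)                      # sample standard deviation (n - 1 divisor)
    t = (mean(sample) - mu0) / (s / math.sqrt(n))
    return t, n - 1


# Hypothetical body temperatures (degrees F) of 10 adults.
temperatures = [98.2, 98.6, 97.9, 98.4, 98.0, 98.7, 98.1, 98.3, 97.8, 98.5]

t, df = one_sample_t(temperatures, mu0=98.6)
T_CRITICAL_DF9_5PCT = 2.262                # from a standard t table (assumption)
reject = abs(t) > T_CRITICAL_DF9_5PCT
print(round(t, 3), df, reject)
```

The test is valid only under the normality assumption stated above. With larger samples the t critical value approaches the normal value 1.96.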
