Analysis of Environmental Data Problem Set Conceptual Foundations: Probability distributions Answers


Note: to answer some of these questions you will likely need to review very carefully the bestiary of probability distributions presented in the lecture notes and, more thoroughly, in Bolker.

1. Consider a study on the timing of alewife spawning migration runs up a coastal stream. At a fishway at the mouth of the stream, you establish a digital video monitoring station that continuously records fish movement through the fishway, and you then sample the video after the fact to detect and count fish moving through the fishway. For practical reasons, you sample the video for 9 randomly selected 10-minute intervals in each of 7 randomly selected days every 14 days over the course of the spawning season. For each sample you count the number of alewife seen swimming upstream through the fishway. Fish count is the dependent variable, and time, expressed as ordinal date of the run (day of the year) plus the time of day (in fractions of days), is the independent variable. You hypothesize that fish counts have a concave relationship with time (i.e., rise, peak and then fall); consequently, you specify the deterministic model to be a quadratic polynomial function of time (i.e., fish count=a+b*time+c*time^2). Based on this data set, answer the following questions:

a. What are the parameters of the deterministic model?

a, b, and c

b. Identify at least two (preferably more) potential sources of error in fish counts.

Potential sources of measurement error include:
Failure to accurately detect and count all fish that move through the fishway on the video; e.g., when a fish is blocked from view by another fish.
Accidentally counting the same fish twice; e.g., when a fish moves back and forth through the fishway.

Potential sources of process error include:
Random behavior of fish regarding their decision to move through the fishway in any given minute.
Other factors besides time that govern the fish's decision to move through the fishway in any given 10-minute interval, such as fluctuations in water temperature and flow or the density of fish below the fishway.

c. Specify a suitable probability distribution for the stochastic component of the model.

A suitable probability distribution for this type of data is the Poisson, which is designed for effectively unlimited count data, or the negative binomial, which is similar to the Poisson except that it allows the variance to be larger than the mean (i.e., overdispersed data).
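The difference between these two candidate distributions can be checked numerically. The following R sketch computes the variance of a Poisson and of a negative binomial directly from their probability mass functions. The mean of 5 fish per interval and the overdispersion parameter k=2 are illustrative assumptions, not values from the problem.

```r
# Illustrative values only (assumptions, not taken from the fish data)
mu <- 5        # assumed mean count per 10-minute interval
k  <- 2        # assumed negative binomial overdispersion parameter
x  <- 0:1000   # support wide enough that the omitted tail is negligible

# Mean and variance computed from a probability mass function p over x
pmf_mean <- function(p) sum(x * p)
pmf_var  <- function(p) sum((x - pmf_mean(p))^2 * p)

p_pois   <- dpois(x, lambda = mu)
p_nbinom <- dnbinom(x, size = k, mu = mu)

pois_var   <- pmf_var(p_pois)     # equals the mean for the Poisson
nbinom_var <- pmf_var(p_nbinom)   # equals mu + mu^2/k for the negative binomial

print(c(poisson = pois_var, negative_binomial = nbinom_var))
```

With the same mean, the negative binomial variance exceeds the Poisson variance by mu^2/k, so smaller values of k correspond to stronger overdispersion.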

2. Consider a hypothetical study of wood frog larval abundance in vernal pools. Let's say you sample 100 vernal pools (observational units), and at each pool you take 10 dip net sweeps of the water column in random locations throughout the pool and record the presence/absence of wood frog tadpoles in each sweep. Let's assume that the probability of tadpole capture is the same for every sweep. For now, let's also assume that all pools are the same. Given the probability distributions shown here for a binomial distribution with a trial size=10 (#dips) and a per trial probability of success prob=0.3, answer the following questions:

a. What is the probability of observing tadpoles in 6 out of 10 sweeps in a pond?

This value can be obtained from the probability mass distribution shown in the upper right subplot of the figure provided by reading off the probability on the y-axis for #successes=6, which is approximately p=0.03. It can also be obtained mathematically from the binomial probability mass function with parameters size=10 (trial size; # dips in this case) and prob=0.3 (per trial probability of success; capture in this case), which is specified as follows:

p(x)=choose(n,x) p^x (1-p)^(n-x)

where n=trial size, x=#successes, p=per trial probability of success (prob), and choose(n,x) is the binomial coefficient for x successes out of a trial of size n, i.e., the number of different combinations in which n trials can produce x successes. In this case there are 210 different ways that 10 trials can produce 6 successes. Plugging in the appropriate numbers yields the following result:

p(6)=210*(0.3^6)*(1-0.3)^(10-6)=0.0368

b. What is the probability of observing tadpoles in 2 or fewer of the 10 sweeps in a pool?

This value can be obtained from the cumulative probability distribution shown in the upper left subplot of the figure provided by reading off the probability on the y-axis for #successes=2, which is approximately p=0.4. It can also be obtained mathematically using the probability mass function as in (a) by summing the probabilities of observing 0, 1 and 2 successes. Finally, it can be computed directly in R using the pbinom() function as follows:

pbinom(2,size=10,prob=0.3)=0.3828
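The answers to parts (a) and (b) can be reproduced in R. This sketch evaluates the probability mass function from the formula given in (a), compares it with the built-in dbinom(), and confirms that summing the probabilities of 0, 1 and 2 successes matches pbinom(). The variable names are illustrative.

```r
n <- 10    # trial size (number of sweeps per pool)
p <- 0.3   # per-trial probability of capturing tadpoles

# (a) Probability of exactly 6 successes, from the formula and from dbinom()
p6_formula <- choose(n, 6) * p^6 * (1 - p)^(n - 6)
p6_builtin <- dbinom(6, size = n, prob = p)

# (b) Probability of 2 or fewer successes, by summation and from pbinom()
p_le2_sum     <- sum(dbinom(0:2, size = n, prob = p))
p_le2_builtin <- pbinom(2, size = n, prob = p)

print(round(c(p6_formula, p6_builtin, p_le2_sum, p_le2_builtin), 4))
```

Within each pair the two values should agree, since dbinom() and pbinom() implement the same probability mass function and its cumulative sum.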

c. What is the probability of observing tadpoles in at least 4 of the 10 sweeps in a pool?

This value can be obtained from the cumulative probability distribution shown in the upper left subplot of the figure provided by reading off the probability on the y-axis for #successes=3, which is approximately p=0.65, and then taking the complement (1-0.65=0.35). Note that since we are interested in 4 or more successes, we need the cumulative probability of observing 3 (not 4) or fewer. This value can also be obtained mathematically using the probability mass function as in (a) by summing the probabilities of observing 0-3 successes and taking the complement, or by summing the probabilities of observing 4-10 successes. It can be computed directly in R using the pbinom() function as follows:

1-pbinom(3,size=10,prob=0.3)=0.3504

d. If the per trial probability of capture was 0.5 instead of 0.3, how would the probability mass function (pmf) change?

The probability mass distribution would shift to the right such that the most likely (highest probability) #successes would be 5 instead of 3, as shown here in the barplot.

3. Consider a hypothetical study on the effect of road crossing mortality on the age structure of spadefoot toad populations. Spadefoot toads typically undergo annual migrations to and from their breeding sites (seasonal ponds) and have extremely high fidelity to their breeding site (i.e., local populations are relatively independent). You hypothesize that road mortality is sufficient to affect population age structure, since at least a portion of the local population adjacent to a road would be subject to increased mortality rates due to road kill during migration to and from the breeding pond. Let's say you sample three local populations at their breeding ponds: one pond is adjacent to a busy highway, another is next to a secondary road with moderate traffic rates, and the third is without any nearby roads. For each population, you randomly sample 100 individuals and determine how many years they each survive. Thus, the data represent the age at death for each individual (observation). The observed data are plotted here as a bar chart depicting the number of individuals surviving to each age for each of the populations. You are interested in knowing whether the annual survival rate differs among the populations and in estimating the probability of individuals in each population surviving for 10 years. Consequently, you specify the deterministic model to be an indexed vector of mean annual mortality rates (i.e., a vector consisting of three different mean mortality rates corresponding to the three different populations) or, alternatively, an indexed vector of expected probability of survival for 10 years. Based on this information, answer the following questions:

a. What are some of the potential sources of error in the final statistical model?

Potential sources of measurement error include:
Error in determining the age at death, due to the difficulty of determining the age of individuals, which will depend on the aging method.
Error in recording the age correctly in your data log.

Potential sources of process error include:
Randomness in getting killed by a vehicle while crossing a road. Subject the same 100 individuals to the same road and traffic conditions and you will get a different number of roadkills purely by chance.
Other factors besides roadkill that govern the age structure of the local populations, that are unaccounted for in this study, and that may be causing variation in age structure unrelated to roads.

b. Does this data warrant a discrete or continuous distribution for the stochastic component of the model, and why?

This data is clearly discrete, as it represents counts of individuals in each age class. Individuals are indivisible (discrete) units.

c. What is a suitable probability distribution for the stochastic component of this model?

The geometric probability distribution (at least one form of it) gives the number of trials (in this case, years) with a constant probability of failure (in this case, death) until you get a single failure (in this case, the individual dies). Thus, the geometric distribution is mechanistically ideally suited to data representing the number of survived breeding seasons for a seasonally reproducing organism such as the spadefoot toad. Note that the geometric distribution is also the special case of the negative binomial when the overdispersion parameter k=1, and thus the negative binomial distribution would be a viable alternative if the variance ended up being too great for the geometric.

d. Looking ahead to hypothesis testing, how might you go about determining whether roads have a significant effect on population age structure and/or toad longevity; i.e., whether survival rates differ significantly among the highway, road and none populations?

Here, the basic hypothesis is that the age structure differs among populations, which can be restated more specifically as: the annual survival rates differ among populations, presumably as a result of differential road mortality rates. Consequently, the null hypothesis is that the age structure, and thus the annual survival rate, is the same among populations. If we can specify two models, one representing the null hypothesis (no difference) and the other representing the alternative hypothesis (they differ), and we can find an objective criterion with which to assess quantitatively how well each model fits the data, then we can determine whether the alternative model is better than the null model. In addition, we can determine the likelihood of observing the differences among populations that we observed if in fact these population samples were drawn from the same underlying distribution (i.e., if the populations are in fact identical and the differences we observed were simply due to chance associated with drawing a sample from the population).

4. Consider a hypothetical study on the effect of fire size on the severity of the fire. There is a general belief among land managers in the west that larger fires are more severe in terms of their ecological impacts. Fire severity is generally defined in terms of the proportion of the overstory vegetation killed by the fire and is often categorized into high severity, mixed severity and low severity. The belief is that as fires get larger, the proportion of the fire (inside the fire perimeter) that is classified as high severity increases. Some claim this to be a myth. You decide to test this hypothesis. Specifically, you decide to test the hypothesis that the proportional extent of high severity burn increases logistically as fire size increases; i.e., you specify the deterministic model to be a logistic function of fire size (i.e., proportion high severity=a+((b-a)/(1+e^(c*(d-x))))). Note that this is a 4-parameter logistic function whose parameters control the asymptotes at the left-hand (a) and right-hand (b) ends of the x axis and scale (c) the response to x about the midpoint (d), where the curve has its inflection. To confront this model with data, you compile data on 100 randomly selected fires that occurred during the past 10 years in the Rocky Mountains region. For each fire you determine the size (ha) and proportion of high severity burn (via analysis of pre- and post-fire satellite images). Actually, these data already exist for fires greater than 100 acres, so all you have to do is download the data and conduct the analysis. The data are shown here as a scatter plot. Based on this data set, answer the following questions:

a. What are some of the potential sources of error in the final statistical model?

Potential sources of measurement error include:
Error in measuring the true size of the fire, since this depends on how you define the perimeter of the fire and on the spatial resolution of the measuring device.
Error in classifying locations to burn severity classes, since this involves deciding how much difference between pre- and post-fire satellite images is necessary to call something high severity, there is uncertainty in the choice of where to make the break between high and low severity, and there is error in the spectral data recorded by the satellite sensor.

Potential sources of process error include:
Random variation among fires in the extent of high severity due to random fluctuations in the weather that drives fire behavior and fuel conditions (e.g., moisture levels).
Other factors besides fire size that influence the distribution and extent of high severity within the fire perimeter, such as fuel loads and terrain, which influence fire behavior.

b. Does this data warrant a discrete or continuous distribution for the stochastic component of the model, and why?

This data warrants a continuous distribution because the dependent variable is the proportional abundance of high severity burn within the fire perimeter, which is a continuous quantity bounded by 0 and 1.

c. What is a suitable probability distribution for the stochastic component of this model?

The beta probability distribution is phenomenologically ideally suited for data on a proportion scale, and it is the standard continuous distribution that is bounded between 0 and 1. Note that if the data were proportions but did not approach either 0 or 1, then some of the other continuous distributions, such as the normal and gamma, might also work. Also, the classical approach for dealing with data on a proportion scale was to apply the arcsine square root transformation and then use the normal distribution, but this is not really justifiable anymore given the availability of the beta distribution, and ultimately it does not solve the problem of the data being bounded between 0 and 1 while the normal distribution is unbounded. Lastly, note the important difference between continuous data measured on a proportion scale, such as fire severity, and discrete proportional data, such as the number of successes out of a given number of trials. Discrete proportional data is handled with the binomial distribution, whereas continuous proportional data is handled with the beta distribution. The key distinction is that with discrete proportional data there is a well-defined trial and trial size (number of trials per sample unit), the trial outcome is binary, and the number of successful outcomes is counted.

d. Looking ahead to model selection, what other deterministic models might you propose as plausible alternatives to the 4-parameter logistic function based on the scatter plot? And are these likely to be mechanistic or phenomenological models?

Since we do not have a mechanistic basis for the deterministic relationship between fire size and fire severity, there is a wide variety of monotonic functions that could be used to describe the apparent monotonic relationship phenomenologically. A simple linear or quadratic polynomial, or any of the saturating response functions that have an upper asymptote, like the monomolecular, Beverton-Holt, Holling type III, and Von Bertalanffy, would be suitable alternatives.

5. Consider a hypothetical study on the willingness of automobile owners to pay a gas tax to reduce carbon emissions in an effort to combat global warming. Specifically, policy makers are considering an additional tax on gasoline in which the revenue generated would be used to develop alternative renewable energy sources. They would like to know how much people would be willing to pay to reduce carbon emissions by 50% in an aggressive effort to control global warming, and the factors influencing people's willingness to pay. In particular, they believe that the amount people would be willing to pay is linearly related to income level; i.e., the more you make, the more you would be willing to pay. However, some policy makers disagree because they think the rich are much less willing to pay for common goods such as clean air. So, you decide to test this hypothesis. Specifically, you test the hypothesis that the amount a person is willing to pay in additional gasoline tax is linearly related to their income level; consequently, you specify the deterministic model to be a simple linear function of income level (i.e., amount willing to pay=a+b*income level). You conduct a random survey of 100 automobile owners in Amherst, Massachusetts and record the income level and the amount they are willing to pay in additional gasoline tax. The dependent variable, amount willing to pay, is an integer (cents). The data are shown here as a scatterplot. You fit a simple linear model with normally distributed errors (i.e., a normal probability distribution for the stochastic component of the model). The fitted model is depicted here as the solid line in the scatterplot (note that this is simply a straight line with the best estimates of the intercept a and slope b). Figuring out how to compute these best estimates is a subject for future consideration: parameter estimation. After fitting the model, you plot a histogram of the residuals (the deviations between the fitted values and the observed values), as shown here. Based on this information, answer the following questions:

a. Based on the information provided, does this model appear to be a properly specified model in terms of both the deterministic and stochastic components? If not, why?

No, it does not. The linear model for the deterministic component does not capture the apparent curvilinear relationship between income and willingness to pay. As suspected by some policy makers, the wealthier individuals appear to be willing to pay less than the middle income people. Also, the dependent variable is presumably measured on a discrete scale, since cents are indivisible monetary units, and thus the normal distribution is not the most appropriate.

b. Given the scatterplot, what is a reasonable alternative deterministic function for this relationship?

Any hump-shaped function might suffice; for example, a quadratic polynomial would be a logical choice. Other functions that rise, peak and then decline might also work, but the parabolic nature of the curvilinear relationship might not lend itself well to many of these.

c. Given the dependent variable (amount willing to pay in additional cents/gallon), what is a more suitable probability distribution for the stochastic component of this model? Justify your choice.

The Poisson probability distribution is phenomenologically ideally suited for discrete data consisting of non-negative integers that are effectively unbounded on the upper end, which is the case here since cents can take on any non-negative integer value, including zero, and could go as high as someone was willing to pay. The mean and the variance of the Poisson distribution are the same, which allows the variance to increase as the mean increases. If the variance is much larger than the mean, however, then the negative binomial would be more appropriate, since it is similar to the Poisson except that it allows the variance to be larger than the mean (i.e., overdispersed data).

6. Consider a hypothetical study on the effect of bedrock geology on the calcium concentration of second- and third-order streams in western Massachusetts. Calcium-rich waters are especially important for certain organisms, such as bivalves. You sample 100 streams in the study area, and for each stream you measure the percent of the watershed underlain by calcareous bedrock from a GIS data layer available from USGS. In addition, you collect water samples from each stream and measure the calcium concentration, which is measured on a continuous scale and ranges from trace amounts (0.01) to a little over 1.4. You are unsure what to expect for the relationship between percent calcareous and stream calcium concentration, so you plot the data. After seeing the data, you decide to fit a simple linear model (i.e., calcium=a+b*calcareous) with normally distributed errors. The linear fit is shown in the figure (solid line), as are the residuals of the model. Based on this information, answer the following questions:

a. What are some of the potential sources of error in the final statistical model?

Potential sources of measurement error include:
Error in measuring the true calcium concentration of the water sample, since the assays are not perfect.
Error in measuring the true percentage of the watershed underlain by calcareous bedrock, since the mapped geological data are very coarse approximations.

Potential sources of process error include:
Random variation in calcium concentration over time and space within the stream, such that any single water sample has a varying amount of calcium in it.
Other factors besides bedrock geology that influence the calcium concentration of the water.

b. What are three problems with the specification of this statistical model (i.e., the linear deterministic function and the normal probability distribution) with this dataset?

The linear model does not capture the apparent curvilinear relationship. Perhaps a function that allows for curvature would be a better fit. Also, the intercept of the fitted linear model is negative, which means that calcium concentration is predicted to be negative when percent calcareous is very low, which is an impossible outcome since calcium concentration can never be negative.

The normal distribution allows for negative values, but calcium concentration cannot be negative. Thus, if the mean calcium concentration is near zero, e.g., when percent calcareous is near zero, the normal distribution will predict some observations to be negative. A more appropriate distribution would be the gamma, which does not allow negative or zero values. Note that even if we fit a zero-intercept linear model for the deterministic component (forcing the line through the origin) to deal with the fitted negative intercept, the normal distribution will still allow impossible negative values.

In the normal distribution the mean and the variance are independent, which means that the variance should not change with the mean; i.e., it remains constant as the mean changes. In this example, the variance clearly increases as the mean increases. The gamma distribution has a more complicated relationship between the mean and variance, but one that allows the variance to increase with the mean.

c. Looking ahead to parameter estimation, can you think of a way to assess how well the specified linear model fits the data (or the lack of fit) that makes use of the probability distribution in an explicit way?

If we are willing to assume that the data were drawn from a normal distribution with a mean (expected value) calcium concentration linearly dependent on percent calcareous bedrock, then we can determine the probability (or likelihood actually, but we will return to this distinction later) of observing any particular value of calcium concentration given the mean (expected) value for any value of calcareous bedrock. Specifically, for any single observation, the linear model gives us the expected value (or mean) of calcium concentration for the measured value of percent calcareous. The normal probability distribution gives us the probability of any outcome (calcium concentration) given the mean (the expected value determined from the linear model) and standard deviation. So, if we are willing to assume a particular standard deviation for the normal probability distribution, then we can use the normal probability density function to determine the probability of the observed value of calcium concentration for the given value of percent calcareous. If we calculate the probability of each observation and multiply them together, we get the probability of observing the entire dataset, which is a measure of how well the specified model fits the data. By trying different combinations of values for the model parameters, we can search for the combination that gives us the greatest probability, which becomes our best estimate of the parameters: the maximum likelihood estimates.

7. Consider a hypothetical dataset on the time between major flooding events in a river floodplain. Let's say you record the length of time (in, say, days or years) between flooding events that exceed a specified threshold in magnitude, say a 5-year flood event, under the assumption that the probability of a 5-year flood event is the same every year. What probability distribution would be appropriate as a mechanistic description of the distribution of time between events?

There are at least two possibilities, depending on whether time between events is considered a discrete variable or a continuous variable. If time between events is considered discrete, for example if time is measured in years and each year is considered a discrete unit, then the geometric distribution provides a mechanistic description of the data, because it is the number of trials (years) until you get a single failure (a 5-year flood event), given that there is a constant probability of a 5-year flood event every year. On the other hand, if time between events is considered continuous, for example if time is measured in days and days (which are discrete units) are merely an arbitrary measurement scale for an intrinsically continuous variable, then the exponential distribution provides a mechanistic description of the data, because it is the distribution of waiting times for a single event to happen, given that there is a constant probability per unit time that it will happen. Thus, the exponential distribution is the continuous counterpart of the geometric distribution.

8. Consider a hypothetical dataset on gypsy moth abundance in oak stands in the Quabbin watershed. Let's say that you put out pheromone traps in 100 locations for a 1-week period during the flight period and count the number of moths collected in each trap. The distribution of counts is shown here in the histogram. The computed mean count is 5.09 (moths/trap) and the variance is [value missing]. Based on this information, what probability distribution would be most appropriate for this data?

This example represents classic simple count data, for which the Poisson distribution is ideal, because it gives the distribution of the number of events or counts in a given sampling unit of counting effort if each event is independent of all the others. The assumption of independence of events may be problematic in this case, but there are statistical methods for dealing with this lack of independence. However, the Poisson distribution has a single parameter, lambda, and assumes that the mean and variance are both equal to lambda. Given the computed mean and variance, this assumption does not hold for this dataset. Fortunately, the negative binomial distribution is well suited, at least phenomenologically, to count data in which the variance is greater than the mean. So, in this case, the negative binomial is the preferred distribution.

9. Consider a hypothetical study on the energy performance of three different window types. You experimentally expose 20 window panes of each of 3 different window types to direct sunlight under the same conditions (e.g., ambient air temperature) and measure the BTUs on the inside of the window. I don't know if this makes sense at all, but it doesn't matter for the point of this exercise. The data are shown here as a box-and-whisker plot. Let's say that you hypothesize that the mean BTU differs among window types (i.e., the deterministic part of the model) and that you are willing to assume that the data were derived from a normal distribution (i.e., the stochastic part of the model) with a mean equal to the sample mean of the window type and a standard deviation equal to the pooled sample standard deviation; in other words, that the data were drawn from 3 normal distributions that differ in their means but have the same spread. Note that this is the classical statistical model for this dataset. Given the sample means and pooled standard deviation below (note that these are the parameters of the normal probability distribution), answer the following questions:

Mean BTU: type 1=1.28; type 2=1.84; type 3=3.09
Pooled standard deviation=1.61

a. What is the probability density of observing a btu=3 for a window of type=1? What about for a window of type=3?

These values are easily computed from the normal probability density function given the specified means and standard deviation. See the lecture notes for the mathematical function. Here are the computations:

window type 1: (1/sqrt(2*pi*1.61^2))*exp(-((3-1.28)^2)/(2*1.61^2))=0.14
window type 3: (1/sqrt(2*pi*1.61^2))*exp(-((3-3.09)^2)/(2*1.61^2))=0.25

b. Are there any problems with the use of the normal probability distribution with this dataset? If so, is there a better alternative?

Yes, there are at least two related problems. First, the normal distribution is unbounded. Thus, it allows for negative values, which are illogical in this case. This may not be a practical issue if the means are >>0, since negative values may be so unlikely as to not affect anything. However, if the means are close to 0, as in this case, this can be a real issue. Here it is apparent that the distributions of btu values for window types 1 and 2 are being truncated at zero, resulting in positively skewed distributions, which brings us to the second issue. The normal distribution is symmetrical about the mean. Due to the zero truncation problem, the distributions, at least for window types 1 and 2, are clearly not symmetrical. This is a very common situation with environmental data. Fortunately, the gamma distribution can be used, at least phenomenologically, with positively skewed distributions of positive real numbers, which is exactly the case here. Note that the gamma distribution does not allow for zeros; the data must be positive numbers. This limits the use of the gamma to situations in which the data must take on a positive value, or else it requires a minor adjustment to the data (e.g., adding a small value to each observation) so that the gamma can be used.
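The density calculations in part (a) and the concern about negative values in part (b) can be checked with a short R sketch. It uses the built-in dnorm() and pnorm() functions with the sample means and pooled standard deviation given in the problem; the variable names are illustrative.

```r
means <- c(type1 = 1.28, type2 = 1.84, type3 = 3.09)  # sample mean BTU by window type
sd_pooled <- 1.61                                     # pooled standard deviation

# (a) Normal probability density at btu = 3 for each window type
dens_at_3 <- dnorm(3, mean = means, sd = sd_pooled)

# Same calculation written out from the density formula, for window type 1
dens_type1_formula <- (1 / sqrt(2 * pi * sd_pooled^2)) *
  exp(-((3 - means[["type1"]])^2) / (2 * sd_pooled^2))

# (b) Probability that each fitted normal assigns to impossible negative values
prob_negative <- pnorm(0, mean = means, sd = sd_pooled)

print(round(dens_at_3, 2))
print(round(prob_negative, 3))
```

Under these parameters the normal distribution fitted to window type 1 places roughly one fifth of its probability below zero, which illustrates why an unbounded distribution is a questionable choice when the means are close to 0.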


More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 12 Probability Distribution of Continuous RVs (Contd.)

More information

Sampling Populations limited in the scope enumerate

Sampling Populations limited in the scope enumerate Sampling Populations Typically, when we collect data, we are somewhat limited in the scope of what information we can reasonably collect Ideally, we would enumerate each and every member of a population

More information

STT 315 Problem Set #3

STT 315 Problem Set #3 1. A student is asked to calculate the probability that x = 3.5 when x is chosen from a normal distribution with the following parameters: mean=3, sd=5. To calculate the answer, he uses this command: >

More information

Lecture 14: Introduction to Poisson Regression

Lecture 14: Introduction to Poisson Regression Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu 8 May 2007 1 / 52 Overview Modelling counts Contingency tables Poisson regression models 2 / 52 Modelling counts I Why

More information

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview

Modelling counts. Lecture 14: Introduction to Poisson Regression. Overview Modelling counts I Lecture 14: Introduction to Poisson Regression Ani Manichaikul amanicha@jhsph.edu Why count data? Number of traffic accidents per day Mortality counts in a given neighborhood, per week

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology Kharagpur Lecture No. #13 Probability Distribution of Continuous RVs (Contd

More information

14.2 THREE IMPORTANT DISCRETE PROBABILITY MODELS

14.2 THREE IMPORTANT DISCRETE PROBABILITY MODELS 14.2 THREE IMPORTANT DISCRETE PROBABILITY MODELS In Section 14.1 the idea of a discrete probability model was introduced. In the examples of that section the probability of each basic outcome of the experiment

More information

Experimental Uncertainty (Error) and Data Analysis

Experimental Uncertainty (Error) and Data Analysis Experimental Uncertainty (Error) and Data Analysis Advance Study Assignment Please contact Dr. Reuven at yreuven@mhrd.org if you have any questions Read the Theory part of the experiment (pages 2-14) and

More information

Lesson 3: Advanced Factoring Strategies for Quadratic Expressions

Lesson 3: Advanced Factoring Strategies for Quadratic Expressions Advanced Factoring Strategies for Quadratic Expressions Student Outcomes Students develop strategies for factoring quadratic expressions that are not easily factorable, making use of the structure of the

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 24, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

3. DISCRETE PROBABILITY DISTRIBUTIONS

3. DISCRETE PROBABILITY DISTRIBUTIONS 1 3. DISCRETE PROBABILITY DISTRIBUTIONS Probability distributions may be discrete or continuous. This week we examine two discrete distributions commonly used in biology: the binomial and Poisson distributions.

More information

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics.

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Study Session 1 1. Random Variable A random variable is a variable that assumes numerical

More information

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur

Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Probability Methods in Civil Engineering Prof. Dr. Rajib Maity Department of Civil Engineering Indian Institute of Technology, Kharagpur Lecture No. # 38 Goodness - of fit tests Hello and welcome to this

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Topic 1. Definitions

Topic 1. Definitions S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

More information

Introducing GIS analysis

Introducing GIS analysis 1 Introducing GIS analysis GIS analysis lets you see patterns and relationships in your geographic data. The results of your analysis will give you insight into a place, help you focus your actions, or

More information

Statistics 100A Homework 5 Solutions

Statistics 100A Homework 5 Solutions Chapter 5 Statistics 1A Homework 5 Solutions Ryan Rosario 1. Let X be a random variable with probability density function a What is the value of c? fx { c1 x 1 < x < 1 otherwise We know that for fx to

More information

Using Microsoft Excel

Using Microsoft Excel Using Microsoft Excel Objective: Students will gain familiarity with using Excel to record data, display data properly, use built-in formulae to do calculations, and plot and fit data with linear functions.

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability and Mathematical Statistics For 2018 Examinations Subject CT3 Probability and Mathematical Statistics Core Technical Syllabus 1 June 2017 Aim The

More information

Sampling The World. presented by: Tim Haithcoat University of Missouri Columbia

Sampling The World. presented by: Tim Haithcoat University of Missouri Columbia Sampling The World presented by: Tim Haithcoat University of Missouri Columbia Compiled with materials from: Charles Parson, Bemidji State University and Timothy Nyerges, University of Washington Introduction

More information

Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo

Lecture 1. Behavioral Models Multinomial Logit: Power and limitations. Cinzia Cirillo Lecture 1 Behavioral Models Multinomial Logit: Power and limitations Cinzia Cirillo 1 Overview 1. Choice Probabilities 2. Power and Limitations of Logit 1. Taste variation 2. Substitution patterns 3. Repeated

More information

Math Review Sheet, Fall 2008

Math Review Sheet, Fall 2008 1 Descriptive Statistics Math 3070-5 Review Sheet, Fall 2008 First we need to know about the relationship among Population Samples Objects The distribution of the population can be given in one of the

More information

This gives us an upper and lower bound that capture our population mean.

This gives us an upper and lower bound that capture our population mean. Confidence Intervals Critical Values Practice Problems 1 Estimation 1.1 Confidence Intervals Definition 1.1 Margin of error. The margin of error of a distribution is the amount of error we predict when

More information

Announcements. J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 8, / 45

Announcements. J. Parman (UC-Davis) Analysis of Economic Data, Winter 2011 February 8, / 45 Announcements Solutions to Problem Set 3 are posted Problem Set 4 is posted, It will be graded and is due a week from Friday You already know everything you need to work on Problem Set 4 Professor Miller

More information

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form:

Review: what is a linear model. Y = β 0 + β 1 X 1 + β 2 X 2 + A model of the following form: Outline for today What is a generalized linear model Linear predictors and link functions Example: fit a constant (the proportion) Analysis of deviance table Example: fit dose-response data using logistic

More information

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds

Chapter 6. Logistic Regression. 6.1 A linear model for the log odds Chapter 6 Logistic Regression In logistic regression, there is a categorical response variables, often coded 1=Yes and 0=No. Many important phenomena fit this framework. The patient survives the operation,

More information

Prentice Hall Algebra Correlated to: Hawaii Mathematics Content and Performances Standards (HCPS) II (Grades 9-12)

Prentice Hall Algebra Correlated to: Hawaii Mathematics Content and Performances Standards (HCPS) II (Grades 9-12) Prentice Hall Hawaii Mathematics Content and Performances Standards (HCPS) II (Grades 9-12) Hawaii Content and Performance Standards II* NUMBER AND OPERATIONS STANDARD 1: Students understand numbers, ways

More information

Machine Learning, Midterm Exam

Machine Learning, Midterm Exam 10-601 Machine Learning, Midterm Exam Instructors: Tom Mitchell, Ziv Bar-Joseph Wednesday 12 th December, 2012 There are 9 questions, for a total of 100 points. This exam has 20 pages, make sure you have

More information

Use symbolic algebra to represent unknown quantities in expressions or equations and solve linear equations with one variable (below, keep)

Use symbolic algebra to represent unknown quantities in expressions or equations and solve linear equations with one variable (below, keep) Algebraic Relationships GLE s Remaining/CCSS standards (Bold-face highlighted does not align with GLEs) (Illustrations examples in blue) CCSS Vertical Movement to 7th 2. Represent and analyze mathematical

More information

Chapter 7: Hypothesis Testing - Solutions

Chapter 7: Hypothesis Testing - Solutions Chapter 7: Hypothesis Testing - Solutions 7.1 Introduction to Hypothesis Testing The problem with applying the techniques learned in Chapter 5 is that typically, the population mean (µ) and standard deviation

More information

1 Functions, Graphs and Limits

1 Functions, Graphs and Limits 1 Functions, Graphs and Limits 1.1 The Cartesian Plane In this course we will be dealing a lot with the Cartesian plane (also called the xy-plane), so this section should serve as a review of it and its

More information

STA 218: Statistics for Management

STA 218: Statistics for Management Al Nosedal. University of Toronto. Fall 2017 My momma always said: Life was like a box of chocolates. You never know what you re gonna get. Forrest Gump. Problem How much do people with a bachelor s degree

More information

Probability Distribution

Probability Distribution Economic Risk and Decision Analysis for Oil and Gas Industry CE81.98 School of Engineering and Technology Asian Institute of Technology January Semester Presented by Dr. Thitisak Boonpramote Department

More information

Lecture-20: Discrete Choice Modeling-I

Lecture-20: Discrete Choice Modeling-I Lecture-20: Discrete Choice Modeling-I 1 In Today s Class Introduction to discrete choice models General formulation Binary choice models Specification Model estimation Application Case Study 2 Discrete

More information

Algebra 1 Mathematics: to Hoover City Schools

Algebra 1 Mathematics: to Hoover City Schools Jump to Scope and Sequence Map Units of Study Correlation of Standards Special Notes Scope and Sequence Map Conceptual Categories, Domains, Content Clusters, & Standard Numbers NUMBER AND QUANTITY (N)

More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

KEYSTONE ALGEBRA CURRICULUM Course 17905

KEYSTONE ALGEBRA CURRICULUM Course 17905 KEYSTONE ALGEBRA CURRICULUM Course 17905 This course is designed to complete the study of Algebra I. Mastery of basic computation is expected. The course will continue the development of skills and concepts

More information

Algebra , Martin-Gay

Algebra , Martin-Gay A Correlation of Algebra 1 2016, to the Common Core State Standards for Mathematics - Algebra I Introduction This document demonstrates how Pearson s High School Series by Elayn, 2016, meets the standards

More information

Exam Applied Statistical Regression. Good Luck!

Exam Applied Statistical Regression. Good Luck! Dr. M. Dettling Summer 2011 Exam Applied Statistical Regression Approved: Tables: Note: Any written material, calculator (without communication facility). Attached. All tests have to be done at the 5%-level.

More information

LECTURE 15: SIMPLE LINEAR REGRESSION I

LECTURE 15: SIMPLE LINEAR REGRESSION I David Youngberg BSAD 20 Montgomery College LECTURE 5: SIMPLE LINEAR REGRESSION I I. From Correlation to Regression a. Recall last class when we discussed two basic types of correlation (positive and negative).

More information

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides

Chapter 7. Inference for Distributions. Introduction to the Practice of STATISTICS SEVENTH. Moore / McCabe / Craig. Lecture Presentation Slides Chapter 7 Inference for Distributions Introduction to the Practice of STATISTICS SEVENTH EDITION Moore / McCabe / Craig Lecture Presentation Slides Chapter 7 Inference for Distributions 7.1 Inference for

More information

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues

Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Trendlines Simple Linear Regression Multiple Linear Regression Systematic Model Building Practical Issues Overfitting Categorical Variables Interaction Terms Non-linear Terms Linear Logarithmic y = a +

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression:

Lecture Outline. Biost 518 Applied Biostatistics II. Choice of Model for Analysis. Choice of Model. Choice of Model. Lecture 10: Multiple Regression: Biost 518 Applied Biostatistics II Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture utline Choice of Model Alternative Models Effect of data driven selection of

More information

Do not copy, post, or distribute

Do not copy, post, or distribute 14 CORRELATION ANALYSIS AND LINEAR REGRESSION Assessing the Covariability of Two Quantitative Properties 14.0 LEARNING OBJECTIVES In this chapter, we discuss two related techniques for assessing a possible

More information

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1

Answer Key. 9.1 Scatter Plots and Linear Correlation. Chapter 9 Regression and Correlation. CK-12 Advanced Probability and Statistics Concepts 1 9.1 Scatter Plots and Linear Correlation Answers 1. A high school psychologist wants to conduct a survey to answer the question: Is there a relationship between a student s athletic ability and his/her

More information

Introduction to ecosystem modelling (continued)

Introduction to ecosystem modelling (continued) NGEN02 Ecosystem Modelling 2015 Introduction to ecosystem modelling (continued) Uses of models in science and research System dynamics modelling The modelling process Recommended reading: Smith & Smith

More information

Objective Experiments Glossary of Statistical Terms

Objective Experiments Glossary of Statistical Terms Objective Experiments Glossary of Statistical Terms This glossary is intended to provide friendly definitions for terms used commonly in engineering and science. It is not intended to be absolutely precise.

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

Chapter 4 Part 3. Sections Poisson Distribution October 2, 2008

Chapter 4 Part 3. Sections Poisson Distribution October 2, 2008 Chapter 4 Part 3 Sections 4.10-4.12 Poisson Distribution October 2, 2008 Goal: To develop an understanding of discrete distributions by considering the binomial (last lecture) and the Poisson distributions.

More information

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474

Index I-1. in one variable, solution set of, 474 solving by factoring, 473 cubic function definition, 394 graphs of, 394 x-intercepts on, 474 Index A Absolute value explanation of, 40, 81 82 of slope of lines, 453 addition applications involving, 43 associative law for, 506 508, 570 commutative law for, 238, 505 509, 570 English phrases for,

More information

The Components of a Statistical Hypothesis Testing Problem

The Components of a Statistical Hypothesis Testing Problem Statistical Inference: Recall from chapter 5 that statistical inference is the use of a subset of a population (the sample) to draw conclusions about the entire population. In chapter 5 we studied one

More information

Generalized Linear Models for Non-Normal Data

Generalized Linear Models for Non-Normal Data Generalized Linear Models for Non-Normal Data Today s Class: 3 parts of a generalized model Models for binary outcomes Complications for generalized multivariate or multilevel models SPLH 861: Lecture

More information

Midterm 2 - Solutions

Midterm 2 - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis February 23, 2010 Instructor: John Parman Midterm 2 - Solutions You have until 10:20am to complete this exam. Please remember to put

More information

Geology for Engineers Sediment Size Distribution, Sedimentary Environments, and Stream Transport

Geology for Engineers Sediment Size Distribution, Sedimentary Environments, and Stream Transport Name 89.325 Geology for Engineers Sediment Size Distribution, Sedimentary Environments, and Stream Transport I. Introduction The study of sediments is concerned with 1. the physical conditions of a sediment,

More information

Glades Middle School Summer Math Program

Glades Middle School Summer Math Program Summer Math Program Attention Cougars, It s time for SUMMER MATH!! Research studies have shown that during an extended summer vacation, children can lose an average of 2.6 months of knowledge. This is

More information

14.75: Leaders and Democratic Institutions

14.75: Leaders and Democratic Institutions 14.75: Leaders and Democratic Institutions Ben Olken Olken () Leaders 1 / 23 Do Leaders Matter? One view about leaders: The historians, from an old habit of acknowledging divine intervention in human affairs,

More information

Probability and Discrete Distributions

Probability and Discrete Distributions AMS 7L LAB #3 Fall, 2007 Objectives: Probability and Discrete Distributions 1. To explore relative frequency and the Law of Large Numbers 2. To practice the basic rules of probability 3. To work with the

More information

Probability and Probability Distributions. Dr. Mohammed Alahmed

Probability and Probability Distributions. Dr. Mohammed Alahmed Probability and Probability Distributions 1 Probability and Probability Distributions Usually we want to do more with data than just describing them! We might want to test certain specific inferences about

More information

Stat 587: Key points and formulae Week 15

Stat 587: Key points and formulae Week 15 Odds ratios to compare two proportions: Difference, p 1 p 2, has issues when applied to many populations Vit. C: P[cold Placebo] = 0.82, P[cold Vit. C] = 0.74, Estimated diff. is 8% What if a year or place

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Probability Distributions

Probability Distributions CONDENSED LESSON 13.1 Probability Distributions In this lesson, you Sketch the graph of the probability distribution for a continuous random variable Find probabilities by finding or approximating areas

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

PHYSICS 15a, Fall 2006 SPEED OF SOUND LAB Due: Tuesday, November 14

PHYSICS 15a, Fall 2006 SPEED OF SOUND LAB Due: Tuesday, November 14 PHYSICS 15a, Fall 2006 SPEED OF SOUND LAB Due: Tuesday, November 14 GENERAL INFO The goal of this lab is to determine the speed of sound in air, by making measurements and taking into consideration the

More information

Chapter 26: Comparing Counts (Chi Square)

Chapter 26: Comparing Counts (Chi Square) Chapter 6: Comparing Counts (Chi Square) We ve seen that you can turn a qualitative variable into a quantitative one (by counting the number of successes and failures), but that s a compromise it forces

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

MATH W81: Problem Set 2 Presentations begin Thurs., Jan. 15

MATH W81: Problem Set 2 Presentations begin Thurs., Jan. 15 MATH W81: Problem Set 2 Presentations begin Thurs., Jan. 15 All Groups: Let x n denote the size of the sandhill crane population in a certain region at year n. 1 Let b n, d n denote the birth and death

More information

Logistic Regression for Distribution Modeling

Logistic Regression for Distribution Modeling Logistic Regression for Distribution Modeling GIS5306 GIS Applications in Environmental Systems Presented by: Andrea Palmiotto John Perry Theory Familiar Territory Linear Regression Relevant Assumptions

More information

Module 2: Reflecting on One s Problems

Module 2: Reflecting on One s Problems MATH55 Module : Reflecting on One s Problems Main Math concepts: Translations, Reflections, Graphs of Equations, Symmetry Auxiliary ideas: Working with quadratics, Mobius maps, Calculus, Inverses I. Transformations

More information

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7

EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 Introduction to Generalized Univariate Models: Models for Binary Outcomes EPSY 905: Fundamentals of Multivariate Modeling Online Lecture #7 EPSY 905: Intro to Generalized In This Lecture A short review

More information

Math K-1 CCRS Level A Alignment College & Career Readiness Standards Version: April 2017

Math K-1 CCRS Level A Alignment College & Career Readiness Standards Version: April 2017 Math K-1 CCRS Level A Alignment Standard Math K Lessons Math 1 Lessons Number and Operations: Base Ten Understand place value 6 Compare 1, 26 Compare 50, 33 Skip Count 5s and 10s, 35 Group 10s, 36 Compare

More information

Session-Based Queueing Systems

Session-Based Queueing Systems Session-Based Queueing Systems Modelling, Simulation, and Approximation Jeroen Horters Supervisor VU: Sandjai Bhulai Executive Summary Companies often offer services that require multiple steps on the

More information

Archdiocese of Washington Catholic Schools Academic Standards Mathematics

Archdiocese of Washington Catholic Schools Academic Standards Mathematics 8 th GRADE Archdiocese of Washington Catholic Schools Standard 1 - Number Sense Students know the properties of rational* and irrational* numbers expressed in a variety of forms. They understand and use

More information

Markov Chains and Pandemics

Markov Chains and Pandemics Markov Chains and Pandemics Caleb Dedmore and Brad Smith December 8, 2016 Page 1 of 16 Abstract Markov Chain Theory is a powerful tool used in statistical analysis to make predictions about future events

More information

Urban Transportation Planning Prof. Dr.V.Thamizh Arasan Department of Civil Engineering Indian Institute of Technology Madras

Urban Transportation Planning Prof. Dr.V.Thamizh Arasan Department of Civil Engineering Indian Institute of Technology Madras Urban Transportation Planning Prof. Dr.V.Thamizh Arasan Department of Civil Engineering Indian Institute of Technology Madras Module #03 Lecture #12 Trip Generation Analysis Contd. This is lecture 12 on

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output

y = a + bx 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation Review: Interpreting Computer Regression Output 12.1: Inference for Linear Regression Review: General Form of Linear Regression Equation y = a + bx y = dependent variable a = intercept b = slope x = independent variable Section 12.1 Inference for Linear

More information

BLAST: Target frequencies and information content Dannie Durand

BLAST: Target frequencies and information content Dannie Durand Computational Genomics and Molecular Biology, Fall 2016 1 BLAST: Target frequencies and information content Dannie Durand BLAST has two components: a fast heuristic for searching for similar sequences

More information

Chapter 3: Examining Relationships

Chapter 3: Examining Relationships Chapter 3: Examining Relationships Most statistical studies involve more than one variable. Often in the AP Statistics exam, you will be asked to compare two data sets by using side by side boxplots or

More information

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi

Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Communication Engineering Prof. Surendra Prasad Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 41 Pulse Code Modulation (PCM) So, if you remember we have been talking

More information

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight

AP Statistics. Chapter 9 Re-Expressing data: Get it Straight AP Statistics Chapter 9 Re-Expressing data: Get it Straight Objectives: Re-expression of data Ladder of powers Straight to the Point We cannot use a linear model unless the relationship between the two

More information

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE

CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE CHAPTER 3. THE IMPERFECT CUMULATIVE SCALE 3.1 Model Violations If a set of items does not form a perfect Guttman scale but contains a few wrong responses, we do not necessarily need to discard it. A wrong

More information
