California State Science Fair

How to Estimate the Experimental Uncertainty in Your Science Fair Project
Part 2 -- The Gaussian Distribution: What the Heck is it Good For Anyway?
Edward Ruth
drruth6617@aol.com

In Part 1 we discussed the two types of experimental uncertainty: systematic and random error. We said that systematic error affected the accuracy of your experiment and was removed by calibration. We said that random error affected the precision of your experiment and could be studied with statistics. We discussed how to apply this concept to your data by calculating the mean and the standard deviation. Students who are uncomfortable with these terms should go back and review Part 1 before proceeding with this lesson.

For those of you who think you are ready: fasten your seat belts, because we are about to boldly go into the wild universe of statistics. (What? The chair at your computer doesn't have a seat belt? How do you keep from falling out when playing Flight Simulator?) We are going to take a closer look at what the mean and standard deviation are really telling us about our data, and to do this we will need to learn a new concept: the distribution.

Suppose that I used Mathcad's built-in random number generator to create a sample of N random numbers x between 0 and 1:

$N := 1000 \qquad i := 1..N \qquad x_i := \mathrm{rnd}(1)$

Let's plot these numbers and have a look at them.

Note #1: You can experiment here by using a different number for N.

[Fig. 1: Plot of Random Numbers, $x_i$ versus $i$]
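If you want to follow along without Mathcad, here is a minimal Python sketch of the same sample. It assumes that the standard library's `random.random()`, which returns a float in [0.0, 1.0), is a fair stand-in for Mathcad's `rnd(1)`; the seed is an arbitrary choice that only makes the run repeatable. It also tallies the points below and above 0.5, the count suggested in the next paragraph.

```python
import random

N = 1000                    # sample size; try other values, as Note #1 suggests
rng = random.Random(1)      # seeded generator; the seed 1 is an arbitrary choice

# random() returns a float in [0.0, 1.0); used here as a stand-in for Mathcad's rnd(1)
x = [rng.random() for _ in range(N)]

below = sum(1 for v in x if v < 0.5)   # points below the midpoint 0.5
above = N - below                       # points at or above 0.5
print(f"below 0.5: {below}   at or above 0.5: {above}")
```

Change `N`, or drop the seed, to see how the two tallies wander around N/2 from run to run.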

Notice that there are about the same number of points above as below 0.5. Recall that our sample of random numbers was for numbers between 0.0 and 1.0, and that 0.5 is the midpoint between these two numbers. Why don't you try counting all of the data points below 0.5 and then all of the data points above 0.5? Were the two counts close?

At this point I want to introduce the concept of a bin. A bin is just an interval between two numbers in which we count the number of data points. In our example one bin is between 0.0 and 0.5. A second bin is between 0.5 and 1.0. You can have as many bins as you want. For example, we could have used 4 bins between 0.0 and 1.0, with each bin being 0.25 wide.

As it happens, Mathcad has a built-in function for counting the number of points in each bin. Let's use it to find the counts for bins that are 0.1 wide. After we get the counts we will plot them in a special plot called a histogram. In a histogram we plot the bin intervals on the x-axis and the counts in each bin on the y-axis. This gives us a picture of how the data in the sample fall into the bins, or the distribution of the sample.

$k := 1..10 \qquad bin_k := .1\cdot(k-1) \qquad dist := \mathrm{hist}(bin, x) \qquad j := 1..9$

[Fig. 2: Histogram of Uniform Distribution, $dist_j$ versus $bin_j$]

The first thing you should notice about the histogram is that all of the bins have about the same number of points in them, and the bars we've plotted are of (more or less) uniform height. This is a uniform distribution and is just what you would expect if our sample was truly random. (It is beyond our current discussion to go into much detail, but you should be aware that some random number generators are not very random at all. There are computer scientists who lose sleep over this! When you use random number generators it pays to be careful.)

Will our experimental data have a uniform distribution? The answer is no. Most experimental data does not have a uniform distribution.
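Mathcad's `hist` does the bin counting for you. The sketch below is a hypothetical Python helper, also named `hist`, that imitates it; the half-open [low, high) convention at the bin edges is my assumption, not something taken from Mathcad's documentation. With the ten edges 0.0, 0.1, ..., 0.9 there are nine intervals, matching j := 1..9 above, and values from 0.9 up to 1.0 fall outside every interval and are not counted.

```python
import bisect
import random

def hist(edges, data):
    """Hypothetical stand-in for Mathcad's hist(intervals, data).

    Counts the points in each interval [edges[j], edges[j+1]) and returns
    len(edges) - 1 counts. Points outside [edges[0], edges[-1]) are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for v in data:
        j = bisect.bisect_right(edges, v) - 1   # last interval whose left edge is <= v
        if 0 <= j < len(counts):
            counts[j] += 1
    return counts

rng = random.Random(1)                  # arbitrary seed; random() stands in for rnd(1)
x = [rng.random() for _ in range(1000)]

# Edges 0.0, 0.1, ..., 0.9, following bin_k := .1*(k - 1) for k = 1..10
edges = [0.1 * (k - 1) for k in range(1, 11)]
dist = hist(edges, x)

# A text "histogram": one '#' for every 5 points in the bin
for lo, count in zip(edges, dist):
    print(f"{lo:.1f} to {lo + 0.1:.1f}: {'#' * (count // 5)} ({count})")
```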
To see what kind of distribution to expect in our data, let's stop and think about what random errors really are. The random error in your experiment is the result of a lot of little things that you can't see or control, that are different each time you repeat the experiment, and that combine to produce the error you can notice. These little errors are all about the same size, and they randomly add to or subtract from the true value. We can use this knowledge to create a math model of random error. Suppose that the true value of our measurement is $x_{true}$ and that there are some tiny random errors, $\varepsilon$, that are all about the same size and that randomly add to or subtract from the true value to produce the modeled observed values $x_i$.

Math Model of Random Error

Number of samples: $N := 1000 \qquad i := 1..N$
True value: $x_{true} := .5$
Error size function (returns a small random number): $\eta := .1 \qquad \mathrm{size}(r) := \eta\cdot\mathrm{rnd}(1)$
Error sign function (returns ±1): $\mathrm{sign}(r) := \mathrm{if}(r > .5, 1, -1)$
Error term: $\varepsilon(r) := \mathrm{size}(r)\cdot\mathrm{sign}(r)$
Modeled observed values: $x_i := x_{true} + \sum_{j=1}^{10} \varepsilon(\mathrm{rnd}(1))$

[Fig. 3: Random Error Model Results, $x_i$ versus $i$]

Even by eyeball these results don't look quite as uniformly distributed as the first example. We can study this new distribution better by once again plotting a histogram.

Set up histogram of math model results: $k_{max} := 20 \qquad k := 1..k_{max} \qquad j := 1..k_{max}-1 \qquad bin_k := .05\cdot(k-1) \qquad h := \mathrm{hist}(bin, x)$
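The same random error model can be sketched in Python. This is an illustration under assumptions, not the worksheet itself: `random.random()` again stands in for `rnd(1)`, the seed is arbitrary, and the names `size`, `sign`, and `eps` simply follow the Mathcad definitions. Because each tiny error is smaller than η = 0.1, ten of them can never move an observation by 1.0 or more from the true value; most observations land much closer because the random signs tend to cancel.

```python
import random

rng = random.Random(2)   # seeded for repeatability; random() stands in for rnd(1)

N = 1000        # number of samples
x_true = 0.5    # true value
eta = 0.1       # upper limit on the size of one tiny error

def size(r):
    """Error size: a small random number in [0, eta). Like the worksheet, ignores r."""
    return eta * rng.random()

def sign(r):
    """Error sign: +1 if r > 0.5, otherwise -1."""
    return 1 if r > 0.5 else -1

def eps(r):
    """One tiny error term: a random size times a random sign."""
    return size(r) * sign(r)

# Each modeled observation is the true value plus 10 tiny random errors
x = [x_true + sum(eps(rng.random()) for _ in range(10)) for _ in range(N)]

# Histogram with 0.05-wide bins: edges 0.00, 0.05, ..., 0.95 give 19 intervals
edges = [0.05 * (k - 1) for k in range(1, 21)]
h = [0] * (len(edges) - 1)
for v in x:
    for j in range(len(h)):
        if edges[j] <= v < edges[j + 1]:
            h[j] += 1
            break

# Text histogram: one '#' for every 4 points in the bin
for lo, count in zip(edges, h):
    print(f"{lo:4.2f} {'#' * (count // 4)}")
```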

[Fig. 4: Histogram of Random Error Model, $h_j$ versus $bin_j$]

Now here is something completely different. This distribution is not uniform at all. The bars are different heights: the tallest bars are all around the true value, and the bars get shorter the farther we are from the true value. The distribution that we see is an important one. Since our math model accurately portrays how random errors work in a real-world experiment, we can expect that real data will follow a similar distribution. This is in fact what is observed in real experiments.

The distribution that we have plotted is for only N samples. All real-world samples will have some finite size. However, we can think of the mathematical concept of the limiting distribution. The limiting distribution is the theoretical limit that we can only approach as the sample size becomes infinitely large. For the limiting distribution we can use very, very small bins (infinitesimally narrow bins), and the result will be a smooth curve. Experiments can't have infinite sample sizes, but even for a modestly large sample size the distribution begins to look like the limiting distribution.

The special limiting distribution for random error is called the Gaussian or normal distribution. The Gaussian distribution is given by the following function:

$$f(y, \mu, \sigma) := \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-.5\left(\frac{y-\mu}{\sigma}\right)^2\right]$$

If you are unfamiliar with the function exp(x), it is just $\exp(x) = e^x$, where $e = 2.718281828\ldots$ This is a common function in math and physics. You are sure to see it again in your studies.
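Here is the Gaussian function written out in Python, a minimal sketch using only the standard `math` module. The two printed checks are the curve's height at its peak, $1/(\sigma\sqrt{2\pi})$, and the constant e; the last lines do a crude midpoint-rule integration showing that the total area under the curve is 1.

```python
import math

def f(y, mu, sigma):
    """Gaussian (normal) curve with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

print(f(0.0, 0.0, 1.0))   # peak height for mu = 0, sigma = 1, i.e. 1 / sqrt(2*pi)
print(math.exp(1.0))      # exp(1) is e = 2.718281828...

# Crude midpoint-rule integration from -8 to 8 (16,000 strips of width 0.001);
# the area outside that range is negligible for mu = 0, sigma = 1.
dy = 0.001
area = sum(f(-8.0 + (n + 0.5) * dy, 0.0, 1.0) * dy for n in range(16000))
print(f"total area = {area:.6f}")
```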

What are µ and σ? Why, they are our old friends the mean and standard deviation! We can now see how it all comes together. Real-world random errors typically have Gaussian distributions. By calculating the mean and standard deviation we can get the shape of the distribution. And knowing that the distribution is Gaussian means we can use the well-developed statistics for Gaussian distributions to study our data. (Of course, the reason that the statistics for Gaussian distributions are well developed is that people knew they could use them to study their data.)

Let's put it all together with a sample problem. A group of students has a model rocket club. They have been experimenting with an array of 25 microphones that they are using to try to determine the maximum altitude of their rockets. Each microphone in the array gives them an independent measurement of altitude (the number of microphones is the sample size). The altitude data in meters from their first flight using the array is given below.

Number of samples (microphones): $N := 25 \qquad i := 1..N$

Altitude data in meters, $altitude_i$: 305.12, 293.65, 297.93, 305.21, 307.7, 289.77, 291.23, 325.45, 298.34, 289.75, 306.29, 318.61, 307.73, 296.76, 304.7, 281.8, 316.99, 310.48, 295.24, 310.26, 310.41, 311.95, 300.77, 296.56, 306.04

Each data point is one microphone reading, so the number of readings equals the number of microphones. Using Mathcad's built-in functions we quickly find the mean and standard deviation of these results:

The mean: $\mu := \mathrm{mean}(altitude) \qquad \mu = 303.15$ m
The standard deviation: $\sigma := \mathrm{stdev}(altitude) \qquad \sigma = 9.96$ m
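For readers without Mathcad, the same two numbers come out of Python's standard `statistics` module. One detail worth knowing: there are two common standard deviations. `statistics.pstdev` divides by N and reproduces the 9.96 quoted above; `statistics.stdev` divides by N - 1 and gives a slightly larger value. The sketch prints both so you can compare.

```python
import statistics

# Altitude readings in meters, one per microphone (data from the text)
altitude = [
    305.12, 293.65, 297.93, 305.21, 307.7, 289.77, 291.23, 325.45,
    298.34, 289.75, 306.29, 318.61, 307.73, 296.76, 304.7, 281.8,
    316.99, 310.48, 295.24, 310.26, 310.41, 311.95, 300.77, 296.56,
    306.04,
]

N = len(altitude)                     # 25 microphones = sample size
mu = statistics.mean(altitude)        # mean
sigma = statistics.pstdev(altitude)   # standard deviation, dividing by N
s = statistics.stdev(altitude)        # sample standard deviation, dividing by N - 1

print(f"N = {N}, mean = {mu:.2f} m, pstdev = {sigma:.2f} m, stdev = {s:.2f} m")
```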

The results of the 25 measurements are plotted in Fig. 5 along with the mean and two limit lines given by:

$$limits = \mu \pm 3\sigma$$

Notice that none of the points fall outside of these limits. The statistics of the Gaussian distribution tell us that 99.7% of all of the points should fall between these limits. This is very useful information. The 3σ value is often used to define limits in engineering. For example, we design space vehicles to be strong enough to withstand the µ + 3σ environment, knowing that the spacecraft will then be strong enough for almost any situation.

[Fig. 5: Rocket Altitude Measurements, altitude (m) versus microphone number i, with lines at µ, µ + 3σ, and µ - 3σ]

In Fig. 6 we plot the histogram for the microphone experiment. On the same plot we'll include the Gaussian for the same mean and standard deviation as the experiment. Note that even though we have only 25 samples, the Gaussian shape of the distribution of our data is clearly apparent.

$k_{max} := 12 \qquad k := 1..k_{max} \qquad j := 1..k_{max}-1 \qquad x := \mu - 6\sigma\,..\,\mu + 6\sigma$

$bins_k := k\cdot\frac{12\sigma}{k_{max}} + (\mu - 6\sigma) \qquad h := \frac{\mathrm{hist}(bins, altitude)}{N\cdot\frac{12\sigma}{k_{max}}}$

Notice that I've done something different here. I have scaled the histogram results by the sample size and the width of the bins. I did this so that the histogram and the Gaussian will have the same scale for the plot. Do you know why this worked? Stay tuned: we'll give you some more information to figure this out.
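A Python sketch of both ideas in this paragraph: the µ ± 3σ limit lines and a histogram scaled by the sample size and the bin width. It assumes 12 equal bins of width σ spanning µ ± 6σ, which is close to, but not necessarily identical to, the worksheet's bin layout, and it uses the divide-by-N standard deviation to match σ = 9.96.

```python
import statistics

# Altitude readings in meters, one per microphone (data from the text)
altitude = [
    305.12, 293.65, 297.93, 305.21, 307.7, 289.77, 291.23, 325.45,
    298.34, 289.75, 306.29, 318.61, 307.73, 296.76, 304.7, 281.8,
    316.99, 310.48, 295.24, 310.26, 310.41, 311.95, 300.77, 296.56,
    306.04,
]
N = len(altitude)
mu = statistics.mean(altitude)
sigma = statistics.pstdev(altitude)     # divide-by-N form, matching sigma = 9.96

# Limit lines at mu +/- 3 sigma, and any readings that land outside them
lower, upper = mu - 3 * sigma, mu + 3 * sigma
outside = [a for a in altitude if not lower <= a <= upper]
print(f"limits {lower:.2f} m to {upper:.2f} m; points outside: {len(outside)}")

# Assumed bin layout: k_max = 12 equal bins of width 12*sigma/k_max (= sigma)
# covering mu - 6 sigma to mu + 6 sigma
k_max = 12
width = 12 * sigma / k_max
edges = [mu - 6 * sigma + k * width for k in range(k_max + 1)]

counts = [0] * k_max
for a in altitude:
    for j in range(k_max):
        if edges[j] <= a < edges[j + 1]:
            counts[j] += 1
            break

# Scale each count by the sample size and the bin width, as the text describes,
# so the bars share a vertical scale with the Gaussian curve
h = [c / (N * width) for c in counts]
for lo, hj in zip(edges, h):
    print(f"{lo:7.2f} m: {hj:.4f}")
```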

[Fig. 6: Histogram of Rocket Experiment, scaled counts $h_j$ and the Gaussian $f(x, \mu, \sigma)$ versus altitude (m)]

So the data really do look Gaussian. How can we use this knowledge to further understand our experiment? Here is one question you could ask: for a given µ and σ, how many samples should we expect to fall into a given bin? Statistics tells us the answer is the area under the Gaussian curve over that bin times the sample size. We can find the area easily by integrating the curve. (If you have not had calculus, just think of the integral as finding the area under the curve. We'll use a built-in Mathcad function for the integral, so the whole thing will be done for you anyway.) So for the rocket example we find that for a bin between 295 and 300 we would expect this many samples:

$$N\cdot\int_{295}^{300} f(y, \mu, \sigma)\,dy = 4.23$$

Using the histogram function, we find that our experiment actually had this many samples in that bin:

$$\mathrm{hist}\left(\begin{pmatrix}295\\300\end{pmatrix}, altitude\right) = 5$$

Now that's pretty good agreement for a sample size of only 25 (as we say in the aerospace business, "it's close enough for government work"). If the sample size were larger, we would find that our sample distribution agreed even more closely with the Gaussian. You should do Prob. 1 and see the results of a larger sample.

We can answer how I knew that 99.7% of all values are within the 3σ limits by just integrating:

$$\int_{\mu-3\sigma}^{\mu+3\sigma} f(y, \mu, \sigma)\,dy = 99.73\%$$
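The integrals above can also be checked without numerical integration. For the Gaussian, the area to the left of y has a closed form in terms of the error function, $\frac{1}{2}\left[1 + \mathrm{erf}\left(\frac{y-\mu}{\sigma\sqrt{2}}\right)\right]$, and Python's `math.erf` provides erf. The helper names below (`normal_cdf`, `expected_count`, `fraction_within`) are illustrative, not from the worksheet; the inputs are the rounded µ and σ quoted in the text.

```python
import math

def normal_cdf(y, mu, sigma):
    """Area under the Gaussian f(y, mu, sigma) from minus infinity up to y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def expected_count(a, b, n, mu, sigma):
    """Expected number of n samples between a and b: n times the area under f."""
    return n * (normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma))

def fraction_within(k):
    """Fraction of a Gaussian population lying within mu +/- k*sigma."""
    return math.erf(k / math.sqrt(2.0))

# Rocket example: N = 25, mu = 303.15 m, sigma = 9.96 m (values from the text)
print(f"expected in the 295-300 m bin: {expected_count(295, 300, 25, 303.15, 9.96):.2f}")
print(f"fraction within 3 sigma: {100 * fraction_within(3):.2f} %")
```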

This is the end of Part 2. I know that this is a complicated subject, but by now you should have enough tools in your math tool kit to analyze the error in any science fair experiment. If you have questions or problems, just send me an e-mail and I'll try to help. So long for now, and may all of your distributions be Gaussian!

Suggested Problems

1) We can create an example set of points that will simulate real-world data with a Gaussian distribution. Use the following function to generate your data set (just cut and paste into the document at Note #4):

$N := 1000 \qquad i := 1..N \qquad \mu_{new} := 300 \qquad \sigma_{new} := 10$

$$altitude_i := \mu_{new} + \sigma_{new}\sqrt{-2\ln(\mathrm{rnd}(1))}\,\cos(2\pi\cdot\mathrm{rnd}(1))$$

Using this function makes it easy to study the impact of things like sample size on your data set. Try different values for N, $\mu_{new}$, and $\sigma_{new}$.

2) If 99.7% of the samples in a data set fall between µ + 3σ and µ - 3σ, then what percentage of samples fall between µ + 2σ and µ - 2σ? Between µ + σ and µ - σ?

3) How did I know how to scale the histogram in Fig. 6?

Suggested Further Reading

Taylor, John R., An Introduction to Error Analysis, University Science Books, 1982. (If you are interested in doing experiments then you have got to read this book. Besides, it has a way cool picture on the front cover.)
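For readers working Problem 1 outside Mathcad, here is a Python sketch of its generator. The formula is the Box-Muller transform, which turns two uniform random numbers into one Gaussian-distributed number (that name is mine; the worksheet doesn't use it). One adaptation is assumed: Python's `random.random()` can return exactly 0.0, and ln(0) is undefined, so the sketch feeds `1.0 - rng.random()`, which lies in (0, 1], to the logarithm. The function name `gaussian_sample` is hypothetical.

```python
import math
import random
import statistics

def gaussian_sample(n, mu_new, sigma_new, rng):
    """Return n Gaussian-distributed values using the Box-Muller formula.

    1.0 - rng.random() lies in (0, 1], which keeps log() away from log(0).
    """
    return [
        mu_new + sigma_new * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
        * math.cos(2.0 * math.pi * rng.random())
        for _ in range(n)
    ]

rng = random.Random(3)   # arbitrary seed; random() stands in for rnd(1)
altitude = gaussian_sample(1000, 300.0, 10.0, rng)
print(f"mean = {statistics.mean(altitude):.2f}, "
      f"std = {statistics.pstdev(altitude):.2f}")
```

Compare the printed mean and standard deviation with µ_new and σ_new, then try smaller and larger n to see how the agreement changes with sample size.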