Deciphering Math Notation
Billy Skorupski, Associate Professor, School of Education

Agenda
- General overview of data and variables
- Greek and Roman characters in math and statistics
- Parameters vs. statistics
- Common operators and how they work
- Particular focus on the summation and product operators, and the associated use of SUBSCRIPTS
- Miscellaneous statistical symbols and terminology
- A few common symbols from set theory (to be discussed in Probability)

An Example Data Set
i is an indexing variable: i = 1, 2, …, N, with N = 9.
Let's say X contains measurements for a numeric variable.
Let's say Y indicates designations for a categorical variable.

i    X    Y
1   15    1
2   27    1
3   32    1
4   11    1
5   23    2
6   21    2
7    9    2
8   44    3
9   26    3

An Example Data Set
X and Y are column vectors of length 9. Together, X and Y make a 9 × 2 matrix; i only records each row's position, so it is not counted as a data column.
In most cases, your data will have N rows, one for every subject, and one column per variable.
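A minimal Python sketch of the same data set, assuming plain lists stand in for the column vectors (a library such as NumPy or pandas would be more typical in practice). Calling the numeric column `X` and the combined table `data` are illustrative choices, not names from the slides. It also shows the one practical wrinkle with subscripts in code: the slides count from i = 1, while Python lists count from 0.

```python
# The example data set, stored as two column vectors (plain Python lists).
X = [15, 27, 32, 11, 23, 21, 9, 44, 26]  # numeric variable
Y = [1, 1, 1, 1, 2, 2, 2, 3, 3]          # categorical variable (group designation)

N = len(X)  # number of subjects, i.e., rows

# Put the two columns side by side: an N x 2 matrix stored as a list of rows.
data = [[x, y] for x, y in zip(X, Y)]

n_rows, n_cols = len(data), len(data[0])
print(n_rows, n_cols)  # prints: 9 2

# The slides count subjects from i = 1, but Python lists start at 0,
# so subject i lives at data[i - 1].
i = 4
print(data[i - 1])  # prints: [11, 1]
```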

Repeated Measures
X has been observed on 3 occasions (e.g., at Time 1, Time 2, and Time 3), stored as X1, X2, and X3. We call this wide format. Long format would need 27 rows: N = 9 people × 3 observations per person.

i    Y    X1   X2   X3
1    1    12   15   25
2    1    17   27   32
3    1    30   32   27
4    1    11   11   25
5    2    13   23   33
6    2    15   21   29
7    2    11    9   11
8    3    39   44   46
9    3    25   26   32

Repeated Measures: long format (only the first 3 subjects)
Y is repeated 3 times for each subject. T indicates which observation of X each row holds. The 9 values of X are the first 3 rows of X1, X2, and X3 from the previous slide.

i    Y    T    X
1    1    1   12
1    1    2   15
1    1    3   25
2    1    1   17
2    1    2   27
2    1    3   32
3    1    1   30
3    1    2   32
3    1    3   27
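The wide-to-long reshape can be sketched in a few lines of Python, assuming each subject is a plain tuple (i, Y, X1, X2, X3); the column names X1 to X3 and the variable names `wide` and `long_rows` are choices made for this sketch. In practice a data-frame tool (for example pandas' `melt`) usually does this job.

```python
# Wide format: one row per subject; X observed on three occasions (X1, X2, X3).
# Each tuple is (i, Y, X1, X2, X3), copied from the repeated-measures table.
wide = [
    (1, 1, 12, 15, 25),
    (2, 1, 17, 27, 32),
    (3, 1, 30, 32, 27),
    (4, 1, 11, 11, 25),
    (5, 2, 13, 23, 33),
    (6, 2, 15, 21, 29),
    (7, 2, 11, 9, 11),
    (8, 3, 39, 44, 46),
    (9, 3, 25, 26, 32),
]

# Long format: one row per subject per occasion, as (i, Y, T, X).
# Y is copied onto every row for that subject; T records which occasion X came from.
long_rows = []
for i, y, *xs in wide:
    for t, x in enumerate(xs, start=1):
        long_rows.append((i, y, t, x))

print(len(long_rows))  # prints: 27 (N = 9 people x 3 observations)
for row in long_rows[:9]:  # the first 3 subjects, matching the long-format table
    print(row)
```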

Greek and Roman letters
The purpose of most (all?) data analysis is to make inferences about PARAMETERS, quantities that exist as part of the POPULATION. We can't observe them directly, so we make educated guesses by collecting a SAMPLE of data and calculating STATISTICS.

Greek and Roman letters
Parameters are almost always indicated by Greek letters. Corresponding statistics (parameter estimates) are indicated in one of two ways:
1. Using the Roman letter that corresponds to the Greek letter: $\beta \rightarrow b$
2. Using a "hat" over the Greek letter: $\hat{\beta}$

Greek and Roman letters
So, Greek letters (e.g., $\beta$) are used to indicate population parameters: fixed constants out there in the world that we are trying to estimate. Parameter estimates are calculated from samples (using them to make inferences about the parameters is the job of inferential statistics) and are indicated by Roman letters or by Greek letters with hats.

Greek and Roman letters
Articles using such symbols will either adopt standard practice (e.g., using $\beta_0, \beta_1, \ldots, \beta_p$ as population regression coefficients) or establish the notation to be used in the paper. For example, if more than one regression model is presented, one model may use $\beta_0, \beta_1, \ldots, \beta_p$ as coefficients, the next may use a different Greek letter (e.g., $\gamma_0, \gamma_1, \ldots, \gamma_p$), the next yet another letter, and so on.

Greek and Roman letters
Check out the 1st table in the handout.

Operators
Check out the 2nd and 3rd tables in the handout. Most symbols are quite familiar, but $\Sigma$ and $\Pi$ as operators can be confusing at first:
- $\Sigma$ (a Greek upper-case S) is for Summation (add them up).
- $\Pi$ (a Greek upper-case P) is for Product (multiply them).
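As a quick illustration, here is a Python sketch applying both operators to the numeric column of the example data set (called `X` here by assumption). `sum` plays the role of $\Sigma$ and `math.prod` (Python 3.8 or later) plays the role of $\Pi$; the product uses only the first three values to keep the number readable.

```python
import math

X = [15, 27, 32, 11, 23, 21, 9, 44, 26]  # numeric column from the example data set

total = sum(X)              # Σ: add all the values up
product = math.prod(X[:3])  # Π: multiply values together (first three only, to keep it small)

print(total)    # prints: 208
print(product)  # prints: 12960  (15 * 27 * 32)
```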

Subscripts
Subscripts are variables that index other variables. For example, the variable i in our example data set has no meaning other than the serial position of each subject in the data set.
$$\sum_{i=1}^{N} X_i = X_1 + X_2 + \cdots + X_N$$
When you see $\Sigma$, it means "add up the terms that appear to its right." The $i = 1$ at the bottom of $\Sigma$ and the $N$ at the top are instructions: $i$ is an indexing variable that starts at 1 and goes up to $N$.
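The instructions written on the $\Sigma$ translate almost word for word into a loop. This Python sketch assumes the numeric column is called `X`; the helper `summation` is a hypothetical name introduced only to show that the lower and upper limits are just parameters of the loop.

```python
X = [15, 27, 32, 11, 23, 21, 9, 44, 26]
N = len(X)

# Follow the instructions written on the Σ literally:
# start i at 1 (bottom), stop at N (top), and add X_i each time.
total = 0
for i in range(1, N + 1):  # range excludes its end point, hence N + 1
    total += X[i - 1]      # X_i; subtract 1 because Python lists are 0-based
print(total)  # prints: 208


def summation(values, lower, upper):
    """Hypothetical helper: sum of values_i for i = lower, ..., upper (1-based, inclusive)."""
    return sum(values[i - 1] for i in range(lower, upper + 1))


print(summation(X, 1, N))  # prints: 208, the same as the loop
print(summation(X, 2, 4))  # prints: 70  (X_2 + X_3 + X_4 = 27 + 32 + 11)
```

Changing the limits changes which terms are added; an upper limit below the lower limit gives an empty sum, which is 0.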

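Subscripts
Often, if the instructions are to add up all N of the values, the summation will be presented in a shorter form without the subscripts (the limits of summation):
$$\sum_{i=1}^{N} X_i \quad\text{may be written as}\quad \sum X_i \quad\text{or simply}\quad \sum X$$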

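Another Example
The population and sample variances, $\sigma^2$ and $s^2$, have no subscripts... why? Each is a single number that summarizes the whole population or sample, and the same is true of the means $\mu$ and $\bar{X}$. $X$, by contrast, has a subscript to indicate each individual value.
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N} \qquad s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}$$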
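To see the two formulas side by side, this Python sketch computes both from the nine values of the example data set (called `X` here). Treating the same nine values first as a whole population and then as a sample is an assumption made purely to show the arithmetic; the results are cross-checked against the standard library's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

X = [15, 27, 32, 11, 23, 21, 9, 44, 26]
N = len(X)

# Population variance: treat the 9 values as the entire population.
mu = sum(X) / N
sigma2 = sum((x - mu) ** 2 for x in X) / N

# Sample variance: treat the same 9 values as a sample; divide by N - 1.
x_bar = sum(X) / N
s2 = sum((x - x_bar) ** 2 for x in X) / (N - 1)

# Cross-check against the standard library's implementations.
assert math.isclose(sigma2, statistics.pvariance(X))
assert math.isclose(s2, statistics.variance(X))
print(sigma2, s2)
```

Only the denominator differs, so $s^2$ is always a little larger than $\sigma^2$ computed from the same values.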

ANOVA Example
Let's say we've conducted an experiment after randomly assigning participants to one of three treatment conditions. For each subject in each group, we measure the dependent variable, $X$. Each person's score can be notated as $X_{ij}$ (or sometimes $X[i,j]$), the score for person $i$ in group $j$. Here $i$ goes from 1 to $n_j$ (the number of people in group $j$), while $j$ goes from 1 to $M$, the number of groups ($M = 3$ in this case).

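A one-way ANOVA table
Between
  SS (Sum of Squares): $SS_B = \sum_{j=1}^{M} n_j (\bar{X}_j - \bar{X})^2$
  df: $df_B = M - 1$
  MS: $MS_B = SS_B / df_B$
  F: $MS_B / MS_W$
  p: Sig.
Within
  SS (Sum of Squares): $SS_W = \sum_{j=1}^{M} \sum_{i=1}^{n_j} (X_{ij} - \bar{X}_j)^2$
  df: $df_W = N - M = \sum_{j=1}^{M} (n_j - 1)$
  MS: $MS_W = SS_W / df_W$
Total
  SS (Sum of Squares): $\sum_{i=1}^{N} (X_i - \bar{X})^2$
  df: $N - 1$
Here $\bar{X}_j$ is the mean of group $j$, $\bar{X}$ is the grand mean of all $N$ scores, and in the Total row $X_i$ runs over all $N$ scores regardless of group.
MS = Mean Square, which is just another name for variance.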
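The table's formulas can be checked numerically. The slides give no data for the ANOVA example, so this Python sketch assumes the earlier example data set, treating Y as the group label $j$ and X as the score $X_{ij}$; the dictionary `groups` and the other variable names are choices made for the sketch. It computes each row of the table and lets you confirm that the Between and Within sums of squares add up to the Total.

```python
X = [15, 27, 32, 11, 23, 21, 9, 44, 26]
Y = [1, 1, 1, 1, 2, 2, 2, 3, 3]  # group labels j (assumed grouping for this sketch)

# Regroup the scores so that groups[j] lists X_ij for i = 1..n_j.
groups = {}
for x, j in zip(X, Y):
    groups.setdefault(j, []).append(x)

N = len(X)        # total number of scores
M = len(groups)   # number of groups
grand_mean = sum(X) / N
group_means = {j: sum(xs) / len(xs) for j, xs in groups.items()}

# Between: sum over groups of n_j * (group mean - grand mean)^2
ss_between = sum(len(xs) * (group_means[j] - grand_mean) ** 2 for j, xs in groups.items())
# Within: double sum over groups j and people i of (X_ij - group mean)^2
ss_within = sum((x - group_means[j]) ** 2 for j, xs in groups.items() for x in xs)
# Total: single sum over all N scores of (X_i - grand mean)^2
ss_total = sum((x - grand_mean) ** 2 for x in X)

df_between, df_within, df_total = M - 1, N - M, N - 1
ms_between = ss_between / df_between
ms_within = ss_within / df_within
F = ms_between / ms_within

print(f"Between: SS={ss_between:.3f} df={df_between} MS={ms_between:.3f} F={F:.3f}")
print(f"Within:  SS={ss_within:.3f} df={df_within} MS={ms_within:.3f}")
print(f"Total:   SS={ss_total:.3f} df={df_total}")
```

The p column needs the F distribution; if SciPy is available, `scipy.stats.f.sf(F, df_between, df_within)` gives that upper-tail probability, and `scipy.stats.f_oneway` runs the whole test from the group samples.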

One more (trickier) example
Let's say I am describing the population variance-covariance matrix, $\Sigma$, among $P$ variables ($P = 5$). The elements are referred to as $\sigma_{ij}$. What if I want to add up just the elements in the lower triangle?

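Population Variance-Covariance Matrix, $\Sigma$
$$\Sigma = \begin{pmatrix}
\sigma_{11} & \sigma_{12} & \sigma_{13} & \sigma_{14} & \sigma_{15} \\
\sigma_{21} & \sigma_{22} & \sigma_{23} & \sigma_{24} & \sigma_{25} \\
\sigma_{31} & \sigma_{32} & \sigma_{33} & \sigma_{34} & \sigma_{35} \\
\sigma_{41} & \sigma_{42} & \sigma_{43} & \sigma_{44} & \sigma_{45} \\
\sigma_{51} & \sigma_{52} & \sigma_{53} & \sigma_{54} & \sigma_{55}
\end{pmatrix}$$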

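Sum down the rows, across the columns
Say $i$ indexes the rows and $j$ indexes the columns. The sum of the lower triangle (including the diagonal, since $j$ runs up to and including $i$) is:
$$\sum_{i=1}^{P} \sum_{j \le i} \sigma_{ij}$$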
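The double sum is two nested loops, with the inner loop's upper limit tied to the outer index. The matrix in this Python sketch is hypothetical (the slides give no numbers): $\sigma_{ij} = \min(i, j)$, which is a symmetric, valid covariance matrix chosen only so the result is easy to check by hand.

```python
P = 5

# Hypothetical 5 x 5 covariance matrix, chosen so the arithmetic is easy to check:
# sigma_ij = min(i, j) for i, j = 1..P (a symmetric, valid covariance matrix).
Sigma = [[min(i, j) for j in range(1, P + 1)] for i in range(1, P + 1)]

lower_sum = 0
for i in range(1, P + 1):          # outer sum: rows i = 1..P
    for j in range(1, i + 1):      # inner sum: columns j <= i (diagonal included)
        lower_sum += Sigma[i - 1][j - 1]  # sigma_ij, shifted to 0-based indexing

print(lower_sum)  # prints: 35  (1 + 3 + 6 + 10 + 15, row by row)
```

For a symmetric matrix, the same total can be obtained as (sum of all elements + sum of the diagonal) / 2, which is a handy check on the loop limits.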

Set Theory
The notation in the final table of the handout, on set theory, will be very useful for various probability statements. It also sometimes appears in summation and product notation, to pick out the subset of members over which data are aggregated.
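For instance, $\sum_{i \in A} X_i$ with $A = \{i : Y_i = 1\}$ adds up X only for the subjects in category 1 of the example data set. This Python sketch assumes the numeric column is called `X` and names the subset `A`; both are illustrative choices. A set comprehension mirrors the set-builder notation closely.

```python
X = [15, 27, 32, 11, 23, 21, 9, 44, 26]
Y = [1, 1, 1, 1, 2, 2, 2, 3, 3]
N = len(X)

# Set-builder notation A = {i : Y_i = 1}: the subjects in category 1 (1-based i).
A = {i for i in range(1, N + 1) if Y[i - 1] == 1}

# Sum over the subset, i.e., Σ_{i ∈ A} X_i.
sum_A = sum(X[i - 1] for i in A)

print(sorted(A))  # prints: [1, 2, 3, 4]
print(sum_A)      # prints: 85  (15 + 27 + 32 + 11)
```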

Thanks! Any Questions, Discussion?