Education Production Functions April 7, 2009
Outline
- Production Functions for Education
- Hanushek Paper
- Card and Krueger
- Tennessee STAR Experiment
- Maimonides Rule
What do I mean by Production Function?
I mean just the standard thing. In general,
$$Y = F(X_1, X_2)$$
where $Y$ is output (measured in dollars) and $X_1$ and $X_2$ are inputs.
A firm then chooses the level of inputs to maximize profit:
$$\pi = F(X_1, X_2) - P_1 X_1 - P_2 X_2$$
where $P_1$ and $P_2$ are the input prices.
Is this a useful concept for thinking about education?
Problem 1: Do schools really maximize profit?
Maybe not, but it probably doesn't matter.
Suppose that $F$ was indeed the true relationship between $X_1$, $X_2$, and $Y$.
If we had data on $X_1$, $X_2$, and $Y$ we could learn $F$.
Once we understand $F$, we could choose the optimal levels of $X_1$ and $X_2$.
Thus knowing $F$ is certainly useful, but...
Problem 2: Is $F$ really knowable?
Can we really conceptualize $Y$, $X_1$, and $X_2$?
Even if it were all measurable, it's not clear that you and I could agree on what $Y$ actually is.
Presumably, even if this isn't perfect, we can learn something from it.
So what is $Y$?
Ideally it would be utility. (In fact, with externalities, ideally it would be everybody's utility, not just that of the person who gets the education.)
This is great but not particularly practical.
Other things we might want to get are:
- Wage levels
- Employment
- Job Satisfaction
- Patriotism
- Knowledge of Politics
- Technical Ability
- Creativity
- Intellectual Depth
- Kindness
- Criminal Activity
Some of these things are possible to get, but most aren't.
Instead we need to use what we have.
Often we don't have long time spans, so we need to use things that are measured while students are in school or shortly thereafter.
The main one is test scores. Others are:
- High school graduation
- College attendance
How do we use test scores?
With test scores there are multiple ways to run regressions.
Let $T_{ig}$ be the test score for child $i$ in grade $g$.
Levels:
$$T_{ig} = X_{ig}\beta + u_{ig}$$
Pure gains:
$$T_{ig} - T_{ig-1} = X_{ig}\delta + v_{ig}$$
New level conditional on old level:
$$T_{ig} = \alpha T_{ig-1} + X_{ig}\gamma + \varepsilon_{ig}$$
There are advantages and disadvantages of each.
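To make the three test-score specifications concrete, here is a minimal simulation sketch. Everything about the data is an assumption made up for illustration: a single input `x`, an unobserved `ability` term, and a data-generating process in which the lagged-score model happens to be the true one. Because of that last choice, the comparison says nothing about which specification is right in real data; it only shows that the three regressions can give different answers on the same sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical data-generating process (assumption, not from any real data set).
ability = rng.normal(size=n)                  # unobserved child ability
x = 0.5 * ability + rng.normal(size=n)        # input X_ig, correlated with ability
t_prev = ability + rng.normal(size=n)         # last year's score T_{i,g-1}
t_curr = 0.5 * t_prev + 1.0 * x + rng.normal(size=n)   # this year's score T_{ig}


def ols(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


beta = ols(t_curr, x)              # levels: [const, beta]
delta = ols(t_curr - t_prev, x)    # pure gains: [const, delta]
lagged = ols(t_curr, t_prev, x)    # conditional on old level: [const, alpha, gamma]

print("levels coefficient on x:      ", round(beta[1], 3))
print("gains coefficient on x:       ", round(delta[1], 3))
print("lagged-score coefficient on x:", round(lagged[2], 3))
```

In this made-up design the levels regression overstates the coefficient on `x` and the gains regression understates it, because `x` is correlated with the omitted or mis-weighted lagged score.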
Inputs
Now what are the inputs? Ideally things like:
- Curriculum and how a teacher teaches different subjects
- How much time she spends with each student in the class
- How kids in the class interact
- How classroom materials help kids learn
Ultimately these things are hard to measure.
Whether this is what you want may actually depend on who you are.
Suppose you are the head of the school board: you can control the different types of resources that go to schools, but you do not have that much influence over precisely how they are used.
In this case we would imagine things like:
- Teacher salaries
- Teacher qualifications (clearly there is an interaction between the first two)
- Number of teachers per child
- Number of administrators
- Money for classroom resources (e.g. books)
- Money for school structures (e.g. gyms)
What data can we get?
Getting data on all of this is very difficult, but often we can get data on:
- Teacher/student ratio
- Per-pupil expenditure
- Fraction of teachers with a Master's degree
- Teacher experience
- Teacher salary
Hanushek paper in Journal of Economic Literature
Hanushek's paper is mostly about education production functions, but he discusses a number of other things as well.
I think it reads very well as a discussion of many of the most important issues in the economics of education literature, so I will go through all of his tables.
He starts with some raw descriptive information about how things have changed over time in the U.S.
Next he summarizes the results from a lot of studies in the following way
Hedges et al. Critique of Hanushek
The way he did it is somewhat controversial and informal.
There is a more formal way to combine studies using more sophisticated methods.
Hedges, Laine, and Greenwald reanalyze Hanushek's data in a paper entitled "Does Money Matter? A Meta-Analysis of Studies of the Effects of Differential School Inputs on Student Outcomes".
I am not an expert on meta-analysis and this is not a technique that is common in economics, so I do not want to get into too much detail.
However, I will give the basic flavor.
First, a fact. Suppose that $Y \sim F$, which means that $\Pr(Y \le y) = F(y)$.
Now suppose we construct a random variable in the following way: we randomly draw $Y$ from the distribution $F$ and then we construct $F(Y)$.
Assume that $F$ is continuous. What is the distribution of $F(Y)$?
To see this, let $y_{0.5}$ be the median. By definition
$$0.5 = \Pr(Y \le y_{0.5}) = \Pr(F(Y) \le F(y_{0.5})) = \Pr(F(Y) \le 0.5)$$
But this is not just true at the median, it is true anywhere, so that if $y_q$ is the $q$th quantile
$$q = \Pr(Y \le y_q) = \Pr(F(Y) \le F(y_q)) = \Pr(F(Y) \le q)$$
but this means that $F(Y)$ is uniform on $[0, 1]$.
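The claim that $F(Y)$ is uniform can be checked numerically. The sketch below assumes, purely for illustration, that $Y$ is exponential with rate 1, so that $F(y) = 1 - e^{-y}$; any continuous distribution would do. It compares $\Pr(F(Y) \le q)$ with $q$ at a few quantiles.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption for illustration: Y ~ Exponential(1), so F(y) = 1 - exp(-y).
y = rng.exponential(scale=1.0, size=100_000)
u = 1.0 - np.exp(-y)          # the constructed random variable F(Y)

quantiles = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
# Empirical Pr(F(Y) <= q); this should be close to q itself if F(Y) is uniform.
empirical = np.array([(u <= q).mean() for q in quantiles])

for q, e in zip(quantiles, empirical):
    print(f"q = {q:.2f}   Pr(F(Y) <= q) is about {e:.3f}")
```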
Notice that a p-value is just a special case of this.
If our null is $\beta_1 = 0$, then under the null
$$\frac{\hat\beta_1}{se(\hat\beta_1)} \sim N(0, 1)$$
so
$$\Phi\left(\frac{\hat\beta_1}{se(\hat\beta_1)}\right)$$
will be uniform.
Now suppose we have a bunch of different samples that all test whether $\beta_1 = 0$.
As long as the samples are independent, we have a bunch of independent p-values (under the null).
This is the basis of a test that combines information across studies.
Hedges and coauthors use this basic approach.
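One standard way to turn independent uniform p-values into a single test is Fisher's inverse chi-square method: under the null, $-2\sum_k \ln p_k$ is chi-square with $2K$ degrees of freedom. The sketch below is only an illustration of that idea. These notes do not spell out the exact procedure Hedges and coauthors used, so treating Fisher's method as their procedure is an assumption, and the estimates and standard errors are hypothetical numbers.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def one_sided_p_value(beta_hat, se):
    """p-value for H0: beta_1 = 0 against beta_1 > 0; uniform under the null."""
    return 1.0 - normal_cdf(beta_hat / se)


def chi2_sf_even_df(x, df):
    """P(chi-square with df degrees of freedom > x), closed form for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Combined p-value from independent p-values (Fisher's method)."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(stat, 2 * len(p_values))


# Hypothetical (estimate, standard error) pairs from independent studies.
studies = [(0.8, 0.6), (0.5, 0.5), (1.0, 0.7)]
p_values = [one_sided_p_value(b, se) for b, se in studies]
print("individual p-values:", [round(p, 3) for p in p_values])
print("combined p-value:   ", round(fisher_combined_p(p_values), 3))
```

Note that no single hypothetical study here needs to be significant on its own for the combined test to reject, which is part of why the two sides of the debate can read the same literature differently.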
Further we might want to get effect sizes. Again if we have a bunch of estimators of β 1 with standard errors, as long as they are independent we can combine them. I will not talk about the details of this.
It really isn't so obvious how to put this together.
Hedges et al. argue that there is real evidence that "money matters".
Hanushek responds that he is really looking at a different question:
- Their null hypothesis is that there is no study for which money matters
- His null is that the empirical literature does not reach a strong conclusion
All of these studies suffer from a big problem.
They all essentially regress outcomes on these inputs.
But inputs are not randomly assigned.
Card and Krueger used a fixed-effect type approach.
They propose the framework
$$y_{ijkc} = \delta_{jc} + \mu_{kc} + X_{ijkc}\beta_c + E_{ijkc}(\gamma_{jc} + \rho_{rc}) + \epsilon_{ijkc}$$
where
- $i$: individual
- $j$: state of birth
- $c$: cohort
- $k$: current state
- $r$: region
- $X$: observable stuff
- $E$: education
They interpret $\gamma_{jc}$ as a measure of school quality.
They then regress
$$\gamma_{jc} = a_j + Q_{jc} b$$
where $Q$ picks up measures of quality of education.
They estimate the model in two steps.
They do more than just this, but let's focus on the main results.
Two more recent papers deal with this in a better way:
- Krueger uses data from an experiment with random assignment
- Angrist and Lavy come up with a creative instrument
Random Assignment
Using notation similar to the Krueger paper we can write
$$Y_{is} = a S_s + b F_i + \alpha_s + \varepsilon_i$$
where
- $i$ represents student $i$
- $s$ represents school $s$
- $S_s$ is observed characteristics of school $s$
- $F_i$ is observed family background variables of student $i$
- $\alpha_s$ represents unobserved characteristics of school $s$
- $\varepsilon_i$ represents unobserved characteristics of student $i$
Thus the error term is $\alpha_s + \varepsilon_i$.
What do we need for OLS to be consistent? We need
$$0 = cov(S_s, \alpha_s + \varepsilon_i) = cov(S_s, \alpha_s) + cov(S_s, \varepsilon_i)$$
and
$$0 = cov(F_i, \alpha_s + \varepsilon_i) = cov(F_i, \alpha_s) + cov(F_i, \varepsilon_i)$$
I am worried about all of these things, but I am particularly worried about
$$cov(S_s, \varepsilon_i) \neq 0$$
One would think that more highly motivated or richer parents would tend to send their kids to better schools.
Thus, does finding that school resources are associated with positive outputs indicate that the school inputs are valuable, or that good kids go to good schools?
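The worry about $cov(S_s, \varepsilon_i) \neq 0$ can be illustrated with a small simulation. All numbers are hypothetical, and the sketch deliberately assumes that school resources have no causal effect at all ($a = 0$). When students with high $\varepsilon_i$ sort into better-resourced schools, OLS still finds a positive coefficient; when resources are unrelated to $\varepsilon_i$, it does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_a = 0.0    # assumption: school resources have NO causal effect

eps = rng.normal(size=n)                     # unobserved student characteristics
s_sorted = 0.7 * eps + rng.normal(size=n)    # advantaged kids sort into better schools
s_random = rng.normal(size=n)                # resources unrelated to the student


def slope(y, s):
    """Slope from a simple OLS regression of y on s (with a constant)."""
    return np.cov(s, y)[0, 1] / np.var(s, ddof=1)


y_sorted = true_a * s_sorted + eps
y_random = true_a * s_random + eps

print("OLS slope with sorting:          ", round(slope(y_sorted, s_sorted), 3))
print("OLS slope with random assignment:", round(slope(y_random, s_random), 3))
```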
Social Experiments
One solution to this problem is random assignment.
If we could randomly assign kids to schools, then by construction $S_s$ would be uncorrelated with $\varepsilon_i$.
Randomly assigning kids to schools is almost impossible, but randomizing kids to classes within schools is not.
A famous experiment in Tennessee did just that.
The Tennessee Student/Teacher Achievement Ratio (STAR) Experiment
- Began during the 1985-1986 school year in Tennessee
- Kindergarten classes were divided into three types:
  - Small classes: 13-17 students
  - Regular classes: 22-25 students
  - Regular classes with aide: 22-25 students and an aide
- Only schools big enough to allow at least one of each type were eligible
- Students were randomly assigned to classes
- Teachers were randomly assigned to classes
- Kids stay in the same type of class for four years
- Kids are tested at the end of each year
- A total of 11,600 students and 80 schools were used
Unfortunately, people are not like test tubes.
They do not necessarily do what the people running the experiment intended them to do, as in the Milwaukee case.
We see a number of problems:
- Random assignment after kindergarten was done again for Regular vs. Regular/Aide (although small classes were OK)
- Some kids switched from Small to Regular or vice versa (because of problems with kids or because parents complained)
- Kindergarten was not required, so many students began school in first grade
As in Milwaukee, students leave school (either move or go to private school).
This would not be that big a deal if it happened at random.
It is probably not random: parents might be angry that their kid was assigned to a regular classroom.
This is called nonrandom attrition.
It is not a small deal: about 1/2 of the kids present in kindergarten were not there in at least one subsequent year.
Let's look at some of the raw data.
OLS Estimation
How do we use this to estimate the effect?
It is not quite as clean as one might like because it is random assignment by class, not random assignment by school.
Krueger estimates the following model:
$$Y_{ics} = \beta_0 + \beta_1 \text{Small}_{cs} + \beta_2 \text{Reg/A}_{cs} + \beta_3 X_{ics} + \alpha_s + \varepsilon_{ics}$$
where now
- $i$ represents student $i$
- $c$ represents class $c$
- $s$ represents school $s$
- $\text{Small}_{cs}$ and $\text{Reg/A}_{cs}$ indicate whether class $c$ in school $s$ is a small class or a regular class with an aide
- $X_{ics}$ is other observed stuff
- $\alpha_s$ represents unobserved characteristics of school $s$
- $\varepsilon_{ics}$ represents unobserved characteristics of student $i$
A major idea here is that $\alpha_s$ allows for a school-specific shock.
Thus all of the variation that is used comes from within the school.
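The idea that only within-school variation is used can be sketched by subtracting school means from every variable before running OLS, which is numerically the same as including a dummy for each school. The data below are simulated under made-up assumptions (80 schools, one class of each type per school, 20 kids per class, a small-class effect of 5 points and no aide effect); none of these numbers come from the STAR data, and this is not Krueger's actual code.

```python
import numpy as np

rng = np.random.default_rng(2)
n_schools, kids_per_class = 80, 20

# One class of each type per school: 0 = regular, 1 = small, 2 = regular with aide.
school = np.repeat(np.arange(n_schools), 3 * kids_per_class)
class_type = np.tile(np.repeat(np.arange(3), kids_per_class), n_schools)
small = (class_type == 1).astype(float)
reg_aide = (class_type == 2).astype(float)

# Hypothetical outcomes: school effect alpha_s plus assumed treatment effects.
alpha = rng.normal(scale=2.0, size=n_schools)[school]
y = 5.0 * small + 0.0 * reg_aide + alpha + rng.normal(size=school.size)


def demean_by_group(v, groups):
    """Subtract the group (school) mean from each observation."""
    means = np.bincount(groups, weights=v) / np.bincount(groups)
    return v - means[groups]


Y = demean_by_group(y, school)
X = np.column_stack([demean_by_group(small, school),
                     demean_by_group(reg_aide, school)])
beta_within, *_ = np.linalg.lstsq(X, Y, rcond=None)

print("within-school estimate, small class:     ", round(beta_within[0], 3))
print("within-school estimate, regular with aide:", round(beta_within[1], 3))
```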
Class Size Effect
This still doesn't answer precisely what we are interested in:
- What does "small class" mean? We want the effect of adding one more student; the magnitude here is hard to interpret
- People switched classes and were able to take other things after kindergarten
How can we deal with that? Let's think about the model
$$Y_{ics} = \beta_0 + \beta_1 C_{cs} + \beta_2 X_{ics} + \alpha_s + \varepsilon_{ics}$$
where $C_{cs}$ is the actual class size. Can we just run OLS with the experimental sample?
No, $C_{cs}$ is not randomly assigned, for a number of different reasons.
This wouldn't really use the experiment.
However, we can do IV (somewhat like in the Milwaukee case):
- Use kindergarten assignment as an instrument
- It will be correlated with class size for sure
- It will be uncorrelated with everything else for sure
Thus, it solves both problems!
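A minimal sketch of this IV logic with one binary instrument follows. The data are simulated under hypothetical assumptions: initial assignment `z` is random, actual class size `c` responds to assignment but also to unobserved student characteristics (to mimic switching), and the true effect of one more student is set to -0.5. With a single instrument the IV estimate is just $cov(z, y) / cov(z, c)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
true_beta1 = -0.5    # hypothetical effect of one additional student

z = rng.integers(0, 2, size=n).astype(float)   # 1 = randomly assigned to a small class
eps = rng.normal(size=n)                        # unobserved student characteristics

# Actual class size: assignment lowers it by about 7 students, but students
# with high eps also end up in smaller classes (non-random switching).
c = 22.0 - 7.0 * z - 1.0 * eps + rng.normal(size=n)
y = 50.0 + true_beta1 * c + 3.0 * eps + rng.normal(size=n)


def cov(a, b):
    """Sample covariance of two 1-D arrays."""
    return np.cov(a, b)[0, 1]


beta_ols = cov(c, y) / cov(c, c)   # biased: c is correlated with eps
beta_iv = cov(z, y) / cov(z, c)    # uses only the randomly assigned variation

print("OLS estimate:", round(beta_ols, 3))
print("IV estimate: ", round(beta_iv, 3))
```

In this made-up design OLS overstates the benefit of small classes, while the IV estimate recovers the assumed effect because `z` is independent of `eps` by construction.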
Maimonides Rule
Is there anything we can do if we don't totally trust the experiment?
Angrist and Lavy found a very clever way to estimate the effects of class size.
Maimonides was a twelfth-century Rabbinic scholar.
He interpreted the Talmud in the following way:
"Twenty-five children may be put in charge of one teacher. If the number in the class exceeds twenty-five but is not more than forty, he should have an assistant to help with the instruction. If there are more than forty, two teachers must be appointed."
This rule has had a major impact on education in Israel.
They try to follow this rule so that no class has more than 40 kids.
But this means that:
- If you have 80 kids in a grade, you have two classes with 40 each
- If you have 81 kids in a grade, you have three classes with 27 each
That sounds like something we can use as an instrument.
We can write the rule as
$$f_{sc} = \frac{e_s}{\text{int}\left(\frac{e_s - 1}{40}\right) + 1}$$
where $e_s$ is enrollment in the grade at school $s$ and $\text{int}(\cdot)$ takes the integer part.
Ideally we could condition on grades with either 80 or 81 kids.
More generally there are two ways to do this:
- Condition on people close to the cutoff and use $f_{sc}$ as an instrument
- Control for enrollment in a smooth way and use $f_{sc}$ as an instrument
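The rule is easy to compute directly. The function below is a small sketch of the formula for $f_{sc}$; the function name and the `cap` parameter are illustrative choices, not notation from Angrist and Lavy. It reproduces the 80 versus 81 example above, where one extra student drops predicted class size from 40 to 27.

```python
def maimonides_class_size(enrollment, cap=40):
    """Predicted average class size f_sc for a grade with the given enrollment.

    The number of classes is int((enrollment - 1) / cap) + 1, so a new class
    is opened as soon as enrollment exceeds a multiple of the cap.
    """
    if enrollment < 1:
        raise ValueError("enrollment must be at least 1")
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes


# Predicted class size jumps down sharply just past each multiple of 40.
for e in (40, 41, 80, 81, 120, 121):
    print(e, maimonides_class_size(e))
```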
To estimate the model we want to use an econometric framework similar to Krueger:
$$Y_{ics} = \beta_0 + \beta_1 C_{cs} + \beta_2 X_{ics} + \alpha_s + \varepsilon_{ics}$$
Now we can't just put in a school effect because we would lose too much variation, so think of $\alpha_s$ as part of the error term.
Their data is a bit different because it is by class rather than by individual, but for this that isn't a big deal.
Angrist and Lavy first estimate this model by OLS to show what we would get.
Next, they want to worry about the fact that $C_{cs}$ is correlated with $\alpha_s + \varepsilon_{ics}$.
They run instrumental variables using $f_{sc}$ as an instrument.