- Original article
- Open Access

# Publicizing the results of standardized external tests: does it have an effect on school outcomes?

*IZA Journal of European Labor Studies*
**volume 4**, Article number: 7 (2015)

## Abstract

We study the effect of standardized external tests on students’ academic outcomes. We exploit the fact that only one of the 17 Spanish regions started administering standardized tests and publishing their results in 2005, and we apply a difference-in-difference methodology using outcomes of the PISA study from 2000 to 2009. We later confirm our results using synthetic control methods. Employing data from a single country allows us to minimize biases arising from differences in legal frameworks or in social or cultural environments. Our econometric analysis lends plausibility to the hypothesis that this type of test significantly improves student outcomes. A key novelty is that our exams do not have academic consequences for the students, so effects have to come directly from the impact on teachers and administrators.

**JEL codes:** I20, I21

## Introduction

External standardized tests allow the administration to better monitor the education process and outcomes of the schools. In most of the countries that have these tests, the results of the exam are public and can be used by parents to make decisions.^{1} This closer monitoring by parents and administrators provides an additional motivation for teachers and principals to improve the education results of their students. This potential for improvement has encouraged an increasing number of countries to use external examinations as a tool to increase accountability. The Program for International Student Assessment (PISA) report (OECD 2010) documents the fact that 22 out of the 34 OECD countries have introduced standards-based external examinations in a majority of their schools. Two more countries, Germany and the US, have this type of test only in some of their Länder and states. All in all, two-thirds of the 15-year-old OECD students attend schools in which there is an external and standardized test.

The existing empirical evidence is supportive of the hypothesis that countries with external exit-exam systems have a better performance in international student achievement tests. The first evidence for this was given by Bishop (1997) for students doing the 1991 IAEP math, science, and geography tests and Bishop (2006) with the PISA 2000 results. Overall, the existing cross-country evidence suggests that the effect of external exit exams on student achievement may well be half or more of a grade-level equivalent, or between 20 and 40 percent of a standard deviation of the respective international tests (OECD 2010 and 2012 and Hanushek and Woessmann 2011).

This evidence has been criticized on two grounds. First, these studies use cross-sectional data, which do not allow researchers to account for unobserved heterogeneity that could be correlated with both the introduction of an external test and test outcomes. With panel data or with the synthetic control method that we use, it becomes possible to deal with the fact that adoption of testing by a country may be endogenous, and unobserved heterogeneity could bias the results. Second, the introduction of external tests may lead to “teaching to the test.” However, some studies have found the same positive association between central exams and student achievement within countries where some regions have external exam systems and others do not have them.^{2} This evidence rules out the possibility that unobserved national-level factors correlated with the existence of tests drive the observed positive correlation between those tests and students’ outcomes. In addition, students in countries with national external exams have been found to achieve better results in other international tests such as PISA, PIRLS or TIMSS. To the extent that those tests are different in nature from national ones, this may rule out “teaching to the test” as a main factor driving the better outcomes of students in countries or regions with national external exams.

A different critique of earlier studies is that they are not very clear about the channels through which exit exams are effective. This is because, for the most part, these exams have academic consequences for the students, thereby providing reasons for improvement both to the professionals and to the students. The present study uses a special feature of the Spanish education system to tease out school and student incentives, while at the same time controlling for biases arising from unobserved national-level heterogeneity and arguably also “teaching to the test.”

The special feature to which we refer is that the main Spanish education law (Ley Orgánica de la Educación, LOE 2006) allows regions to conduct education system assessments *as long as the results are not used for grading students or ranking schools* (article 140). That means Spanish exams are not “Curriculum-Based External Exit Examination (CBEEE)” as defined by Bishop (1997) because such examinations should “offer signals of student accomplishments that have real consequences for the student and define achievement relative to an external standard, not relative to other students in the classroom or the school.” This means that the effects of such exams in Spain, if any, have to come directly only from changes in incentives for schools, although in the end those can, and probably will, have an impact on the students’ efforts.

The region of Madrid introduced a standard external test called *“prueba de Conocimientos y Destrezas Indispensables”* (also known in short as the *CDI test*), which means “Indispensable Knowledge and Skills exam,” in the academic year 2004/05. The grade achieved by the student in this exam does not have “real academic consequences” for most students, so it cannot be considered a CBEEE.^{3} So the effects of this initiative will necessarily only go directly through changes in teacher motivation. The region of Madrid is also the only one that publishes and makes available to the public the average results of the external test of each of the schools. Other regions have external standardized exams where all schools are tested, but Madrid is the only one publishing the results.

All the regions in Spain operate under the same legal framework regulating the principles, objectives, and organization of the different school levels (pre-primary, primary, compulsory secondary, post-compulsory secondary) as well as up to 65% (55% in historical regions) of the content and subjects studied. Hence, the other main observable difference in education between Spanish regions is the appearance in the period of study of this standardized external exam in Madrid, whose results are published.

This feature allows us to conduct a difference-in-difference (diff-in-diff) analysis comparing the PISA results of the treated region (Madrid) before and after the CDI test was introduced with the rest of Spanish regions before and after the treatment. This diff-in-diff approach allows us to control for the unobservable time-invariant factors affecting Madrid. By working with regions of the same country, we also exclude some unobservable effects that appear in cross-country studies with different legislations and cultures.

The fact that we are dealing with a single country also allows us to apply the new inferential methods of synthetic control for comparative case studies proposed by Abadie and Gardeazabal (2003) and Abadie et al. (2010). We use a combination of other Spanish regions to construct a *synthetic control region* whose education characteristics resemble those of Madrid before the introduction of the CDI test. The subsequent educational outcome evolution of this “counterfactual” Madrid without CDI is compared to the actual experience of Madrid. The idea behind the synthetic control approach is that a combination of units often provides a better comparison for the unit exposed to the intervention than any single unit alone. Transparency and safeguards against extrapolation are two attractive features of the synthetic control method relative to traditional regression methods.

Our results are also more protected than others from the critique that they are achieved by “teaching to the test.” This is because our measure of outcome, namely, the results in the PISA exam, has somewhat distinct objectives and measures different things than the CDI exam in whose effect we are interested. The Madrid CDI exam questions evaluate knowledge, and they are directly related to material seen in language and mathematics classes during the academic year. In contrast, the PISA exam questions (called stimulus) are more related to cognitive processes (access and retrieve; integrate and interpret; reflect and evaluate) and to how knowledge is used in particular contexts. That is, the PISA evaluation is more related to competencies, whereas the Madrid CDI is more related to knowledge.

The paper is organized as follows. Section 2 describes in some detail the institutional setup and the external and standard CDI test. Section 3 discusses the data. Section 4 discusses the econometric methodology, and it contains the main results of the paper. Section 5 shows the results of the synthetic control methods. Section 6 concludes.

## Institutional setup

The Madrid regional government has been conducting since the academic year 2004/05 a standardized external exam for all 6th grade students in the region, who are hence in the final year of primary school (around 11–12 years old). Three years later, the region introduced another standardized and external exam in the 9th grade (the third year of secondary school, which is the last common academic year for the students). These exams are compulsory for all primary and secondary schools (public or private). The exam measures what the authorities consider basic knowledge in mathematics (exercises and problems) and language (dictation, reading, general knowledge and questions related to a text).

Our aim is to test whether the introduction of these exams has improved the academic outcomes of the students in Madrid. We use as a measure of student achievement the scores of the exams conducted for the OECD Program for International Student Assessment (PISA). PISA analyses the key competencies in reading, mathematics and science of 15-year-old students in OECD member countries and partner countries/economies through its triennial surveys. The metric for the overall scale in one of the subjects is based on a mean for OECD countries set at 500, with a standard deviation of 100. PISA conducted its first tests in 2000, covering reading as a major assessment area and providing a summary profile of the skills of mathematics and science. In 2003, mathematics was the main focus, and in 2006 it was science. In 2009, PISA started another cycle, focusing on reading again and in 2012 focusing on mathematics. When an area is the main focus of the exam, two-thirds of the exam time is devoted to this area, allowing for its deeper analysis. Since both PISA 2000 and PISA 2009 focused on reading, and both PISA 2003 and PISA 2012 focused on mathematics, it is possible to obtain very detailed comparisons of how student performance in those areas changed over that period. Comparisons over time in the area of science are somewhat more limited.

In the PISA test, each participating student spends two hours carrying out pencil-and-paper tasks in reading, mathematics and science. The assessment includes tasks requiring students to construct their own answers as well as multiple-choice questions. In addition, students also answer a survey that takes about 30 minutes to complete and that includes questions about their personal background.

## Description of the data

The first CDI exam took place in the academic year 2004/05, so we consider this as the year in which the treatment (the introduction of a standardized exam) was first implemented. For this reason, we compare the results of students of the region of Madrid in (i) reading, using PISA 2000 and PISA 2009, and (ii) in mathematics, using PISA 2003 and PISA 2012.^{4}

Our first methodology for analysis will be a diff-in-diff regression approach. We construct the treatment and the control groups in the following way: the treatment group before the treatment (the introduction of the CDI exam) is the group of students from the region of Madrid who took the PISA exam in 2000 for reading or 2003 for mathematics, the treatment group after the treatment is the group of students who took the PISA exam in 2009 for reading or 2012 for mathematics, and the control group is formed by students from the other regions of Spain before (PISA 2000 or 2003) and after the treatment (PISA 2009 or 2012).
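The four-group comparison described above can be illustrated with a minimal sketch. The scores below are hypothetical numbers chosen for illustration, not data from the paper, and the function name `diff_in_diff` is an assumption. The sketch computes the raw estimate only; it ignores covariates, sampling weights and plausible values.

```python
from statistics import mean

# Hypothetical PISA reading scores (illustrative only, not data from the paper).
# Keys are (group, period): "madrid" is the treated region, "rest" the control.
scores = {
    ("madrid", "before"): [498.0, 512.0, 505.0],
    ("madrid", "after"): [510.0, 520.0, 515.0],
    ("rest", "before"): [490.0, 500.0, 495.0],
    ("rest", "after"): [488.0, 496.0, 492.0],
}


def diff_in_diff(groups):
    """Return (change in treated mean) minus (change in control mean)."""
    treated_change = mean(groups[("madrid", "after")]) - mean(groups[("madrid", "before")])
    control_change = mean(groups[("rest", "after")]) - mean(groups[("rest", "before")])
    return treated_change - control_change


estimate = diff_in_diff(scores)
print(f"Raw diff-in-diff estimate: {estimate:.2f} PISA points")
```

Subtracting the control group's change removes any shift common to all regions, so only the change specific to the treated region remains.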

The PISA questionnaire allows us to control for various student, family and school characteristics. The student and family characteristics are: gender, age, nationality (immigrant or Spanish), parents’ nationality, languages other than Spanish spoken at home, structure of the family (single parent family, nuclear family, mixed family), learning time in hours per week in reading and mathematics (hours per week in language or mathematics courses), and the index of economic, social and cultural status (ESCS index) calculated by the OECD.^{5} The school characteristics are the type of school (public, charter or private), the location of the school (village, small town, town, city or large city), student/teacher ratio, school size, whether the school uses assessments to compare to district/national performance, whether the school uses assessments to make judgments about teachers’ effectiveness, the proportion of girls in the school, the school average of the ESCS index, the percentage of immigrant students in the school, and school average learning time in reading and mathematics.

The tables below contain the descriptive statistics of these four groups for the most relevant characteristics of students and schools. Table 1 describes the treatment and the control groups in PISA 2000 and PISA 2009 in reading, and Table 2 describes the two groups in PISA 2003 and PISA 2012 in mathematics.

The two tables show a very similar evolution of the characteristics of students and schools when we compare PISA 2000 and PISA 2009 in reading and when we compare PISA 2003 and PISA 2012 in mathematics.

If we compare the treatment group and the control group before and after the change in reading, we can see patterns that are very similar across both groups: the proportion of girls and of students coming from single parent families decreases slightly, whereas age, the share of immigrants, learning time, and the Index of Economic, Social and Cultural Status (ESCS) increase. This is consistent with the fact that Spain has experienced a large inflow of immigrants in the last decade and has converged towards the EU and OECD GDP per capita, a process that has, since 2009, reversed.^{6} Nevertheless, the rise in the share of immigrants between 2000 and 2009 was higher in the region of Madrid (from 3% to 16%) than in the rest of the regions (from 2% to 9%). In addition, the share of students speaking foreign languages other than Spanish increased in Madrid over the period (from 1% to 5%), whereas it remained roughly constant in the control group (17% versus 16%).

If we look at the school characteristics, we observe a decrease in the number of private schools and an increase in the number of charter schools over the period 2000–2009. This could be due to the fact that some private schools have requested and obtained from the public administration their transformation into charter schools, thus lowering the fees to be paid by the students’ families and avoiding a loss of enrolment. Nevertheless, the official data from the Statistical Office of the Spanish Ministry of Education, Culture and Sports show that the rise in the number of students in charter schools has come from a reduction in the number of students in public schools. This is in contrast with the PISA sample, which shows that the rise in the number of students in charter schools comes from a reduction in private schools. That is, the PISA coverage of private schools decreased from 2000 to 2009, whereas the coverage of public schools increased. This could be due to the fact that the sample of schools in cities or large cities decreased in 2009, whereas the sample of those in towns and villages increased.

The student/teacher ratio and school size decreased in both groups. We also observe that the percentage of schools that declare that they carry out assessments used to compare the school to district/national performance or assessments used to make judgments about teacher effectiveness increased in both the control and the treatment group. In summary, the descriptive statistics show that the trends during the period 2000–2009 that we observe when we compare the treatment and control group are similar.

The control group and the treatment group also have similar patterns in the PISA 2003 and PISA 2012, the years we are using for the mathematics analysis. The only exceptions are the proportion of private and charter schools. In the case of private schools, the indicator of the region of Madrid increased, whereas the one of the control group slightly decreased. In the case of charter schools, their proportion decreased in both the control and the treatment group.

## Econometric methodology and results

In order to estimate the impact of the introduction of a standardized exam in the region of Madrid on students’ outcomes, we propose a diff-in-diff approach. We use as the outcome for student performance the PISA reported scores of students. These are calculated using imputation methods, denoting plausible values (OECD 2009). Thus, for a given year *t*, the score of student *i* in reading in school *j* is given by:

and the score of student *i* in mathematics in school *j* is given by:

where *x*_{ijt} are observable characteristics of students and their families described above; *x*_{jt} are observable characteristics of schools; *Madrid*_{j} is a dummy variable for the schools located in the region of Madrid (i.e., it takes the value 1 for the treated group); *PISA2009*_{t} and *PISA2012*_{t} are dummy variables for students who took the PISA reading exam in 2009 and the PISA mathematics exam in 2012, respectively (after the introduction of the standardized exam in the region of Madrid); *Madrid*_{j} × *PISA2009*_{t} and *Madrid*_{j} × *PISA2012*_{t} indicate whether school *j* is in the region of Madrid and participated in the PISA exam in 2009 and in the PISA exam in 2012, respectively (i.e., they take the value 1 for the treated group after the treatment); and *ε*_{ijt} is a random shock.

Our parameter of interest is δ, corresponding to the variable *Madrid*_{j} × *PISA2009*_{t} or *Madrid*_{j} × *PISA2012*_{t}, which coincides with the introduction of a standardized exam (the CDI exam) in the region of Madrid.
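Written out, the two specifications described above would take a form along the following lines. Only δ is named in the text. The other coefficient symbols (α, β, γ, λ, θ) and the outcome labels are assumptions introduced here for illustration, so this is a sketch of a standard diff-in-diff regression consistent with the variable definitions rather than a reproduction of the authors' display.

```latex
\begin{align}
\text{Reading}_{ijt} &= \alpha + \beta' x_{ijt} + \gamma' x_{jt}
  + \lambda\, Madrid_{j} + \theta\, PISA2009_{t}
  + \delta\, (Madrid_{j} \times PISA2009_{t}) + \varepsilon_{ijt} \\
\text{Math}_{ijt} &= \alpha + \beta' x_{ijt} + \gamma' x_{jt}
  + \lambda\, Madrid_{j} + \theta\, PISA2012_{t}
  + \delta\, (Madrid_{j} \times PISA2012_{t}) + \varepsilon_{ijt}
\end{align}
```

In this form, λ absorbs time-invariant differences between Madrid and the other regions, θ absorbs changes common to all regions between the two PISA waves, and δ captures the additional change specific to Madrid after the treatment.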

Tables 3 and 4 below show the results of the diff-in-diff estimation for reading using the samples of students from Spain in PISA 2000 and PISA 2009, and for mathematics, using the samples of students from Spain in PISA 2003 and PISA 2012. Since the PISA database provides five plausible values, which are allocated to each student, we use the methodology proposed by the OECD for the computation of regression coefficients and their respective standard errors. According to OECD (2009), statistical analyses should be performed independently on each of these five plausible values, and results should be aggregated to obtain the final estimates of the statistics and their respective standard errors.
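The aggregation step described above can be sketched as follows. The sketch assumes that the regression has already been run once per plausible value, and that each run produced a coefficient estimate and a sampling variance (in PISA, sampling variances are normally obtained from replicate weights, which is not shown here). The function name and the input numbers are hypothetical. The combination rule is the usual one for multiply imputed data: average the estimates, average the sampling variances, and add the between-imputation variance inflated by (1 + 1/M).

```python
from math import sqrt
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Combine per-plausible-value estimates into one estimate and standard error.

    estimates: coefficient estimated separately on each plausible value.
    sampling_variances: sampling variance of each of those estimates.
    """
    m = len(estimates)
    final_estimate = mean(estimates)
    # Average sampling variance across the plausible values.
    sampling_var = mean(sampling_variances)
    # Between-imputation variance (divisor m - 1).
    imputation_var = variance(estimates)
    total_var = sampling_var + (1 + 1 / m) * imputation_var
    return final_estimate, sqrt(total_var)


# Hypothetical treatment coefficients from five separate regressions.
deltas = [14.0, 15.0, 16.0, 15.0, 15.0]
sampling_vars = [36.0, 36.0, 36.0, 36.0, 36.0]

estimate, std_error = combine_plausible_values(deltas, sampling_vars)
print(f"delta = {estimate:.2f}, standard error = {std_error:.2f}")
```

Averaging the five plausible values per student first and running a single regression would understate the standard error, which is why the analysis is repeated on each plausible value.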

The first column of the tables shows the estimation results without any control variables. This would be the raw average effect of our treatment. The second column includes individual characteristics of the students, and the third and the fourth columns gradually add school characteristics.

When we estimate the diff-in-diff without any covariates, the coefficient for the treatment is not statistically significant for either reading or mathematics.

However, results of the diff-in-diff estimation for reading in Table 3, columns (2)-(4) show a positive and statistically significant effect of our treatment on the PISA scores. In the second column, when we control for individual characteristics of students, the coefficient of the treatment variable is positive and significant^{7}. The inclusion of school characteristics in columns (3) and (4) does not change this result. We find a relative improvement in PISA scores in reading in the region of Madrid between 2000 and 2009 of a magnitude of 14 to 17 PISA points that cannot be explained by observable variables.^{8} In 2009, Spain was significantly below the OECD average in reading by 12 points. If the results are totally explained by the introduction and publication of external exams, this could imply that generalizing those exams would raise the level of Spain in reading above the OECD average.

In Table 4, we run the same estimations but for mathematics using the scores in PISA 2003 and PISA 2012. Here, we find a positive impact of our treatment on students’ performance; however, it is statistically significant only in the last specification.

The data seem to indicate that something differential has happened in Madrid between 2000 and 2009 with respect to other Spanish regions. A natural hypothesis in this context is that the introduction and publication of the results of the standardized exam played a major role in this change. It is very hard to provide definitive proof with these data, but we can discard some alternative explanations.

Public spending on education per pupil affects students’ outcomes to some extent (OECD 2010). The Spanish Ministry of Education provides data on public spending on education per pupil by region starting from 2004. During the period 2004–2009, Madrid increased public education spending per pupil by 21%, less than the Spanish average of 33%. More importantly, Madrid is the region where education spending per pupil increased the least among all 17 Spanish regions. So, education expenditure cannot explain the better performance of Madrid’s PISA scores.

Spain received a large number of immigrants between 2000 and 2009 (according to data from the Spanish National Institute of Statistics (INE), immigrants went from 2% to 12% of the population over the period), and Madrid was a major destination (it has about 18% of the immigrants and about 13% of the population). But our data can identify whether the student is an immigrant, and the number of immigrants varies enough between schools that their effect is probably captured at the school level. This was also a period of rapid economic growth, which was not identical between regions, but the ESCS index has enough information about this variable at the individual level to properly control for the effect of economic conditions. Some other factors affect schools more directly. Madrid has a larger number of charter schools than other regions. Madrid has also increased the share of charter schools, but this trend has been similar to that of the rest of the regions, if anything a little smaller. In any case, since the identity of the schools is observable, its effect can be controlled.

The only other important institutional reform in Madrid schools in this period, beyond the introduction and publication of external exams, is the introduction of *bilingual schools* in the region, where English is a medium of instruction for at least one third of the time.^{9} Although this is clearly an important reform, it has only been implemented gradually starting from first grade, and the oldest students exposed to the program are now 13 years old. In addition, Anghel et al. (2012) have not found significant effects of the program in either language or mathematics, and possibly a negative effect on natural and social science (the subjects taught in English).

## Synthetic control method

In this section, we use the methodology proposed by Abadie and Gardeazabal (2003) and Abadie et al. (2010), which applies synthetic control methods to comparative case studies. Their methodology is motivated by the fact that in comparative case studies, the researcher is usually forced to find similarities between treated and non-treated units using observable characteristics, something that is often difficult in practice. To solve this problem, they propose constructing a combination of units for comparison purposes, since the combination will typically resemble the treated unit much better than any single unit alone.

In our case, we have to construct a combination of Spanish regions that resembles the region of Madrid in terms of various characteristics before the treatment, and we observe the evolution of this combination in the absence of treatment. This combination is called a synthetic control group. It is constructed by searching for a weighted combination of the untreated Spanish regions, in terms of various predictor variables, which are averaged over the entire pre-intervention period.

Abadie et al. (2010) argue that matching on pre-intervention outcomes helps to control for unobserved factors affecting the outcome of interest as well as for the heterogeneity of the effect of the observed and unobserved factors on the outcome of interest. According to Abadie et al. (2010), “once it has been established that the unit representing the case of interest and the synthetic control unit have similar behavior over extended periods of time prior to the intervention, a discrepancy in the outcome variable following the intervention is interpreted as produced by the intervention itself.”

In order to construct the synthetic control group (the synthetic Madrid), we have to aggregate the data at the school level and then at the regional level. The year the CDI standardized exam was launched in the region of Madrid was 2004/05; therefore, we have two years in PISA of pre-treatment data (PISA 2000 and 2003). PISA 2006 and PISA 2009 will be our post-treatment period. The synthetic Madrid is constructed as a weighted average of the pool of untreated regions. Our donor pool includes 15 regions^{10}. The weights are chosen so that the resulting synthetic Madrid resembles the real Madrid as closely as possible in terms of the values of a set of predictors of students’ performance before the introduction of the CDI exam, that is, before the treatment.

We include in the list of predictor variables for calculating the weights the following variables: student/teacher ratio, school size, ESCS school index, proportion of immigrants in the school and proportion of repeaters in the school. All variables are averaged at the regional level and over the pre-intervention period (2000 and 2003).

Using these predictor variables we construct the synthetic Madrid as the convex combination of regions which most closely resembles the region of Madrid in the pre-treatment period. This matching contributes to accounting for unobserved heterogeneity and the potential endogeneity of treatment. Table 5 shows the characteristics of the real Madrid region, of the synthetic Madrid region and of the donor pool (the average of the 16 regions which form the donor pool) in terms of the control variables. The figures show that the constructed synthetic Madrid is much more similar to the real one, in both reading and mathematics, than the simple average of the regions that form the donor pool. In reading, the student/teacher ratio in the real Madrid is 16.18, and in the synthetic Madrid it is 15 (the average of the control group is 14.03). Average school size is 933.74 in the real Madrid and 852.74 in the synthetic Madrid (the average of the donor pool is 720.08). The school average of the ESCS index is −0.16 in the real Madrid and −0.19 in the synthetic Madrid (the average of the control group is −0.31). In terms of the percentage of repeaters, we find that the synthetic region and the control group are both quite similar to the real Madrid. Finally, there is a substantial difference in the percentage of immigrant students between the real Madrid and the synthetic Madrid. Furthermore, we find affinities between the synthetic and the real Madrid in students’ PISA outcomes as well.
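The idea of choosing convex weights so that the weighted donors match the treated region's pre-treatment predictors can be sketched with a small grid search. This is a simplified illustration, not the estimation procedure of Abadie et al. (2010): it uses only three hypothetical donor regions, three predictors, and fixed scaling factors that are assumed here in place of an estimated predictor-weighting matrix. The Madrid values are the reading figures quoted in the text; all donor values and all names in the code are invented for illustration.

```python
# Pre-treatment predictors: student/teacher ratio, school size, ESCS index.
madrid = [16.18, 933.74, -0.16]  # reading figures quoted in the text

# Hypothetical donor regions (invented values, for illustration only).
donors = {
    "region_a": [15.0, 900.0, -0.10],
    "region_b": [13.0, 600.0, -0.40],
    "region_c": [17.5, 1000.0, -0.25],
}

# Assumed scaling so that no single predictor dominates the distance.
scales = [1.0, 100.0, 0.1]


def distance(weights, treated, donor_rows, scales):
    """Scaled squared distance between the treated unit and the weighted donors."""
    total = 0.0
    for k, target in enumerate(treated):
        synthetic = sum(w * row[k] for w, row in zip(weights, donor_rows))
        total += ((target - synthetic) / scales[k]) ** 2
    return total


def synthetic_weights(treated, donor_rows, scales, steps=100):
    """Search non-negative weights summing to one over exactly three donors."""
    best_weights, best_distance = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            weights = (i / steps, j / steps, (steps - i - j) / steps)
            d = distance(weights, treated, donor_rows, scales)
            if d < best_distance:
                best_weights, best_distance = weights, d
    return best_weights, best_distance


donor_rows = list(donors.values())
weights, dist = synthetic_weights(madrid, donor_rows, scales)
for name, w in zip(donors, weights):
    print(f"{name}: weight {w:.2f}")
```

Because the weights are restricted to be non-negative and to sum to one, the synthetic unit always lies inside the range spanned by the donors, which is the safeguard against extrapolation mentioned earlier. With a larger donor pool, a constrained optimizer would replace the grid search.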

Table 6 displays the weights of the 15 regions from the donor pool in the synthetic Madrid. It shows that the students’ performance in the region of Madrid is best approximated by a combination of Aragón, Asturias and the Canary Islands for reading and Aragón and the Canary Islands for mathematics. The rest of the regions in the donor pool are assigned zero weights.

Figures 1 and 2 show the evolution of the real Madrid and the synthetic Madrid in 2000 and 2003 (the pre-intervention years) and in 2006 and 2009 (the post-intervention years), separately for reading and for mathematics.

For reading, the graph shows that the synthetic Madrid approximates very well the evolution of the real Madrid in the pre-treatment period. After the treatment, which we take to be the introduction of the CDI standardized exam in the region of Madrid, and until 2006, PISA scores decrease in both the real Madrid and the synthetic Madrid. After 2006, even if both the real Madrid and the synthetic Madrid experience an increasing trend, the synthetic Madrid does worse than the real Madrid: in 2009, the difference in performance is 17.24 PISA points in favor of the real Madrid. This difference could be attributed to the introduction of a standardized exam in the region of Madrid, at least with the information that we can observe. This confirms, even quantitatively, the results we obtained previously with the diff-in-diff methodology, where we found that, controlling for school characteristics, the region of Madrid improved its performance relative to other regions of Spain in the period between 2000 and 2009 by between 14 and 17 PISA points. The flagship education publication of the OECD, Education at a Glance, arrives at a similar conclusion in the latest 2012 edition, stating, “students in school systems that use standards-based external examinations score 16 points higher, on average across OECD countries, than students in school systems that do not use these examinations” (Education at a Glance, 2012, page 527). Our estimate is slightly lower than the range found in the literature by Hanushek and Woessmann (2011) of 20% to 40% of the standard deviation (20 to 40 points in PISA).

For mathematics, however, the synthetic control group methodology does not work so well. The synthetic Madrid does not approximate very well the evolution of the real Madrid in 2000 and 2003, the pre-treatment period. In the post-treatment period, the synthetic Madrid performs slightly better than the real Madrid. Nevertheless, the diff-in-diff estimation showed no strong statistical impact in mathematics.

As a robustness check, we introduce other variables among the predictor variables, which have been specified above. In particular, we include the average percentage of girls in the PISA exam and the average proportion of public, charter and private schools. Our results do not change significantly.

We are aware of the limitations of our data in performing the estimation by using synthetic control methods. One of the main limitations is that, since the PISA study started in 2000 and is carried out every three years, we only have two years of pre-intervention data (2000 and 2003), which complicates the calculation of the region weights for the synthetic control group. The result in mathematics, where the synthetic Madrid is not so similar to the real Madrid in the year before the treatment, can be partly explained by this fact.

## Conclusions

This paper attempts to identify whether the implementation and publication of the results of external and standardized tests could have any impact on the performance of students. We use the fact that in the region of Madrid, a standardized exam was first given (and its results published) in 2004/05 to all 6^{th} grade primary students, while in the other regions of Spain, no such exam existed. Using a diff-in-diff strategy, we find a positive effect in reading of the order of 14 to 17 PISA points. The synthetic control method yields an effect that is very close even in quantitative terms. Our results are in line with previous research in the area, but our study provides one important innovation: since the external exams in Madrid have no consequences for the students, the effect has to come from the impact on teachers and school principals.

We have identified a possible effect in language, but not in mathematics. This is slightly surprising since many educational programs have observed effects that are larger in mathematics than in language (see, e.g., Abdulkadiroglu et al. 2011). A possible explanation may come from the different emphasis of the curricula of primary school education in Spain with respect to other countries, but this question deserves a more thorough investigation, which we defer to further research.

## Endnotes

^{1}Even in countries where school zones comprise a single school, concerned parents can decide where to live using school quality as an input to their choice.

^{2}See, e.g., Bishop (1997) for Canadian provinces, Jürges et al. (2005) and Wößmann (2010) for Germany, and Bishop et al. (2001) for US states.

^{3}A student with good grades in compulsory secondary schooling and a good mark in the CDI test obtains a certification with Merit or with Distinction. As it is just a certificate, it has no implications for admissions to schools beyond the compulsory schooling or for grants, nor is there evidence that employers look at those distinctions. Students with truly extraordinary grades (only 25 a year in a region with over 50,000 students in the last year of compulsory secondary schooling) can obtain an Extraordinary Award yielding a cash prize of 1,000 Euros and a trip to a “cultural destination.”

^{4}We will not use the PISA scores in science, since science was first the main focus of the study in 2006, which is after our treatment was applied.

^{5}The PISA index of economic, social and cultural status (ESCS) was derived from the following three indices: highest occupational status of parents, highest educational level of parents in years of education according to ISCED and home possessions (OECD 2010).

^{6}Spain went from less than 1% of immigrants in the population to almost 10% during this period. The Spanish GDP per capita in PPS terms increased from 97% of that of the EU-27 in 2000 to 103% in 2009 (Source: Eurostat).

^{7}In an additional specification, we dropped the ESCS index, which is an aggregated index of the socioeconomic background of the students, and we controlled separately for the labor market situation and the level of education of the mother and the father. Our results did not change significantly. These results are available upon request.

^{8}In additional estimations not reported here, we estimated the same specifications for three schools that performed PISA in reading in both 2000 and 2009. We performed the diff-in-diff estimation for each of these three schools separately, and then we considered as the treatment group the group of these three schools. For two of the schools, we found a positive and significant effect of the treatment on PISA scores. Results of these estimations are available upon request.

^{9}Students not only study English as a foreign language, but also some subjects (at least science, history and geography) are taught in English. Spanish and mathematics are taught only in Spanish.

^{10}There are 17 regions (including the Madrid region) and two autonomous cities (Ceuta and Melilla) in Spain. We had to drop Baleares and Ceuta and Melilla because of missing data, so this leaves us with 15 regions.

## References

Abadie A, Gardeazabal J (2003) The economic costs of conflict: a case study of the Basque Country. Am Econ Rev 93(1):113–32

Abadie A, Diamond A, Hainmueller J (2010) Synthetic control methods for comparative case studies: estimating the effect of California’s tobacco control program. J Am Stat Assoc 105(490):493–505

Abdulkadiroglu A, Angrist J, Dynarski SM, Kane TJ, Pathak PA (2011) Accountability and flexibility in public schools: evidence from Boston’s charters and pilots. Q J Econ 126:699–748

Anghel B, Cabrales A, Carro JM (2012) Evaluating a bilingual education program in Spain: the impact beyond foreign language learning. CEPR Working Paper No. 8995

Bishop JH (1997) The effect of national standards and curriculum-based examinations on achievement. Am Econ Rev 87(2):260–4

Bishop JH (2006) Drinking from the fountain of knowledge: student incentive to study and learn. In: Hanushek EA, Welch F (eds) *Handbook of the economics of education*. North-Holland, Amsterdam

Bishop JH, Mane F, Bishop M, Moriarty J (2001) The role of end-of-course exams and minimum competency exams in standards-based reforms. Brookings Papers on Education Policy 4:267–345

Hanushek E, Woessmann L (2011) The economics of international differences in educational achievement. In: Hanushek E, Machin S, Woessmann L (eds) *Handbook of the economics of education*, *vol. 3*. North-Holland, Amsterdam, pp 89–200

Jürges H, Schneider K, Büchel F (2005) The effect of central exit examinations on student achievement: quasi-experimental evidence from TIMSS Germany. J Eur Econ Assoc 3(5):1134–55

Ley Orgánica de Educación (2006), published in BOE no. 106

OECD (2009) PISA Data Analysis Manual: SPSS Second Edition. Available online at: http://browse.oecdbookshop.org/oecd/pdfs/free/9809031e.pdf

OECD (2010) PISA 2009 Results: What Makes a School Successful? Volume IV. Available online at: http://www.oecd.org/pisa/pisaproducts/48852721.pdf

OECD (2012) PISA 2012 Results: What Makes Schools Successful? Resources, Policies and Practices (Volume IV). Available online at: http://www.oecd.org/pisa/keyfindings/pisa-2012-results-volume-IV.pdf

Wößmann L (2010) Institutional determinants of school efficiency and equity: German States as a microcosm for OECD Countries. Jahrbücher für Nationalökonomie und Statistik 230(2):234–70

## Acknowledgements

Brindusa Anghel gratefully acknowledges the support from the Spanish Ministry of Science and Technology from grant ECO2012-31985. The authors would like to thank the anonymous referee.

Responsible editor: Sara de la Rica.

## Additional information

### Competing interests

The IZA Journal of European Labor Studies is committed to the IZA Guiding Principles of Research Integrity. The authors declare that they have observed these principles.

### Authors’ contributions

All four authors participated in the design of the study. BA and IS have made substantial contributions to acquisition of data and performed econometric and statistical analysis. JS also performed econometric analysis and robustness checks. AC has been involved in drafting the manuscript and revising it critically for important intellectual content. All authors participated in the coordination of the study and helped to draft the final manuscript. All authors read and approved the final manuscript.

## Rights and permissions

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## About this article

### Keywords

- External and standardized tests
- PISA
- Difference-in-difference
- Synthetic control methods