Title: Delphi study on standardized systems to monitor student learning outcomes in Flanders: mechanisms for building trust and/or control?
Source document: Studia paedagogica, 2017, vol. 22, no. 2, pp. [9]-31
Extent: [9]-31
ISSN: 1803-7437 (print), 2336-4521 (online)
Persistent link (DOI): https://doi.org/10.5817/SP2017-2-2
Persistent link (handle): https://hdl.handle.net/11222.digilib/136518
Type: Article
Language: English
License: Unspecified
Abstract
Several countries have implemented monitoring systems in which students take standardized tests at regular intervals. These tests may serve either a development-oriented purpose that supports public trust in schools or a more accountability-oriented purpose that increases control. Currently, the Flemish education system has no standardized testing, and the idea of implementing a monitoring system is highly contentious. By means of a Delphi study with policy makers, education specialists, school governors, principals, teachers, and a student representative (n = 24), we identified the characteristics of a monitoring system that would be accepted by different stakeholders. Based on these characteristics, we proposed eight scenarios for future policy development, and each respondent then assessed the desirability of these scenarios. The results show that in order to gain broad social support, a focus on strengthening trust is preferred over a focus on control, for example by not making test results publicly available. In addition, other key results for the development and implementation of a system to monitor student learning outcomes are discussed.
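The abstract describes the Delphi procedure only briefly: panellists rated the desirability of eight policy scenarios, and agreement across the panel determined which characteristics carried broad support. As a purely illustrative sketch, not taken from the article, the snippet below shows one common way such Delphi ratings can be summarised, using hypothetical 1-5 desirability scores and reading a small interquartile range as a sign of consensus; all data, scales, and thresholds here are assumptions.

# Illustrative sketch only (hypothetical data, not from the article):
# summarising Delphi-panel desirability ratings for one of the eight scenarios.
import statistics

def summarise_scenario(ratings):
    """Return the median desirability and interquartile range (IQR) for one scenario.
    `ratings` holds one 1-5 Likert score per panellist; a small IQR is one common
    (though not the only) way to read consensus in Delphi studies."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return statistics.median(ratings), q3 - q1

# Hypothetical ratings from 24 panellists for a single scenario.
example_ratings = [4, 5, 4, 3, 5, 4, 4, 2, 5, 4, 3, 4, 5, 4, 4, 3, 5, 4, 4, 5, 3, 4, 4, 5]
median, iqr = summarise_scenario(example_ratings)
print(f"median desirability = {median}, IQR = {iqr}")

With these hypothetical ratings, a median of 4 and an IQR of 1 would be read as a scenario judged desirable with reasonable agreement across the panel.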
References
[1] Adler, M., & Ziglio, E. (1996). Gazing into the oracle: The Delphi method and its application to social policy and public health. London: Jessica Kingsley Publishers.
[2] Allal, L. (2013). Teachers' professional judgement in assessment: A cognitive act and a socially situated practice. Assessment in Education: Principles, Policy & Practice, 20(1), 20–34.
[3] Amrein-Beardsley, A., Berliner, D. C., & Rideau, S. (2010). Cheating in the first, second, and third degree: Educators' responses to high-stakes testing. Education Policy Analysis Archives, 18(14), 1–36. | DOI 10.14507/epaa.v18n14.2010
[4] Au, W. (2007). High-stakes testing and curricular control: A qualitative metasynthesis. Educational Researcher, 36(5), 258–267. | DOI 10.3102/0013189X07306523
[5] Baird, J., Ahmed, A., Hopfenbeck, T., Brown, C., & Elliott, V. (2013). Research evidence relating to proposals for reform of the GCSE. Oxford, UK: Oxford University Centre for Educational Assessment.
[6] Beaver, J. K., & Weinbaum, E. H. (2015). State test data and school improvement efforts. Educational Policy, 29(3), 478–503. | DOI 10.1177/0895904813510774
[7] Bishop, J. (1998). The effect of curriculum-based external exit exams on student achievement. Journal of Economic Education, 29(2), 172–182. | DOI 10.1080/00220489809597951
[8] Black, P., Harrison, C., Hodgen, J., Marshall, B., & Serret, N. (2011). Can teachers' summative assessments produce dependable results and also enhance classroom learning? Assessment in Education: Principles, Policy & Practice, 18(4), 451–469. | DOI 10.1080/0969594X.2011.557020
[9] Brennan, R., Kim, J., Wenz-Gross, M., & Siperstein, G. (2001). The relative equitability of high-stakes testing versus teacher-assigned grades: An analysis of the Massachusetts Comprehensive Assessment System (MCAS). Harvard Educational Review, 71(2), 173–217. | DOI 10.17763/haer.71.2.v51n6503372t4578
[10] Burgess, S., Propper, C., & Wilson, D. (2002). Does performance monitoring work? A review of evidence from the UK public sector, excluding health care (CMPO Working Paper Series).
[11] Carnoy, M., & Loeb, S. (2002). Does external accountability affect student outcomes? A cross-state analysis. Educational Evaluation and Policy Analysis, 24(4), 305–331. | DOI 10.3102/01623737024004305
[12] Chong, H., Adnan, H., & Zin, R. M. (2012). A feasible means of methodological advance from Delphi methods: A case study. International Journal of Academic Research, 4(2), 247–253.
[13] Cizek, G. J. (2005). High-stakes testing: Contexts, characteristics, critiques, and consequences. In R. P. Phelps (Ed.), Defending standardized testing (pp. 23–54). London: Lawrence Erlbaum Associates Publishers.
[14] Cobb, F., & Russell, N. M. (2014). Meritocracy or complexity: Problematizing racial disparities in mathematics assessment within the context of curricular structures, practices, and discourse. Journal of Education Policy, 30(5), 631–649. | DOI 10.1080/02680939.2014.983551
[15] Collins, S., Reiss, M., & Stobart, G. (2010). What happens when high-stakes testing stops? Teachers' perceptions of the impact of compulsory national testing in science of 11-year olds in England and its abolition in Wales. Assessment in Education: Principles, Policy & Practice, 17(3), 273–286. | DOI 10.1080/0969594X.2010.496205
[16] Dalkey, N. C., & Helmer, O. (1963). An experimental application of the Delphi method to the use of experts. Management Science, 9(3), 458–467. | DOI 10.1287/mnsc.9.3.458
[17] Day, J., & Bobeva, M. (2005). A generic toolkit for the successful management of Delphi studies. Electronic Journal of Business Research Methods, 3(2), 103–116.
[18] De Lange, M., & Dronkers, J. (2007). Hoe gelijkwaardig blijft het eindexamen tussen scholen? Discrepanties tussen de cijfers voor het schoolonderzoek en het centraal examen in het voortgezet onderwijs tussen 1998 en 2005. Nijmegen, the Netherlands.
[19] Elstad, E. (2009). Schools which are named, shamed and blamed by the media: School accountability in Norway. Educational Assessment Evaluation and Accountability, 21(2), 173–189. | DOI 10.1007/s11092-009-9076-0
[20] Gipps, C., & Stobart, G. (2009). Fairness in assessment. In C. Wyatt-Smith & J. J. Cumming (Eds.), Educational assessment in the 21st century: Connecting theory and practice (pp. 105–118). London: Springer.
[21] Goodman, D., & Hambleton, R. K. (2005). Some misconceptions about large-scale educational assessments. In R. P. Phelps (Ed.), Defending standardized testing (pp. 91–110). Mahwah, NJ/London: Lawrence Erlbaum Associates Publishers.
[22] Haertel, E. H. (1999). Validity arguments for high-stakes testing: In search of the evidence. Educational Measurement: Issues and Practice, 18(4), 5–9. | DOI 10.1111/j.1745-3992.1999.tb00276.x
[23] Haney, W. (2000). The myth of the Texas miracle in education. Education Policy Analysis Archives, 8(41), 1–20. | DOI 10.14507/epaa.v8n41.2000
[24] Horn, C. (2005). Standardised assessments and the flow of students into the college admission pool. Educational Policy, 19(2), 331–348. | DOI 10.1177/0895904804274057
[25] Hoxby, C. M. (2003). School choice and school competition: Evidence from the United States. Swedish Economic Policy Review, 10(1), 11–67.
[26] Hsu, C., & Sandford, B. A. (2007). The Delphi technique: Making sense of consensus. Practical Assessment, Research & Evaluation, 12(10), 1–8.
[27] Janssens, F. J. G., Rekers-Mombarg, L., & Lacor, E. (2014). Leerwinst en toegevoegde waarde in het primair onderwijs. Den Haag: Ministerie van Onderwijs, Cultuur en Wetenschap; Inspectie van het Onderwijs; Rijksuniversiteit Groningen; CED Groep; Universiteit Twente; CITO.
[28] Jürges, H., Büchel, F., & Schneider, K. (2005). The effect of central exit examinations on student achievement: Quasi-experimental evidence from TIMSS Germany. Journal of the European Economic Association, 3(5), 1134–1155. | DOI 10.1162/1542476054729400
[29] Jürges, H., Richter, W. F., & Schneider, K. (2005). Teacher quality and incentives: Theoretical and empirical effects of standards on teacher quality. FinanzArchiv / Public Finance Analysis, 61(3), 298–326. | DOI 10.1628/001522105774978985
[30] Jürges, H., & Schneider, K. (2010). Central exit examinations increase performance... but take the fun out of mathematics. Journal of Population Economics, 23(2), 497–517. | DOI 10.1007/s00148-008-0234-3
[31] Jürges, H., Schneider, K., Senkbeil, M., & Carstensen, C. H. (2009). Assessment drives learning: The effect of central exit exams on curricular knowledge and mathematical literacy. Economics of Education Review, 31(1), 56–65. | DOI 10.1016/j.econedurev.2011.08.007
[32] Karsten, S., Visscher, A., & De Jong, T. (2001). Another side to the coin: The unintended effects of the publication of school performance data in England and France. Comparative Education, 37(2), 231–242. | DOI 10.1080/03050060120043439
[33] Keeves, J. P., Hungi, N., & Afrassa, T. (2005). Measuring value added effects across schools: Should schools be compared in performance? Studies in Educational Evaluation, 31(2-3), 247–266.
[34] Klein, E. D., & Van Ackeren, I. (2012). Challenges and problems for research in the field of statewide exams. A stock taking of differing procedures and standardization levels. Studies in Educational Evaluation, 37(4), 180–188. | DOI 10.1016/j.stueduc.2012.01.002
[35] Klein, S. P., Hamilton, L. S., McCaffrey, D. F., & Stecher, B. M. (2000). What do test scores in Texas tell us? Education Policy Analysis Archives, 8(49), 1–22.
[36] Klenowski, V. (2014). Towards fairer assessment. The Australian Educational Researcher, 41(4), 445–470. | DOI 10.1007/s13384-013-0132-x
[37] Linstone, H. A., & Turoff, M. (1975). The Delphi method. Techniques and applications. London, Amsterdam, Ontario, Sydney, Tokyo: Addison-Wesley Publishing Company.
[38] Loeb, S. (2013). How can value-added measures be used for teacher improvement? In Carnegie Knowledge Network (Ed.), What we know series: Value-added methods and applications. Stanford, CA: Carnegie Knowledge Network.
[39] Marlow, R., Norwich, B., Ukoumunne, O. C., Hansford, L., Sharkey, S., & Ford, T. (2014). A comparison of teacher assessment (APP) with standardised tests in primary literacy and numeracy. Assessment in Education: Principles, Policy & Practice, 21(4), 412–426. | DOI 10.1080/0969594X.2014.936358
[40] Mietzner, D., & Reger, G. (2005). Advantages and disadvantages of scenario approaches for strategic foresight. International Journal of Technology Intelligence and Planning, 1(2), 220–239. | DOI 10.1504/IJTIP.2005.006516
[41] Neumann, M., Trautwein, U., & Nagy, G. (2011). Do central examinations lead to greater grading comparability? A study of frame-of-reference effects on the university entrance qualification in Germany. Studies in Educational Evaluation, 37(4), 206–217. | DOI 10.1016/j.stueduc.2012.02.002
[42] OECD. (2013). Synergies for better learning. An international perspective on evaluation and assessment. Paris: OECD Publishing.
[43] Okoli, C., & Pawlowski, S. D. (2004). The Delphi method as a research tool: An example, design considerations and applications. Information & Management, 42(1), 15–29. | DOI 10.1016/j.im.2003.11.002
[44] Onderwijsinspectie. (2013). Onderwijsspiegel 2013 [Education mirror 2013]. Brussel: Onderwijsinspectie / Vlaams Ministerie van Onderwijs en Vorming.
[45] Perryman, J., Ball, S., Maguire, M., & Braun, A. (2011). Life in the pressure cooker – school league tables and English and mathematics teachers' responses to accountability in a results-driven era. British Journal of Educational Studies, 59(2), 179–195. | DOI 10.1080/00071005.2011.578568
[46] Phelps, R. P. (Ed.) (2005). Defending standardized testing. London: Lawrence Erlbaum Associates Publishers.
[47] Popham, W. J. (2005). Can growth ever be beside the point? Educational Leadership, 63(3), 83–84.
[48] Ramsteck, C., Muslic, B., Graf, T., Maier, U., & Kuper, H. (2015). Data-based school improvement: The role of principals and school supervisory authorities within the context of low-stakes mandatory proficiency testing in four German states. International Journal of Educational Management, 29(6), 766–789.
[49] Ritchie, J., & Spencer, L. (1993). Qualitative data analysis for applied policy research. In A. Bryman & R. Burgess (Eds.), Analysing qualitative data (pp. 173–194). London: Routledge.
[50] Sahlberg, P. (2011). Finnish lessons: What can the world learn from educational change in Finland? New York: Teachers College Press.
[51] Saunders, L. (1999). A brief history of educational 'value added': How did we get to where we are? School Effectiveness and School Improvement, 10(2), 233–256.
[52] Saunders, L. (2000). Understanding schools' use of 'value added' data: The psychology and sociology of numbers. Research Papers in Education, 15(3), 241–258. | DOI 10.1080/02671520050128740
[53] Schildkamp, K., Rekers-Mombarg, L., & Harms, T. J. (2012). Student group differences in examination results and utilization for policy and school development. School Effectiveness and School Improvement, 23(2), 229–255.
[54] Schwartz, P. (1991). The art of the long view. London: Century Business.
[55] Segool, N. K., Carlson, J. S., Goforth, A. N., von der Embse, N., & Barterian, J. A. (2013). Heightened test anxiety among young children: Elementary school students' anxious responses to high-stakes testing. Psychology in the Schools, 50(5), 489–499. | DOI 10.1002/pits.21689
[56] Shepard, L. A. (1992). Will standardised tests improve student learning? Los Angeles, CA: National Center for Research on Evaluation, Standards, and Student Testing.
[57] Shewbridge, C., Hulshof, M., Nusche, D., & Stoll, L. (2011). School evaluation in the Flemish community of Belgium. In OECD (Ed.), OECD reviews of evaluation and assessment in education. Paris: OECD Publishing.
[58] Sireci, S. G. (2005). The most frequently unasked questions about testing. In R. P. Phelps (Ed.), Defending standardized testing (pp. 111–121). Mahwah, NJ/London: Lawrence Erlbaum Associates Publishers.
[59] Smagorinsky, P., Lakly, A., & Johnson, T. S. (2002). Acquiescence, accommodation, and resistance in learning to teach within a prescribed curriculum. English Education, 34(3), 187–213.
[60] Strauss, A., & Corbin, J. (1990). Basics of qualitative research: Grounded theory procedures and techniques. Newbury Park, CA: SAGE.
[61] Tymms, P. (1997). Responses of headteachers to value-added and the impact of feedback. London: School Curriculum and Assessment Authority.
[62] Van Ackeren, I., Block, R., Klein, E. D., & Kühn, S. M. (2012). The impact of statewide exit exams: A descriptive case study of three German states with differing low stakes exam regimes. Education Policy Analysis Archives, 20(8), 1–32.
[63] Vanhoof, J., Van Petegem, P., Verhoeven, J., & Buvens, I. (2009). Linking the policymaking capacities of schools and the quality of school self-evaluations: The view of school leaders. Educational Management Administration & Leadership, 37(5), 667–686. | DOI 10.1177/1741143209339653
[64] Vanhoof, J., & Van Petegem, P. (2007). Matching internal and external evaluation in an era of accountability and school development: lessons from a Flemish perspective. Studies in Educational Evaluation, 33(2), 101–119. | DOI 10.1016/j.stueduc.2007.04.001
[65] Wang, L., Beckett, G. H., & Brown, L. (2006). Controversies of standardized assessment in school accountability reform: A critical synthesis of multidisciplinary research evidence. Applied Measurement in Education, 19(4), 305–328. | DOI 10.1207/s15324818ame1904_5
[66] Wikeley, F. (1998). Dissemination of research as a tool for school improvement? School Leadership & Management, 18(1), 59–73. | DOI 10.1080/13632439869772
[67] Wiliam, D. (2010). Standardised testing and school accountability. Educational Psychologist, 45(2), 107–122. | DOI 10.1080/00461521003703060
[68] Wössmann, L. (2005). The effect heterogeneity of central examinations: Evidence from TIMSS, TIMSS-Repeat and PISA. Education Economics, 13(2), 143–169. | DOI 10.1080/09645290500031165