by Nicholas Meier
There are a variety of important issues regarding educational research these days. One hot topic right now is that our current federal administration has restricted the definition of acceptable research to only one type of research design: the experimental design. Qualitative research, which allows us to look at what actually goes on in classrooms and schools, and with children, as well as at the process of how education is working, is not deemed acceptable. Neither are other designs of quantitative research, which might examine a particular school, setting, or situation without a matched control group. This decision to allow only one type of research does not come from any consensus within the scientific or educational research community as to what counts as research(1). It is a political decision by the current federal administration.

This policy has important implications for our schools. One implication is that it strongly influences what research gets done. It does so directly, because the government sponsors research: federal dollars will only support studies that fit the administration’s definition. It affects schools secondarily through the research they cite and use for their policy decisions; researchers who want their work to influence those policies are likely to adhere to the approved protocols. Third, outside funders and universities may decide to fund only research that follows that paradigm, again restricting what research gets done. The policy also affects, in some cases, what practices schools may use, as the federal government insists that it will fund only “scientifically proven” methods. Federal monies for curriculum and instruction are therefore funneled to approaches supported by this one particular type of research.
I raise the above issue of what counts as research to make a point about educational research in general. This point is about the limitations of much of the research that is done, and has been done, in education even before the current policies. The above policies will only exacerbate the problems I address below.
Two difficulties that I will address here in interpreting educational research are, first, what was used to measure the effects, and second, over what period the effects were measured.
Most educational research uses standardized tests to measure the success or failure of a particular program, method, or other variable of interest(2). However, the validity and reliability of these tests as measures of what they purport to measure are highly controversial(3). I will use the example of reading. I recently went to a talk about the research on learning to read. The presenter argued that the research showed that phonemic awareness was required to learn to read. However, the research cited actually showed that the explicit teaching of phonemic awareness and phonics helped students score higher on tests of phonemes and phonics! This has been part of the trouble with the debate between whole language advocates on one side and phonics and “phonemic awareness” advocates on the other. Whole language theorists tend to use measures such as comprehension, reading for pleasure, and quantity of reading as their measures of success. Phonics and phonemic awareness advocates tend to use standardized tests that focus on phonics and phonemic awareness skills as their measures of success. How each side defines “reading,” and how it measures reading, ends up predicting the very outcomes it is looking for! According to Elaine Garan(4), the National Reading Panel made this error in its recommendations: by limiting its analysis to studies using the experimental design, and by focusing on experiments that looked at reading sub-skills, it biased its own conclusions.
Similar scenarios occur across many areas of educational research. It is not that research can say anything, but that one must be careful to examine how the researcher defined and measured success for the variable they claim to be examining. The reader and user of research must be able to decide whether they agree with the researcher’s definition, and whether the tool used to measure it is valid according to that definition.
The second problem is the short-term aspect of most educational research. Most research is done over one school year or less. There is an assumption that if gains are shown, they will persist over time. However, much of what we know from experience and other research contradicts that assumption. I refer us here to the “Three Little Pigs” analogy. Let us say we decide to study which materials are best for building houses. We have three identical pigs, all building houses. We notice one is building his house from straw, another from sticks, and a third from bricks. First, what is our measure of success? It is going to be how far each pig has gotten in building his house. After day one, we look to see how far each has gotten, and we notice the pig who is building his house from straw is already done. The one using sticks has his walls mostly up. The pig building with bricks is just getting his foundation done. We conclude straw must be the best material for house building, and mandate straw, based on research!
As most of us are aware, we often forget what we learned in a class or course soon after the class is over, or even within the class, right after the test! Short-term gains often do not correlate with long-term gains. Sometimes it is simply due to lack of use: the knowledge or skills learned are not used again, and therefore we do not remember them. Sometimes it may be that a strong foundation was not built, and so, like the straw house, our understanding collapses easily when it needs to support more complex use or understanding. Researchers Wayne Thomas and Virginia Collier(5) have shown evidence of this particularly in language learning, where English-only methods show slight gains in early language learning for English language learners, but students in bilingual classes overtake them in later years, due, according to language theory, to a stronger foundation in their primary language. Research on developmental versus skills-based approaches to early childhood education has shown similar patterns. Early academic advantages for skills-based approaches are lost over the years to longer-term advantages for the developmental approaches(6).
Short-term designs and standardized-test measures are the lamppost: like the proverbial man searching for his lost keys under the streetlight because that is where the light is, researchers study what is easiest to measure rather than what matters most. It is very difficult to carry out long-term research. It is expensive, so funding is difficult. The researcher must commit to the long haul, and may need a team who can also commit that time. The “subjects” are hard to keep track of as years go by, and the variables grow more complex as time passes. At the end of a school term, or at the end of our test of a method, by contrast, we can be fairly sure that the large majority of our subjects will be right there in the same place for us to administer our tests.
Standardized tests are given to virtually all students and can easily be compared across students, classes, and schools, even across districts or possibly states. Even when the standardized tests are particular to a study, they tend to be quicker, easier, and less expensive to administer than other measures. They are also easier to run statistical analyses on.
However, what good does it do me to know that such-and-such a reading series or teaching method led to higher test scores for these second graders, if there is no evidence that those higher test scores actually lead to an adult who reads, understands what they read, and knows how to use what they read to better their life and their society?
As they say, “Garbage in, garbage out.” All the advantages of time, money, and statistical reliability do not matter if they will not really answer the questions we want answered. If what I want to know is whether what I am studying leads to a better-educated citizen, then I had better make sure that the tools I use to measure that really do measure it.
Now I come back to my original discussion of what the government counts as research. The federal government defines research solely as the experimental design. This design lends itself well to short-term research using quantifiable scores, such as those of standardized tests. The second issue, what counts as evidence, is likewise restricted. It is especially difficult to fit long-term research into the experimental design, as following exactly matched groups over years becomes more and more difficult as time passes. Many questions cannot be studied using matched samples, since in many instances it would be unethical to randomly assign students to different groups. Should we randomly retain some students and not others to see the effects of that policy? In other cases, it is impossible. For instance, we cannot clone a school or district and recreate the exact same situation if we want to understand policy or curriculum decisions made on that scale. What makes for an educated and successful citizen is not always easily quantifiable, and definitions vary. The narrow type of research the government allows therefore also restricts what types of questions even get asked by researchers.
It is my contention that although the experimental design is commendable and valuable where practical, it can never be the only model of research for answering the complex questions about human learning and behavior. To answer such questions we must use the broader definition of research that virtually all scientific disciplines accept.
If we want to answer important questions in education, we are going to have to find a way to fund long-term research, and to use more complex measures of success that are more closely aligned with the actual skills and knowledge successful members of society need and use.
1. Debra Viadero, "AERA Stresses Value of Alternatives to 'Gold Standard'," Education Week, April 18, 2007.
2. Deborah W. Meier, "Needed: Thoughtful Research for Thoughtful Schools," in Issues in Education Research, ed. Ellen Condliffe Lagemann and Lee Shulman (San Francisco: Jossey-Bass, 1999).
3. Alfie Kohn, The Case against Standardized Testing: Raising the Scores, Ruining the Schools (Portsmouth, NH: Heinemann, 2000); Deborah W. Meier, In Schools We Trust: Creating Communities of Learning in an Era of Testing and Standardization (Boston: Beacon Press, 2002); Susan Ohanian, One Size Fits Few: The Folly of Educational Standards (Portsmouth, NH: Heinemann, 1999).
4. Elaine M. Garan, "What Does the Report of the National Reading Panel Really Tell Us About Teaching Phonics?" Language Arts 79, no. 1 (2001).
5. Wayne Thomas and Virginia Collier, "School Effectiveness for Language Minority Students" (Washington, DC: National Clearinghouse for Bilingual Education, 1997); Wayne Thomas and Virginia Collier, "A National Study of School Effectiveness for Language Minority Students' Long-Term Academic Achievement: Executive Summary" (Washington, DC: Center for Research on Education, Diversity & Excellence, 2002).
6. Rebecca A. Marcon, "Moving up the Grades: Relationship between Preschool Model and Later School Success," Early Childhood Research & Practice 4, no. 1 (2002); Jeanne E. Montie, Zongping Xiang, and Lawrence J. Schweinhart, "Preschool Experience in 10 Countries: Cognitive and Language Performance at Age 7," Early Childhood Research Quarterly 21, no. 3 (2006): 313-31.
© 2007 Nicholas Meier / nsmeier @ sbcglobal.net