NAEP and the Future of Science Education

By Harry Keller
Editor, Science Education

Recently, Nora Fleming, in “NAEP Reveals Shallow Grasp of Science” (Education Week, 19 June 2012) reported on a new study regarding the 2009 NAEP (National Assessment of Educational Progress) exam for grades four, eight, and twelve. Around 2,000 students at each grade level were given three “Interactive Computer Tasks” (ICT) and one “Hands-On Task” (HOT) to perform. They were asked questions about each task.

We all know that conclusions such as “shallow grasp of science” are fraught with difficulties, especially when the results tell more about the test than the students. In this case, there are more issues. What exactly does “shallow grasp” mean in this situation? What sort of understanding is required to achieve a high result?

You can check for yourself at a website that allows you to take these tests and score your answers, if you have the patience. I went through the grade 8 and grade 12 materials (my work with science education focuses on grades 6-13) and did not approve of most of the content. Because it’s quite difficult to separate student ability from test quality, the results are suspect.

Because generalities won’t suffice, I’ll provide one example that shows how test results can mislead regarding understanding. This example shows how an eighth-grade student responds to the ICT, “Playground Soil.” According to the instructions,

In this 20-minute task, students investigate the permeability of soil samples from two sites a town is considering for a play area. Students use their results to help decide which site has the better water drainage and is therefore the better place for a grassy play area.

Students analyze two samples: soil sample A that is 10% clay, 50% fine gravel, and 40% silt; and soil sample B that is 10% clay, 50% sand, and 40% silt. Note that the only difference is the middle component. Diagrams illustrate the soil composition.

Students are asked to predict which soil will be more permeable. Generally speaking, the one with more voids (more space between soil grains) will be more permeable. Clearly, fine gravel has larger particles than sand. Now, do a thought experiment with me. Suppose that you increase the size of the fine gravel particles substantially. You might even have a rock in the midst of a clay-silt mixture. The permeability of this rock plus clay-silt will closely match that of clay-silt alone. Now, imagine reducing the size of the rock while keeping the volume equal to one-half of the total. Once the particle size reaches that of clay (60% clay, 40% silt), you’ll have very little permeability because mixtures that are mostly clay don’t allow much water to pass through them.

Fine gravel and sand lie somewhere between rocks and clay in size. As particle size shrinks from rock toward clay, permeability starts out rather flat, rises, and then falls again as the size approaches that of silt and clay. Students cannot possibly be expected to know where the maximum of that function lies. Yet they’re expected to choose one soil and explain their choice intelligently.

Next, students are asked to describe what data must be collected to calculate permeability. With the definition of permeability readily available with a mouse click, that shouldn’t be too difficult.

The permeability of a soil is a measure of the rate at which water flows through the soil’s pores (empty spaces). In this task, permeability is measured as milliliters of water per minute.

Students are expected to name volume and time as the data to collect and to divide volume by time to do the calculation. Only 7% did so; 64% at least named volume as data to collect. Because every experiment in the simulation lasts 30 minutes, students may have decided not to include time because it did not vary.
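The expected calculation is simple division. As a minimal sketch in Python (the collected volumes shown here are not given in the article; they are inferred from the fixed 30-minute runs and the two rates the task reports):

```python
def permeability_ml_per_min(volume_ml, time_min):
    """Permeability: milliliters of water collected per minute of flow."""
    return volume_ml / time_min

# Every simulated run lasts 30 minutes, so only the collected volume varies.
# Volumes are inferred from the reported rates (3.0 and 0.3 mL/min).
site_a = permeability_ml_per_min(volume_ml=90.0, time_min=30.0)  # 3.0 mL/min
site_b = permeability_ml_per_min(volume_ml=9.0, time_min=30.0)   # 0.3 mL/min
```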

The experiment itself is a cartoon-like animation that leaves much to be desired, but that’s a topic for a different article.

It should have come as no surprise, then, that so few students gave the complete answer.

The remainder of the exercise involves making the permeability calculations and describing the reasons for a final choice of playground location based on the best permeability. Student scores on these simple matters were depressingly low.

Here is a sample of a complete student response, including grammatical errors, to the last question about choice:

The soil in site A is more permeable, meaning the water goes quicker through the soil resulting in faster drainage. The permeability in site A is 2.7 mL/min more than Site B, therefore it is better for the playground.

Stating that site A is 2.7 mL/min more misses the point. The values were 3.0 mL/min and 0.3 mL/min. The factor of ten should have been noted, not the difference in rate. The same difference would have been seen if the rates were 102.7 mL/min and 100.0 mL/min.
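The distinction between the difference and the ratio of the two rates is plain when written out (a quick illustration, using the rates reported in the task):

```python
site_a, site_b = 3.0, 0.3  # drainage rates in mL/min

difference = site_a - site_b   # 2.7 mL/min, the figure the student cited
ratio = site_a / site_b        # 10.0: site A drains ten times faster

# The same 2.7 mL/min difference at higher rates would mean
# nearly identical drainage:
other_ratio = 102.7 / 100.0    # about 1.03, essentially no difference
```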

The rest of the ICT tests are similarly flawed, though not necessarily in the same way. As a whole, they present material in a manner that may confuse, and they fail to measure any real depth of understanding. Their grading rubrics are also flawed, especially in requiring students to include uninteresting details in their answers. For example, the bottling-honey task marked an explanation wrong if it did not include the actual time of 0.2 seconds rather than simply noting that the two equal times were the shortest. The question in no way implies that you must write down the actual time.

Such a grading rubric may downgrade students who understand the material well but lack an extremely detail-oriented turn of mind. Remember that this is eighth grade.

The hands-on portion for eighth grade was not very exciting. Students were given four bars of roughly equivalent size and asked to determine which was a strong magnet, which a weak magnet, which a piece of steel, and which a piece of copper. They were also given some steel washers, a ruler, and some grid paper, and asked to determine which magnet was stronger and which weaker. This is really dull stuff and will project an image of science as dull.

This lab did not really involve anything scientific. It did not investigate magnetism but used prior knowledge of magnetism to classify some materials and to make some crude measurements of two magnets. The students either knew what to expect beforehand or they didn’t. Even so, only 4% scored “complete” for their answers. I suspect that once again the rubric focused too much on details and not enough on understanding. Incidentally, a student can “do” this lab in a couple of minutes without actually following all of the instructions. Once you have classified the four bars, you can write nearly all of the responses to all parts.

What does all of this information mean?

We have some serious problems. Even the shallow depth these tests require was not attainable by students. More serious, the tests did not truly measure scientific thinking: they counted detail too much and real thought too little. If this is what passes for science learning, we’re in trouble. Fortunately, this one test does not show us the entire picture. I’ve been in contact with many science teachers who understand the difference between detail and thought, who focus on thinking rather than memorizing, and who don’t simply teach to the state-mandated tests.

The Obama STEM Master Teacher Corps, if it could pass Congress, might make a difference. It’s a fine idea but will develop too slowly and is not large enough to make the difference we need. I like that it elevates the status of teaching, but you cannot just scatter some nice rewards among some teachers. More must happen.

I see just two general strategies for improvement. These two approaches are not mutually exclusive, and both should be attempted at once. The first involves building up our teaching corps. The second gives our teachers much better tools.

In the first, we spend our way into better K-12 science education. That approach calls for more education spending by localities, not by the federal government. This spending must do what the STEM Master Teacher Corps intends and raise teacher pay so much that teaching becomes an honored profession once again. That approach requires tax increases, which are unlikely in the current political climate.

The second approach costs much less but will not substitute for excellence in teaching. We can build tools for learning based on new Internet technologies that improve learning, take less time for students, and reduce overall costs. This is the continuing promise of technology. Unfortunately, technology in education has had some really bad press due to adoption of inferior products in the rush to “21st century” learning or to “digital natives” or other such catch phrases.

These tools, if done really well, will be able to turn ordinary teachers into good teachers, turn good teachers into great teachers, and provide some relief to the great teachers. Great tools alone won’t turn the average teacher-to-be into a great teacher, but they can make widespread improvements in learning everywhere for very little cost. Needless to say, tools won’t turn bad teachers into anything but bad teachers. Fortunately, they are a small minority. Good tools will buy us the time we must have to turn around our ideas of how to reward teachers for what may be our most important national defense priority – a great education for all.
