Some thoughts on my approach to evaluation design
I’ve just finished another internal evaluation of a project. This time it’s the AMORES project (http://www.amores-project.eu/). Reflecting on the evaluation, and on its similarities with the previous evaluation I did, led me to some realisations about the sort of evaluations I conduct, how they are designed, and what their essential elements are. I thought I’d collect these together into a couple of blog posts, mainly so that the next time I design one, I can remember the best of what I did before.
I should specify that I’m discussing internal evaluation in particular. For those not familiar with educational projects, most of them have two evaluation strands. One is the external evaluation; this is conducted by someone outside of the project who examines how well the project functioned, whether it met its goals, how well communications worked within it, and so on. It’s part of the Quality Assurance, compliance and accountability process.
The internal evaluation asks questions of the learners, teachers and anyone else involved with the educational aspects, to identify good practice, look for tips that can be passed on, and encapsulate the overall experience for the learners and educators. In short, it’s there to answer the research questions posed by the project.
There’s a good deal of overlap between the two, but they are essentially different things, and should be done by different people. You merge the two at your peril, since part of the external evaluation’s job is to assess the success of the internal evaluation. And you really do need both to be done.
I’ve been the internal evaluator on 13 education projects now, but the last two (the other being the BIM Hub project, http://bim-hub.lboro.ac.uk/) were very similar in evaluation design; I think I’ve cracked the essential elements of what an internal evaluation should look like.
Part of the issue with being an internal evaluator is that, even though you’re part of the project team, you’re not (usually) one of the teachers. And teachers on projects have their own agenda, which is to teach (obviously) and, quite rightly, this takes precedence over all the analysis, research and general nosiness that a researcher wants to conduct.
For this reason, an evaluation design needs to be as unobtrusive as possible. Most education activities generate a lot of data in themselves: artefacts, essays, recordings of teaching sessions. All of these can be used without placing any additional burden on the learners or teachers. Sometimes the evaluation can even drive some of the learning activities. For example, you need students’ perceptions of their learning, so you set a reflective essay as an assignment. You need something to disseminate, so you set students the task of creating a video about their experiences, which can also serve as evaluation data. When we’ve done this, not only has it proved to be a very useful set of data, it has also been an excellent learning opportunity for the students. Teaching generates a lot of data already, too, such as grades, results of literacy testing, pupil premium figures and tracer studies. As long as the institution releases the data, this is material you can use with no impact on the learners or teachers.
So here’s the first set of criteria. Evaluations must be:
Unobtrusive, opportunistic, aligned with teaching practice
The second set of criteria is about making sure the evaluation actually makes sense. There’s no point gathering more data than you can deal with (having said that, every project I’ve done has gathered more than we could deal with). The data you collect also have to be targeted towards finding out something that will be of use to other practitioners once you’ve finished the project (I’ll come to outputs later). The RUFDATA approach is a good one here. There’s also no point making your surveys so long that no-one will look at them, or complete them if they start them. For survey length, the principles that seem to work are:
Quantitative questions – no more than one page (and use a 5-point Likert scale, obviously – anything else looks ridiculous – but add “don’t know” and “N/A” as options too).
Free text questions: well, no-one wants to write an essay, and if it’s on paper you’ll have to transcribe the answers at some point anyway. As far as numbers go, a good rule of thumb is that if it’s a number you’d see in a movie title, it’s OK. So seven, or a dozen, or even 13, is fine. More than that is pushing it (and if you’re going to ask 451 or 1138 questions then full marks for movie trivia, but minus several million for being a smart arse). The point of the movie title thing is that if you see your research questions as characters in the narrative you’re going to weave, then you don’t want to overcrowd your story anyway; putting too many in becomes pointless. You want all your questions to be Yul Brynners, rather than Brad Dexters.
So: useful, targeted, light touch, practicable
A third set of principles is based around the question: whose research is it anyway? That will be covered when we reconvene in the next post.