
On Partnerships with Evaluators: Q&A with Kirk Knestis of Hezel Associates

Kirk Knestis is the CEO of Hezel Associates, a research and evaluation firm specializing in education. His varied career experiences include working as a small business owner and employer, as a classroom teacher, and as a university program administrator. All views expressed in this excerpt from a recent Q&A are those of Dr. Knestis and are not representative of CAISE or the National Science Foundation.


Do you have suggestions for building good partnerships between evaluators and project team members throughout the course of a project’s lifetime?

First and foremost, PIs should actively engage their external evaluator (and research partner, if applicable) as early in the design process as is practical. Because so much of this type of work is grant funded, that typically means “early in the proposal-writing process,” but I would argue that we should step back even further, getting all parties involved in a proactive process to define the work to be done and then describe it in the proposal narrative.

Irrespective of which role a contractor might serve, an external evaluation/research perspective can be extremely valuable in establishing the theoretical basis of the innovation being developed and tested (e.g., through facilitated logic modeling strategies), explicating outcomes (the persistent knowledge, skills, and dispositions expected for public and professional AISL project audiences), and integrating data collection and analysis into all aspects of the proposed project. Determining the type of research appropriate for a project, given the innovation’s maturity and existing evidence of promise/effectiveness, requires specialized technical understanding that a researcher should bring to the effort. Having that expertise fully integrated into the team developing the idea and the proposal for funding should substantially increase the likelihood of success, both of the proposal and of project activities once the work is funded. And applying this kind of collaborative orientation up front will do wonders for establishing the relationships, communication channels, and working practices conducive to doing the work once a grant is awarded.

How do you think the release of the Common Guidelines for Educational Research and Development by NSF and the Department of Education in 2013 has influenced how the informal STEM learning field thinks about evaluation and research?

It’s early days in terms of seeing large-scale impact of the Common Guidelines in the NSF STEM education community, but the document does a number of things that will eventually make important differences for innovation developers, researchers, and practitioners. First, it frames a typology built around the principle that research should inform the ongoing development of education innovations, each of which is grounded in some theoretical basis and is constantly evolving from a fresh idea into a STEM education resource, tool, technology, or program with demonstrated efficacy. The guidelines also provide a common vocabulary with which we can work out questions relating to research (and, by extension, evaluation) in NSF projects.

Perhaps most importantly, however, the Common Guidelines typology puts Foundational, Early-Stage/Exploratory, and Design and Development research on an equal footing with “impact research”—Efficacy, Effectiveness, and Scale-up studies. By describing distinct purposes for each, based on the maturity of the innovation (and its theory of action) being examined, the guidelines propose that using a randomized controlled trial design for the wrong purpose is as much a mistake as claiming that a program “works” without an appropriate basis for comparison, sufficient numbers to ensure statistical power, and demonstrably valid and reliable measures of dependent variables. The guidelines also define quality for each type of research, based not only on purpose but also on expected results, aspects of the research plan, and the theoretical bases and policy/practical significance that serve as the rationale for funding in a proposal.

For our readers, how would you describe a rigorous evaluation study? And how, if at all, has the definition of rigor changed when evaluating informal environments and settings?

Recognizing the distinctions in purpose and quality framed for each of the six types of research in the Common Guidelines, conversations about “rigor” can be redefined in a couple of important ways. An experimental or quasi-experimental impact study (Research Type 4, 5, or 6) is, by definition, more “rigorous” than a Design and Development Research effort (Type 3) aimed at iteratively improving an innovation that shows promise but isn’t yet ready for a true impact study. However, “rigorous” is not synonymous with “better,” as has often implicitly been the case in the evaluation/research requirements of past federal funding solicitations. Readers familiar with the What Works Clearinghouse standards should appreciate that the Common Guidelines offer substantially greater freedom to do research that’s recognized as being of quality, without the imposition of inappropriate design limitations and unreasonable expectations of rigor. This bears most importantly on Design and Development Research (often referred to in practice as Developmental Evaluation or Design-Based Research).

To the second question: I hope readers are not disappointed by this, but I don’t think there is anything inherent in informal STEM learning contexts that calls for different thinking about the rigor of research or evaluation practices. Conversations that turn from rigor to quality should be good for all involved.

In your view, what are some reasons for having an external review panel or an independent third-party evaluator, rather than an “internal” person, on a project such as one funded by NSF?

I am likely in the minority on this but, for what it might be worth, I believe that the independence and perspective that should result from an evaluator being “external” to the project are more a function of professional methods and practices than of organizational separation. An evaluator employed by an entirely separate contracted entity can still abdicate the responsibility to collect and analyze data “at arm’s length.” Equally, a well-trained individual employed by the same agency as the PI can maintain appropriate professional distance by focusing on doing responsible evaluation work.

Degree of separation is, however, just one aspect to consider when choosing an external evaluation scheme. For example, a panel approach allows a broader range of perspectives to be brought to bear, providing external input over the life of the project and addressing both implementation quality and the achievement of project goals. This could be useful if substantive expertise—so-called “connoisseurship” evaluation of pedagogy, STEM content, technology applications, and/or product quality—is more important for a given project than evaluation methods skills. An external evaluator will be charged with addressing process quality, so, particularly if a project involves a number of partners, an understanding of organizational change processes or project management might be as useful as (or more useful than) quantitative/inferential data analysis skills.

In what ways do you feel the field is becoming more sophisticated about how to use evaluation to learn from and improve practice?

While it has long been argued that “every NSF project is a research project,” program solicitations have gotten substantially more particular in the last three to five years about how they articulate the importance of contributing to broader understandings of STEM teaching and learning. The current AISL solicitation is a great example, calling out “Knowledge-building” as a specific priority to be addressed by any proposal submitted for consideration. This helps emphasize the first principle behind the research expected from funded projects, irrespective of the more nuanced ways that evaluation and research functions and responsibilities are parsed out. I am confident that the NSF merit review criterion of Intellectual Merit is being better served now than it was when I evaluated my first NSF project.

I also think that (again setting aside the use of particular terminology) we are collectively getting better at using data for formative purposes, both to iteratively improve the innovations being developed AND to improve the quality of research and development (R&D) implementation activities. It’s common to fall somewhat short in how quickly findings from testing are used to inform changes, leaving our “feedback cycles” longer and less responsive than might be optimal, but I have found that client PIs seem less likely to leap straight to trying to “prove that their programs work.” That’s good news for NSF, grantee institutions, and our audiences of learners.

Finally, what suggestions would you share for building good communication and collaboration between evaluators and project team members when evaluation findings are used at different junctures of a project?

Once a project award is made, the fun starts. Any contracted contribution (by an evaluator or research partner) must be supported by a clear, formal scope of work and contract, which will serve as the basis for the professional relationships between project staff and the evaluators/researchers. The research study embedded in the project must have a formal protocol finalizing its design and guiding instrumentation, data collection, analysis, and reporting. This is an easy step to neglect once everyone is clear about the services, deliverables, timeline, and costs of a contractor’s work, but the technical aspects of the study must be equally clear and well documented. Further, the external evaluation of the project requires a separate written protocol, since it serves a completely different purpose and answers completely different questions—namely, tracking the progress and success of grant-funded activities. Complicating things even further, an NSF project might actually require multiple, parallel research protocols if, for example, early internal testing is used to inform changes to features of the innovation but later testing will include formal pilots in the field. All of these working protocols will have to reflect what was described in the proposal, but final study designs will almost certainly benefit from the additional time available post-award.

It is important for PIs to facilitate these conversations, but more importantly, they should make sure that the discussion extends to how findings will be communicated and how they will actually contribute to the collective work of the project. Formal reports might not be particularly valuable for the product-improvement purposes associated with Design and Development Research. Reports are expensive to produce and can potentially decrease the utility of findings for development and implementation improvement. Research results meant to inform changes to an innovation in development must be fast-tracked to the people making the design decisions, ideally in ways that put the researcher in something approaching a “co-developer” role. External evaluation feedback to improve R&D practices (formative program evaluation) must be reported to managers empowered to make “monitor-and-adjust” decisions about project activities, and delivered quickly enough that time remains for benefits to accrue from such changes. Regular conversation across these responsibilities will improve the quality of our collective work, the value of the innovation to educators and learners, the deliverables developed through grant activities, and contributions to broader understandings of STEM learning.

Posted by Jared Nielsen