Measuring Learning Across ISE Projects

October 8th, 2014

The Diving Deeper, Looking Forward session topics at the 2014 AISL PI Meeting emerged from a pre-meeting survey of AISL-funded Principal Investigators; discussions with PIs and others who have participated in CAISE convenings over the past two years; and input from CAISE staff, co-PIs, and NSF Program Officers. These sessions were intended to stimulate discussions about cross-sector topics and issues that can continue beyond the meeting and generate new ideas for future projects and collaborations. The following blog post is a summary of questions, issues and ideas expressed by the participants in this session.

As the field of informal science education has grown, many have been working to evaluate the effectiveness and outcomes of interventions and programs in informal learning environments. In the ISE program at NSF (now the Advancing Informal STEM Learning, or AISL, program) specifically, there has historically been a strong emphasis on evaluation of individual projects, with hundreds of evaluations from ISE/AISL-funded projects shared online. Now, the field is moving toward integrating learning research into projects (as signaled by the publication of the Common Guidelines for Education Research & Development by NSF and the Department of Education), as well as fostering a shared understanding of the impacts of informal STEM learning work in the aggregate. A question before the field now is: can we agree on definitions of learning constructs? Relatedly, can we measure them in ways that will allow us to compare how they are addressed in different settings and contexts and, if so, what tools can we use?

In this session at the 2014 AISL PI Meeting, a panel of project leaders who are developing “common measures” for the field came together to present their work and discuss priorities with currently funded AISL PIs and project staff. The conversation built in part on a December 2013 convening that explored cutting-edge advances in measuring informal STEM learning. The panel included Kirsten Ellenbogen (President of the Great Lakes Science Center and CAISE Co-PI), Gil Noam (of the Program in Education, Afterschool & Resiliency [PEAR] at Harvard University), Christian Schunn (Senior Scientist at the Learning Research & Development Center at the University of Pittsburgh), and Tina Phillips (Evaluation Program Manager at the Cornell Lab of Ornithology).

Measuring Learning panelists, from left to right: Kirsten Ellenbogen, Christian Schunn, Tina Phillips, Gil Noam (standing).

The session’s presentations and discussions were framed around three questions:

  1. What kinds of quality and outcome measurements are being conducted in ISE?
  2. In what ways can we integrate shared measurements across projects?
  3. What are the best practices for using these measurements in formative and summative evaluation?

What are Shared Measures?

Why are we, as a field, considering common instruments as useful and important? Gil Noam presented this question to the group, pointing out that other learning research fields don’t have common instruments. A group of learning researchers working in informal environments has begun to develop a shared vision for a suite of instruments that would:

  • Be large-scale but sizable enough to work across a number of projects
  • Be voluntary for evaluators, designers of experiences and settings, and learning researchers to use
  • Allow for aggregation of data across many types of informal STEM learning programs
  • Feed into aggregation that is practice-oriented and feedback-driven for development
  • Through data aggregation, track activity across the field, which would allow for more transparency and constructs that all agree on

Some of this work has been initiated at PEAR, which is dedicated to assessment work. The team there started with a literature review that collected 60 instruments in different areas of informal STEM learning. These were aggregated on a searchable website, Assessment Tools in Informal Science (ATIS), where one can enter parameters for a study and receive suggested instruments to use.

Over the years, this conversation has evolved to be about a number of instruments, rather than a single one, that would be useful for multiple informal STEM learning programs. PEAR is developing instruments specifically focused on interest, socio-emotional and 21st century skills, and program quality. One of the instruments, the Dimensions of Success observation tool, is being used in 25 states to collect data on programs using a set of shared elements. So far, the team looking across the data has observed that after-school programs currently score low on reflection, relevance, and inquiry, indicating that the field needs to work more aggressively in these areas.

Measuring the Right Things: Where do you Want Your Learners to Go?

Christian Schunn observed that there are many different learning outcomes that evaluators, practitioners and researchers could potentially measure. The key is to figure out not which outcomes are easy to measure, but rather which ones matter for assessing the impact on your learner audience.

We are all interested in measuring real, impactful engagement in science learning. The bigger question that the field needs to address is which larger constructs we should be measuring: that is, what effects do learning experiences have in the longer term? Schunn suggests the following possibilities:

Christian Schunn presents on his recent learning research.

  • Later engagement—are learners able to engage more in future efforts as a result of your intervention?
  • Later choices—are they more likely to choose additional STEM learning opportunities because of their experience?
  • In their experience with your intervention, did they gain critical knowledge and skills that position them to participate successfully in careers or other longer-term engagements?

In his work, Schunn has found that three factors can influence longer-term engagement: fascination (with STEM content), values, and competency beliefs. More information about these terms can be found through the Science Learning Activation Lab.

Integrating Shared Measures in Evaluation and Research

Once a researcher has overcome the challenge of developing an instrument for measuring learning in informal settings, they may face an additional challenge: integrating that instrument into evaluation practices. Tina Phillips shared findings from work that seeks to address that challenge. While the context for her work has been citizen science, the tools and instruments that have been developed can be used in other informal learning settings.

The Developing, Validating and Implementing Situated Evaluation (DEVISE) project seeks to improve evaluation quality and capacity across citizen science programming. In an initial literature review for the project, the team found that existing instruments were either too lengthy or too narrowly focused to be used in informal citizen science contexts, so they set out to develop their own measures. Drawing on the literature review, the report Learning Science in Informal Environments, and input from the wider ISE field, the team developed a set of learning outcomes to measure in the context of citizen science. These include constructs such as interest in science and the environment, efficacy, motivation, knowledge of the nature of science, skills of science inquiry, and behavior and stewardship.

Shared Written Comments

One important outcome of the session was a list of measurement needs collected from the participants. This input can inform the informal STEM learning evaluation and research community on where to build capacity with regard to common measures. Participants were asked to write their suggestions, which are listed below.

Note: the comments have been edited for length and clarity. Full comments are available in the unedited session notes in the 2014 AISL PI Meeting Group.

  • Measurements for argumentation skills
  • Usefulness of retrospective pre/posts
  • For out-of-school-time STEM programs:
    • Customizable, publicly available tools
    • Guides/trainings for doing evaluation, particularly for non-survey methods
    • Tools that measure STEM skills (and are customizable)
    • Best practices for observation-based, embedded, and interview-based assessment
  • Validated, reliable tools for measuring constructs in after-school
  • Assessing improvements, change in communication abilities, and the effectiveness of communication
  • Monitoring changes in interest and knowledge of an audience following broadcast information (segments), workshop attendance, and outreach program participation
  • Is there a standard or common definition for the modifier “evidence based” when applied to measurement?
  • Can I pick and choose parts of common tools to use along with my own assessment? I don’t want the assessment to get too long
  • Common assessments for online databases, such as loading citizen science data online after collecting for user interaction and continued knowledge gain
  • Public outreach, public art
  • How to do formal assessment for small projects (e.g., a $3K seed project) with small budgets
  • Picking/choosing validated questions, or do you need to use the full set of validated questions?
  • How to measure effectiveness of cooperative learning in teams (i.e., teamwork)?
  • Attitudes towards STEM/STEAM and context specific
  • How might you measure the idea of curiosity?
  • It would be really important for “other” museums, e.g., history museums, to better understand how to measure their success beyond attendance numbers!
  • Guidance on how to determine measurement needs?
  • A resource base of measurement tools, with guidance on how, when, and where to use them (and when not to): a one-stop center
  • Measurement needs in informal science museum settings; kids of different ages
  • How to measure creativity? City planning issues?
  • Geographic reasoning skills instrument
  • Instruments specific/relevant to citizen science
  • Ways to measure learning outcomes or other (non-cognitive) outcomes for project participants with whom we interact exclusively online
  • Non-self-report shared/validated instruments/measures

Related Resources

The following resources were shared in the session.

Special thanks to Sarah Cohn for documenting this session.