Building a Guiding Framework for Website Evaluation Design Using the BISE Database


May 18th, 2014

As described in our previous blog posts, the National Science Foundation-funded project Building Informal Science Education (BISE) (DRL-1010924) created a framework to code and synthesize evaluation reports on informalscience.org. This first round included all reports voluntarily posted through May 2013. In this blog post, Carey Tisdal describes the methods she used to synthesize 22 evaluation reports focusing on websites. Since 2004, Tisdal has been the Director of Tisdal Consulting in St. Louis, Missouri, and has evaluated several informal learning projects that include websites.

My aim in this study was to develop a heuristic, or theoretical model, that could help define comprehensive and useful evaluation questions for framing website evaluation studies. I wanted a model that would guide my own thinking and that I could use in discussions with stakeholders (e.g., web designers, program developers, audiences). In selecting a topic, I looked for a subset of reports that would inform my own work, hoping this approach would also make the study relevant for others. I decided to review the 22 reports coded as “website” evaluands (i.e., the entity being evaluated) in the database (for a full list of those reports, see the bottom of this blog post).

After reviewing these reports, I realized that it was not appropriate to mine this sample for generalizable knowledge about websites or website evaluations. As is typical of evaluation reports, their findings focused on the merit or worth of a specific set of products and actions, within a limited time and place, to inform the decisions of one or more stakeholder groups. This is the defining feature of evaluation, and it is one way that evaluation differs from research or policy studies (Guba & Lincoln, 1989). However, I found that the sample could be used to explore the focus areas my colleagues included in their designs. I framed my investigation with the following questions:

What questions did evaluators ask to frame their study?
What information did evaluators provide to clients and other stakeholders (e.g. funders, program participants, and the community) about the value or worth of the website?

To answer these questions, I then identified nine primary focus areas and organized them into a set of sequential, necessary steps that lead to user impacts.

Characteristics of the Sample

Based on the BISE codes, the 22 relevant reports fell into only four categories: 6 were identified as Formative Evaluation studies, 1 as Remedial, 14 as Summative, and 1 as Don’t Know. The reports were written between 2003 and 2011 (Figure 1). Such a sample is not representative and so is not suitable for hypothesis testing, but it is suitable for exploration using a grounded theory approach to analysis (Glaser & Strauss, 1967).

Figure 1. Number of reports by year written.

Methodology and Methods

The overarching methodology I used was grounded theory. Unlike more quantitative approaches, which aim to test theory, grounded theory has the researcher generate or discover theory from the data. I began by reviewing the “Findings” and “Conclusions” sections of the 22 reports and coding value or worth statements into temporary groups of similar topics. From these sections alone, however, the actual focus of each study was not always clear, so I revised my plan and reviewed each report in its entirety. I continued organizing additional bits of data into temporary categories that appeared to relate to the same type of information (Lincoln & Guba, 1985). My next step was to “devise rules that describe category properties and that can, ultimately, be used to justify the inclusion of each data bit that remains assigned to the category” (Lincoln & Guba, 1985, p. 347). This involved reviewing the temporary categories, writing explicit definitions, and then excluding any instances that failed to meet those definitions. I also looked for similar categories in relevant literature (e.g., marketing, usability, web analytics) to refine definitions and connect findings to the broader conversation about websites, thereby applying constant comparative analysis throughout the study (Glaser & Strauss, 1967).
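The rule-writing step described above can be pictured as bookkeeping: each data bit carries the analyst's own judgments, each category carries a written definition plus an explicit inclusion rule, and refining a category keeps the bits that satisfy the rule and sets the rest aside for re-coding. The Python sketch below illustrates that idea under stated assumptions. The `DataBit` and `Category` structures, the judgment labels, and the two example excerpts are all hypothetical. The rule paraphrases the Appeal definition discussed next. The judgments come from a human reading of the reports, not from automated text matching.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataBit:
    """One coded excerpt from a report, plus the analyst's judgments about it."""
    report: str                                       # hypothetical report identifier
    excerpt: str                                      # the value or worth statement
    judgments: set[str] = field(default_factory=set)  # properties the analyst assigned


@dataclass
class Category:
    """A provisional category with an explicit, written inclusion rule."""
    name: str
    definition: str
    rule: Callable[[DataBit], bool]
    members: list[DataBit] = field(default_factory=list)

    def refine(self) -> list[DataBit]:
        """Keep members that satisfy the rule; return the rest for re-coding."""
        excluded = [bit for bit in self.members if not self.rule(bit)]
        self.members = [bit for bit in self.members if self.rule(bit)]
        return excluded


def appeal_rule(bit: DataBit) -> bool:
    # Hypothetical rule: include a bit only if it records user feedback AND
    # that feedback concerns attractiveness, pleasure, or interest.
    return "user feedback" in bit.judgments and bool(
        bit.judgments & {"attractive", "pleasing", "interesting"}
    )


appeal = Category(
    name="Appeal",
    definition="Extent to which users found the site, or a feature on it, "
    "attractive, pleasing, or interesting",
    rule=appeal_rule,
)
# Invented excerpts for illustration only.
appeal.members = [
    DataBit("report-A", "Users liked the color scheme", {"user feedback", "attractive"}),
    DataBit("report-B", "Average visit lasted four minutes", {"site metric"}),
]
excluded = appeal.refine()
print([bit.report for bit in appeal.members])  # ['report-A']
print([bit.report for bit in excluded])        # ['report-B']
```

Writing the rule as a named function keeps the definition explicit and reviewable, which mirrors the purpose of the step: every data bit that remains in a category can be justified against a stated rule.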

The “Appeal” category provides an example of this process. After coding, I found that all 22 reports asked users to provide feedback about how attractive, satisfying, or interesting they found some aspect of the site. Based on these instances, I defined Appeal as the extent to which users found the site, as a whole, or a particular aspect or feature on the site, attractive, pleasing, or interesting. Yet within this broad category, I identified four different types of appeal that had been assessed. I developed definitions for each of these sub-categories, or types of Appeal.

Visual Appeal: the attractiveness of the graphic elements and their arrangement (e.g., color, photos, layout, and fonts) (n = 13).
Means of Engagement: the attractiveness of the specific instances and types of interaction on the site and the overall range of ways users interact on the site (e.g., reading, playing games, watching streaming video, viewing photographs, commenting, listening to audio) (n = 20).
Content Appeal: user interest and attraction to specific topics on the site and the attractiveness of the range of information on the site (n = 21).
New Content and Features: user interest in finding new content or features on repeat visits (n = 2).
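Because a single report can contain instances of several types of Appeal, the n values above overlap: they sum to 56 across only 22 reports. A minimal Python sketch of such a tally follows. It assumes n is the number of distinct reports with at least one coded instance of a sub-category. The report identifiers and coded instances are invented for illustration.

```python
from collections import defaultdict

# Hypothetical coded instances: (report identifier, Appeal sub-category).
# One report can contribute several instances and appear under several sub-categories.
coded = [
    ("report-A", "Visual Appeal"),
    ("report-A", "Visual Appeal"),
    ("report-A", "Content Appeal"),
    ("report-B", "Content Appeal"),
    ("report-B", "Means of Engagement"),
    ("report-C", "Means of Engagement"),
]


def reports_per_category(instances):
    """Return n per sub-category: distinct reports with at least one instance."""
    reports = defaultdict(set)
    for report, category in instances:
        reports[category].add(report)
    return {category: len(ids) for category, ids in reports.items()}


counts = reports_per_category(coded)
# Overall Appeal: distinct reports across all sub-categories.
overall = len({report for report, _ in coded})

print(counts)   # {'Visual Appeal': 1, 'Content Appeal': 2, 'Means of Engagement': 2}
print(overall)  # 3
```

Counting distinct reports rather than instances keeps a report with many comments about, say, color from outweighing a report with one. It also explains why the sub-category counts cannot be added to recover the overall count.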

Exploring the relationships among categories is another step in developing a grounded theory. During analysis, I realized that some of the categories appeared to provide a pathway to impact. For example, a user must first access and then use a website before any expected impacts can occur. These examples in the data led me to consider their similarity to Carol Weiss’s Program Theory technique. As Weiss (1998) points out, “For evaluation purposes, it is useful to know not only what the program is expected to achieve but how it expects to achieve it” (p. 55).
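The idea of necessary, sequential steps can be made concrete with a small Python sketch. It uses only the access, use, and impact chain from the example above. It is not the nine-area framework itself, and the step names and evidence values are hypothetical. Because each step is a precondition for the next, the earliest step without supporting evidence marks where the expected pathway breaks down.

```python
from typing import Optional

# Hypothetical chain built only from the example in the text (access, then use,
# then impact); the actual framework has nine focus areas, not reproduced here.
CHAIN = ["access", "use", "impact"]


def first_break(evidence: dict[str, bool]) -> Optional[str]:
    """Return the first step lacking supporting evidence, or None if all are supported."""
    for step in CHAIN:
        if not evidence.get(step, False):
            return step
    return None


# Users reached the site, but the study found little evidence of use,
# so impact cannot be expected to follow.
print(first_break({"access": True, "use": False}))  # use
```

Missing evidence is treated as unmet here, so an evaluation that never examined a step flags that step as a gap. That is one way such a chain can suggest which evaluation questions a study still needs to ask.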

Findings

In addition to Appeal, I identified eight other evaluation focus areas across the reports and arranged all nine as necessary, sequential steps leading to user outcomes, that is, as a high-level program theory (Figure 2). To explore each of these categories more deeply and apply them to your own work, you can review the complete white paper at the Visitor Studies Association website.

Figure 2. Framework for planning and designing website evaluation.

This framework is not intended to prescribe evaluation questions; rather, it functions as a heuristic to allow evaluators and clients to consider what specific questions are important at different stages of development and evaluation, and in different contexts. The framework could also serve as a template for the development of an explicit program theory for a specific website.

What does this mean for you?

What topics in the BISE database might you explore using grounded theory to inform your work?
By generating evaluation questions based on a framework such as this, evaluators can support Web specialists in developing comprehensive strategies for user impact and in testing those strategies through front-end and formative evaluation studies. How do you see yourself using this framework to think about questions for your website evaluation?
Since websites are part of larger designed systems with several evaluands, do websites make a unique contribution to user impact? If so, how would we assess it?

For more on the BISE database, the codes, or the coding process, please see the prior BISE blogs!

References Cited

Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative research. Chicago, IL: Aldine Publishing Company.

Guba, E. G., & Lincoln, Y. S. (1989). Fourth generation evaluation. Newbury Park, CA: SAGE.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Beverly Hills, CA: SAGE.

Weiss, C. H. (1998). Evaluation: Methods for studying programs and policies (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Reports Examined as Part of the Study

Summative Evaluation of Einstein’s Big Idea by Karen Peterman, Kathryn Franich, and Irene Goodman.
Summative Evaluation of The Human Body by Ralph Adler, Alice Apley, Wendy Graham, and Laura Winn.
Summative Evaluation of Strange Days on Planet Earth Website by Valerie Knight-Williams.
PEEP and the Big Wide World Web site Final Evaluation Report by Jennifer Beck.
Design Squad: Website Addendum Report by Goodman Research Group, Inc.
NOVA scienceNOW: Season Two Summative Evaluation by Karen Peterman, Emilee Pressman, and Irene Goodman.
RACE–Are We So Different? Summative Evaluation of the Website http://www.understandingrace.com/ by Minda Borun.
Formative Evaluation of the Ganga Website by Irene Goodman.
The Music Instinct Formative Evaluation by Rucha Londhe, Miriam Kochman, Nivedita Ranade, and Irene Goodman.
A Summative Evaluation of Roadside Heritage by Jerry Hipps, Sharon Herpin, and Donna Winston.
Citizen Science Toolkit Project by Stephanie Thompson.
Use of New Media to Engage the Public in the Ethics of Nanotechnology by Douglas Spencer.
Connecting Tennessee to the World Ocean: Formative Evaluation Report by Christopher Horne.
Remedial Evaluation of ExhibitFiles by Carey Tisdal.
Ice Stories Summative Evaluation Compilation Report by Valerie Knight-Williams, Divan Williams Jr, Christina Meyers, Ora Grinberg, Tal Sraboyants, Eveen Chan, and David Tower.
Tissues of Life: Exhibition Remedial Evaluation, Program Summative Evaluation, and Web Summative Evaluation by Randi Korn & Associates, Inc.
Formative Evaluation of Cyberchase: The Next Frontier Web Redesign by Barbara Flagg, Hilde Hochwald, Debra Klich, and Laura Minnigerode.
Formative Evaluation of Season VII Cyberchase Materials by Barbara Flagg, Alice Bernard, Allan Brenman, and Laura Minnigerode.
Focus Group Study in Support of Cyberchase Parent Website by Barbara Flagg.
Evaluation of Current Cyberchase Website by Barbara Flagg.
InformalScience.org Evaluation Research Study by Julie Remold, Judi Fusco, Bill Penuel, Patricia Schank, Mingyu Feng, and Vera Michalchik.