MOOC dropouts – What we learn from students who leave

Sherif Halawa 11 July 2014

Over the past few decades, online learning platforms have become well known for bringing richer types of learner interaction data to the researcher's table than face-to-face instruction does. Massive open online courses, or MOOCs, bring two additional dimensions: scale and learner diversity.

A typical MOOC contains many thousands of learners who come from very diverse backgrounds and demographic groups, have different intentions, persist in the MOOC to different extents, and leave for different reasons.

Many MOOC researchers are asking the classical questions: Why do students drop out, and how can we mitigate dropout?

Perhaps it is more reasonable to first ask: Why do we care about studying dropouts in MOOCs?

Intention survey results and data analysis from three recent Stanford MOOCs reveal that over 90% of students who left before the end of the course did not achieve their initially self-set goals (non-attainers). Furthermore, a dropout diagnosis survey administered in a recent Stanford MOOC revealed that 71% of dropouts reported course difficulty or procrastination as the main reason for dropout.

This learner group is potentially more amenable to educational interventions compared to learners who dropped out because the course was irrelevant or because of a lack of reliable internet access, time or language skills. The former group may be referred to as “amenable non-attainers”.

The mission underlying MOOC dropout research is to predict and diagnose dropouts so we can identify amenable non-attainers, then design and deliver interventions to potentially increase their achievement levels by direct means (explaining difficult concepts) or indirect means (developing the learner's self-regulated learning skills or commitment level).

Predicting dropouts

Similar to predicting whether it will rain tomorrow, predicting dropouts yields true and false positives and negatives. Prediction accuracy, however, is not the only relevant figure of merit. Can you figure out what is wrong with the following predictor: “One day before the end of the course, red-flag all students who have not shown up in the last two weeks”?

If inactivity in the last two weeks is your criterion for dropout, then this predictor will be highly accurate. However, the predictions are produced too late to be of any use for intervention purposes.

Thus, researchers are trying to design predictors that red-flag dropouts early on, but earliness and accuracy, unfortunately, come at each other's expense.
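The trade-off can be made concrete with a toy example. The sketch below is an illustration only: the five activity logs, the 70-day course length and the helper names are all made up for this example and do not come from the Stanford data. It applies the same two-week inactivity rule on two different days and compares each set of flags with the end-of-course truth.

```python
# Toy illustration; all data and names here are hypothetical.
COURSE_DAYS = 70   # assumed course length in days
WINDOW = 14        # "has not shown up in the last two weeks"

# Days on which each made-up learner was active on the course site.
LEARNERS = {
    "steady":   list(range(1, 71, 3)),                       # active throughout
    "early":    list(range(1, 11)),                          # leaves after day 10
    "mid":      list(range(1, 41, 2)),                       # leaves around day 39
    "returner": list(range(1, 6)) + list(range(50, 71, 4)),  # long gap, then returns
    "late":     list(range(1, 56, 5)),                       # leaves around day 51
}


def inactive(active_days, end_day, window=WINDOW):
    """True if there is no activity in the `window` days ending on end_day."""
    start = end_day - window
    return not any(start < day <= end_day for day in active_days)


def evaluate(learners, flag_day):
    """Apply the inactivity rule on flag_day and score it against the truth."""
    correct = 0
    for active_days in learners.values():
        # On flag_day the predictor can only see activity up to that day.
        seen = [day for day in active_days if day <= flag_day]
        flagged = inactive(seen, flag_day)
        # Ground truth is only known once the course has ended.
        dropped = inactive(active_days, COURSE_DAYS)
        correct += flagged == dropped
    return {
        "accuracy": correct / len(learners),
        "days_left": COURSE_DAYS - flag_day,
    }


for flag_day in (69, 28):
    print(flag_day, evaluate(LEARNERS, flag_day))
```

On this toy data the rule is right for all five learners on day 69, with one day left to intervene, and right for only two of the five on day 28, with 42 days left.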

Time is a more complicated issue than just that. Training a predictor requires truth values (did the learner actually drop out or not?) and this data is only available after the course has ended. So how can we develop models that are useful during the life of the course?

Researchers resort to training models on some courses and then testing them on others. The adequacy of the model for a test (or future) course depends on its similarity to the training courses and how successfully the model accounts for the differences between the courses. MOOCs differ in content, difficulty and workload, and the demographic distributions of their learner populations are also different.
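A minimal sketch of this train-on-some, test-on-others setup follows. Everything in it is an assumption made for illustration: the three course names, the single feature (the fraction of first-week videos a learner watched) and the simple cut-off model stand in for whatever features and models a real study would use.

```python
# Hypothetical data: (fraction of week-1 videos watched, dropped out?) per learner.
COURSES = {
    "course_a": [(0.1, True), (0.2, True), (0.6, False), (0.9, False)],
    "course_b": [(0.0, True), (0.3, True), (0.5, False), (0.8, False)],
    # An assumed harder course: learners watch more but still drop out.
    "course_c": [(0.4, True), (0.55, True), (0.7, False), (0.95, False)],
}


def accuracy(rows, threshold):
    """Share of learners classified correctly when 'feature < threshold' means dropout."""
    hits = sum((value < threshold) == dropped for value, dropped in rows)
    return hits / len(rows)


def fit_threshold(rows):
    """Pick the observed feature value that gives the best training accuracy."""
    candidates = sorted({value for value, _ in rows})
    return max(candidates, key=lambda t: accuracy(rows, t))


def leave_one_course_out(courses):
    """Train on all courses but one, then score the model on the held-out course."""
    results = {}
    for held_out, test_rows in courses.items():
        train_rows = [
            row
            for name, rows in courses.items()
            if name != held_out
            for row in rows
        ]
        threshold = fit_threshold(train_rows)
        results[held_out] = {
            "threshold": threshold,
            "train_accuracy": accuracy(train_rows, threshold),
            "test_accuracy": accuracy(test_rows, threshold),
        }
    return results


for course, scores in leave_one_course_out(COURSES).items():
    print(course, scores)
```

In this made-up data the cut-off learned from the first two courses fits them perfectly but misclassifies one learner in four on the third course, which is the kind of gap that differences between courses can produce.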

Disengagement

MOOC platforms allow a previously created course to be re-run without requiring instructor supervision. This makes it possible to design more powerful models by training them on previous instances of the MOOC in which they are to be used.

To our surprise, a comparative study we ran on successive iterations of MOOCs revealed that completion rates can differ significantly even between instances of a single MOOC that serve identical content to learner populations with almost identical demographic distributions.

Is this a seasonal variation related to the time of year, or an emerging trend caused by a shift in some quality of the learners across the instances? The question remains open.

A holistic approach to dropout modelling requires us to scan for cues of disengagement and also take into account learner and course attributes. Some very interesting discrepancies exist between learner groups when classified by certain demographic attributes.

For instance, completion rates seem to increase with age regardless of the highest attained educational degree and field. Such discrepancies are often immediately visible on instructor dashboards, but interpreting them correctly might be very tricky.

Several of Stanford's computer science, or CS, MOOCs exhibit lower female than male completion rates. A careful analysis, however, reveals higher completion rates of learners with stronger CS backgrounds than those with weaker backgrounds.

The fraction of females is high in the non-CS degree group (learners coming from the humanities) and lower in the CS group. The analysis also reveals almost equal completion rates for males and females inside each of these two groups separately.

Thus, the correlation of completion rate with gender as observed on a simple instructor dashboard is most probably spurious. The MOOC research space is expected to see more extensive studies on interactions between demographics and persistence.
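The arithmetic behind this kind of spurious correlation is easy to reproduce. The counts below are invented purely for illustration and are not the Stanford figures; they are chosen so that completion rates are equal for both genders inside each background group while the group sizes differ.

```python
# Hypothetical (enrolled, completed) counts per background group and gender.
COUNTS = {
    "cs":     {"female": (100, 20), "male": (400, 80)},
    "non_cs": {"female": (400, 20), "male": (100, 5)},
}


def rate_within(counts, group, gender):
    """Completion rate for one gender inside one background group."""
    enrolled, completed = counts[group][gender]
    return completed / enrolled


def rate_overall(counts, gender):
    """Completion rate for one gender, pooled over both background groups."""
    enrolled = sum(group[gender][0] for group in counts.values())
    completed = sum(group[gender][1] for group in counts.values())
    return completed / enrolled


for gender in ("female", "male"):
    print(
        gender,
        "overall:", rate_overall(COUNTS, gender),
        "cs:", rate_within(COUNTS, "cs", gender),
        "non_cs:", rate_within(COUNTS, "non_cs", gender),
    )
```

With these counts the pooled rates are 8% for females and 17% for males, yet inside each group both genders complete at the same rate (20% in the CS group, 5% in the non-CS group). The whole gap comes from how the genders are distributed across the two groups.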

More detail

The wide gamut of interactions that MOOC platforms record helps us not only to predict and understand more about dropouts but also to distinguish between learners who leave because of lack of time, learners who leave because of lack of motivation and learners who leave because of course difficulty.

In a classroom setting, the teacher can observe students' active engagement with a discussion, but cannot measure non-participating students' engagement (whether or not they are silently following the discussion).

MOOCs record forum post visits even for students who never post to the forum. MOOC forums also record conversations between students, which is difficult to capture in a classroom setting.

In a dropout diagnosis experiment, we asked students to self-report on their state of perceived course difficulty, motivation and their amount of weekly free time. Analysis of respondents' learning interaction data revealed that certain behaviours are associated with high or low levels of each of these three factors.

For instance, students who answered other students' forum questions, socialised with other learners or participated in study groups rarely came from the respondent group who declared lower motivation levels. Regardless of time allowance, students who reported high motivation had a higher rate of repeating assessment questions until they got the correct answer.

Combining multiple features allowed us to distinguish between learners who procrastinated and students who had no time and to predict whether a low-performing learner would self-report ‘not trying hard enough’ or ‘trying but not succeeding’.

With the ability to predict and diagnose dropout, MOOC dropout researchers hope to find ways to help learners better achieve their initial self-set goals.

Embedded interventions (that are presented when the learner visits the course site) might be more appropriate for learners with frequent activity. Inactive learners, whether due to procrastination or lack of free time, represent a case where delivered interventions (that are emailed to the learners) are more appropriate and where automatic prediction and diagnosis models become essential.

MOOCs in the future

People are asking: What will the MOOC landscape look like five years from now? Technology facilitates sharing and reuse of MOOC material. Will future MOOC developers develop supplements, complements or both?

To many, MOOCs present a rich and efficient experimentation platform.

We still fall short of a clear understanding of how course and learner attributes interact to affect behaviours and learning outcomes in MOOCs, but MOOCs continue to supply data and MOOC platforms are adding better support for A/B testing of features and content.

I think the fun is just starting, and the lessons we can learn from experiments by enthusiastic MOOC instructors are enough fuel to keep us going for many years.

* Sherif Halawa is a PhD candidate at the Learning Analytics Lab, Stanford University, California, United States.