Concept: Using Natural User Interface technology to enable live adjustment of e-Learning presentation

Introduction: What good teachers do best…

One of the ways in which an instructor in the classroom soundly beats e-Learning is the ability to monitor the words, body language, and facial expressions of the students. Skilled teachers are always, as it were, running multiple processes in parallel, paying attention, consciously and subconsciously, to many things at once:

  • They are presenting the content of the lesson, coordinating modes of presentation, modifying emphasis and tone of voice to better explain a concept to that particular group.
  • They are monitoring the time used and the time remaining, and assessing both against where they are in the lesson.
  • They are thinking about the part of the lesson they are in and preparing for the next.
  • They are monitoring the class for disciplinary issues (in an elementary or secondary classroom).
  • They are monitoring the class for signs of interest, confusion, boredom, or frustration.

Good teachers then use these cues to calibrate and readjust the style and content of the presentation in real time.

It’s a tiring load to balance, and it takes a special skill set. But it’s the sort of task on which humans still far outperform machines: handling multiple simultaneous tasks in parallel, each of them dense, subtle, and multi-faceted. This is something that computers cannot yet approach. (Though with the relentless exponential march of technology and research, AI experts would probably say it’ll get there sooner than we may think or be comfortable with.)

Simulating this using Natural User Interface technology

However, a very rough approximation of some of this sort of adaptive delivery could potentially be used in e-Learning in the relatively near future using available sensor technology. Learning software could potentially use input from natural user interface (NUI) devices like the Microsoft Kinect sensor, which comes with the Xbox One video game platform. This sensor has a color camera and an infrared depth sensor to track user movements and facial expressions, and a microphone array to record user speech for game control.

Interest, boredom, confusion, and frustration are all internal affective (emotional) states, but they manifest externally in facial expressions, spoken language, and body language in a relatively uniform fashion from person to person. This is what enables a human teacher to recognize these states in learners.

Software could use the sensor’s SDK and plug into its APIs. In that way, developers could make applications that track user words, facial characteristics, and body language based on data read by the sensor. These APIs are available for professional use and for research purposes. The learning delivery software could then analyze the input audio and video data against certain criteria to try to identify (with a sufficiently high degree of probabilistic confidence) the emotional state of the learner.
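To make the "sufficiently high degree of probabilistic confidence" idea concrete, here is a minimal sketch in Python. It assumes some upstream classifier (not shown, and not part of any real SDK) already produces a probability for each affective state on every sensor frame. The class name, the state labels, the window size, and the 0.6 threshold are all hypothetical choices for illustration. The sketch averages recent frames and reports a state only when the average clears the threshold.

```python
from collections import deque
from typing import Deque, Dict, Optional

# Hypothetical state labels, taken from the states discussed in this article.
STATES = ("interested", "bored", "confused", "frustrated")


class AffectEstimator:
    """Smooths per-frame probabilities and reports a state only when confident."""

    def __init__(self, window: int = 30, threshold: float = 0.6) -> None:
        # Keep only the most recent `window` frames (assumed values, not tuned).
        self.frames: Deque[Dict[str, float]] = deque(maxlen=window)
        self.threshold = threshold

    def add_frame(self, probs: Dict[str, float]) -> None:
        """Store one frame of probabilities from the (assumed) upstream classifier."""
        self.frames.append(probs)

    def current_state(self) -> Optional[str]:
        """Return the most likely state, or None if no state is confident enough."""
        if not self.frames:
            return None
        averages = {
            state: sum(frame.get(state, 0.0) for frame in self.frames) / len(self.frames)
            for state in STATES
        }
        best = max(averages, key=lambda state: averages[state])
        return best if averages[best] >= self.threshold else None
```

Averaging over a window is only one possible design. Its appeal is that a single odd frame (a yawn, a glance away) does not immediately trigger a change in the presentation.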

Once the emotional state of the learner is identified, the software could respond by altering the flow of the presentation in an appropriate way.

This could take different forms.

  • The software could slow down the presentation, or speed it up/skim.
  • It could segue into a repetition of the confusing material or switch to a more elaborate explanation audio track.
  • It could pause the presentation.
  • It could present another optional example from a bank of examples to further clarify a topic.
  • It could ask one or two knowledge-check questions from a bank of questions. An NUI-based system could also ask questions of the learner in a relatively natural voice (text to speech) based on an algorithm.
  • It could simply give a verbal prompt to confirm whether the learner is following and whether the material is moving too slowly or too quickly, and then wait for and interpret the response.
  • It could offer some remediation, such as offering to take the user back one slide.
  • It could suggest good web-based materials to review.
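The kinds of responses listed above could be wired to detected states with something as simple as a lookup table. The following Python sketch is illustrative only: the action names and the pairing of each state with particular actions are assumptions made for this example, not a tested pedagogical design. When the state is unknown, the sketch falls back to simply asking the learner.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping from detected state to candidate responses, in order
# of preference. Both the action names and the pairings are assumptions.
RESPONSES: Dict[str, List[str]] = {
    "confused": ["repeat_section", "detailed_explanation", "extra_example", "go_back_one_slide"],
    "bored": ["speed_up", "knowledge_check"],
    "frustrated": ["pause", "verbal_prompt", "suggest_web_materials"],
    "interested": [],  # nothing to fix, so carry on as planned
}


def choose_response(state: Optional[str], tried: Set[str]) -> Optional[str]:
    """Pick the first response for `state` that has not been tried yet.

    Returns None when no change to the presentation is called for.
    """
    if state is None:
        # State unknown: ask the learner directly, but only once.
        return "verbal_prompt" if "verbal_prompt" not in tried else None
    for action in RESPONSES.get(state, []):
        if action not in tried:
            return action
    return None
```

For example, `choose_response("confused", set())` picks `repeat_section`, and a second call with that action marked as tried moves on to `detailed_explanation`. Tracking what has already been tried keeps the software from repeating the same remedy over and over.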

Now, supposing such software were developed and incorporated in some way into learning management and delivery platforms, there is the question of how much effort it would take for an Instructional Designer to develop some of these items (banks of examples, multiple explanations at different levels of depth and detail, questions, and narration tracks for all of these). There is also the question of how far that development process could be streamlined or automated (using improved future text-to-speech capabilities, for example) to keep it from becoming unwieldy. Also, if this software were incorporated into Learning Management Systems (LMS) or Learning Record Stores (LRS), which track learner session data, there might be privacy concerns about whether even high-level affective state data from the session should be kept on record. Designers might find this sort of data about how students react invaluable for evaluation purposes. But learners might find it creepy.
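One way to soften the privacy concern would be to keep only a coarse, consent-gated summary rather than raw sensor data. The Python sketch below assumes a hypothetical `summarize_session` helper and a simple opt-in flag. Neither corresponds to any real LMS or LRS API. It stores nothing without consent and otherwise keeps only the share of the session spent in each detected state.

```python
from collections import Counter
from typing import Dict, List, Optional


def summarize_session(states: List[Optional[str]], consent: bool) -> Optional[Dict[str, float]]:
    """Reduce a session's detected states to per-state fractions.

    `states` holds one entry per sampling interval (None where no state was
    confidently detected). Returns None if the learner has not opted in.
    """
    if not consent:
        return None  # keep nothing on record without consent
    known = [state for state in states if state is not None]
    if not known:
        return {}
    counts = Counter(known)
    # Only rounded fractions are kept; raw audio, video, and timing are discarded.
    return {state: round(n / len(known), 2) for state, n in counts.items()}
```

A summary like this would still let a designer see that, say, two thirds of a session registered as confusion, without retaining a moment-by-moment record of the learner’s face and voice.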

But, these questions aside, the fact is that sensor technology capable of collecting the data needed to support such a thing already exists. At this point, it would be a matter of a knowledgeable team or teams doing the hard work of writing the algorithms. And if used, it could potentially lead to much more adaptive and user-customized delivery of e-Learning, living up to the e-Learning marketing over-promises of yesteryear.

Microsoft Kinect for Windows Development

Microsoft’s Kinect for Windows development pages include access to the SDK (Software Development Kit).

The SDK with Kinect for Windows APIs is free to download and to use (no direct software licensing fee) to develop Windows apps that use the Kinect for Windows sensor.

Microsoft sells the special Kinect for Windows sensors (distinct from and somewhat more capable for commercial applications than the sensor used by the Xbox gaming platform) separately for $250. (There are apparently discounts for students and educational institutions.)

Microsoft also includes Human Interface Guidelines (HIG) with this package.

Research studies along these lines

In writing this article, I was pleased to discover that researchers are already looking into ways to use sensors and algorithms to automatically detect affective (emotional) states of learners relevant to attention and learning. Here is some further reading for those who are interested: