Welcome and an early happy weekend. This article is intended to give a bit of deeper background around trends in what are called “Natural User Interfaces,” or NUIs. This term refers to a computer technology trend related to how we interact with computers. It’s a term that I’ve used in some other articles recently, but didn’t get into too deeply, because it takes a bit to explain it so that you do it justice.
Fair warning that this article is intended to be forward looking. It is NOT about looking at tools that are currently available off the shelf. This is not about immediately applicable information. This is a look at where the technology of human-computer interfaces has come from, where it is, where it is probably going in the next few years to come, and what kinds of possibilities that could introduce for computer based training.
So in that respect, it’s about getting yourself mentally prepared for what will be coming a few years down the road. It is for those who like to think ahead and to dream about future Instructional Design possibilities using tools that haven’t been invented yet.
My recommendation: if the preface and introduction pique your interest, bookmark this article, email yourself the link, and maybe set it aside for a quiet Sunday afternoon when you have some time to read and reflect. Then you can process it and reflect on the future possibilities of what you can do with this technology. Anyway, I hope you enjoy the article.
Introduction: What is a Natural User Interface (NUI)?
In a recent article, I talked about the future potential for the Kinect sensor to enable on-the-fly adjustments to presentation in e-Learning. There, I brought up the concept of a Natural User Interface, or NUI (pronounced “noo-ey”), almost in passing, but I recognize that a lot of people might not be familiar with the concept. The intention of the present article is to go into a little more background, to give some sense of the significance of this new type of human-computer interface, what came before it, how it has already changed how we use computers, and how future developments promise to further shape our interactions with computers. Finally, I will try to look ahead a bit at how these types of interfaces could shape the way we train people using computers.
Let’s get started.
Paradigms of human-computer interaction
So the first question for those unfamiliar with the notion of an NUI would be “what is a NUI?”
Well, to answer this question, it helps to go back a bit into the history of computing.
Computers as we generally know them (electronic calculation devices) have a history going back about 70 years, since the time of the second world war. If you want to be technical, you can trace computing back to Ada Lovelace and Charles Babbage and the Difference Engine and Analytical Engine in the early to mid 1800s, but for simplicity, let’s say 70 years, starting around 1945.
What started as a technology used to automate complex computations for a handful of high-end research and military institutions via massive electrical machines has evolved and grown over these seven decades to become a technology that is an integrated, essential part of the fabric of life (at least for people in relatively developed parts of the world). Along the way, the power, speed, and storage capacities of computers have increased exponentially, while the costs and sizes of components have at the same time shrunk at exponential rates. Computers have gone from machines numbering a handful in the whole world to numbering somewhere in the billions. Some billion powerful computers are carried around in people’s pockets in the form of smart phones, and embedded computing devices appear in almost any electrical device produced today.
Along with these developments, the means through which people interface and interact with computers have also dramatically changed. This change has come both as a result of technological developments and, at the same time, as a driver of the uptake of computers amongst the general population. Human-computer interaction has gone through a number of important paradigm shifts.
A paradigm, for those unfamiliar with the term, is a dominant contemporary pattern or way of conceptualizing and doing things. There have been a few major paradigms of human-computer interaction, with corresponding shifts as the technology moves from one dominant mode of interface to another.
I first want to speak about three major early paradigms of human-computer interaction:
- Batch interfaces (1940s to 1960s)
- Command Line Interfaces (1960s to 1980s)
- Graphical User Interfaces (1980s to 2000s)
I will then speak about the recently emerging paradigm of Natural User Interfaces (NUI). I will discuss some of the different examples of NUIs, and finally look at new possibilities for training opened up by these sorts of interfaces.
First paradigm: Batch interface (1940s to 1960s)
The first computer interface paradigm was the batch interface. In this setup, users entered commands through stacks of punch cards, prepared on a keypunch machine and fed into a card reader peripheral, which sensed the punched holes (with electrical contacts in early readers, optically in later ones) and turned the entries into electrical inputs. Programmers would carefully enter their code on the punch cards and submit their stack of cards as a batch to be scheduled and run by the administrators of the machine.
Remember, this was a time when computers were huge machines taking up most of a room, and a whole university or department might share one of these machines. It was a scarce, in-demand resource, so programmers had to wait their turn for their code to be run. Computers could run one program for one user at one time. This produced a serious bottleneck in performance. Users could not typically just sit at the computer by themselves and use it because the resource was limited and the time could be used more efficiently if the programs were run together one after another as a batch.
This cycle from submission of the program to scheduling to entering it into the computer to running could take days, depending on how busy the administrators of the computer center were. And if there was a bug, something miscoded in the punch cards, the program would fail, and the programmer would have to start again, identifying where the error was without any sort of guidance (“syntax error on line 57,” etc). Such aids didn’t exist. The programmer would try to track down the error in logic by hand, and then resubmit the revised program to the queue. It was a system that encouraged refined first draft work.
In a batch interface, the computer reads commands, coded in rigidly structured messages, carries out commands, and gives output through a printer. The computer would take in the programs of many people at one time, and process them, one after another, as a batch. It was in this time period that the first computer languages were developed.
The frustrations of dealing with these batch processing systems were a major drive for computer science researchers of the day to look into alternate modes of human-computer interaction.
Second paradigm: Terminals and Command line interface (CLI) (1960s to early 1980s)
Then followed the command line interface (CLI). This came about along with development of early computer displays and monitors with keyboards used as inputs. Users could input characters through a keyboard and see them displayed on the screen. This would take place at a terminal with a keyboard and display connected or networked to the main computer machine.
The main computer would be set up to time share between multiple users. The computer basically rapidly switches between carrying out tasks for each user, allowing the central computer to “simultaneously” handle many users at once. To get a sense of how this works, imagine getting your household chores done by doing laundry for a minute, then switching to keeping an eye on dinner for a minute, then switching to attending to your kids for a minute, then switching to tidying the living room for a minute, then switching to sweeping the floor for a minute. Then imagine this task switching happening a million times faster. You’re doing one thing at a time in little slices, but to a casual observer, everything is smoothly proceeding all at once. Generally, your computer at home or at work “multi-tasks” in a similar sort of way. The coordination of the time sharing created a certain amount of overhead using up computer resources, but this became less of a concern as computers became faster over time.
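For readers who like to see an idea in code, here is a minimal sketch of the chore-switching analogy above, written as a simple "round-robin" loop in Python. It is only an illustration under my own assumptions: the function name `time_share`, the chore names, and the "work units" are made up for this example, and real time-sharing systems are far more sophisticated about priorities and interruptions.

```python
from collections import deque


def time_share(jobs, time_slice=1):
    """Run each job for one small slice, then move on to the next one.

    `jobs` maps a (hypothetical) job name to the units of work it needs.
    Returns the order in which slices of work were actually carried out.
    """
    queue = deque(jobs.items())  # (name, remaining work units)
    timeline = []
    while queue:
        name, remaining = queue.popleft()
        worked = min(time_slice, remaining)
        timeline.append((name, worked))
        if remaining > worked:
            # Not finished yet: go to the back of the line and wait for another turn.
            queue.append((name, remaining - worked))
    return timeline


# Illustrative "users" sharing one machine, using the chores from the analogy.
chores = {"laundry": 3, "dinner": 2, "sweeping": 1}
timeline = time_share(chores)
print(" -> ".join(name for name, _ in timeline))
```

Only one chore is ever being worked on at any instant, but because the slices are short and the turns rotate, every chore keeps making progress. Shrink the slice to a fraction of a second and, to each user at a terminal, the machine appears to be theirs alone.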
So the user no longer had to punch cards, and no longer had to give them to someone else to feed into the machine, and wait. The different programmers and application users could get access to a terminal, and use that to interact directly with the computer in something resembling real time. The user could input text information, and get text output back more or less immediately.
This paradigm also overlapped with the appearance of the first so-called “micro-computers” used as office business machines (e.g. the IBM era). It was also the paradigm under which the first “personal computers” were born. These were standalone computing machines small enough to fit on a desk.
The user of one of these machines could use the keyboard, aided by the feedback visuals from the screen, to type documents, or to enter commands. The user controls the computer and performs actions such as creating, saving, deleting, copying, and moving files and directories using text based commands typed into a command line. This can still be seen today in the command line in Linux and in the cmd / Command Prompt utility in Windows. MS-DOS, Microsoft’s early operating system for the PC, worked like this.
This is known as a Command Line Interface or CLI. More advanced computer programming languages were also developed at this time.
Third paradigm: Graphical User Interface (GUI) (1980s to 2000s)
The next paradigm was the Graphical User Interface or GUI (“goo-ey”). This consists of a “desktop metaphor,” with program windows, menus, virtual “buttons” and other controls on the screen with which the user interacts using a mouse and pointer. Associated with this is the acronym WIMP = Windows, Icons, Menus, Pointer.
The earliest GUIs came out of research at Xerox PARC in the 1970s. These ideas were later taken up by Apple Computer in the early Macintosh and by Microsoft in their Windows OS. Interactions simulated the way a person might interact with a real world machine, by “pushing” (with mouse clicks) virtual buttons, turning virtual dials, etc. It was at this stage, corresponding with a sufficient miniaturization of computer components and fall in price, that the idea of a home “personal computer” took hold. With the desktop metaphor, windows, and mouse pointers, it became much more natural for everyday people to use computers. There were still many rough edges, and certain arcane bits of knowledge to learn, but overall, it became much simpler for everyday people to do basic things with computers. Computers were starting down the road to becoming a household appliance that average people would use as part of their everyday lives.
The emerging paradigm: The natural user interface (NUI) (2000s to present)
The next paradigm of human-computer interaction is so-called Natural User Interfaces, or NUI. This can encompass a variety of types of interaction, but the overarching idea is that rather than having artificial or mechanical intermediary means of input, the user interacts with the computer in ways more like those used to interact with people and objects in the real world, and more directly. This typically means touch, body / hand gestures, facial expressions, speech, and giving queries or commands to the computer in something much closer to the ambiguities of everyday language rather than in rigid computer syntax.
What does this mean? Well, to illustrate, let’s look at the predominant method of computer interaction that we’re just coming from and still wrapped up with. Namely, the mouse. Or, more precisely, the mouse and pointer as a way of navigating graphical menus and control interfaces on a screen display, with the keyboard for the user to enter in data like on some electronic typewriter. This form of interaction was almost completely predominant from around 1984 right up through to around 2008, a period of 24 years. The 1984 date marks the appearance of the Apple Macintosh (128k), which featured a GUI and mouse. 2008 on the other hand was the appearance of the iPhone 3G, which helped to explode the popularity of capacitive multi-touch smartphones. (As much as I dislike Apple’s closed model and think they’re past their prime, I have to grudgingly give them credit for having been squarely at the center of both of these technological inflection points.)
The mouse has become so much a part of our daily activities, at home and at work, for so long, that it’s easy to lose sight of how awkward and unnatural a way this is of interacting with a computer. Or interacting with anything. You sit in front of a computer screen, staring at it. You have a button on the screen. You have to grab this mouse on the desktop and drag it along the horizontal plane of the desk surface in order to move the visual of a pointer arrow on the vertical plane of the screen surface. And then you click on a button on the mouse to “click” the on-screen button. Once upon a time, this was simply the only way to mediate the pressing of a button. But what is the most natural instinct to do this, today, given the technology widely available now, namely touchscreens? Well, since 2008, with the iPhone, and since 2010, with the iPad, it’s simple. You reach out your hand to the screen and touch the button to press it. The whole step becomes much more natural and effortless.
Admittedly, it’s still kind of weird, because you’re still blocked by this 2 dimensional surface as you bump up against it and touch it or move your hands over it. It’s still a little limiting and artificial. But it’s getting there. You’re completing the metaphor at least of the classical graphical user interface or the desktop workspace on which you place things and move things around. Instead of moving them with a mouse, you move them directly with your fingers. You’re still operating something like some old fashioned instrument panel, but that has become more naturally engaging. You move like you’re actually operating an instrument panel in real life.
As mobile computing and mobile internet have taken off, this has impacted web and application design so that even on the desktop, the user interface principles inspired by touchscreen usability – lots of white space, simplified menus and controls, and large button targets – have become predominant. Designers try to build applications that work well on both.
Interacting with the computer in these more natural, everyday ways means that in a sense, the interface fades from attention and becomes invisible to the user. But the idea is that generally the experience is smoother, more realistic, more like a real world interaction. The distance between the user and the computer becomes smaller. In this way the computer becomes a little more like an extension of the user’s body. The user simply smoothly interacts with the computer to do what he needs to do.
We call such an interface a Natural User Interface, abbreviated NUI, and pronounced “noo-ey.” It’s the idea of an interface that drapes itself over us, fits us like a glove by letting us interact with the computer more like we interact with real world objects and people.
In popular entertainment, we see some examples of futuristic concepts of use of NUIs. The computer on Star Trek TNG, for example, which the crew commanded through voice or touch screen control panels as they walked around the ship and did their thing.
Or the gesture interfaces Tom Cruise’s character used in the Pre-Crime unit in Minority Report.
Or more recently in the battle “simulation” in the film Ender’s Game.
Capacitive multi-touch screens as seen in modern smartphones and tablets are one good example of an NUI. You interact with screen items by touching them with one or more fingers to stretch items, rotate them, shrink them, etc.
Virtual assistants or agents such as Apple’s Siri or Microsoft’s Cortana are another example, or another aspect of natural user interface technology. Here users interact in a somewhat conversational manner with the computer using speech. Some of the predictive elements of Google Now would also be examples.
Haptics (touch based interfaces) are yet another element to make interfacing more natural by simulating the texture and force feedbacks and resistances you would get interacting with real objects.
Virtual reality would be another example of a natural user interface. The person interacts with the virtual world through head and body movements, receiving visual feedback through some sort of helmet screen. This is a technology going back some decades, but it is becoming more affordable and feasible now. An example of a mass product is the Oculus Rift by the company Oculus VR (in the news of late for having been acquired by Facebook).
Another example is augmented reality, as in Google Glass. Here, important contextual information is projected within the user’s field of view to give continuously present information.
NUIs can also be combinations of these different types of technology. For example, the combination of speech and body / hand gestures is used in the Microsoft Xbox Kinect sensor. Microsoft has opened the sensor with free APIs and an SDK for developing NUI-enabled software for Windows using the Kinect for Windows sensor. The Kinect is a sensor that was previously sold as an optional peripheral for the Xbox and which is now a bundled part of the new Xbox One gaming and home entertainment console.
This particular device features a color camera and an infrared depth sensor for machine vision with depth perception. Software in the device can make out limb and finger movements, hand gestures, face movements, facial expressions, even the pulse of the user, and use these as inputs for control. Multiple microphones are present for noise cancellation and for recognizing directionality of sound. There is software on board for voice recognition and for facial recognition. The user controls the game by voice inputs and by moving his body and hands.
This represents a more natural way to interact and brings to life some of the models of human-computer interaction foreseen by science fiction earlier. It is not hard to foresee possible applications to training with this, especially with APIs of the device open to commercial and research development. The following links and the video below give some sense of what is being done with this sensor tool.
The Xbox One with Kinect is probably the hardest push right now for mass adoption of Natural User Interface technology in the home. There is also an Xbox Kinect for Windows sensor coming out that would allow games and software to be written using this device to control a computer.
Another potential route forward might come in the form of the iPad a few generations down the road, if/when Apple can put something similar to today’s Kinect sensors in the iPad. The iPad would make a sophisticated control device for the TV, with the iPad mirroring to the TV screen. So this hypothetical future iPad could watch you through twin cameras, to read your eye movements and facial expressions or detect hand gesture based inputs. The microphone inputs, combined with cloud services, could read speech queries or commands from you. The touch screen would detect button presses and finger or stylus drawing inputs. The accelerometer and gyro would recognize if you’re sitting or standing and in what orientation you’re holding the iPad. You could then hold the iPad in different orientations in space as a control surface or workspace. The problem with the Xbox Kinect sensor is that it watches from farther back, so it can’t yet pick up as much nuance of detail as a closer camera could. A camera in the iPad could do that.
I wouldn’t be surprised to see Apple do this, getting everyone used to this method of interaction, and then hitting with the long-predicted Apple TV, integrating something like the Kinect sensor with multiple slick layers of Natural User Interfaces built in. Bang and bang. It would have a big impact.
Learning and Training Applications
All of this promises to really shake up how we interact with computers. And since interaction is such a key element of computer based training, this has implications for us as designers of instruction.
There are a number of foreseeable learning and training applications for this sort of technology. To name just a few examples:
Speech recognition and text to speech could be useful for language learning.
Gesture based controls could enable more lifelike interaction with 3D models, especially if using stereoscopic 3D image displays. This could potentially be used for a variety of applications in technical training:
- to manipulate and examine equipment in maintenance training.
- to learn the structure of machinery by virtual manipulation of 3D models, including assembly and disassembly. Haptic feedback outputs could even simulate the sensation of touching and working with the actual equipment.
- in biochemistry, to manipulate 3-D models of large molecules like proteins to understand their structure and active sites
- or to visualize biological reaction steps
Virtual reality could be used to simulate the operation of certain complex equipment, including running through rare or emergency scenarios.
For soft skills, imagine the immersiveness of a training program where you interact with a 3D character in a scenario using simply your body language and your speech. The realism is greatly heightened. Or imagine a training program that can give feedback on your body language, verbal tics like filler words, and your facial expressions while you give a simulated presentation or sales pitch.
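To make the filler-word idea a little more concrete, here is a rough sketch in Python of the simplest possible version of that feedback. It assumes that some speech recognition step has already turned the learner’s practice pitch into a text transcript; that step is not shown. The filler list, the function name `count_fillers`, and the sample sentence are all my own inventions for illustration, not part of any existing product.

```python
import re
from collections import Counter

# Hypothetical list of filler words; a real tool would let the trainer tune this.
FILLERS = ("um", "uh", "like", "you know", "basically")


def count_fillers(transcript, fillers=FILLERS):
    """Count whole-word occurrences of each filler in a text transcript."""
    text = transcript.lower()
    counts = Counter()
    for filler in fillers:
        # \b keeps "um" from matching inside longer words such as "umbrella".
        pattern = r"\b" + re.escape(filler) + r"\b"
        hits = len(re.findall(pattern, text))
        if hits:
            counts[filler] = hits
    return counts


# Made-up transcript of a practice sales pitch.
sample = ("Um, so this product is, like, basically the best. "
          "You know, uh, um, it just works.")
report = count_fillers(sample)
for word, hits in report.most_common():
    print(f"{word}: {hits}")
```

A plain word count like this can’t tell a filler “like” from a legitimate one (“it works like a charm”), so treat it as a starting point only. The more interesting feedback imagined above, on body language and facial expressions, would need the kind of sensor input discussed earlier in the article.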