Monday, December 10, 2007

Assistance

Naturally Conveyed Explanations of Device Behavior - Michael Oltmans, Randall Davis

Assistance combines sketching with verbal descriptions of the sketch. First, the user draws a sketch of a physical system. Next, the user switches to description mode and describes how the components interact, both verbally and by sketching symbols. From these descriptions, Assistance develops a model of what each component does, how it will affect other components, and how the system of components operates.
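As a rough illustration of the kind of model this produces, here is a minimal Python sketch (all names here are hypothetical; the paper does not publish code) of a component record that gets its connectivity from the sketch and its behavior from description mode:

```python
from dataclasses import dataclass, field

@dataclass
class Component:
    """One drawn body in the sketch, e.g. a ball or a ramp."""
    name: str
    touches: list = field(default_factory=list)    # components it is connected to
    behaviors: list = field(default_factory=list)  # filled in from description mode

# Sketch mode: drawing yields the bodies and their connections.
ball = Component("ball", touches=["ramp"])
ramp = Component("ramp", touches=["ball"])

# Description mode: "the ball rolls down the ramp" attaches a behavior,
# which the system can then propagate to connected components.
ball.behaviors.append(("rolls_down", "ramp"))
```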

Next, Oltmans provides an example input to show how Assistance interprets a description. First, the verbal description is divided into clauses, each of which provides information about a specific object and how it should move. This information can also be chained to objects connected to the described object. Conditionals linking clauses are inferred to represent causal relationships between objects. Drawn arrows clarify verbally described motions by giving them direction within the sketch. Causal interactions can also be inferred from the use of transitive verbs or from internal models, such as the interaction between a pulley and the objects connected to a string looped over it. To demonstrate understanding, Assistance can answer questions about where a component falls in the sequence of action and what it will do, in addition to providing inferred representations of the components.
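A toy illustration of this clause-and-conditional reading (my own hypothetical example, not the paper's actual grammar) might split an utterance into a cause clause and an effect clause and link them causally:

```python
import re

utterance = "when the ball hits the lever, the lever pushes the cart"

# Split on the conditional: the "when" clause is the cause, the rest the effect.
cause_text, effect_text = re.split(r",\s*", utterance.replace("when ", ""), maxsplit=1)

def parse_clause(text: str) -> dict:
    """Crude subject-verb-object extraction for the toy pattern
    "the <subject> <verb> the <object>"."""
    words = text.split()
    return {"subject": words[1], "verb": words[2], "object": words[4]}

cause = parse_clause(cause_text)    # {'subject': 'ball', 'verb': 'hits', 'object': 'lever'}
effect = parse_clause(effect_text)  # {'subject': 'lever', 'verb': 'pushes', 'object': 'cart'}

# The conditional linking the clauses is read as a causal relationship.
causal_link = (cause, effect)
```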

Oltmans then details the implementation of Assistance. The sketch is preprocessed by Assist to find objects and drawn connections, as well as description arrows, while ViaVoice processes and parses the verbal description. Both outputs are fed to Assistance, which translates them into propositional statements for a forward-chaining rule system and a truth maintenance system. After Assist finds the objects and their connections to each other or to the fixed plane, Assistance determines how they could potentially move. ViaVoice input is translated into a constrained grammar that covers motions, conditions, and propulsions. Reference gestures, such as arrows and pointing, link the verbal descriptions to the objects they describe.

To translate utterances into events, the voice input is filtered through a parse tree that determines subject, verb, and direct object; the referenced objects are then resolved, and the motions and interactions assigned. Reference gestures must also be associated with verbal directives: an ambiguous verbal reference ("this") must be accompanied by a distinguishing reference gesture, and an ordering of gestures can be maintained when a single phrase contains multiple vague references. Next, arrows are translated into events, and multiple representations of the same event are merged. Assistance then finds implicit and explicit causal links, either by inferring them from events or by finding voiced conditionals. Finally, forward chaining and truth maintenance attempt to determine a consistent causal chain that is compatible with the user's intent; an event with no definite cause triggers a search through its plausible causes (explicit versus implicit causal links).
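The forward-chaining step can be pictured with a deliberately tiny sketch (assumed structure only; the real system uses a richer rule set backed by a truth maintenance system to keep the derived causal chain consistent). Propositions derived from speech and gestures are matched against rules, and each derivation records the causal link it establishes:

```python
facts = {("pushes", "lever", "cart")}  # propositions from speech and gestures
rules = [
    # if X pushes Y, then Y moves, and the push explains the motion
    (("pushes", "?x", "?y"), ("moves", "?y")),
]

def forward_chain(facts, rules):
    derived = set(facts)
    causes = {}       # derived event -> the proposition that caused it
    changed = True
    while changed:
        changed = False
        for (pre_pred, _, _), (post_pred, _) in rules:
            for fact in list(derived):
                if len(fact) != 3 or fact[0] != pre_pred:
                    continue
                new = (post_pred, fact[2])  # bind ?y to the direct object
                if new not in derived:
                    derived.add(new)
                    causes[new] = fact      # record the causal link
                    changed = True
    return derived, causes

derived, causes = forward_chain(facts, rules)
print(causes)  # {('moves', 'cart'): ('pushes', 'lever', 'cart')}
```

An event that ends up with no entry in the causes table would, in the spirit of the paper, trigger a search through its plausible explicit and implicit causes.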

Discussion - While the description of Assistance is straightforward, the paper lacks evidence, in the form of user studies, that would show whether users find the combination of sketching and voice useful and more intuitive than other options, such as sketching alone. Some users may find the voice descriptions less useful if they already use arrows to specify direction of motion. However, Assistance appears to be a first step toward a powerful combination of interface techniques.

1 comment:

- D said...

I think almost any multimodal method for input is going to be powerful. Obviously you can do things wrong and really make things worse/harder on yourself. However, two brains are better than one, with a brain in this case being a method of input.