Sketch Recognition Fall 2007: Gross, Do

Ambiguous Intentions: a Paper-like Interface for Creative Design - Mark D Gross, Ellen Yi-Luen Do

Gross and Do begin by describing three characteristics of what they term diagramming. The first is abstraction, which can take the form of a simple, abstract figure representing a more complex one, or a complex figure being recognized as an instance of a simpler one. The second is ambiguity, which represents postponed decisions. The authors suggest that some ambiguity is intentional and that a system should not be overly concerned with resolving all of it. The last is imprecision, which allows the user to approximate without focusing on fine details.

Next, related work is discussed before moving to the details of their system, called Cocktail Napkin. Users can draw using a variety of simulated tools, which may be pressure sensitive. As glyphs are recognized, a text label may be applied to show the recognition. Strokes are not cleaned up by default. Additionally, editing tools are provided and two-user collaboration is supported.

The authors next detail configurations. Configurations are highly similar to shapes in LADDER. They are composed of simpler shapes along with geometric constraints, and may be represented by an abstract symbol. Configuration recognition triggers after the user pauses for 5 seconds. Constraints are initially generated from an example sketch, and the user can modify them either by changing the constraints directly or by providing additional examples. Finally, the user may provide an abstract symbol to represent the configuration.
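As a rough illustration of a configuration as "simpler shapes plus geometric constraints", here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Glyph` fields, the `contains` constraint, and the "boxed-circle" configuration are hypothetical and not taken from the paper, which implements configurations as LISP functions.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass
class Glyph:
    kind: str   # recognized low-level class, e.g. "box" or "circle"
    x: float    # left edge of the bounding box
    y: float    # top edge of the bounding box
    w: float    # bounding box width
    h: float    # bounding box height


def contains(outer, inner):
    """True if inner's bounding box lies entirely within outer's."""
    return (outer.x <= inner.x and outer.y <= inner.y
            and inner.x + inner.w <= outer.x + outer.w
            and inner.y + inner.h <= outer.y + outer.h)


# Hypothetical configuration: a circle drawn inside a box.
BOXED_CIRCLE = {
    "name": "boxed-circle",
    "parts": ("box", "circle"),
    "constraints": [contains],
}


def find_configuration(glyphs, config):
    """Return the first ordered tuple of glyphs whose kinds match the
    configuration's parts and which satisfies every constraint, else None."""
    n = len(config["parts"])
    for combo in permutations(glyphs, n):
        kinds_match = all(g.kind == k for g, k in zip(combo, config["parts"]))
        if kinds_match and all(c(*combo) for c in config["constraints"]):
            return combo
    return None
```

Calling `find_configuration(glyphs, BOXED_CIRCLE)` on the glyphs currently on the page would return the box and circle that form the configuration, which could then be replaced by the configuration's abstract symbol.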

Configurations may have different meanings depending on context. Additionally, basic shapes may have multiple interpretations. Cocktail Napkin maintains these multiple interpretations until additional information is provided to resolve the ambiguity. This information can arrive in several ways. The first is overtracing, in which the user redraws the glyph more definitely and the new glyph replaces the old one. Alternatively, if one of the interpretations is part of a larger configuration, the ambiguous glyph is resolved to the matching part of that configuration. Context can also resolve ambiguity; it can be either specified by the user or set implicitly when a glyph or configuration unique to some context is found.

As the user draws, constraints are defined for recognized glyphs. As the user edits the drawing, Cocktail Napkin attempts to maintain the constraints between glyphs. Like configurations, constraints can be context dependent, such as connected for graphs or adjacent in a floorplan. By adjusting the constraints the user can incrementally refine the sketch.

Next, the authors move to implementation details. A low-level recognizer attempts to classify glyphs. First, the glyph's bounding box is determined and assigned to a fuzzy class (tall, wide, etc.), then divided into a 3x3 grid. The path sequence around the grid is found, as is the number of corners. Along with the total stroke count, these features are compared to the allowable ranges of the template glyphs. First, classes are narrowed by path sequence. Next, if the other features fall within the allowable ranges, the glyph is assigned potential classes. If only one class is found, it is assigned certainty 1; if multiple are found, certainty 2. Next, rotations of the path sequence are examined, and classes and certainties are assigned. Dots and lines are handled as special cases. If no match is found, one- and two-off paths are considered and assigned certainty 3 and 4, respectively. Alternatively, the user can specify the glyph as an example of a class, and the template is updated.

Configurations are recognized using LISP functions representing the constraints of a configuration, and the components are replaced by the configuration. As specific configurations are found, the system transfers between various contexts, with past contexts maintained in a context chain. Each context contains its own set of glyph templates, configurations, spatial constraints, and mappings to other contexts. New glyphs are matched within the current context if possible, and the system moves to other contexts in the chain if necessary.

When presented with the system, users first attempted to just draw without using recognition, intent on completing the drawing. Additionally, most users found the recognition echo annoying.

Discussion - The low-level grid system would seem to have several flaws. Primarily, rotations of glyphs would require storage of a large number of paths, as would local details. For example, consider a triangle. It could be pointed upward or downward, or its legs could form an L. Each has a different path, and all must be stored under the triangle template. Also, the context transitioning is not well detailed.
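To make the rotation concern concrete: for right-angle rotations only, a stored path can be relabelled rather than stored again. The sketch below is a hypothetical Python illustration that assumes the grid cells are numbered 1 to 9 in row-major order, which the summary does not specify. Rotations by other angles, and variants such as the L-shaped triangle, are not covered by this relabelling and would still need their own paths.

```python
def rotate_cell(cell):
    """Map a cell (1..9, row-major, an assumed numbering) to its position
    after the 3x3 grid is rotated 90 degrees clockwise."""
    row, col = divmod(cell - 1, 3)
    return 3 * col + (2 - row) + 1


def rotations(path):
    """Return the path under 0, 90, 180 and 270 degree clockwise rotations."""
    result = [list(path)]
    for _ in range(3):
        result.append([rotate_cell(c) for c in result[-1]])
    return result
```

An illustrative upward-pointing triangle path such as `[2, 4, 7, 8, 9, 6, 2]` becomes `[6, 2, 1, 4, 7, 8, 6]` after one rotation, with the apex now in the right-middle cell.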

Monday, December 10, 2007
