Ink Features for Diagram Recognition - Rachel Patel, Beryl Plimmer, John Grundy, Ross Ihaka
Patel et al statistically determine which of the various features used in sketch recognition are the most useful for distinguishing shapes from text. They drew from a possible set of 46 features, including those previously used in sketch recognition and newly created features. First they collected sample sketches from 26 people for nine diagram types, each of which included shapes and text. From these samples, each feature was computed. Next, a complete statistical analysis was conducted to determine the most useful features. Using statistical partitioning, the most useful feature was chosen iteratively to create a decision tree. At each node in the tree, a binary test is performed based on the feature which will best decide on the two classes. For each outcome of the test, if the decision is not "good enough," the next best feature for that branch is chosen and another test is formed. Surprisingly, the tree is only 6 levels deep and employs only 8 features. When compared to the Microsoft Ink and InkKit dividers, the new method outperformed both method in Shape classification, but worse than the Microsoft in text classification (Ink seems to classify sketches leaning strongly towards text) on training data, and better on shapes on test data. On all set the text recognition was close to the recognition rate of the InkKit divider. Seemingly, the new recognizer show a considerable improvement in accuracy of division between text and shapes. The eight features used include: time till next stroke, Speed till next stroke, Distance from last stroke, Distance to next stroke, Bounding box width, Perimeter to area, Amount of ink inside, and Total angle.
Discussion - An overall accuracy rate would be useful in addition to the shape-only and text-only accuracies. The overall accuracy of the new method depends on how heavily the datasets were weighted towards shapes or text. If diagrams are shape heavy, the new recognizer obviously performs better, but if text is predominant, the others would be better. The distribution of text/shapes may be difficult a priori for many domains. Alternately, a combination recognizer perhaps using a voting scheme or boosting could be used to gain the advantages of the highly accurate text classifiers along with the accuracy of the shape recognizers.
Thursday, December 13, 2007
Speech and Sketching
Speech and Sketching: An Empirical Study of Multimodal Interaction - A. Adler and R. Davis
Adler and Davis conduct a series of user studies to determine how a user would interact with a system combining both sketch and speech input that moves towards an interactive, 2-way communication between system and user. Previous studies were often limited to a set of commands, one-way interaction, annotation (no sketching), or a set of fixed sketch symbols. Rather than attempting to set up a complex Wizard of Oz experiment, users and experimenters sat face to face with the experimenter taking the role of an idealized recognition system and responded to the user's verbal questions and modifying drawings as appropriate. Users were asked to sketch and describe drawings from four different domains. The recorded conversations were analyzed qualitatively in five areas: sketching, language, multimodal interaction, questions, and comments. Sketching was divided into groups by stroke and amount of ink. By stroke, 52% were creative, 40% writing, 5% selection, and 3% modification, while by amount of ink, 63% was from creation, 21% writing, 8% selection, and 8% modification. Three colors were used on average and users frequently changed colors. Color was used to refer to previously sketched sections, to create groups, or to reflect the real-life color of a sketched object. They found language was often ungrammatical and repetitive, with the repeated phrases being key to the discussion, and related to what the user was currently sketching. Multimodal interaction often linked the order of speaking and drawing or combined speaking an annotation or label with saying it. Users often completed a sketch that was spoken about before speaking again. When asked questions, the users clarified the sketch and were encouraged to speak more freely. When commenting the user often described what they were planning to do or made comments about the sketch quality. Qualitatively, users were found to begin speaking before drawing if they were speaking a phrase, but to begin drawing before speaking if they were speaking a single word.
Adler and Davis conduct a series of user studies to determine how a user would interact with a system combining both sketch and speech input that moves towards an interactive, 2-way communication between system and user. Previous studies were often limited to a set of commands, one-way interaction, annotation (no sketching), or a set of fixed sketch symbols. Rather than attempting to set up a complex Wizard of Oz experiment, users and experimenters sat face to face with the experimenter taking the role of an idealized recognition system and responded to the user's verbal questions and modifying drawings as appropriate. Users were asked to sketch and describe drawings from four different domains. The recorded conversations were analyzed qualitatively in five areas: sketching, language, multimodal interaction, questions, and comments. Sketching was divided into groups by stroke and amount of ink. By stroke, 52% were creative, 40% writing, 5% selection, and 3% modification, while by amount of ink, 63% was from creation, 21% writing, 8% selection, and 8% modification. Three colors were used on average and users frequently changed colors. Color was used to refer to previously sketched sections, to create groups, or to reflect the real-life color of a sketched object. They found language was often ungrammatical and repetitive, with the repeated phrases being key to the discussion, and related to what the user was currently sketching. Multimodal interaction often linked the order of speaking and drawing or combined speaking an annotation or label with saying it. Users often completed a sketch that was spoken about before speaking again. When asked questions, the users clarified the sketch and were encouraged to speak more freely. When commenting the user often described what they were planning to do or made comments about the sketch quality. Qualitatively, users were found to begin speaking before drawing if they were speaking a phrase, but to begin drawing before speaking if they were speaking a single word.
Wednesday, December 12, 2007
Three main concerns
Three main concerns in sketch recognition and an approach to addressing them - James V. Mahoney and Markus P. J. Fromherz
Mahoney and Fromherz examine the segmentation of strokes into lines and curves useful for recognition. For the classification of strokes into the components of stick figures, segmentation ambiguity (lack of a 1-1 mapping of strokes and possible segments) often occurs due to small gaps between strokes, overlapping of strokes, crossing strokes, or interaction with background strokes. They seek to correct these problems by creating multiple interpretations either joining or dividing strokes so that the newly created strokes. These alternate interpretations can greatly improve the sub-graph matching recognition method used to classify components. Their method uses two stages to determine possible alternate graphs. The first is graph elaboration composed of 5 parts. Proximity linking joins small gaps. Link closure replaces connections or intersections with links. Virtual junction splitting divides a stroke near a non-connected endpoint and inserts links. Spurious segment jumping inserts links around small, possibly erroneous strokes. Lastly, continuity tracing adds a connected interpretation for segments that could connect smoothly.
The second stage attempts to classify strokes. First, strokes are preprocessed and divided at corners, then passed through the first stage, creating an input graph of connected segments and segment interpretations. Next, the input is compared models of sketches describing a connected graph of that class including optional segments. This is done using subgraph matching, that is the input graph is checked for a subgraph whose components match the components of the model graph, as well as matching characteristics of the model components (relative length, etc). Subgraph matching is solve by translating it to a constraint satisfaction problem. The CSP has several constraints not translated from the graph including 1-1 matching of data to model components, minimum total link length, optimal proportions, and maximal part count (as many non-optional parts assigned to data components).
Mahoney and Fromherz examine the segmentation of strokes into lines and curves useful for recognition. For the classification of strokes into the components of stick figures, segmentation ambiguity (lack of a 1-1 mapping of strokes and possible segments) often occurs due to small gaps between strokes, overlapping of strokes, crossing strokes, or interaction with background strokes. They seek to correct these problems by creating multiple interpretations either joining or dividing strokes so that the newly created strokes. These alternate interpretations can greatly improve the sub-graph matching recognition method used to classify components. Their method uses two stages to determine possible alternate graphs. The first is graph elaboration composed of 5 parts. Proximity linking joins small gaps. Link closure replaces connections or intersections with links. Virtual junction splitting divides a stroke near a non-connected endpoint and inserts links. Spurious segment jumping inserts links around small, possibly erroneous strokes. Lastly, continuity tracing adds a connected interpretation for segments that could connect smoothly.
The second stage attempts to classify strokes. First, strokes are preprocessed and divided at corners, then passed through the first stage, creating an input graph of connected segments and segment interpretations. Next, the input is compared models of sketches describing a connected graph of that class including optional segments. This is done using subgraph matching, that is the input graph is checked for a subgraph whose components match the components of the model graph, as well as matching characteristics of the model components (relative length, etc). Subgraph matching is solve by translating it to a constraint satisfaction problem. The CSP has several constraints not translated from the graph including 1-1 matching of data to model components, minimum total link length, optimal proportions, and maximal part count (as many non-optional parts assigned to data components).
Monday, December 10, 2007
Constellation
Constellation Models for Sketch Recognition - D. Sharon and M. van de Panne
Sharon and van de Panne use a constellation model to recognize groups of shapes. Individual objects are represented by a feature vector consisting of the x and y coordinates of the center of the bounding box and the length and angle of the diagonal of the bounding box. Pairs of parts are represented by a feature vector consisting of the relative change in x and y coordinates, the minimum distance between an end point of the first object to any point in the second, and the minimum distance between an end point of the second object to any point in the first. However, since this model does not scale well, parts are subdivided into two groups, mandatory and optional. Individual features are computed on all parts, while pair features are computed only if one of the pair is a mandatory part. Recognition is performed in two stages. First mandatory parts are found, then unlabeled strokes are fit to optional parts. Models are learned using probabilistic methods. From labeled examples, the mean and covariance are determined for the feature vectors of each modeled class. Strokes are labeled as various mandatory and optional parts by determining the maximum likelihood labeling. Using branch-and-bound, the algorithm assigns mandatory labels in sequence to the stroke that is most likely of that label. After training on 50-60 examples for each class, recognition is tested on several examples. Recognition generally runs in a relatively short time period.
Discussion - Recognition accuracy is surprisingly missing, while the results focus on the time taken during successful recognitions. It is also unclear whether all five models are considered for assignment of labels to a new sketch or if only the model corresponding to the class of sketch is used. If the second is the case, users must identify what kind of sketch has been drawn before the system can label the parts, and user labeling of the components doesn't seem to be too strenuous of an additional task.
Sharon and van de Panne use a constellation model to recognize groups of shapes. Individual objects are represented by a feature vector consisting of the x and y coordinates of the center of the bounding box and the length and angle of the diagonal of the bounding box. Pairs of parts are represented by a feature vector consisting of the relative change in x and y coordinates, the minimum distance between an end point of the first object to any point in the second, and the minimum distance between an end point of the second object to any point in the first. However, since this model does not scale well, parts are subdivided into two groups, mandatory and optional. Individual features are computed on all parts, while pair features are computed only if one of the pair is a mandatory part. Recognition is performed in two stages. First mandatory parts are found, then unlabeled strokes are fit to optional parts. Models are learned using probabilistic methods. From labeled examples, the mean and covariance are determined for the feature vectors of each modeled class. Strokes are labeled as various mandatory and optional parts by determining the maximum likelihood labeling. Using branch-and-bound, the algorithm assigns mandatory labels in sequence to the stroke that is most likely of that label. After training on 50-60 examples for each class, recognition is tested on several examples. Recognition generally runs in a relatively short time period.
Discussion - Recognition accuracy is surprisingly missing, while the results focus on the time taken during successful recognitions. It is also unclear whether all five models are considered for assignment of labels to a new sketch or if only the model corresponding to the class of sketch is used. If the second is the case, users must identify what kind of sketch has been drawn before the system can label the parts, and user labeling of the components doesn't seem to be too strenuous of an additional task.
Perception
Perceptually Based Learning of Shape Descriptions for Sketch Recognition - Olya Veselova and Randall Davis
Veselova seeks to learn to recognize sketches based on ideas derived from human perception. After low level recognition, higher level shapes are constructed based on constraints. These constraints are based on how people perceive similar shapes. People base similarity on several "singularities" determined by previously by Goldmeier. They include verticality, horizontality, straightness, parallelism, and others. These properties are translated into constraints. In addition, humans place a greater emphasis on some of these singularities. Thus the constraints are ranked according to priority. Priorities are adjusted depending on bases on three global properties: obstruction, tension lines, and grouping. Obstruction depends on the amount of stuff between two constrained objects; the more stuff the lower the constraint is ranked. Tension lines form a "balance" in the shape and is oriented along a grid system. Grouping is determined in two ways: connectedness and subshapes (similar to shapes making more complex shapes in LADDER). Finally, when compared to user determined similarity, the system achieved a 77% compatibility with the majority vote of similarity. The system performed even better on shapes that users were in high agreement.
Discussion: As noted in the final section of the paper the constraints are grouped into broad classes leaving a good deal of wiggle room for similarity. Also, the authors note that the system lacks global symmetry constraints, a key singularity described earlier in the paper. Such a constraint would seem to be one the more important constraints added. Also lacking is a negation constraint; however, this is a problem in many constraint systems.
Veselova seeks to learn to recognize sketches based on ideas derived from human perception. After low level recognition, higher level shapes are constructed based on constraints. These constraints are based on how people perceive similar shapes. People base similarity on several "singularities" determined by previously by Goldmeier. They include verticality, horizontality, straightness, parallelism, and others. These properties are translated into constraints. In addition, humans place a greater emphasis on some of these singularities. Thus the constraints are ranked according to priority. Priorities are adjusted depending on bases on three global properties: obstruction, tension lines, and grouping. Obstruction depends on the amount of stuff between two constrained objects; the more stuff the lower the constraint is ranked. Tension lines form a "balance" in the shape and is oriented along a grid system. Grouping is determined in two ways: connectedness and subshapes (similar to shapes making more complex shapes in LADDER). Finally, when compared to user determined similarity, the system achieved a 77% compatibility with the majority vote of similarity. The system performed even better on shapes that users were in high agreement.
Discussion: As noted in the final section of the paper the constraints are grouped into broad classes leaving a good deal of wiggle room for similarity. Also, the authors note that the system lacks global symmetry constraints, a key singularity described earlier in the paper. Such a constraint would seem to be one the more important constraints added. Also lacking is a negation constraint; however, this is a problem in many constraint systems.
Assistance
Naturally Conveyed Explanations of Device Behavior - Michael Oltmans, Randall Davis
Assistance combines sketching with verbal descriptions of the sketch. First the user draws a sketch of a physical system. Next, the user switches to description mode and describes how the components interact, both verbally and by sketching symbols. From these descriptions, Assistance develops a model of what each component does, how it will affect other components, and how the system of components operates.
Next, Oltmans provides an example input to describe how Assistance interprets the description. First, a verbal description is divided into clauses. These clauses provide information about a specific object and how it should move. Additionally, this information can be chained to objects that are connected to a describe object. Conditionals linking clauses are inferred to represent causal relationships between objects. Drawn arrows clarify the meaning of the motion described verbally giving them direction withing the sketch. Causal interactions can also be inferred from the use of transitive verbs or from internal models, such a the interaction between a pulley and objects connect to a string looped over the pulley. To demonstrate understanding, Assistance can answer questions about how a component falls into the sequence of action and what it will do, in addition to providing inferred representations of the components.
Oltmans then details the implementation of Assistance. The sketch is preprocessed by Assist to find objects and drawn connections, as well as description arrows. ViaVoice processes and parses the verbal description. These are fed to Assistance which translates them into propositional statements that are fed to a forward-chaining rule system and a truth maintenance system. After Assist finds object and their connections to each other or the fixed plain, Assistance determine how they could potentially move. ViaVoice input is translated to a constrained grammar syntax that understands motions, conditions, and propulsions. Reference gestures such as arrows and points link the verbal descriptions to the objects they describe. To translate utterances into events the voice input is filtered through a parse tree that determines subject, verb, and direct object. Then the referenced objects are determined and the motions and interactions assigned. Additionally, reference gestures must associated with verbal directives. When an ambiguous verbal reference is made ("this"), it must be accompanied by a distinguishing reference gesture, and an ordering of gestures can be maintained for multiple vague references with in a verbal phrase. Next, arrows are translated into events, and multiple event representations are merged. Assistance then finds implicit and explicit causal links from inferring them from events or finding voiced conditionals. Finally, forward chaining and truth maintenance attempts to determine a consistent causal chain the is compatible with the users intent. Events with no definite cause cause a search through plausible causes (explicit versus implicit causal links).
Discussion - While the description of Assistance is straight forward, it lacks the evidence in the form of user studies that would suggest whether user find the combination of sketching and voice useful and more intuitive that other options such as simply sketching. Some users may find the voice descriptions somewhat less useful, if they use arrows to specify direction of motion. However, Assistance appears to be a first step toward a powerful combination of interface techniques.
Assistance combines sketching with verbal descriptions of the sketch. First the user draws a sketch of a physical system. Next, the user switches to description mode and describes how the components interact, both verbally and by sketching symbols. From these descriptions, Assistance develops a model of what each component does, how it will affect other components, and how the system of components operates.
Next, Oltmans provides an example input to describe how Assistance interprets the description. First, a verbal description is divided into clauses. These clauses provide information about a specific object and how it should move. Additionally, this information can be chained to objects that are connected to a describe object. Conditionals linking clauses are inferred to represent causal relationships between objects. Drawn arrows clarify the meaning of the motion described verbally giving them direction withing the sketch. Causal interactions can also be inferred from the use of transitive verbs or from internal models, such a the interaction between a pulley and objects connect to a string looped over the pulley. To demonstrate understanding, Assistance can answer questions about how a component falls into the sequence of action and what it will do, in addition to providing inferred representations of the components.
Oltmans then details the implementation of Assistance. The sketch is preprocessed by Assist to find objects and drawn connections, as well as description arrows. ViaVoice processes and parses the verbal description. These are fed to Assistance which translates them into propositional statements that are fed to a forward-chaining rule system and a truth maintenance system. After Assist finds object and their connections to each other or the fixed plain, Assistance determine how they could potentially move. ViaVoice input is translated to a constrained grammar syntax that understands motions, conditions, and propulsions. Reference gestures such as arrows and points link the verbal descriptions to the objects they describe. To translate utterances into events the voice input is filtered through a parse tree that determines subject, verb, and direct object. Then the referenced objects are determined and the motions and interactions assigned. Additionally, reference gestures must associated with verbal directives. When an ambiguous verbal reference is made ("this"), it must be accompanied by a distinguishing reference gesture, and an ordering of gestures can be maintained for multiple vague references with in a verbal phrase. Next, arrows are translated into events, and multiple event representations are merged. Assistance then finds implicit and explicit causal links from inferring them from events or finding voiced conditionals. Finally, forward chaining and truth maintenance attempts to determine a consistent causal chain the is compatible with the users intent. Events with no definite cause cause a search through plausible causes (explicit versus implicit causal links).
Discussion - While the description of Assistance is straight forward, it lacks the evidence in the form of user studies that would suggest whether user find the combination of sketching and voice useful and more intuitive that other options such as simply sketching. Some users may find the voice descriptions somewhat less useful, if they use arrows to specify direction of motion. However, Assistance appears to be a first step toward a powerful combination of interface techniques.
Gross, Do
Ambiguous Intentions: a Paper-like Interface for Creative Design - Mark D Gross, Ellen Yi-Luen Do
Gross and Do begin by describe three characteristics used in what they term diagramming. The first is abstraction, which can take the form of a simple, abstract figure representing a more complex one, or a complex figure being recognized as an instance of the simple one. Second is ambiguity, which represents postponed decisions. Rather than resolve all ambiguity, the author suggest that some ambiguity is intentional and a system should not be overly concerned with resolving it. The last is imprecision, allowing the user to approximate without focusing on fine details.
Next, related work is discussed before moving to implementation details of their system called Cocktail Napkin. Users can draw using a variety of simulated tools which may be pressure sensitive. As glyphs are recognized, a text label may be applied to show recognition. Also, strokes are not cleaned by default. Additionally, editing tools are provided and two user collaboration is supported.
The author next detail configurations. Configurations are highly similar to shapes in LADDER. They are composed of simpler shapes along with geometric constraints, and may be represented by an abstract symbol. Configuration recognition triggers after the user pauses for 5 seconds. Constraints are initially generated from an example sketch and modifiable by the user by direct change to the constraints or providing additional examples. Finally, the user may provide an abstract symbol to represent the configuration.
Configurations may have different meanings depending on context. Additionally, basic shapes may have multiple interpretations. Cocktail Napkin maintains these multiple interpretations until additional information is provided to resolve the ambiguity. This information can come in several ways. First is overtracing, in which the user redraws the glyph more definitely and replaces the old. Alternately, if one of the interpretations is part of a larger configuration, the ambiguous glyph is resolved to the correct part of the configuration. Context can be either user specified or implicitly specified when a glyph or configuration unique to some context is found.
As the user draws, constraints are defined for recognized glyphs. As the user edits the drawing, Cocktail Napkin attempts to maintain the constraints between glyphs. Like configurations, constraints can be context dependent, such as connected for graphs or adjacent in a floorplan. By adjusting the constraints the user can incrementally refine the sketch.
Next, the authors move to implementation details. A low level recognizer attempts to classify glyphs. First the glyph's bounding box is determined and assigned to a fuzzy class (tall, wide, etc) then divided into a 3x3 grid. The path sequence around the grid is found, as is the number of corners. Along with total stroke count, these feature are compared to allowable ranges of template glyphs. First, classes are narrowed by sequence. Next if the other features fall within allowable ranges, the glyph is assigned potential classes. If only one is found it is assign certainty 1, 2 if multiple. Next, rotations of the path sequence are examined, and classes and certainties are assigned. Dots and lines take special cases. If no match is found one- and two-off paths are considered and assigned certainty 3 and 4. Alternately, the glyph can be user-specified as an example of a class and the template is updated. Configurations are recognized using LISP functions representing the constraints of a configuration, and the components are replace by the configuration. As specific configurations are found, the system transfers between various contexts with past contexts maintained in a context chain. Each context contains its own set of glyph templates, configurations, spatial constraints, and mappings to other contexts. New glyphs are matched within the current context if possible and moves to other contexts in the chain if necessary.
When presented to users, they first attempt to just draw without using recognition, intent on completing the drawing. Additionally, most users found the recognition echo annoying.
Discussion - The low level grid system would seem to have several flaws. Primarily, rotations of glyphs would require storage of a large number of paths as would local details. For example, consider a triangle. It could be pointed upward or downwards, or its legs form a an L. Each has a different path and all must be stored under the triangle template. Also, the context transitioning is not well detailed.
Gross and Do begin by describe three characteristics used in what they term diagramming. The first is abstraction, which can take the form of a simple, abstract figure representing a more complex one, or a complex figure being recognized as an instance of the simple one. Second is ambiguity, which represents postponed decisions. Rather than resolve all ambiguity, the author suggest that some ambiguity is intentional and a system should not be overly concerned with resolving it. The last is imprecision, allowing the user to approximate without focusing on fine details.
Next, related work is discussed before moving to implementation details of their system called Cocktail Napkin. Users can draw using a variety of simulated tools which may be pressure sensitive. As glyphs are recognized, a text label may be applied to show recognition. Also, strokes are not cleaned by default. Additionally, editing tools are provided and two user collaboration is supported.
The author next detail configurations. Configurations are highly similar to shapes in LADDER. They are composed of simpler shapes along with geometric constraints, and may be represented by an abstract symbol. Configuration recognition triggers after the user pauses for 5 seconds. Constraints are initially generated from an example sketch and modifiable by the user by direct change to the constraints or providing additional examples. Finally, the user may provide an abstract symbol to represent the configuration.
Configurations may have different meanings depending on context. Additionally, basic shapes may have multiple interpretations. Cocktail Napkin maintains these multiple interpretations until additional information is provided to resolve the ambiguity. This information can come in several ways. First is overtracing, in which the user redraws the glyph more definitely and replaces the old. Alternately, if one of the interpretations is part of a larger configuration, the ambiguous glyph is resolved to the correct part of the configuration. Context can be either user specified or implicitly specified when a glyph or configuration unique to some context is found.
As the user draws, constraints are defined for recognized glyphs. As the user edits the drawing, Cocktail Napkin attempts to maintain the constraints between glyphs. Like configurations, constraints can be context dependent, such as connected for graphs or adjacent in a floorplan. By adjusting the constraints the user can incrementally refine the sketch.
Next, the authors move to implementation details. A low level recognizer attempts to classify glyphs. First the glyph's bounding box is determined and assigned to a fuzzy class (tall, wide, etc) then divided into a 3x3 grid. The path sequence around the grid is found, as is the number of corners. Along with total stroke count, these feature are compared to allowable ranges of template glyphs. First, classes are narrowed by sequence. Next if the other features fall within allowable ranges, the glyph is assigned potential classes. If only one is found it is assign certainty 1, 2 if multiple. Next, rotations of the path sequence are examined, and classes and certainties are assigned. Dots and lines take special cases. If no match is found one- and two-off paths are considered and assigned certainty 3 and 4. Alternately, the glyph can be user-specified as an example of a class and the template is updated. Configurations are recognized using LISP functions representing the constraints of a configuration, and the components are replace by the configuration. As specific configurations are found, the system transfers between various contexts with past contexts maintained in a context chain. Each context contains its own set of glyph templates, configurations, spatial constraints, and mappings to other contexts. New glyphs are matched within the current context if possible and moves to other contexts in the chain if necessary.
When presented to users, they first attempt to just draw without using recognition, intent on completing the drawing. Additionally, most users found the recognition echo annoying.
Discussion - The low level grid system would seem to have several flaws. Primarily, rotations of glyphs would require storage of a large number of paths as would local details. For example, consider a triangle. It could be pointed upward or downwards, or its legs form a an L. Each has a different path and all must be stored under the triangle template. Also, the context transitioning is not well detailed.
Sunday, December 9, 2007
Herot
GRAPHICAL INPUT THROUGH MACHINE RECOGNITION OF SKETCHES - Christopher F. Herot
Herot describes his HUNCH system, an early sketching system with various levels of interpretations that the user can manipulate or convert between. He begins by describing the hardware and how the user inputs to the system. User sketch on paper covering an input tablet. This input is fed into the program STRAIT, a precursor of HUNCH. Seemingly an early Sezgin, STRAIT determines corners based on speed, with slower speed point determined as line end points. Next, HUNCH uses the CURVIT program to fit curves in the sketch to B-splines. At gradual curvature changes or a "careful" speed, STRAIT flags the point for fitting by CURVIT.
Next, Herot describes various programs used to infer the meaning of a sketch. First is an early version of STRAIT that uses latching to which joined together nearby point, for example reducing the number of endpoint for a sketched square from 5 to 4. Due to thresholding problems, STRAIT was converted to STRAIN which used a speed determined latching mechanism. However, this method did not produce ideal results either. Similar problems occurred facing overtracing of lines. Additionally, HUNCH possess limited inferencing mechanism that can produce 3D representations from 2D perspective sketches. Finally, a "room finding" program can produce a graph representation of a floorplan.
Herot next discusses how context can be used to help in the inference process. By defining the context, a sketcher can supply missing information needed to resolve ambiguities and guide low level recognition. The user defines a network of case descriptions that are matched to the context free interpretations made by the lower levels. The context is matched top-down. Partial matches may cause the low level recognizers to be re-invoked to search for missing components. However, even narrowed by context, the search space is large, and creating an adequate matching system would require solutions to several hard AI problems.
Therefore, Herot proposes a user-interactive system, prompting the user to resolve ambiguities that the system cannot. A database of interconnected interpretations is created and updated as new levels of interpretation are determined. Thus the overall system consists of interpretation, display, and manipulation programs which determine potential interpretations, display them, and allow the user to determine the correct interpretation or what modifications will correct the interpretations directly.
Moving to a more detailed discussion of the line and corner finder, Herot next discusses how lines and curves can be determined on the fly. Speed and "bentness" are determined over intervals of various sizes depending on the desire for local accuracy or smoothing. This system is tunable to the user, either through user modification of the interval or automated increase/decrease of the interval as the user add/deletes points. Herot's new system automatically calls the various components that HUTCH uses only when prompted by the user and reruns those components as their input is changed by user modification of the database.
Lastly, Herot poses the latching problem as a key area for future research and presents several criteria for latching.
Discussion - The use of paper over the input tablet seems unusual at first. It would appear to be an extraneous paper record of the user's sketches. However, these could be useful as notes of the user's thought process and intentions, helpful to another to determine how the sketches were generated or as a memory aid for the user. It's also interesting to note how much earlier this paper is compared to Sezgin, who reintroduced the ideas of speed and curvature at a later date as a novel idea. However, these metrics are somewhat buried in the paper and are far from the main focus of the paper which seems more directed as the interactivity of the system and the use of context for resolving sketch ambiguity.
Herot describes his HUNCH system, an early sketching system with various levels of interpretations that the user can manipulate or convert between. He begins by describing the hardware and how the user inputs to the system. User sketch on paper covering an input tablet. This input is fed into the program STRAIT, a precursor of HUNCH. Seemingly an early Sezgin, STRAIT determines corners based on speed, with slower speed point determined as line end points. Next, HUNCH uses the CURVIT program to fit curves in the sketch to B-splines. At gradual curvature changes or a "careful" speed, STRAIT flags the point for fitting by CURVIT.
Next, Herot describes various programs used to infer the meaning of a sketch. First is an early version of STRAIT that uses latching to which joined together nearby point, for example reducing the number of endpoint for a sketched square from 5 to 4. Due to thresholding problems, STRAIT was converted to STRAIN which used a speed determined latching mechanism. However, this method did not produce ideal results either. Similar problems occurred facing overtracing of lines. Additionally, HUNCH possess limited inferencing mechanism that can produce 3D representations from 2D perspective sketches. Finally, a "room finding" program can produce a graph representation of a floorplan.
Herot next discusses how context can be used to help in the inference process. By defining the context, a sketcher can supply missing information needed to resolve ambiguities and guide low level recognition. The user defines a network of case descriptions that are matched to the context free interpretations made by the lower levels. The context is matched top-down. Partial matches may cause the low level recognizers to be re-invoked to search for missing components. However, even narrowed by context, the search space is large, and creating an adequate matching system would require solutions to several hard AI problems.
Therefore, Herot proposes a user-interactive system, prompting the user to resolve ambiguities that the system cannot. A database of interconnected interpretations is created and updated as new levels of interpretation are determined. Thus the overall system consists of interpretation, display, and manipulation programs which determine potential interpretations, display them, and allow the user to determine the correct interpretation or what modifications will correct the interpretations directly.
Moving to a more detailed discussion of the line and corner finder, Herot next discusses how lines and curves can be determined on the fly. Speed and "bentness" are determined over intervals of various sizes depending on the desire for local accuracy or smoothing. This system is tunable to the user, either through user modification of the interval or automated increase/decrease of the interval as the user add/deletes points. Herot's new system automatically calls the various components that HUTCH uses only when prompted by the user and reruns those components as their input is changed by user modification of the database.
Lastly, Herot poses the latching problem as a key area for future research and presents several criteria for latching.
Discussion - The use of paper over the input tablet seems unusual at first. It would appear to be an extraneous paper record of the user's sketches. However, these could be useful as notes of the user's thought process and intentions, helpful to another to determine how the sketches were generated or as a memory aid for the user. It's also interesting to note how much earlier this paper is compared to Sezgin, who reintroduced the ideas of speed and curvature at a later date as a novel idea. However, these metrics are somewhat buried in the paper and are far from the main focus of the paper which seems more directed as the interactivity of the system and the use of context for resolving sketch ambiguity.
Tuesday, October 9, 2007
LADDER, part 2
Interactive Learning of Structural Shape Descriptions from Automatically Generated Near-miss Examples - Hammond and Davis
Summary
The authors present an extension of the previous version of LADDER, focusing on the refinement of constraints to ensure the shapes described are not overly specific or too general. Two types of constraint errors are considered, inclusion of an extraneous constraint and omission of a required constraint. More complex errors, such as substitution, are treated as a combination of both basic error types. Shape descriptions can be generated in two ways. The first is user description, in which the user inputs directly the constraints for a given shape. LADDER provides online debugging tools through auto-completion and marking syntax errors in red. Next, a sketch corresponding to the shape must be drawn, and if the sketch does not meet the constraints, it is assigned to the shape for which it fails the least constraints. Alternately, the constraints may be automatically generated by drawing the shape and allowing LADDER to generate all constraints seen in the sketch and heuristically pruning the constraints. Next, the shape description is examined for over-constraint. For each constraint a near miss is generated in which that constraint is false while all others hold. If the user believes the near miss represents the same shape concept, the constraint is overly specific and is deleted. Next, under-constraint is examined. Additional potential constraints are generated similarly to the generation of all possible constraints for automatic description generation, and are filtered to omit those that are directly derivable from the current set of constraints. To test if a constraint should be added, its negation is added to the constraint list and a sketch is generated. If the sketch is representative of the shape concept, the new constraint is not added as it would over constrain the description, while if the sketch is rejected the constraint is added as descriptive of the concept. For both methods, example sketches are generated by converting the set of constraints to a set of equations (along with some general rules about shapes) and the set of equations is minimized. In this equation, required constraints and optimal constraints are distinguished by weighting. Required constraints are those that must be true to be considered representative of a shape concept, while optimal are not necessarily true. If the minimization process fails or reports a high error, the shape is considered impossible and eliminated.
Summary
The authors present an extension of the previous version of LADDER, focusing on the refinement of constraints to ensure the shapes described are not overly specific or too general. Two types of constraint errors are considered, inclusion of an extraneous constraint and omission of a required constraint. More complex errors, such as substitution, are treated as a combination of both basic error types. Shape descriptions can be generated in two ways. The first is user description, in which the user inputs directly the constraints for a given shape. LADDER provides online debugging tools through auto-completion and marking syntax errors in red. Next, a sketch corresponding to the shape must be drawn, and if the sketch does not meet the constraints, it is assigned to the shape for which it fails the least constraints. Alternately, the constraints may be automatically generated by drawing the shape and allowing LADDER to generate all constraints seen in the sketch and heuristically pruning the constraints. Next, the shape description is examined for over-constraint. For each constraint a near miss is generated in which that constraint is false while all others hold. If the user believes the near miss represents the same shape concept, the constraint is overly specific and is deleted. Next, under-constraint is examined. Additional potential constraints are generated similarly to the generation of all possible constraints for automatic description generation, and are filtered to omit those that are directly derivable from the current set of constraints. To test if a constraint should be added, its negation is added to the constraint list and a sketch is generated. If the sketch is representative of the shape concept, the new constraint is not added as it would over constrain the description, while if the sketch is rejected the constraint is added as descriptive of the concept. For both methods, example sketches are generated by converting the set of constraints to a set of equations (along with some general rules about shapes) and the set of equations is minimized. In this equation, required constraints and optimal constraints are distinguished by weighting. Required constraints are those that must be true to be considered representative of a shape concept, while optimal are not necessarily true. If the minimization process fails or reports a high error, the shape is considered impossible and eliminated.
Monday, October 1, 2007
Ladder
LADDER, a sketching language for user interface developers - Hammond and Davis
Summary
LADDER is a system that allows the specification of domain specific sketch recognition systems. Users begin by defining basic shapes. Each shape consists of a set of components, geometrics constraints on those components, component aliases, editing methods, and how the components are drawn. Shapes are defined hierarchically, beginning at the lowest level with predefined shapes such as line, circle, etc. Much like object oriented languages, abstract shapes can be defined forcing shapes that inherit from them to contain certain components or methods. Shape groups can also be defined, allowing for changes to one shape in the group to be reflected in all shapes in the group. The system provides not only a set of primitive shapes but also defines two sets of constraints. The first, consisting of constraints such as parallel, meet, etc., is rotation invariant, while the second (vertical, horizontal, etc.) is not. Several predefined editing methods and display methods, such as delete, rotate, or translate and original or cleaned strokes, are also defined. Users may also define a shape to contain a vector of subshapes, should a large or unknown number of these type of shapes be needed. Lowest level, predefined shapes are recognized in a domain independent manner and are added as facts to a Jess-based rule system whose rule are translated from the shape definition. As low level shapes are added as facts, the Jess system attempts to combine the shape with all others to generate a higher level shape. Once a high level shape has been fixed, its components are removed from the list of facts. If idealized representation is desired, the set of constraints associated with the shapes is solved via Mathematica.
Summary
LADDER is a system that allows the specification of domain specific sketch recognition systems. Users begin by defining basic shapes. Each shape consists of a set of components, geometrics constraints on those components, component aliases, editing methods, and how the components are drawn. Shapes are defined hierarchically, beginning at the lowest level with predefined shapes such as line, circle, etc. Much like object oriented languages, abstract shapes can be defined forcing shapes that inherit from them to contain certain components or methods. Shape groups can also be defined, allowing for changes to one shape in the group to be reflected in all shapes in the group. The system provides not only a set of primitive shapes but also defines two sets of constraints. The first, consisting of constraints such as parallel, meet, etc., is rotation invariant, while the second (vertical, horizontal, etc.) is not. Several predefined editing methods and display methods, such as delete, rotate, or translate and original or cleaned strokes, are also defined. Users may also define a shape to contain a vector of subshapes, should a large or unknown number of these type of shapes be needed. Lowest level, predefined shapes are recognized in a domain independent manner and are added as facts to a Jess-based rule system whose rule are translated from the shape definition. As low level shapes are added as facts, the Jess system attempts to combine the shape with all others to generate a higher level shape. Once a high level shape has been fixed, its components are removed from the list of facts. If idealized representation is desired, the set of constraints associated with the shapes is solved via Mathematica.
Paulson
Recognizing and Beautifying Low-level Sketch Shapes with Two New Features and Ranking Algorithm - Paulson and Hammond
Summary
Paulson has developed a new low level shape recognizer supplemented by two novel features. The first is normalized distance between direction extremes (NDDE), the distance between the point with the highest direction and the lowest direction divided by total stroke length. Next is direction change ratio (DCR), which is the maximum change in direction divided by the average change in direction. Both are used to determine the spikiness of the direction graph. Both, along with direction, curvature, and speed graphs and corners, are computed during the "pre-recognition" phase of the recognizer. Next, a series of tests determine if the sketch can be classified as one of several primitives. These are Line, Poly-line, Circle, Arc, Curve, Ellipse, Spiral, and Helix. Additionally, a stroke may be classified as a Complex figure. If classified as Complex, the stroke is subdivided into subsegments until each subsegment can be classified as under one of the low level shapes. Complex, Poly-line, and Curve classes are further distinguished by a ranking system. Each subsegment is assigned a weight associated with the complexity of the assigned shape. The weights are summed over the stroke and the interpretation with the lowest score wins. Next low level shapes are hierarchically added to an interpretation.
Discussion
Summary
Paulson has developed a new low level shape recognizer supplemented by two novel features. The first is normalized distance between direction extremes (NDDE), the distance between the point with the highest direction and the lowest direction divided by total stroke length. Next is direction change ratio (DCR), which is the maximum change in direction divided by the average change in direction. Both are used to determine the spikiness of the direction graph. Both, along with direction, curvature, and speed graphs and corners, are computed during the "pre-recognition" phase of the recognizer. Next, a series of tests determine if the sketch can be classified as one of several primitives. These are Line, Poly-line, Circle, Arc, Curve, Ellipse, Spiral, and Helix. Additionally, a stroke may be classified as a Complex figure. If classified as Complex, the stroke is subdivided into subsegments until each subsegment can be classified as under one of the low level shapes. Complex, Poly-line, and Curve classes are further distinguished by a ranking system. Each subsegment is assigned a weight associated with the complexity of the assigned shape. The weights are summed over the stroke and the interpretation with the lowest score wins. Next low level shapes are hierarchically added to an interpretation.
Discussion
Tuesday, September 25, 2007
Online line simplification
Abam, de Berg, and Hachenberger. Streaming Algorithms for Line Simplification
The authors present a greedy algorithm that allows for the simplification of a polyline path to a simpler representative path on the fly, or as points along the path are being generated. The reasoning behind path simplification is a largely matter of storage space (not being able to store as many points as could be generated over long -- possibly infinite -- periods of time) rather then forming an idealized (or intentional) shape as usually seen in the sketch recognition setting. After discussing background information and previous approaches, they begin with the discussion of how their algorithm works. To begin, while the number of points does not exceed storage capacity, all points are stored. However, once the limit is exceeded their algorithm comes into play. Each point is stored in a priority queue with an associated error as the priority. The error assigned to a point is the error that would be generated by removing that point from the path (reducing a path p(i-1)p(i)p(i+1) to p(i-1)p(i+)). This error is measured in two way, Hausdorff error and Fréchet error, whose implementations are detailed towards the end of the paper. The errors are applicable to different types of paths, namely Hausdorff is only applicable to convex and xy-monotone paths. For arbitrary paths and the Fréchet error, they achieve a 4*sqrt(2)-competitive error with respect to an optimal offline algorithm (or at most 4*sqrt(2) time the optimal error).
This would be a novel concept to apply to the corner finding problem in sketch recognition, except for a problem, namely, in this method, the number of points (and therefore corners) is fixed a priori. One solution to this problem would be to use the set of points produced by this method as a set of candidate points similar to curvature points and speed points in Sezgin. This still runs into other problems, however. By limiting the number of points, we also limit the complexity of the figure, for example, choosing to remember only four points eliminates any figure with more than 3 segments. While extreme, this problem scales up, arbitrarily limiting user input. Though it seems unlikely that a user will require to be able to input drawings with 100 segments at a time, the possibility for such a sketch still exists. Additionally, if this set of points is used as candidate points in Sezgin, a quality metric must be assigned, possibly based on the Hausdorff and Fréchet errors, which could be feasible given the efficiency the authors assign to the error oracles.
The authors present a greedy algorithm that allows for the simplification of a polyline path to a simpler representative path on the fly, or as points along the path are being generated. The reasoning behind path simplification is a largely matter of storage space (not being able to store as many points as could be generated over long -- possibly infinite -- periods of time) rather then forming an idealized (or intentional) shape as usually seen in the sketch recognition setting. After discussing background information and previous approaches, they begin with the discussion of how their algorithm works. To begin, while the number of points does not exceed storage capacity, all points are stored. However, once the limit is exceeded their algorithm comes into play. Each point is stored in a priority queue with an associated error as the priority. The error assigned to a point is the error that would be generated by removing that point from the path (reducing a path p(i-1)p(i)p(i+1) to p(i-1)p(i+)). This error is measured in two way, Hausdorff error and Fréchet error, whose implementations are detailed towards the end of the paper. The errors are applicable to different types of paths, namely Hausdorff is only applicable to convex and xy-monotone paths. For arbitrary paths and the Fréchet error, they achieve a 4*sqrt(2)-competitive error with respect to an optimal offline algorithm (or at most 4*sqrt(2) time the optimal error).
This would be a novel concept to apply to the corner finding problem in sketch recognition, except for a problem, namely, in this method, the number of points (and therefore corners) is fixed a priori. One solution to this problem would be to use the set of points produced by this method as a set of candidate points similar to curvature points and speed points in Sezgin. This still runs into other problems, however. By limiting the number of points, we also limit the complexity of the figure, for example, choosing to remember only four points eliminates any figure with more than 3 segments. While extreme, this problem scales up, arbitrarily limiting user input. Though it seems unlikely that a user will require to be able to input drawings with 100 segments at a time, the possibility for such a sketch still exists. Additionally, if this set of points is used as candidate points in Sezgin, a quality metric must be assigned, possibly based on the Hausdorff and Fréchet errors, which could be feasible given the efficiency the authors assign to the error oracles.
Thursday, September 20, 2007
Curvature segmentation
Dae Hyun Kim and Myoung-Jun Kim, A curvature estimation for pen input segmentation in sketch-based modeling
Kim & Kim's paper takes somewhat of a middle ground between Sezgin and Yu. While Sezgin's paper relied equally on speed and curvature to detect the end of stroke subsegments, and Yu relied only on curvature, the Kims only uses speed at specific points to help aid the choice of curvature points. They also present two new methods for computing curvature. The first incorporates curvature of adjacent points to not only capture how curvy the line is a specific point, but also in the surrounding regions. To do this they simply sum up the curvature values over a window centered at the point in question if the values have the same sign as the curvature at that point. The second method imposes a monotonicity constraint on the window as well, considering only curvatures of the same sign with a smaller magnitude.
The speed-adaptive threshold is an interesting compromise between Sezgin and Yu, seemingly making it more likely that a curvature corner will be found at a lower speed. Broadening the window for which we examine curvature does seem improve the recognition of corners, but at the same time it may smooth out intentional, small changes in the stroke.
Kim & Kim's paper takes somewhat of a middle ground between Sezgin and Yu. While Sezgin's paper relied equally on speed and curvature to detect the end of stroke subsegments, and Yu relied only on curvature, the Kims only uses speed at specific points to help aid the choice of curvature points. They also present two new methods for computing curvature. The first incorporates curvature of adjacent points to not only capture how curvy the line is a specific point, but also in the surrounding regions. To do this they simply sum up the curvature values over a window centered at the point in question if the values have the same sign as the curvature at that point. The second method imposes a monotonicity constraint on the window as well, considering only curvatures of the same sign with a smaller magnitude.
The speed-adaptive threshold is an interesting compromise between Sezgin and Yu, seemingly making it more likely that a curvature corner will be found at a lower speed. Broadening the window for which we examine curvature does seem improve the recognition of corners, but at the same time it may smooth out intentional, small changes in the stroke.
Tuesday, September 18, 2007
Yu
Yu - A Domain-Independent System for Sketch Recognition
Yu's sketch system is very similar to Sezgin's. In both a sketch is segmented into lines and curves by finding perceived corners. Ideally, these corners match those that a human would perceive in an idealized version of the sketch, namely those corners that the human drawer intended are maintained and extraneous corners or jitter in the drawing are smoothed out. Like Sezgin, direction and curvature graphs are constructed; however, Yu does not use speed. Additionally, rather than choosing corners based on certainty metrics, Yu iteratively subdivides line segments a the highest point on the curvature graph to achieve a tighter fit. Yu begins by approximating the sketch as a single, horizontal line segment on either the direction graph (or actual sketch). If this approximation is close enough in a least-squares sense (under a threshold), the segment is accepted, while if not, the segment is subdivided at the maximal curvature point and each segment is fitted. Circles are fitted to a line on the direction graph whose slope is 2*PI/n (n=num of points on the segment). Overtraced circles are broken into multiple segments, and if they appear similarly shaped are replaced by a single "average" circle. Arc are represented by portions of circles. Yu also introduces a set of clean up rules that should help to fit the calculated shape to the intended one. This cleanup consists of deleting very small segments and merging segment that are similarly oriented and connect or overlap. As shown in Yu's examples, this can greatly reduce the number of corner and help achieve shapes that are much closer to the intended shape.
Yu's idea of adding corners that best improve the least squares accuracy of the sketch seems a much better idea than the metrics used by Sezgin. From the programming of Sezgin's algorithm on the class data, I'd frequently see that the best curvature point and the best speed point to add were often very close to each other. Due to this, initially one of the sets of points is largely ignored as the "best" point to add from it has been taken care of by a very close point in the other set, and other points that may improve the sketch neglected because they are good enough according to the metric. Usually this meant nearly all of the curvature and speed point needed to be added to the final fit to achieve a decent fit, severely overestimating the number of points. As clean-up method like Yu's could help alleviate this somewhat, but it seems that being able to skip those points and not add them at all would be more optimal
Yu's sketch system is very similar to Sezgin's. In both a sketch is segmented into lines and curves by finding perceived corners. Ideally, these corners match those that a human would perceive in an idealized version of the sketch, namely those corners that the human drawer intended are maintained and extraneous corners or jitter in the drawing are smoothed out. Like Sezgin, direction and curvature graphs are constructed; however, Yu does not use speed. Additionally, rather than choosing corners based on certainty metrics, Yu iteratively subdivides line segments a the highest point on the curvature graph to achieve a tighter fit. Yu begins by approximating the sketch as a single, horizontal line segment on either the direction graph (or actual sketch). If this approximation is close enough in a least-squares sense (under a threshold), the segment is accepted, while if not, the segment is subdivided at the maximal curvature point and each segment is fitted. Circles are fitted to a line on the direction graph whose slope is 2*PI/n (n=num of points on the segment). Overtraced circles are broken into multiple segments, and if they appear similarly shaped are replaced by a single "average" circle. Arc are represented by portions of circles. Yu also introduces a set of clean up rules that should help to fit the calculated shape to the intended one. This cleanup consists of deleting very small segments and merging segment that are similarly oriented and connect or overlap. As shown in Yu's examples, this can greatly reduce the number of corner and help achieve shapes that are much closer to the intended shape.
Yu's idea of adding corners that best improve the least squares accuracy of the sketch seems a much better idea than the metrics used by Sezgin. From the programming of Sezgin's algorithm on the class data, I'd frequently see that the best curvature point and the best speed point to add were often very close to each other. Due to this, initially one of the sets of points is largely ignored as the "best" point to add from it has been taken care of by a very close point in the other set, and other points that may improve the sketch neglected because they are good enough according to the metric. Usually this meant nearly all of the curvature and speed point needed to be added to the final fit to achieve a decent fit, severely overestimating the number of points. As clean-up method like Yu's could help alleviate this somewhat, but it seems that being able to skip those points and not add them at all would be more optimal
Sezgin
Metin Sezgin, Sketch Based Interfaces: Early Processing for Sketch Understanding
This'll get filled in as soon as I find the summary I wrote up.
This'll get filled in as soon as I find the summary I wrote up.
Monday, September 17, 2007
MARQS
MARQS: RETRIEVING SKETCHES USING DOMAIN- AND STYLE-INDEPENDENT
FEATURES LEARNED FROM A SINGLE EXAMPLE USING A DUAL-CLASSIFIER
Brandon Paulson, Tracy Hammond
Paulson's system was designed with three goals: 1)recognition of a sketch regardless of drawing style and invariantly with scale and rotation, 2)beginning recognition with only a single example, and 3)learning over time. This is done by using a small number of global features for a single and a multiple example classifier. Sketches that are correctly recognized are added to the training examples, which initially consist of only a single drawing. To recognize a sketch, an input it first rotated so that the major axis is horizontal, eliminating variation due to rotation. Then four features are computed and fed to the classifiers. These features are: aspect ratio, pixel density, average curvature, and number of perceived corners. The single classifier is a simple nearest neighbor classifier, and the multi-example uses a linear classifier like that of Rubine. To test the system, 150 sketches were generated, representing 15 different symbols. The first sketch of each symbol was used as the training example. The search system relied on the linear classifer 73% of the time and the correct symbol was identified in the top 2 results on average.
The big issue with this system that I see it how well it would scale. The number of symbols used in the experiments, though possibly suitable for a media library search, seems rather small for the e-journal example presented in the associated slides. Not only the slow down and accuracy loss associated with adding a new sketch seem to be a costly downside of the system, but the fact that local differences and rotation are ignored may cause symbols that the user perceives as different to be recognized as the same symbol. Some symbols depend on orientation for meaning (<>) as well as local differences (the happy and frowny faces example).
FEATURES LEARNED FROM A SINGLE EXAMPLE USING A DUAL-CLASSIFIER
Brandon Paulson, Tracy Hammond
Paulson's system was designed with three goals: 1)recognition of a sketch regardless of drawing style and invariantly with scale and rotation, 2)beginning recognition with only a single example, and 3)learning over time. This is done by using a small number of global features for a single and a multiple example classifier. Sketches that are correctly recognized are added to the training examples, which initially consist of only a single drawing. To recognize a sketch, an input it first rotated so that the major axis is horizontal, eliminating variation due to rotation. Then four features are computed and fed to the classifiers. These features are: aspect ratio, pixel density, average curvature, and number of perceived corners. The single classifier is a simple nearest neighbor classifier, and the multi-example uses a linear classifier like that of Rubine. To test the system, 150 sketches were generated, representing 15 different symbols. The first sketch of each symbol was used as the training example. The search system relied on the linear classifer 73% of the time and the correct symbol was identified in the top 2 results on average.
The big issue with this system that I see it how well it would scale. The number of symbols used in the experiments, though possibly suitable for a media library search, seems rather small for the e-journal example presented in the associated slides. Not only the slow down and accuracy loss associated with adding a new sketch seem to be a costly downside of the system, but the fact that local differences and rotation are ignored may cause symbols that the user perceives as different to be recognized as the same symbol. Some symbols depend on orientation for meaning (<>) as well as local differences (the happy and frowny faces example).
Thursday, September 6, 2007
Week 2: "These Look Similar!" and Visual Similarity of Pen Gestures
"Those Look Similar!" by Long discusses the quill system that allows users to create a gesture recognition system. As the user adds gestures to be recognized, quill comments on the recognizability of the new gesture by providing warnings if the gesture is overly similar to previously created gestures. To measure similarity the Rubine method is used to generate similarity metrics measuring the degree to which a new gesture is alike other gestures. As the system provides similarity warning, the authors had to determine when, what, and how much advice should be given. Advice is given when the user pauses in hopes that the advice will not interrupt the user while still being relevant to the action the user is taking, and both graphical and textual descriptions of the advice are given. Additionally, as the advice computation is run in the background, the authors had determine what user actions were allowable during this computation. They determined five options ranging from allowing the user to take any action (even those that would render advice meaningless) to take no action (and delaying or shutting out the user).
"Visual Similarity of Pen Gestures" describes the system used to determine the similarity of gestures. It begins with a summary of pen-based input and several applications of gesture input. Then it describes the concept of perceptual similarity and its relation to human perception of the similarity of gestures. While humans tend to relate the similarity of gestures to various commonly-used metrics, more often the log of those metrics correlated best with perceived similarity, and humans often used different metrics for different types of gestures. Next, the authors briefly detail MDS, which is used to visual high dimensional data (>3 dimensions) in a low dimensional space (usually 2D or 3D graph) while retaining the structural characteristics of that data. Lastly, the authors conducted two experiments to determine how their system's judgment of similarity compared to human judgment. The first experiment attempted to determine what features could adequately describe the gestures. After humans rated gestures for similarity, an set of features expanded from Rubine's was examined for correlation with human perception. Using MDS, the set of features was reduce to only 5 dimensions, the first of which was found to correlate to curviness. The second experiment attempted to determine how well the selected features generalized across people and features. New gestures were created and new people determined the similarity of the gestures. MDS selected features, as well as the expanded feature set, were examined for correlation with the human evaluations. In this case, MDS needed only 3 dimensions, but the dimensions did not easily correspond to any computed feature.
Discussion:
The idea of determining what gestures are similar is a novel approach. In effect it takes a direction opposite the traditional approach. Traditionally, a set of gestures is created, then a classifier and set of features are created or tuned to provide a "good enough" classification. In Long's paper, the gestures are instead tuned to the fixed classifier and features. Which is a better approach is debatable. In each approach we run into obstacles. For the traditional approach, finding good features, much less the "best" features, can be difficult, and if the features do not provide good separability classification is very difficult. Finding good gestures could prove difficult for Long's method, especially when viewed from a perspective of training a user. Even if a set of gestures show little similarity to each other, they are worthless if the user has difficulty connecting a gesture with what it does.
"Visual Similarity of Pen Gestures" describes the system used to determine the similarity of gestures. It begins with a summary of pen-based input and several applications of gesture input. Then it describes the concept of perceptual similarity and its relation to human perception of the similarity of gestures. While humans tend to relate the similarity of gestures to various commonly-used metrics, more often the log of those metrics correlated best with perceived similarity, and humans often used different metrics for different types of gestures. Next, the authors briefly detail MDS, which is used to visual high dimensional data (>3 dimensions) in a low dimensional space (usually 2D or 3D graph) while retaining the structural characteristics of that data. Lastly, the authors conducted two experiments to determine how their system's judgment of similarity compared to human judgment. The first experiment attempted to determine what features could adequately describe the gestures. After humans rated gestures for similarity, an set of features expanded from Rubine's was examined for correlation with human perception. Using MDS, the set of features was reduce to only 5 dimensions, the first of which was found to correlate to curviness. The second experiment attempted to determine how well the selected features generalized across people and features. New gestures were created and new people determined the similarity of the gestures. MDS selected features, as well as the expanded feature set, were examined for correlation with the human evaluations. In this case, MDS needed only 3 dimensions, but the dimensions did not easily correspond to any computed feature.
Discussion:
The idea of determining what gestures are similar is a novel approach. In effect it takes a direction opposite the traditional approach. Traditionally, a set of gestures is created, then a classifier and set of features are created or tuned to provide a "good enough" classification. In Long's paper, the gestures are instead tuned to the fixed classifier and features. Which is a better approach is debatable. In each approach we run into obstacles. For the traditional approach, finding good features, much less the "best" features, can be difficult, and if the features do not provide good separability classification is very difficult. Finding good gestures could prove difficult for Long's method, especially when viewed from a perspective of training a user. Even if a set of gestures show little similarity to each other, they are worthless if the user has difficulty connecting a gesture with what it does.
Monday, September 3, 2007
Week 2: Specifying Gestures by Example
This paper discusses Rubine's gesture recognition system which allows users to define single stroke gestures and how the system should interpret those gestures. His example GDP allows for the creation, deletion, copying, and transformation of various geometric objects. A user clicks the mouse to initiate the gesture then either releases the mouse or waits on a brief timeout to transfer between gesturing and modifying the object created by the gesture. GDP was designed using the GRANDMA toolkit that allows creation of new gestures, by drawing the gesture, and specifying the semantics of the gesture. 15 examples of the gesture are then provided for training. Each gesture operates on one or more objects and all objects share a gesture handler. To recognize gestures, GRANDMA uses a statistical classification method. For each gesture 13 features are calculated, and these features are input into a linear evaluation function for each class. A gesture is assigned to the class with the maximal value of the evaluation function. For each class of recognized gesture, the average features and covariance of the examples is calculated. The sample covariances are averaged to determine a common covariance. The common covariance and average feature vectors determine the weights used in each evaluation function. Ambiguous gestures are rejected based on the probability that it belongs to its assigned class. Eager and multi-finger recognition systems have also been designed by the author.
Though the author mentions Mahalanobis distance as a possible method for outlier rejection, he strangely uses the estimated common covariance rather than class covariances for determination of the distance from class mean. This may account for the lower recognition rate using Mahalanobis distance, as the common covariance would be used to determine distance from the common mean feature vector across all classes rather than the class specific distance.
Though the author mentions Mahalanobis distance as a possible method for outlier rejection, he strangely uses the estimated common covariance rather than class covariances for determination of the distance from class mean. This may account for the lower recognition rate using Mahalanobis distance, as the common covariance would be used to determine distance from the common mean feature vector across all classes rather than the class specific distance.
Wednesday, August 29, 2007
Week 1: Sketchpad
"Sketchpad," Sutherland
Sutherland's paper begins with an example illustrating the capabilities of Sketchpad, highlighting features such as object oriented design and constraint solving. Then next sections detail the inner workings of the various components of the system, beginning with storage. Objects are stored in a doubly-linked list that ties together connected components such as points and edges. Additionally, as new objects are created from the composition of lower level objects, a hierarchy of objects is formed. Modification of the lower level component bubbles up through the hierarchy altering higher level object that make use of it. Next, the light pen used for input is described. The light pen can observe objects on the display and is able to focus the users aim on objects within 1/8 of the center of the light pen's field of vision, allowing a range of freedom for targeting an object. The method used to display drawings is discussed next. For each "spot" on the screen the coordinates of the spot as well as a link to the object responsible for the spot are stored. Drawings can be magnified by up to a factor of 2000 and intersections between the objects an screen edges are computed. Line and curves are generated using simple difference equations, while text and digits rely on pre-generated tables to determine spot locations. Sketchpad also uses recursive, generic methods that allow complex objects to be altered in the same manner as simple ones. Objects are able to be deleted and merged in this fashion, and dependent objects are merged or deleted as necessary. When displayed objects refer to the objects they are composed of to determine how they are drawn. An instance of an object can be copied and modified independently of the original. The final section describing how Sketchpad works deals with constraints. Constraints are enforce with a one pass method and are relaxed should the method fail. Next, several examples are provided, such as patterns, electrical circuits, and bridges.
The Sketchpad paper provides an interesting view into the beginnings of pen based computer input. Though the numerous buttons needed for determining the appropriate mode of operation seem burdensome at first, modern CAD or drawing programs require the use of a similar mode selection procedure, though it is graphical in nature and some efficiency improvement do exist.
Sutherland's paper begins with an example illustrating the capabilities of Sketchpad, highlighting features such as object oriented design and constraint solving. Then next sections detail the inner workings of the various components of the system, beginning with storage. Objects are stored in a doubly-linked list that ties together connected components such as points and edges. Additionally, as new objects are created from the composition of lower level objects, a hierarchy of objects is formed. Modification of the lower level component bubbles up through the hierarchy altering higher level object that make use of it. Next, the light pen used for input is described. The light pen can observe objects on the display and is able to focus the users aim on objects within 1/8 of the center of the light pen's field of vision, allowing a range of freedom for targeting an object. The method used to display drawings is discussed next. For each "spot" on the screen the coordinates of the spot as well as a link to the object responsible for the spot are stored. Drawings can be magnified by up to a factor of 2000 and intersections between the objects an screen edges are computed. Line and curves are generated using simple difference equations, while text and digits rely on pre-generated tables to determine spot locations. Sketchpad also uses recursive, generic methods that allow complex objects to be altered in the same manner as simple ones. Objects are able to be deleted and merged in this fashion, and dependent objects are merged or deleted as necessary. When displayed objects refer to the objects they are composed of to determine how they are drawn. An instance of an object can be copied and modified independently of the original. The final section describing how Sketchpad works deals with constraints. Constraints are enforce with a one pass method and are relaxed should the method fail. Next, several examples are provided, such as patterns, electrical circuits, and bridges.
The Sketchpad paper provides an interesting view into the beginnings of pen based computer input. Though the numerous buttons needed for determining the appropriate mode of operation seem burdensome at first, modern CAD or drawing programs require the use of a similar mode selection procedure, though it is graphical in nature and some efficiency improvement do exist.
Week 1: Intro to Sketch Recognition
"Introduction to Sketch Recognition, Tracy Hammond and Kenneth Mock
The paper begins with a brief history of pen-based computer input, then describes current pen-based computer systems, including several types of hardware (form, size, etc) and available software (OS support, display recording, pen input), as well as describing two methods used to determine where the input device is in relation to the what is displayed. In the next section, uses of pen-based computer input in the classroom are discussed, including lectures/presentations, note taking, and other uses along with advantages and disadvantages of these techniques. Several example systems developed for specific academic environments are described, such as art, music, chemistry, engineering, and computer science, as well as military packages. These systems recognize specific shapes drawn with the pen to represent objects and actions. Finally in this section, a package that allows development of new systems that recognize pen input and associate it with specific actions is described. Next, the paper describes some challenges facing pen interface device. Also, two case studies are presented in which pen devices are used in the classroom and the reaction to their use by the the teacher and students is recorded. The paper finishes with a brief discussions of the use of pen devices in distance learning and possible future use of pen devices.
Overall, the paper provides a fairly comprehensive view of pen-input devices and their use in education. Though educational uses were the primary focus of the examples, the same uses also exist in the business world. Many business presentations could benefit from the interactivity the pen interface provides. The text recognition features available on later model tablets could be useful for improved legibility in professions that tend handwrite documents. The discussion of distance learning uses seems misplaced towards the end.
The paper begins with a brief history of pen-based computer input, then describes current pen-based computer systems, including several types of hardware (form, size, etc) and available software (OS support, display recording, pen input), as well as describing two methods used to determine where the input device is in relation to the what is displayed. In the next section, uses of pen-based computer input in the classroom are discussed, including lectures/presentations, note taking, and other uses along with advantages and disadvantages of these techniques. Several example systems developed for specific academic environments are described, such as art, music, chemistry, engineering, and computer science, as well as military packages. These systems recognize specific shapes drawn with the pen to represent objects and actions. Finally in this section, a package that allows development of new systems that recognize pen input and associate it with specific actions is described. Next, the paper describes some challenges facing pen interface device. Also, two case studies are presented in which pen devices are used in the classroom and the reaction to their use by the the teacher and students is recorded. The paper finishes with a brief discussions of the use of pen devices in distance learning and possible future use of pen devices.
Overall, the paper provides a fairly comprehensive view of pen-input devices and their use in education. Though educational uses were the primary focus of the examples, the same uses also exist in the business world. Many business presentations could benefit from the interactivity the pen interface provides. The text recognition features available on later model tablets could be useful for improved legibility in professions that tend handwrite documents. The discussion of distance learning uses seems misplaced towards the end.
Subscribe to:
Posts (Atom)