Tuesday, September 25, 2007

Online line simplification

Abam, de Berg, and Hachenberger. Streaming Algorithms for Line Simplification

The authors present a greedy algorithm that simplifies a polyline path into a simpler representative path on the fly, as points along the path are being generated. The motivation for path simplification here is largely a matter of storage space (not being able to store as many points as could be generated over long, possibly infinite, periods of time) rather than forming an idealized (or intentional) shape as usually seen in the sketch recognition setting. After discussing background information and previous approaches, they describe how their algorithm works. While the number of points does not exceed the storage capacity, all points are stored. Once the limit is exceeded, their algorithm comes into play. Each point is stored in a priority queue with an associated error as its priority. The error assigned to a point is the error that would be introduced by removing that point from the path (reducing a path p(i-1)p(i)p(i+1) to p(i-1)p(i+1)). This error is measured in two ways, as Hausdorff error and as Fréchet error, whose implementations are detailed toward the end of the paper. The errors are applicable to different types of paths: the Hausdorff error is only applicable to convex and xy-monotone paths. For arbitrary paths and the Fréchet error, they achieve a 4*sqrt(2)-competitive ratio with respect to an optimal offline algorithm (that is, at most 4*sqrt(2) times the optimal error).
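To make the mechanics concrete, here is a minimal Python sketch of the greedy streaming idea, assuming a fixed point budget and using simple perpendicular distance as a stand-in for the paper's Hausdorff/Fréchet error oracles (the paper maintains a proper priority queue and accumulates errors, which this toy version skips in favor of a linear scan):

```python
import math

def point_segment_distance(p, a, b):
    """Perpendicular distance from p to segment ab -- a crude stand-in
    for the paper's Hausdorff/Frechet error oracles."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

class StreamingSimplifier:
    """Keep at most `capacity` points of an incoming polyline, always
    discarding the interior point whose removal introduces the least error."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.points = []

    def add(self, p):
        self.points.append(p)
        if len(self.points) <= self.capacity:
            return
        # Find the interior point that is cheapest to remove and drop it,
        # reconnecting its neighbors.
        best_i, best_err = None, float("inf")
        for i in range(1, len(self.points) - 1):
            err = point_segment_distance(self.points[i],
                                         self.points[i - 1],
                                         self.points[i + 1])
            if err < best_err:
                best_i, best_err = i, err
        del self.points[best_i]
```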

This would be a novel concept to apply to the corner finding problem in sketch recognition, except for one problem: in this method, the number of points (and therefore corners) is fixed a priori. One solution would be to use the set of points produced by this method as a set of candidate points, similar to the curvature points and speed points in Sezgin. This still runs into other problems, however. By limiting the number of points, we also limit the complexity of the figure; for example, choosing to remember only four points eliminates any figure with more than three segments. While extreme, this problem scales up, arbitrarily limiting user input. Though it seems unlikely that a user will need to input drawings with 100 segments at a time, the possibility for such a sketch still exists. Additionally, if this set of points is used as candidate points in Sezgin, a quality metric must be assigned, possibly based on the Hausdorff and Fréchet errors, which could be feasible given the efficiency the authors claim for their error oracles.

Thursday, September 20, 2007

Curvature segmentation

Dae Hyun Kim and Myoung-Jun Kim, A curvature estimation for pen input segmentation in sketch-based modeling

Kim & Kim's paper takes somewhat of a middle ground between Sezgin and Yu. While Sezgin's paper relied equally on speed and curvature to detect the ends of stroke subsegments, and Yu relied only on curvature, the Kims use speed only at specific points to help guide the choice of curvature points. They also present two new methods for computing curvature. The first incorporates the curvature of adjacent points to capture not only how curvy the line is at a specific point, but also in the surrounding region. To do this they simply sum the curvature values over a window centered at the point in question, counting only values that have the same sign as the curvature at that point. The second method imposes a monotonicity constraint on the window as well, considering only curvatures of the same sign with a smaller magnitude.
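A short Python sketch of the two estimators as I understand them, assuming we already have a per-point curvature array `curv` and a window half-width `k` (the names and the exact window handling are my assumptions, not the paper's notation):

```python
def windowed_curvature(curv, i, k):
    """First estimator: sum the curvatures in a window of half-width k
    around point i, counting only neighbors whose curvature has the same
    sign as curv[i]."""
    c0 = curv[i]
    total = c0
    for j in range(max(0, i - k), min(len(curv), i + k + 1)):
        if j != i and curv[j] * c0 > 0:
            total += curv[j]
    return total

def monotone_windowed_curvature(curv, i, k):
    """Second estimator: additionally require that magnitudes shrink
    monotonically as we walk away from point i."""
    c0 = curv[i]
    total = c0
    for step in (-1, 1):              # walk left, then right
        prev_mag = abs(c0)
        j = i + step
        while 0 <= j < len(curv) and abs(j - i) <= k:
            if curv[j] * c0 <= 0 or abs(curv[j]) > prev_mag:
                break                 # sign change or magnitude grows
            total += curv[j]
            prev_mag = abs(curv[j])
            j += step
    return total
```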

The speed-adaptive threshold is an interesting compromise between Sezgin and Yu, seemingly making it more likely that a curvature corner will be found at a lower speed. Broadening the window over which we examine curvature does seem to improve the recognition of corners, but at the same time it may smooth out intentional, small changes in the stroke.

Tuesday, September 18, 2007

Yu

Yu - A Domain-Independent System for Sketch Recognition

Yu's sketch system is very similar to Sezgin's. In both, a sketch is segmented into lines and curves by finding perceived corners. Ideally, these corners match those that a human would perceive in an idealized version of the sketch: the corners that the drawer intended are maintained, and extraneous corners or jitter in the drawing are smoothed out. Like Sezgin, Yu constructs direction and curvature graphs; however, Yu does not use speed. Additionally, rather than choosing corners based on certainty metrics, Yu iteratively subdivides line segments at the highest point on the curvature graph to achieve a tighter fit. Yu begins by approximating the sketch as a single, horizontal line segment on the direction graph (or the actual sketch). If this approximation is close enough in a least-squares sense (under a threshold), the segment is accepted; if not, the segment is subdivided at the maximal curvature point and each piece is fitted in turn. Circles are fitted to a line on the direction graph whose slope is 2*PI/n (n = number of points on the segment). Overtraced circles are broken into multiple segments and, if they appear similarly shaped, are replaced by a single "average" circle. Arcs are represented by portions of circles. Yu also introduces a set of clean-up rules intended to fit the calculated shape to the intended one. This clean-up consists of deleting very small segments and merging segments that are similarly oriented and connect or overlap. As shown in Yu's examples, this can greatly reduce the number of corners and help achieve shapes that are much closer to the intended shape.
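The subdivision loop is simple enough to sketch in Python. This is not Yu's implementation, just an illustration of the recursive split-at-max-curvature idea on the raw points, with an assumed least-squares line-fit error and a caller-supplied threshold; the direction-graph fitting and the circle/arc handling are omitted:

```python
import numpy as np

def fit_line_error(points):
    """Mean squared residual of the best least-squares line through the points."""
    pts = np.asarray(points, dtype=float)
    centered = pts - pts.mean(axis=0)
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    return (s[-1] ** 2) / len(pts)    # smallest singular value = off-line residual

def segment(points, curvature, lo, hi, threshold, corners):
    """Recursively subdivide [lo, hi] at its maximum-curvature interior point
    until every piece fits a line within `threshold`."""
    if hi - lo < 2 or fit_line_error(points[lo:hi + 1]) <= threshold:
        return
    split = lo + 1 + int(np.argmax(curvature[lo + 1:hi]))
    corners.append(split)
    segment(points, curvature, lo, split, threshold, corners)
    segment(points, curvature, split, hi, threshold, corners)

# Usage: corners = []; segment(pts, curv, 0, len(pts) - 1, threshold, corners)
```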

Yu's idea of adding corners that best improve the least-squares accuracy of the sketch seems a much better idea than the metrics used by Sezgin. In programming Sezgin's algorithm on the class data, I frequently saw that the best curvature point and the best speed point to add were very close to each other. Because of this, one of the sets of points is initially largely ignored, since the "best" point to add from it has already been taken care of by a very close point in the other set, while other points that might improve the sketch are neglected because the current fit is good enough according to the metric. Usually this meant nearly all of the curvature and speed points needed to be added to achieve a decent final fit, severely overestimating the number of points. A clean-up method like Yu's could help alleviate this somewhat, but it seems that being able to skip those points and not add them at all would be better.

Sezgin

Metin Sezgin, Sketch Based Interfaces: Early Processing for Sketch Understanding

This'll get filled in as soon as I find the summary I wrote up.

Monday, September 17, 2007

MARQS

MARQS: Retrieving Sketches Using Domain- and Style-Independent Features Learned from a Single Example Using a Dual-Classifier
Brandon Paulson, Tracy Hammond

Paulson's system was designed with three goals: 1) recognition of a sketch regardless of drawing style and invariant to scale and rotation, 2) beginning recognition with only a single example, and 3) learning over time. This is done by using a small number of global features in both a single-example and a multiple-example classifier. Sketches that are correctly recognized are added to the training examples, which initially consist of only a single drawing. To recognize a sketch, the input is first rotated so that its major axis is horizontal, eliminating variation due to rotation. Then four features are computed and fed to the classifiers: aspect ratio, pixel density, average curvature, and number of perceived corners. The single-example classifier is a simple nearest-neighbor classifier, and the multiple-example classifier uses a linear classifier like Rubine's. To test the system, 150 sketches were generated, representing 15 different symbols. The first sketch of each symbol was used as the training example. The search system relied on the linear classifier 73% of the time, and on average the correct symbol was identified in the top two results.
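A rough Python sketch of the feature step, under several assumptions of mine: the major axis is taken as the line through the two farthest-apart points, pixel density is approximated by path length over bounding-box area (the paper presumably rasterizes the stroke instead), and the corner count comes from some external corner finder:

```python
import math

def major_axis_angle(points):
    """Angle of the line through the two farthest-apart points
    (assumed here to define the sketch's major axis)."""
    best = (0.0, points[0], points[-1])
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            d = math.dist(p, q)
            if d > best[0]:
                best = (d, p, q)
    _, p, q = best
    return math.atan2(q[1] - p[1], q[0] - p[0])

def turn_angle(a, b, c):
    """Absolute change in direction at point b, normalized to [0, pi]."""
    a1 = math.atan2(b[1] - a[1], b[0] - a[0])
    a2 = math.atan2(c[1] - b[1], c[0] - b[0])
    return abs((a2 - a1 + math.pi) % (2 * math.pi) - math.pi)

def global_features(points, corner_count):
    """Aspect ratio, (approximate) pixel density, average curvature, and
    corner count, computed after rotating the major axis to horizontal."""
    a = -major_axis_angle(points)
    rot = [(x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a)) for x, y in points]
    xs, ys = zip(*rot)
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    aspect_ratio = h / w if w else 0.0
    path_len = sum(math.dist(rot[i], rot[i + 1]) for i in range(len(rot) - 1))
    pixel_density = path_len / (w * h) if w * h else 0.0   # crude proxy
    turns = [turn_angle(rot[i - 1], rot[i], rot[i + 1])
             for i in range(1, len(rot) - 1)]
    avg_curvature = sum(turns) / len(turns) if turns else 0.0
    return [aspect_ratio, pixel_density, avg_curvature, float(corner_count)]
```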

The big issue with this system that I see is how well it would scale. The number of symbols used in the experiments, though possibly suitable for a media library search, seems rather small for the e-journal example presented in the associated slides. Not only do the slowdown and accuracy loss associated with adding new sketches seem to be a costly downside of the system, but the fact that local differences and rotation are ignored may also cause symbols that the user perceives as different to be recognized as the same symbol. Some symbols depend on orientation for meaning (<>) as well as on local differences (the happy and frowny face example).

Thursday, September 6, 2007

Week 2: "Those Look Similar!" and Visual Similarity of Pen Gestures

"Those Look Similar!" by Long discusses the quill system that allows users to create a gesture recognition system. As the user adds gestures to be recognized, quill comments on the recognizability of the new gesture by providing warnings if the gesture is overly similar to previously created gestures. To measure similarity the Rubine method is used to generate similarity metrics measuring the degree to which a new gesture is alike other gestures. As the system provides similarity warning, the authors had to determine when, what, and how much advice should be given. Advice is given when the user pauses in hopes that the advice will not interrupt the user while still being relevant to the action the user is taking, and both graphical and textual descriptions of the advice are given. Additionally, as the advice computation is run in the background, the authors had determine what user actions were allowable during this computation. They determined five options ranging from allowing the user to take any action (even those that would render advice meaningless) to take no action (and delaying or shutting out the user).

"Visual Similarity of Pen Gestures" describes the system used to determine the similarity of gestures. It begins with a summary of pen-based input and several applications of gesture input. Then it describes the concept of perceptual similarity and its relation to human perception of the similarity of gestures. While humans tend to relate the similarity of gestures to various commonly-used metrics, more often the log of those metrics correlated best with perceived similarity, and humans often used different metrics for different types of gestures. Next, the authors briefly detail MDS, which is used to visual high dimensional data (>3 dimensions) in a low dimensional space (usually 2D or 3D graph) while retaining the structural characteristics of that data. Lastly, the authors conducted two experiments to determine how their system's judgment of similarity compared to human judgment. The first experiment attempted to determine what features could adequately describe the gestures. After humans rated gestures for similarity, an set of features expanded from Rubine's was examined for correlation with human perception. Using MDS, the set of features was reduce to only 5 dimensions, the first of which was found to correlate to curviness. The second experiment attempted to determine how well the selected features generalized across people and features. New gestures were created and new people determined the similarity of the gestures. MDS selected features, as well as the expanded feature set, were examined for correlation with the human evaluations. In this case, MDS needed only 3 dimensions, but the dimensions did not easily correspond to any computed feature.

Discussion:
The idea of determining which gestures are similar is a novel approach; in effect it takes a direction opposite to the traditional one. Traditionally, a set of gestures is created, then a classifier and set of features are created or tuned to provide a "good enough" classification. In Long's paper, the gestures are instead tuned to a fixed classifier and feature set. Which is the better approach is debatable, since each runs into obstacles. For the traditional approach, finding good features, much less the "best" features, can be difficult, and if the features do not provide good separability, classification is very difficult. Finding good gestures could prove difficult for Long's method, especially when viewed from the perspective of training a user. Even if a set of gestures shows little internal similarity, the gestures are worthless if the user has difficulty connecting a gesture with what it does.

Monday, September 3, 2007

Week 2: Specifying Gestures by Example

This paper discusses Rubine's gesture recognition system, which allows users to define single-stroke gestures and how the system should interpret those gestures. His example application, GDP, allows for the creation, deletion, copying, and transformation of various geometric objects. A user clicks the mouse to initiate a gesture, then either releases the mouse or waits on a brief timeout to switch between gesturing and modifying the object created by the gesture. GDP was designed using the GRANDMA toolkit, which allows the creation of new gestures by drawing examples and specifying the semantics of each gesture. Fifteen examples of the gesture are then provided for training. Each gesture operates on one or more objects, and all objects share a gesture handler. To recognize gestures, GRANDMA uses a statistical classification method. For each gesture, 13 features are calculated, and these features are input into a linear evaluation function for each class. A gesture is assigned to the class with the maximal value of the evaluation function. For each class of recognized gesture, the average feature vector and covariance of the examples are calculated. The sample covariances are averaged to determine a common covariance. The common covariance and average feature vectors determine the weights used in each evaluation function. Ambiguous gestures are rejected based on the probability that the gesture belongs to its assigned class. The author has also designed eager and multi-finger recognition systems.
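A compact Python sketch of the training step as summarized above, with the caveat that Rubine's exact covariance pooling and rejection thresholds differ in detail; computing the 13 features is assumed to happen elsewhere:

```python
import numpy as np

def train_linear_classifier(examples_by_class):
    """Build per-class linear evaluation functions: pool the per-class
    covariances into a common covariance, then derive weights from its
    inverse and the class mean feature vectors."""
    means, covs, counts = {}, {}, {}
    for cls, X in examples_by_class.items():
        X = np.asarray(X, dtype=float)
        means[cls] = X.mean(axis=0)
        covs[cls] = np.cov(X, rowvar=False, bias=True)
        counts[cls] = len(X)
    total = sum(counts.values())
    common_cov = sum(covs[c] * counts[c] for c in covs) / total
    inv = np.linalg.pinv(common_cov)         # pseudo-inverse for robustness
    classifiers = {}
    for cls, mu in means.items():
        w = inv @ mu                         # per-feature weights
        w0 = -0.5 * mu @ inv @ mu            # constant term
        classifiers[cls] = (w0, w)
    return classifiers

def classify(classifiers, features):
    """Assign the feature vector to the class with the maximal evaluation."""
    f = np.asarray(features, dtype=float)
    return max(classifiers, key=lambda c: classifiers[c][0] + classifiers[c][1] @ f)
```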

Though the author mentions Mahalanobis distance as a possible method for outlier rejection, he strangely uses the estimated common covariance rather than the class covariances when computing the distance from a class mean. This may account for the lower recognition rate when using Mahalanobis distance, since the common covariance reflects the spread of features pooled across all classes rather than the class-specific spread around each class mean.
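The distinction is easier to see in code; the first commented call below mirrors the rejection test as described (common covariance), while the second is the class-specific alternative argued for above (the variable names are mine):

```python
import numpy as np

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of feature vector x from `mean` under `cov`."""
    d = np.asarray(x, dtype=float) - mean
    return d @ np.linalg.pinv(cov) @ d

# Rejection with the pooled (common) covariance, as in the paper:
#   mahalanobis_sq(features, class_mean, common_cov) > threshold   -> reject
# Rejection with the per-class covariance, the alternative suggested above:
#   mahalanobis_sq(features, class_mean, class_cov) > threshold    -> reject
```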