Thursday, September 6, 2007

Week 2: "Those Look Similar!" and Visual Similarity of Pen Gestures

"Those Look Similar!" by Long discusses the quill system that allows users to create a gesture recognition system. As the user adds gestures to be recognized, quill comments on the recognizability of the new gesture by providing warnings if the gesture is overly similar to previously created gestures. To measure similarity the Rubine method is used to generate similarity metrics measuring the degree to which a new gesture is alike other gestures. As the system provides similarity warning, the authors had to determine when, what, and how much advice should be given. Advice is given when the user pauses in hopes that the advice will not interrupt the user while still being relevant to the action the user is taking, and both graphical and textual descriptions of the advice are given. Additionally, as the advice computation is run in the background, the authors had determine what user actions were allowable during this computation. They determined five options ranging from allowing the user to take any action (even those that would render advice meaningless) to take no action (and delaying or shutting out the user).

"Visual Similarity of Pen Gestures" describes the system used to determine the similarity of gestures. It begins with a summary of pen-based input and several applications of gesture input. Then it describes the concept of perceptual similarity and its relation to human perception of the similarity of gestures. While humans tend to relate the similarity of gestures to various commonly-used metrics, more often the log of those metrics correlated best with perceived similarity, and humans often used different metrics for different types of gestures. Next, the authors briefly detail MDS, which is used to visual high dimensional data (>3 dimensions) in a low dimensional space (usually 2D or 3D graph) while retaining the structural characteristics of that data. Lastly, the authors conducted two experiments to determine how their system's judgment of similarity compared to human judgment. The first experiment attempted to determine what features could adequately describe the gestures. After humans rated gestures for similarity, an set of features expanded from Rubine's was examined for correlation with human perception. Using MDS, the set of features was reduce to only 5 dimensions, the first of which was found to correlate to curviness. The second experiment attempted to determine how well the selected features generalized across people and features. New gestures were created and new people determined the similarity of the gestures. MDS selected features, as well as the expanded feature set, were examined for correlation with the human evaluations. In this case, MDS needed only 3 dimensions, but the dimensions did not easily correspond to any computed feature.

Discussion:
The idea of determining which gestures are similar is a novel approach. In effect, it takes the opposite direction from the traditional approach. Traditionally, a set of gestures is created, and then a classifier and set of features are created or tuned to provide a "good enough" classification. In Long's paper, the gestures are instead tuned to a fixed classifier and features. Which approach is better is debatable, as each runs into obstacles. For the traditional approach, finding good features, much less the "best" features, can be difficult, and if the features do not provide good separability, classification is very difficult. Finding good gestures could prove difficult for Long's method, especially when viewed from the perspective of training a user. Even if the gestures in a set show little similarity to each other, they are worthless if the user has difficulty connecting a gesture with what it does.

3 comments:

D said...

"Even if a set of gestures show little similarity to each other, they are worthless if the user has difficulty connecting a gesture with what it does."

Right on. I can create the world's most linearly separable gesture set with a little work. That doesn't mean those features have any decent syntactic meaning, however. I think the more difficult problem is to take a gesture set with meaning and fit a classifier to it, rather than the other way around.

Grandmaster Mash said...

I completely agree with what both of you say. Gesture similarity and gesture affordance are two separate issues. Affordance must come from the designers themselves, but I think Long's tool can help developers ensure that two "good" gestures are not too similar.

Test said...

I agree! Excellent point.