Thursday, December 13, 2007

Ink Features

Ink Features for Diagram Recognition - Rachel Patel, Beryl Plimmer, John Grundy, Ross Ihaka


Patel et al. statistically determine which of the various features used in sketch recognition are the most useful for distinguishing shapes from text. They drew from a candidate set of 46 features, including both features previously used in sketch recognition and newly created ones. First, they collected sample sketches from 26 people for nine diagram types, each of which included shapes and text, and computed every feature on these samples. Next, a complete statistical analysis was conducted to determine the most useful features. Using statistical partitioning, the most useful feature was chosen iteratively to build a decision tree. At each node in the tree, a binary test is performed on the feature that best separates the two classes. For each outcome of the test, if the decision is not "good enough," the next best feature for that branch is chosen and another test is formed. Surprisingly, the tree is only 6 levels deep and uses only 8 features. Compared with the Microsoft Ink and InkKit dividers on training data, the new method outperformed both in shape classification but did worse than the Microsoft divider in text classification (Ink seems to lean strongly towards classifying strokes as text). On test data, it was again better on shapes. On all sets, its text recognition rate was close to that of the InkKit divider. The new recognizer seemingly shows a considerable improvement in the accuracy of dividing text from shapes. The eight features used are: time till next stroke, speed till next stroke, distance from last stroke, distance to next stroke, bounding box width, perimeter to area, amount of ink inside, and total angle.
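To make the tree-building step concrete, here is a minimal Python sketch of recursive partitioning. It is not the authors' implementation: the Gini impurity criterion, the good_enough threshold, the function names, and the toy stroke data (two made-up features standing in for "time till next stroke" and "bounding box width") are all assumptions for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted_impurity, feature_index, threshold) of the best
    binary test, or None if no feature has two distinct values."""
    best = None
    n = len(rows)
    for f in range(len(rows[0])):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=0, max_depth=6, good_enough=0.05):
    """Split until a branch is 'good enough' (assumed: low Gini impurity)
    or the depth limit is reached; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or gini(labels) <= good_enough:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, good_enough),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, good_enough),
    }

def classify(tree, row):
    """Follow the binary tests from the root down to a leaf label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

# Made-up strokes: (time till next stroke, bounding box width).
rows = [(0.2, 10), (0.3, 12), (0.25, 8), (1.5, 80), (1.2, 95), (2.0, 60)]
labels = ["text", "text", "text", "shape", "shape", "shape"]
tree = build_tree(rows, labels)
print(tree)
print(classify(tree, (0.22, 9)), classify(tree, (1.8, 70)))
```

The max_depth default of 6 only mirrors the depth reported in the summary; in the paper the depth is a result of the partitioning, not an input.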

Discussion - An overall accuracy rate would be useful in addition to the shape-only and text-only accuracies. The overall accuracy of the new method depends on how heavily the datasets were weighted towards shapes or text. If diagrams are shape heavy, the new recognizer obviously performs better, but if text is predominant, the others would be better. The distribution of text and shapes may be difficult to know a priori for many domains. Alternatively, a combined recognizer, perhaps using a voting scheme or boosting, could be used to gain the advantages of the highly accurate text classifiers along with the accuracy of the shape recognizers.
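The dependence on the text/shape mix can be sketched numerically. Overall accuracy is the per-class accuracies weighted by class frequency, so two dividers trade places at a specific text fraction. The rates below are invented for illustration and are not the paper's results; the function names are also hypothetical.

```python
def overall_accuracy(text_acc, shape_acc, text_fraction):
    """Overall accuracy as the class-frequency-weighted mean of the
    per-class accuracies."""
    if not 0.0 <= text_fraction <= 1.0:
        raise ValueError("text_fraction must be between 0 and 1")
    return text_fraction * text_acc + (1.0 - text_fraction) * shape_acc

def crossover_text_fraction(a, b):
    """Text fraction at which dividers a and b, each given as
    (text_acc, shape_acc), have equal overall accuracy.
    Returns None if they never cross within [0, 1]."""
    a_text, a_shape = a
    b_text, b_shape = b
    denom = (a_text - a_shape) - (b_text - b_shape)
    if denom == 0:
        return None
    p = (b_shape - a_shape) / denom
    return p if 0.0 <= p <= 1.0 else None

# Hypothetical rates, NOT taken from the paper.
shape_strong = (0.85, 0.90)  # (text accuracy, shape accuracy)
text_strong = (0.95, 0.60)

for text_fraction in (0.2, 0.5, 0.9):
    print(text_fraction,
          overall_accuracy(*shape_strong, text_fraction),
          overall_accuracy(*text_strong, text_fraction))

print(crossover_text_fraction(shape_strong, text_strong))
```

With these made-up rates, the shape-strong divider has the higher overall accuracy until about three quarters of the strokes are text, which is why reporting the class mix alongside the per-class rates matters.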
