What's a Thesaurus Got to Do with Search?

Friday, February 20, 2009 by P.J. Hinton
That question was addressed, albeit briefly, in a recent post by Ina Fried on CNet's Beyond Binary blog.

The post talks about an emerging thesaurus technology being plugged at TechFest, an internal event at Microsoft where project teams at Microsoft Research pitch their efforts to product development teams.  Think of it as the way research migrates to reality in Redmond.

The Next Generation Writing Assistance team is behind the effort, and it employs a novel approach to computing synonyms for words.  Turning to foreign language translation tables, the project's algorithms infer that two words are synonymous when a single word in another language translates to both of them.  There is also support for phrase synonymy.
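To make the translation-table idea concrete, here is a minimal sketch of how it might work. It is not Microsoft's implementation, which the post does not describe in detail. The `TRANSLATIONS` table and the `pivot_synonyms` function are hypothetical names invented for illustration, and the toy French entries are assumptions, not real project data.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy translation table (an assumption for illustration):
# each foreign "pivot" word maps to the English words it can translate to.
TRANSLATIONS = {
    "tuer": ["kill", "slay"],
    "assassiner": ["assassinate", "murder", "kill"],
    "voiture": ["car", "automobile"],
}

def pivot_synonyms(table):
    """Treat English words that share a foreign pivot word as candidate synonyms."""
    candidates = defaultdict(set)
    for english_words in table.values():
        # Every pair of translations of the same pivot word is a candidate pair.
        for a, b in combinations(sorted(set(english_words)), 2):
            candidates[a].add(b)
            candidates[b].add(a)
    return dict(candidates)

synonyms = pivot_synonyms(TRANSLATIONS)
print(sorted(synonyms["kill"]))  # ['assassinate', 'murder', 'slay']
```

A real system would presumably weight candidate pairs, for example by translation probability, so that loose matches like "slay" rank below closer ones. This sketch treats every shared pivot as equally strong evidence.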

The post acknowledges the foreseeable use of the technology in productivity applications like Microsoft Word, and then Fried goes on to note the other potential use.  Quoting from the article:

But the technology could also help Microsoft in another key area: search.

That's because while search engines are good at finding things like names, that have just one form, they have a harder time finding expressions that can be phrased in multiple ways.

That's less of an issue when searching across the whole Web. For example, searching "Who shot Abraham Lincoln?" "Who killed Abraham Lincoln" and "Who assassinated Abraham Lincoln" all direct you to a page with John Wilkes Booth.

However, when it comes to searching smaller universes, such as a company's intranet, that might not be the case.

A similar issue arises when targeting long tail search phrases for corporate blogging.  You might identify your product as a "widget that can do X", but your customer may be typing something closer to "whatchamacallit that does Y" into a search engine.

It's a simple fact of life: not everyone uses the same terminology when describing what they are looking for, and that makes naive string matching a bad matchmaker.
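The gap between naive string matching and synonym-aware matching can be shown with a small sketch. It borrows the Lincoln example from Fried's article. The `SYNONYMS` table and both function names are hypothetical, and the matching logic is a deliberately simple stand-in, not how any particular search engine works.

```python
# Hypothetical synonym table (an assumption for illustration).
SYNONYMS = {
    "shot": {"killed", "assassinated"},
    "killed": {"shot", "assassinated"},
    "assassinated": {"shot", "killed"},
}

def naive_match(query, document):
    """Match only if the exact query string appears in the document."""
    return query.lower() in document.lower()

def expanded_match(query, document, synonyms):
    """Match if every query term, or one of its synonyms, appears in the document."""
    doc_terms = set(document.lower().split())
    for term in query.lower().split():
        alternatives = {term} | synonyms.get(term, set())
        if not alternatives & doc_terms:
            return False
    return True

doc = "John Wilkes Booth assassinated Abraham Lincoln in 1865"
query = "killed Lincoln"

print(naive_match(query, doc))               # False
print(expanded_match(query, doc, SYNONYMS))  # True
```

On a small document collection such as an intranet, there may be only one page on a topic, so a miss like the first result means the searcher finds nothing at all.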

I could see technology like this, if it reaches maturity, becoming a way to enhance the textual analysis of blog posts.  Our compending algorithm already takes some degree of synonymy into account.  It sounds, though, like it will be a while before Microsoft has something ready for prime time.


© 2009 Compendium Blogware
All Rights Reserved