Friday, March 16, 2007

Going smart with LSI – Hats off to Google

There is a buzz all around about Google's use of Latent Semantic Indexing (LSI).

Why LSI?

Despite its success, the vector space model suffers from some serious problems. Unrelated documents may be retrieved simply because query terms happen to occur in them, and, conversely, related documents may be missed because none of their terms occur in the query. Consider synonyms: one study found that different people use the same keywords to express the same concept only 20% of the time.
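Both failure modes are easy to reproduce with plain term matching. The sketch below is a minimal illustration, assuming raw term-frequency vectors compared by cosine similarity; the query and the two documents are made up for this example. The query "jaguar car" is meant in the automobile sense.

```python
import math
from collections import Counter

def tf_vector(text):
    """Raw term-frequency vector: lowercased, whitespace-split tokens."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so non-shared terms add nothing.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical query and documents, chosen to show both problems.
query = tf_vector("jaguar car")
docs = {
    # Relevant, but says "automobile" instead of "car": no shared term.
    "synonym": "luxury automobile dealers list used automobile prices",
    # Unrelated (the animal), but shares the word "jaguar" by accident.
    "accidental": "the jaguar is a big cat of the rainforest",
}

scores = {name: cosine(query, tf_vector(text)) for name, text in docs.items()}
print(scores)
```

The relevant "synonym" document scores exactly 0, while the unrelated "accidental" document gets a positive score purely because one word overlaps.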

It is therefore worth asking whether retrieval could be based on concepts rather than on terms: first map terms (and queries as well) into a "concept space", then rank documents by their similarity within that space.
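The textbook way to build such a concept space is a truncated singular value decomposition (SVD) of the term-document matrix. The toy sketch below uses NumPy's `numpy.linalg.svd`. The corpus, the choice of k = 2 concepts, and the query-folding formula (q̂ = Σ_k⁻¹ U_kᵀ q, compared against the rows of V_k) are illustrative assumptions. This is one common textbook variant, not a description of what Google actually runs.

```python
import numpy as np

# Hypothetical toy corpus: three "vehicle" documents that mix synonyms,
# plus two documents about the animal.
terms = ["car", "automobile", "engine", "jaguar", "cat", "rainforest"]
docs = [
    "car automobile engine",
    "car engine",
    "automobile engine",  # never uses the word "car"
    "jaguar cat rainforest",
    "cat rainforest",
]

def term_doc_matrix(terms, docs):
    """Rows = terms, columns = documents, entries = raw term counts."""
    index = {t: i for i, t in enumerate(terms)}
    a = np.zeros((len(terms), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            a[index[word], j] += 1
    return a

def lsi_scores(a, query_vec, k):
    """Cosine similarity between the query and each document in a rank-k concept space."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    u_k, s_k, vt_k = u[:, :k], s[:k], vt[:k, :]
    # Each document becomes a k-dimensional concept vector (a row of V_k).
    doc_vecs = vt_k.T
    # Fold the query into the same space: q_hat = Sigma_k^-1 U_k^T q.
    q_hat = (u_k.T @ query_vec) / s_k
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat)
    # Guard against division by zero for documents with no concept weight.
    return (doc_vecs @ q_hat) / np.maximum(norms, 1e-12)

a = term_doc_matrix(terms, docs)
query = np.array([1.0 if t == "car" else 0.0 for t in terms])

# Plain term-space cosine, for comparison.
raw = (a.T @ query) / (np.linalg.norm(a, axis=0) * np.linalg.norm(query))
lsi = lsi_scores(a, query, k=2)
for doc, r, l in zip(docs, raw, lsi):
    print(f"{doc:25s} raw={r:.2f} lsi={l:.2f}")
```

In raw term space, "automobile engine" scores zero for the query "car". In the two-concept space, it lines up with the other vehicle documents because "automobile" and "car" co-occur elsewhere in the corpus. The animal documents stay near zero.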

In layman’s terms, search engines can use LSI over their vast databases to associate certain terms with concepts when indexing web pages. An LSI tool tries to read the searcher’s semantic map to decide which results to display. Google realized that it needed a better way for its bots to ascertain the true theme of a web page, and that is what Latent Semantic Indexing is all about.

The screenshot above confirms that Google has started using LSI, at least in some areas.

See the words circled in red, which confirm that related words are being matched.

Clear message: Google no longer relies so heavily on how often a particular keyword occurs on a page to determine what that page is about, so write exceptionally good copy instead of keyword-stuffed prose.

This is good news for those who have always believed in ‘quality writing’. You are likely to soon gain an advantage in pulling traffic with do-it-yourself SEO. Hats off to Google!
So, take off those black hats now! :)


