tales from the central european web

Sunday, April 16, 2006

Google word stemming for Hungarian

Google introduced word stemming for English words two and a half years ago, but until recently no occurrence of stemming a Hungarian word had been spotted. As you can see on the following screen captures, Google highlights not only the given word "kereső" (~searcher, the word used for search engines) but also "keresés" (~searching, the gerund of the verb "keres", which means "search"). You need to add another keyword to see this phenomenon on the result pages: I used the word "optimalizálás", which means optimisation.

This doesn't mean that Google has a perfect stemming algorithm for Hungarian words. There are a few regional search engines that do a much better job at word stemming, such as SZTAKI kereső, tango.hu and the PolyMeta meta search engine. The last screen captures show an example where Google fails to detect the plural and other forms of the noun "egér" (which means mouse), while PolyMeta highlights all the relevant keywords appropriately.
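To make the idea of stemming-based highlighting a bit more concrete, here is a toy sketch in Python. It is only an illustration under loud assumptions: the suffix list, the minimum stem length and the function names (`fold`, `toy_stem`, `matching_words`) are all made up for this post, and nothing here reflects how Google, PolyMeta or any other engine actually works. It strips diacritics and a handful of endings so that "kereső" and "keresés", or "egér" and "egerek", end up with the same crude stem.

```python
import re
import unicodedata


def fold(word):
    """Lowercase a word and strip diacritics (a crude normalisation)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical, hand-picked endings; real Hungarian morphology is far richer.
SUFFIXES = tuple(fold(s) for s in ("ek", "és", "et", "ő"))

# Assumed lower bound so that short words are not stripped down to nothing.
MIN_STEM_LENGTH = 4


def toy_stem(word):
    """Return a crude stem: the folded word minus one known ending."""
    folded = fold(word)
    for suffix in SUFFIXES:
        if folded.endswith(suffix) and len(folded) - len(suffix) >= MIN_STEM_LENGTH:
            return folded[: -len(suffix)]
    return folded


def matching_words(query, text):
    """List the words in text that share the query's crude stem."""
    target = toy_stem(query)
    return [w for w in re.findall(r"\w+", text) if toy_stem(w) == target]


if __name__ == "__main__":
    print(matching_words("kereső", "keresés és kereső optimalizálás"))
    print(matching_words("egér", "Az egerek és az egér"))
```

Stripping the accents is what lets "egér" and "egerek" meet at the same stem here, but it also merges words that are unrelated in Hungarian, which is one reason a serious stemmer needs much more than a suffix list.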



We are looking forward to seeing this happen more frequently in Google results. Hopefully this is a sign that they will care more about smaller markets in the future: as you could see, it's time for Google to catch up with regional competitors...
