Thursday, February 05, 2004

IBM advancing on its search engine project
"Search is trying to find the best page on a topic. WebFountain wants to find the trend,"
It's based on text mining, or what's called natural language processing (NLP). While it indexes Web pages, it tags all the words on a page, examines their inherent structure, and analyzes their relationship to one another. The process is much like diagramming a sentence in fifth grade, but on a massive scale. Text mining extracts blocks of data, nouns-verb-nouns, and analyzes them to show causal relationships.
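To make the noun-verb-noun idea concrete, here is a toy sketch of my own. It is not how WebFountain works internally (the article gives no implementation details), and the tiny hand-written `LEXICON`, the `tag` helper, and the `extract_triples` function are all hypothetical names made up for illustration. A real system would use a trained part-of-speech tagger and a proper parser instead of a lookup table.

```python
import re

# Toy part-of-speech lexicon. This is an assumption for illustration only;
# a real text-mining pipeline would use a trained tagger.
LEXICON = {
    "ibm": "NOUN",
    "webfountain": "NOUN",
    "crawler": "NOUN",
    "pages": "NOUN",
    "trends": "NOUN",
    "indexes": "VERB",
    "finds": "VERB",
}


def tag(sentence):
    """Split a sentence into words and tag each one using the toy lexicon."""
    tokens = re.findall(r"[A-Za-z]+", sentence)
    return [(tok, LEXICON.get(tok.lower(), "OTHER")) for tok in tokens]


def extract_triples(sentence):
    """Return (noun, verb, noun) triples: for each verb, the nearest noun
    before it and the nearest noun after it."""
    tagged = tag(sentence)
    triples = []
    for i, (word, pos) in enumerate(tagged):
        if pos != "VERB":
            continue
        # Nearest noun to the left of the verb (candidate subject).
        subject = next((w for w, p in reversed(tagged[:i]) if p == "NOUN"), None)
        # Nearest noun to the right of the verb (candidate object).
        obj = next((w for w, p in tagged[i + 1:] if p == "NOUN"), None)
        if subject and obj:
            triples.append((subject, word, obj))
    return triples


if __name__ == "__main__":
    print(extract_triples("The crawler indexes Web pages"))
    print(extract_triples("WebFountain finds the trends"))
```

Run over millions of pages, triples like these could be counted and compared over time, which is roughly what "finding the trend" would require.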
"The Web has become just a huge bulletin board, and if you can look at that over time and see how things have changed, it answers the question, 'Tell me what's going on?'" said Sue Feldman, analyst at market research firm IDC. "This looks for the predicable structure in text, and uses that just the way people do, to do some analysis, categorize information and to understand it."
This is going to be massive! It does web crawling and NLP at the same time. It's the same idea I thought about for research papers (trying to identify trends in science), but applied to the internet.