Please note that this newsitem has been archived, and may contain outdated information or links.
9 March 2001, Computational Logic Seminar, Djoerd Hiemstra
Speaker: Djoerd Hiemstra (UTwente)
Title: Statistical Language Models for Information Retrieval
Date and Time: March 9, 2001, 13.30
Location: Room P.327, ILLC, Plantage Muidergracht 24, Amsterdam
Abstract:
Information Retrieval (IR) was probably the first area of natural language
processing in which statistics were successfully applied. Two models of ranked
retrieval developed in the late 60s and early 70s are still in use today:
Salton's vector space model and Robertson/Sparck-Jones' probabilistic model.
However, the real breakthrough of statistical models in natural language
processing did not come from the IR community, but from the speech recognition
community in the 70s and 80s. Many of the statistical techniques that were
first successfully applied to speech, such as Shannon's noisy channel model,
n-gram models and hidden Markov models, are used today in all sorts of
applications, including part-of-speech tagging, optical character recognition,
statistical translation and stochastic context-free grammars.
In this talk I will show that statistical language models originally developed for speech can be used to model ranked retrieval as well. The application to IR has characteristics of both the vector space model and the probabilistic model of IR and gives a probabilistic interpretation of:
- tf.idf term weighting
- relevance weighting of query terms
- Boolean-structured queries
The model can easily be extended with additional statistical processes, for instance statistical translation to model cross-language information retrieval, i.e. to search for documents in a language other than that of the query.
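As a rough illustration of how a language model can rank documents, here is a minimal sketch. It assumes a unigram query-likelihood model with linear interpolation smoothing, which is one common formulation and not necessarily the exact model presented in the talk. The function name `rank`, the smoothing weight `lam` and its default of 0.5 are hypothetical choices made for this example.

```python
import math
from collections import Counter


def rank(query, docs, lam=0.5):
    """Return document indices ordered by smoothed unigram query likelihood.

    Assumption for illustration: each query term is scored as
        lam * P(term | document) + (1 - lam) * P(term | collection)
    and the per-term log probabilities are summed over the query.
    """
    doc_tokens = [d.lower().split() for d in docs]

    # Collection-wide term counts serve as the background model.
    collection = Counter(t for toks in doc_tokens for t in toks)
    collection_len = sum(collection.values())

    scores = []
    for i, toks in enumerate(doc_tokens):
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            p_doc = tf[term] / len(toks) if toks else 0.0
            p_col = collection[term] / collection_len if collection_len else 0.0
            p = lam * p_doc + (1 - lam) * p_col
            if p == 0.0:
                # The term occurs nowhere in the collection.
                score = float("-inf")
                break
            score += math.log(p)
        scores.append((score, i))

    # Highest score first; ties are broken by document index.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

Under these assumptions, dividing each term's probability by its collection component (which is the same for every document and so does not affect the ranking) gives a per-term weight of log(1 + lam * P(term | document) / ((1 - lam) * P(term | collection))). That weight grows with the term's frequency in the document and shrinks with its frequency in the collection, which is one way to read the tf.idf interpretation mentioned above.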
For more information, see https://www.illc.uva.nl/~mdr/ACLG/Local/seminar01-1.html.