Moens M. — Information Extraction: Algorithms and Prospects in a Retrieval Context :: Электронная библиотека попечительского совета мехмата МГУ

Information extraction regards the processes of structuring and combining content that is explicitly stated or implied in one or multiple unstructured information sources. It involves a semantic classification and linking of certain pieces of information and is considered as a light form of content understanding by the machine. Currently, there is a considerable interest in integrating the results of information extraction in retrieval systems, because of the growing demand for search engines that return precise answers to flexible information queries. Advanced retrieval models satisfy that need and they rely on tools that automatically build a probabilistic model of the content of a (multi-media) document.

The book focuses on content recognition in text. It elaborates on the past and current most successful algorithms and their application in a variety of domains (e.g., news filtering, mining of biomedical text, intelligence gathering, competitive intelligence, legal information searching, and processing of informal text). An important part discusses current statistical and machine learning algorithms for information detection and classification and integrates their results in probabilistic retrieval models. The book also reveals a number of ideas towards an advanced understanding and synthesis of textual content.

The book is aimed at researchers and software developers interested in information extraction and retrieval, but the many illustrations and real world examples make it also suitable as a handbook for students.