Reuters Corpus

In 2000, Reuters released a corpus of Reuters news stories for use in research on, and development of, natural language processing, information retrieval and machine learning systems.

Reuters stopped distributing the corpus in 2004. The corpus is now available from NIST, the National Institute of Standards and Technology. Application forms are available at Reuters Corpus @ NIST.

We believe the corpus to be superior in quality and size to earlier corpora of Reuters news stories, such as the Reuters-21578 corpus, which has long served as a standard real-world benchmark for the information retrieval (IR), information extraction (IE) and related research communities. The Reuters corpus is marked up in XML, which we believe will significantly ease processing.
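Because the stories are marked up in XML, standard XML tooling can pull out individual fields without custom text parsing. The sketch below uses Python's built-in xml.etree.ElementTree on a hypothetical, simplified story. The element names (newsitem, headline, text/p, metadata/codes/code), the class attribute value and the code values are assumptions for illustration only, not a specification of the corpus format; check them against the files you actually receive.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified news story. The element and attribute names
# are ASSUMPTIONS for illustration, not the official corpus schema.
SAMPLE = """<newsitem itemid="1" date="1996-08-20" xml:lang="en">
  <headline>Example headline</headline>
  <text>
    <p>First paragraph of the story.</p>
    <p>Second paragraph of the story.</p>
  </text>
  <metadata>
    <codes class="bip:topics:1.0">
      <code code="C15"/>
      <code code="GCAT"/>
    </codes>
  </metadata>
</newsitem>"""


def parse_story(xml_text):
    """Extract a few fields from one story using the assumed element names."""
    root = ET.fromstring(xml_text)

    # Body text: one string per <p> element under <text>.
    paragraphs = [(p.text or "").strip() for p in root.findall("./text/p")]

    # Topic codes: collect <code> values only from <codes> blocks whose
    # (assumed) class attribute marks them as topic codes.
    topics = []
    for codes in root.findall("./metadata/codes"):
        if codes.get("class", "").startswith("bip:topics"):
            topics.extend(c.get("code") for c in codes.findall("code"))

    return {
        "itemid": root.get("itemid"),
        "date": root.get("date"),
        "headline": (root.findtext("headline") or "").strip(),
        "paragraphs": paragraphs,
        "topics": topics,
    }


if __name__ == "__main__":
    story = parse_story(SAMPLE)
    print(story["headline"], story["topics"])
```

Working on the parsed tree rather than on raw text means that a change in line breaks or whitespace in a story does not affect which fields are extracted.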