What is available

The following is currently available:

Reuters Corpus, Volume 1, English language, 1996-08-20 to 1997-08-19
(Release date 2000-11-03, Format version 1, correction level 0)

This is distributed on two CDs and contains about 810,000 Reuters English-language news stories. Storing the uncompressed files requires about 2.5 GB.

All copyright subsisting in the materials contained in this and future volumes is reserved and remains the property of Reuters. Users must sign an agreement with NIST covering permitted uses of the corpus.

We intend to provide additional volumes but have no timetable for this work. We recognise that there is a lot of interest in non-English corpora, so our next goal is to produce a volume of non-English-language news stories covering the period 20 August 1996 - 19 August 1997. Although this volume will cover the same period as the currently released volume, it should not be considered a parallel corpus for translation purposes.

Future work will expand both English and non-English material to cover additional (more recent) years.

For commercial reasons, we will not release any data that is less than one year old.

Further information and discussion of the Reuters Corpus may be accessed by subscribing to the ReutersCorpora mailing list: http://groups.yahoo.com/group/ReutersCorpora