NewsML Toolkit - The NewsML library from Reuters & WAVO
Written by David Megginson

Last updated: 1 December 2000
Version: 0.1 alpha
Comments to: newsml-toolkit-comments@xmlnews.org

The NewsML Toolkit library and the NewsML Explorer demo application are copyright (c) 2000 by Reuters PLC and WAVO Corporation, Inc., and are released under the terms of version 2.1 of the GNU Lesser General Public License (LGPL).

1. Overview

NewsML is a new, open electronic-news specification developed by the International Press Telecommunications Council (IPTC) and supported by major news vendors and aggregators. Based on the Extensible Markup Language (XML), NewsML allows news providers to bundle compound news objects in different media (such as text, video, photographs and graphics) into a single package for electronic distribution.

News customers can process NewsML packages with low-level, generic XML tools and libraries like the Simple API for XML (SAX), the Document Object Model (DOM), and Extensible Stylesheet Language Transformations (XSLT), but the large feature set of the NewsML format can make the work difficult, especially if an XML specialist is not available. The Java-based NewsML Toolkit, jointly developed by the Reuters Group PLC in the U.K. and WAVO Corporation, Inc. in the U.S., provides a simple interface that lets you perform the most important NewsML processing tasks without any knowledge of XML or the intricacies of NewsML markup.
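
To give a feel for the low-level approach, here is a minimal sketch that reads headlines and content references straight from the markup using only the standard Java DOM interfaces (javax.xml.parsers and org.w3c.dom). It does not use the NewsML Toolkit at all. The class name DomHeadlines, its method names, and the tiny sample document are assumptions made for illustration; the sample is hand-written and is not a complete or valid NewsML package.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: generic DOM processing, not the NewsML Toolkit API.
public class DomHeadlines {

    // Hand-written fragment for the example; a real package has an
    // envelope, identification, management data, and much more.
    static final String SAMPLE =
        "<NewsML>"
      + "<NewsItem><NewsComponent>"
      + "<NewsLines><HeadLine>Markets rally</HeadLine></NewsLines>"
      + "<ContentItem Href=\"story.xml\"/>"
      + "<ContentItem Href=\"photo.jpg\"/>"
      + "</NewsComponent></NewsItem>"
      + "</NewsML>";

    // Parse an XML string into a DOM tree held entirely in memory.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Collect the text of every HeadLine element, in document order.
    static List<String> headlines(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("HeadLine");
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    // Collect the Href attribute of every ContentItem that has one.
    static List<String> contentHrefs(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName("ContentItem");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            if (item.hasAttribute("Href")) {
                result.add(item.getAttribute("Href"));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        System.out.println(headlines(doc));
        System.out.println(contentHrefs(doc));
    }
}
```

Even this small task requires knowing the element and attribute names and walking node lists by hand, which is the kind of detail a higher-level library is meant to hide.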

Java developers with no prior XML knowledge can use the NewsML Toolkit to extract many kinds of information from a multimedia NewsML package, including news lines, permissions, dates, whether a story is embargoed, and where to find the individual news objects, all using regular Java object methods. The first release of the library also includes a simple demonstration application, the NewsML Explorer, for browsing NewsML packages interactively.

For advanced users who need access to information not provided directly by the first alpha release of the library (such as full metadata support or incremental updates), the NewsML Toolkit allows direct access to the full original markup through a DOM interface whenever needed.

2. Features and Benefits

The NewsML Toolkit is implemented in Java and should run on any platform with a Java2-compliant virtual machine, including (but not limited to) Unix, Linux, Windows NT, Windows 2000, Windows 95/98, and MacOS. To date, the library has been tested under Linux and Windows.

The NewsML Toolkit and the NewsML Explorer application are both Open Source: freely redistributable, with source code included. The library's license allows it to be incorporated into commercial software packages royalty-free, as long as any modifications or improvements to the library itself are released back to the public. A shared, vendor-friendly open-source library makes it possible for NewsML developers to concentrate on innovation rather than writing basic NewsML processing code over and over again and losing weeks or months tracking down the resulting bugs.

The NewsML Toolkit works with the industry-standard DOM interface for XML processing, and will work with any conformant Java-based DOM library: if you have already assembled an XML toolkit that you're happy with, you do not have to throw it away. While the initial NewsML Toolkit release concentrates on presenting the most important information as simply as possible, the full XML markup is always available through the DOM whenever needed.

The NewsML Toolkit will save developers time and money by allowing non-XML-specialists to develop NewsML-based applications quickly and easily.

3. Library Structure

The NewsML Toolkit contains many classes to represent the different kinds of information that can be present in a NewsML package, but most NewsML work is based on five key classes:

NewsML
This class represents the top-level NewsML package, containing one or more NewsItem objects. The top-level package also includes envelope information for routing.
NewsItem
This class represents a managed set of news information, containing a single NewsComponent. The NewsItem also contains identification and management information.
NewsComponent
This class represents a collection of related items, either complements or equivalent versions of the same news object in different formats, resolutions, languages, and so on. A NewsComponent includes one or more NewsItems or NewsItemRefs, NewsComponents, or ContentItems, and also includes news lines (headline, byline, and so on) and metadata describing the news objects.
NewsItemRef
This class represents a reference to another NewsItem, either inside or outside the current NewsML package.
ContentItem
This class represents a piece of actual news content for presentation to humans, either stored inline or available through an external URL reference.
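
The containment relationships described above can be sketched with a few plain Java classes. This is a simplified stand-in written for illustration, not the toolkit's actual API: the outer class NewsTreeSketch, all fields, constructors, and helper methods are hypothetical, and NewsItems nested inside a NewsComponent are left out for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the object tree; names mirror the text above
// but the members are invented for this example.
public class NewsTreeSketch {

    // A piece of content, here reduced to an external reference.
    static class ContentItem {
        final String href;
        ContentItem(String href) { this.href = href; }
    }

    // A pointer to a NewsItem inside or outside the package.
    static class NewsItemRef {
        final String target;
        NewsItemRef(String target) { this.target = target; }
    }

    // A collection of related items; components may nest.
    static class NewsComponent {
        final String headline; // null when the component has no headline
        final List<NewsComponent> components = new ArrayList<>();
        final List<ContentItem> contentItems = new ArrayList<>();
        final List<NewsItemRef> itemRefs = new ArrayList<>();
        NewsComponent(String headline) { this.headline = headline; }
    }

    // A managed unit of news holding a single NewsComponent.
    static class NewsItem {
        final NewsComponent component;
        NewsItem(NewsComponent component) { this.component = component; }
    }

    // The top-level package holding one or more NewsItems.
    static class NewsML {
        final List<NewsItem> items = new ArrayList<>();
    }

    // Depth-first walk: a component's own content first, then its children.
    static void collectHrefs(NewsComponent c, List<String> out) {
        for (ContentItem item : c.contentItems) {
            out.add(item.href);
        }
        for (NewsComponent child : c.components) {
            collectHrefs(child, out);
        }
    }

    static List<String> allHrefs(NewsML pkg) {
        List<String> out = new ArrayList<>();
        for (NewsItem item : pkg.items) {
            collectHrefs(item.component, out);
        }
        return out;
    }

    // One story with a text version and two resolutions of a photo.
    static NewsML samplePackage() {
        NewsComponent story = new NewsComponent("Markets rally");
        NewsComponent text = new NewsComponent(null);
        text.contentItems.add(new ContentItem("story.xml"));
        NewsComponent photos = new NewsComponent(null);
        photos.contentItems.add(new ContentItem("photo-large.jpg"));
        photos.contentItems.add(new ContentItem("photo-small.jpg"));
        story.components.add(text);
        story.components.add(photos);
        NewsML pkg = new NewsML();
        pkg.items.add(new NewsItem(story));
        return pkg;
    }

    public static void main(String[] args) {
        System.out.println(allHrefs(samplePackage()));
    }
}
```

The recursive walk in collectHrefs reflects the fact that a NewsComponent can contain further NewsComponents, so any code that gathers content has to descend through an arbitrary number of levels.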

The following figure provides a visual representation of the structure of a typical object tree in the NewsML Toolkit:

4. Demo Application

The initial release of the NewsML Toolkit comes bundled with a simple demonstration application, the NewsML Explorer, for browsing NewsML packages. The Explorer requires the Apache Xerces-Java XML library (available from http://xml.apache.org/) together with a Java2-compliant virtual machine.

The NewsML Explorer presents the logical structure of a NewsML document in an interactive tree, with summary information on the right side of the window:

Users can open and close various branches of the package to find what information is available, and can automatically highlight tree nodes matching various search criteria, including media type, format, role, and language. While the NewsML Explorer is not intended to be a full-featured application, it is useful both as a learning tool and as a simple mechanism for exploring a NewsML package.

5. Availability

The initial homes of the NewsML Toolkit are:

http://about.reuters.com/researchandstandards/firstcontact/newsml-toolkit/ (this page)
http://www.xmlnews.org/NewsML/toolkit

WAVO and the IPTC will also be providing mirrors of the NewsML Toolkit. You can send comments or questions about the NewsML Toolkit to newsml-toolkit-comments@xmlnews.org.

6. Future Plans

The initial release of the NewsML Toolkit provides basic support for NewsML navigation and highlighting. The developers are considering additional functionality for future releases, including the following:

  • Full support for metadata, including the Catalog and TopicSet.
  • A NewsML validator built on top of the library.
  • On-demand retrieval of external resources.
  • An alternative SAX version that does not require the DOM (and thus will run faster and use significantly less memory).
  • Possible ports of the library to C++ and to Perl.
  • Enhancements to the demo NewsML Explorer application.
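
To illustrate why a SAX-based approach can avoid the memory cost of the DOM, the following minimal sketch collects content references from a stream of parse events without building a document tree. It uses only the standard Java SAX interfaces and says nothing about how a future SAX version of the toolkit would actually be designed. The class names SaxContentItems and HrefCollector, the method contentHrefs, and the hand-written sample fragment are assumptions for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: event-based processing with plain SAX.
public class SaxContentItems {

    // Hand-written fragment, not a complete or valid NewsML package.
    static final String SAMPLE =
        "<NewsML><NewsItem><NewsComponent>"
      + "<ContentItem Href=\"story.xml\"/>"
      + "<ContentItem Href=\"photo.jpg\"/>"
      + "</NewsComponent></NewsItem></NewsML>";

    // Receives one callback per start tag; keeps only the Href values.
    static class HrefCollector extends DefaultHandler {
        final List<String> hrefs = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            if ("ContentItem".equals(qName)) {
                String href = atts.getValue("Href");
                if (href != null) {
                    hrefs.add(href);
                }
            }
        }
    }

    // Stream through the XML once; nothing is retained except the list.
    static List<String> contentHrefs(String xml) {
        HrefCollector handler = new HrefCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler.hrefs;
    }

    public static void main(String[] args) {
        System.out.println(contentHrefs(SAMPLE));
    }
}
```

The trade-off is that the handler sees each element only once, in document order, so anything that needs random access to the markup still calls for a tree such as the DOM.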

You are welcome to contribute code, documentation, or bug-fixes to the project.

7. Further Reading

DOM
The Document Object Model, a low-level interface specification for processing XML documents from the World Wide Web Consortium.
http://www.w3.org/DOM/
IPTC
The International Press Telecommunications Council, the international consortium that develops interchange specifications for the news industry.
http://www.iptc.org/
Java
A platform-independent programming language and virtual machine specification from Sun Microsystems.
http://java.sun.com/
LGPL
The GNU Lesser General Public License, a popular and long-established Open Source software license from the GNU Project.
http://www.gnu.org/copyleft/lesser.html
NewsML
An open metadata and packaging specification for news distribution, from the International Press Telecommunications Council. NewsML is complementary to NITF, which describes the textual content of news.
http://www.iptc.org/NMLIntro.htm