NewsML – Reuters Advances Multi-Media Publishing Standard
Irving Levine, Senior Vice President, Information Architecture, Chief Technology Office.

One cannot peruse a technology magazine today without reading about the advantages that XML brings to the business world. While HTML is today’s Internet "lingua franca" and provides a means for marking up text for display purposes, XML is being widely adopted as the next generation language for enhancing the functionality of the Internet.

XML is used to structure content for its meaning, as opposed to structuring it for display. As such, XML is acting as an enabling mechanism to make the Internet more amenable to business transactions and capable of distributing self-describing information.

Reuters has been an active advocate of using XML for multimedia news production for several years now. The concept of NewsML was brought to the International Press Telecommunications Council (IPTC) in 1999 and has evolved into an IPTC-generated Document Type Definition (DTD), which was approved in Beta status on July 6th of this year. Formal IPTC approval of a version 1.0 specification is anticipated in early October.

What prompted Reuters to investigate XML for news production?

Reuters publishes content spanning text, photographs, graphics, and videos, yet our current production facilities are aligned along the individual media themselves, without providing any points where content integration can take place in an automated fashion. Clearly, we could provide greater value to our subscribers if we, ourselves, could integrate content and generate single "NewsItems", embodying the text, photos and videos covering news events.

The above observation is hardly new. Various groups within Reuters went about addressing the problem two years ago. What resulted was the Reuters version of NewsML, integrating newswires with photos and delivering content via ICE (Information and Content Exchange) software. Reuters, however, is well aware that attempting to promulgate a home-brewed solution as a public standard is risky business. As such, we decided to take the NewsML concept to the IPTC, in order to place it into the public arena.

So what is NewsML trying to accomplish?
The key functionality requirements can be summarized as follows:
  • Allow a NewsItem to consist of any arbitrary combination of media types, encodings, and languages (i.e. be media neutral).
  • Support the electronic representation of news.
  • Allow sophisticated metadata to be attached to a NewsItem.
  • Be usable throughout the lifecycle of a NewsItem.
  • Provide mechanisms for efficiently publishing changes to NewsItems.
  • Be usable with existing news formats and other XML markups. Note that NewsML acts as a packaging mechanism for content of any media type that has been formatted by other means, including other XML vocabularies. NewsML itself does not address the formatting of content.
  • Allow for the use of modern encryption techniques for security and authentication of content.
  • Use XML and related standards (e.g. XSLT, XPath, XPointer, XLink, when final W3C approval is provided).
  • Be usable with any transmission protocols.
  • Be extensible.

To understand how NewsML provides for the creation of a multimedia NewsItem (the NewsItem being the publishable unit of news), we need to examine a NewsItem’s basic structure.

Figure 1 shows a NewsItem, which contains text, photos, graphs, and videos. In order to package such content meaningfully, I have to be able to explain how each piece of information relates to the other content in the NewsItem in question.

Figure 1 indicates that the various pieces of content are referred to as ContentItems. The ContentItem can be thought of as the basic building block of a NewsItem. It can be of any media type and may either be directly embedded in the NewsItem (possibly in an encoded form amenable to XML) or be present via a pointer to a file holding the actual content (the expected method of creating NewsItems that contain large media files). A ContentItem by itself lacks the distinguishing characteristics of publishable news (e.g. headline, dateline, topic code, priority code). It is simply the "raw" information that is included in a publishable NewsItem.
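The two ways of carrying a ContentItem can be sketched in Python. This is only an illustration of the idea, not the NewsML DTD: the field names `media_type`, `data` and `href`, the file name, and the rule that exactly one of the two forms must be given are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ContentItem:
    """Raw content, either embedded or referenced (hypothetical model)."""
    media_type: str
    data: Optional[bytes] = None  # content embedded directly in the NewsItem
    href: Optional[str] = None    # pointer to a file holding the content

    def __post_init__(self) -> None:
        # Assumption for this sketch: exactly one of the two forms is used.
        if (self.data is None) == (self.href is None):
            raise ValueError("exactly one of data or href must be set")

    @property
    def is_embedded(self) -> bool:
        return self.data is not None


# A short text is embedded; a large video is referenced by a pointer.
inline_text = ContentItem("text", data=b"Story body")
linked_video = ContentItem("video", href="clip.mpg")  # hypothetical file name
```

Note that neither object carries a headline or dateline; that information is added at the NewsComponent level.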

The ContentItem can be embellished with various types of descriptive, administrative, and rights metadata and NewsLines (e.g. HeadLine, DateLine, ByLine, CopyrightLine, SlugLine). This richer structure is now referred to as a NewsComponent. A NewsComponent can be built in various ways:

  • from a ContentItem, which has had metadata and NewsLines attached.
  • from multiple ContentItems.
  • from other NewsComponents.
  • from other NewsItems themselves (allowing one to embed existing NewsItems into other NewsItems).
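The four construction routes listed above can be sketched as a small recursive data model. The class and field names below (`children`, `news_lines`, `metadata`, `root`) and the sample values are assumptions chosen for illustration; they are not taken from the NewsML DTD.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class ContentItem:
    media_type: str
    label: str


@dataclass
class NewsItem:
    root: "NewsComponent"


@dataclass
class NewsComponent:
    # A child may be a ContentItem, another NewsComponent, or a whole NewsItem.
    children: List[Union[ContentItem, "NewsComponent", NewsItem]] = field(default_factory=list)
    news_lines: Dict[str, str] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)


# 1. From a single ContentItem with NewsLines and metadata attached.
story = NewsComponent(
    children=[ContentItem("text", "story")],
    news_lines={"HeadLine": "Example headline"},  # hypothetical values
    metadata={"priority": "3"},
)
# 2. From multiple ContentItems.
photos = NewsComponent(children=[ContentItem("photo", "large"), ContentItem("photo", "small")])
# 3. From other NewsComponents.
package = NewsComponent(children=[story, photos])
# 4. From an existing NewsItem embedded in a new component.
follow_up = NewsComponent(children=[NewsItem(root=package)])
```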

In Figure 2, we indicate the various NewsComponents that we’ve built for this NewsItem. The NewsComponent labelled "Main Text" contains three ContentItems, each being the same text translated into a different language. The "Primary Photo" NewsComponent contains two ContentItems, which are photos that differ only in image resolution. The NewsComponent labelled "Sidebar" is slightly more complex: it consists of two other NewsComponents. The first contains three ContentItems, which are different translations of the same text. The second contains two ContentItems, which are graphs that differ only in image resolution. The remaining NewsComponent contains two video clip ContentItems, which differ in language.

The NewsItem is structured so that all the aforementioned NewsComponents are themselves contained in one (and only one) highest level NewsComponent, which acts as their container and is the direct child of the NewsItem element itself.
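As a rough sketch, the example NewsItem can be written down as a tree with a single top-level NewsComponent. The Python classes, the "Top level" and "Video" labels, and the placeholder descriptions ("language A", "resolution 1") are assumptions for illustration; the article does not name the actual languages or resolutions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ContentItem:
    description: str


@dataclass
class NewsComponent:
    label: str
    children: List[Union["NewsComponent", ContentItem]] = field(default_factory=list)


@dataclass
class NewsItem:
    root: NewsComponent  # the one and only top-level NewsComponent


def items(kind: str, variants: List[str]) -> List[ContentItem]:
    """Build one ContentItem per variant (helper for this sketch only)."""
    return [ContentItem(f"{kind}, {v}") for v in variants]


def count_content_items(node: Union[NewsComponent, ContentItem]) -> int:
    """Count the ContentItems found anywhere below a node."""
    if isinstance(node, ContentItem):
        return 1
    return sum(count_content_items(child) for child in node.children)


languages = ["language A", "language B", "language C"]
resolutions = ["resolution 1", "resolution 2"]

news_item = NewsItem(root=NewsComponent("Top level", [
    NewsComponent("Main Text", items("text", languages)),
    NewsComponent("Primary Photo", items("photo", resolutions)),
    NewsComponent("Sidebar", [
        NewsComponent("Sidebar text", items("text", languages)),
        NewsComponent("Sidebar graph", items("graph", resolutions)),
    ]),
    NewsComponent("Video", items("video", languages[:2])),
]))

total = count_content_items(news_item.root)  # 3 + 2 + (3 + 2) + 2
```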

None of the above yet tells us enough about a NewsItem to interpret the meaning of its content. How does the subscriber know whether a piece of information is simply an alternative representation of something else (e.g. a language translation or a different picture resolution)? And how does the subscriber know which pieces of information to choose to get a complete representation of the NewsItem without including any redundant content? The answer lies in the following three concepts:

  • equivalents
  • complements
  • roles

If a piece of content exists in different versions to accommodate different subscribers (for example, based on language, media types, etc.) then the various versions are referred to as a set of equivalents. The "Main Text" NewsComponent contains 3 ContentItems, each of which is an equivalent to the others (same text, different language). The "Main Text" NewsComponent is structured as an XML element which has an attribute indicating that the ContentItems contained in it are equivalents and the "basis for choice" is the language attribute appearing in each ContentItem. Similarly, the "Primary Photo" NewsComponent is structured as an XML element which has an attribute indicating that the ContentItems contained in it are equivalents and the "basis for choice" is the Format element.
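The idea of a set of equivalents with a "basis for choice" can be sketched as follows. The class `EquivalentsSet`, the `properties` dictionary, and the language codes and format labels are hypothetical; the sketch only shows how a subscriber might pick one item from a set using the stated basis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ContentItem:
    properties: Dict[str, str]  # e.g. language or format of this version
    body: str


@dataclass
class EquivalentsSet:
    """A component whose items are alternative versions of the same content."""
    basis_for_choice: str       # name of the property that distinguishes them
    items: List[ContentItem]

    def choose(self, wanted: str) -> ContentItem:
        # Return the one equivalent whose distinguishing property matches.
        for item in self.items:
            if item.properties.get(self.basis_for_choice) == wanted:
                return item
        raise LookupError(f"no equivalent with {self.basis_for_choice}={wanted!r}")


# "Main Text": same text in three languages (language codes are assumptions).
main_text = EquivalentsSet("language", [
    ContentItem({"language": "en"}, "Main text"),
    ContentItem({"language": "fr"}, "Texte principal"),
    ContentItem({"language": "de"}, "Haupttext"),
])
# "Primary Photo": same photo in two formats (labels are assumptions).
primary_photo = EquivalentsSet("format", [
    ContentItem({"format": "high-res"}, "photo-large"),
    ContentItem({"format": "low-res"}, "photo-small"),
])

french_text = main_text.choose("fr")
small_photo = primary_photo.choose("low-res")
```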

Choosing from alternatives

While "equivalents" tell us how to choose from alternative representations of content and the associated basis for choice, it is "complements" which tell us what content we must choose if we want to obtain all the information provided by the NewsItem. If one excludes a complement, then one is missing information. In order to extract the full information of the example NewsItem without any redundancy, I need to choose one text translation in the "Main Text" NewsComponent, one photo in the "Primary Photo" NewsComponent, one text translation AND one graph in the "Sidebar" NewsComponent (the text and graphs act as complements to one another within the Sidebar) and one video clip from the Video NewsComponent. The NewsItem structure allows us to indicate the "roles" that complements have with respect to one another.
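The selection rule described here (take every complement, but only one item from each set of equivalents) can be sketched as a recursive walk over the example NewsItem. The classes, the `equivalents` flag, the property names and the subscriber preferences are all assumptions for illustration, and the sketch assumes that a set of equivalents holds ContentItems directly, as in the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class ContentItem:
    name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class NewsComponent:
    label: str
    children: List[Union["NewsComponent", ContentItem]]
    equivalents: bool = False              # False: children are complements
    basis_for_choice: Optional[str] = None


def resolve(node: Union[NewsComponent, ContentItem], prefs: Dict[str, str]) -> List[ContentItem]:
    """Return a complete, non-redundant selection of ContentItems."""
    if isinstance(node, ContentItem):
        return [node]
    if node.equivalents:
        # Pick exactly one equivalent, using the component's basis for choice.
        wanted = prefs[node.basis_for_choice]
        for child in node.children:
            if isinstance(child, ContentItem) and child.properties.get(node.basis_for_choice) == wanted:
                return [child]
        raise LookupError(f"{node.label}: nothing matches {wanted!r}")
    # Complements: every child contributes, otherwise information is lost.
    selected: List[ContentItem] = []
    for child in node.children:
        selected.extend(resolve(child, prefs))
    return selected


def variants(prefix: str, key: str, values: List[str]) -> List[ContentItem]:
    return [ContentItem(f"{prefix}-{v}", {key: v}) for v in values]


langs, formats = ["en", "fr", "de"], ["high", "low"]  # hypothetical values
root = NewsComponent("Top level", [
    NewsComponent("Main Text", variants("main-text", "language", langs), True, "language"),
    NewsComponent("Primary Photo", variants("photo", "format", formats), True, "format"),
    NewsComponent("Sidebar", [
        NewsComponent("Sidebar text", variants("sidebar-text", "language", langs), True, "language"),
        NewsComponent("Sidebar graph", variants("graph", "format", formats), True, "format"),
    ]),
    NewsComponent("Video", variants("video", "language", langs[:2]), True, "language"),
])

chosen = [item.name for item in resolve(root, {"language": "fr", "format": "low"})]
```

With these preferences the walk yields five items: one main text, one photo, one sidebar text, one sidebar graph and one video clip, matching the selection described above.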

Assigning roles

Each NewsComponent is assigned a role to advise the subscriber of its context within the NewsItem. In our example, we have main text, sidebar text, sidebar graph, primary photo, and one video. A NewsItem could have various classes of text or photos or videos, each having varying degrees of significance to the NewsItem. The Role element allows the publisher to advise the subscriber how the various news objects (NewsComponents, ContentItems, other NewsItems) comprising a NewsItem relate to one another.
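A subscriber could use roles to pull out just the parts of a NewsItem it cares about. The sketch below is a simplified model: the `role` field, the role strings for the container and the sidebar, and the `find_by_role` helper are assumptions for illustration rather than the actual NewsML Role element.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class NewsComponent:
    role: str
    children: List["NewsComponent"] = field(default_factory=list)


def find_by_role(node: NewsComponent, role: str) -> List[NewsComponent]:
    """Collect every component in the tree that carries the given role."""
    found = [node] if node.role == role else []
    for child in node.children:
        found.extend(find_by_role(child, role))
    return found


# Roles of the example NewsItem; "container" and "sidebar" are hypothetical.
root = NewsComponent("container", [
    NewsComponent("main text"),
    NewsComponent("primary photo"),
    NewsComponent("sidebar", [
        NewsComponent("sidebar text"),
        NewsComponent("sidebar graph"),
    ]),
    NewsComponent("video"),
])

graphs = find_by_role(root, "sidebar graph")
```

A text-only subscriber, for instance, might look up "main text" and "sidebar text" and ignore the rest.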