Network Working Group                                       K. Kobayashi
Request for Comments: 3189             Communication Research Laboratory
Category: Standards Track                                       A. Ogawa
                                                         Keio University
                                                               S. Casner
                                                           Packet Design
                                                              C. Bormann
                                                 Universitaet Bremen TZI
                                                            January 2002
RTP Payload Format for DV (IEC 61834) Video
Status of this Memo
This document specifies an Internet standards track protocol for the
Internet community, and requests discussion and suggestions for
improvements. Please refer to the current edition of the "Internet
Official Protocol Standards" (STD 1) for the standardization state
and status of this protocol. Distribution of this memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This document specifies the packetization scheme for encapsulating
the compressed digital video data streams commonly known as "DV" into
a payload format for the Real-Time Transport Protocol (RTP).
1. Introduction
This document specifies payload formats for encapsulating both
consumer- and professional-use DV format data streams into the Real-
time Transport Protocol (RTP), version 2 [6]. DV compression audio
and video formats were designed for helical-scan magnetic tape media.
The DV standards for consumer-market devices, the IEC 61883 and 61834
series, cover many aspects of consumer-use digital video, including
mechanical specifications of a cassette, magnetic recording format,
error correction on the magnetic tape, DCT video encoding format, and
audio encoding format [1]. The digital interface part of IEC 61883
defines an interface on an IEEE 1394 network [2,3]. This
specification set supports several video formats: SD-VCR (Standard
Definition), HD-VCR (High Definition), SDL-VCR (Standard Definition -
Long), PALPlus, DVB (Digital Video Broadcast) and ATV (Advanced
Television). North American formats are indicated with a number of
lines and "/60", while European formats use "/50". DV standards
Kobayashi, et al. Standards Track [Page 1]
RFC 3189 RTP Payload Format for DV (IEC 61834) Video January 2002
extended for professional use were published by SMPTE as 306M and
314M, for different sampling systems, higher color resolution, and
faster bit rates [4,5].
There are two kinds of DV, one for consumer use and the other for
professional use. The original "DV" specification, designed for
consumer-use digital VCRs, is approved as the IEC 61834 standard set.
The specifications for professional DV are published as SMPTE 306M
and 314M. Both professional formats are based on consumer DV and are
used in SMPTE D-7 and D-9 video systems. The RTP payload format
specified in this document supports IEC 61834 consumer DV and the
professional SMPTE 306M and 314M (DV-Based) formats.
IEC 61834 also includes magnetic tape recording for digital TV
broadcasting systems (such as DVB and ATV) that use MPEG2 encoding.
The payload format for encapsulating MPEG2 into RTP has already been
defined in RFC 2250 [7] and others.
Consequently, the payload specified in this document will support six
video formats of the IEC standard: SD-VCR (525/60, 625/50), HD-VCR
(1125/60, 1250/50) and SDL-VCR (525/60, 625/50), and six of the SMPTE
standards: 306M (525/60, 625/50), 314M 25Mbps (525/60, 625/50) and
314M 50Mbps (525/60, 625/50). In the future it can be extended into
other high-definition formats.
Throughout this specification, we make extensive use of the
terminology of IEC and SMPTE standards. The reader should consult
the original references for definitions of these terms.
1.1 Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119 [8].
2. DV format encoding
The DV format uses DCT compression only within each frame (intraframe
compression), in contrast to the interframe compression of the MPEG
video standards [9,10]. All data for a frame, including audio and
other system data, are managed within the picture frame unit of video.
The DV video encoding is composed of a three-level hierarchical
structure. A picture frame is divided into rectangle- or clipped-
rectangle-shaped DCT super blocks. DCT super blocks are divided into
27 rectangle- or square-shaped DCT macro blocks.
Audio data is encoded in PCM format. The sampling frequency is 32
kHz, 44.1 kHz or 48 kHz and the quantization is 12-bit non-linear,
16-bit linear or 20-bit linear. The number of channels may be up to
8. Only certain combinations of these parameters are allowed,
depending upon the video format; the restrictions are specified in
the respective IEC and SMPTE standards.
A frame of data in the DV format stream is divided into several "DIF
sequences". A DIF sequence is composed of an integral number of 80-
byte DIF blocks. A DIF block is the primitive unit for all treatment
of DV streams. Each DIF block contains a 3-byte ID header that
specifies the type of the DIF block and its position in the DIF
sequence. Five types of DIF blocks are defined: DIF sequence header,
Subcode, Video Auxiliary information (VAUX), Audio, and Video.
Following the 3-byte ID header, an Audio DIF block carries 5 bytes of
Audio Auxiliary data (AAUX) and 72 bytes of audio data, for a total
of 80 bytes.
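The DIF block layout described above can be illustrated with a minimal Python sketch. The 80-byte block size and the 3 + 5 + 72 byte split of an Audio DIF block come from the text. The mapping of the upper three bits of the first ID byte to a block type is an assumption that should be checked against IEC 61834, and the function names are hypothetical.

```python
from __future__ import annotations

DIF_BLOCK_SIZE = 80  # bytes per DIF block, as stated in the text

# ASSUMPTION: the section type occupies the upper 3 bits of the first ID
# byte with these values. Verify against IEC 61834 before relying on it.
SECTION_TYPES = {0: "header", 1: "subcode", 2: "vaux", 3: "audio", 4: "video"}


def split_dif_blocks(data: bytes) -> list[bytes]:
    """Split a buffer into 80-byte DIF blocks, rejecting partial blocks."""
    if len(data) % DIF_BLOCK_SIZE != 0:
        raise ValueError("buffer is not a whole number of DIF blocks")
    return [data[i:i + DIF_BLOCK_SIZE]
            for i in range(0, len(data), DIF_BLOCK_SIZE)]


def section_type(block: bytes) -> str:
    """Name the block type from the ID header (assumed bit layout)."""
    return SECTION_TYPES.get(block[0] >> 5, "unknown")


def split_audio_block(block: bytes) -> tuple[bytes, bytes, bytes]:
    """Split an Audio DIF block into (ID header, AAUX, audio data)."""
    if len(block) != DIF_BLOCK_SIZE:
        raise ValueError("a DIF block must be exactly 80 bytes")
    return block[:3], block[3:8], block[8:]
```

A receiver could use a helper like `split_dif_blocks` on each RTP payload, since the payload is always a whole number of DIF blocks.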
Each RTP packet starts with the RTP header as defined in RFC 1889
[6]. No additional payload-format-specific header is required for
this payload format.
2.1 RTP header usage
The RTP header fields that have a meaning specific to the DV format
are described as follows:
Payload type (PT): The payload type is dynamically assigned by means
outside the scope of this document. If multiple DV encoding formats
are to be used within one RTP session, then multiple dynamic payload
types MUST be assigned, one for each DV encoding format. The sender
MUST change to the corresponding payload type whenever the encoding
format is changed.
Timestamp: 32-bit 90 kHz timestamp representing the time at which the
first data in the frame was sampled. All RTP packets within the same
video frame MUST have the same timestamp. The timestamp SHOULD
increment by a multiple of the nominal interval for one frame time,
as given in the following table:
   Mode       Frame rate (Hz)    Increase of one frame
                                 in 90 kHz timestamp
   525-60         29.97                 3003
   625-50         25                    3600
   1125-60        30                    3000
   1250-50        25                    3600
When the DV stream is obtained from an IEEE 1394 interface, the
progress of video frame times MAY be monitored using the SYT
timestamp carried in the CIP header as specified in IEC 61883 [2].
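The per-frame increments in the table above follow from dividing the 90 kHz clock rate by the frame rate. The following Python sketch reproduces them. It assumes that the nominal 29.97 Hz rate is exactly 30000/1001 Hz, which is consistent with the 3003-tick entry, and the function names are hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90_000  # timestamp clock rate from the text

# ASSUMPTION: 29.97 Hz is treated as exactly 30000/1001 Hz.
FRAME_RATES = {
    "525-60": Fraction(30_000, 1001),
    "625-50": Fraction(25),
    "1125-60": Fraction(30),
    "1250-50": Fraction(25),
}


def ticks_per_frame(mode: str) -> int:
    """Return the 90 kHz timestamp increment for one frame of `mode`."""
    ticks = RTP_CLOCK_HZ / FRAME_RATES[mode]
    if ticks.denominator != 1:
        raise ValueError("frame time is not a whole number of ticks")
    return int(ticks)


def frame_timestamp(first_ts: int, frame_index: int, mode: str) -> int:
    """Timestamp of frame `frame_index`, wrapped to the 32-bit field."""
    return (first_ts + frame_index * ticks_per_frame(mode)) % 2**32
```

Because every packet of a frame carries the same timestamp, a sender would compute this value once per frame rather than once per packet.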
Marker bit (M): The marker bit of the RTP fixed header is set to one
on the last packet of a video frame and must otherwise be zero.
The M bit allows the receiver to know that it has received the last
packet of a frame so it can display the image without waiting for the
first packet of the next frame to arrive to detect the frame change.
However, detection of a frame change MUST NOT rely on the marker bit
since the last packet of the frame might be lost. Detection of a
frame change MUST be based on a difference in the RTP timestamp.
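One way a receiver might combine the two rules above is sketched below in Python. The timestamp change is treated as the authoritative frame boundary, and the marker bit is used only as a hint to release a frame early. The class name and interface are hypothetical, and the sketch assumes packets have already been put in sequence-number order.

```python
class FrameAssembler:
    """Groups RTP payloads into frames (illustrative sketch only)."""

    def __init__(self):
        self.current_ts = None  # timestamp of the frame being collected
        self.payloads = []      # payloads received so far for that frame

    def push(self, timestamp, marker, payload):
        """Feed one packet; return a list of completed (timestamp, payloads)."""
        done = []
        if self.current_ts is not None and timestamp != self.current_ts:
            # Authoritative rule: a timestamp change ends the previous frame,
            # even if its marker packet was lost.
            done.append((self.current_ts, self.payloads))
            self.payloads = []
        self.current_ts = timestamp
        self.payloads.append(payload)
        if marker:
            # Hint: the frame can be released without waiting for the
            # first packet of the next frame.
            done.append((self.current_ts, self.payloads))
            self.payloads = []
            self.current_ts = None
        return done
```

If the packet carrying the marker bit is lost, the frame is still released when the first packet with a new timestamp arrives, only later.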
2.2 DV data encapsulation into RTP payload
Integral DIF blocks are placed into the RTP payload beginning
immediately after the RTP header. Any number of DIF blocks may be
packed into one RTP packet, except that all DIF blocks in one RTP
packet must be from the same video frame. DIF blocks from the next
video frame MUST NOT be packed into the same RTP packet even if more
payload space remains. This requirement stems from the fact that the
transition from one video frame to the next is indicated by a change
in the RTP timestamp. It also reduces the processing complexity on
the receiver. Since the RTP payload contains an integral number of
DIF blocks, the length of the RTP payload will be a multiple of 80
bytes.
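The packing rules above can be sketched in Python as follows. The function takes the DIF blocks of exactly one video frame, so blocks from different frames are never mixed, and it sets the marker flag on the last packet only. The function name and the `max_payload` parameter are hypothetical, and the choice of payload limit is left to the sender.

```python
DIF_BLOCK_SIZE = 80  # bytes per DIF block, as stated in the text


def packetize_frame(frame: bytes, max_payload: int):
    """Split one frame's DIF blocks into a list of (payload, marker) tuples.

    `frame` must hold the DIF blocks of a single video frame only.
    """
    if len(frame) == 0 or len(frame) % DIF_BLOCK_SIZE != 0:
        raise ValueError("frame must be a non-empty multiple of 80 bytes")
    blocks_per_packet = max_payload // DIF_BLOCK_SIZE
    if blocks_per_packet < 1:
        raise ValueError("max_payload is smaller than one DIF block")
    chunk = blocks_per_packet * DIF_BLOCK_SIZE
    payloads = [frame[i:i + chunk] for i in range(0, len(frame), chunk)]
    last = len(payloads) - 1
    # The marker is True only for the final packet of the frame.
    return [(p, i == last) for i, p in enumerate(payloads)]
```

As an illustration, a 120000-byte frame with a 1400-byte payload limit yields 17 DIF blocks (1360 bytes) per packet and 89 packets in total, the last one carrying the remaining 4 blocks.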
Audio and video data may be transmitted as one bundled RTP stream or
in separate RTP streams (unbundled). The choice MUST be indicated as
part of the assignment of the dynamic payload type and MUST remain
unchanged for the duration of the RTP session to avoid complicated
procedures of sequence number synchronization. The RTP sender MAY
omit DIF-sequence header and subcode DIF blocks from a stream since
the information is either known out-of-band or may not be required
for RTP transport. When sending DIF-sequence header and subcode DIF
blocks, both types of blocks MUST be included in the video stream.
DV streams include "source" and "source control" packs that carry
information indispensable for proper decoding, such as aspect ratio,
picture position, quantization of audio sampling, number of audio
channels, audio channel assignment, and language of the audio.
However, describing all of these attributes with a signaling protocol
would require large descriptions to enumerate all the combinations.
Therefore, no Session Description Protocol (SDP) [13] parameters for
these attributes are defined in this document. Instead, the RTP
sender MUST transmit at least those VAUX DIF blocks and/or audio DIF
blocks with AAUX information bytes that include "source" and "source
control" packs containing the indispensable information for decoding.
In the case of one bundled stream, DIF blocks for both audio and
video are packed into RTP packets in the same order as they were
encoded.
In the case of an unbundled stream, only the header, subcode, video
and VAUX DIF blocks are sent within the video stream. Audio is sent
in a different stream if desired, using a different RTP payload type.
It is also possible to send audio duplicated in a separate stream, in
addition to bundling it in with the video stream.
When using unbundled mode, it is RECOMMENDED that the audio stream
data be extracted from the DIF blocks and repackaged into the
corresponding RTP payload format for the audio encoding (DAT12, L16,
L20) [11,12] in order to maximize interoperability with non-DV-
capable receivers while maintaining the original source quality.
In the case of unbundled transmission where both audio and video are
sent in the DV format, the same timestamp SHOULD be used for both
audio and video data within the same frame to simplify the lip
synchronization effort on the receiver. Lip synchronization may also
be achieved using reference timestamps passed in RTCP as described in
RFC 1889 [6].
The sender MAY reduce the video frame rate by discarding the video
data and VAUX DIF blocks for some of the video frames. The RTP
timestamp must still be incremented to account for the discarded
frames. The sender MAY alternatively reduce bandwidth by discarding
video data DIF blocks for portions of the image which are unchanged
from the previous image. To enable this bandwidth reduction,
receivers SHOULD implement an error concealment strategy to
accommodate lost or missing DIF blocks, e.g., repeating the
corresponding DIF block from the previous image.
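The concealment strategy mentioned above, repeating the corresponding DIF block from the previous image, might be sketched in Python as follows. Frames are modelled as dictionaries keyed by a block position. How that position is derived from the ID header is not covered here, so the key is a hypothetical placeholder, as is the function name.

```python
from __future__ import annotations


def conceal_missing_blocks(previous: dict, received: dict) -> tuple[dict, list]:
    """Fill gaps in `received` with blocks from `previous`.

    Both arguments map a block position (hypothetical key) to the 80-byte
    DIF block at that position. Returns the completed frame and the sorted
    list of positions that were taken from the previous frame.
    """
    concealed = sorted(pos for pos in previous if pos not in received)
    frame = {**previous, **received}  # received blocks take precedence
    return frame, concealed
```

The same routine covers both packet loss and blocks the sender deliberately omitted because that part of the image was unchanged.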
3. SDP Signaling for RTP/DV
When using SDP (Session Description Protocol) [13] for negotiation of
the RTP payload information, the format described in this document
SHOULD be used. SDP descriptions will be slightly different for a
bundled stream and an unbundled stream.
When a DV stream is sent to port 31394 using RTP payload type
identifier 111, the m= line will look like:
m=video 31394 RTP/AVP 111
The a=rtpmap attribute will look like:
a=rtpmap:111 DV/90000
"DV" is the encoding name for the DV video payload format defined in
this document. The "90000" specifies the RTP timestamp clock rate,
which for the payload format defined in this document is a 90kHz
clock.
In SDP, format-specific parameters are defined as a=fmtp, as below:
a=fmtp:<format> <format-specific parameters>
In the DV video payload format, the a=fmtp line will be used to show
the encoding type within the DV video and will be used as below:
a=fmtp:<payload type> encode=<DV-video encode>
The required parameter <DV-video encode> specifies which type of DV
format is used. The DV format name will be one of the following:
SD-VCR/525-60
SD-VCR/625-50
HD-VCR/1125-60
HD-VCR/1250-50
SDL-VCR/525-60
SDL-VCR/625-50
306M/525-60
306M/625-50
314M-25/525-60
314M-25/625-50
314M-50/525-60
314M-50/625-50
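The SDP lines described so far can be assembled as in the Python sketch below. It assumes that the payload type number follows "a=fmtp:" directly, as in general SDP usage, and that a dynamic payload type in the range 96 to 127 is used. The function name is hypothetical.

```python
# DV format names listed in the text above.
DV_ENCODINGS = (
    "SD-VCR/525-60", "SD-VCR/625-50",
    "HD-VCR/1125-60", "HD-VCR/1250-50",
    "SDL-VCR/525-60", "SDL-VCR/625-50",
    "306M/525-60", "306M/625-50",
    "314M-25/525-60", "314M-25/625-50",
    "314M-50/525-60", "314M-50/625-50",
)


def dv_sdp_lines(port: int, payload_type: int, encode: str) -> list:
    """Build the m=, a=rtpmap and a=fmtp lines for one DV video stream."""
    if encode not in DV_ENCODINGS:
        raise ValueError("unknown DV format name: " + encode)
    # ASSUMPTION: dynamic payload types come from the 96-127 range.
    if not 96 <= payload_type <= 127:
        raise ValueError("payload type is outside the dynamic range")
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} DV/90000",
        f"a=fmtp:{payload_type} encode={encode}",
    ]
```

Calling `dv_sdp_lines(31394, 111, "SD-VCR/525-60")` reproduces the m= and a=rtpmap lines shown earlier and adds the matching encode parameter.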
In order to show whether the audio data is bundled into the DV stream
or not, a format specific parameter is defined as below:
a=fmtp:<payload type> audio=<audio bundled>