Internet-Draft: VBF
Date: July 2022
Expires: 27 January 2023
Intended Status: Standards Track
Author: D. Li

Video BFrame RTP Header Extension


This document describes an RTP header extension used to convey decoding time information for video streams that contain bi-directionally predicted frames (B-frames). The extension carries the composition time (CTS) so that a receiver can decode the video frames in the correct order.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 27 January 2023.

1. Introduction

As video codecs, H.264 and HEVC are widely used in RTP-based systems. These codecs support I-frames, P-frames, and B-frames. Most RTP systems do not support B-frames, although B-frames are widely used in streaming systems. With the rapid deployment of Real-Time Communication (RTC) in low-latency streaming scenarios, support for bi-directionally predicted frames in RTP-based systems is necessary.

Video streams carry timing information, including timestamps, so that a decoder knows how to handle the content properly. The decoding timestamp (DTS) determines when a frame has to be decoded, while the presentation timestamp (PTS) determines when a frame has to be presented. This difference becomes important when B-frames are used, because B-frames can reference frames in the past as well as frames in the future. A decoder therefore has to decode some frames that are presented later before it can decode the B-frames that use them as references. As a result, the decoder needs the DTS when B-frames exist, whereas the RTP timestamp reflects only the presentation time (PTS). This document specifies an RTP header extension that allows video RTP senders to deliver the composition time (CTS) to RTP receivers.

The CTS value is the difference between PTS and DTS (PTS minus DTS). Therefore, the RTP receiver obtains the DTS by subtracting the CTS, converted to RTP clock units, from the RTP timestamp.
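The relationship between PTS, DTS, and CTS can be illustrated with a small worked example. The sketch below is not part of the specification: the frame list, the 25 fps frame spacing, and the function names are assumptions chosen for illustration, and 32-bit RTP timestamp wraparound is ignored.

```python
# Illustrative only: all timestamps are in 90 kHz RTP clock units.
TICKS_PER_MS = 90  # a 90 kHz video clock has 90 ticks per millisecond


def cts_ms(pts: int, dts: int) -> int:
    """Composition time in milliseconds: (PTS - DTS) / 90."""
    return (pts - dts) // TICKS_PER_MS


def dts_from_cts(pts: int, cts: int) -> int:
    """Recover the DTS from the RTP timestamp (PTS) and the CTS in ms."""
    return pts - cts * TICKS_PER_MS


# Hypothetical frames at 25 fps (3600 ticks per frame), listed in decode
# order. The P-frame is decoded before the B-frames that reference it.
frames = [
    ("I", 3600, 0),        # (type, pts, dts)
    ("P", 14400, 3600),
    ("B", 7200, 7200),
    ("B", 10800, 10800),
]

for kind, pts, dts in frames:
    c = cts_ms(pts, dts)
    # Subtracting CTS * 90 from the PTS gives back the original DTS.
    assert dts_from_cts(pts, c) == dts
    print(kind, "pts =", pts, "dts =", dts, "cts_ms =", c)
```

In this example the P-frame has a CTS of 120 ms because it is decoded three frame intervals before it is presented, while the B-frames have a CTS of 0.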

This new header extension uses the general mechanism for RTP header extensions described in ([RFC5285]). When a video frame spans several RTP packets, the RTP sender only needs to add the CTS to the first RTP packet of the frame, which reduces overhead.

2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. RTP header extension format

The general RTP payload format follows the RTP header format ([RFC3550]) and the generic RTP header extension mechanism ([RFC8285]). The RTP header extension MAY be encoded using either the one-byte header or the two-byte header described in ([RFC8285]). The two-byte header format is used as an example in this memo.

The following RTP header extension is RECOMMENDED. The ID is assigned per ([RFC8285]), and the format is shown below.

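 0                   1                   2
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  ID   | Len=2 |              cts              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+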
Figure 1: extension format

ID: the extension identifier.

cts: PTS minus DTS, divided by 90. With the 90 kHz video clock rate, this expresses the value in milliseconds.
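As an illustration, one extension element could be serialized as sketched below. This is a minimal sketch under stated assumptions, not a normative encoding: it follows the two-byte header form of RFC 8285, in which the ID and the length each occupy one full byte and the length counts the data bytes, and it assumes that cts is an unsigned 16-bit value in network byte order. The function names are hypothetical.

```python
import struct
from typing import Tuple


def pack_cts_element(ext_id: int, cts: int) -> bytes:
    """Build one two-byte-header element: 8-bit ID, 8-bit length, 16-bit cts."""
    if not 1 <= ext_id <= 255:
        raise ValueError("two-byte header IDs are in the range 1..255")
    if not 0 <= cts <= 0xFFFF:
        raise ValueError("cts does not fit in 16 bits (assumed unsigned)")
    # "!BBH": network byte order, ID, length (2 data bytes), 16-bit cts.
    return struct.pack("!BBH", ext_id, 2, cts)


def unpack_cts_element(element: bytes) -> Tuple[int, int]:
    """Return (ext_id, cts) from a 4-byte element produced by the packer."""
    ext_id, length, cts = struct.unpack("!BBH", element)
    if length != 2:
        raise ValueError("unexpected length for the cts element")
    return ext_id, cts


element = pack_cts_element(19, 120)
print(element.hex())                # 13020078
print(unpack_cts_element(element))  # (19, 120)
```

The element shown here is only the per-extension part. A complete RTP packet would additionally carry the header extension block header and any padding that RFC 8285 requires.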

3.1. Video RTP sender

The video sender here MAY be a video client or a middle box that performs RTP switching. A video client MAY encode video with B-frames; in that case it SHOULD add this RTP header extension in its RTP packetization module. When a video frame spans multiple RTP packets, adding the extension only to the first RTP packet is RECOMMENDED, which reduces overhead. A middle box MAY translate RTMP or other streaming video protocols into RTP streams; it SHOULD add this header extension when the streaming video contains B-frames.
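The first-packet recommendation can be sketched as follows. The function name and the representation of a packet as a (payload, cts-or-None) pair are assumptions made for illustration; a real packetizer would build full RTP packets.

```python
from typing import List, Optional, Tuple


def plan_cts_per_packet(
    payloads: List[bytes], cts: int
) -> List[Tuple[bytes, Optional[int]]]:
    """Pair each packet payload of one frame with the cts to send, or None.

    Only the first packet of the frame carries the cts value; the
    remaining packets of the same frame are sent without the extension.
    """
    return [(p, cts if i == 0 else None) for i, p in enumerate(payloads)]


# A hypothetical frame split into three RTP payloads, with a CTS of 120 ms.
for payload, value in plan_cts_per_packet([b"part0", b"part1", b"part2"], 120):
    print(len(payload), "bytes, cts =", value)
```

One consequence of this choice is that a receiver cannot recover the cts for a frame whose first packet is lost, so a sender might choose to repeat the extension on more packets when loss is expected.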

3.2. Video RTP receiver

The video RTP receiver here is a client that decodes video. When this extension is present, it SHOULD extract the cts value and calculate the DTS from the RTP timestamp (PTS) and the CTS:

DTS = PTS - CTS * 90

Here, 90 is the number of video clock ticks per millisecond at the 90 kHz video clock rate. The video receiver assembles frames and places them in the jitter buffer. The decoder MUST decode frames in DTS order, and the video render module MUST render the decoded frames in PTS order, where the PTS comes from the RTP timestamp.
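The receiver behaviour can be sketched as follows. This is an illustrative sketch, not a jitter buffer implementation: the Frame class, the frame names, and the timestamps are hypothetical, and 32-bit RTP timestamp wraparound is ignored.

```python
from dataclasses import dataclass
from typing import List

TICKS_PER_MS = 90  # 90 kHz video clock


@dataclass
class Frame:
    name: str
    pts: int  # RTP timestamp, in 90 kHz units
    cts: int  # value from the header extension, in milliseconds

    @property
    def dts(self) -> int:
        # DTS = PTS - CTS * 90
        return self.pts - self.cts * TICKS_PER_MS


def decode_order(frames: List[Frame]) -> List[str]:
    """Frame names sorted by DTS, the order used to feed the decoder."""
    return [f.name for f in sorted(frames, key=lambda f: f.dts)]


def render_order(frames: List[Frame]) -> List[str]:
    """Frame names sorted by PTS, the order used by the renderer."""
    return [f.name for f in sorted(frames, key=lambda f: f.pts)]


# Hypothetical frames as they might sit in a jitter buffer, in no
# particular order.
frames = [
    Frame("B1", pts=7200, cts=0),
    Frame("I0", pts=3600, cts=40),
    Frame("P3", pts=14400, cts=120),
    Frame("B2", pts=10800, cts=0),
]

print(decode_order(frames))  # ['I0', 'P3', 'B1', 'B2']
print(render_order(frames))  # ['I0', 'B1', 'B2', 'P3']
```

The P-frame is decoded second but rendered last, which is the reordering that the cts value makes recoverable at the receiver.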

3.3. Usage considerations

In practice, a receiver that decodes video might not support B-frames. So that such a receiver can successfully decode an incoming video stream, it is RECOMMENDED that an RTP middle box discard B-frames when the stream from the video RTP sender contains B-frames. The decoder at the endpoint SHOULD indicate whether it supports B-frames in the SDP payload-format-specific parameters (a=fmtp) and follow the Offer/Answer procedure described in ([RFC8285]).

4. Session Description Protocol (SDP) Signaling

The URI for declaring this header extension in an extmap attribute is "urn:ietf:params:rtp-hdrext:CompositionTime". It does not contain any extension attributes and follows the standard mechanism described in ([RFC8285]). An example attribute line in SDP:

a=extmap:19 uri:ietf:rtc:rtp-hdrext:video:CompositionTime;
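A receiver needs the negotiated ID from the extmap line to locate the extension in incoming packets. The sketch below shows one way to extract it. It is a simplified parser written for illustration: the function name is hypothetical, extension attributes are ignored, and the trailing ";" in the example line is treated as not being part of the URI, which is an assumption.

```python
from typing import Optional, Tuple


def parse_extmap(line: str) -> Tuple[int, Optional[str], str]:
    """Parse 'a=extmap:<id>[/<direction>] <URI> ...' into (id, direction, uri)."""
    prefix = "a=extmap:"
    if not line.startswith(prefix):
        raise ValueError("not an extmap attribute line")
    fields = line[len(prefix):].split()
    if len(fields) < 2:
        raise ValueError("extmap line needs an ID and a URI")
    id_text, _, direction = fields[0].partition("/")
    # Assumption: a trailing ';' is a line terminator, not part of the URI.
    uri = fields[1].rstrip(";")
    return int(id_text), (direction or None), uri


line = "a=extmap:19 uri:ietf:rtc:rtp-hdrext:video:CompositionTime;"
print(parse_extmap(line))
# (19, None, 'uri:ietf:rtc:rtp-hdrext:video:CompositionTime')
```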

5. Security Considerations

The security considerations of the RTP specification ([RFC3550]) and of the general mechanism for RTP header extensions ([RFC8285]) apply. The security considerations for RTP topologies ([RFC7667]) and for securing RTP sessions ([RFC7201]) are also applicable to this header extension when RTP intermediaries are involved.

Security considerations for SDP are described in the corresponding section of ([RFC8866]). In the Secure Real-time Transport Protocol (SRTP) ([RFC3711]), RTP header extensions are authenticated but not encrypted. When this header extension is used, cts values are therefore visible on a frame-by-frame basis to an attacker passively observing the video stream. In scenarios where this is a concern, additional mechanisms MUST be used to protect the confidentiality of the header extension. Such a mechanism could be header extension encryption ([RFC6904]) or a lower-level security and authentication mechanism such as IPsec ([RFC4301]).

6. IANA Considerations

IANA has registered the following entry in the "RTP Compact Header Extensions" registry:

Extension URI: uri:ietf:rtc:rtp-hdrext:video:CompositionTime
Description: video B frame compositionTime
Contact:

7. Acknowledgements

8. Normative References

Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119.
Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, DOI 10.17487/RFC3550.
Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K. Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC 3711, DOI 10.17487/RFC3711.
Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301.
Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, DOI 10.17487/RFC5285.
Lennox, J., "Encryption of Header Extensions in the Secure Real-time Transport Protocol (SRTP)", RFC 6904, DOI 10.17487/RFC6904.
Westerlund, M. and C. Perkins, "Options for Securing RTP Sessions", RFC 7201, DOI 10.17487/RFC7201.
Westerlund, M. and S. Wenger, "RTP Topologies", RFC 7667, DOI 10.17487/RFC7667.
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174.
Singer, D., Desineni, H., and R. Even, Ed., "A General Mechanism for RTP Header Extensions", RFC 8285, DOI 10.17487/RFC8285.
Begen, A., Kyzivat, P., Perkins, C., and M. Handley, "SDP: Session Description Protocol", RFC 8866, DOI 10.17487/RFC8866.

Author's Address

Deping Li