Guidelines for Implementation: DASH-IF Interoperability Points

Living Document

This version:
https://dashif.org/guidelines/
Issue Tracking:
GitHub
Inline In Spec
Editors:
DASH Industry Forum

1. Purpose

The scope of the DASH-IF InterOperability Points (IOPs) defined in this document is to support interoperable services for high-quality video distribution based on MPEG-DASH and related standards. The specified features enable relevant use cases including on-demand and live services, ad insertion, content protection and subtitling. The integration of different media codecs into DASH-based distribution is also defined.

The guidelines are provided in order to address DASH-IF members' needs and industry best practices. They support the implementation of conforming service offerings as well as of DASH clients. While alternative interpretations may be equally valid in terms of standards conformance, services and clients created following the guidelines defined in this document can be expected to exhibit highly interoperable behavior between different implementations.

Any identified bugs or missing features may be submitted through the DASH-IF issue tracker at https://gitreports.com/issue/Dash-Industry-Forum/DASH-IF-IOP.

2. Interpretation

Requirements in this document describe required service and client behaviors that DASH-IF considers interoperable:

  1. If a service provider follows these requirements in a published DASH service, that service is likely to experience successful playback on a wide variety of clients and exhibit graceful degradation when a client does not support all features used by the service.

  2. If a client implementer follows the client-oriented requirements described in this document, the client can be expected to successfully play content conforming to this document.

This document uses statements of fact when describing normative requirements defined in referenced specifications such as [MPEGDASH] and [MPEGCMAF]. [RFC2119] statements (e.g. "SHALL", "SHOULD" and "MAY") are used when this document defines a new requirement or further constrains a requirement from a referenced document. In order to clearly separate the requirements of referenced specifications from the additional requirements set by this document, the normative statements in each section of this document are separated into two groups, ones starting with "(referenced specification) requires/recommends:" and ones starting with "This document requires/recommends:". See also Conformance.

All DASH presentations are assumed to be conforming to an IOP. A service may explicitly signal itself as conforming by including the string https://dashif.org/guidelines/ in MPD@profiles.
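For example, a service might signal conformance as follows (an illustrative fragment; other profile identifiers may also be listed in MPD@profiles):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
    profiles="urn:mpeg:dash:profile:isoff-live:2011,https://dashif.org/guidelines/">
  ...
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.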

There is no strict backward compatibility with previous versions - best practices change over time and what was once considered sensible may be replaced by a superior approach later on. Therefore, clients and services that were conforming to version N of this document are not guaranteed to conform to version N+1.

3. Disclaimer

This is a document made available by DASH-IF. The technology embodied in this document may involve the use of intellectual property rights, including patents and patent applications owned or controlled by any of the authors or developers of this document. No patent license, either implied or express, is granted to you by this document. DASH-IF has made no search or investigation for such rights and DASH-IF disclaims any duty to do so. The rights and obligations which apply to DASH-IF documents, as such rights and obligations are set forth and defined in the DASH-IF Bylaws and IPR Policy including, but not limited to, patent and other intellectual property license rights and obligations. A copy of the DASH-IF Bylaws and IPR Policy can be obtained at http://dashif.org/.

The material contained herein is provided on an "AS IS" basis and to the maximum extent permitted by applicable law, this material is provided AS IS, and the authors and developers of this material and DASH-IF hereby disclaim all other warranties and conditions, either express, implied or statutory, including, but not limited to, any (if any) implied warranties, duties or conditions of merchantability, of fitness for a particular purpose, of accuracy or completeness of responses, of workmanlike effort, and of lack of negligence.

In addition, this document may include references to documents and/or technologies controlled by third parties. Those third party documents and technologies may be subject to third party rules and licensing terms. No intellectual property license, either implied or express, to any third party material is granted to you by this document or DASH-IF. DASH-IF makes no warranty whatsoever for such third party material.

Note that technologies included in this document for which no test and conformance material is provided are only published as candidate technologies, and may be removed if no test material is provided before a new version of this guidelines document is released. For the availability of test material, please check http://www.dashif.org.

4. DASH and related standards

DASH is a set of manifest and media formats for adaptive media delivery defined by [MPEGDASH]. Dynamic Adaptive Streaming over HTTP (DASH) was initially defined in the first edition of ISO/IEC 23009-1, published in April 2012, with corrections made in 2013. In May 2014, ISO/IEC published the second edition of ISO/IEC 23009-1, which includes additional features and provides additional clarifications. ISO/IEC published the third and fourth editions of ISO/IEC 23009-1 in 2019 and 2020.

ISO/IEC also published the 1st and 2nd editions of ISO/IEC 23000-19 'Common media application format (CMAF) for segmented media' [MPEGCMAF] in 2018 and 2019. CMAF defines segment and chunk formats based on the ISO Base Media File Format, optimized for streaming delivery. CMAF also defines a set of well-defined constraints that enables interoperability for media deliverable objects, which are compatible with [MPEGDASH].

This document is based on the 4th edition of DASH [MPEGDASH] and the 2nd edition of CMAF [MPEGCMAF].

DASH together with related standards and specifications is the foundation for an ecosystem of services and clients that work together to enable audio/video/text and related content to be presented to end-users.

This document connects DASH with international standards, industry specifications and DASH-IF guidelines.

[MPEGDASH] defines a highly flexible set of building blocks that needs to be constrained to a meaningful subset to ensure interoperable behavior in common scenarios. This document defines constraints that limit DASH features to those that are considered appropriate for use in interoperable clients and services.

This document was generated in close coordination with [DVB-DASH]. The tools and features are aligned to the extent considered reasonable. In addition, DASH-IF worked closely with ATSC to develop a DASH profile for ATSC 3.0 broadcast distribution [ATSC3].

Clients consuming DASH content will need to interact with the host device’s media platform. While few constraints are defined on these interactions, this document does assume that the media platform implements APIs that are equivalent to the popular Media Source Extensions (MSE) and Encrypted Media Extensions (EME).

4.1. Relationship to the previous versions of this document

There is no strict backward compatibility with previous versions of this document - best practices change over time and what was once considered sensible may be replaced by a superior approach later on. Therefore, clients and services that were conforming to version N of this document are not guaranteed to conform to version N+1.

The initial two versions of this document were based on the first edition of ISO/IEC 23009-1. Version 4.3 mostly relied on the third edition of ISO/IEC 23009-1.

This version of the document relies on the 4th edition of ISO/IEC 23009-1, which was technically frozen in July 2019 and is expected to be published as ISO/IEC 23009-1:2020.

4.2. Structure of a DASH presentation

[MPEGDASH] specifies the structure of a DASH presentation, which consists primarily of:

  1. The manifest or MPD, which describes the content and how it can be accessed.

  2. Data containers that clients will download over the course of a presentation in order to obtain media samples.

Relationships of the primary DASH data structures and the standards in which they are defined.

The MPD is an XML file that follows a schema defined by [MPEGDASH]. This schema defines various extension mechanisms for third parties. This document defines some extensions, as do other industry specifications.

[MPEGDASH] defines two data container formats, one based on [ISOBMFF] and the other on [MPEG2TS]. However, only the former is used in modern solutions. This document only supports services using the [ISOBMFF] container format.

[MPEGCMAF] is the constrained media format based on [ISOBMFF], specifically designed for adaptive streaming. This document uses [MPEGCMAF]-compatible data containers.

Note: The relationship to [MPEGCMAF] is constrained to the container format. In particular, there is no requirement to conform to [MPEGCMAF] media profiles.

The data container format defines the physical structure of the following elements described by the MPD:

  1. Each representation in the MPD references an initialization segment.

  2. Each representation in the MPD references any number of media segments.

  3. Some representations in the MPD may reference an index segment, depending on the addressing mode used.

Note: HLS [RFC8216] also supports [MPEGCMAF]. Therefore, under certain constraints, content encoded in [MPEGCMAF] can be delivered using either an MPD or an HLS m3u8 manifest.

[MPEGDASH]                      [MPEGCMAF]      [ISOBMFF]
(media) segment, subsegment     CMAF segment    -
initialization segment          CMAF header     -
index segment, segment index    -               segment index box (sidx)
Quick reference of closely related terms in different standards.

Note: [MPEGDASH] has the concept of "segment" (URL-addressable media object) and "subsegment" (byte range of a URL-addressable media object), whereas [MPEGCMAF] does not make such a distinction. This document uses [MPEGCMAF] segment terminology, with the term segment in this document being equivalent to "CMAF segment", which in turn means "DASH media segment or media subsegment", depending on the employed DASH profile.

5. Interoperability requirements

The DASH-related standards enable various options for each feature they support. Limiting these options, and in some cases imposing additional constraints, is needed to establish interoperable behavior between service offerings and client implementations.

This chapter defines the requirements that enable DASH services and clients to provide interoperable behavior. To be compliant with a feature in this document, each service offering or client must conform to the specific requirements of that feature, as outlined in this document.

Need to add a paragraph on interoperability on baseline, if we have any

5.1. CMAF and ISO BMFF Requirements

Media segments SHALL be compliant to [MPEGDASHCMAFPROFILE].

Note: [MPEGDASHCMAFPROFILE] defines the media segment format using [MPEGCMAF], which is largely based on [ISOBMFF].

5.2. Timing model

The purpose of this chapter is to give a holistic overview of DASH presentation timing and related segment addressing. It is not intended to provide details of the timing model and all possible uses of the attributes in [MPEGDASH].

In order to achieve higher interoperability, DASH-IF's Implementation Guidelines allow considerably fewer options than those provided by [MPEGDASH], constraining services to a specific set of reasonably flexible behaviors that are highly interoperable with modern client platforms. This chapter covers the timing model and related segment addressing schemes for these common use cases.

5.2.1. Conformance requirements

This document adds additional constraints to [MPEGDASH] timing requirements.

To be conformant to this document:

5.2.2. MPD Timeline

[MPEGDASH] defines the general DASH timing model in its clause 4.3.

The MPD defines the MPD timeline of a Media Presentation, which serves as the baseline for all scheduling decisions made during DASH presentation playback.

There exist two types of Media Presentations, indicated by MPD@type.

The playback of a static MPD (defined in [MPEGDASH] as an MPD with MPD@type="static") does not depend on the mapping of the MPD timeline to real time. This means that the entire presentation is available at any time and a client can play any part of the presentation at any time (e.g. it can start playback at any time and seek freely within the entire presentation).

The MPD timeline of a dynamic MPD (defined in [MPEGDASH] as an MPD with MPD@type="dynamic") has a fixed mapping to wall clock time, with each point on the MPD timeline corresponding to a point in real time. This means that segments of the presentation become available over time. Clients can introduce an additional offset with respect to wall clock time for the purpose of maintaining an input buffer to cope with network bandwidth fluctuations.

Note: In addition to mapping the MPD timeline to wall clock time, a dynamic MPD can be updated during the presentation. Updates may add new periods and remove or modify existing ones including adding new segments with progress in time, though some restrictions apply. See § 5.2.9.5 MPD updates.

The time zero on the MPD timeline of a dynamic MPD is mapped to the point in wall clock time indicated by MPD@availabilityStartTime.
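For illustration, the following fragment maps the MPD timeline of a dynamic MPD to wall clock time (the attribute values are hypothetical):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic"
    availabilityStartTime="2020-01-01T00:00:00Z"
    minimumUpdatePeriod="PT10S" timeShiftBufferDepth="PT30S">
  ...
</MPD>

Here, time zero on the MPD timeline corresponds to 2020-01-01T00:00:00Z in wall clock time.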

The ultimate purpose of the MPD is to enable the client to obtain media samples for playback. Additionally, a client may dynamically switch between different bitrates of the same content to adapt to network bandwidth fluctuations. The following data structures are most relevant to locating and scheduling the samples:

  1. The MPD consists of consecutive periods which map data onto the MPD timeline.

  2. Each period contains one or more representations, each of which provides media samples inside a sequence of media segments.

  3. Representations within a period are grouped in adaptation sets, which associate related representations and decorate them with metadata.

The primary elements described by an MPD.

5.2.3. Periods

An MPD defines an ordered list of one or more consecutive periods. A period is both a time span on the MPD timeline and a definition of the data to be presented during this time span. Period timing is relative to the zero point of the MPD timeline.

An MPD is a collection of consecutive periods.

Common reasons for defining multiple periods are:

Periods are self-contained - a client is not required to know the contents of another period in order to correctly present a period. Knowledge of the contents of different periods may be used by a client to achieve seamless period transitions, especially when working with period-connected representations.

The below static MPD consists of two 20-second periods. The duration of the first period is calculated using the start point of the second period. The total duration of the presentation is 40 seconds.
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    ...
  </Period>
  <Period start="PT20S" duration="PT20S">
    ...
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

[MPEGDASH] clause 5.3.2 defines the period's requirements in MPD authoring. Among others, it requires the following:

  1. All periods are consecutive and non-overlapping. A period may have a duration of zero.

Note: A period with a duration of zero might, for example, be the result of ad-insertion logic deciding not to insert any ad.

  2. In a static MPD, the first period starts at the time zero of the MPD timeline. In a dynamic MPD, the first period starts at or after the zero point of the MPD timeline.

  3. In a static MPD, either the last period has a Period@duration or MPD@mediaPresentationDuration is present.

  4. In a dynamic MPD, the last period may have a Period@duration, in which case it has a fixed duration. Without Period@duration, the last period in a dynamic MPD has an unknown duration, which allows the timeline to be extended indefinitely.

Note: In a dynamic MPD, a period with an unknown duration may be converted to a fixed-duration period by an MPD update. Periods in a dynamic MPD can also be shortened or removed entirely under certain conditions. However, the Media Presentation is defined at least until (current wall clock time + MPD@minimumUpdatePeriod), until which the current MPD remains valid. See § 5.2.9.5 MPD updates.

  5. MPD@mediaPresentationDuration may be present. If present, it accurately matches the duration between the time zero on the MPD timeline and the end of the last period. Clients must calculate the total duration of a static MPD by adding up the durations of each period and must not rely on the presence of MPD@mediaPresentationDuration.

Note: This calculation is necessary because the durations of XLink periods can only be known after the XLink is resolved. Therefore it is impossible to always determine the total MPD duration on the service side as only the client is guaranteed to have access to all the required knowledge.

5.2.4. Representations

A representation is a sequence of segments as defined by [MPEGDASH] 5.3.1 and 5.3.5. A Representation element is a collection of these segment references and a description of the samples within the referenced media segments.

In practice, each representation usually belongs to exactly one adaptation set and often belongs to exactly one period, although a representation may be connected with a representation in another period.

Each segment reference addresses a media segment that corresponds to a specific time span on the sample timeline. Each media segment contains samples for a specific time span on the sample timeline.

Note: Simple addressing allows the actual time span of samples within a media segment to deviate from the corresponding time span described in the MPD ([MPEGDASH] 7.2.1). All timing-related clauses in this document refer to the timing described in the MPD (i.e. according to the MPD timeline) unless otherwise noted.

The exact mechanism used to define segment references depends on the addressing mode used by the representation.

This document adds the following requirement:

As recommended by [MPEGDASH] 7.2.1:

This document additionally requires:

In a static MPD, the entire period must be covered with media segments.
In a dynamic MPD, the time shift buffer determines the set of required segment references in each representation. Media segments filled with gray need not be referenced due to falling outside the time shift buffer, despite falling within the bounds of a period.

Note: In a dynamic MPD, each media segment only becomes available when its end point falls within the availability window (this time may be adjusted by the @availabilityTimeOffset and @availabilityTimeComplete values) ([MPEGDASH] 5.3.9.5.1 and 5.3.5.3). It is a valid situation that a media segment is required to be referenced but is not yet available.

As required by [MPEGDASH] 5.3.9.5.3:

As allowed by [MPEGDASH] 7.2.1:

An unnecessary segment reference is one that is not defined as required by this chapter.

This document adds the following requirements to [MPEGDASH]: unnecessary segment references SHALL NOT be present in the MPD, except when any of the following applies:

  1. The segment reference is for future content and will eventually become necessary.

  2. The segment reference is defined via indexed addressing.

  3. The segment reference is defined by an <S> element that defines multiple references using S@r, some of which are necessary.

  4. Removal of the segment reference is not allowed by content removal constraints.

This document also defines the following requirements for clients:

Media segments and samples need not align with period boundaries. Some samples may be entirely outside a period (marked gray) and some may overlap the period boundary (yellow).

Note: In the end, which samples are presented is entirely up to the client. It may sometimes be impractical to present media segments only partially, depending on the capabilities of the client platform, the type of media samples involved and any dependencies between samples.

5.2.5. Sample timeline

The samples within a representation exist on a linear sample timeline defined by the encoder that created the samples. One or more sample timelines are mapped onto the MPD timeline by metadata stored in or referenced by the MPD ([MPEGDASH] 7.3.2).
Sample timelines are mapped onto the MPD timeline based on parameters defined in the MPD.

Note: A sample timeline is linear - encoders are expected to use an appropriate timescale and sufficiently large timestamp fields to avoid any wrap-around. If wrap-around does occur, a new period must be started in order to establish a new sample timeline.

The sample timeline is formed after applying any [ISOBMFF] edit lists ([MPEGDASH] 7.3.2).

This document additionally requires:

Note: While optional in [MPEGDASH], the presence of the @timescale attribute is required by the interoperable timing model because the default value of 1 is unlikely to match any real-world content and is far more likely to indicate an unintentional content authoring error.

@presentationTimeOffset is the key component in establishing the relationship between the MPD timeline and a sample timeline.

The point on the sample timeline indicated by @presentationTimeOffset is equivalent to the period start point on the MPD timeline ([MPEGDASH] Table 15). The value is provided by SegmentTemplate@presentationTimeOffset or SegmentBase@presentationTimeOffset, depending on the addressing mode, and has a default value of 0 timescale units.

Note: To transform a sample timeline position SampleTime to an MPD timeline position, use the formula MpdTime = Period@start + (SampleTime - @presentationTimeOffset) / @timescale.
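For example (values chosen for illustration): with Period@start=PT20S, @presentationTimeOffset=8100 and @timescale=48000, the sample timeline position 56100 maps to MpdTime = 20 + (56100 - 8100) / 48000 = 21 seconds on the MPD timeline.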

5.2.6. Clock drift is forbidden

Some encoders experience clock drift - they do not produce exactly 1 second worth of output per 1 second of input, either stretching or compressing the sample timeline with respect to the MPD timeline.

This document adds the following requirement:

If a packager receives input from an encoder at the wrong rate, it must take corrective action. For example, it might:

  1. Drop a span of content if input is produced faster than real-time.

  2. Insert regular padding content if input is produced slower than real-time. This padding can take different forms:

    • Silence or a blank picture.

    • Repeating frames.

    • Insertion of short-duration periods where the affected representations are not present.

Of course, such after-the-fact corrective actions can disrupt the end-user experience. The optimal solution is to fix the defective encoder.

5.2.7. Media segments

A media segment is an HTTP-addressable data structure that contains one or more media samples.

Note: Different media segments may be different byte ranges accessed on the same URL.

[MPEGCMAF] requires that media segments contain one or more consecutive media samples, and that consecutive media segments in the same representation contain consecutive media samples.

[MPEGDASH] 7.2.1 requires the following:

[MPEGCMAF] 7.3.4 and [MPEGDASHCMAFPROFILE] require the following:

5.2.7.1. Media segment duration deviation

When using simple addressing, the samples contained in a media segment may cover a different time span on the sample timeline than what is indicated by the nominal timing in the MPD timeline. This deviation is defined as the offset between the edges of the nominal time span (as defined by the MPD timeline) and the edges of the true time span (as defined by the sample timeline), and is calculated separately for each edge.

In simple addressing, a media segment may cover a different time span on the sample timeline than what is indicated by the nominal timing in the MPD timeline. Red boxes indicate samples.

[MPEGDASH] 7.2.1 requires: The duration deviation is no more than 50% of the nominal media segment duration and may be in either direction.
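For example, if the nominal media segment duration in the MPD is 4 seconds, each edge of the true time span may deviate from the corresponding nominal edge by up to 2 seconds in either direction (values chosen for illustration).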

This document also recommends:

Note: [DVB-DASH] defines some relevant constraints in section 4.5. Consider obeying these constraints to be compatible with [DVB-DASH].

5.2.7.2. Segments must be aligned

Media segments in different representations of the same adaptation set are said to be aligned if, for each media segment, the earliest presentation time on the sample timeline is equal across all of those representations.

[MPEGDASHCMAFPROFILE] requires:

5.2.8. Period connectivity

The precise definition of period connectivity can be found in [MPEGDASH] 5.3.2.4. However, generally speaking, in certain circumstances content may be offered such that a representation is technically compatible with the content of a representation in a previous period. Such representations are period-connected.

Any subset of the representations in a period may be period-connected with their counterparts in a future or past period. Period connectivity may be chained across any number of periods.

Note: Connectivity is generally achieved by using the same encoder to encode the content of multiple periods using the same settings. Keep in mind, however, that decryption is also a part of the client media pipeline - it is not only the codec parameters that are configured by the initialization segment; different decryption parameters are likely to break connectivity that would otherwise exist.

For signaling the period connectivity between representations of two periods in an MPD, [MPEGDASH] 5.3.2.4 requires:

Representations can be signaled as period-connected, enabling client optimizations. Arrows on diagram indicate direction of connectivity reference (from future to past), with the implied message being "the client can use the same decoder it used where the arrow points to".

Note: Not all representations in an adaptation set need to be period-connected. For example, if a new period is introduced to add a representation that contains a new video quality level, all other representations will likely be connected but not the one that was added.

Note that [MPEGDASH] allows:

The same media segment will often exist in two periods at a period-connected transition. On the diagram, this is segment 4.

This document recommends:

This document also recommends:

Note: The exact mechanism that ensures seamless playback depends on client capabilities and will be implementation-specific. Any shared media segment overlapping the period boundary may need to be detected and deduplicated to avoid presenting it twice.

5.2.8.1. Period continuity

In addition to period connectivity, [MPEGDASH] 5.3.2.4 defines period continuity, which is a special case of period connectivity where the two samples on the boundary between the connected representations are consecutive on the same sample timeline. Continuity implies connectivity.

Note: The above can only be true if the sample boundary exactly matches the period boundary.

For signaling the period continuity, [MPEGDASH] 5.3.2.4 requires:

This document requires:

5.2.9. Dynamic MPDs

This section only applies to dynamic MPDs.

Three main factors differentiate them from static MPDs:

  1. The segments described in a dynamic MPD may become available over time, i.e. not all segments are available.

  2. Playback of a dynamic MPD is synchronized to a real time clock (with some amount of client-chosen time shift allowed).

  3. A dynamic MPD may change over time, with clients retrieving new snapshots of the MPD when the validity duration of the previous snapshot expires.

[MPEGDASH] 5.4.1 requires:

The MPD validity duration starts when the MPD download is initiated by a client, which may be some time after it is generated/published!

This document requires: DASH clients SHALL support the presentation of dynamic MPDs.

5.2.9.1. Real time clock synchronization

It is critical to synchronize the clock of the client with the clock of the service when using a dynamic MPD. The time indicated by the clock does not necessarily need to match any universal standard as long as the two are mutually synchronized.

The use of UTCTiming is optional in [MPEGDASH].

This document requires:

The use of a "default time source" is not allowed. The mechanism of time synchronization must always be explicitly defined in the MPD by every service.

This document requires:

We could benefit from some detailed examples here, especially as clock sync is such a critical element of live services.
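As a starting point, the following fragment illustrates explicit clock synchronization signaling using the urn:mpeg:dash:utc:http-iso:2014 scheme defined by [MPEGDASH], under which the client performs an HTTP GET and receives an ISO 8601 timestamp (the time server URL is hypothetical):

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="dynamic">
  <UTCTiming schemeIdUri="urn:mpeg:dash:utc:http-iso:2014"
      value="https://time.example.com/iso" />
  ...
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.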

5.2.9.2. Availability

A media segment is available when an HTTP request to acquire it can be started and successfully performed to completion by a client. During playback of a dynamic MPD, new media segments continuously become available and stop being available with the passage of time. [MPEGDASH] defines the segment availability time of a segment as the interval in wall clock time during which that segment is available.

An availability window is a time span on the MPD timeline that determines which media segments can be expected to be available. Each representation has its own availability window. Consequently, the availability window at each moment is defined by the union of the segment availability times of all segments available at that moment.

A segment start point (referred to as the MPD start time of a segment in [MPEGDASH]) is the presentation start time of the segment on the MPD timeline.

A segment end point is the presentation end time of the segment on the MPD timeline.

[MPEGDASH] requires:

It is the responsibility of the service to ensure that media segments are available to clients when they are described as available by the MPD. Consider that the criterion for availability is a successful download by clients, not successful publishing from a packager.

The availability window is calculated as follows:

  1. Let now be the current wall clock time according to the synchronized clock.

  2. Let AvailabilityWindowStart be now - MPD@timeShiftBufferDepth.

    • If MPD@timeShiftBufferDepth is not defined, let AvailabilityWindowStart be MPD@availabilityStartTime.

  3. Let TotalAvailabilityTimeOffset be the sum of all @availabilityTimeOffset values that apply to the representation (those directly on the Representation element and any of its ancestors).

  4. The availability window is the time span from AvailabilityWindowStart to now + TotalAvailabilityTimeOffset.

The availability window determines which media segments can be expected to be available, based on where their segment end point lies.
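For example (all values hypothetical): if now is 10:00:00, MPD@timeShiftBufferDepth is PT30S and TotalAvailabilityTimeOffset is 2 seconds, the availability window spans from 09:59:30 to 10:00:02. A media segment whose end point falls within this window can be expected to be available.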

This document requires:

5.2.9.3. Time shift buffer

The time shift buffer is a time span on the MPD timeline that defines the set of media segments that a client is allowed to present at the current moment in time according to the synchronized clock (now).

This is the mechanism by which clients can introduce a time shift (an offset) between real time and the MPD timeline when presenting dynamic MPDs. The time shift is zero when a client always chooses to play back the media segment at the end point of the time shift buffer. By playing back media segments from further in the past, a time shift is introduced.

Note: A time shift of 30 seconds means that the client starts presenting a media segment at the moment when its position on the MPD timeline reaches a distance of 30 seconds from the end of the time shift buffer.

The following additional factors further constrain the set of media segments that can be presented at the current time and can force a client to introduce a time shift:

  1. § 5.2.9.2 Availability - not every media segment in the time shift buffer is guaranteed to be available.

  2. § 5.2.9.4 Presentation delay - the service may define a delay that forbids the use of a section of the time shift buffer.

The time shift buffer extends from now - MPD@timeShiftBufferDepth to now. In the absence of MPD@timeShiftBufferDepth the start of the time shift buffer is MPD@availabilityStartTime.
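For example, with MPD@timeShiftBufferDepth=PT30S and now at 10:00:00 (values chosen for illustration), the time shift buffer spans from 09:59:30 to 10:00:00.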

Media segments overlapping the time shift buffer may potentially be presented by a client, if other constraints do not forbid it.

This document requires:

A dynamic MPD SHALL contain a period that ends at or overlaps the end point of the time shift buffer, except when reaching the end of live content in which case the last period MAY end before the end of the time shift buffer.

5.2.9.4. Presentation delay

There is a natural conflict between the availability window and the time shift buffer. It is legal for a client to present media segments as soon as they overlap the time shift buffer, yet such media segments might not yet be available.

Furthermore, the delay between media segments entering the time shift buffer and becoming available might be different for different representations that use different media segment durations. This difference may also change over time if a representation does not use a constant media segment duration.

This document requires:

[MPEGDASH] allows:

This document requires:

Note: As different clients might use different algorithms for calculating the presentation delay, providing MPD@suggestedPresentationDelay enables services to roughly synchronize the playback start position of clients.

The effective time shift buffer is the time span from the start of the time shift buffer to now - PresentationDelay.

Media segments that overlap the effective time shift buffer are the ones that may be presented at time now. Two representations with different segment lengths are shown. The diagram assumes @availabilityTimeOffset=0.
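Continuing the illustration above: if the time shift buffer spans from 09:59:30 to 10:00:00 and PresentationDelay is 6 seconds, the effective time shift buffer spans from 09:59:30 to 09:59:54, and a client playing with the minimal time shift presents the media segment that overlaps 09:59:54.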

This document requires:

5.2.9.5. MPD updates

Dynamic MPDs may change over time. The nature of the change is not restricted unless such a restriction is explicitly defined.

Some common reasons to make changes in dynamic MPDs:

[MPEGDASH] 5.4.1 requires the following restrictions for MPD updates:

Additional restrictions on MPD updates are defined by other parts of this document.

This document requires:

This document also requires:

5.2.9.5.1. Adding content to the MPD

[MPEGDASH] allows two mechanisms for adding content:

Multiple content adding mechanisms may be combined in a single MPD update. An MPD update that adds content may be combined with an MPD update that removes content.

MPD updates can add both segment references and periods (additions highlighted in blue).

This document requires:

Note: The duration of the last period cannot change as a result of adding segment references. A live service will generally use a period with an unlimited duration to continuously add new segment references.

When using simple addressing or explicit addressing, it is possible for a period to define an infinite sequence of segment references that extends to the end of the period (e.g. using SegmentTemplate@duration or r="-1"). Such self-extending reference sequences are equivalent to explicitly defined segment reference sequences that extend to the end of the period and clients MAY obtain new segment references from such sequences even between MPD updates.

5.2.9.5.2. Removing content from the MPD

Removal of content is only allowed if the content to be removed is not yet available to clients and guaranteed not to become available until clients receive the MPD update. See § 5.2.9.2 Availability.

To determine the content that may be removed, let EarliestRemovalPoint be availability window end + MPD@minimumUpdatePeriod.

Note: As each representation has its own availability window, so does each representation have its own EarliestRemovalPoint.
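For example (values hypothetical): if the availability window of a representation ends at 10:00:02 and MPD@minimumUpdatePeriod is PT10S, then EarliestRemovalPoint is 10:00:12, and an MPD update published at that moment may only remove references to media segments that start after 10:00:12.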

MPD updates can remove both segment references and periods (removals highlighted in red).

An MPD update removing content MAY remove any segment references to media segments that start after EarliestRemovalPoint at the time the update is published.

Media segments that overlap or end before EarliestRemovalPoint might be considered by clients to be available at the time the MPD update is processed and therefore SHALL NOT be removed by an MPD update.

The following mechanisms exist for removing content:

Multiple content removal mechanisms MAY be combined in a single MPD update.

Note: When using indexed addressing or simple addressing, removal of segment references from the end of the period only requires changing Period@duration. When using explicit addressing, pruning some S elements may be appropriate to avoid leaving unnecessary segment references.

Clients SHALL NOT fail catastrophically if an MPD update removes already buffered data but MAY incur unexpected time shift or a visible transition at the point of removal. It is the responsibility of the service to avoid removing data that may already be in use.

In addition to editorial removal from the end of the MPD, content naturally expires due to the passage of time. Expired content also needs to be removed:

An MPD update that removes content MAY be combined with an MPD update that adds content.

5.2.9.5.3. End of live content

Live services can reach a point where no more content will be produced - existing content will be played back by clients and once they reach the end, playback will cease.

This document requires:

5.2.9.6. MPD refreshes

To stay informed of MPD updates, clients need to perform MPD refreshes at appropriate moments to download updated MPD snapshots.

This document requires:

  1. When an MPD snapshot is downloaded, it is valid for the present moment and at least MPD@minimumUpdatePeriod after that.

  2. A client can expect to be able to successfully download any media segments that the MPD defines as available at any point during the MPD validity duration.

  3. Clients MAY refresh the MPD at any point. Typically this will occur because the client wants to obtain more segment references or make more media segments (for which it might already have references) available by extending the MPD validity duration.

    • This may result in a different MPD snapshot being downloaded, with updated information.

    • Or it may be that the MPD has not changed, in which case its validity period is extended to now + MPD@minimumUpdatePeriod.

Note: There is no requirement that clients poll for updates at MPD@minimumUpdatePeriod interval. They can do so as often or as rarely as they wish - this attribute simply defines the MPD validity duration.

Services may publish in-band events to explicitly signal MPD validity instead of expecting clients to regularly refresh on their own initiative. This enables finer control by the service but might not be supported by all clients.

This document requires:

5.2.10. Timing of stand-alone IMSC1 and WebVTT text files

Some services store text adaptation sets in stand-alone IMSC1 or WebVTT files, without segmentation or [ISOBMFF] encapsulation.

This document requires:

IMSC1 subtitles stored in a stand-alone XML file.
<AdaptationSet mimeType="application/ttml+xml" lang="en-US">
  <Role schemeIdUri="urn:mpeg:dash:role:2011" value="subtitle" />
  <Representation>
    <BaseURL>subtitles_en_us.xml</BaseURL>
  </Representation>
</AdaptationSet>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional AdaptationSet element.

5.2.11. Forbidden techniques

Some aspects of [MPEGDASH] are not compatible with the interoperable timing model defined in this document. In the interest of clarity, they are explicitly listed here:

5.2.12. Examples

This section is informative.

5.2.12.1. Offer content with imperfectly aligned tracks

It may be that for various content processing workflow reasons, some tracks have a different duration from others. For example, the audio track might start a fraction of a second before the video track and end some time before the video track ends.

Content with different track lengths, before packaging as DASH.

You now have some choices to make in how you package these tracks into a DASH presentation that conforms to this document. Specifically, there exists the requirement that every representation must cover the entire period with media samples.

Content may be cut (indicated in black) to equalize track lengths.

The simplest option is to define a single period that contains representations resulting from cutting the content to match the shortest common time span, thereby covering the entire period with samples. Depending on the nature of the data that is removed, this may or may not be acceptable.

Content may be padded (indicated in green) to equalize track lengths.

If you wish to preserve track contents in their entirety, the most interoperable option is to add padding samples (e.g. silence or black frames) to all tracks to ensure that all representations have enough data to cover the entire period with samples. This may require customization of the encoding process, as the padding must match the codec configuration of the real content and might be impractical to add after the real content has already been encoded.

New periods may be started at any change in the set of available tracks.

Another option that preserves track contents is to split the content into multiple periods that each contain a different set of representations, starting a new period whenever a track starts or ends. This ensures that every representation covers its period with samples. The upside of this approach is that it can be done easily, requiring only manipulation of the MPD. The downside is that some clients may be unable to seamlessly play across every period transition.

You may combine the different approaches, cutting in some places (black), padding in others (green) and defining multiple periods as needed.

You may wish to combine the different approaches, depending on the track, to achieve the optimal result.

Some clients are known to fail when transitioning from a period with audio and video to a period with only one of these components. You should avoid such transitions unless you have exact knowledge of the capabilities of your clients.

5.2.12.2. Split a period

There exist scenarios where you would wish to split a period in two. Common reasons would be:

This example shows how an existing period can be split in a way that clients capable of seamless period-connected playback do not experience interruptions in playback among representations that are present both before and after the split.

Our starting point is a presentation with a single period that contains an audio representation with short samples and a video representation with slightly longer samples, so that media segment start points do not always overlap.

Presentation with one period, before splitting. Blue is a segment, yellow is a sample. Duration in arbitrary units is listed on samples. Segment durations are taken to be the sum of sample durations. presentationTimeOffset may have any value - it is listed because it will be referenced later.

Note: Periods may be split at any point in time as long as both sides of the split remain in conformance to this document (e.g. each contains at least 1 media segment). Furthermore, period splitting does not require manipulation of the segments themselves, only manipulation of the MPD.

Let’s split this period at position 220. This split occurs during segment 3 for both representations and during sample 8 and sample 5 of the audio and video representation, respectively.

The mechanism that enables period splitting in the middle of a segment is the following:

After splitting the example presentation, we arrive at the following structure.

Presentation with two periods, after splitting. Audio segment 3 and video segment 3 are shared by both periods, with the connectivity signaling indicating that seamless playback with de-duplicating behavior is expected from clients.

If indexed addressing is used, both periods will reference all segments as both periods will use the same unmodified index segment. Clients are expected to ignore media segments that fall outside the period bounds.

Simple addressing has significant limitations on alignment at period start, making it unsuitable for some multi-period scenarios. See § 5.3.4.2 Moving the period start point (simple addressing).

Other periods (e.g. ads) may be inserted between the two periods resulting from the split. This does not affect the addressing and timing of the two periods.

5.2.12.3. Change the default_KID

In encrypted content, the default_KID of a representation might need to be changed at certain points in time. Often, the changes are closely synchronized in different representations.

To perform the default_KID change, start a new period on every change, treating each representation as an independently changing element. With proper signaling, clients can perform this change seamlessly.
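For illustration, the following fragment sketches such a change for a video adaptation set, with a new period starting where the default_KID changes (the key IDs are hypothetical; the cenc: prefix maps to the urn:mpeg:cenc:2013 namespace):

<Period duration="PT300S">
  <AdaptationSet>
    <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
        value="cenc" cenc:default_KID="34e5db32-8625-47cd-ba06-68fca0655a72" />
    ...
  </AdaptationSet>
</Period>
<Period>
  <AdaptationSet>
    <ContentProtection schemeIdUri="urn:mpeg:dash:mp4protection:2011"
        value="cenc" cenc:default_KID="2b49f4f9-fb02-4532-83f8-52bb0e39936c" />
    ...
  </AdaptationSet>
</Period>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.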

What about period connectivity? #238

A change in default_KID starts a new period. Orange indicates audio and yellow video representation.

The same pattern can also be applied to other changes in representation configuration.

5.3. Segment addressing modes

This section defines the addressing modes that can be used for referencing media segments, initialization segments and index segments in interoperable DASH presentations.

Addressing modes not defined in this chapter SHALL NOT be used by DASH services. Clients SHOULD support all addressing modes defined in this chapter.

All representations in the same adaptation set SHALL use the same addressing mode. Representations in different adaptation sets MAY use different addressing modes. Period-connected representations SHALL use the same addressing mode in every period.

You SHOULD choose the addressing mode based on the nature of the content:

Content generated on the fly

Use explicit addressing.

Content generated in advance of publishing

Use indexed addressing or explicit addressing.

A service MAY use simple addressing which enables the packager logic to be very simple. This simplicity comes at a cost of reduced applicability to multi-period scenarios and reduced client compatibility.

Note: Future updates to [MPEGDASH] are expected to eliminate the critical limitations of simple addressing, enabling a wider range of applicable use cases.

Update to match [MPEGDASH] 4th edition.

Indexed addressing enables all data associated with a single representation to be stored in a single CMAF track file from which byte ranges are served to clients to supply media segments, the initialization segment and the index segment. This gives it some unique advantages:

5.3.1. Indexed addressing

A representation that uses indexed addressing consists of a CMAF track file containing an index segment, an initialization segment and a sequence of media segments.

Note: This addressing mode is sometimes called "SegmentBase" in other documents.

Clauses in this section only apply to representations that use indexed addressing.

Note: [MPEGDASH] makes a distinction between "segment" (HTTP-addressable entity) and "subsegment" (byte range of an HTTP-addressable entity). This document does not make such a distinction and has no concept of subsegments. Usage of "segment" here matches the definition of CMAF segment [MPEGCMAF].

Indexed addressing is based on an index segment that references all media segments.

The MPD defines the byte range in the CMAF track file that contains the index segment. The index segment informs the client of all the media segments that exist, the time spans they cover on the sample timeline and their byte ranges.

Multiple representations SHALL NOT be stored in the same CMAF track file (i.e. no multiplexed representations are to be used).

At least one Representation/BaseURL element SHALL be present in the MPD, containing a URL pointing to the CMAF track file.

The SegmentBase@indexRange attribute SHALL be present in the MPD. The value of this attribute identifies the byte range of the index segment in the CMAF track file. The value is a byte-range-spec as defined in [RFC7233], referencing a single range of bytes.

The SegmentBase@timescale attribute SHALL be present and its value SHALL match the value of the timescale field in the index segment (in the [ISOBMFF] sidx box) and the value of the timescale field in the initialization segment (in the [ISOBMFF] mdhd box).

The SegmentBase/Initialization@range attribute SHALL identify the byte range of the initialization segment in the CMAF track file. The value is a byte-range-spec as defined in [RFC7233], referencing a single range of bytes. The Initialization@sourceURL attribute SHALL NOT be used.

Below is an example of common usage of indexed addressing.

The example defines a timescale of 48000 units per second, with the period starting at position 8100 (or 0.16875 seconds) on the sample timeline. The client can use the index segment referenced by indexRange to determine where the media segment containing position 8100 (and all other media segments) can be found. The byte range of the initialization segment is also provided.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <BaseURL>showreel_audio_dashinit.mp4</BaseURL>
        <SegmentBase timescale="48000" presentationTimeOffset="8100" indexRange="848-999">
          <Initialization range="0-847"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

5.3.2. Structure of the index segment

The index segment SHALL consist of a single Segment Index Box (sidx) as defined by [ISOBMFF]. The field layout is as follows:

aligned(8) class SegmentIndexBox extends FullBox('sidx', version, 0) {
  unsigned int(32) reference_ID;
  unsigned int(32) timescale;

  if (version==0) {
    unsigned int(32) earliest_presentation_time;
    unsigned int(32) first_offset;
  }
  else {
    unsigned int(64) earliest_presentation_time;
    unsigned int(64) first_offset;
  }

  unsigned int(16) reserved = 0;
  unsigned int(16) reference_count;

  for (i = 1; i <= reference_count; i++)
  {
    bit (1) reference_type;
    unsigned int(31) referenced_size;
    unsigned int(32) subsegment_duration;
    bit(1) starts_with_SAP;
    unsigned int(3) SAP_type;
    unsigned int(28) SAP_delta_time;
  }
}

The values of the fields are determined as follows:

reference_ID

The track_ID of the [ISOBMFF] track that contains the data of this representation.

timescale

Same as the timescale field of the Media Header Box and same as the SegmentBase@timescale attribute in the MPD.

earliest_presentation_time

The start timestamp of the first media segment on the sample timeline, in timescale units.

first_offset

Distance from the end of the index segment to the first media segment, in bytes. For example, 0 indicates that the first media segment immediately follows the index segment.

reference_count

Total number of media segments referenced by the index segment.

reference_type

0

referenced_size

Size of the media segment in bytes. Media segments are assumed to be consecutive, so this is also the distance to the start of the next media segment.

subsegment_duration

Duration of the media segment in timescale units.

starts_with_SAP

1

SAP_type

Either 1 or 2, depending on the sample structure in the media segment.

SAP_delta_time

0

We need to clarify how to determine the right value for SAP_type. #235

5.3.2.1. Moving the period start point (indexed addressing)

When splitting periods in two or performing other types of editorial timing adjustments, a service might want to start a period at a point after the "natural" start point of the representations within.

For representations that use indexed addressing, perform the following adjustments to set a new period start point:

  1. Update SegmentBase@presentationTimeOffset to indicate the desired start point on the sample timeline.

  2. Update Period@duration to match the new duration.
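For illustration, applying these adjustments to the indexed addressing example from § 5.3.1 in order to start the period 100 seconds later on the sample timeline (8100 + 100 × 48000 = 4808100) could result in the following sketch, assuming the original period was 900 seconds long:

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period duration="PT800S">
    <AdaptationSet>
      <Representation>
        <BaseURL>showreel_audio_dashinit.mp4</BaseURL>
        <SegmentBase timescale="48000" presentationTimeOffset="4808100" indexRange="848-999">
          <Initialization range="0-847"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.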

5.3.3. Explicit addressing

A representation that uses explicit addressing consists of a set of media segments accessed via URLs constructed using a template defined in the MPD, with the exact time span covered by each media segment described in the MPD.

Note: This addressing mode is sometimes called "SegmentTemplate with SegmentTimeline" in other documents.

Clauses in this section only apply to representations that use explicit addressing.

Explicit addressing uses a segment template that is combined with explicitly defined time spans for each media segment in order to reference media segments, either by start time or by sequence number.

The MPD SHALL contain a SegmentTemplate/SegmentTimeline element, containing a set of segment references, which satisfies the requirements defined in this document. The segment references exist as a sequence of S elements, each of which references one or more media segments with start time S@t and duration S@d timescale units on the sample timeline. The SegmentTemplate@duration attribute SHALL NOT be present.

To enable concise segment reference definitions, an S element may represent a repeating segment reference that indicates a number of repeated consecutive media segments with the same duration. The value of S@r SHALL indicate the number of additional consecutive media segments that exist.

Note: Only additional segment references are counted, so S@r=5 indicates a total of 6 consecutive media segments with the same duration.

The start time of a media segment is calculated from the start time and duration of the previous media segment if not specified by S@t. There SHALL NOT be any gaps or overlap between media segments.

The value of S@r is nonnegative, except for the last S element which MAY have a negative value in S@r, indicating that the repeated segment references continue indefinitely up to a media segment that either ends at or overlaps the period end point.

Updates to a dynamic MPD MAY add more S elements, remove expired S elements, increment SegmentTemplate@startNumber, add the S@t attribute to the first S element or increase the value of S@r on the last S element but SHALL NOT otherwise modify existing S elements.

The SegmentTemplate@media attribute SHALL contain the URL template for referencing media segments, using either the $Time$ or $Number$ template variable to uniquely identify media segments. The SegmentTemplate@initialization attribute SHALL contain the URL template for referencing initialization segments.

If using $Number$ addressing, the number of the first segment reference is defined by SegmentTemplate@startNumber (default value 1). The S@n attribute SHALL NOT be used - segment numbers form a continuous sequence starting with SegmentTemplate@startNumber.

Below is an example of common usage of explicit addressing.

The example defines 225 media segments starting at position 900 on the sample timeline and lasting for a total of 900.225 seconds. The period ends at 900 seconds, so the last 0.225 seconds of content is clipped (out of bounds samples may also simply be omitted from the last media segment). The period starts at position 900 which matches the start position of the first media segment found at the relative URL video/900.m4s.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period duration="PT900S">
    <AdaptationSet>
      <Representation>
        <SegmentTemplate timescale="1000" presentationTimeOffset="900"
            media="video/$Time$.m4s" initialization="video/init.mp4">
          <SegmentTimeline>
            <S t="900" d="4001" r="224" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

Below is an example of explicit addressing used in a scenario where different media segments have different durations (e.g. due to encoder limitations).

The example defines a sequence of 11 media segments starting at position 120 on the sample timeline and lasting for a total of 95520 units at a timescale of 1000 units per second (which results in 95.52 seconds of data). The period starts at position 810, which is within the first media segment, found at the relative URL video/120.m4s. The fifth media segment repeats once, resulting in a sixth media segment with the same duration.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet>
      <Representation>
        <SegmentTemplate timescale="1000" presentationTimeOffset="810"
            media="video/$Time$.m4s" initialization="video/init.mp4">
          <SegmentTimeline>
            <S t="120" d="8520"/>
            <S d="8640"/>
            <S d="8600"/>
            <S d="8680"/>
            <S d="9360" r="1"/>
            <S d="8480"/>
            <S d="9080"/>
            <S d="6440"/>
            <S d="10000"/>
            <S d="8360"/>
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

5.3.3.1. Moving the period start point (explicit addressing)

When splitting periods in two or performing other types of editorial timing adjustments, a service might want to start a period at a point after the "natural" start point of the representations within.

For representations that use explicit addressing, perform the following adjustments to set a new period start point:

  1. Update SegmentTemplate@presentationTimeOffset to indicate the desired start point on the sample timeline.

  2. Update Period@duration to match the new duration.

  3. Remove any unnecessary segment references.

  4. If using the $Number$ template variable, increment SegmentTemplate@startNumber by the number of media segments removed from the beginning of the representation.

Note: See § 5.2.4 Representations and § 5.2.9.5.2 Removing content from the MPD to understand the constraints that apply to segment reference removal.
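As a non-normative illustration, the four steps above can be expressed as follows; the dictionary layout mirroring SegmentTemplate attributes is hypothetical.

def move_period_start(seg_template, refs, new_start_seconds, timescale, period_end_seconds):
    """refs: list of (start, duration) tuples in timescale units."""
    new_offset = int(new_start_seconds * timescale)
    seg_template["presentationTimeOffset"] = new_offset                  # step 1
    new_period_duration = period_end_seconds - new_start_seconds         # step 2 (Period@duration)
    kept = [r for r in refs if r[0] + r[1] > new_offset]                 # step 3
    removed = len(refs) - len(kept)
    if "$Number$" in seg_template["media"]:                              # step 4
        seg_template["startNumber"] = seg_template.get("startNumber", 1) + removed
    return new_period_duration, kept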

5.3.4. Simple addressing

Once we have a specific @earliestPresentationTime proposal submitted to MPEG we need to update this section to match. See #245. This is now done in [MPEGDASH] 4th edition - need to synchronize this text.

A representation that uses simple addressing consists of a set of media segments accessed via URLs constructed using a template defined in the MPD, with the nominal time span covered by each media segment described in the MPD.

Simple addressing defines the nominal time span of each media segment in the MPD. The true time span covered by samples within the media segment can be slightly different than the nominal time span. See § 5.3.4.1 Inaccuracy in media segment timing when using simple addressing.

Note: This addressing mode is sometimes called "SegmentTemplate without SegmentTimeline" in other documents.

Clauses in this section only apply to representations that use simple addressing.

Simple addressing uses a segment template that is combined with approximate first media segment timing information and an average media segment duration in order to reference media segments, either by start time or by sequence number.

The SegmentTemplate@duration attribute SHALL define the nominal duration of a media segment in timescale units.

The set of segment references SHALL consist of the first media segment starting exactly at the period start point and all other media segments following in a consecutive series of equal time spans of SegmentTemplate@duration timescale units, ending with a media segment that ends at or overlaps the period end time.

The SegmentTemplate@media attribute SHALL contain the URL template for referencing media segments, using either the $Time$ or $Number$ template variable to uniquely identify media segments. The SegmentTemplate@initialization attribute SHALL contain the URL template for referencing initialization segments.

If using $Number$ addressing, the number of the first segment reference is defined by SegmentTemplate@startNumber (default value 1).

Below is an example of common usage of simple addressing.

The example defines a sample timeline with a timescale of 1000 units per second, with the period starting at position 900. The average duration of a media segment is 4001 timescale units (4.001 seconds). Media segment numbering starts at 800, so the first media segment is found at the relative URL video/800.m4s. The sequence of media segments continues to the end of the period, which is 900 seconds long, making for a total of 225 defined segment references.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period duration="PT900S">
    <AdaptationSet>
      <Representation>
        <SegmentTemplate timescale="1000" presentationTimeOffset="900"
            media="video/$Number$.m4s" initialization="video/init.mp4"
            duration="4001" startNumber="800" />
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.
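The segment references implied by such an MPD can be derived with simple arithmetic. Below is a small sketch using the values from the example above.

import math

timescale, nominal_duration = 1000, 4001  # SegmentTemplate@timescale and @duration
period_duration_seconds = 900             # Period@duration
start_number = 800                        # SegmentTemplate@startNumber

segment_count = math.ceil(period_duration_seconds / (nominal_duration / timescale))
urls = [f"video/{start_number + i}.m4s" for i in range(segment_count)]
assert segment_count == 225 and urls[0] == "video/800.m4s"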

5.3.4.1. Inaccuracy in media segment timing when using simple addressing

When using simple addressing, the samples contained in a media segment MAY cover a different time span on the sample timeline than what is indicated by the nominal timing in the MPD, as long as no constraints defined in this document are violated by this deviation.

Simple addressing relaxes the requirement on media segment contents matching the sample timeline. Red boxes indicate samples.

The allowed deviation is defined as the maximum offset between the edges of the nominal time span (as defined by the MPD) and the edges of the true time span (as defined by the contents of the media segment). The deviation is evaluated separately for each edge.

This allowed deviation does not relax any requirements that do not explicitly define an exception. For example, periods must still be covered with samples for their entire duration, which constrains the flexibility allowed for the first and last media segment in a period.

The deviation SHALL be no more than 50% of the nominal media segment duration and MAY be in either direction.

Note: This results in a maximum true duration of 200% (+50% outward extension on both edges) and a minimum true duration of 1 sample (-50% inward from both edges would result in 0 duration but empty media segments are not allowed).

Allowing inaccurate timing is intended to enable reasoning on the sample timeline using average values for media segment timing. If the addressing data says that a media segment contains 4 seconds of data on average, a client can predict with reasonable accuracy which samples are found in which media segments, while at the same time the service is not required to publish per-segment timing data in the MPD. It is expected that the content is packaged with this constraint in mind (i.e. every segment cannot be inaccurate in the same direction - a shorter segment now implies a longer segment in the future to make up for it).

Consider a media segment with a nominal start time of 8 seconds from period start and a nominal duration of 4 seconds, within a period of unlimited duration.

The following are all valid contents for such a media segment:

  • samples from 8 to 12 seconds (exactly matching the nominal time span)

  • samples from 6 to 14 seconds (the maximum allowed deviation of 2 seconds outward on both edges)

  • samples from 10 to 11 seconds (deviation inward on both edges)

  • samples from 7 to 13.5 seconds (a different deviation on each edge)

Near period boundaries, all the constraints of timing and addressing must still be respected! Consider a media segment with a nominal start time of 0 seconds from period start and a nominal duration of 4 seconds. If such a media segment contained samples from 1 to 5 seconds (offset of 1 second away from zero point at both ends, which is within acceptable limits) it would be non-conforming because of the requirement in § 5.2.7 Media segments that the first media segment contain a media sample that starts at or overlaps the period start point. This severely limits the usefulness of simple addressing.
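The deviation rule itself can be checked mechanically. Below is a minimal sketch, assuming all values are expressed in seconds; the function name is illustrative, and it checks only the 50% rule, not the additional period boundary requirements discussed above.

def within_allowed_deviation(nominal_start, nominal_duration, true_start, true_end):
    """Check the 50% deviation rule; each edge is evaluated separately."""
    limit = 0.5 * nominal_duration
    nominal_end = nominal_start + nominal_duration
    return abs(true_start - nominal_start) <= limit and abs(true_end - nominal_end) <= limit

assert within_allowed_deviation(8, 4, 6, 14)      # maximum outward extension on both edges
assert not within_allowed_deviation(8, 4, 5, 12)  # start edge deviates by 3 > 2 seconds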

5.3.4.2. Moving the period start point (simple addressing)

When splitting periods in two or performing other types of editorial timing adjustments, a service might want to start a period at a point after the "natural" start point of the representations within.

Simple addressing is challenging to use in such scenarios. You SHOULD convert simple addressing representations to use explicit addressing before adjusting the period start point or splitting a period. See § 5.3.4.3 Converting simple addressing to explicit addressing.

The rest of this chapter provides instructions for situations where you choose not to convert to explicit addressing.

To move the period start point, for representations that use simple addressing, first ensure that the new start point satisfies the following:

  • The new period start point must coincide with the nominal start time of a media segment, as simple addressing requires the first media segment to start exactly at the period start point.

  • The first remaining media segment must contain a media sample that starts at or overlaps the new period start point (see § 5.2.7 Media segments).

Note: If you are splitting a period, also keep in mind the requirements on period end point sample alignment for the period that remains before the split point.

Finding a suitable new start point that conforms to the above requirements can be very difficult. If inaccurate timing is used, it may be altogether impossible. This is a limitation of simple addressing.

Having ensured conformance to the above requirements for the new period start point, perform the following adjustments:

  1. Update SegmentTemplate@presentationTimeOffset to indicate the desired start point on the sample timeline.

  2. If using the $Number$ template variable, increment SegmentTemplate@startNumber by the number of media segments removed from the beginning of the representation.

  3. Update Period@duration to match the new duration.

5.3.4.3. Converting simple addressing to explicit addressing

It may sometimes be desirable to convert a presentation from simple addressing to explicit addressing. This chapter provides an algorithm to do this.

Simple addressing allows for inaccuracy in media segment timing. No inaccuracy is allowed by explicit addressing. The mechanism of conversion described here is only valid when there is no inaccuracy. If the nominal time spans in the original MPD differ from the true time spans of the media segments, re-package the content from scratch using explicit addressing instead of converting.

To perform the conversion, execute the following steps:

  1. Calculate the number of media segments in the representation as SegmentCount = Ceil(AsSeconds(Period@duration) / ( SegmentTemplate@duration / SegmentTemplate@timescale)).

  2. Update the MPD.

    1. Add a single SegmentTemplate/SegmentTimeline element.

    2. Add a single SegmentTimeline/S element.

    3. Set S@t to equal SegmentTemplate@presentationTimeOffset.

    4. Set S@d to equal SegmentTemplate@duration.

    5. Remove SegmentTemplate@duration.

    6. Set S@r to SegmentCount - 1.
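The conversion can be summarized in code. The following non-normative sketch operates on a dictionary of SegmentTemplate attributes; the layout, including the period_duration key, is hypothetical.

import math

def simple_to_explicit(st):
    """st: SegmentTemplate attributes plus the enclosing Period@duration in seconds."""
    seconds_per_segment = st["duration"] / st["timescale"]
    segment_count = math.ceil(st["period_duration"] / seconds_per_segment)  # step 1
    st["SegmentTimeline"] = [{                                              # steps 2.1 and 2.2
        "t": st["presentationTimeOffset"],                                  # step 2.3
        "d": st.pop("duration"),                                            # steps 2.4 and 2.5
        "r": segment_count - 1,                                             # step 2.6
    }]
    return st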

Below is an example of a simple addressing representation before conversion.
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period duration="PT900S">
    <AdaptationSet>
      <Representation>
        <SegmentTemplate timescale="1000" presentationTimeOffset="900"
            media="video/$Number$.m4s" initialization="video/init.mp4"
            duration="4001" startNumber="800" />
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

As part of the conversion, we calculate SegmentCount = Ceil(900 / (4001 / 1000)) = 225.

After conversion, we arrive at the following result.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period duration="PT900S">
    <AdaptationSet>
      <Representation>
        <SegmentTemplate timescale="1000" presentationTimeOffset="900"
            media="video/$Number$.m4s" initialization="video/init.mp4"
            startNumber="800">
          <SegmentTimeline>
            <S t="900" d="4001" r="224" />
          </SegmentTimeline>
        </SegmentTemplate>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - the above are not fully functional MPD files.

5.4. Adaptation set contents

Adaptation sets SHALL contain media segments compatible with a single decoder, although services MAY require the decoder to be re-initialized when switching to a new representation. See also § 6.4 Bitstream switching.

All representations in the same adaptation set SHALL have the same timescale, both in the MPD and in the initialization segment mdhd boxes.

[ISOBMFF] edit lists SHALL be identical for all representations in an adaptation set.

Note: [DVB-DASH] defines some relevant constraints in section 4.5. Consider obeying these constraints to be compatible with [DVB-DASH].

5.5. Adaptation set types

Each adaptation set SHALL match exactly one category from among the following:

  • video adaptation set

  • audio adaptation set

  • text adaptation set

  • metadata adaptation set

What exactly is metadata @codecs supposed to be? https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/290

The adaptation set type SHALL be used by a DASH client to identify the appropriate handler for rendering. Typically, a DASH client selects at most one adaptation set of each type.

In addition, a DASH client SHOULD use the value of the @codecs parameter to determine whether the underlying media playback platform can play the media contained within the adaptation set.

See § 11 Media coding technologies for detailed codec-specific constraints.

5.6. Video adaptation set constraints

All representations in the same video adaptation set SHALL be alternative encodings of the same source content, encoded such that switching between them does not produce visual glitches due to picture size or aspect ratio differences.

An illustration here would be very useful.

https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/284

To avoid visual glitches you must ensure that the sample aspect ratio is set correctly. For reasons of coding efficiency and due to technical constraints, different representations might use a different picture aspect ratio. Each representation signals a sample aspect ratio (e.g. in an [MPEGAVC] aspect_ratio_idc) that is used to scale the picture so that every representation ends up at the same display aspect ratio. The formula is display aspect ratio = picture aspect ratio × sample aspect ratio.

In the MPD, the display aspect ratio is AdaptationSet@par and the sample aspect ratio is Representation@sar. The picture aspect ratio is not directly present but is derived from Representation@width and Representation@height.

The encoded picture SHALL only contain the active video area, so that clients can frame the height and width of the encoded video to the size and shape of their currently selected display area without extraneous padding in the decoded video, such as "letterbox bars" or "pillarbox bars".

Representations in the same video adaptation set SHALL NOT differ in any of the following parameters:

  • colour primaries

  • transfer characteristics

  • matrix coefficients

If different video adaptation sets differ in any of the above parameters, these parameters SHOULD be signaled in the MPD on the adaptation set level by a supplemental property descriptor or an essential property descriptor with @schemeIdUri="urn:mpeg:mpegB:cicp:<Parameter>" as defined in [iso23001-8] and <Parameter> being one of the following: ColourPrimaries, TransferCharacteristics, or MatrixCoefficients. The @value attribute SHALL be set as defined in [iso23001-8].

Why is the above a SHOULD? If it matters enough to signal, we should make it SHALL? https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/286

In any video adaptation set, the following SHALL be present:

  • @width and @height

  • @sar

  • @par

Note: @width and @height indicate the number of encoded pixels. @par indicates the final intended display aspect ratio and @sar is effectively the ratio of aspect ratios (the ratio of @par to @width:@height).

Given a coded picture of 720x576 pixels with an intended display aspect ratio of 16:9, we would have the following values:

  • @width=720 and @height=576

  • @par="16:9"

  • @sar="64:45" (as 720:576 × 64:45 = 16:9)

This chapter already includes changes from #274

In any video adaptation set, the following SHOULD NOT be present and SHALL be ignored by clients if present:

  • @minWidth and @maxWidth

  • @minHeight and @maxHeight

  • @minFrameRate and @maxFrameRate

The above min/max values are trivial to determine at runtime and can be calculated by the client when needed.

@scanType SHOULD NOT be present and if present SHALL have the value progressive. Non-progressive video is not interoperable.

5.7. Audio adaptation set constraints

AdaptationSet@lang SHALL be present on every audio adaptation set.

@audioSamplingRate SHALL be present either on the adaptation set or representation level (but not both).

The AudioChannelConfiguration element SHALL be present either on the adaptation set or representation level (but not both). The scheme and value SHALL conform to ChannelConfiguration as defined in [iso23001-8].

5.8. Text adaptation set constraints

Text adaptation sets SHOULD be annotated using descriptors defined by [MPEGDASH], specifically Role, Accessibility, EssentialProperty and SupplementalProperty descriptors.

Guidelines for annotation are provided in § 7 Content annotation and selection and section 7.1.2 of [DVB-DASH].

5.9. Accessing resources over HTTP

[MPEGDASH] defines the structure of DASH presentations. Combined with an understanding of the addressing modes defined in § 5.3, this enables DASH clients to determine a set of HTTP requests that must be made to acquire the resources needed for playback of a DASH presentation. This section defines rules for performing the HTTP requests and signaling the relevant parameters in an interoperable manner.

https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/333

5.9.1. MPD URL resolution

A service MAY use the MPD/Location element to redirect clients to a different URL to perform MPD refreshes. HTTP redirection MAY be used when responding to client requests.

A DASH client performing an MPD refresh SHALL determine the MPD URL according to the following algorithm:

  1. If at least one MPD/Location element is present, the value of any MPD/Location element is used as the MPD URL. Otherwise the original MPD URL is used as the MPD URL.

  2. If the HTTP request results in an HTTP redirect using a 3xx response code, the redirected URL replaces the MPD URL.

The MPD URL as defined by the above algorithm SHALL be used as an implicit base URL for media segment requests.

Any present BaseURL element SHALL NOT affect MPD location resolution.

5.9.2. Segment URL resolution

A service MAY publish media segments on URLs unrelated to the MPD URL. A service MAY use multiple BaseURL elements on any level of the MPD to offer content on multiple URLs (e.g. via multiple CDNs). HTTP redirection MAY be used when responding to client requests.

For media segment requests, the DASH client SHALL determine the URL according to the following algorithm:

  1. If an absolute media segment URL is present in the MPD, it is used as-is (after template variable substitution, if appropriate).

  2. If an absolute BaseURL element is present in the MPD, it is used as the base URL.

  3. Otherwise the MPD URL is used as the base URL, taking into account any MPD URL updates that occurred due to MPD refreshes.

  4. The base URL is combined with the relative media segment URL.

Note: The client may use any logic to determine which BaseURL to use if multiple are provided.

The same logic SHALL be used for initialization segments and index segments.
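A minimal sketch of this resolution logic follows. It assumes template variables have already been substituted and, matching the algorithm as written, considers only absolute BaseURL elements (see the open question below).

from urllib.parse import urljoin, urlsplit

def resolve_segment_url(segment_url, base_urls, mpd_url):
    """mpd_url must already reflect any MPD refreshes and HTTP redirects."""
    if urlsplit(segment_url).scheme:                         # step 1: absolute URL used as-is
        return segment_url
    absolute_bases = [b for b in base_urls if urlsplit(b).scheme]
    base = absolute_bases[0] if absolute_bases else mpd_url  # steps 2 and 3
    return urljoin(base, segment_url)                        # step 4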

What do relative BaseURLs do? Do they just incrementally build up the URL? Or are they ignored? This algorithm leaves it unclear, only referencing absolute BaseURLs. We should make it explicit.

5.9.3. Conditional MPD downloads

It can often be the case that a live service signals a short MPD validity period to allow for the possibility of terminating the last period with minimal end-to-end latency. At the same time, generating future segment references might not require any additional information to be obtained by clients. That is, a situation might occur where constant MPD refreshes are required but the MPD content rarely changes.

Clients using HTTP to perform MPD refreshes SHOULD use conditional GET requests as specified in [RFC7232] to avoid unnecessary data transfers when the contents of the MPD do not change between refreshes.
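As an illustration, a conditional refresh might use the ETag validator defined in [RFC7232]. Below is a minimal sketch using the Python standard library; the function name and the caller-managed cache are illustrative.

import urllib.error
import urllib.request

def refresh_mpd(url, cached_body=None, cached_etag=None):
    request = urllib.request.Request(url)
    if cached_etag:
        request.add_header("If-None-Match", cached_etag)  # conditional GET
    try:
        with urllib.request.urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as error:
        if error.code == 304:  # 304 Not Modified - the cached MPD is still current
            return cached_body, cached_etag
        raise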

5.9.4. Expanding URL template variables

This section clarifies expansion rules for URL template variables such as $Time$ and $Number$, defined by [MPEGDASH].

The set of string formatting suffixes used SHALL be restricted to %0[width]d.

Note: The string format suffixes are not intended for general-purpose string formatting. Restricting it to only this single suffix enables the functionality to be implemented without a string formatting library.
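A sketch of an expander restricted to this single suffix follows; handling of $$-escaping and the $RepresentationID$ and $Bandwidth$ variables is omitted for brevity.

import re

_VARIABLE = re.compile(r"\$(Number|Time)(?:%0(\d+)d)?\$")

def expand_template(template, **values):
    def substitute(match):
        name, width = match.group(1), match.group(2)
        text = str(values[name])
        return text.zfill(int(width)) if width else text
    return _VARIABLE.sub(substitute, template)

assert expand_template("video/$Number%05d$.m4s", Number=42) == "video/00042.m4s"
assert expand_template("video/$Time$.m4s", Time=900) == "video/900.m4s"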

5.10. Minimum buffer time signaling

The text here is technically correct but could benefit from being reworded in a simpler and more understandable way. If anyone finds themselves with the time, an extra pass over this would be helpful.

The MPD contains a pair of values for a bandwidth and buffering description, namely the Minimum Buffer Time (MBT) expressed by the value of MPD@minBufferTime and bandwidth (BW) expressed by the value of Representation@bandwidth. The following holds: if a representation is continuously delivered at a bitrate of at least BW, starting at any SAP, a client can be assured of continuous playout provided that playout begins only after MBT * BW bits have been received.

In a simple and straightforward implementation, a DASH client decides on downloading the next segment based on the following status information:

  • the currently available buffer in the media pipeline, buffer;

  • the currently estimated download rate, rate;

  • the value of the attribute MPD@minBufferTime, MBT;

  • the set of values of the Representation@bandwidth attribute for each representation i, BW[i].

The task of the client is to select a suitable Representation i.

The relevant issue is that, starting from a SAP, the DASH client can continue to play out the data; this means that at the current time it has buffer seconds of media in the buffer. Based on this model the client can download a representation i for which BW[i] ≤ rate*buffer/MBT without emptying the buffer.

Note that in this model, some idealizations typically do not hold in practice, such as constant bitrate channel, progressive download and playout of Segments, no blocking and congestion of other HTTP requests, etc. Therefore, a DASH client should use these values with care to compensate such practical circumstances; especially variations in download speed, latency, jitter, scheduling of requests of media components, as well as to address other practical circumstances.

One example is a DASH client that operates on media segment granularity. In this case not just MBT worth of data but the entire media segment needs to be downloaded; if MBT is smaller than the media segment duration, the media segment duration needs to be used instead of MBT for the required buffer size and the download scheduling, i.e. download a representation i for which BW[i] ≤ rate*buffer/max_segment_duration.
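The selection rule, including the segment-granularity variant, reduces to a few lines. This is an idealized, non-normative sketch.

def select_representation(bandwidths, rate, buffer_seconds, mbt_seconds, max_segment_duration=0.0):
    """bandwidths: Representation@bandwidth values in bits/s; rate: estimated download rate in bits/s."""
    window = max(mbt_seconds, max_segment_duration)  # use the segment duration if it exceeds MBT
    limit = rate * buffer_seconds / window
    candidates = [bw for bw in bandwidths if bw <= limit]
    return max(candidates) if candidates else min(bandwidths)  # fall back to the lowest bitrate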

5.11. Large timescales and time values

[ECMASCRIPT] is unable to accurately represent numeric values greater than 2^53 using built-in types. Therefore, interoperable services cannot use such values.

All timescales and start times used in a DASH presentation SHALL be sufficiently small that no timecode value exceeding 2^53 will be encountered, even during the publishing of long-lasting live services.

Note: This may require the use of 64-bit fields, although the values must still be limited to under 2^53.
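To illustrate the magnitudes involved, even a fine-grained timescale leaves decades of headroom; the 10 MHz timescale below is just an example.

timescale = 10_000_000                      # 100 ns units
seconds_until_overflow = 2**53 / timescale  # about 9.0e8 seconds
years = seconds_until_overflow / (365.25 * 24 * 3600)
print(round(years, 1))                      # about 28.5 years of continuous timeline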

5.12. MPD size

No constraints are defined on MPD size, or on the number of elements. However, services SHOULD NOT create unnecessarily large MPDs.

Note: [DVB-DASH] defines some relevant constraints in section 4.5. Consider obeying these constraints to be compatible with [DVB-DASH].

5.13. Representing durations in XML

All units expressed in MPD fields of datatype xs:duration SHALL be treated as fixed size:

  • 60S = 1M (minute)

  • 60M = 1H

  • 24H = 1D

MPD fields having datatype xs:duration SHALL NOT use the year and month units and SHOULD be expressed as a count of seconds, without using any of the larger units (e.g. PT5400S rather than PT1H30M).
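A parser honoring these constraints might look like the following sketch: year and month units are rejected, and the remaining units are treated as fixed multiples of seconds.

import re

_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+(?:\.\d+)?)S)?)?$")

def parse_duration(value):
    match = _DURATION.match(value)
    if not match:
        raise ValueError("unsupported xs:duration (year/month units are not allowed)")
    parts = {k: float(v or 0) for k, v in match.groupdict().items()}
    return parts["d"] * 86400 + parts["h"] * 3600 + parts["m"] * 60 + parts["s"]

assert parse_duration("PT900S") == 900.0
assert parse_duration("PT1H30M") == 5400.0  # accepted, though PT5400S is preferred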

6. Commonly used features

This chapter describes some features of DASH presentations in their common implementations.

Not every DASH client will support each of these features. Compatibility of different clients and services can be verified by comparing the feature sets supported by clients and used by services (and may require experimentation and testing).

6.1. Seamless switching

A key feature of DASH is the ability for clients to seamlessly switch between compatible representations at predetermined points on the MPD timeline, enabling content from different representations to be interleaved according to the wishes of the client. This enables adaptive streaming - changing the active quality level in accordance with dynamically changing network conditions. Most DASH presentations define switching points at 1-10 second intervals.

Note: Decoder reinitialization during representation switches may result in visible or audible artifacts on some clients.

Seamless switching is enabled by IDR-like SAPs (i.e. SAPs of type 1 or 2) at the start of each media segment. The presence of such SAPs is signaled in the MPD by providing a value of 1 or 2, depending on the sample structure of the media segments, for either AdaptationSet@subsegmentStartsWithSAP (if indexed addressing is used) or AdaptationSet@segmentStartsWithSAP (if any other addressing mode is used).

We need to clarify how to determine the right value for startsWithSAP. #235

Add a reference here to help readers understand what "IDR-like SAPs (i.e. SAPs of type 1 or 2)" are.

See also § 6.4 Bitstream switching.

6.2. Preview thumbnails for seeking and navigation

Clients may wish to show timecode-associated preview thumbnails as part of the seeking experience. A typical use case is for enhancing a scrub bar with visual cues. Services that wish to support this SHOULD provide an adaptation set with thumbnails.

The thumbnails are published as a sequence of JPEG/PNG images containing grids of thumbnails. One grid of thumbnails is one media segment. To ensure efficient transfer, a thumbnail media segment SHOULD be at least 1 minute in duration.

A thumbnail adaptation set MAY offer multiple representations with different spatial resolutions.

The addressing mode SHALL be restricted to simple addressing with only the $Number$ templating variable.

Note: The constraint on allowed addressing modes exists to limit the effort required to implement this feature in clients.

Detailed requirements on the thumbnail representations are defined in § 11.5 Thumbnail images.

6.3. Trick mode

Trick modes are used by DASH clients to support fast forward, seek, rewind and other operations in which the media, especially video, is typically displayed at a speed other than the normal playout speed. To support such operations, it is recommended that the content author add representations at lower frame rates, enabling faster playout with the same decoding and rendering capabilities.

However, representations targeted for trick modes are typically not suitable for regular playout. If the content author wants to explicitly signal that a representation is only suitable for trick mode cases, but not for regular playout, the service SHOULD be structured as follows:

  • Trick mode representations are placed in a separate adaptation set, annotated with an essential property descriptor with @schemeIdUri="http://dashif.org/guidelines/trickmode" and a @value referencing the @id of the adaptation set intended for regular playout.

If an adaptation set is annotated with the essential property descriptor with URI http://dashif.org/guidelines/trickmode then the DASH client SHALL NOT select any of the contained representations for regular playout.

6.4. Bitstream switching

Bitstream switching is a feature that allows a switched sequence of media segments from different representations in the same adaptation set to be decoded without resetting the decoder at switch points, by ensuring that the resulting stream of media segments can be successfully decoded without the decoder even being aware of a switch.

An adaptation set that supports bitstream switching is a bitstream switching adaptation set.

The AdaptationSet@bitstreamSwitching attribute SHOULD be set to true on a bitstream switching adaptation set. Services SHALL NOT require clients to support bitstream switching in order to correctly present a bitstream switching adaptation set.

The [ISOBMFF] track_id SHALL be equal for all representations in the same bitstream switching adaptation set.

The AdaptationSet@codecs attribute SHALL be present on a bitstream switching adaptation set and indicate the maximum profile and level of any representation.

The Representation@codecs attribute MAY be present on representations that belong to a bitstream switching adaptation set. If present, it SHALL indicate the maximum profile and level of any media segment in the representation.

Allowing Representation@codecs to be absent might make it more difficult to make bitstream-switching-oblivious clients. If we require Representation@codecs to always be present, client developer life could be made simpler.

Clients that support bitstream switching SHALL initialize the decoder using the initialization segment of the representation with the highest Representation@bandwidth in a bitstream switching adaptation set.

Note: A bitstream switching adaptation set fulfills the requirements of [DVB-DASH].

6.5. Switching across adaptation sets

Note: This technology is expected to be available in [MPEGDASH] Amd 4. Once published by MPEG, this section is expected to be replaced by a reference to the MPEG-DASH standard.

Representations in two or more adaptation sets may provide the same content. In addition, the content may be time-aligned and may be offered such that seamless switching across representations in different adaptation sets is possible. A typical example is offering the same content with different codecs (for example H.264/AVC and H.265/HEVC), where the content author wants to provide such information to the receiver so that representations can be seamlessly switched across different adaptation sets. Such switching permission may be used by advanced clients.

A content author may signal such seamless switching property across adaptation sets by providing a supplemental property descriptor along with an adaptation set with @schemeIdUri set to urn:mpeg:dash:adaptation-set-switching:2016 and the @value is a comma-separated list of adaptation set IDs that may be seamlessly switched to from this adaptation set.

If the content author signals the ability of adaptation set switching and @segmentAlignment or @subsegmentAlignment is set to true for one adaptation set, the (sub)segment alignment shall hold for all representations in all adaptation sets for which the @id value is included in the @value attribute of the supplemental property descriptor.

As an example, a content author may signal that seamless switching across an H.264/AVC adaptation set with AdaptationSet@id="264" and an HEVC adaptation set with AdaptationSet@id="265" is possible by adding a supplemental property descriptor to the H.264/AVC adaptation set with @schemeIdUri set to urn:mpeg:dash:adaptation-set-switching:2016 and @value="265", and by adding a supplemental property descriptor to the HEVC adaptation set with @schemeIdUri set to urn:mpeg:dash:adaptation-set-switching:2016 and @value="264".

In addition, if the content author signals the ability of adaptation set switching for:

What is the above talking about?

Note: This constraint may mean that switching can be signaled from only one adaptation set and not both, as, for example, one adaptation set's signaling may include all spatial resolutions of another one, whereas the reverse is not the case.

6.6. XLink

Some XML elements in an MPD may be external to the MPD itself, delay-loaded by clients based on different triggers. This mechanism is called XLink and it enables client-side MPD composition from different sources. For the purposes of timing and addressing, it is important to ensure that the duration of each period can be accurately determined both before and after XLink resolution.

Note: XLink functionality in DASH is defined by [MPEGDASH] and [XLINK]. This document provides a high level summary of the behavior and defines interoperability requirements.

XLink elements are those in the MPD that carry the xlink:href attribute. When XLink resolution is triggered, the client will query the URL referenced by this attribute. What happens next depends on the result of this query:

Non-empty result containing a valid XML fragment

The entire XLink element is replaced with the query result. A single XLink element MAY be replaced with multiple elements of the same type.

Empty result or query failure

The XLink element remains as-is with the XLink attributes removed.

When XLink resolution is triggered depends on the value of the xlink:actuate attribute. A value of onLoad indicates resolution at MPD load-time, whereas a value of onRequest indicates resolution on-demand at the time the client wishes to use the element. The default value is onRequest.

Services SHALL publish MPDs that conform to the requirements in this document even before XLink resolution. This is necessary because the behavior in case of XLink resolution failure is to retain the element as-is.

The below MPD example contains an XLink period. The real duration of the XLink period will only become known once the XLink is resolved by the client and the XLink element replaced with real content.

The first period has an explicit duration defined because the XLink resolver has no knowledge of the MPD and is unlikely to know the appropriate value to define for the second period’s Period@start (unless this data is provided in the XLink URL as a parameter).

The explicitly defined duration of the second period will only be used as a fallback if the XLink resolver decides not to define a period. In this case the existing element in the MPD is preserved.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:xlink="http://www.w3.org/1999/xlink" type="static">
  <Period duration="PT30S">
    ...
  </Period>
  <Period duration="PT0S" xlink:href="https://example.com/256479/clips/53473/as_period">
  </Period>
</MPD>

After XLink resolution, the entire <Period> element will be replaced, except when the XLink result is empty, in which case the client preserves the existing element (which in this case is a period with zero duration, ignored by clients).

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

6.7. Update signaling via in-band events

Services MAY signal the MPD validity duration by embedding in-band messages into representations instead of specifying a fixed validity duration in the MPD. This allows services to trigger MPD refreshes at exactly the desired time and to avoid needless MPD refreshes.

The rest of this chapter only applies to services and clients that use in-band MPD validity signaling.

Services SHALL define MPD@minimumUpdatePeriod=0 and add an in-band event stream to every audio representation or, if no audio representations are present, to every video representation. The in-band event stream MAY also be added to other representations. The in-band event stream SHALL be identical in every representation where it is present.

The in-band event stream SHALL be signaled on the adaptation set level by an InbandEventStream element with @schemeIdUri="urn:mpeg:dash:event:2012" and a @value of 1 or 3, where:

  • a @value of 1 indicates that in-band events signal the MPD validity expiration time;

  • a @value of 3 indicates that, in addition to the expiration time, each event carries the updated MPD in its message payload.

Services SHALL update MPD@publishTime to a unique value after every MPD update.

Note: MPD@publishTime is merely a version label. The value is not used in timing calculations.

When in-band signaling is used with MPD@minimumUpdatePeriod=0, each media segment extends the validity of the MPD by the duration of that media segment by default. When a validity event arrives, it carries the validity end timestamp of the MPD, enabling the client to determine when the next MPD refresh is needed.

For a detailed definition of the mechanism and the event message data structures, see [MPEGDASH]. This chapter is merely a high level summary of the most important aspects relevant to interoperability.

Illustration of MPD expiration signaling using in-band events.

Services SHALL emit in-band events as [MPEGDASH] emsg boxes to signal the MPD validity duration using the following logic:

The in-band events used for signaling MPD validity duration SHALL have scheme_id_uri and value matching the InbandEventStream element. Clients SHALL NOT use in-band events for MPD validity update signaling if these fields on the events do not match the InbandEventStream element or if the InbandEventStream element is not present in the MPD.

In-band events with value=3 SHALL provide an updated MPD in the event’s mpd field as UTF-8 encoded text without a byte order mark.

Clients MAY perform MPD refreshes or process an event-embedded MPD immediately upon reading the event, without waiting for the moment signaled by the event timestamp. Services SHALL ensure that an updated MPD is available and valid starting from the moment a validity event is signaled.

Multiple media segments MAY signal the same validity update event (identified by matching id field on event), enabling the signal to be delivered several segments in advance of the MPD expiration.

In-band MPD validity events SHALL NOT be signaled in a static MPD but MAY be present in the media segments referenced by a static MPD, in which case they SHALL be ignored by clients.

Note: The above may happen when a live service is converted to an on-demand service for catchup/recording purposes.

6.8. Specifying initial position in presentation URL

This section could use another pass to make it easier to read.

By default, a client would want to start playback from the start of the presentation (if MPD@type="static") or from near the live edge (if MPD@type="dynamic"). However, in some situations it may be desirable to instruct clients to start playback from a specific position. In live services, where content has a fixed mapping to real time, this means an initial time-shift is applied.

The interoperable mechanism for this is to add an MPD anchor to the presentation URL. Details of this feature are defined in [MPEGDASH], with this chapter offering a summary of the feature and constraining its use to interoperable cases.

An initial position MAY be signalled to the DASH client by including an MPD anchor in the presentation URL. If an anchor is used, it SHALL be specified with one of the following sets of parameters:

  • t alone

  • period and t

The t parameter indicates offset from period start or a moment in real-time, with period referencing a Period@id (defaulting to the first period).

The value of Period@id must be URL-encoded.

The time indicated using the t parameter SHALL be a single npttime value as specified in [media-frags]. This is a narrower definition than accepted by [MPEGDASH].

To start from the beginning of the first period the following would be added to the end of the MPD URL provided to the DASH client: #t=0

To start with a fixed offset from the start of a specific period, in this case 50 minutes from the beginning of the period with ID program_part_2, use the following syntax: #period=program_part_2&t=50:00

When accessing a live service, you can instruct the client to use an initial time-shift so that content from a specific moment is played back by providing a POSIX timestamp with the t parameter. For example, starting playback from Wed, 08 Jun 2016 17:29:06 GMT would be expressed as #t=posix:1465406946. Starting playback from the live edge can be signaled as #t=posix:now.

When referencing a moment in real time using t=posix, the period parameter SHALL NOT be used.
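A sketch of anchor construction, including the URL-encoding of Period@id required above; the function name is illustrative.

from urllib.parse import quote

def with_initial_position(mpd_url, t, period_id=None):
    # Period@id must be URL-encoded; t is passed through unchanged.
    anchor = f"period={quote(period_id)}&t={t}" if period_id else f"t={t}"
    return f"{mpd_url}#{anchor}"

url = with_initial_position("https://example.com/a.mpd", "50:00", period_id="program_part_2")
assert url == "https://example.com/a.mpd#period=program_part_2&t=50:00"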

How do leap seconds tie into this? See #161

7. Content annotation and selection

[MPEGDASH] enables a service to annotate adaptation sets to enable clients to make an informed decision on which adaptation set to select for presentation from among the alternatives offered for each adaptation set type. The selection is based on client capabilities, client preferences, user preferences and possibly also interactive choices presented to the user. Typically, the signalling and selection is independent of the codec in use.

This chapter defines requirements and recommendations for annotating adaptation sets with interoperable descriptive information.

A service may offer multiple adaptation sets of the same type to provide the same content in different encodings or different source formats (e.g. one adaptation set encoded from a standard dynamic range master and another encoded from a high dynamic range video master). Alternatively, adaptation sets may describe different content (e.g. different languages or different camera views).

Note: While the typical situation is that a client selects one adaptation set per adaptation set type, there may be cases where multiple adaptation sets of the same type are chosen for playback (e.g. § 6.5 Switching across adaptation sets).

Proper annotation of adaptation sets in MPDs is essential in order to enable interoperable client implementations.

7.1. Annotations for content selection

[MPEGDASH] provides many options for annotating adaptation sets. This document defines a restricted subset considered interoperable by DASH-IF members.

The table below lists the permitted annotations for each adaptation set type. It is expected that interoperable DASH clients recognize the descriptors, elements, and attributes as documented in this chapter.

Content selection annotations SHALL be defined by a service in sufficient detail to differentiate every adaptation set from others of the same type. A service SHALL limit content selection annotations to those defined in this chapter.

Many of these annotations are defined by [MPEGDASH]. Other organizations may define additional descriptors or elements. Some are defined by IOP.

Note: Supplemental property descriptors are intended for presentation optimization and are intentionally not listed as annotations to be used for content selection.

Attribute or element Use Usage requirements
@profiles O If not present, it is inherited from the MPD or period. This may be used for example to signal extensions for new media profiles in the MPD.
@group OD
default=unique (see [MPEGDASH])
The attribute MAY be used. If present, it SHALL be greater than 0.

The value SHALL be different for different adaptation set types and MAY be different for adaptation sets of the same type.

This attribute enables a service to define logical groupings of adaptation sets. A client SHALL select either zero or one adaptation sets from each group.

@selectionPriority OD
default=1
This attribute SHOULD be used to express the preference of the service on selecting adaptation sets for which the DASH client does not otherwise make a decision.

An example is two video codecs providing the same content, where one of the two provides higher compression efficiency and is therefore preferred.

ContentProtection 0...N If this element is present, then the content is protected. If not present, no content protection is applied.

See § 12 Content protection and security

EssentialProperty 0...N Defines an annotation that is considered essential for processing the adaptation set. See also essential property descriptor.

Clients SHALL NOT select adaptation sets that are annotated with any instances of this element that are not understood by the client.

The following schemes are expected to be recognized by a client independent of the adaptation set type:

Viewpoint 0...N Indicates that adaptation set differentiates by a different viewpoint or combination of viewpoints.

If present then all adaptation sets of the same type SHALL carry this descriptor with the same @schemeIdUri and different @value.

Label 0...N Provides a textual description of the content. This element SHOULD be used if the content author expects a client to support a UI for selection.

If present then all adaptation sets of the same type SHALL carry this element with different values.

This element SHALL NOT be used as the sole differentiating element, as scenarios with no user interaction must still lead to unambiguous selection.

Content selection annotations for any adaptation set type.

The following annotations are specific to an adaptation set type.

https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/274

Attribute or element Use Usage requirements specific to video adaptation sets
@codecs 1...N Defines the codec that is necessary to present one or more representations in an adaptation set.

This attribute can be present on either the adaptation set level (as a single value) or the representation level (in which case multiple values might be present).

See § 11 Media coding technologies for a description of interoperable codecs.

@par M The display aspect ratio at which content is intended to be displayed.
@maxWidth O This attribute should be present to express the maximum width in samples after decoder sample cropping of any representation contained in the adaptation set.

The value should be the maximum horizontal sample count of any SPS in the contained bitstream.

@maxHeight O This attribute should be present to express the maximum height in samples after decoder sample cropping of any representation contained in the adaptation set.

The value should be the maximum vertical sample count of any SPS in the contained bitstream.

@maxFrameRate O This attribute should be present to express the maximum frame rate of any representation contained in the adaptation set, i.e. the maximum value of the frame rate signaled in the decoder configuration record, if a constant frame rate is provided.
EssentialProperty 0...N Defines an annotation that is considered essential for processing the adaptation set. See also essential property descriptor.

Clients SHALL NOT select adaptation sets that are annotated with any instances of this element that are not understood by the client.

The following schemes are expected to be recognized by a client for video adaptation sets:

  • urn:mpeg:mpegB:cicp:<Parameter> as defined in [iso23001-8] and <Parameter> being one of the following: ColourPrimaries, TransferCharacteristics, or MatrixCoefficients

Accessibility 0...N Defines the type of accessibility-relevant content present in the adaptation set.

The set of descriptors SHALL be restricted to the following:

  • the "Role" scheme as defined by [MPEGDASH], with @schemeIdUri="urn:mpeg:dash:role:2011"

  • @schemeIdUri="urn:scte:dash:cc:cea-608:2015", indicating CEA-608 captions carried in the video stream (see § 11.8 CEA-608/708 Digital Television (DTV) Closed Captioning)

Role 0...N Defines the role of the content in the adaptation set.

The set of descriptors SHALL be restricted to the "Role" scheme as defined by [MPEGDASH] with @schemeIdUri="urn:mpeg:dash:role:2011" MAY be used for differentiation. A client is expected to recognize the following values when this scheme is used in the Role descriptor:

  • caption

  • subtitle

  • main

  • alternate

  • supplementary

  • sign

  • emergency

Clients SHALL consider there to be an implicit Role descriptor with the "Role" scheme and the value main if no explicitly defined Role descriptor with the "Role" scheme is present.

Annotations for video adaptation sets.
Attribute or element Use Usage requirements specific to audio adaptation sets
@codecs 1...N Defines the codec that is necessary to present one or more representations in an adaptation set.

This attribute can be present on either the adaptation set level (as a single value) or the representation level (in which case multiple values might be present).

See § 11 Media coding technologies for a description of interoperable codecs.

@lang M The language of the audio stream.
@audioSamplingRate M The audio sampling rate.
AudioChannelConfiguration 1...N Specifies information about the audio channel configuration. The following schemes are expected to be recognized by a client:

  • urn:mpeg:mpegB:cicp:ChannelConfiguration as defined in [iso23001-8]
EssentialProperty 0...N Defines an annotation that is considered essential for processing the adaptation set. See also essential property descriptor.

Clients SHALL NOT select adaptation sets that are annotated with any instances of this element that are not understood by the client.

The following schemes are expected to be recognized by a client for audio adaptation sets:

  • urn:mpeg:dash:audio-receiver-mix:2014 as defined in [MPEGDASH]

Accessibility 0...N Defines the type of accessibility-relevant content present in the adaptation set.

The set of descriptors SHALL be restricted to the "Role" scheme as defined by [MPEGDASH], with @schemeIdUri="urn:mpeg:dash:role:2011". A client is expected to recognize the following values when this scheme is used in the Accessibility descriptor:

  • description

  • enhanced-audio-intelligibility

Role 0...N Defines the role of the content in the adaptation set.

The set of descriptors SHALL be restricted to the "Role" scheme as defined by [MPEGDASH], with @schemeIdUri="urn:mpeg:dash:role:2011". A client is expected to recognize the following values when this scheme is used in the Role descriptor:
  • main

  • alternate

  • supplementary

  • commentary

  • dub

  • emergency

Clients SHALL consider there to be an implicit Role descriptor with the "Role" scheme and the value main if no explicitly defined Role descriptor with the "Role" scheme is present.

Annotations for audio adaptation sets.
Attribute or element Use Usage requirements specific to text adaptation sets
@codecs 0...N Defines the codec that is necessary to present one or more representations in an adaptation set.

This attribute can be present on either the adaptation set level (as a single value) or the representation level (in which case multiple values might be present).

The attribute SHALL be present, except when IOP does not define a @codecs value for the used text codec and encapsulation mode combination, in which case it SHALL be omitted.

See § 11 Media coding technologies for a description of interoperable codecs.

@lang M The text language.
Accessibility 0...N Defines the type of accessibility-relevant content present in the adaptation set.

The set of descriptors SHALL be restricted to the "Role" scheme as defined by [MPEGDASH], with @schemeIdUri="urn:mpeg:dash:role:2011". A client is expected to recognize the following values when this scheme is used in the Accessibility descriptor:

  • sign

  • caption

Role 0...N Defines the role of the content in the adaptation set.

The set of descriptors SHALL be restricted to the "Role" scheme as defined by [MPEGDASH], with @schemeIdUri="urn:mpeg:dash:role:2011". A client is expected to recognize the following values when this scheme is used in the Role descriptor:

  • main

  • alternate

  • subtitle

  • supplementary

  • commentary

  • dub

  • description

  • emergency

Clients SHALL consider there to be an implicit Role descriptor with the "Role" scheme and the value main if no explicitly defined Role descriptor with the "Role" scheme is present.

Annotations for text adaptation sets.

7.2. Content model

In order to support the content author in providing content in a consistent manner, this chapter provides a conceptual content model for DASH content in one period of an MPD. The content may be described by an asset identifier as a whole and may contain different adaptation set types.

Model for content selection.

Within each adaptation set type, the content author may want to offer alternative content that is time-aligned but where each alternative represents different content (e.g. multiple camera angles). Automatic selection of the alternative content is not expected to be done by the DASH client as the client would not have sufficient information to make such decisions. However, the selection is expected to be done by communication with an application or the user, typically using a user interface appropriate for selection.

In the absence of user indication to select from among the alternatives, the DASH client still needs to select content to be presented. A DASH service must therefore signal the preferred default content. The preferred content is referred to as main content, whereas any content that is not main content is referred to as alternative content. There may be multiple alternatives which may need to be distinguished. See § 7.2.1 Signaling alternative content.

Furthermore, it may be that content of different adaptation set types is linked by the content author, to express that content of different adaptation set types is preferably played together. We define associated content for this purpose. As an example, there may be a main commentator associated with the main camera view, but for a different camera view, a different associated commentary is provided. See § 7.2.2 Signaling associated content.

In addition to semantic content level differentiation, each alternative content may be provided in different variants, based on content preparation properties (downmix, subsampling, translation, suitable for trick mode, etc.), client preferences (decoding or rendering preferences, e.g. codec), client capabilities (DASH profile support, decoding capabilities, rendering capabilities) or user preferences (accessibility, language, etc.). Both main content and alternative content in all their variants are differentiated in the MPD as defined in § 7.1 Annotations for content selection.

7.2.1. Signaling alternative content

If a period contains alternative content for one adaptation set type, then the alternatives SHALL be differentiated according to § 7.1 Annotations for content selection and one of the alternatives SHALL be provided as main content.

Main content is signaled by using the Role descriptor with scheme urn:mpeg:dash:role:2011 and value set to main. Alternative content is signaled by using the Role descriptor with scheme urn:mpeg:dash:role:2011 and value set to alternative.

7.2.2. Signaling associated content

A Viewpoint descriptor with the same @schemeIdUri and @value SHALL be used by services to signal associated content.

Clients SHALL use identical Viewpoint descriptors for determining associated content even if they do not understand the @schemeIdUri.

7.3. Client processing reference model

The following client model serves two purposes:

In the model it is assumed that the client can get sufficient information on at least the following properties:

  • the media codecs supported by the platform

  • the DRM systems supported by the platform

  • the rendering capabilities of the platform

  • the user preferences for language and accessibility

Note: If any of these functionalities are not fulfilled, then the client may still be functional, but it may not result in the full experience as provided by the content author. As an example, if the DASH client cannot determine the preferred language, it may just use the selection priority for language selection.

The DASH client uses the MPD and finds the period that it wants to join, typically the first one for on-demand content and the one at the live edge for live content. In order to select the media to be played, the DASH client assumes that the content is offered according to the content model above.

  1. The DASH client looks for main content, i.e. any adaptation set with annotation Role@schemeIdUri="urn:mpeg:dash:role:2011" and Role@value="alternative" is excluded initially for selection. Note that in this model it is assumed that immediate startup is desired. If the DASH client wants to go over the alternatives upfront before starting the service, then the sequence is slightly different, but still follows the remaining principles.

  2. The DASH client checks each adaptation set against the capabilities supported by the platform. If any required capability is not supported, the adaptation set is excluded from the selection process.

    • Codec support

    • DRM support

    • Rendering capabilities

  3. The DASH client checks if it supports CEA-608 rendering as defined in § 11.8 CEA-608/708 Digital Television (DTV) Closed Captioning. If not supported, any accessibility descriptor with @schemeIdUri="urn:scte:dash:cc:cea-608:2015" is removed. Note that the adaptation set is retained, as it may be used for regular video decoding.

  4. The DASH client checks if there are any specific settings for accessibility in the user preferences.

    • If captions are requested by the system, the DASH client extracts

      • all video adaptation sets that have an Accessibility descriptor assigned with either the @schemeIdUri="urn:mpeg:dash:role:2011" and @value="caption" or @schemeIdUri="urn:scte:dash:cc:cea-608:2015" (burned-in captions and SEI-based), as well as

        • all text adaptation sets that have an Accessibility descriptor assigned with either the @schemeIdUri="urn:mpeg:dash:role:2011" and @value="caption"

        • and makes those available as the adaptation sets that can be selected by the DASH client for caption support.

      • If multiple text adaptation sets remain, the DASH client removes all adaptation sets from the selection that are not in the preferred language, if language settings are provided in the system. If no language settings in the system are provided, or none of the adaptation sets meets the preferred languages, none of the adaptation sets are removed from the selection. Any adaptation sets that do not contain language annotation are removed, if any of the remaining adaptation sets provides proper language settings.

      • If still multiple text adaptation sets remain, then the ones with the highest value of @selectionPriority are chosen.

      • If still multiple text adaptation sets remain, then the DASH client makes a random choice on which caption to enable.

    • else if no captions are requested

      • the Accessibility element signaling captions may be removed from the adaptation set before continuing the selection.

    • If sign language is requested

      • all video adaptation sets that have an Accessibility descriptor assigned with @schemeIdUri="urn:mpeg:dash:role:2011" and @value="sign" are made available for sign language support.

    • else if no sign language is requested

      • the adaptation set signaling sign language with the Accessibility element may be removed from the adaptation set before continuing the selection

    • If audio descriptions are requested

      • all video adaptation sets that have an Accessibility descriptor assigned with @schemeIdUri="urn:mpeg:dash:role:2011" and @value="description" are made available for audio description support.

    • else if no audio descriptions are requested

      • any adaptation set signaling audio descriptions with the Accessibility element may be removed from the selection before continuing.

    • If enhanced audio intelligibility is requested

      • all audio adaptation sets that have an Accessibility descriptor assigned with @schemeIdUri="urn:mpeg:dash:role:2011" and @value="enhanced-audio-intelligibility" are made available for enhanced audio intelligibility support.

    • else if no enhanced audio intelligibility is requested

      • the Accessibility element may be removed from the adaptation set before continuing the selection.

  5. If video rendering is enabled, based on the remaining video adaptation sets the client selects one as follows:

  6. If audio rendering is enabled, based on the remaining audio adaptation sets the client selects one as follows:

  7. If text rendering is enabled, based on the text adaptation sets the client selects one as follows:

    • Any adaptation set that carries an essential property descriptor whose scheme or value is not understood by the DASH client is excluded from the selection.

    • If multiple text adaptation sets remain and language settings are provided in the system, the DASH client removes all adaptation sets that are not in the preferred language. If no language settings are provided in the system, or none of the adaptation sets matches the preferred languages, no adaptation sets are removed from the selection. If any of the remaining adaptation sets provides proper language annotation, any adaptation sets without language annotation are removed.

    • If still multiple text adaptation sets remain, then the ones with the highest value of @selectionPriority are chosen.

    • If still multiple text adaptation sets remain, then the DASH client makes a choice for itself, possibly on a random basis.

  8. If the DASH client has the ability to switch to alternative content, then alternative content may be selected either through the Label function or the Viewpoint functionality. This selection may be done dynamically during playout, and the DASH client is expected to switch to the alternative content. Once all alternative content is selected, the procedures from step 2 onwards apply.

  9. At a period boundary, a DASH client first looks for period continuity or connectivity, i.e. whether the new period includes an adaptation set that is a continuation of the one currently selected. If none is present, it goes back to step 1 and executes the decision logic again.
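
The capability filtering and tie-breaking described above can be summarized in client code. The following is a minimal TypeScript sketch covering steps 1 and 2 and the text selection rules of step 7; the AdaptationSet shape and the isCodecSupported helper are hypothetical simplifications, not part of any DASH API.

// Hypothetical, simplified view of an adaptation set parsed from an MPD.
interface AdaptationSet {
  lang?: string;
  selectionPriority?: number; // @selectionPriority, default 1
  roles: { schemeIdUri: string; value: string }[];
  codecs: string;
}

// Platform capability check; the implementation is platform-specific.
declare function isCodecSupported(codecs: string): boolean;

function selectTextAdaptationSet(
  sets: AdaptationSet[],
  preferredLang?: string
): AdaptationSet | undefined {
  // Step 1: initially exclude alternative content.
  let candidates = sets.filter(s => !s.roles.some(r =>
    r.schemeIdUri === "urn:mpeg:dash:role:2011" && r.value === "alternative"));

  // Step 2: exclude adaptation sets the platform cannot handle.
  candidates = candidates.filter(s => isCodecSupported(s.codecs));

  // Step 7: remove non-preferred languages only if a preference exists
  // and at least one candidate matches it.
  if (preferredLang && candidates.some(s => s.lang === preferredLang)) {
    candidates = candidates.filter(s => s.lang === preferredLang);
  }

  // Keep the highest @selectionPriority, then choose randomly.
  const maxPriority = Math.max(...candidates.map(s => s.selectionPriority ?? 1));
  candidates = candidates.filter(s => (s.selectionPriority ?? 1) === maxPriority);
  return candidates[Math.floor(Math.random() * candidates.length)];
}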

8. On-demand services

An on-demand service is one that is published with a static MPD (MPD@type="static").

On-demand services have an infinite availability window and have no fixed mapping to real time - clients may present any part at any time and may use trick mode support to alter the playback rate.

Note: An on-demand service may be created by transforming what was previously a live service into an on-demand service for viewing as a catch-up presentation or a recording. See § 9.10 Converting a live service to an on-demand service.

On-demand services MAY use any addressing mode or even a combination of multiple addressing modes.

MPD elements or attributes only relevant for dynamic MPDs SHALL NOT be present in MPDs of on-demand services. Clients SHALL ignore any such elements or attributes if present.

8.1. Surviving transforming boxes and other adaptation middleboxes

A number of video transcoding proxies (aka "middleboxes") are deployed on the Internet that may silently transcode DASH presentations. Specifically, a middlebox may see a video/mp4 HTTP response, transcode that video into a different format (perhaps using a lower bitrate or a different codec), then forward the transcoded video to the DASH client. This will break byte range based operations, as byte ranges from the original video are not valid in the transcoded video.

If such a threat is encountered, the following options may prevent proxies from transcoding DASH presentations:

insert reference to encryption.

In all cases the operational impacts on caching and implementations should be considered when using any of the above technologies. The same methods may also need to be applied to prevent middleboxes manipulating the MPD.

9. Live services

A live service is one that is published with a dynamic MPD (MPD@type="dynamic").

Live services have a strict mapping between the MPD timeline and real time and are often available only for a limited time. The MPD of a live service may change over time, for example as more content is appended and expired content is removed. Clients are forced into a timed schedule for the playout, such that they follow the schedule as desired by the content author (with some amount of client-controlled time-shift allowed).

A live service has a live edge, which is the most recent moment on the MPD timeline for which the MPD guarantees that media segments are available for all representations. See § 9.7 Determining the live edge.

Live services MAY use either explicit addressing or simple addressing or a combination of the two. Indexed addressing is not meaningful in a live service context.

Note: In previous versions of IOP a distinction was made between "simple live" and "main live" services. The latter simply refers to live services that signal MPD updates using in-band events.

There are multiple different types of live services:

Scheduled playback of prepared content

The content is prepared in advance but playback is scheduled for a specific time span in real time.

MPD-controlled live service

The content is generated on the fly and the MPD receives constant updates to reflect the latest state of the service offering. The DASH client behavior is driven solely by the MPD contents, which it regularly refreshes.

MPD- and segment-controlled live service

The content is generated on the fly and clients are kept informed of MPD validity by in-band events in the media segments. MPD downloads are only initiated when the need for updates is detected. Services can signal the need for updates on short notice.

For initial access to the service and joining the service, an MPD is required. MPDs may be accessed at join time or may have been provided earlier, for example along with an Electronic Service Guide. An MPD anchor MAY be used when referencing the MPD to specify an initial time-shift that clients are expected to apply.

Note: Support for MPD anchors is an optional client feature - a service should consider clients that lack an implementation.

The initial MPD or join MPD is accessed and processed by the client and, having an accurate clock that is synchronized with the server, the client can analyze the MPD and extract suitable information in order to initiate playback of the service. This includes, but is not limited to:

The MPD may be updated on the server based on certain rules and clients consuming the service are expected to update MPDs based on certain triggers. The triggers may be provided by the MPD itself or by information included in media segments. See § 5.2.9.5 MPD updates and § 6.7 Update signaling via in-band events.

9.1. Selecting the time shift buffer size

Recommended configuration for time shift buffer size:

If playback should only occur near the live edge, without significant time shift possibility.

MPD@timeShiftBufferDepth SHOULD be short but with a lower limit of 4 times the media segment duration or 6 seconds (whichever is larger). This gives the client some opportunity to time-shift for buffering purposes, to overcome difficult network conditions.

If playback is not limited to near-live-edge.

MPD@timeShiftBufferDepth MAY have an arbitrarily large value, including a value greater than the total duration of periods in the presentation.

9.2. Selecting the suggested presentation delay

Recommended configuration for presentation delay:

If the service wishes to explicitly synchronize playback of different clients.

MPD@suggestedPresentationDelay SHOULD be set to the desired presentation delay but with a lower limit of 4 seconds or 2-4 times the media segment duration (whichever is larger).

If the service does not wish to explicitly synchronize playback of different clients.

Omit MPD@suggestedPresentationDelay and let each client determine the optimal presentation delay based on its own heuristics (which may lead different clients to choose different presentation delays).
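
The lower bounds recommended in this clause and in § 9.1 can be computed mechanically. A small sketch, with all durations in seconds (the factor parameter reflects the 2-4 times range above and is an authoring choice):

// Floor for MPD@timeShiftBufferDepth per § 9.1: 4 segment durations
// or 6 seconds, whichever is larger.
function minTimeShiftBufferDepth(segmentDuration: number): number {
  return Math.max(4 * segmentDuration, 6);
}

// Floor for MPD@suggestedPresentationDelay: 4 seconds or 2-4 segment
// durations, whichever is larger.
function minSuggestedPresentationDelay(
  segmentDuration: number,
  factor = 3 // any value in the recommended 2-4 range
): number {
  return Math.max(4, factor * segmentDuration);
}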

The limitations imposed by the following factors SHOULD be considered when selecting the value for the presentation delay:

9.3. Selecting the media segment duration

The media segment duration SHOULD be between 1 and 10 seconds. The duration influences the end-to-end latency but also the switching and random access granularity as in DASH-264/AVC each media segment starts with a stream access point which can also be used as a switching point. The service provider should set the value taking into account at least the following:

9.4. Safety margins in availability timing

Unavoidable jitter and occasional delays occur in most content delivery architectures. A DASH client SHOULD avoid being too aggressive in requesting media segments as soon as they become available. If a DASH client observes issues, such as 404 responses, it SHOULD back off slightly in its requests.

Services SHALL be published so that all timing promises made by the MPD hold under normal operating conditions. Services MAY indicate an availability window that includes a safety margin. However, such a safety margin leads to increased end-to-end latency, so a balance must be struck.

If a service wishes to impose a safety margin of N seconds, it SHOULD offset MPD@availabilityStartTime into the future by N seconds when starting the presentation.
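
A sketch of how an MPD generator might apply such a margin; the function name is illustrative:

// Delaying MPD@availabilityStartTime by N seconds shifts every computed
// segment availability time into the future by the same amount.
function availabilityStartTimeWithMargin(
  encoderStart: Date,
  safetyMarginSeconds: number
): Date {
  return new Date(encoderStart.getTime() + safetyMarginSeconds * 1000);
}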

9.5. Selecting the minimum update period

The minimum update period signals that media segments covering at least MPD@minimumUpdatePeriod worth of future content are guaranteed to become available over that time span after the MPD is retrieved.

Setting the value of the minimum update period primarily affects two aspects of a service:

The downside of a small minimum update period is that a large number of MPD download requests will be made by clients. This overhead can be minimized by conditional GET requests and/or in-band MPD update signaling.

If in-band MPD validity signaling is used, MPD@minimumUpdatePeriod SHALL be 0.

9.6. Robust and seamless period transitions

Multilanguage live services are likely to experience transitions from one period to another, for example due to changes in the set of available audio/text languages or due to ad insertion.

To ensure robust client operation at period transitions, all the requirements of the timing model must be satisfied. In particular, periods must be fully covered by media segment references and media samples, including immediately before/after a period transition. No gaps may occur in any representation.

In many of these cases, some adaptation sets are likely to continue seamlessly across period boundaries, in which case they SHOULD be marked as period-connected or period-continuous.

9.7. Determining the live edge

If a service does not declare a suggested presentation delay or if the client chooses to ignore it, the client will likely want to know the position of the live edge in order to perform its own presentation delay calculations.

The live edge is affected by the following factors:

Accordingly, the live edge can be calculated as follows (a code sketch follows the list):

  1. Determine the maximum media segment length segment_length_max for each representation.

  2. Determine the availability window end position availability_end for each representation.

  3. Determine the minimum guaranteed start of the most recent available media segment available_segment_start for each representation as available_segment_start = availability_end - segment_length_max.

  4. The live edge is min(available_segment_start).
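
A minimal sketch of this calculation, assuming the per-representation values have already been derived from the MPD (the RepresentationTiming shape is hypothetical):

// Times are in seconds on the MPD timeline.
interface RepresentationTiming {
  maxSegmentDuration: number;    // segment_length_max
  availabilityWindowEnd: number; // availability_end
}

function liveEdge(representations: RepresentationTiming[]): number {
  // available_segment_start for each representation, then the minimum.
  return Math.min(...representations.map(
    r => r.availabilityWindowEnd - r.maxSegmentDuration));
}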

A client MAY exclude some representations from live edge calculation if it considers them optional for successful playback. For example, trick mode representations may become available in a delayed fashion and would needlessly delay the live edge. See also § 9.8 Trick mode for live services.

Note: When determining the presentation delay a client should also consider other aspects besides the live edge such as clock synchronization accuracy, expected network performance jitter and desired buffer size.

See also § 9.4 Safety margins in availability timing.

9.8. Trick mode for live services

If trick mode is to be supported for live services, the trick mode representations SHOULD be offered with the same media segment duration as in the main adaptation set, or each trick mode media segment SHOULD aggregate an integer multiple of the media segments in the main adaptation set.

The content author needs to find a balance between the media segment duration, which affects the number of requests in fast forward or fast rewind, and the availability timing of trick mode media segments. Longer media segment durations for the trick mode representation delay the availability time of such media segments by the duration of the media segment - i.e. at the live edge the trick mode may not be fully supported.

Based on this it is a content author’s decision to provide one or more of the following alternatives for trick mode for live services:

Combinations of different alternatives are possible.

If a client wants to access a trick mode adaptation set in a live service, it SHOULD attempt to minimize the number of requests to the network by preferring media segments with longer duration (if multiple choices are provided).

If a service is converted from live to on-demand, trick mode adaptation sets SHOULD be converted to use indexed addressing.

9.9. DVB-DASH alignment

For alignment with [DVB-DASH], the following should be considered:

[DVB-DASH] also provides recommendations in order to apply weights and priorities to different networks in a multi-BaseURL offering in section 10.8.2.1.

9.10. Converting a live service to an on-demand service

The major difference between live and on-demand services is that live services have their timeline mapped to a real time clock and have an MPD that may change. This behavior is signaled by MPD@type="dynamic". To transform a live service to an on-demand service, it may often be sufficient to set MPD@type="static" and to remove any signaling in the MPD that is restricted to dynamic MPDs.

There is no need to alter media segments when transforming a live service to an on-demand service.
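
A sketch of such a transformation, operating on the MPD as a plain attribute map (a simplification of real XML handling); it removes the attributes that are restricted to dynamic MPDs and declares the total duration instead:

interface MpdAttributes {
  type: "static" | "dynamic";
  minimumUpdatePeriod?: string;
  timeShiftBufferDepth?: string;
  suggestedPresentationDelay?: string;
  mediaPresentationDuration?: string;
}

function makeOnDemand(mpd: MpdAttributes, totalDuration: string): void {
  mpd.type = "static";
  // Signaling restricted to dynamic MPDs is removed.
  delete mpd.minimumUpdatePeriod;
  delete mpd.timeShiftBufferDepth;
  delete mpd.suggestedPresentationDelay;
  // A static MPD typically declares the total duration instead.
  mpd.mediaPresentationDuration = totalDuration; // e.g. "PT30M"
}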

Consider the time span of available content. A live service has a time shift buffer that may only allow a recent time span of content to be presented. If you wish to publish a larger time span as a recording, it may be sensible to create a separate on-demand variant of the MPD in parallel with the live variant.

A live service MAY be converted to an on-demand service without changing the URL, by simply replacing the dynamic MPD with a static MPD. Maintaining the same URLs for media segments might be beneficial in terms of CDN efficiency.

See also § 5.2.9.5.3 End of live content.

9.11. Reliable and consistent-delay live service

This and everything below needs to be updated to conform to timing model

Needs proper Bikeshed formatting and references

A service provider wants to run a live DASH service according to the below Figure 8. As an example, a generic encoder for a 24/7 linear program or a scheduled live event provides a production encoded stream. Such streams typically include in-band events to signal program changes, ad insertion opportunities and other program changes. An example of such signalling are SCTE-35 [54] messages. The stream is then provided to one or more Adaptive Bitrate (ABR) encoders, which transcode the incoming stream into multiple bitrates and also condition the stream for segmentation and program changes. Multiple encoders may be used for increased ABR stream density and for redundancy; their output is then distributed downstream. The resultant streams are received by the DASH generation engines, which include the MPD generator, packager and segmenter. Typically the following functions are applied by the MPD packager:

Downstream, the segments may be hosted on a single origin server, or in one or multiple CDNs. The MPD may even be further customized downstream, for example to address specific receivers. Customization may include the removal of certain Adaptation Sets that are not suitable for the capabilities of downstream clients. Specific content may be spliced based on regional services, targeted ad insertion, media blackouts or other information. Events carried from the main encoder may be interpreted and removed by the MPD packager, or they may be carried through for downstream usage. Events may also be added as MPD events to the MPD.

In different stages of the encoding and distribution, errors may occur (as indicated by lightning symbols in the diagram) that need to be handled by the MPD generator and packager, the DASH client, or both of them. The key issue for this section is the ability of the DASH Media Presentation Generator, as shown in the figure below, to generate services that can handle the incoming streams and provide offerings that DASH clients following DASH-IF IOPs can support.

Hence this section primarily serves to provide implementation guidelines for MPD generators and packagers.

Example live service deployment architecture.

The following scenarios are considered in the service setup:

Check and align references in original text.

The subchapters here outline some possibilities for solving the above challenges.

9.11.1. Consistent latency

The scenario does not ask for very low latency, but for consistent latency. Latency can primarily be controlled by the following means:

9.11.2. Unanticipated new periods

An MPD has a certain duration after download during which the service guarantees that the information within remains valid, signaled by MPD@minimumUpdatePeriod. To avoid clients taking future segment existence for granted when a sudden change to the service offering may become necessary, the service provider must set MPD@minimumUpdatePeriod to a low value.

In the most conservative case, [[#live-mup-zero|the MPD author sets the MPD@minimumUpdatePeriod to 0]]. Then no promise for future segments is provided. The DASH client is forced to revalidate the MPD prior to any new Segment request.

For controlling future MPD validity, basically two options exist:

  1. Client downloads a fresh MPD before every Segment request (or batch of requests), preferably using a conditional GET in order to avoid unnecessary downlink traffic and processing in the client.

  2. Client relies on MPD validity expiration events in event messages, if the content provider announces those in the MPD, and revalidates the MPD when such an event arrives.

The two methods are not mutually exclusive.
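
A sketch of option 1 using an ETag-based conditional GET (whether the origin exposes an ETag, or a Last-Modified date instead, is deployment-specific):

async function refreshMpd(
  url: string,
  previousEtag?: string
): Promise<{ body?: string; etag?: string; changed: boolean }> {
  const headers: Record<string, string> = {};
  if (previousEtag) headers["If-None-Match"] = previousEtag;

  const response = await fetch(url, { headers });
  if (response.status === 304) {
    // MPD unchanged; only headers traveled over the network.
    return { changed: false, etag: previousEtag };
  }
  return {
    changed: true,
    body: await response.text(),
    etag: response.headers.get("ETag") ?? undefined,
  };
}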

9.11.3. Media segment duration variations

Variable media segment durations need to be correctly signaled in the MPD. The mechanism depends on the addressing mode:

  1. Simple addressing allows for a deviation of up to 50% of the nominal segment duration in segment start points, allowing some drift to be compensated.

  2. Explicit addressing allows the duration of each media segment to be defined explicitly.

Media segments SHALL NOT have a duration greater than MPD@maxSegmentDuration in any situation.

9.11.4. Losses and operational failures

One of the most complex aspects is dealing with occasional operational issues, such as losses, outages and failovers of input streams, encoders, packagers and distribution links. Section 4.8 provides a detailed overview of the available tools that should be used by the network, the service offering and clients in order to deal with operational issues. Several types of losses may occur:

Examples of different types of data loss.

Losses may occur in the middle of a Segment, at the end of a Segment, or at the start of a new Segment. At the elementary stream level, losses may be within a compressed access unit (AU), producing a syntactically corrupt bitstream, or may be the result of the ABR encoder simply not encoding a source frame, in which case the duration of the prior AU is extended, producing a conforming bitstream. Losses may impact an entire Segment or just a part of it. Typically, such losses will persist until the next random access point, i.e. a loss is to be signaled from the start of the lost sample up to the next random access point, typically coinciding with the start of a new Segment.

IOP defines some basic constraints in the timing model:

Deviation from these constraints is not allowed, even in case of data loss. This means that there are basically two options:

  1. A service MAY replace lost data with padding data.

  2. A service MAY start a new period both when the data loss starts and when it ends, removing the affected representations for the duration of the loss.

Of course, it is not possible for a service to compensate for data loss in the CDN layer. Clients are expected to survive arbitrary 404 errors that occur due to CDN faults, either by retrying, switching to another CDN (base URL), switching to another representation or automatically seeking forward.

Is there something that goes into more depth about 404s? These statements need a better home.

9.11.5. Minimizing MPD updates

The term "MPD update" covers several different things - the update itself, the frequency of updates and the actions included in an update - and each may have a different impact on deployments. To avoid confusion around this generally overloaded term, some more details are discussed in the following. Without further optimization, MPD updates result in the following additional processing and delivery overhead:

  1. The client sends an uplink request for the MPD. At least from a CDN perspective, this issue is considered less critical; typically the bounds of operation are reached by throughput, not by the number of requests.

  2. The server needs to send a full MPD with every request, which causes overhead all the way from the origin server to the client. This is particularly relevant if the manifest contains a list of URLs and some time shift buffer is maintained.

  3. Yet another aspect is the regular parsing and processing of the manifest in the client. Whereas the processing itself is likely less of a burden, consistency across two parsing instances is relevant and requires keeping state.

  4. MPD updates may also result in writing a new MPD on the server. This may be less problematic for certain cases, especially for unicast, but it results in significant overhead if DASH formats are used for broadcast.

DASH-IF IOP provides different means to avoid one or more of the above issues. Assuming that MPD@minimumUpdatePeriod is set to a low value for the reasons documented above, the issues mentioned above can be addressed by the following means in DASH-IF IOP:

  1. Client requests: can be avoided by signalling in-band that an MPD has expired. The most obvious tool is the use of inband events with MPD expiry. However, this requires inband events to be added during packaging.

  2. Sending the full MPD: instead of requesting the full MPD, the client can issue a conditional GET. If the MPD has not changed, no MPD body needs to be sent and the downlink traffic is minimal. However, this requires the usage of @duration or SegmentTimeline with @r=-1.

  3. MPD Parsing and Processing: This can be avoided by using either of the solutions documented above.

  4. MPD writing on server: This goes hand-in-hand with 2, i.e. the usage of @duration or SegmentTimeline with @r=-1.

Generally, DASH-IF IOP provides several tools to address different aspects of minimizing MPD updates. Based on the deployment scenario, the appropriate tools should be used. However, it is preferable that DASH clients support the different tools in order to provide choices for the service offering.

9.11.6. Proposed service configuration and MPD generation logic

The core concept is the availability of a segment stream at the input to a packager. The segment stream may be made available as individual segments or as boundary markers in a continuous stream. In addition, the stream may contain information that is relevant for the packager, such as program changes. The segment stream determines for each segment the earliest presentation time, the presentation duration, as well as boundaries in program offerings.

Furthermore, it is assumed that multiple bitrates may exist that are switchable. In the following we focus on one segment stream, but assume that in the general case multiple bitrates are available and the encoding and segment streams are generated such that they can be switched.

The high-level assumptions for the service are summarized in 4.11.2. Based on these assumptions, a more detailed model is provided.

The different scenarios are summarized in Figure 10. The third part shows the notion of the change lead time. Segments of the Period with index j are provided. In this case, at the start of segment (j, i+1) (i.e. its earliest presentation time) an indication is provided that the media will change after segment (j, i+2), i.e. the change lead time is d[j, i+1] + d[j, i+2]. A new Period j+1 is generated that starts with a new segment numbering.

Different properties of a segment stream.

Based on the discussion in 4.11.2, service configurations for such a service are proposed. The configuration differentiates two deployment scenarios:

  1. Clients implementing the simple live client, i.e. no emsg support and no segment parsing is implemented.

  2. Clients implementing the main client, i.e. emsg is supported and segment parsing is implemented.

9.11.6.1. Service configuration for simple live

It is assumed that an input segment stream with the properties documented above is received by the DASH packager.

The DASH packager may operate as follows:

9.11.6.2. Service configuration for main live

It is assumed that an input segment stream with the properties documented above is received by the DASH packager.

The DASH packager may operate as follows:

A DASH client having received an MPD that signals gaps is expected to either look for alternative Representations that are not affected by the loss or, if that is not possible, perform appropriate error concealment. The DASH client should also check regularly for MPD updates to determine whether the Representation becomes available again.

10. Ad insertion

Needs to be checked for conformance with timing model.

Needs proper Bikeshed formatting and referencing

Needs deduplication of DASH concepts that are re-defined here.

This section provides recommendations for implementing ad insertion in DASH. Specifically, it defines the reference architecture and interoperability points for a DASH-based ad insertion solution.

The baseline reference architecture addresses both server-based and app-based scenarios. The former approach is what is typically used for Apple HLS, while the latter is typically used with Microsoft SmoothStreaming and Adobe HDS.

The following definitions are used in this section:

Ad Break

A location or point in time where one or more ads may be scheduled for delivery; same as avail and placement opportunity.

Ad Decision Service

functional entity that decides which ad(s) will be shown to the user. Its interfaces are deployment-specific and out of scope for this document.

Ad Management Module

logical service that, given cue data, communicates with the ad decision service and determines which advertisement content (if any) should be presented during the ad break described in the cue data.

Cue

indication of time and parameters of the upcoming ad break. Note that cues can indicate a pending switch to an ad break, pending switch to the next ad within an ad break, and pending switch from an ad break to the main content.

CDN node

functional entity returning a segment on request from DASH client. There are no assumptions on location of the node.

Packager

functional entity that processes conditioned content and produces media segments suitable for consumption by a DASH client. This entity is also known as fragmenter, encapsulator, or segmenter. The packager does not communicate directly with the origin server – its output is written to the origin server’s storage.

Origin

functional entity that contains all media segments indicated in the MPD, and is the fallback if CDN nodes are unable to provide a cached version of the segment on client request.

Splice Point

point in media content where its stream may be switched to the stream of another content, e.g. to an ad.

MPD Generator

functional entity returning an MPD on request from DASH client. It may be generating an MPD on the fly or returning a cached one.

XLink resolver

functional entity which returns one or more remote elements on request from DASH client.

DASH ad insertion relies on several DASH tools defined in [MPEGDASH], which are introduced in this section. The correspondence between these tools and ad insertion concepts is explained below.

10.1. Remote elements

Remote elements are elements that are not fully contained in the MPD document but are referenced in the MPD with an HTTP-URL using a simplified profile of XLink.

A remote element has two attributes, @xlink:href and @xlink:actuate. @xlink:href contains the URL for the complete element, while @xlink:actuate specifies the resolution model. The value onLoad requires immediate resolution at MPD parse time, while onRequest allows deferred resolution at a time when an XML parser accesses the remote element. In this text we assume deferred resolution of remote elements, unless explicitly stated otherwise. While there is no explicit timing model for earliest time when deferred resolution can occur, the specification strongly suggests it should be close to the expected playout time of the corresponding Period. A reasonable approach is to choose the resolution at the nominal download time of the Segment.

XLink resolution

Resolution (a.k.a. dereferencing) consists of two steps. Firstly, a DASH client issues an HTTP GET request to the URL contained in the @xlink:href attribute of the in-MPD element, and the XLink resolver responds with a remote element entity in the response content. In case of an error response or a syntactically invalid remote element entity, the client shall remove the @xlink:href and @xlink:actuate attributes and play out the in-MPD element as it is.

If the value of the @xlink:href attribute is urn:mpeg:dash:resolve-to-zero:2013, no HTTP GET request is issued and the in-MPD element shall be removed from the MPD. This special case is used when a remote element can be accessed (and resolved) only once during the time at which a given version of the MPD is valid.

If a syntactically valid remote element entity was received, the DASH client will replace the in-MPD element with the remote element entity. Once a remote element entity is resolved into a fully specified element, it may contain an @xlink:href attribute with @xlink:actuate set to onRequest, which contains a new XLink URL allowing repeated resolution. Note that the only information passed from the DASH client to the XLink resolver is encoded within the URL. Hence there may be a need to incorporate parameters into it, such as the splice time (i.e., PeriodStart for the remote period) or the cue message.

Note: In ISO/IEC 23009-1:2014/Cor.3 it is clarified that if multiple top-level remote elements are included, the remote element entity is not a valid XML document.
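
The dereferencing behavior described above can be sketched as follows; the Period shape and the parsePeriods helper are hypothetical placeholders for a real MPD parser:

const RESOLVE_TO_ZERO = "urn:mpeg:dash:resolve-to-zero:2013";

interface Period { xlinkHref?: string; xlinkActuate?: string }

// Returns the Period elements of the remote entity, or null if the
// response is not a syntactically valid remote element entity.
declare function parsePeriods(xml: string): Period[] | null;

async function dereferencePeriod(period: Period): Promise<Period[]> {
  if (period.xlinkHref === RESOLVE_TO_ZERO) {
    return []; // remove the in-MPD element without issuing a request
  }
  try {
    const response = await fetch(period.xlinkHref!);
    if (response.ok) {
      const periods = parsePeriods(await response.text());
      if (periods) return periods; // replace the in-MPD element
    }
  } catch {
    // network error: fall through to default handling
  }
  // Error response or invalid entity: strip the XLink attributes and
  // play out the in-MPD element as it is.
  delete period.xlinkHref;
  delete period.xlinkActuate;
  return [period];
}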

10.2. Periods

Periods are time-delimited parts of a DASH Media Presentation. The value of PeriodStart can be explicitly stated using the Period@start attribute or indirectly computed using Period@duration of the previous Periods.

Precise period duration of period i is given by PeriodStart(i+1) – PeriodStart(i). This can accommodate the case where media duration of period i is slightly longer than the period itself, in which case a client will schedule the start of media presentation for period i+1 at time PeriodStart(i+1).

Representation@presentationTimeOffset specifies the value of the presentation time at PeriodStart(i).
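
A sketch of this timing arithmetic, with times in seconds (the PeriodInfo shape is a simplification and assumes the first period carries an explicit Period@start):

interface PeriodInfo { start?: number; duration?: number }

function resolvePeriodStarts(periods: PeriodInfo[]): number[] {
  const starts: number[] = [];
  for (let i = 0; i < periods.length; i++) {
    const explicit = periods[i].start;
    starts.push(explicit !== undefined
      ? explicit                                   // explicit Period@start
      : starts[i - 1] + periods[i - 1].duration!); // previous start + duration
  }
  return starts;
}

// The precise duration of period i then follows as
// starts[i + 1] - starts[i], even if its media duration is longer.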

10.3. Segment availability

In case of dynamic MPDs, a Period-level BaseURL@availabilityTimeOffset allows earlier availability start times. The shorthand notation @availabilityTimeOffset="INF" at a Period-level BaseURL indicates that the segments within this period are available at least as long as the current MPD is valid. This is the case with stored ad content. Note that DASH also allows specification of @availabilityTimeOffset at the Adaptation Set and Representation levels.

10.4. Seamless transition

The DASH specification says nothing about Period transitions – i.e., there are no guarantees for seamless continuation of playout across the period boundaries. Content conditioning and receiver capability requirements should be defined for applications relying on this functionality. However, Period continuity or connectivity should be used and signaled as defined in section 3.2.12 and ISO/IEC 23009-1:2014/Amd.3 [4].

10.5. Period labeling

Period-level AssetIdentifier descriptors identify the asset to which a given Period belongs. Beyond identification, this can be used for implementation of client functionality that depends on distinguishing between ads and main content (e.g. progress bar and random access).

10.6. DASH events

DASH events are messages having type, timing and an optional payload. They can appear either in the MPD (as a period-level event stream) or inband, as ISO-BMFF boxes of type emsg. The emsg boxes shall be placed at the very beginning of the Segment, i.e. prior to any media data, so that the DASH client needs a minimal amount of parsing to detect them.

DASH defines three events that are processed directly by a DASH client: MPD Validity Expiration, MPD Patch and MPD Update. All signal to the client that the MPD needs to be updated – by providing the publish time of the MPD that should be used, by providing an XML patch that can be applied to the client’s in-memory representation of MPD, or by providing a complete new MPD. For details please see section 4.5.

User-defined events are also possible. The DASH client does not deal with them directly – they are passed to an application, or discarded if there is no application willing or registered to process these events. A possible client API would allow an application to register callbacks for specific event types. Such a callback will be triggered when the DASH client parses the emsg box in a Segment, or when it parses the Event element in the MPD.

In the ad insertion context, user-defined events can be used to signal information, such as cue messages (e.g. SCTE 35 [54]).

10.7. MPD updates

If MPD@minimumUpdatePeriod is present, the MPD can be periodically updated. These updates can be synchronous, in which case their frequency is limited by MPD@minimumUpdatePeriod. In case of the main live profiles, MPD updates may be triggered by DASH events. For details refer to section 4.5.

When a new period containing stored ads is inserted into a linear program and there is a need to unexpectedly alter this period, the inserted media will not carry the emsg boxes – these would need to be inserted on the fly by proxies. In this case, use of synchronous MPD updates may prove simpler.

MPD@publishTime provides versioning functionality: an MPD with a later publication time includes all information that was included in MPDs with earlier publication times.

10.8. Session information

In order to allow fine-grain targeting and personalization, the identity of the client/viewer should be known, i.e. a notion of a session should be maintained.

HTTP is a stateless protocol; however, state can be preserved by the client and communicated to the server.

The simplest way of achieving this is the use of cookies. According to RFC 6265 [41], cookies set via 2xx, 4xx, and 5xx responses must be processed, and cookies come with an explicit timing and security model.

10.9. Tracking and reporting

The simplest tracking mechanism is server-side logging of HTTP GET requests. Knowing request times and the correspondence of segment names to content constitutes an indication that a certain part of the content was requested. If MPDs (or remote element entities) are generated on the fly and the identity of the requester is known, it is possible to provide more precise logging. Unfortunately this is a non-trivial operation, as the same user may be requesting parts of the content from different CDN nodes (or even different CDNs), hence log aggregation and processing will be needed.

Another approach is communicating with existing tracking server infrastructure using existing external standards. An IAB VAST-based implementation is shown in section 5.3.3.7.

DASH Callback events, defined in ISO/IEC 23009-1:2014 AMD3 [4], are a simple native implementation of time-based impression reporting (e.g., quartiles). A callback event is a promise by the DASH client to issue an HTTP GET request to a provided URL at a given offset from PeriodStart. The body of the HTTP response is ignored. Callback events can be both MPD and inband events.
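
A minimal sketch of a client honoring a callback event, assuming PeriodStart has already been mapped to wall-clock time:

function scheduleCallback(
  callbackUrl: string,
  periodStartWallClock: Date,
  offsetSeconds: number // event offset from PeriodStart
): void {
  const fireAt = periodStartWallClock.getTime() + offsetSeconds * 1000;
  setTimeout(() => {
    // The response body is ignored; reporting is best-effort.
    fetch(callbackUrl).catch(() => {});
  }, Math.max(0, fireAt - Date.now()));
}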

10.10. Ad insertion architectures

The possible architectures can be classified based on the location of the component that communicates with the ad decision service: a server-based approach assumes a generic DASH client, with all communication with ad decision services done at the server side (even if this communication is triggered by a client request for a segment, remote element, or MPD). The app-based approach assumes an application running on the end device and controlling one or more generic DASH clients.

Yet another classification dimension is the number of media engines needed for a presentation – i.e., whether parallel decoding needs to be done to allow seamless transition between the main and the inserted content, or whether the content is conditioned well enough to make such a transition possible with a single decoder.

Workflows can be roughly classified into linear and elastic. Linear workflows (e.g., a live feed from an event) have ad breaks of known durations which have to be taken: main content will only resume after the end of the break, and the programmer / operator needs to fill them with some inserted content. Elastic workflows assume that the duration of an ad break at a given cue location is not fixed, thus the effective break length can vary (and can be zero if a break is not taken).

10.11. Server-based architecture

Server-based architecture

In the server-based model, all ad-related information is expressed via MPD and segments, and ad decisions are triggered by client requests for MPDs and for resources described in them (Segments, remote periods).

The server-based model is inherently MPD-centric – all data needed to trigger an ad decision is concentrated in the MPD. In case the ad break location (i.e., its start time) is unknown at MPD generation time, it is necessary to rely on MPD update functionality. The two possible ways of achieving this are described in 5.1.3.5.

In the live case, the packager receives a feed containing inband cues, such as an MPEG-2 TS with SCTE 35 cue messages [54]. The packager ingests content segments into the CDN. In the on-demand case, cues can be provided out of band.

Ad management is located at the server side (i.e., in the cloud), thus all manifest and content conditioning is done at the server side.

10.11.1. Implementation basics

A single ad is expressed as a single Period element.

Periods with content that is expected to be interrupted as a result of ad insertion should contain explicit start times (Period@start), rather than durations. This allows insertion of new periods without modifying the existing periods. If a period has a media duration longer than the distance between the start of this period and the start of the next period, use of start times implies that a client will start the playout of the next period at the time stated in the MPD, rather than after finishing the playout of the last segment.

An upcoming ad break is expressed as Period element(s), possibly remote.

Remote Periods are resolved on demand into one or more Period elements. It is possible to embed parameters from the cue message into the XLink URL of the corresponding remote period, in order to have them passed to the ad decision system via the XLink resolver at resolution time.

In an elastic workflow, when an ad break is not taken, the remote period will be resolved into a period with zero duration. This period element will contain no adaptation sets.

If just-in-time remote Period dereferencing is required by use of @xlink:actuate="onRequest", an MPD update containing the remote period should be triggered close enough to the intended splice time. This can be achieved using MPD Validity events and a full-fledged MPD update, or using MPD Patch and MPD Update events (see sec. 5.1.3.5 and 5.1.3.4). However, for security reasons MPD Patch and MPD Update events should only be used with great care.

In case of Period@xlink:actuate="onRequest", MPD update and XLink resolution should be done sufficiently early to ensure that there are no artefacts due to insufficient time given to download the inserted content. Care needs to be taken so that the client is given a sufficient amount of time to (a) request and receive MPD update, and (b) dereference the upcoming remote period.

Note: It may be operationally simpler to avoid use of Period@xlink:actuate="onRequest" dereferencing in case of live content.

10.11.3. Timing and dereferencing

The only interface between the DASH client and the XLink resolver is the XLink URL (i.e., the Period@xlink:href attribute). After resolution, the complete remote Period element is replaced with Period element(s) from the remote entity (the body of the HTTP response coming from the XLink resolver). This means that the XLink resolver is (in the general case) unaware of the exact start time of the ad period.

In case of linear content, start of the ad period is only known a short time before the playback. The recommended implementation is to update the MPD at the moment the start of the ad period is known to the MPD generator.

The simplest approach for maintaining time consistency across dereferencing is to have the MPD update add a Period@duration attribute to the latest (i.e., the currently playing) main content period. This means that the XLink resolver needs to include the Period@duration attribute in each of the Period elements returned in the remote entity. The downside of this approach is that the DASH client needs to be able to update the currently playing period.

An alternative approach is to embed the desired value of Period@start of the first period of the remote entity in the XLink URL (e.g., using URL query parameters). This approach is described in clause 5.3.5. The downside of this alternative approach is that the DASH specification does not constrain XLink URLs in any way, hence the XLink resolver needs to be aware of this URL query parameter interface defined in clause 5.3.5.

10.11.4. Asset identifiers

AssetIdentifier descriptors identify the asset to which a Period belongs. This can be used for implementation of client functionality that depends on distinguishing between ads and main content (e.g. progress bar).

Periods with the same AssetIdentifier should have identical Adaptation Sets, Initialization Segments and the same DRM information (i.e., DRM systems and licenses). This allows reuse of at least some initialization data across periods of the same asset, and ensures seamless continuation of playback if inserted periods have zero duration. Period continuity or connectivity should be signaled, if the content obeys the rules.

Using an asset identifier

10.11.5. MPD updates

MPD updates are used to implement dynamic behavior. An updated MPD may have additional (possibly – remote) periods. Hence, MPD update should be triggered by the arrival of the first cue message for an upcoming ad break. Ad breaks can also be canceled prior to their start, and such cancellation will also trigger an MPD update.

Frequent regular MPD updates are sufficient for implementing dynamic ad insertion. Unfortunately they create an overhead of unnecessary MPD traffic – ad breaks are rare events, while MPD updates need to be frequent enough if a cue message is expected to arrive only several seconds before the splice point. Use of HTTP conditional GET requests (i.e., allowing the server to respond with "304 Not Modified" if MPD is unchanged) is helpful in reducing this overhead, but asynchronous MPD updates avoid this overhead entirely.

DASH events with scheme "urn:mpeg:dash:event:2013" are used to trigger asynchronous MPD updates.

The simplest mapping of inband cues in live content into DASH events is to translate a single cue into an MPD Validity expiration event (which will cause an MPD update prior to the splice time). MPD Validity expiration events need to be sent early enough to allow the client to request a new MPD, resolve XLink (which may entail communication between the resolver and the ADS), and, finally, download the first segment of the upcoming ad in time to prevent disruption of service at the splice point.

If several emsg boxes are present in a segment and one of them is the MPD Validity Expiration event, the emsg carrying it shall always appear first.

10.11.6. MPD events

In addition to tracking events (ad starts, quartile tracking, etc.) the server may also need to signal additional metadata to the video application. For example, an ad unit may contain not only inline linear ad content (that is to be played before, during, or after the main presentation), it may also contain a companion display ad that is to be shown at the same time as the video ad. It is important that the server be able to signal both the presence of the companion ad and the additional tracking and click-through metadata associated with the companion.

With that said, there is no need to have a generic DASH client implement this functionality – it is enough to provide opaque information that the client would pass to an external module. Event @schemeIdUri provides us with such addressing functionality, while MPD events allow us to put opaque payloads into the MPD.

10.11.7. Workflows

In the workflows below we assume that our inputs are MPEG-2 transport streams with embedded SCTE 35 cue messages [54]. In our opinion this will be a frequently encountered deployment; however, any other in-band or out-of-band method of getting cue messages, and any other input format, lend themselves to the same model.

10.11.8. Linear workflow

A real-time MPEG-2 TS feed arrives at both the packager and the MPD generator. While real-time multicast feeds are a very frequently encountered case, the same workflow can apply to cases such as ad replacement in pre-recorded content (e.g., in time-shifting or PVR scenarios).

The MPD generator generates dynamic MPDs. The packager creates DASH segments out of the arriving feed and writes them to the origin server. The client periodically requests the MPD so that it has enough time to transition seamlessly into the ad period.

Packager and MPD generator may be tightly coupled (e.g. co-located on the same physical machine), or loosely coupled as they both are synchronized only to the clock of the feed.

Live workflow
10.11.8.1. Cue interpretation by the MPD generator

When an SCTE 35 cue message indicating an upcoming splice point is encountered by the MPD generator, the latter creates a new MPD for the same program, adding a remote period to it.

The Period@start attribute of the inserted period has splice_time() translated into the presentation timeline. Parameters derived from the cue message are inserted into the Period@xlink:href attribute of the inserted period. Examples below show architectures that allow finer targeting.

Immediate ad decision.

The MPD generator keeps an up-to-date template of an MPD. At each cue message arrival, the generator updates its template. At each MPD request, the generator customizes the response based on the information known to it about the requesting client. The generator contacts the ad decision server and produces one or more non-remote ad periods. In this case XLink is not needed.

Stateful cue translation.

The MPD generator keeps an up-to-date template of an MPD. At each cue message arrival, the generator updates its template. At each MPD request, the generator customizes the response based on the information known to it about the requesting client.

The operator targets male and female audiences separately. The generator derives this from the information it has regarding the requesting client (see 5.1.3.6), and inserts an XLink URL with the query parameter ?gender=male for male viewers, and ?gender=female for female viewers.

Note that this example also showcases poor privacy practices – were such an approach implemented, both parameter name and value should be encrypted, or TLS-based communication should be used.

Stateless cue translation.

At cue message arrival, the MPD generator extracts the entire SCTE 35 splice_info_section (starting at the table_id and ending with the CRC_32) into a buffer. The buffer is then encoded into the URL-safe base64url format according to RFC 4648 [60], and inserted into the XLink URL of a new remote Period element. splice_time is translated into the Period@start attribute. The new MPD is pushed to the origin.

Note: this example is a straightforward port of the technique defined for SCTE 67 [55], but uses base64url and not base64 encoding as the section is included in a URI.
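
A sketch of this encoding step (Node.js; the resolver endpoint and query parameter names are illustrative and mirror the example MPD later in this section):

// RFC 4648 base64url: '+' becomes '-', '/' becomes '_'. Padding is kept
// here and percent-encoded when the value is placed in a URL.
function base64url(bytes: Uint8Array): string {
  return Buffer.from(bytes).toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_");
}

function buildXlinkUrl(spliceInfoSection: Uint8Array, periodStart: string): string {
  return "https://adserv.example.com/avail.mpd" +
    "?scte35-time=" + encodeURIComponent(periodStart) +
    "&scte35-cue=" + encodeURIComponent(base64url(spliceInfoSection));
}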

10.11.8.2. Cue interpretation by the packager

Cue interpretation by the packager is optional and is an optimization rather than core functionality. On reception of an SCTE 35 cue message signaling an upcoming splice, an emsg with an MPD Validity Expiration event is inserted into the first available segment. This event triggers an MPD update, and not an ad decision, hence the sum of the earliest presentation time of the emsg-bearing segment and the emsg.presentation_time_delta should be sufficiently earlier than the splice time. This provides the client with sufficient time to both fetch the MPD and resolve the XLink.

splice_time() of the cue message is translated into the media timeline, and the last segment before the splice point is identified. If needed, the packager can also finish the segment at the splice point, thus producing a segment shorter than its target duration.

10.11.8.3. Multiple cue messages

There is a practice of sending several SCTE 35 cue messages for the same splice point (e.g., the first message announces a splice in 6 seconds, the second arrives 2 seconds later and warns about the same splice in 4 seconds, etc.). Both the packager and the MPD generator react to the first message only (the 6-second warning in the example above) and ignore the following messages.

10.11.8.4. Cancelation

It is possible that the upcoming (and announced) insertion will be canceled (e.g., ad break needed to be postponed due to overtime). Cancelation is announced in a SCTE 35 cue message.

When cancelation is announced, the packager will insert the corresponding emsg event and the MPD generator will create a newer version of the MPD that does not contain the inserted period or sets its duration to zero. This implementation maintains a simpler less-coupled server side system at the price of an increase in traffic.

10.11.8.5. Early termination

It is also possible that a planned ad break will need to be cut short – e.g., an ad will be cut short and there will be a switch to breaking news. The DASH translation of this would be creating an emsg at the packager and updating the MPD appropriately. Treatment of early termination here would be same as treatment of a switch from main content to an ad break.

It is easier to manipulate durations when Period@duration is absent and only Period@start is used – this way attributes already known to the DASH client don’t change.

10.11.8.6. Informational cue messages

SCTE 35 can be used for purposes unrelated to signaling of placement opportunities. Examples of such use are content identification and time-of-day signaling. Triggering MPD validity expiration and possibly XLink resolution in this case may be an overreaction.

10.11.8.7. Ad decision
Ad decision

A client will attempt to dereference a remote period element by issuing an HTTP GET for the URL that appears in Period@xlink:href. The HTTP server responding to this request (the XLink resolver) will contact the ad decision service, possibly passing it parameters known from the request URL and from client information available to it from the connection context. In the case described in 5.3.3.2.1.3, the XLink resolver has access to the complete SCTE 35 message that triggered the splice.

The ad decision service response identifies the content that needs to be presented, and given this information the XLink resolver can generate one or more Period elements that would be then returned to the requesting DASH client.

A possible optimization is that resolved periods are cached – e.g., in the case of 5.3.3.2.1.1, "male" and "female" versions of the content are only generated once in T seconds, with HTTP caching used to expire the cached periods after T seconds.

10.11.9. On demand workflow

In a VoD scenario, cue locations are known ahead of time. They may be available multiplexed into the mezzanine file as SCTE 35 or SCTE 104, or may be provided via an out-of-band EDL.

In VoD workflows both cue locations and break durations are known, hence there is no need for a dynamic MPD. Thus cue interpretation (which is the same as in 5.3.3.2) can occur only once and result in a static MPD that contains all remote elements, with all Period elements having the Period@start attribute present in the MPD.

In elastic workflows ad durations are unknown, thus despite our knowledge of cue locations within the main content it is impossible to build a complete presentation timeline. Period@duration needs to be used. Remote periods should be dereferenced only when needed for playout. In case of a “jump” – random access into an arbitrary point in the asset – it is better practice not to dereference Period elements when it is possible to determine the period from which the playout starts using Period@duration and asset identifiers. The functionality described in 5.3.3.2 is sufficient to address on-demand cases, with the only difference that a client should be able to handle zero-duration periods that are the result of avails that are not taken.

10.11.9.1. Capture to VoD

The capture-to-VoD use case is a hybrid between the pure linear and on-demand scenarios: linear content is recorded as it is broadcast, and is then accessible on demand. A typical requirement is to have the content available with the original ads for some time, after which ads can be replaced.

There are two possible ways of implementing the capture-to-VoD workflow.

The simplest is treating capture-to-VoD content as plain VoD, and having the replacement policy implemented on the XLink resolver side. This way the same Period element(s) will always be returned to the same requester within the window where ad replacement is disallowed; after this window the behavior will be the same as for any on-demand content. An alternative implementation is described in 5.3.3.5 below.

10.11.9.2. Slates and ad replacement

A content provider (e.g., OTT) provides content with ad breaks filled with its own ads. An ISP is allowed to replace some of these with their own ads. Conceptually there is content with slates in place of ads, but all slates can be shown and only some can be replaced.

An ad break with a slate can be implemented as a valid in-MPD Period element that also has XLink attributes. If a slate is replaceable, XLink resolution will result in new Period element(s), if not – the slate is played out.

10.11.9.3. Blackouts and alternative content

In many cases broadcast content cannot be shown to a part of the audience due to contractual limitations (e.g., viewers located close to an MLB game will not be allowed to watch it, and will be shown some alternative content). While unrelated to ad insertion per se, this use case can be solved using the same “default content” approach, where the in-MPD content is the game and the alternative content will be returned by the XLink resolver if the latter determines (in some unspecified way) that the requester is in the blackout zone.

10.11.9.4. Tracking and reporting

A Period, either local or a remote entity, may contain an EventStream element with an event containing an IAB VAST 3.0 Ad element [53]. A DASH client does not need to parse this information and act on it – if there is a listener for events of this type, that listener can use the VAST 3.0 Ad element to implement reporting, tracking and companion ads. The processing done by this listener has no influence on the DASH client, and the same content would be presented both to a “vanilla” DASH client and to a player in which a VAST module registers a listener for VAST 3.0 events with the DASH client. A VAST 3.0 response can be carried in an Event element where the EventStream@schemeIdUri value is http://dashif.org/identifiers/vast30.
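For illustration, a minimal sketch of such an event stream is shown below; the timing values and the VAST response body are hypothetical placeholders.

<EventStream schemeIdUri="http://dashif.org/identifiers/vast30" timescale="90000">
    <Event presentationTime="54054000" duration="5400000" id="1">
        <VAST version="3.0">
            <Ad id="ad-1">
                <!-- InLine or Wrapper element carrying tracking and companion ad data -->
            </Ad>
        </VAST>
    </Event>
</EventStream>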

An alternative implementation uses DASH callback events to point to the same tracking URLs. While the DASH specification permits both inband and MPD callback events, inband callback events shall not be used.

10.11.10. Examples

MPD with mid-roll ad breaks and default content.

In this example, a movie (“Top Gun”) is shown on a linear channel with two mid-roll ad breaks. Both breaks have default content that will be played if the XLink resolver chooses not to return new Period element(s) or fails.

In the case of the first ad break, the SCTE 35 cue message is passed in full to the XLink resolver, together with the corresponding presentation time.

In the case of the second ad break, the proprietary parameters u and z describe the main content and the publishing site.

<MPD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="urn:mpeg:dash:schema:mpd:2011"
    xsi:schemaLocation="urn:mpeg:dash:schema:mpd:2011 DASH-MPD.xsd"
    type="dynamic"
    minimumUpdatePeriod="PT2S"
    timeShiftBufferDepth="PT600S"
    minBufferTime="PT2S"
    profiles="urn:mpeg:dash:profile:isoff-live:2011"
    availabilityStartTime="2012-12-25T15:17:50">
    <BaseURL>http://cdn1.example.com/</BaseURL>
    <BaseURL>http://cdn2.example.com/</BaseURL>

    <Period start="PT0.00S" duration="PT600.6S" id="movie period #1">
        <AssetIdentifier schemeIdUri="urn:org:dashif:asset-id:2013"
            value="md:cid:EIDR:10.5240%2f0EFB-02CD-126E-8092-1E49-W"/>
        <AdaptationSet mimeType="video/mp4" codecs="avc1.640828"
            frameRate="24000/1001" segmentAlignment="true" startWithSAP="1">
            <BaseURL>video_1/</BaseURL>
            <SegmentTemplate timescale="90000" initialization="$Bandwidth%/init.mp4v"
                media="$Bandwidth$/$Number%05d$.mp4v"/>
            <Representation id="v0" width="320" height="240" bandwidth="250000"/>
            <Representation id="v1" width="640" height="480" bandwidth="500000"/>
            <Representation id="v2" width="960" height="720" bandwidth="1000000"/>
        </AdaptationSet>
    </Period>

    <Period duration="PT60.6S" id="ad break #1"
            xlink:href="https://adserv.com/avail.mpd?scte35-time=PT600.6S&                  scte35-cue=DAIAAAAAAAAAAAQAAZ_I0VniQAQAgBDVUVJQAAAAH+cAAAAAA%3D%3D"
            xlink:actuate="onRequest" >

        <AdaptationSet mimeType="video/mp4" codecs="avc1.640828"
                       frameRate="30000/1001"
            segmentAlignment="true" startWithSAP="1">
            <BaseURL availabilityTimeOffset="INF">default_ad/</BaseURL>
            <SegmentTemplate timescale="90000" initialization="$Bandwidth%/init.mp4v"
                media="$Bandwidth%/$Time$.mp4v"/>
            <Representation id="v0" width="320" height="240" bandwidth="250000"/>
            <Representation id="v1" width="640" height="480" bandwidth="500000"/>
            <Representation id="v2" width="960" height="720" bandwidth="1000000"/>
        </AdaptationSet>
    </Period>

    <!-- Movie, cont'd -->
    <Period duration="PT600.6S" id="movie period #2">
        <AssetIdentifier schemeIdUri="urn:org:dashif:asset-id:2013"
            value="md:cid:EIDR:10.5240%2f0EFB-02CD-126E-8092-1E49-W"/>
        <AdaptationSet mimeType="video/mp4" codecs="avc1.640828"
                       frameRate="24000/1001"
            segmentAlignment="true" startWithSAP="1">
            <BaseURL>video_2/</BaseURL>
            <SegmentTemplate timescale="90000" initialization="$Bandwidth%/init.mp4v"
                media="$Bandwidth%/$Time$.mp4v"/>
            <Representation id="v0" width="320" height="240" bandwidth="250000"/>
            <Representation id="v1" width="640" height="480" bandwidth="500000"/>
            <Representation id="v2" width="960" height="720" bandwidth="1000000"/>
        </AdaptationSet>
    </Period>

    <Period duration="PT60.6S" id="ad break #2"
        xlink:href=”https://adserv.com/avail.mpd?u=0EFB-02CD-126E-8092-1E49-W&z=spam”
        xlink:actuate="onRequest" >

        <AdaptationSet mimeType="video/mp4" codecs="avc1.640828"
                       frameRate="30000/1001"
            segmentAlignment="true" startWithSAP="1">
            <BaseURL availabilityTimeOffset="INF">default_ad2/</BaseURL>
            <SegmentTemplate timescale="90000" initialization="$Bandwidth%/init.mp4v"
                media="$Bandwidth%/$Time$.mp4v"/>
            <Representation id="v0" width="320" height="240" bandwidth="250000"/>
            <Representation id="v1" width="640" height="480" bandwidth="500000"/>
            <Representation id="v2" width="960" height="720" bandwidth="1000000"/>
        </AdaptationSet>
    </Period>
</MPD>

10.11.11. Use of query parameters

Parameters can be passed into the XLink resolver as a part of the XLink URL. Clause 5.3.3.2.1.3 shows an example of this approach when an SCTE 35 cue message is embedded into the XLink URL.

This approach can be generalized, and several parameters (i.e., name-value pairs) can be defined. SCTE 214-1 2016 [56] takes this approach and defines parameters expressing the splice time (i.e., Period@start of the earliest ad period), the SCTE 35 cue message, and the syscode (a geolocation identifier used in the US cable industry). The first two parameters are also shown in the example in clause 5.3.4.1 of this document.

Note: Effectively this creates a RESTful API for XLink dereferencing. While the discussion above implies that these parameters are embedded by the MPD generator into the XLink URL, the parameter values may equally be calculated by the client, or the embedded values may be modified by the client.

Note: The same RESTful API approach can be used with MPD URLs as well.

Note: More parameters may be defined in future versions of these guidelines.

10.12. App-based architecture

Figure: App-based architecture

Inputs in this use case are the same as those described in sec. 5.3. At the packaging stage, cues are translated into a format readable by the app and/or the DASH client and are embedded into media segments and/or into the manifest.

The ad management module is located on the client side. The DASH client receives the manifest and segments, with cues embedded in either of them or in both.

Cue data is passed to the ad management module, which contacts the ad decision service and receives information on the content to be played. This results in an MPD for the inserted content and a splice time at which presentation of the main content is paused and presentation of the inserted content starts.

Note that this architecture does not assume multiple decoders – with careful conditioning it is possible to do traditional splicing where the inserted content is passed to the same decoder. In this case it is necessary to keep the player state and be able to initialize the player into this state.

10.12.1. Implementation basics

Each ad decision results in a separate MPD. A single MPD contains either main content or inserted content; the existence of multiple periods and/or remote periods is possible but not essential.

10.12.2. SCTE 35 events

Cue messages are mapped into DASH events, using inband emsg boxes and/or in-MPD events. Note that an SCTE 35 cue message by itself may not be sufficient.

The examples below show the use of SCTE 35 in user-defined events; presentation time indicates the timing within the Period.

Figure 18 below shows the content of an emsg box at the beginning of a segment with earliest presentation time T. There is a 6-second warning of an upcoming splice – the delta to splice time is indicated as 6 seconds – and the duration is given as 1 minute. This means that an ad will play from time T + 6 until T + 66. This example follows a practice defined in SCTE 214-3 [57].

Figure 18: Inband carriage of SCTE 35 cue messages
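The figure itself is not reproduced here; a minimal sketch of the emsg fields it describes is given below, assuming the binary SCTE 35 scheme and a 90 kHz timescale (values chosen to match the example: a 6-second delta and a 60-second break).

scheme_id_uri           = "urn:scte:scte35:2013:bin"
value                   = "1"
timescale               = 90000
presentation_time_delta = 540000     (6 s after the segment's earliest presentation time T)
event_duration          = 5400000    (60 s ad break)
id                      = 1
message_data()          = binary SCTE 35 splice_info_section()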

Figure 19 below shows the same example with an in-MPD SCTE 35 cue message. The difference is that in the in-MPD event the splice time is relative to the Period start, rather than to the start of the event-carrying segment. This figure shows a one-minute ad break 10 minutes into the period.

<EventStream schemeIdUri="urn:scte:scte35:2014:xml+bin">
    <Event timescale="90000" presentationTime="54054000" duration="5400000" id="1">
        <scte35:Signal>
             <scte35:Binary>
                 /DAIAAAAAAAAAAAQAAZ/I0VniQAQAgBDVUVJQAAAAH+cAAAAAA==
             </scte35:Binary>
         </scte35:Signal>
    </Event>
</EventStream>
Figure 19: In-MPD carriage of SCTE 35 cue message

Note: For brevity, SCTE 35 2014 allows use of a base64-encoded section in the Signal.Binary element as an alternative to carriage of a completely parsed cue message.

Normative definitions of carriage of SCTE 35 cue messages are in ANSI/SCTE 214-1 [56] sec 6.8.4 (MPD) and SCTE 214-3 [57] sec 8.3.3.

10.12.3. Asset identifiers

See sec. 5.3.2.2 for details.

10.12.4. Linear workflow

Figure: Linear workflow for app-driven architecture

A real-time MPEG-2 TS feed arrives at a packager. While real-time multicast feeds are a very frequently encountered case, the same workflow can also apply to ad replacement in pre-recorded content (e.g., in time-shifting or PVR scenarios).

The packager creates DASH segments from the arriving feed and writes them to the origin server. The packager translates SCTE 35 cue messages into inband DASH events, which are inserted into media segments.

The MPD generator is unaware of ad insertion functionality, and the packager does the translation of SCTE 35 cue messages into inband user-defined DASH events. On reception of an SCTE 35 cue message signaling an upcoming splice, an emsg box with a translation of the cue message in its emsg.message_data[] field is inserted into the most recent segment. This event triggers client interaction with an ad decision server, hence the sum of the earliest presentation time of the emsg-bearing segment and emsg.presentation_time_delta should be a translation of splice_time() into the media timeline.

An alternative implementation, which is more compatible with the server-based architecture in section 5.3, is for the MPD generator to generate separate MPDs for the server-based and app-based architectures – creating remote periods for the former and in-MPD SCTE 35 events for the latter – while the packager inserts inband MPD validity expiration events.

A DASH client will pass the event to the app controlling it (e.g., via a callback registered by the app). The app will interpret the event and communicate with the ad decision server using some interface (e.g., VAST). This interface is out of the scope of this document.

The communication with the ad decision service will result in an MPD URL. The app will pause the presentation of the main content and start presentation of the inserted content. After presenting the inserted content the client will resume presentation of the main content. This assumes either proper conditioning of the main and inserted content or the existence of a separate client and decoder for the inserted content. The way pause/resume is implemented is internal to the API of the DASH client. Interoperability may be achieved by using the DASH MPD fragment interface, see ISO/IEC 23009-1 [4], Annex C.4.

10.12.5. On demand workflow

As in the server-based case, functionality defined for the live case is sufficient. Moreover, because the app-based implementation relies heavily on the app’s ability to pause and resume the DASH client, support for elastic workflows is provided out of the box.

In the on-demand case, as cue locations are well known, it is advantageous to provide a static MPD with SCTE 35 events rather than to run a dynamic service that relies on inband events.

10.13. AssetIdentifier extensions

What are "extensions"? Move this to features/constraints chapters?

The AssetIdentifier descriptor shall be used for distinguishing parts of the same asset within a multi-period MPD; hence it shall be used for main content and may be used for inserted content. In order to enable better tracking and reporting, unique IDs should be used for different assets.

Use of the EIDR and Ad-ID identification schemes is recommended. The value of @schemeIdUri set to "urn:eidr" signals use of EIDR. The value of the @value attribute shall be a valid canonical EIDR entry as defined in [67].
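For example, an AssetIdentifier using EIDR might look as follows (the EIDR value is reused from the example MPD above purely for illustration):

<AssetIdentifier schemeIdUri="urn:eidr" value="10.5240/0EFB-02CD-126E-8092-1E49-W"/>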

Use of Ad-ID for asset identification is signaled by setting the value of @schemeIdUri to "urn:smpte:ul:060E2B34.01040101.01200900.00000000" (the "designator" URN defined in SMPTE 2092-1 [68]). The value of the @value attribute shall be a canonical full Ad-ID identifier as defined in SMPTE 2092-1 [68].

Other schemes may be used, including user private schemes, by using appropriately unique values of @schemeIdUri.

In the absence of other asset identifier schemes, a DASH-IF defined scheme may be used with the value of @schemeIdUri set to "urn:org:dashif:asset-id:2014". If used, the value of the @value attribute shall be a MovieLabs ContentID URN ([58], 2.2.1) for the content. It shall be the same for all parts of an asset. Preferred schemes are EIDR (main content) and Ad-ID (advertising).

If a Period has one-off semantics (i.e., an asset is completely contained in a single period, and its continuation is not expected in the future), the author shall not use an asset identifier on these assets.

Periods that do not contain non-remote AdaptationSet elements, as well as zero-length periods, shall not contain the AssetIdentifier descriptor.

An MPD may contain remote periods, some of which may have default content and some of which may be resolved into multiple Period elements.

After dereferencing, the MPD may contain zero-length periods and/or remote periods.

In case of Period@xlink:actuate="onRequest", MPD update and XLink resolution should be done sufficiently early to ensure that there are no artifacts due to insufficient time to download the inserted content.

Period@xlink:actuate="onRequest" shall not be used if MPD@type ="dynamic" 5

10.15. User-defined event extensions

10.15.1. Cue message

Cue messages used in the app-driven architecture shall be SCTE 35 events [54]. SCTE 35 event carriage is defined in ANSI/SCTE 214-1 (MPD) and ANSI/SCTE 214-3 (inband). For MPD events, the XML schema is defined in SCTE 35 2014 [54] and allows either an XML representation or a concise base64-coded representation.

NOTE: The PTS offset appearing in SCTE 35 shall be ignored; only the DASH event timing mechanism may be used to determine splice points.

10.15.2. Reporting

MPD events with an embedded IAB VAST 3.0 [53] response may be used for reporting purposes.

If only time-based reporting is required (e.g., reporting at start, completion, and quartiles), use of DASH callback events may be a simpler, native way of implementing tracking. Callback events are defined in ISO/IEC 23009-1:2014 AMD3 [4].
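A hypothetical MPD callback event is sketched below, assuming the tracking URL is carried in the event body; the URL and timing values are placeholders.

<EventStream schemeIdUri="urn:mpeg:dash:event:callback:2015" value="1" timescale="90000">
    <Event presentationTime="54054000" id="1">
        https://tracking.example.com/beacon?event=start
    </Event>
</EventStream>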

10.15.3. Ad insertion event streams

Recommended Event Stream schemes along with their scheme identifier for app-driven ad insertion are:

  1. "urn:scte:scte35:2013:bin" for inband SCTE 35 events containing a complete SCTE 35 section in binary form, as defined in ANSI/SCTE 214-3.

  2. "urn:scte:scte35:2014:xml+bin" for SCTE 35 MPD events containing only the base64 cue message representation, as defined in ANSI/SCTE 214-1. NOTE: the content of the Event element is an XML representation of the complete SCTE 35 cue message that contains the Signal.Binary element rather than the Signal.SpliceInfoSection element, both defined in SCTE 35 2014.

  3. "http://dashif.org/identifiers/vast30" for MPD events containing VAST3.0 responses [53].

  4. "urn:mpeg:dash:event:callback:2015" for DASH callback events.

11. Media coding technologies

This chapter describes the constraints that apply to media codecs when used in interoperable services.

Services SHALL use only the media codecs described in this chapter, in conformance with the requirements defined here.

Clients MAY support any set of codecs described in this chapter and SHALL NOT attempt to play back representations for which they do not have codec support.

11.1. H.264 (AVC)

The H.264 (AVC) codec [MPEGAVC] MAY be used by services for video adaptation sets. Clients SHOULD support this codec.

For representations up to 1280x720p resolution and up to 30 fps, the H.264 (AVC) Progressive High Profile Level 3.1 decoder SHALL be used.

For representations up to 1920x1080p resolution and up to 30 fps, the H.264 (AVC) Progressive High Profile Level 4.0 decoder SHALL be used.

The encapsulation of H.264 data in DASH containers SHALL conform to [iso14496-15].

Clients SHALL support SPS/PPS storage both in the initialization segment (sample entry avc1) and inband storage (sample entry avc3). Services MAY use either form.

Note: Use of avc3 is one of the factors that enables bitstream switching.

The below table lists examples of @codecs strings for H.264 (AVC) that match the decoders defined in this chapter.

Profile                              | Level | @codecs
H.264 (AVC) Progressive High Profile | 3.1   | avc1.64Y01F, avc3.64Y01F
H.264 (AVC) Progressive High Profile | 4.0   | avc1.64Y028, avc3.64Y028

Example @codecs strings for H.264 (AVC)

Note: Other @codecs strings may also be compatible (a higher level decoder can typically decode content intended for a lower level decoder).

For a detailed description on how to derive the signaling for the codec profile for H.264/AVC, see [DVB-DASH] section 5.1.3.

11.2. H.265 (HEVC)

The H.265 (HEVC) codec [MPEGHEVC] MAY be used by services for video adaptation sets.

For representations up to 1280x720p at up to 30 fps, the HEVC Main Profile Main Tier Level 3.1 decoder SHALL be used.

For representations up to 2048x1080 at up to 60 fps with 8-bit depth, the HEVC Main Profile Main Tier Level 4.1 decoder SHALL be used.

For representations up to 2048x1080 at up to 60 fps with 10-bit depth, the HEVC Main10 Profile Main Tier Level 4.1 decoder SHALL be used.

The encapsulation of H.265 data in DASH containers SHALL conform to [iso14496-15].

Clients SHALL support VPS/SPS/PPS storage both in the initialization segment (sample entry hvc1) and inband storage (sample entry hev1). Services MAY use either form.

Note: Use of hev1 is one of the factors that enables bitstream switching.

Where does UHD fit? Why is it in a separate chapter? We should unify.

The [ISOBMFF] sync sample signaling and [MPEGDASH] SAP type signaling SHALL be derived from the following table.

NAL unit type | [ISOBMFF] sync sample flag | [MPEGDASH] SAP type
IDR_N_LP      | true  | 1
IDR_W_RADL    | true  | 2 (if the IRAP has associated RADL pictures); 1 (if the IRAP has no associated RADL pictures)
BLA_N_LP      | true  | 1
BLA_W_RADL    | true  | 2 (if the IRAP has associated RADL pictures); 1 (if the IRAP has no associated RADL pictures)
BLA_W_LP      | false | 3 (if the IRAP has associated RASL pictures)
BLA_W_LP      | true  | 2 (if the IRAP has no associated RASL pictures but has associated RADL pictures)
BLA_W_LP      | true  | 1 (if the IRAP has no associated leading pictures)
CRA           | false | 3 (if the IRAP has associated RASL pictures)
CRA           | true  | 2 (if the IRAP has no associated RASL pictures but has associated RADL pictures)
CRA           | true  | 1 (if the IRAP has no associated leading pictures)

Signaling dependent on HEVC IRAP pictures in [ISOBMFF] and [MPEGDASH].

IOP requires that each media segment start with SAP type 1 or 2. If the above table indicates SAP type 3, the content is not conforming to IOP.

When the table above lists multiple possible values for a given NAL unit type and the entity creating the signaling is not able to determine correctly which values to use, it SHALL use the first value listed in the table for that NAL unit type.

The below table lists examples of @codecs strings for H.265 (HEVC) that match the decoders defined in this chapter.

Profile      | Level | @codecs
HEVC Main    | 3.1   | hev1.1.2.L93.B0, hvc1.1.2.L93.B0
HEVC Main    | 4.1   | hev1.1.2.L123.B0, hvc1.1.2.L123.B0
HEVC Main-10 | 4.1   | hev1.2.4.L123.B0, hvc1.2.4.L123.B0

Example @codecs strings for H.265 (HEVC)

Note: Other @codecs strings may also be compatible (a higher level decoder can typically decode content intended for a lower level decoder).

For a detailed description on how to derive the signaling for the codec profile for H.265/HEVC, see [DVB-DASH] section 5.2.2.

11.3. Decoder configuration with H.264 and H.265

This chapter applies only to video adaptation sets that use H.264 or H.265.

All initialization segments in the same video adaptation set SHALL use the same sample description (i.e. no mixing of avc1 and avc3 is allowed).

In representations using avc1 or hvc1 sample description:

In representations using avc3 or hev1 sample description:

11.4. Bitstream switching with H.264 and H.265

This chapter applies only to bitstream switching adaptation sets that use H.264 or H.265.

All representations SHALL be encoded using the avc3 or hev1 sample description.

The first presented sample’s composition time SHALL equal the first decoded sample’s decode time, which equals the baseMediaDecodeTime in the Track Fragment Decode Time Box (tfdt).

Note: This requires the use of negative composition offsets in a v1 Track Run Box (trun) for video samples, otherwise video sample reordering will result in a delay of video relative to audio.

What is the correct scoping for the above requirement? Is the composition time requirement specific to H.264/H.265? Or does it apply to all bitstream switching video? Or does it apply to all bitstream switching, not only video?

11.5. Thumbnail images

This chapter defines constraints for thumbnail adaptation sets.

Media segments SHALL be either JPEG or PNG images, using @mimeType of image/jpeg or image/png.

The adaptation set SHALL carry an essential property descriptor with @schemeIdUri="http://dashif.org/guidelines/thumbnail_tile". The @value SHALL indicate the number of thumbnails in each media segment, with the syntax being HxV, where H is the number of thumbnail columns in the grid (horizontal count) and V is the number of thumbnail rows (vertical count).

Descriptive attributes on the representation SHALL describe an entire grid of thumbnails (one media segment), not an individual thumbnail.

Note: JPEG images have a maximum width and height of 65,535 pixels.

Thumbnails stored in one grid SHALL be evenly distributed in time across the time span covered by the media segment on the MPD timeline, from left to right, then top to bottom.

Thumbnail presentation order for a 3x3 grid:
1 2 3
4 5 6
7 8 9
The following thumbnail adaptation set defines one media segment for every 125 seconds, containing a 25x1 image grid (25 columns, 1 row) with each image being 256x180 pixels. The display duration of each thumbnail image is 5 seconds. The single thumbnail representation requires 10 Kbps of bandwidth on average.
<AdaptationSet mimeType="image/jpeg">
  <SegmentTemplate media="thumbnails_$Number$.jpg" timescale="1" duration="125" />
  <Representation bandwidth="10000" width="6400" height="180">
    <EssentialProperty schemeIdUri="http://dashif.org/guidelines/thumbnail_tile" value="25x1" />
  </Representation>
</AdaptationSet>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

11.6. HE-AACv2 audio (stereo)

The codec for basic stereo audio support is MPEG-4 High Efficiency AAC v2 Profile, level 2 [MPEGAAC].

Note: HE-AACv2 is also standardized as Enhanced aacPlus in 3GPP TS 26.401.

An HE-AACv2 Profile decoder can also decode any content that conforms to the MPEG-4 AAC Profile or the MPEG-4 HE-AAC Profile.

Therefore, services are free to use any of these AAC versions. Typical clients are expected to play AAC-LC, HE-AAC and HE-AACv2 encoded content.

For content with SBR, i.e. @codecs=mp4a.40.5 or @codecs=mp4a.40.29, @audioSamplingRate signals the resulting sampling rate after SBR is applied, e.g. 48 kHz even if the AAC-LC core operates at 24 kHz.

For content with PS, i.e. @codecs=mp4a.40.29, the AudioChannelConfiguration element signals the resulting channel configuration after PS is applied, e.g. stereo even if the AAC-LC core operates in mono.
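As an illustration, the following sketch signals a hypothetical HE-AACv2 (SBR + PS) stereo adaptation set; the channel configuration scheme shown is the one defined in [MPEGDASH], and the representation details are placeholders.

<AdaptationSet mimeType="audio/mp4" codecs="mp4a.40.29" audioSamplingRate="48000"
        segmentAlignment="true" startWithSAP="1">
    <AudioChannelConfiguration
        schemeIdUri="urn:mpeg:dash:23003:3:audio_channel_configuration:2011"
        value="2"/>
    <Representation id="a0" bandwidth="64000"/>
</AdaptationSet>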

The encapsulation of HE-AACv2 data in DASH containers SHALL conform to [MP4].

SAP type SHALL be 1. The @codecs string SHALL have a value from the below table.

Profile                       | @codecs
MPEG-4 AAC Profile [11]       | mp4a.40.2
MPEG-4 HE-AAC Profile [11]    | mp4a.40.5
MPEG-4 HE-AAC v2 Profile [11] | mp4a.40.29

Permitted HE-AACv2 @codecs values.

To conform to [DVB-DASH], explicit backwards compatible signaling SHALL be used to indicate the use of the SBR and PS coding tools for all HE-AAC and HE-AACv2 bitstreams.

What does the above requirement actually mean - what does an implementation have to do? Unclear right now.

11.7. HE-AACv2 audio (multichannel)

This chapter extends HE-AACv2 requirements with multichannel scenarios. All constraints defined for the stereo scenario also apply here.

Support for multichannel content is available in the HE-AACv2 Profile, starting with level 4 for 5.1 and level 6 for 7.1. Decoders implementing the MPEG-4 HE-AACv2 multichannel profiles are fully compatible with content encoded in conformance with the HE-AACv2 stereo requirements defined in IOP.

The content SHOULD be prepared with loudness and dynamic range information incorporated into the bitstream, also taking into account the DRC Presentation Mode in [iso14496-3-2009-amd4-2013].

Decoders SHALL support decoding of loudness and dynamic range related information, i.e. dynamic_range_info() and MPEG4_ancillary_data() in the bitstream.

11.8. CEA-608/708 Digital Television (DTV) Closed Captioning

This chapter defines requirements for interoperable use of CEA-608/708 Digital Television (DTV) Closed Captioning [CEA708] in DASH presentations.

Note: This chapter is compatible with draft SCTE specification DVS 1208 and therefore SCTE URNs are used for the descriptor @schemeIdUri.

CEA-608/708 captions SHALL be carried in SEI messages embedded in representations of a video adaptation set, with the encapsulation as defined in [SCTE128-1], section 8.1. The SEI message payload_type=4 is used to indicate that Rec. ITU-T T.35 based SEI messages are in use.

ITU-T T.35 referenced above seems unrelated to the topic. What is the correct reference?

Is the payload_type sentence meant to be a requirement or a description of the referenced spec or what is the utility of this statement in IOP?

Every representation in the video adaptation set SHALL have identical CEA-608/708 captions. Both CEA-608 and CEA-708 MAY be present simultaneously in the same video adaptation set.

The presence of CEA-608/708 captions SHALL be signaled by an Accessibility descriptor on the adaptation set level, with @schemeIdUri="urn:scte:dash:cc:cea-608:2015" or @schemeIdUri="urn:scte:dash:cc:cea-708:2015", with an optional @value.

When present for CEA-608 captions, the @value of this descriptor SHALL describe the caption streams and languages in conformance to the ABNF below.

@value          = (channel *3 [";" channel]) / (language *3 [";" language])
channel         = channel-number "=" language
channel-number  = CC1 / CC2 / CC3 / CC4
language        = 3ALPHA ; language code per ISO 639.2/B [45]

Two variants of @value syntax for CEA-608 are described above - a variant with plain language codes and a variant with caption channel numbers. Services SHOULD use the variant with channel numbers.

Note: IOP does not provide the @value syntax for CEA-708. See [SCTE214-1].

Signaling of presence of CEA-608 closed caption service in English and German
<Accessibility schemeIdUri="urn:scte:dash:cc:cea-608:2015" value="CC1=eng;CC3=deu"/>

11.9. Timed Text (IMSC1)

This chapter defines requirements for using IMSC1 text [61] in DASH presentations.

W3C TTML [ttml2] and its various profiles - W3C IMSC1 [ttml-imsc1.1] (text and image profiles), SMPTE Timed Text [SMPTE2052-1-2013], and EBU Timed Text [EBU-TT] - provide a rich feature set for text tracks. Beyond basic subtitles and closed captioning, for example, graphics-based subtitles and closed captioning are also supported by IMSC1.

Many clients only implement a subset of IMSC1. The exact feature sets used by clients and services may need careful alignment to ensure mutual compatibility. Do not assume that all of IMSC1 is supported by typical clients - this is unlikely.

Conversion of CEA-608 and CEA-708 into IMSC1 SHALL be done according to [SMPTE2052-10] and [SMPTE2052-11], respectively.

One of the following storage formats SHALL be used for IMSC1 representations: either stand-alone XML file storage or ISO BMFF encapsulation per [iso14496-30].

The ISO BMFF encapsulated form SHOULD be used, as stand-alone XML file storage has significant limitations. See also § 5.2.10 Timing of stand-alone IMSC1 and WebVTT text files.

Note: [DVB-DASH] only supports the ISO BMFF encapsulated form.

The signaling in the MPD SHALL conform to the below table.

Codec                 | Storage                                            | @mimeType            | @codecs
IMSC1 Timed Text [61] | Stand-alone XML file                               | application/ttml+xml | See W3C TTML Profile Registry [62]
IMSC1 Timed Text [61] | ISO BMFF encapsulation ([ISOBMFF], [iso14496-30])  | application/mp4      | See W3C TTML Profile Registry [62]

IMSC1 signaling parameters.

11.10. Enhanced AC-3 (Dolby Digital Plus)

The @codecs parameter SHALL be ec-3. SAP type SHALL be 1.

The AudioChannelConfiguration element SHALL use @schemeIdUri="tag:dolby.com,2014:dash:audio_channel_configuration:2011" with @value as defined in the DASH-IF identifier registry.

Signaling and encapsulation SHALL conform to [ETSI102366] Annex F.
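A hypothetical Enhanced AC-3 representation might be signaled as follows; the @value is the hexadecimal channel configuration code from the DASH-IF identifier registry (assumed here to denote 5.1), and id and bandwidth are placeholders.

<Representation id="a1" codecs="ec-3" bandwidth="192000" audioSamplingRate="48000">
    <AudioChannelConfiguration
        schemeIdUri="tag:dolby.com,2014:dash:audio_channel_configuration:2011"
        value="F801"/>
</Representation>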

11.11. Dolby TrueHD

The @codecs parameter SHALL be mlpa. SAP type SHALL be 1.

Signaling and encapsulation SHALL conform to [Dolby-TrueHD].

11.12. AC-4

The @codecs parameter SHALL be ac-4. SAP type SHALL be 1.

The AudioChannelConfiguration element SHALL use @schemeIdUri="tag:dolby.com,2014:dash:audio_channel_configuration:2011" with @value as defined in the DASH-IF identifier registry.

Signaling and encapsulation SHALL conform to [ETSI103190-1] Annex E.

11.13. DTS-HD

DTS-HD [ETSI102114] comprises a number of profiles optimized for specific applications. More information about DTS-HD and the DTS-HD profiles can be found at https://dts.com/.

For all DTS formats, SAP type is always 1.

The signaling and encapsulation SHALL conform to [DTS9302J81100], [DTS9302K62400] and to the below table.

Codec                                          | @codecs
DTS Digital Surround                           | dtsc
DTS-HD High Resolution and DTS-HD Master Audio | dtsh
DTS Express                                    | dtse
DTS-HD Lossless (no core)                      | dtsl

DTS @codecs values

11.14. MPEG Surround

MPEG Surround [iso23003-1] is a scheme for coding multichannel signals based on a down-mixed signal of the original multichannel signal, and associated spatial parameters. The down-mix SHALL be coded with MPEG-4 High Efficiency AAC v2.

MPEG Surround used in DASH SHALL comply with level 4 of the Baseline MPEG Surround profile.

SAP type SHALL be 1. @codecs SHALL be mp4a.40.30.

11.15. MPEG-H 3D Audio

MPEG-H 3D Audio [iso23008-3] encoded content SHALL comply with Level 1, 2 or 3 of the MPEG-H Low Complexity (LC) Profile.

In addition to the requirements in [iso23008-3], the following constraints SHALL apply to storage of raw MPEG-H audio frames in DASH containers:

Note: The mpegh3daConfig() structure is expected to be different for each representation in an adaptation set.

SAP type SHALL be 1.

ISO BMFF encapsulation SHALL conform to [iso23008-3].

Codec                              | @codecs
MPEG-H 3D audio LC profile level 1 | mhm1.0x0B
MPEG-H 3D audio LC profile level 2 | mhm1.0x0C
MPEG-H 3D audio LC profile level 3 | mhm1.0x0D

Permitted @codecs values

11.16. MPEG-D Unified Speech and Audio Coding

MPEG-D Unified Speech and Audio Coding (USAC) has been designed to provide consistently high audio quality for a variety of content comprising a mixture of audio and speech signals. Using such a codec in a DASH streaming environment enables adaptive switching from 12 kbps stereo up to transparency.

[iso23000-19-2018-amd2-2019] defines a media profile xHE-AAC for MPEG-D USAC that is suitable for streaming applications.

Usage of USAC in DASH presentations SHALL conform to [iso23000-19-2018-amd2-2019], providing support up to 5.1 multichannel coding.

SAP type SHALL be 1. @codecs SHALL be mp4a.40.42.

11.17. UHD HEVC 4K

For the support of a broad set of use cases, the DASH-IF IOP HEVC 4k Extension is defined. UHD HEVC 4k video encoded with H.265/HEVC is an advanced distribution format for TV services that enables higher-resolution experiences in an efficient manner.

This extension describes requirements for content at 4k resolutions up to 60fps, and defines the required codec support as HEVC Main 10 Level 5.1.

Conformance to DASH-IF IOP HEVC 4k may be signaled by a @profiles attribute with the value http://dashif.org/guidelines/dash-if-uhd#hevc-4k.

NAL Structured Video streams conforming to this Media Profile SHALL NOT exceed the following coded picture format constraints:

There is a bunch of stuff below with no obvious connection to UHD. Should this not also be in the non-UHD HEVC chapter?

Additional coded picture format constraints:

The bitstream SHALL comply with the Main10 Tier Main Profile Level 5.1 restrictions as specified in [MPEGHEVC].

UHD HEVC 4k bitstreams SHALL set vui_parameters_present_flag to 1 in the active Sequence Parameter Set, i.e. HEVC bitstreams SHALL contain a Video Usability Information syntax structure.

The sample aspect ratio information SHALL be signaled in the bitstream using the aspect_ratio_idc value in the Video Usability Information (see [MPEGHEVC] Table E.1). UHD HEVC 4k bitstreams SHALL represent square pixels; therefore, aspect_ratio_idc SHALL be set to 1.

The following restrictions SHALL apply for the fields in the sequence parameter set:

The following restrictions SHALL apply for the fields in the profile_tier_level syntax structure in the sequence parameter set:

UHD HEVC 4k bitstreams SHALL obey the limits in [MPEGHEVC] Table A.1 and Table A.2 associated with Level 5.1. general_level_idc SHALL be less than or equal to 153 (Level 5.1).

Bitstreams which are compliant with the Main or Main10 profile SHOULD set general_profile_compatibility_flag[1] to 1.

The chromaticity coordinates of the ideal display, the opto-electronic transfer characteristic of the source picture and the matrix coefficients used in deriving luminance and chrominance signals from the red, green and blue primaries SHALL be explicitly signaled in the encoded HEVC bitstream by setting appropriate values for each of the following 3 parameters in the VUI: colour_primaries, transfer_characteristics, and matrix_coeffs.

[ITU-R-BT.709] colorimetry usage SHALL be signaled by setting colour_primaries to the value 1, transfer_characteristics to the value 1 and matrix_coeffs to the value 1.

The bitstream MAY contain SEI messages as permitted by [MPEGHEVC] and described in [MPEGHEVC] Annex D.

The @codecs parameter SHALL be set to either "hvc1.2.4.L153.B0" or "hev1.2.4.L153.B0" and SHALL NOT exceed the capabilities described by these values.

Bitstreams conforming to this chapter MAY contain one or more sets of optional dynamic metadata. The presence of dynamic metadata is signalled by a supplemental property descriptor with @schemeIdUri="http://dashif.org/metadata/hdr" and @value from the following table:

Scheme                       | @value
ETSI TS 103.433 SEI messages | TS103433

HEVC HDR dynamic metadata schemes.
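For example, an adaptation set carrying TS 103.433 dynamic metadata would include the following descriptor:

<SupplementalProperty schemeIdUri="http://dashif.org/metadata/hdr" value="TS103433"/>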

11.17.1. TS 103.433 HDR dynamic metadata

This chapter applies to video adaptation sets that carry a supplemental property descriptor with @schemeIdUri="http://dashif.org/metadata/hdr" and @value="TS103433".

The bitstream SHALL contain one or more SL-HDR Information SEI messages, as defined in clause A.2.2 of [ETSI103433-1], and MAY contain one or more Mastering Display Colour Volume SEI messages, as defined in [MPEGHEVC].

The SL-HDR Information SEI message SHALL be present at least with every SAP type 1 or type 2.

When carried, the Mastering Display Colour Volume SEI message SHALL be present at least with every SAP type 1 or type 2 and SHALL be used as specified in clause A.3 of [ETSI103433-1].

11.17.2. HEVC UHD compatibility aspects

This specification is designed such that UHD content authored in conformance to IOP is expected to conform to the media profile defined by [DVB-DASH] and to follow the 3GPP H.265/HEVC UHD Operation Point in section 5.6 of [3GPP26.116]. However, in contrast to DVB and 3GPP, only BT.709 may be used, not BT.2020.

In addition, clients conforming to this extension are expected to be capable of playing content authored to conform to the media profile defined by [DVB-DASH] and following the 3GPP H.265/HEVC UHD Operation Point in section 5.6 of [3GPP26.116], if the BT.709 colour space is used.

11.18. HEVC HDR PQ10

For the support of a broad set of use cases addressing high dynamic range (HDR) and wide colour gamut (WCG), the DASH-IF IOP HEVC HDR Perceptual Quantization (PQ) 10 Extension is defined. This interoperability point allows for additional UHD features including wide colour gamut, high dynamic range and a new electro-optical transfer curve. These features are in addition to the existing features described in the DASH-IF UHD 4k interoperability point, except that this profile is designed for HDR and requires the use of SMPTE ST 2084 [71] and the Rec. BT.2020 [74] colour space. Note that this is identical to Rec. BT.2100 [80] with the PQ transfer function, Y′CbCr colour difference format, 10-bit signal representation and narrow range.

Note that this extension does not require the use of the maximum values, such as 60 fps or 4k resolution. The content author may offer lower spatial and temporal resolutions and may use the regular DASH signalling to indicate the actual source and rendering format. A typical case may be to use HDR together with an HD 1080p signal. Note also that adaptation set switching as defined in section 3.8 may be used to separate different spatial resolutions into different adaptation sets to address different capabilities, while still permitting the use of lower resolutions for service continuity of higher resolutions.

Conformance to DASH-IF IOP HEVC HDR PQ10 may be signaled by a @profiles attribute with the value http://dashif.org/guidelines/dash-if-uhd#hevc-hdr-pq10.

The same requirements as for UHD HEVC 4k (§ 11.17) hold, except for the changes detailed below.

The changes in the HEVC HDR PQ10 profile that extend it beyond the HEVC 4K profile include:

Optional metadata may be present in the form of SEI messages defined in ITU-T H.265 / ISO/IEC 23008-2:2015 [19].

A bitstream conforming to the HEVC HDR PQ10 media profile SHALL comply with the Main Tier Main10 Profile Level 5.1 restrictions, as specified in Recommendation ITU-T H.265 / ISO/IEC 23008-2 [19].

In addition, the requirements in section 10.2.2.2 apply, except that this profile requires the use of Recommendation ITU-R BT.2020 [74] non-constant luminance colorimetry and SMPTE ST 2084 [71].

SMPTE ST 2084 [71] usage SHALL be signaled by setting colour_primaries to the value 9, transfer_characteristics to the value 16 and matrix_coeffs to the value 9.

The bitstream MAY contain SEI messages as permitted by Recommendation ITU-T H.265 / ISO/IEC 23008-2:2015 [19] and specified in its Annex D. SEI messages may, for example, support adaptation of the decoded video signal to different display capabilities or provide a more detailed content description, in particular the messages specified in Annex D in relation to HDR. Other SEI messages defined in Annex D may be present as well.

Receivers conforming to the HEVC HDR PQ10 media profile SHALL support decoding and displaying HEVC HDR PQ10 bitstreams as defined in section 10.3.2.2.

No additional processing requirements are defined, for example processing of SEI messages is out of scope.

If all Representations in an Adaptation Set conform to the elementary stream constraints for the media profile as defined in clause 10.3.3.2, the Adaptation Set conforms to the MPD signalling according to clauses 10.3.3.2 and 10.3.3.4, and the Representations conform to the file format constraints in clause 10.3.3.3, then the @profiles parameter in the Adaptation Set MAY signal conformance to this operation point by using "http://dashif.org/guidelines/dash-if-uhd#hevc-hdr-pq10".

The MPD SHALL conform to DASH-IF HEVC Main IOP with the additional constraints defined in clause 10.3.3.4. The @codecs parameter SHOULD be set to either "hvc1.2.4.L153.B0" or "hev1.2.4.L153.B0" and SHALL NOT exceed these values.

Content authored according to this extension is expected to be interoperable with the HDR10 profile defined in the DECE CFF Content Specification v2.2 [78], although it should be noted that the DECE CFF profile may have additional constraints, such as bitrate restrictions and required metadata.

Content authored according to this extension is expected to be interoperable with the PQ10 package defined in the UHD Forum Guidelines Phase A [79].

11.18.1. HEVC PQ10 HDR dynamic metadata

Bitstreams conforming to the HEVC HDR PQ10 media profile MAY contain one or more sets of optional dynamic metadata. The various metadata schemes are detailed below.

The presence of dynamic metadata is signalled by a Supplemental Descriptor with @schemeIdUri set to "http://dashif.org/metadata/hdr" and @value set to one of the values in the following table:

Scheme                       | @value
SMPTE 2094-10 SEI messages   | SMPTE2094-10
SMPTE 2094-40 SEI messages   | SMPTE2094-40
TS 103.433 SEI messages      | TS103433

HEVC HDR PQ10 dynamic metadata schemes

11.18.2. SMPTE 2094-10 HDR dynamic metadata

When the Adaptation Set contains a Supplemental Descriptor with @schemeIdUri set to "http://dashif.org/metadata/hdr" and @value set to "SMPTE2094-10", then the bitstream shall contain SMPTE 2094-10 [83] metadata, provided as a Supplemental Enhancement Information (SEI) message containing a DM_data() message (as defined in SMPTE 2094-10 [83] Annex C- Display Management Message) in accordance with “User data registered by Recommendation ITU-T T.35 SEI message” syntax element.

In addition to the bitstream requirements defined above in clause 10.3.2.2, when ST 2094-10 dynamic metadata is carried, exactly one ST 2094-10 SEI message SHALL be sent for every access unit of the bitstream.

11.18.3. SMPTE 2094-40 HDR dynamic metadata

When the Adaptation Set contains a Supplemental Descriptor with @schemeIdUri set to "http://dashif.org/metadata/hdr" and @value set to "SMPTE2094-40", the bitstream SHALL contain SMPTE ST 2094-40 [89] metadata, provided as a Supplemental Enhancement Information (SEI) message (as defined in CTA 861-G [90]) in accordance with the “User data registered by Recommendation ITU-T T.35 SEI message” syntax element.

This SEI message provides information to enable colour volume transformation of the reconstructed colour samples of the output pictures. The input to the indicated colour volume transform process is the linearized RGB colour components of the source content. The semantics and usage of the dynamic metadata SHALL be in conformance with the specifications in SMPTE ST 2094-40 [89].

In addition to the bitstream requirements defined above in clause 10.3.2.2, when ST 2094-40 dynamic metadata is carried, exactly one ST 2094-40 SEI message SHALL be present with every SAP of type 1 or type 2.

11.19. UHD Dual-Stream (Dolby Vision)

[DolbyVision-ISOBMFF]

Note: This extension is designed to be compatible with the “Dolby Vision Media Profile Definition” in the DECE “Common File Format & Media Formats Specification” Version 2.2. The name of the DASH-IF extension is inherited from the DECE document in order to indicate compatibility with this DECE media profile.

For the support of a broad set of backward-compatible use cases, the DASH-IF IOP Dual-Stream (Dolby Vision) Interoperability Point is defined. “Backward compatible” refers to a simple method for one delivery format to satisfy both an HDR client and an SDR client. This interoperability point allows for two interlocked video streams, as described in clause 10.4.2 below (restrictions to Enhancement Layers and Annex D 1.1). These two layers are known as the Base Layer (BL) and the Enhancement Layer (EL), where the Base Layer fully conforms to a previous non-UHD or UHD DASH-IF interoperability point. The EL provides additional information which, combined with the BL in a composition process, produces a UHD output signal, including a wide colour gamut and high dynamic range signal, at the client.

Conformance to DASH-IF IOP Dual-Stream (Dolby Vision) may be signaled by a @profiles attribute on the Enhancement Layer with the value http://dashif.org/guidelines/dash-if-uhd#dvduallayer.

The dual-stream solution includes two video streams, known as the Base Layer and the Enhancement Layer. A high-level overview of the dual-stream process is shown in Figure 26 below.

Figure 26: Overview of dual-stream system.

The MPD includes at least two Adaptation Sets as described below, including a Base Layer Adaptation Set and an Enhancement Layer Adaptation Set.

The Base Layer shall conform to the requirements of one of the following interoperability points: the DASH-IF IOP Main Interoperability Point, the DASH-IF IOP UHD 4k Interoperability Point or the DASH-IF IOP UHD HDR10 Interoperability Point. Any client that is able to play DASH-IF IOP Main, UHD 4k, or UHD HDR10 content as appropriate will be able to play the content from the Base Layer track, as determined by the client capabilities. To be clear, the Base Layer is 100% conforming to the profile definition, with no changes or additional information. A client that plays content conforming to the Base Layer profile will be able to play the Base Layer content with no modification and no knowledge of the Enhancement Layer or any Dolby Vision specific information. See Annex E, Sample MPD, for an example dual-layer MPD.

In addition, the Enhancement Layer shall conform to H.265/HEVC Main10 Profile Main Tier, Level 5.1 or lower, as defined in Recommendation ITU-T H.265 / ISO/IEC 23008-2. The Enhancement Layer shall conform to the following additional requirements:

The client may either play the Base Layer alone, in which case it complies with the requirements of those interoperability points, or play the Base Layer and Enhancement Layer together, decoding both layers and combining them to produce a 12-bit enhanced HDR signal which conforms to Rec. BT.2020 colour parameters and the SMPTE ST 2084 electro-optical transfer function. The details of this combination operation are given in the ETSI specification “Compound Content Management” [85].

Content shall only be authored claiming conformance to this IOP if a client can properly play the content through the method of combining the Base Layer and Enhancement Layer to produce an enhanced HDR output. Note that clients which conform to the profile associated with the Base Layer alone may play the Base Layer alone, with no information (and no knowledge) of the Enhancement Layer. In addition, the content shall follow the mandatory aspects and should take into account the recommendations and guidelines for content authoring documented in sections 8 and 10 and the HEVC-related issues in this section.

The dual-stream delivery of a Dolby Vision asset uses two tracks: the Base Layer is written into one track according to the profile of the Base Layer, and the Enhancement Layer exists in a second track, per the [TBD Reference on integration, 12] specification and the details in Annex C and Annex D. In particular, the required mp4 boxes and sample entries are detailed in Annex C, “Dolby Vision Streams Within the ISO Base Media File Format”. The Enhancement Layer is identified by an additional parameter, @dependencyId, which identifies the Base Layer that is the match for the Enhancement Layer, as described in clause 10.4.2.3.

If all Representations in an Adaptation Set conform to the elementary stream constraints for the media profile as defined in clause 10.4.2.1, the Adaptation Set conforms to the MPD signaling according to clauses 10.4.3.2 and 10.4.3.3, and the Representations conform to the file format constraints in clause 10.4.3.4, then the @profiles parameter in the Adaptation Set may signal conformance to this operation point by using "http://dashif.org/guidelines/dash-if-uhd#dvduallayer" on the Enhancement Layer (the Base Layer uses the normal signaling of the layer as defined in the profile of the Base Layer).

The MPD shall conform to DASH-IF HEVC Main IOP as defined with the additional constraints defined in clause 10.4.2.

When the dual-stream Dolby Vision asset is delivered as two files, the Enhancement Layer is identified by an additional parameter, @dependencyId, which identifies the Base Layer that is the match for the Enhancement Layer. The Base Layer Representation element must have an @id attribute, and the @dependencyId attribute on the Enhancement Layer Representation shall refer to that @id, to indicate to a client that these two representations are linked. Note that in this case the @codecs attribute for the Base Layer will list only the Base Layer codec. For example, the Base Layer @codecs might be codecs="hvc1.1.0.L120.00" and the Enhancement Layer @codecs would be codecs="dvhe.dtr.uhd30". For both the Base Layer and the Enhancement Layer, HEVC decoders are used in accordance with the @codecs signaling on each layer. The syntax and semantics of the @codecs signaling on the Enhancement Layer are detailed in Annex D. The outputs of the decoders are combined by the method detailed in the ETSI specification “Compound Content Management” [85].
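A minimal sketch of this linkage, using the @codecs values above (all other attributes are illustrative placeholders):

<AdaptationSet mimeType="video/mp4">
    <Representation id="base" codecs="hvc1.1.0.L120.00" bandwidth="8000000"/>
</AdaptationSet>
<AdaptationSet mimeType="video/mp4">
    <Representation id="enh" dependencyId="base" codecs="dvhe.dtr.uhd30" bandwidth="2000000"/>
</AdaptationSet>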

Content shall only be authored claiming conformance to this IOP if a client can properly play the content. In addition, the content shall follow the mandatory aspects and should take into account the recommendations and guidelines for content authoring documented in clauses 8 and 10 and the HEVC-related issues in clause 6.2.

11.19.1. Requirements for enhancement layer

The sample aspect ratio information shall be signaled in the bitstream using the aspect_ratio_idc value in the Video Usability Information (see the values of aspect_ratio_idc in Recommendation ITU-T H.265 / ISO/IEC 23008-2:2013 [19], Table E.1).

In addition to the provisions set forth in Recommendation ITU-T H.265 / ISO/IEC 23008-2:2013 [19], the following restrictions shall apply for the fields in the sequence parameter set:

In addition to the requirements imposed in clause 10.4.2.2, the following additional specifications shall apply to the Enhancement Layer encoding. HEVC Enhancement Layer bitstreams shall contain the following SEI messages:

CM_data() messages and DM_data() messages are carried in the enhancement layer video elementary stream as Supplemental Enhancement Information in HEVC’s “User data registered by Recommendation ITU-T T.35 SEI message” syntactic element. The syntax of the composing metadata SEI message and the display management SEI message is defined in Table 31.

Field                                        | Type  | Usage
user_data_registered_itu_t_t35( payloadSize ) { |    |
itu_t_t35_country_code                       | b(8)  | This 8-bit field shall have the value 0xB5.
itu_t_t35_provider_code                      | u(16) | This 16-bit field shall have the value 0x0031.
user_identifier                              | u(32) | This 32-bit code shall have the value 0x47413934 ("GA94").
user_data_type_code                          | u(8)  | An 8-bit value that identifies the type of user data to follow in the user_data_type_structure(). The values are defined in Table 32.
user_data_type_structure()                   |       | A variable-length set of data defined by the value of user_data_type_code and table C.1 (DM_data()) or table D.1 (CM_data()).
}                                            |       |

Table 31: Compound Content Management SEI message: HEVC (prefix SEI NAL unit with nal_unit_type = 39, payloadType = 4)

user_data_type_code | user_data_type_structure()
0x00 to 0x07        | Reserved
0x08                | CM_data()
0x09                | DM_data()
0x0A to 0xFF        | Reserved

Table 32: user_data_type_code values

The composing metadata SEI message is a “user data registered by Recommendation ITU-T T.35 SEI message” containing a CM_data() message, as specified in Annex F. HEVC Enhancement Layer bitstreams SHALL contain composing metadata SEI messages with the following constraints:

The display management SEI message is a “user data registered by Recommendation ITU-T T.35 SEI message” containing a DM_data() message, as specified in Annex C. HEVC Enhancement Layer bitstreams SHALL contain display management SEI messages with the following constraints:

11.20. VP9

VP9 [86] is an alternative video codec which may be used for SD, HD, and UHD spatial resolutions, as well as HDR at 10- and 12-bit depths (HDR + WCG), and frame rates of 24 fps and higher. This codec provides significant bandwidth savings at equivalent quality with respect to AVC/H.264. While not meant to replace AVC and HEVC, DASH presentations may include additional VP9 representations for playback on clients which support it.

For the integration in the context of DASH, the following applies for VP9:

For VP9 video streams, if the @bitstreamSwitching flag is set to true, then the following additional constraints shall apply:

The scope of the DASH-IF VP9-HD extension interoperability point is basic support of high-quality video distribution over the top based on VP9 up to 1080p with 8-bit pixel depth and up to 30 fps. Both live and on-demand services are supported.

The compliance to DASH-VP9 main may be signaled by a @profiles attribute with the value "http://dashif.org/guidelines/dashif#vp9"

11.20.1. HD

A DASH client conforms to this extension IOP by supporting at least the following features:

11.20.2. UHD

The scope of the DASH-IF VP9-UHD extension interoperability point is basic support of high-quality video distribution over the top based on VP9 up to 2160p with 8-bit pixel depth and up to 60 fps. Both live and on-demand services are supported. Conformance to this extension may be signaled by a @profiles attribute with the value "http://dashif.org/guidelines/dash-if-uhd#vp9".

A DASH client conforms to this extension IOP by supporting at least the following features:

11.20.3. HDR

The scope of the DASH-IF VP9-HDR extension interoperability point is basic support of high-quality video distribution over the top based on VP9 up to 2160p with 10-bit pixel depth and up to 60 fps. Both live and on-demand services are supported.

Conformance to this extension may be signaled by a @profiles attribute with the value http://dashif.org/guidelines/dashif#vp9-hdr (up to HD/1080p resolution) or http://dashif.org/guidelines/dash-if-uhd#vp9-hdr (up to 4k resolution).

A DASH client conforms to this extension IOP by supporting at least the following features:

12. Content protection and security

DASH-IF provides guidelines for using multiple DRM systems to access a DASH presentation by adding encryption signaling and DRM system configuration to DASH content encrypted in conformance to Common Encryption [MPEGCENC]. In addition to content authoring guidelines, DASH-IF specifies interoperable workflows for DASH client interactions with DRM systems, platform APIs and external services involved in content protection interactions.

A DRM system cooperates with the device’s media platform to enable playback of encrypted content while protecting the decrypted samples and the content key against potential attacks. The DASH-IF implementation guidelines focus on the signaling in the DASH presentation and the interactions of the DASH client with other components.

This document does not define any DRM system. DASH-IF maintains a registry of DRM system identifiers on dashif.org.

Common Encryption [MPEGCENC] specifies several protection schemes which can be applied by a scrambling system and used by different DRM systems. The same encrypted DASH presentation can be decrypted by different DRM systems if a DASH client is provided the DRM system configuration for each DRM system, either in the MPD or at runtime.

A content key is a 128-bit key used by a DRM system to make content available for playback. It is identified by a UUID-format string called default_KID (or sometimes simply KID). A content key and its identifier are shared between all DRM systems, whereas the mechanisms used for key acquisition and content protection are largely DRM system specific. Different DASH adaptation sets are often protected by different content keys.

Example default_KID in string format: 72c3ed2c-7a5f-4aad-902f-cbef1efe89a9
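In the MPD, the default_KID appears on the common ContentProtection descriptor; a minimal sketch, assuming the cenc namespace is declared on the MPD element:

<ContentProtection
    schemeIdUri="urn:mpeg:dash:mp4protection:2011"
    value="cenc"
    cenc:default_KID="72c3ed2c-7a5f-4aad-902f-cbef1efe89a9"/>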

A license is a data structure in DRM system specific format that contains one or more content keys and associates them with a policy that governs the usage of the content keys (e.g. expiration time). The encapsulated content keys are typically encrypted and only readable by the DRM system.

12.1. HTTPS and DASH

Transport security in HTTP-based delivery may be achieved by using HTTP over TLS (HTTPS) as specified in [RFC8446]. HTTPS is a protocol for secure communication which is widely used on the Internet and increasingly used for content streaming, mainly for protecting:

Because an MPD carries links to media resources, web browsers apply the W3C recommendation [mixed-content] when dereferencing them. To ensure that the benefits of HTTPS are maintained once the MPD is delivered, it is recommended that if the MPD is delivered with HTTPS, the media also be delivered with HTTPS.

DASH also explicitly permits the use of HTTPS as a URI scheme and hence HTTP over TLS as a transport protocol. When using HTTPS in an MPD, one can for instance specify that all media segments are delivered over HTTPS by declaring that all BaseURL elements are HTTPS based, as follows:

<BaseURL>https://cdn1.example.com/</BaseURL>
<BaseURL>https://cdn2.example.com/</BaseURL>

One can also use HTTPS for retrieving other types of data referenced by an MPD through HTTP URLs, such as DRM licenses specified within the ContentProtection descriptor:

<ContentProtection
    schemeIdUri="urn:uuid:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
    value="DRMNAME version">
    <dashif:laurl>https://MoviesSP.example.com/protect?license=kljklsdfiowek</dashif:laurl>
</ContentProtection>

It is recommended that HTTPS be adopted for delivering DASH content. Note, nevertheless, that HTTPS interferes with proxies that attempt to intercept, cache and/or modify content between the client and the TLS termination point within the CDN. Since HTTPS traffic is opaque to these intermediate nodes, they can lose much of their intended functionality when faced with HTTPS traffic.

While using HTTPS in DASH provides good protection for data exchanged between DASH servers and clients, HTTPS only protects the transport link; it does not by itself provide an enforcement mechanism for access control and usage policies on the streamed content. HTTPS does not imply user authentication or content authorization (access control). In particular, HTTPS provides no protection for streamed content cached in a local buffer at the client for playback. HTTPS does not replace a DRM.

12.2. Client reference architecture for encrypted content playback

Different software architectural components are involved in playback of encrypted content. The exact nature depends on the specific implementation. A high-level reference architecture is described here.

Reference architecture for encrypted content playback.

The media platform provides one or more APIs that allow the device’s media playback and DRM capabilities to be used by a DASH client. The DASH client is typically a library included in an app. On some device types, the DASH client may be a part of the media platform.

This document assumes that the media platform exposes its encrypted content playback features via an API similar to W3C Encrypted Media Extensions (EME) [encrypted-media]. The technical nature of the API may be different but EME-equivalent functionality is expected.

The media platform often implements at least one DRM system. Additional DRM system implementations can be included as libraries in the app.

The guidelines in this document define recommended workflows and default behavior for a generic DASH client implementation that performs playback of encrypted content. In many scenarios, the default behavior is sufficient. When deviation from the default behavior is desired, solution-specific logic and configuration can be provided by the app. Extension points are explicitly defined in the workflows at points where solution-specific decisions are most appropriate.

12.3. Content encryption and DRM

A DASH presentation MAY provide some or all adaptation sets in encrypted form, requiring the use of a DRM system to decrypt the content for playback. The duty of a DRM system is to decrypt content while preventing disclosure of the content key and misuse of the decrypted content (e.g. recording via screen capture software).

In a DASH presentation, every representation in an adaptation set SHALL be protected using the same content key (identified by the same default_KID).

Note: This means that if representations use different content keys, they must be in different adaptation sets, even if they would otherwise (were they not encrypted) belong to the same adaptation set. See also § 6.5 Switching across adaptation sets.

Encrypted DASH content SHALL use either the cenc or the cbcs protection scheme defined in [MPEGCENC]. cenc and cbcs are two mutually exclusive protection schemes. DASH content encrypted according to the cenc protection scheme cannot be decrypted by a DRM system supporting only the cbcs protection scheme and vice versa.

Some DRM system implementations support both protection schemes. Even when this is the case, clients SHALL NOT concurrently consume encrypted content that uses different protection schemes.

Representations in the same adaptation set SHALL use the same protection scheme. Representations in different adaptation sets MAY use different protection schemes. If both protection schemes are used in the same DASH period, all encrypted representations in that period SHALL be provided using both protection schemes. That is, the only permissible scenario for using both protection schemes together is to offer them as equal alternatives to target DASH clients with different capabilities.

Note: None of the CMAF presentation profiles defined in [MPEGCMAF] allow the presence of both cenc and cbcs content in the same period. While this is permitted by the DASH-IF guidelines - to allow DASH clients to choose between alternative protection schemes - such content would not be conforming to the presentation profiles defined in [MPEGCMAF].

Representations that contain the same media content using different protection schemes SHALL use different content keys. This protects against some cryptographic attacks [MSPR-EncryptionModes].

12.3.1. Robustness

DRM systems define rules that govern how they can be implemented. These rules can define different robustness levels which are typically used to differentiate implementations based on their resistance to attacks. The set of robustness levels, their names and the constraints that apply are all specific to each DRM system.

A hypothetical DRM system might define the following robustness levels:

Policy associated with content can require a DRM system implementation to conform to a certain robustness level, thereby ensuring that valuable content does not get presented on potentially vulnerable implementations. This policy can be enforced on different levels, depending on the DRM system:

  1. A license server may refuse to provide content keys to implementations with unacceptable robustness levels.

  2. The DRM system may refuse to use content keys whose license requires a higher robustness level than the implementation provides.

Multiple implementations of a DRM system may be available to a DASH client, potentially at different robustness levels. The DASH client must choose at media load time which DRM system implementation to use. However, the required robustness level may be different for different device types and is not expressed in the MPD! This decision is a matter of policy and is impossible for a DASH client to determine on its own. Therefore, solution-specific logic and configuration must inform the DASH client of the correct choice.

A DASH client SHALL enable solution-specific logic and configuration to specify the required robustness level. Depending on which DRM system is used, this can be implemented by:

  1. Changing the mapping of DRM system to key system in EME-based implementations (see § 12.3.2 W3C Encrypted Media Extensions).

  2. Specifying a minimum robustness level during capability detection (see § 12.7.1 Capability detection).

12.3.2. W3C Encrypted Media Extensions

Whereas the DRM signaling in DASH deals with DRM systems, EME deals with key systems. While similar in concept, they are not always the same thing. A single DRM system may be implemented on a single device by multiple different key systems, with different codec compatibility and functionality, potentially at different robustness levels.

A device may implement the "ExampleDRM" DRM system as a number of key systems:

Even if multiple variants are available, a DASH client SHOULD map each DRM system to a single key system. The default key system SHOULD be the one the DASH client expects to offer greatest compatibility with content (potentially at a low robustness level). The DASH client SHOULD allow solution-specific logic and configuration to override the key system chosen by default (e.g. to force the use of a high-robustness variant).
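A non-normative sketch of such a mapping is shown below. The system IDs are the registered Widevine and PlayReady identifiers and the key system names are their common EME names; the override hook stands in for solution-specific logic and configuration:

const defaultKeySystems = new Map<string, string>([
    // DRM system ID (DASH-IF registry) -> default EME key system name
    ["edef8ba9-79d6-4ace-a3c8-27dcd51d21ed", "com.widevine.alpha"],
    ["9a04f079-9840-4286-ab92-e65be0885f95", "com.microsoft.playready"],
]);

// Solution-specific logic may override the default choice, e.g. to force
// a high-robustness key system variant.
function keySystemFor(
    systemId: string,
    override?: (systemId: string) => string | undefined
): string | undefined {
    return override?.(systemId) ?? defaultKeySystems.get(systemId.toLowerCase());
}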

12.4. Content protection constraints for CMAF

The structure of content protection related information in the CMAF containers used by DASH is largely specified by [MPEGCMAF] and [MPEGCENC] (in particular section 8). This chapter outlines some additional requirements to ensure interoperable behavior of DASH clients and services.

Note: This document uses the cenc: prefix to reference the XML namespace urn:mpeg:cenc:2013 [MPEGCENC].

Initialization segments SHOULD NOT contain any moov/pssh box ([MPEGCMAF] 7.4.3) and DASH clients MAY ignore such boxes when encountered. Instead, pssh boxes required for DRM system initialization are part of the DRM system configuration and SHOULD be placed in the MPD as cenc:pssh elements in DRM system specific ContentProtection descriptors.

Note: Placing the pssh boxes in the MPD has become common for purposes of operational agility - it is often easier to update MPD files than rewrite initialization segments when the default DRM system configuration needs to be updated. Furthermore, in some scenarios the appropriate set of pssh boxes is not known when the initialization segment is created.

Protected content MAY be published without any pssh boxes in both the MPD and media segments. All DRM system configuration can be provided at runtime, including the pssh box data. See also § 12.5.3 Providing default DRM system configuration.

Media segments MAY contain moof/pssh boxes ([MPEGCMAF] 7.4.3) to provide updates to DRM system internal state (e.g. to supply new leaf keys in a key hierarchy). These state updates are transparent to the DASH client - the media platform is expected to intercept the moof/pssh boxes and supply them directly to the active DRM system. See § 12.5.2.1 default_KID in hierarchical/derived/variant key scenarios for an example.

12.4.1. Content protection data in CMAF containers

This chapter describes the structure of content protection data in CMAF containers used to provide encrypted content in a DASH presentation, summarizing the requirements defined by [ISOBMFF], [MPEGDASH], [MPEGCENC], [MPEGCMAF] and other parts of DASH-IF implementation guidelines.

DASH initialization segments contain:

DASH media segments are composed of a single CMAF fragment that contains:

A key hierarchy is implemented by listing the default_KID in the tenc box of the initialization segment (identifying the root key) and then overriding the key identifier in the sgpd boxes of media segments (identifying the leaf keys that apply to each media segment). The moof/pssh box is used to deliver/unlock new leaf keys and provide the associated license policy.

When using CMAF chunks for delivery, each CMAF fragment may be split into multiple CMAF chunks. If the CMAF fragment contained any moof/pssh boxes, copies of these boxes SHALL be present in each CMAF chunk that starts with an independent media sample.

Note: While DASH only requires the presence of moof/pssh in the first CMAF chunk, the requirement is more extensive in the interest of HLS interoperability [HLS-LowLatency].

12.5. Encryption and DRM signaling in the MPD

A DASH client needs to recognize encrypted content and activate a suitable DRM system, configuring it to decrypt content. The MPD informs a DASH client of the protection scheme used to protect content, identifies the content keys that are used and optionally provides the default DRM system configuration for a set of DRM systems.

The DRM system configuration is the complete data set required for a DASH client to activate a single DRM system and configure it to decrypt content using a single content key. It is supplied by a combination of XML elements in the MPD and/or solution-specific logic and configuration. The DRM system configuration often contains:

The exact set of values required for successful DRM workflow execution depends on the requirements of the selected DRM system (e.g. what kind of initialization data it can accept) and the mechanism used for content key acquisition (e.g. the DASH-IF interoperable license request model). By default, a DASH client SHOULD assume that a DRM system accepts initialization data in pssh format and that the DASH-IF interoperable license request model is used for content key acquisition.

When configuring a DRM system to decrypt content using multiple content keys, a distinct DRM system configuration is associated with each content key. Concurrent use of multiple DRM systems is not an interoperable scenario.

Note: In theory, it is possible for the DRM system initialization data to be the same for different content keys. In practice, the default_KID is often included in the initialization data so this is unlikely. Nevertheless, DASH clients cannot assume that using equal initialization data implies anything about equality of the DRM system configuration or the content key - the default_KID is the factor identifying the scope in which a single content key is to be used. See § 12.5.2 default_KID defines the scope of DRM system interactions.

12.5.1. Signaling presence of encrypted content

The presence of a ContentProtection descriptor with schemeIdUri="urn:mpeg:dash:mp4protection:2011" on an adaptation set informs a DASH client that all representations in the adaptation set are encrypted in conformance to Common Encryption ([MPEGDASH] 5.8.4.1, 5.8.5.2 and [MPEGCENC] 11) and require a DRM system to provide access.

This descriptor is present for all encrypted content ([MPEGDASH] 5.8.4.1). It SHALL be defined on the adaptation set level. The value attribute SHALL be either cenc or cbcs, matching the used protection scheme. The cenc:default_KID attribute SHALL be present and have a value matching the default_KID in the tenc box. The value SHALL be expressed in lowercase UUID string notation.

Signaling an adaptation set encrypted using the cbcs scheme and with a content key identified by 34e5db32-8625-47cd-ba06-68fca0655a72.
<ContentProtection
    schemeIdUri="urn:mpeg:dash:mp4protection:2011"
    value="cbcs"
    cenc:default_KID="34e5db32-8625-47cd-ba06-68fca0655a72" />

12.5.2. default_KID defines the scope of DRM system interactions

A DASH client interacts with one or more DRM systems during playback in order to control the decryption of content. Some of the most important interactions are:

The scope of each of these interactions is defined by the default_KID. Each distinct default_KID identifies exactly one content key. The impact of this is further outlined in § 12.7 DRM workflows in DASH clients.

When activating a DRM system, a DASH client SHALL determine the required set of content keys based on the default_KID values of adaptation sets selected for playback. This set of content keys is used to activate the DRM system, after which zero or more of the content keys from this set are available for playback.

Clients SHALL provide all default_KIDs of the selected adaptation sets to the DRM system during activation and SHALL NOT assume that activating a DRM system with one content key will implicitly enable the use of any other content key.

Note: An occasionally encountered anti-pattern is to activate a DRM system for only key X but to configure the license server to always provide both keys X and Y when key X is requested. This is not interoperable behavior.
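As a non-normative sketch, a DASH client can derive the required key set directly from the selected adaptation sets; the AdaptationSet shape below is illustrative:

interface AdaptationSet {
    defaultKid: string; // cenc:default_KID value from the MPD
}

// Every distinct default_KID identifies exactly one content key; the
// full set is provided to the DRM system during activation.
function requiredContentKeys(selected: AdaptationSet[]): Set<string> {
    return new Set(selected.map(a => a.defaultKid.toLowerCase()));
}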

The DASH client and/or DRM system MAY batch license requests for different default_KIDs (and the respective responses) into a single transaction (for example, to reduce the chattiness of license acquisition traffic).

Note: This optimization might require support from platform APIs and/or DRM system specific logic from the DASH client, as a batching mechanism is not yet a standard part of DRM related platform APIs.

12.5.2.1. default_KID in hierarchical/derived/variant key scenarios

While it is common that default_KID identifies the actual content key used for encryption, a DRM system MAY make use of other keys in addition to the one signalled by the default_KID value, but this SHALL be transparent to the client, with only the default_KID being used in interactions between the DASH client and the DRM system. See § 12.9 Controlling access rights with a key hierarchy.

In a hierarchical key scenario, default_KID references the root key and only the sample group descriptions reference the leaf keys.

In a hierarchical key scenario, default_KID identifies the root key, not the leaf key used to encrypt media samples, and the handling of leaf keys is not exposed to a DASH client. As far as a DASH client knows, there is always only one content key identified by default_KID.

This logic applies to all scenarios that make use of additional keys, regardless of whether they are based on the key hierarchy, key derivation or variant key ([iso23001-12]) concepts.

12.5.3. Providing default DRM system configuration

A DASH service SHOULD supply a default DRM system configuration in the MPD for all supported DRM systems in all encrypted adaptation sets. This enables playback without the need for DASH client customization or additional client-side configuration. DRM system configuration MAY also be supplied by solution-specific logic and configuration, replacing or enhancing the defaults provided in the MPD.

Any number of ContentProtection descriptors ([MPEGDASH] 5.8.4.1) MAY be present in the MPD to provide DRM system configuration. These descriptors SHALL be defined on the adaptation set level. The contents MAY be ignored by the DASH client if overridden by solution-specific logic and configuration - the DRM system configuration in the MPD simply provides default values known at content authoring time.

A ContentProtection descriptor providing a default DRM system configuration SHALL use schemeIdUri="urn:uuid:<systemid>" to identify the DRM system, with the <systemid> matching a value in the DASH-IF system-specific identifier registry. The value attribute of the ContentProtection descriptor SHOULD contain the DRM system name and version number in a human readable form (for diagnostic purposes).

Note: W3C defines the Clear Key mechanism ([encrypted-media] 9.1), which is a "dummy" DRM system implementation intended for client and platform development/testing purposes. Understand that Clear Key does not fulfill the content protection and content key protection duties ordinarily expected from a DRM system. For more guidelines on Clear Key usage, see § 12.10 Use of W3C Clear Key with DASH.

Each DRM system specific ContentProtection descriptor can contain a mix of XML elements and attributes defined by [MPEGCENC], the DRM system author, DASH-IF or any other party.

For DRM systems initialized by supplying pssh boxes [MPEGCENC], the cenc:pssh element SHOULD be present under the ContentProtection descriptor if the value is known at MPD authoring time. The base64 encoded contents of the element SHALL be equivalent to a complete pssh box including its length and header fields. See also § 12.4 Content protection constraints for CMAF.
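For illustration, a DASH client on the web platform might convert the base64 encoded element contents into EME initialization data along these lines (a non-normative sketch, assuming the DRM system accepts pssh-format initialization data):

// Decode the cenc:pssh element body (a complete pssh box) into bytes.
function psshInitData(base64Pssh: string): Uint8Array {
    const raw = atob(base64Pssh);
    return Uint8Array.from(raw, c => c.charCodeAt(0));
}

// Usage with an EME session, where "cenc" is the pssh-based
// initialization data type from the EME registry:
// await session.generateRequest("cenc", psshInitData(elementContents));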

DRM systems generally use the concept of license requests as the mechanism for obtaining content keys and associated usage constraints (see § 12.7.4 Performing license requests). For DRM systems that use this concept, one or more dashif:laurl elements SHOULD be present under the ContentProtection descriptor, with the value of the element being the URL to send license requests to. This URL MAY contain content identifiers.

Multiple mechanisms have historically been used to provide the license server URL in the MPD (e.g. embedding it in the cenc:pssh data or passing it in deprecated DRM system specific elements). A DASH client SHALL prefer dashif:laurl if multiple data sources for the URL are present in the MPD.

For DRM systems that require proof of authorization to be attached to the license request in a manner conforming to § 12.6 DASH-IF interoperable license request model, one or more dashif:authzurl elements SHOULD be present under the ContentProtection descriptor, containing the default URL to send authorization requests to (see § 12.7.4 Performing license requests).

Multiple dashif:laurl or dashif:authzurl elements under the same ContentProtection descriptor define sets of equivalent alternatives for the DASH client to choose from. A DASH client SHOULD select a random item from the set every time the value of such an element is used.
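A trivial non-normative sketch of this behavior:

// Pick a random member from a set of equivalent alternative URLs
// (e.g. multiple dashif:laurl elements) each time a URL is needed.
function pickUrl(alternatives: string[]): string {
    return alternatives[Math.floor(Math.random() * alternatives.length)];
}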

The above guidance on alternative URL handling should be generalized to all sets of alternative URLs, but there does not seem to be a suitable chapter in v4.3. If such a chapter is created in v5, we could replace the above guidance with a reference to the general URL handling guidelines.

A ContentProtection descriptor that provides default DRM system configuration for a fictional DRM system.
<ContentProtection
  schemeIdUri="urn:uuid:d0ee2730-09b5-459f-8452-200e52b37567"
  value="FirstDRM 2.0">
  <cenc:pssh>YmFzZTY0IGVuY29kZWQgY29udGVudHMgb2YgkXBzc2iSIGJveCB3aXRoIHRoaXMgU3lzdGVtSUQ=</cenc:pssh>
  <dashif:authzurl>https://example.com/tenants/5341/authorize</dashif:authzurl>
  <dashif:laurl>https://example.com/AcquireLicense</dashif:laurl>
</ContentProtection>

The presence of a DRM system specific ContentProtection descriptor is not required in order to activate the DRM system; these elements are used merely to provide the default DRM system configuration. Empty ContentProtection descriptors SHOULD NOT be present in an MPD and MAY be ignored by DASH clients.

Because default_KID determines the scope of DRM system interactions, the contents of DRM system specific ContentProtection elements with the same schemeIdUri SHALL be identical in all adaptation sets with the same default_KID. This means that a DRM system will treat equally all adaptation sets that use the same content key.

Note: If you wish to change the default DRM system configuration associated with a content key, you must update all the instances where the data is present in the MPD. For live services, this can mean updating the data in multiple periods.

To maintain the default_KID association, a DASH client that exposes APIs/callbacks to business logic for the purpose of controlling DRM interactions and/or supplying data for DRM system configuration SHALL NOT allow these APIs to associate multiple DRM system configurations for the same DRM system with the same default_KID. Conversely, DASH client APIs SHOULD allow business logic to provide different DRM system configurations for the same DRM system for use with different default_KIDs.
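A minimal non-normative sketch of a client-side configuration store that enforces this constraint; the DrmSystemConfiguration shape is illustrative:

interface DrmSystemConfiguration {
    pssh?: string;      // base64 contents of a complete pssh box
    laurl?: string;     // license server URL
    authzurl?: string;  // authorization service URL
}

const configurations = new Map<string, DrmSystemConfiguration>();

// At most one DRM system configuration may exist per
// (DRM system, default_KID) pair; setting a new one replaces the old.
function setConfiguration(
    systemId: string, defaultKid: string, config: DrmSystemConfiguration
): void {
    configurations.set(`${systemId}:${defaultKid}`.toLowerCase(), config);
}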

12.5.4. Delivering updates to DRM system internal state

Some DRM systems support live updates to DRM system internal state (e.g. to deliver new leaf keys in a key hierarchy). These updates SHALL NOT be present in the MPD and SHALL be delivered by moof/pssh boxes in media segments.

12.6. DASH-IF interoperable license request model

The interactions involved in acquiring licenses and content keys in DRM workflows have historically been proprietary, requiring a DASH client to be customized in order to achieve compatibility with specific DRM systems or license server implementations. This chapter defines an interoperable model to encourage the creation of solutions that do not require custom code in the DASH client in order to play back encrypted content. Use of this model is optional but recommended.

Any conformance statements in this chapter apply to clients and services that opt in to using this model (e.g. a "SHALL" statement means "SHALL, if using this model," and has no effect on implementations that choose to use proprietary mechanisms for license acquisition). The authorization service and license server are considered part of the DASH service.

In performing license acquisition, a DASH client needs to:

  1. Be able to prove that the user and device have the right to use the requested content keys.

  2. Handle errors in a manner agnostic to the specific DRM system and license server being used.

This license request model defines a mechanism for achieving both goals. This results in the following interoperability benefits:

These benefits increase in value with the size of the solution, as they reduce the development cost required to offer playback of encrypted content on a wide range of DRM-capable client platforms using different DRM systems, with licenses potentially served by different license server implementations.

12.6.1. Proof of authorization

An authorization token is a JSON Web Token used to prove to a license server that the caller has the right to use one or more content keys under certain conditions. Attaching this proof of authorization to a license request is optional, allowing for architectures where a "license proxy" performs authorization checks in a manner transparent to the DASH client.

The basic structural requirements for authorization tokens are defined by [jwt] and [jws]. This document adds some additional constraints to ensure interoperability. Beyond that, the license server implementation is what defines the contents of the authorization token (the set of claims it contains), as the data needs to express implementation-specific license server business logic parameters that cannot be generalized.

Note: An authorization token is divided into a header and body. The distinction between the two is effectively irrelevant and merely an artifact of the JWT specification. License servers may use existing fields and define new fields in both the header and the body.

Implementations SHALL process claims listed in [jwt] 4.1 "Registered Claim Names" when they are present (e.g. exp "Expiration Time" and nbf "Not Before"). The typ header parameter ([jwt] 5.1) SHOULD NOT be present. The alg header parameter defined in [jws] SHALL be present.

Authorization tokens are issued by an authorization service, which is part of a solution’s business logic. The authorization service has access to project-specific context that it needs to make its decisions (e.g. the active session, user identification and database of purchases/entitlements). A single authorization service can be used to issue authorization tokens for multiple license servers, simplifying architecture in solutions where multiple license server vendors are used.

Role of the authorization service in DRM workflow related communication.

An authorization service SHALL digitally sign any issued authorization token with an algorithm from the "HMAC with SHA-2 Functions" or "Digital Signature with ECDSA" sets in [jwt]. The HS256 algorithm is recommended as a highly compatible default, as it is a required part of every JWT implementation. License server implementations SHALL validate the digital signature and reject tokens with invalid signatures or tokens using signature algorithms other than those referenced here. The license server MAY further constrain the set of allowed signature algorithms.

Successful signature verification requires that keys/certificates be distributed and trust relationships be established between the signing parties and the validating parties. The specific mechanisms for this are implementation-specific and out of scope of this document.
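As a non-normative illustration, an authorization service running on Node.js might issue an HS256-signed token along these lines; the shared secret and its distribution are assumed to be established out of band, and the claim names follow the hypothetical license server examples in this chapter:

import { createHmac } from "node:crypto";

function base64Url(value: string): string {
    return Buffer.from(value, "utf8").toString("base64url");
}

// Issue a compact-serialized JWT (header.body.signature) signed with HS256.
function issueAuthorizationToken(
    authorizedKids: string[], expires: number, secret: string
): string {
    const header = base64Url(JSON.stringify({ alg: "HS256", exp: expires }));
    const body = base64Url(JSON.stringify({ authorized_kids: authorizedKids }));
    const signature = createHmac("sha256", secret)
        .update(`${header}.${body}`)
        .digest("base64url");
    return `${header}.${body}.${signature}`;
}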

12.6.1.1. Obtaining authorization tokens

To obtain an authorization token, a DASH client needs to know the URL of the authorization service. DASH services SHOULD specify the authorization service URL in the MPD using the dashif:authzurl element (see § 12.5.3 Providing default DRM system configuration).

If no authorization service URL is provided by the MPD nor made available at runtime, a DASH client SHALL NOT attach an authorization token to a license request. Absence of this URL implies that authorization operations are performed in a manner transparent to the DASH client (see § 12.6.3 Possible deployment architectures).

Authorization tokens are requested from all authorization services referenced by the selected adaptation sets.

DASH clients will use zero or more authorization tokens depending on the number of authorization service URLs defined for the set of content keys in use. One authorization token is requested from each distinct authorization service URL. The authorization service URL is specified individually for each DRM system and content key (i.e. it is part of the DRM system configuration). Services SHOULD use a single authorization token covering all content keys and DRM systems but MAY divide the scope of authorization tokens if appropriate (e.g. different DRM systems might use different license server vendors that use mutually incompatible authorization token formats).

Note: Path or query string parameters in the authorization service URL can be used to differentiate between license server implementations (and their respective authorization token formats).

DASH clients SHOULD cache and reuse authorization tokens up to the moment specified in the token’s exp "Expiration Time" claim (defaulting to "never expires"). DASH clients SHALL discard the authorization token and request a new one if the license server indicates that the authorization token was rejected (for any reason), even if the "Expiration Time" claim is not present or the expiration time is in the future (see § 12.6.2 Problem signaling and handling).

Before requesting an authorization token, a DASH client SHALL take the authorization service URL and add or replace the kids query string parameter containing a comma-separated list in ascending alphanumeric order of default_KID values obtained from the MPD. This list SHALL contain every default_KID for which proof of authorization is requested from this authorization service (i.e. every distinct default_KID for which the same set of URLs was specified using dashif:authzurl elements).

To request an authorization token, a DASH client SHALL make an HTTP GET request to this modified URL, attaching to the request any standard contextual information used by the underlying platform and allowed by active security policy (e.g. HTTP cookies). This data can be used by the authorization service to identify the user and device and assess their access rights.

Note: For DASH clients operating on the web platform, effective use of the authorization service may require the authorization service to exist on the same origin as the website hosting the DASH client in order to share the session cookies.

If the HTTP response status code indicates a successful result and the Content-Type is text/plain, the HTTP response body is the authorization token.

Consider an MPD that specifies the authorization service URL https://example.com/Authorize for the content keys with default_KID values 1611f0c8-487c-44d4-9b19-82e5a6d55084 and db2dae97-6b41-4e99-8210-493503d5681b.

The generated URL would then be https://example.com/Authorize?kids=1611f0c8-487c-44d4-9b19-82e5a6d55084,db2dae97-6b41-4e99-8210-493503d5681b to which a DASH client would make a GET request:

GET /Authorize?kids=1611f0c8-487c-44d4-9b19-82e5a6d55084,db2dae97-6b41-4e99-8210-493503d5681b HTTP/1.1
Host: example.com

Assuming authorization checks pass, the authorization service would return the authorization token in the HTTP response body:

HTTP/1.1 200 OK
Content-Type: text/plain

eyJhbGciOiJIUzI1NiIsImV4cCI6IjE1MTYyMzkwMjIifQ.eyJhdXRob3JpemVkX2tpZHMiOlsiMTYxMWYwYzgtNDg3Yy00NGQ0LTliMTktODJlNWE2ZDU1MDg0IiwiZGIyZGFlOTctNmI0MS00ZTk5LTgyMTAtNDkzNTAzZDU2ODFiIl19.tBvW6XVPHBRp1JEwItsVnbHwIqoqnQAVQfTV9PGMkIU

If the HTTP response status code indicates a failure, a DASH client needs to examine the response to determine the cause of the failure and handle it appropriately (see § 12.6.2 Problem signaling and handling). DASH clients SHOULD NOT treat every failed authorization token request as a fatal error - if multiple authorization tokens are used to authorize access to different content keys, it may be that some of them fail but others succeed, potentially still enabling a successful playback experience. The examination of whether playback can successfully proceed SHOULD be performed only once all license requests have been completed and the final set of available content keys is known. See also § 12.7.3.1 Handling unavailability of content keys.

DASH clients SHALL follow HTTP redirects signaled by the authorization service.
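The token acquisition workflow above can be sketched as follows (non-normative; assumes the fetch API, which follows redirects by default and can attach cookies):

// Request an authorization token covering the given default_KIDs.
async function requestAuthorizationToken(
    authzUrl: string, defaultKids: string[]
): Promise<string> {
    const url = new URL(authzUrl);
    // Comma-separated default_KID list in ascending alphanumeric order.
    url.searchParams.set("kids", [...defaultKids].sort().join(","));
    // Cookies and similar contextual data identify the user and device.
    const response = await fetch(url.toString(), { credentials: "include" });
    if (!response.ok) {
        throw new Error(`Authorization token request failed: ${response.status}`);
    }
    return response.text(); // text/plain response body is the token
}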

12.6.1.2. Issuing authorization tokens

The mechanism of performing authorization checks is implementation-specific. Common approaches might be to identify the user from a session cookie, query the entitlements/purchases database to identify what rights are assigned to the user and then assemble a suitable authorization token, taking into account the license policy configuration that applies to the content keys being requested.

The structure of the authorization tokens is unconstrained beyond the basic requirements defined in § 12.6.1 Proof of authorization. Authorization services need to issue tokens that match the expectations of license servers that will be using these tokens. If multiple different license server implementations are served by the same authorization service, the path or query string parameters in the authorization service URL allow the service to identify which output format to use.

Example authorization token matching the requirements of a hypothetical license server.

JWT headers, specifying digital signature algorithm and expiration time:

{
    "alg": "HS256",
    "exp": "1516239022"
}

JWT body with list of authorized content key IDs (an example field that could be defined by a license server):

{
    "authorized_kids": [
        "1611f0c8-487c-44d4-9b19-82e5a6d55084",
        "db2dae97-6b41-4e99-8210-493503d5681b"
    ]
}

Serialized and digitally signed: eyJhbGciOiJIUzI1NiIsImV4cCI6IjE1MTYyMzkwMjIifQ.eyJhdXRob3JpemVkX2tpZHMiOlsiMTYxMWYwYzgtNDg3Yy00NGQ0LTliMTktODJlNWE2ZDU1MDg0IiwiZGIyZGFlOTctNmI0MS00ZTk5LTgyMTAtNDkzNTAzZDU2ODFiIl19.tBvW6XVPHBRp1JEwItsVnbHwIqoqnQAVQfTV9PGMkIU

An authorization service SHALL NOT issue authorization tokens that authorize the use of content keys that are not in the set of requested content keys (as defined in the request’s kids query string parameter). An authorization service MAY issue authorization tokens that authorize the use of only a subset of the requested content keys, provided that at least one content key is authorized. If no content keys are authorized for use, an authorization service SHALL signal a failure.

Note: During license issuance, the license server may further constrain the set of available content keys (e.g. as a result of examining the robustness level of the DRM system implementation requesting the license). See § 12.7.3.1 Handling unavailability of content keys.

Authorization tokens SHALL be returned by an authorization service using JWS Compact Serialization [jws] (the aaa.bbb.ccc format). The serialized form of an authorization token SHOULD NOT exceed 5000 characters to ensure that a license server does not reject a license request carrying the token due to excessive HTTP header size.

12.6.1.3. Attaching authorization tokens to license requests

Authorization tokens are attached to license requests using the Authorization HTTP request header, signaling the Bearer authorization type.

HTTP request to a hypothetical license server, carrying an authorization token.
POST /AcquireLicense HTTP/1.1
Authorization: Bearer eyJhbGciOiJIUzI1NiIsImV4cCI6IjE1MTYyMzkwMjIifQ.eyJhdXRob3JpemVkX2tpZHMiOlsiMTYxMWYwYzgtNDg3Yy00NGQ0LTliMTktODJlNWE2ZDU1MDg0IiwiZGIyZGFlOTctNmI0MS00ZTk5LTgyMTAtNDkzNTAzZDU2ODFiIl19.tBvW6XVPHBRp1JEwItsVnbHwIqoqnQAVQfTV9PGMkIU

(opaque license request blob from DRM system goes here)

The same authorization token MAY be used with multiple license requests but one license request SHALL only carry one authorization token, even if the license request is for multiple content keys. A DASH client SHALL NOT use content key batching features offered by the platform APIs to combine requests for content keys that require the use of separate authorization tokens.

A DASH client SHALL NOT make license requests for content keys that are configured as requiring an authorization token but for which the DASH client has failed to acquire an authorization token.

Note: A content key requires an authorization token if there is at least one dashif:authzurl in the MPD or if this element is added by solution-specific logic and configuration.

12.6.2. Problem signaling and handling

Authorization services and license servers SHOULD indicate an inability to satisfy a request by returning an HTTP response that:

  1. Signals a suitable status code (4xx or 5xx).

  2. Has a Content-Type of application/problem+json.

  3. Contains an HTTP response body conforming to [rfc7807].

HTTP response from an authorization service, indicating a rejected authorization token request because the requested content is not a part of the user’s subscriptions.
HTTP/1.1 403 Forbidden
Content-Type: application/problem+json

{
    "type": "https://dashif.org/drm-problems/not-authorized",
    "title": "Not authorized",
    "detail": "Your active service plan does not include the channel 'EurasiaSport'.",
    "href": "https://example.com/view-available-subscriptions?channel=EurasiaSport",
    "hrefTitle": "Available subscriptions"
}

A problem record SHALL contain a short human-readable description of the problem in the title field and SHOULD contain a human-readable description, designed to help the reader solve the problem, in the detail field.

Note: The detail field is intended to be displayed to users of a DASH client, not to developers. The description should be helpful to the user whose device the DASH client is running on.

During DRM system activation, it is possible that multiple failures occur. DASH clients SHOULD be capable of displaying a list of error messages to the end-user and SHOULD deduplicate multiple records with the same type (e.g. if an authorization token expires, this expiration may cause failures when requesting 5 content keys but should result in at most 1 error message being displayed).

Note: Merely the fact that a problem record was returned does not mean that it needs to be presented to the user or acted upon in other ways. The user may still experience successful playback in the presence of some failed requests. See § 12.7.3.1 Handling unavailability of content keys.
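A minimal non-normative sketch of such deduplication, assuming problem records parsed from application/problem+json response bodies:

interface ProblemRecord {
    type: string;
    title: string;
    detail?: string;
}

// Keep at most one problem record per distinct "type", so repeated
// failures of the same kind produce a single message for the user.
function deduplicate(problems: ProblemRecord[]): ProblemRecord[] {
    const byType = new Map<string, ProblemRecord>();
    for (const problem of problems) {
        if (!byType.has(problem.type)) {
            byType.set(problem.type, problem);
        }
    }
    return [...byType.values()];
}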

This chapter defines a set of standard problem types that SHOULD be used to indicate the nature of the failure. Implementations MAY extend this set with further problem types if the nature of the failure does not fit into the existing types.

Let’s come up with a good set of useful problem types we can define here, to reduce the set of problem types that must be defined in solution-specific scope.

12.6.2.1. Problem type: not authorized to access content

Type: https://dashif.org/drm-problems/not-authorized

Title: Not authorized

HTTP status code: 403

Used by: authorization service

This problem record SHOULD be returned by an authorization service if the user is not authorized to access the requested content keys. The detail field SHOULD explain why this is so (e.g. their subscription has expired, the requested content keys are for a movie not in their list of purchases, the content is not available in their geographic region).

The authorization service MAY supply a href (string) field on the problem record, containing a URL using which the user can correct the problem (e.g. purchase a missing subscription). If the href field is present, a hrefTitle (string) field SHALL also be present, containing a title suitable for a hyperlink or button (e.g. "Subscribe"). DASH clients MAY expose this URL and title in their user interface to enable the user to find a quick solution to the problem.

12.6.2.2. Problem type: insufficient proof of authorization

Type: https://dashif.org/drm-problems/insufficient-proof-of-authorization

Title: Not authorized

HTTP status code: 403

Used by: license server

This problem record SHOULD be returned by a license server if the proof of authorization (if any) attached to a license request is not sufficient to authorize the use of any of the requested content keys. The detail field SHOULD explain exactly which expectation the caller failed to satisfy (e.g. no token provided, token has expired, token is for a disabled tenant).

Note: If the authorization token authorizes only a subset of requested keys, a license server does not signal a problem and simply returns only the authorized subset of content keys.

When encountering this problem, a DASH client SHOULD discard whatever authorization token was used, acquire a new authorization token and retry the license request. If no authorization service URL is available, this indicates a DASH service or client misconfiguration (as clearly, an authorization token was expected) and the problem SHOULD be escalated for operator attention.
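A non-normative sketch of this retry behavior; acquireNewToken() stands in for the token acquisition workflow of § 12.6.1.1 and the license request transport is simplified to a bare HTTP POST:

const INSUFFICIENT_AUTHZ =
    "https://dashif.org/drm-problems/insufficient-proof-of-authorization";

async function licenseRequest(
    laurl: string, drmRequestBody: Uint8Array, token: string | undefined,
    acquireNewToken: () => Promise<string>
): Promise<ArrayBuffer> {
    const attempt = (t: string | undefined) => fetch(laurl, {
        method: "POST",
        headers: t ? { "Authorization": `Bearer ${t}` } : {},
        body: drmRequestBody,
    });
    let response = await attempt(token);
    if (!response.ok) {
        const problem = await response.json().catch(() => undefined);
        if (problem?.type === INSUFFICIENT_AUTHZ) {
            // Discard the old token, obtain a fresh one and retry once.
            response = await attempt(await acquireNewToken());
        }
    }
    if (!response.ok) {
        throw new Error(`License request failed: ${response.status}`);
    }
    return response.arrayBuffer();
}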

12.6.3. Possible deployment architectures

The interoperable license request model is designed to allow for the use of different deployment architectures in common use today, including those where authorization duties are offloaded to a "license proxy". This chapter outlines some of the possible architectures and how interoperable DASH clients support them.

The baseline architecture assumes that a separate authorization service exists, implementing the logic required to determine which users have the rights to access which content.

The baseline architecture with an authorization service directly exposed to the DASH client.

While the baseline architecture offers several advantages, in some cases it may be desirable to have the authorization checks be transparent to the DASH client. This need may be driven by license server implementation limitations or by other system architecture decisions.

A common implementation of transparent authorization is to use a "license proxy", which acts as a license server toward the DASH client but forwards license requests to the real license server once authorization checks have passed. Alternatively, the license server itself may perform the authorization checks.

A transparent authorization architecture performs the authorization checks at the license server, which is often hidden behind a proxy (indistinguishable from a license server to the DASH client).

The two architectures can be mixed, with some DRM systems performing the authorization operations in the license server (or a "license proxy") and others using the authorization service directly. This may be relevant when integrating license servers from different vendors into the same solution.

A DASH client will attempt to contact an authorization service if an authorization service URL is provided either in the MPD or by solution-specific logic and configuration. If no such URL is provided, it will assume that all authorization checks (if any are required) are performed by the license server (in reality, often a license proxy) and will not attach any proof of authorization.

12.6.4. Passing a content ID to services

The concept of a content ID is sometimes used to identify groups of content keys based on solution-specific associations. The DRM workflows described by this document do not require this concept to be used but do support it if the solution architecture demands it.

In order to make use of a content ID in DRM workflows, the content ID SHOULD be embedded into authorization service URLs and/or license server URLs (depending on which components are used and require the use of the content ID). This may be done either directly at MPD authoring time (if the URLs and content ID are known at such time) or by solution-specific logic and configuration at runtime.

Having embedded the content ID in the URL, all DRM workflows continue to operate the same as they normally would, except now they also include knowledge of the content ID in each request to the authorization service and/or license server. The content ID is an addition to the license request workflows and does not replace any existing data.

Embedding a content ID allows the service handling the request to use the content ID in its business logic. However, the presence of a content ID in the URL does not invalidate any requirements related to the processing of the default_KID values of content keys. For example, an authorization service must still constrain the set of authorized content keys to a subset of the keys listed in the kids parameter (§ 12.6.1.2 Issuing authorization tokens).

No generic URL template for embedding the content ID is defined, as the content ID is always a proprietary concept. Recommended options include:

DRM system configuration with the content ID embedded in the authorization service and license server URLs. Each service may use a different implementation-defined URL structure for carrying the content ID.
<ContentProtection
  schemeIdUri="urn:uuid:d0ee2730-09b5-459f-8452-200e52b37567"
  value="AcmeDRM 2.0">
  <cenc:pssh>YmFzZTY0IGVuY29kZWQgY29udGVudHMgb2YgkXBzc2iSIGJveCB3aXRoIHRoaXMgU3lzdGVtSUQ=</cenc:pssh>
  <dashif:authzurl>https://example.com/tenants/5341/authorize?contentId=movie865343651</dashif:authzurl>
  <dashif:laurl>https://example.com/moviecatalog-license-api/movie865343651/AcquireLicense</dashif:laurl>
</ContentProtection>

The content ID SHOULD NOT be embedded in DRM system specific data structures such as pssh boxes, as logic that depends on DRM system specific data structures is not interoperable and often leads to increased development and maintenance costs.

12.7. DRM workflows in DASH clients

To present encrypted content a DASH client needs to:

  1. Select a DRM system that is capable of decrypting the content.

  2. Activate the selected DRM system and configure it to decrypt content.

A client also needs to take observations at runtime to detect the need for different content keys to be used (e.g. in live services that change the content keys periodically) and to detect content keys becoming unavailable (e.g. due to expiration of access rights).

This chapter defines the recommended DASH client workflows for interacting with DRM systems in these aspects.

12.7.1. Capability detection

A DRM system implemented by a client platform may only support playback of encrypted content that matches certain parameters (e.g. codec type and level). A DASH client needs to detect what capabilities each DRM system has in order to understand what adaptation sets can be presented and to make an informed choice when multiple DRM systems can be used.

A typical DRM system might offer the following set of capabilities:

A typical media platform API such as EME [encrypted-media] will require the DASH client to query the platform by supplying a desired capability set. The media platform will inspect the desired capabilities, possibly displaying a permissions prompt to the user (if sensitive capabilities such as unique user identification are requested), after which it will return a supported capability set that indicates which of the desired capabilities are available.

The DASH client presents a set of desired capabilities for each DRM system and receives a response with the supported subset.
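On the web platform, this query maps to navigator.requestMediaKeySystemAccess() [encrypted-media]. The sketch below is non-normative; the key system name, codecs strings and robustness value are examples for one particular DRM system:

// Query the media platform for a desired capability set; the returned
// MediaKeySystemAccess describes the supported subset.
async function queryCapabilities(): Promise<MediaKeySystemAccess> {
    return navigator.requestMediaKeySystemAccess("com.widevine.alpha", [{
        initDataTypes: ["cenc"],
        videoCapabilities: [{
            contentType: 'video/mp4; codecs="avc1.640028"',
            robustness: "SW_SECURE_DECODE", // desired robustness level
        }],
        audioCapabilities: [{
            contentType: 'audio/mp4; codecs="mp4a.40.2"',
        }],
    }]);
}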

The exact set of capabilities that can be used and the data format used to express them in capability detection APIs are defined by the media platform API. A DASH client is expected to have a full understanding of the potentially offered capabilities and how they map to parameters in the MPD. Some capabilities may have no relation to the MPD and whether they are required depends entirely on the DASH client or solution-specific logic and configuration.

To detect the set of supported capabilities, a DASH client must first determine the required capability set for each adaptation set. This is the set of capabilities required to present all the content in a single adaptation set and can be determined based on the following:

  1. Content characteristics defined in the MPD (e.g. codecs strings of the representations and the used protection scheme).

  2. Solution-specific logic and configuration (e.g. what robustness level is required).

Querying for the support of different protection schemes is currently not possible via the capability detection API of Encrypted Media Extensions [encrypted-media]. To determine the supported protection schemes, a DASH client must make assumptions about what the CDM supports. A bug is open on W3C EME and a pull request exists for the ISOBMFF file format bytestream; in future versions of EME, such querying may become possible.

Some of the capabilities (e.g. required robustness level) are DRM system specific. The required capability set contains the values for all DRM systems.

During DRM system selection, the required capability set of each adaptation set is compared with the supported capability set of a DRM system. As a result of this, each candidate DRM system is associated with zero or more adaptation sets that can be successfully presented using that DRM system.

It is possible that multiple DRM systems have the capabilities required to present some or all of the adaptation sets. When multiple candidates exist, the DASH client SHOULD enable solution-specific logic and configuration to make the final decision.

Note: Some sensible default behavior can be implemented in a generic way (e.g. the DRM system should be able to enable playback of both audio and video if both media types are present in the MPD). Still, there exist scenarios where the choices seem equivalent to the DASH client and an arbitrary choice needs to be made.

The workflows defined in this document contain the necessary extension points to allow DASH clients to exhibit sensible default behavior and enable solution-specific logic and configuration to drive the choices in an optimal direction.

12.7.2. Selecting the DRM system

The MPD describes the protection scheme used to encrypt content, with the default_KID values identifying the content keys required for playback, and optionally provides the default DRM system configuration for one or more DRM systems via ContentProtection descriptors. It also identifies the codecs used by each representation, enabling a DASH client to determine the set of required DRM system capabilities.

Neither an initialization segment nor a media segment is required to select a DRM system. The MPD is the only component of the presentation used for DRM system selection.

An adaptation set encrypted with a key identified by 34e5db32-8625-47cd-ba06-68fca0655a72 using the cenc protection scheme.
<AdaptationSet>
    <ContentProtection
        schemeIdUri="urn:mpeg:dash:mp4protection:2011"
        value="cenc"
        cenc:default_KID="34e5db32-8625-47cd-ba06-68fca0655a72" />
    <ContentProtection
        schemeIdUri="urn:uuid:d0ee2730-09b5-459f-8452-200e52b37567"
        value="FirstDrm 2.0">
        <cenc:pssh>YmFzZTY0IGVuY29kZWQgY29udGVudHMgb2YgkXBzc2iSIGJveCB3aXRoIHRoaXMgU3lzdGVtSUQ=</cenc:pssh>
        <dashif:authzurl>https://example.com/tenants/5341/authorize?mode=firstDRM</dashif:authzurl>
        <dashif:authzurl>https://alternative.example.com/tenants/5341/authorize?mode=firstDRM</dashif:authzurl>
        <dashif:laurl>https://example.com/AcquireLicense</dashif:laurl>
        <dashif:laurl>https://alternative.example.com/AcquireLicense</dashif:laurl>
    </ContentProtection>
    <ContentProtection
        schemeIdUri="urn:uuid:eb3841cf-d7e4-4ec4-a3c5-a8b7f9f4f55b"
        value="SecondDrm 8.0">
        <cenc:pssh>ZXQgb2YgcGxheWFibGUgYWRhcHRhdGlvbiBzZXRzIG1heSBjaGFuZ2Ugb3ZlciB0aW1lIChlLmcuIGR1ZSB0byBsaWNlbnNlIGV4cGlyYXRpb24gb3IgZHVl</cenc:pssh>
        <dashif:authzurl>https://example.com/tenants/5341/authorize?mode=secondDRM</dashif:authzurl>
    </ContentProtection>
    <Representation mimeType="video/mp4" codecs="avc1.64001f" width="640" height="360" />
    <Representation mimeType="video/mp4" codecs="avc1.640028" width="852" height="480" />
</AdaptationSet>

The MPD provides DRM system configuration for DRM systems:

There are two encrypted representations in the adaptation set, each with a different codecs string. Both codecs strings are included in the required capability set of this adaptation set. A DRM system must support playback of both representations in order to present this adaptation set.

In addition to the MPD, a DASH client can use solution-specific logic and configuration for controlling DRM selection and configuration decisions (e.g. loading license server URLs from configuration data instead of the MPD). This is often implemented in the form of callbacks exposed by the DASH client to an "app" layer in which the client is hosted. It is assumed that when executing any such callbacks, a DASH client makes available relevant contextual data, allowing the business logic to make fully informed decisions.

The purpose of the DRM system selection workflow is to select a single DRM system that is capable of decrypting a meaningful subset of the adaptation sets selected for playback. The selected DRM system will meet the following criteria:

  1. It is actually implemented by the media platform.

  2. It supports a set of capabilities sufficient to present an acceptable set of adaptation sets.

  3. The necessary DRM system configuration for this DRM system is available.

It may be that the selected DRM system is only able to decrypt a subset of the encrypted adaptation sets selected for playback. See also § 12.7.3.1 Handling unavailability of content keys.

The set of adaptation sets considered during selection does not need to be constrained to a single period, potentially enabling seamless transitions to a new period with a different set of content keys.

In live services new periods may be added over time, with potentially different DRM system configuration and required capability sets, making it necessary to re-execute the selection process.

Note: If a new period has significantly different requirements in terms of DRM system configuration or the required capability sets, the media pipeline may need to be re-initialized to play the new period. This may result in a glitch/pause at the period boundary. The specifics are implementation-dependent.

The default DRM system configuration in the MPD of a live service can change over time. DASH clients are not expected to re-execute DRM workflows if the default DRM system configuration in the MPD changes for an adaptation set that has already been processed in the past. Such changes will only affect clients that are starting playback.

When encrypted adaptation sets are initially selected for playback or when the selected set of encrypted adaptation sets changes (e.g. because a new period was added to a live service), a DASH client SHOULD execute the following algorithm for DRM system selection:

  1. Let adaptation_sets be the set of encrypted adaptation sets selected for playback.

  2. Let signaled_system_ids be the set of DRM system IDs for which a ContentProtection descriptor is present in the MPD on any entries in adaptation_sets.

  3. Let candidate_system_ids be an ordered list initialized with items of signaled_system_ids in any order.

  4. Provide candidate_system_ids to solution-specific logic and configuration for inspection/modification.

    • This enables business logic to establish an order of preference where multiple DRM systems are present.

    • This enables business logic to filter out DRM systems known to be unsuitable.

    • This enables business logic to include DRM systems not signaled in the MPD.

  5. Let default_kids be the set of all distinct default_KID values in adaptation_sets.

  6. Let system_configurations be an empty map of system ID -> map(default_kid -> configuration), representing the DRM system configuration of each default_KID for each DRM system.

  7. For each system_id in candidate_system_ids:

    1. Let configurations be a map of default_kid -> configuration where the keys are default_kids and the values are the DRM system configurations initialized with data from ContentProtection descriptors in the MPD (matching on default_KID and system_id).

    2. Provide configurations to solution-specific logic and configuration for inspection and modification, passing system_id along as contextual information.

      • This enables business logic to override the default DRM system configuration provided by the MPD.

      • This enables business logic to inject values that were not embedded in the MPD.

      • This enables business logic to reject content keys that it knows cannot be used, by removing the DRM system configuration for them.

    3. Remove any entries from configurations that do not contain all of the data required to use the DRM system; at a minimum this means a license server URL and initialization data in a format accepted by the DRM system.

    4. Add configurations to system_configurations (keyed on system_id).

  8. Remove from candidate_system_ids any entries for which the map of DRM system configurations in system_configurations is empty.

  9. Let required_capability_sets be a map of adaptation set -> capability set, providing the required capability set of every item in adaptation_sets.

  10. Match the capabilities of DRM systems with the required capability sets of adaptation sets:

    1. Let supported_adaptation_sets be an empty map of system ID -> list of adaptation set, indicating which adaptation sets are supported by which DRM systems.

    2. For each system_id in candidate_system_ids:

      1. Let candidate_adaptation_sets be the set of adaptation sets for which system_configurations contains DRM system configuration (keyed on system_id and then the default_KID of the adaptation set).

      2. Let maximum_capability_set be the union of all values in required_capability_sets keyed on items of candidate_adaptation_sets.

      3. Query the DRM system identified by system_id with the capability set maximum_capability_set, assigning the output to supported_capability_set.

        • A DRM system that is not implemented is treated as having no capabilities.

      4. For each adaptation_set in candidate_adaptation_sets:

        1. If supported_capability_set contains all the capabilities in the corresponding entry in required_capability_sets (keyed on adaptation_set), add adaptation_set to the list in supported_adaptation_sets (keyed on system_id).

  11. Remove from supported_adaptation_sets any entries for which the value (the set of adaptation sets) meets any of the following criteria:

    • The set is empty (the DRM system does not support playback of any adaptation set).

    • The set does not contain all encrypted media types present in the MPD (e.g. the DRM system can decrypt only the audio content but not the video content).

  12. If supported_adaptation_sets is empty, playback of encrypted content is not possible and the workflow ends.

  13. If supported_adaptation_sets contains multiple items, request solution-specific logic and configuration to select the preferred DRM system from among them.

  14. If solution-specific logic and configuration does not make a decision, find the first entry in candidate_system_ids that is among the keys of supported_adaptation_sets. Remove items with any other key from supported_adaptation_sets.

    • This falls back to the "order of preference" logic and takes care of scenarios where business logic did not make an explicit choice.

  15. Let selected_system_id be the single remaining key in supported_adaptation_sets.

  16. Let final_adaptation_sets be the single remaining value in supported_adaptation_sets.

  17. Let final_configurations (map of default_KID -> DRM system configuration) be the value from system_configurations keyed on selected_system_id.

  18. Remove from final_configurations any entries keyed on default_KID values that are not used by any adaptation set in final_adaptation_sets.

    • These are the configurations of adaptation sets for which configuration was present but for which the required capabilities were not offered by the DRM system.

  19. Prohibit playback of any encrypted adaptation sets that are not in final_adaptation_sets.

  20. Execute the DRM system activation workflow, providing selected_system_id and final_configurations as inputs.

If a DRM system is successfully selected, activation and potentially one or more license requests will follow before playback can proceed. These related workflows are described in the next chapters.
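To make the data flow concrete, the following is a condensed sketch of the core of this algorithm. It is illustrative only: types are simplified, the solution-specific decision points are omitted (falling back to the order-of-preference rule of step 14), and queryCapabilities() stands in for a platform capability query such as the one provided by navigator.requestMediaKeySystemAccess().

type Capability = string;

interface EncryptedAdaptationSet {
  defaultKid: string;
  mediaType: string; // "audio" or "video"
  requiredCapabilities: Set<Capability>;
}

async function selectDrmSystem(
  candidateSystemIds: string[], // already ordered by preference (steps 3-4)
  adaptationSets: EncryptedAdaptationSet[],
  // systemId -> (default_KID -> DRM system configuration), as built in step 7
  configurations: Map<string, Map<string, unknown>>,
  queryCapabilities: (systemId: string, wanted: Set<Capability>) => Promise<Set<Capability>>,
  encryptedMediaTypes: Set<string>,
): Promise<{ systemId: string; supported: EncryptedAdaptationSet[] } | null> {
  for (const systemId of candidateSystemIds) {
    const configs = configurations.get(systemId);
    if (!configs || configs.size === 0) continue; // step 8: no usable configuration

    // Step 10.2.1: adaptation sets whose default_KID has configuration for this system.
    const candidates = adaptationSets.filter(a => configs.has(a.defaultKid));

    // Step 10.2.2: maximum_capability_set is the union of all required capabilities.
    const wanted = new Set<Capability>();
    for (const a of candidates) a.requiredCapabilities.forEach(c => wanted.add(c));

    // Step 10.2.3: an unimplemented DRM system reports no capabilities.
    const supportedCaps = await queryCapabilities(systemId, wanted);

    // Step 10.2.4: keep only adaptation sets whose requirements are fully met.
    const supported = candidates.filter(a =>
      [...a.requiredCapabilities].every(c => supportedCaps.has(c)));

    // Step 11: reject systems that cannot present every encrypted media type.
    const presentTypes = new Set(supported.map(a => a.mediaType));
    if (supported.length === 0) continue;
    if ([...encryptedMediaTypes].some(t => !presentTypes.has(t))) continue;

    return { systemId, supported }; // steps 14-16: first viable entry in preference order
  }
  return null; // step 12: playback of encrypted content is not possible
}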

12.7.3. Activating the DRM system

Once a suitable DRM system has been selected, it must be activated by providing it with a list of content keys that the DASH client requests to be made available for content decryption, together with DRM system specific initialization data for each of the content keys. The result of activation is a DRM system that is ready to decrypt zero or more encrypted adaptation sets selected for playback.

During activation, it may be necessary to perform license requests in order to obtain some or all of the content keys and the usage policy that constrains their use. Some of the requested content keys may already be available to the DRM system, in which case no license request will be triggered.

Note: The details of stored content key management and persistent DRM session management are out of scope of this document - workflows described here simply accept the fact that some content keys may already be available, regardless of why that is the case or what operations are required to establish content key persistence.

Once a suitable DRM system has been selected, a DASH client SHOULD execute the following algorithm to activate it:

  1. Let configurations be the input to the algorithm; it is a map with the entry keys being default_KID values identifying the content keys and the entry values being the DRM system configuration to use with that particular content key.

  2. Let pending_license_requests be an empty set.

  3. For each kid and config pair in configurations, invoke the platform API to activate the selected DRM system and signal it to make kid available for decryption, passing the DRM system the initialization data stored in config.

    • If the DRM system indicates that one or more license requests are needed, add any license request data provided by the DRM system and/or platform API to pending_license_requests, together with the associated kid and config values.

  4. If pending_license_requests is not an empty set, execute the license request workflow and provide this set as input to the algorithm.

  5. Inspect the set of content keys the DRM system indicates are now available and deselect from playback any adaptation sets for which the content key has not become available.

  6. Inspect the set of remaining adaptation sets to determine whether a sufficient data set remains for successful playback. Raise an error if playback cannot continue.

The default format for initialization data supplied to a DRM system is a pssh box. However, if the DASH client has knowledge of any special initialization requirements of a particular DRM system, it MAY supply initialization data in other formats (e.g. the keyids JSON structure used by W3C Clear Key). Presence of initialization data in the expected format is considered during DRM system selection when determining whether a DRM system is a valid candidate.
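For illustration, the W3C "keyids" initialization data format is a JSON object that simply lists the requested key IDs in base64url encoding (the key ID below is a placeholder):

{ "kids": ["LwVHf8JLtPrv2GUXFW2v_A"] }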

For historical reasons, platform APIs often implement DRM system activation as a per-content-key operation. Some APIs and DRM system implementations may also support batching all the content keys into a single activation operation, for example by combining multiple "content key and DRM system configuration" data sets into a single data set in a single API call. DASH clients MAY make use of such batching where supported by the platform API. The workflow in this chapter describes the most basic scenario where activation must be performed separately for each content key.

Note: The batching may, for example, be accomplished by concatenating all the pssh boxes for the different content keys. Support for this type of batching among DRM systems and platform APIs remains uncommon, despite the potential efficiency gains from reducing the number of license requests triggered.
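On media platforms implementing [encrypted-media], the per-key activation loop could be sketched as follows. This is a minimal illustration, assuming "cenc" initialization data and temporary sessions; error handling, session bookkeeping and the license request workflow itself (see § 12.7.4) are omitted, and the codecs string is an arbitrary example.

// Sketch of per-content-key DRM system activation on top of W3C EME.
async function activateDrmSystem(
  keySystem: string, // e.g. "org.w3.clearkey"
  configurations: Map<string, Uint8Array>, // default_KID -> pssh initialization data
  onLicenseRequest: (kid: string, session: MediaKeySession, body: ArrayBuffer) => void,
): Promise<MediaKeys> {
  const access = await navigator.requestMediaKeySystemAccess(keySystem, [{
    initDataTypes: ["cenc"],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.640028"' }],
  }]);
  const mediaKeys = await access.createMediaKeys();

  for (const [kid, initData] of configurations) {
    const session = mediaKeys.createSession("temporary");
    // The DRM system emits license request bodies via "message" events; these
    // are collected into pending_license_requests (step 3 of the algorithm above).
    session.addEventListener("message", event =>
      onLicenseRequest(kid, session, (event as MediaKeyMessageEvent).message));
    await session.generateRequest("cenc", initData);
  }
  return mediaKeys;
}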

12.7.3.1. Handling unavailability of content keys

It is possible that not all of the encrypted adaptation sets selected for playback can actually be played back (e.g. because a content key for ultra-HD content is only authorized for use by implementations with a high robustness level). The unavailability of one or more content keys SHOULD NOT be considered a fatal error condition as long as at least one audio and at least one video adaptation set remains available for playback (assuming both content types are initially selected for playback). This logic MAY be overridden by solution-specific business logic to better reflect end-user expectations.

The set of available content keys can change over time (e.g. due to license expiration or due to new periods in the presentation requiring different content keys). A DASH client SHALL monitor the set of default_KID values that are required for playback and either request the DRM system to make these content keys available or deselect the affected adaptation sets when the content keys become unavailable. Conceptually, any such change can be handled by re-executing the DRM system selection and activation workflows, although platform APIs may also offer more fine-grained update capabilities.

A DASH client can request a DRM system to enable decryption using any set of content keys (if it has the necessary DRM system configuration). However, this is only a request and it can be countermanded at multiple stages of processing by the different entities involved.

The set of content keys made available for use can be far smaller than the set requested by a DASH client.

Example workflow indicating potential instances of content keys being removed from scope.

The set of available content keys is only known at the end of executing the activation workflow and may decrease over time (e.g. due to license expiration). The proper handling of unavailable keys depends on the limitations imposed by the platform APIs.

Media platform APIs often refuse to start or continue playback if the DRM system is not able to decrypt all the data already in media platform buffers.

It may be appropriate for a DASH client to avoid buffering data for encrypted adaptation sets until the required content key is known to be available. This allows the client to avoid potentially expensive buffer resets and rebuffering if unusable data needs to be removed from buffers.

Note: The DASH client should still download the data into intermediate buffers for faster startup and simply defer submitting it to the media platform API until key availability is confirmed.

If a content key expires during playback, it is common for a media platform to pause playback until the content key can be refreshed with a new license or until data encrypted with the now-unusable content key is removed from buffers. DASH clients SHOULD acquire new licenses in advance of license expiration. Alternatively, DASH clients should implement appropriate recovery/fallback behavior to ensure a minimally disrupted user experience in situations where some content keys remain available.

12.7.3.2. Content protection policies

When content keys are acquired, the license that delivers them also supplies a policy for the DRM system, instructing it how to protect the content that is made accessible by the content keys.

Protection policy may define the following example requirements:

Typical DRM systems will enforce the most restrictive protection policy from among all active content keys and will refuse to start playback if any of the constraints cannot be satisfied! As a result, even if only the constraints for a UHD video stream cannot be satisfied, playback of the lower quality levels may also be blocked.

In many cases, it might be more desirable to instead exclude the UHD quality level from the set of adaptation sets selected for playback and DRM system activation. Alternatively, there may be a different DRM system implementation available on the device that is capable of satisfying the constraints. It is not possible for a DASH client to resolve these constraints as it has no knowledge of what policy applies nor of the capabilities of the different DRM system implementations.

Solution-specific logic and configuration SHOULD be used to select the most suitable DRM system, taking into consideration the protection policy, and to preemptively exclude adaptation sets from playback if it can be foreseen that the protection policy for their content keys cannot be satisfied. Likewise, license servers SHOULD NOT provide content keys if it can be foreseen that the recipient will be unable to satisfy their protection policy.

12.7.4. Performing license requests

DASH clients performing license requests SHOULD follow the DASH-IF interoperable license request model. The remainder of this chapter only applies to DASH clients that follow this model. Alternative implementations are possible and in common use but are not interoperable and are not described in this document.

DRM systems generally do not perform license requests on their own. Rather, when they determine that a license is required, they generate a document that serves as the license request body and expect the DASH client to deliver it to a license server for processing. The latter returns a suitable response that, if a license is granted, encapsulates the content keys in an encrypted form readable only by the DRM system.

Simplified conceptual model of license request processing. Many details omitted.

The request and response body are in DRM system specific formats and considered opaque to the DASH client. A DASH client SHALL NOT modify the request body or the response body.

The license request workflow defined here exists to enable the following goals to be achieved without the need to customize the DASH client with logic specific to a DRM system or license server implementation:

  1. Provide proof of authorization if the license server requires the DASH client to prove that the user being served has the rights to use the requested content keys.

  2. Execute the license request workflow driven purely by the MPD, without any need for solution-specific logic and configuration.

  3. Detect common error scenarios and present an understandable message to the user.

The proof of authorization is optional and the need to attach it to a license request is indicated by the presence of at least one dashif:authzurl in the DRM system configuration. The proof of authorization is a JSON Web Token in compact encoding (the aaa.bbb.ccc form) returned as the HTTP response body when the DASH client performs a GET request to this URL. The token is attached to a license request in the HTTP Authorization header with the Bearer type. For details, see § 12.6 DASH-IF interoperable license request model.
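As the token is a JWT in compact form, a DASH client can read its exp claim without verifying the signature, for example to drive the cache expiration logic described below. A minimal sketch, assuming a browser environment providing atob():

// Extract the "exp" (Expiration Time) claim from a compact-form JWT (aaa.bbb.ccc).
// Returns undefined if the claim is absent, in which case the token never expires.
function jwtExpirationTime(token: string): Date | undefined {
  const payload = token.split(".")[1];
  // base64url -> base64; JWT payloads are JSON objects.
  const json = atob(payload.replace(/-/g, "+").replace(/_/g, "/"));
  const exp = JSON.parse(json).exp as number | undefined;
  return exp === undefined ? undefined : new Date(exp * 1000);
}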

Error responses from both the authorization service and the license server SHOULD be returned as [rfc7807] compatible responses with a 4xx or 5xx status code and Content-Type: application/problem+json.

DASH clients SHOULD implement retry behavior to recover from transient failures and expiration of authorization tokens.
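What constitutes appropriate retry behavior is implementation-defined; a capped exponential backoff with jitter is one common choice, sketched below (the attempt counts and delays are arbitrary examples):

// Retry an asynchronous operation with capped exponential backoff and jitter.
// Only failures believed to be transient (e.g. 5xx responses) should be retried.
async function withRetries<T>(operation: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      if (attempt >= maxAttempts) throw error;
      const delayMs = Math.min(8000, 500 * 2 ** attempt) * Math.random();
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
}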

To process license requests queued during execution of the DRM system activation workflow, the client SHOULD execute the following algorithm:

  1. Let pending_license_requests be the set of license requests that the DRM system has requested to be performed, with at least the following data present in each entry:

    • The license request body provided by the DRM system and/or platform API.

    • The default_KID of the content key that triggered the license request.

    • The DRM system configuration associated with that default_KID.

  2. Let retry_requests be an empty set. It will contain the set of license requests that are to be retried due to transient failure.

  3. Let pending_authz_requests be a map of URL set -> GUID[], with the keys being sets of authorization service URLs and the values being lists of default_KIDs. The map is initially empty.

  4. For each request in pending_license_requests:

    1. If the DRM system configuration does not contain at least one value for dashif:authzurl, skip to the next loop iteration. This means that no authorization token is to be attached to this license request.

    2. Create/update the entry in pending_authz_requests with the key being the set of dashif:authzurl values; add the default_KID to the list in the map entry value.

  5. Let authz_tokens be a map of GUID -> string, with the keys being default_KIDs and the values being the associated authorization tokens. The map is initially empty.

  6. For each authz_url_set and kids pair in pending_authz_requests:

    1. If the DASH client has a cached authorization token previously acquired for the same authz_url_set and kids combination that still remains valid according to its exp "Expiration Time" claim:

      1. Let authz_token be the cached authorization token.

    2. Else:

      1. Let kids_list be a comma-separated list of the items of kids, in ascending alphanumeric (ASCII) order.

      2. Let authz_url be a random item from authz_url_set.

      3. Let authz_url_with_kids be authz_url with an additional query string parameter named kids whose value is kids_list.

        • authz_url may already include query string parameters, which should be preserved!

      4. Perform an HTTP GET request to authz_url_with_kids (following redirects).

        • Include any relevant HTTP cookies.

        • Allow solution-specific logic and configuration to intercept the request and inspect/modify it as needed (e.g. provide additional HTTP request headers to enable user identification).

      5. If the response status code indicates failure, make a note of any error information for later processing and skip to the next authz_url.

      6. Let authz_token be the HTTP response body.

      7. Submit authz_token into the DASH client cache, with the cache key being a combination of authz_url_set and kids, and the cache entry expiration being defined by the exp "Expiration Time" claim in the authorization token (defaulting to never expires).

    3. For each kid in kids, add an entry to authz_tokens with the key kid and the value being authz_token.

  7. For each request in pending_license_requests:

    1. If the DRM system configuration from request contains an authorization service URL but there is no entry in authz_tokens keyed on the default_KID from request, skip to the next loop iteration.

      • This occurs when an authorization token is required but cannot be obtained for this license request.

    2. Execute an HTTP POST request with the following parameters:

      • Request body is the license request body from request.

      • Request URL is defined by DRM system configuration. If multiple license server URLs are defined, select a random URL from the set.

      • If authz_tokens contains an entry with the key being the default_KID from request, add the Authorization header with the value being the string Bearer concatenated with a space and the authorization token from authz_tokens (e.g. Bearer aaa.bbb.ccc).

    3. If the response status code indicates failure:

      1. Expel the used authorization token (if any) from the DASH client cache to force a new token to be used for any future license requests.

      2. If the DASH client believes that retrying the license request might succeed (e.g. because the response indicates that the error might be transient or due to an expired authorization token that can be renewed), add request to retry_requests.

      3. Make a note of any error information for later processing and presentation to the user.

      4. Skip to the next loop iteration.

    4. Submit the HTTP response body to the DRM system for processing.

      • This may cause the DRM system to trigger additional license requests. Append any triggered request to pending_license_requests and copy the DRM system configuration from the current entry, processing the additional entry in a future iteration of the same loop.

      • If the DRM system indicates a failure to process the data, make a note of any error information for later processing and skip to the next loop iteration.

  8. If retry_requests is not empty, re-execute this workflow with retry_requests as the input.

While the above algorithm is presented sequentially, authorization requests and license requests may be performed in a parallelized manner to minimize processing time.

At the end of this algorithm, all pending license requests have been performed. However, it is not necessary that all license requests or authorization requests succeed! For example, even if one of the requests needed to obtain an HD quality level content key fails, other requests may still make SD quality level content keys available, leading to a successful playback if the HD quality level is deselected by the DASH client. Individual failing requests therefore do not indicate a fatal error. Rather, such error information should be collected and provided to the top-level error handler of the DRM system activation workflow, which can make use of this data to present user-friendly messages if it decides that meaningful playback cannot take place with the final set of available content keys. See also § 12.7.3.1 Handling unavailability of content keys.
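The core HTTP interactions of steps 6 and 7 — acquiring an authorization token and attaching it to a license request — could be sketched as follows. This is illustrative only, assuming a fetch()-capable environment; token caching, retries and solution-specific request interception are omitted.

// Step 6: acquire an authorization token for a set of default_KIDs.
async function fetchAuthorizationToken(authzUrls: string[], kids: string[]): Promise<string> {
  const url = new URL(authzUrls[Math.floor(Math.random() * authzUrls.length)]);
  // Comma-separated kids in ascending ASCII order; existing query parameters are preserved.
  url.searchParams.set("kids", [...kids].sort().join(","));
  const response = await fetch(url, { credentials: "include" }); // include relevant cookies
  if (!response.ok) throw new Error(`authorization failed: HTTP ${response.status}`);
  return response.text(); // the JWT in compact form (aaa.bbb.ccc)
}

// Step 7: perform one license request, passing the opaque body through unmodified.
async function performLicenseRequest(
  licenseServerUrl: string,
  requestBody: ArrayBuffer,
  authzToken?: string,
): Promise<ArrayBuffer> {
  const headers: Record<string, string> = {};
  if (authzToken) headers["Authorization"] = `Bearer ${authzToken}`;
  const response = await fetch(licenseServerUrl, { method: "POST", headers, body: requestBody });
  if (!response.ok) throw new Error(`license request failed: HTTP ${response.status}`);
  return response.arrayBuffer(); // submitted to the DRM system as-is
}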

12.7.4.1. Efficient license acquisition

In some situations a DASH client can foresee the need to make new content keys available for use or to renew the licenses that enable content keys to be used. For example:

DASH clients SHOULD perform license acquisition ahead of time, activating a DRM system before it is needed or renewing licenses before they expire. This provides the following benefits:

To avoid a huge number of concurrent license requests causing license server overload, a DASH client SHOULD perform a license request at a randomly selected time between the moment when it became aware of the need for the license request and the time when the license must be provided to a DRM system (minus some safety margin).
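For example, the request moment could be chosen as follows (the safety margin is an arbitrary illustrative value):

// Pick a random moment between now and the deadline (minus a safety margin)
// at which to perform an ahead-of-time license request.
function scheduleLicenseRequest(deadline: Date, safetyMarginMs = 30_000): Date {
  const now = Date.now();
  const latest = Math.max(now, deadline.getTime() - safetyMarginMs);
  return new Date(now + Math.random() * (latest - now));
}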

Multiple license requests to the same license server with the same authorization token SHOULD be batched into a single request if the media platform API supports this. See § 12.7.3 Activating the DRM system for details.

Whether ahead-of-time DRM system activation, seamless license renewal and license request batching are possible depends on the specific DRM system and media platform implementations. Some implementations might not support the optimal behavior.

12.8. Periodic re-authorization

In a live DASH presentation the rights of the user can be different for different programs included in the presentation. This chapter describes recommended mechanisms for forcing rights to be re-evaluated at program boundaries.

The user’s level of access to content is governed by the issuance (or not) of licenses with content keys and the policy configuration carried by the licenses. The license server is the authority on what rights are assigned to the user. To force re-evaluation of rights, a service must force a new license request to be made. This can be accomplished by:

  1. Defining an expiration time on the license.

  2. Changing the content key to one that is not yet available to DASH clients, thereby triggering DRM system activation for the new content key.

Not every DRM system supports real-time license expiration - some widely used implementations only check license validity at activation time. Therefore the latter option is a more universally applicable method to force re-evaluation of access rights. As changing the content key is only possible on DASH period boundaries, live DASH presentations SHOULD create a new period in which content is encrypted with new content keys to force re-evaluation of user’s access rights.

Note: Changing the content keys does not increase the cryptographic security of content protection. The term periodic re-authorization is therefore used here instead of key rotation, to maintain focus on the goal and not the mechanism.

12.9. Controlling access rights with a key hierarchy

Using a key hierarchy allows a single content key to selectively unlock only a subset of a DASH presentation and apply license policy updates without the need to perform license requests at every program boundary. This mechanism is a specialization of periodic re-authorization for scenarios where license requests at program boundaries are not always desirable or possible.

A key hierarchy establishes a DRM system specific relationship between a root key and a set of leaf keys.

A key hierarchy defines a multi-level structure of cryptographic keys, instead of a single content key:

A root key might not be an actual cryptographic key. Rather, it acts as a reference to identify the set of leaf keys that protect content. A DASH client requesting a license for a specific root key will be interpreted as requesting a license that makes available all the leaf keys associated with that root key.

Note: Intermediate layers of cryptographic keys may also exist between root keys and leaf keys but such layers are DRM system specific and only processed by the DRM system, being transparent to the DASH client and the media platform. To a DASH client, only the root keys have meaning. To the media platform, only the leaf keys have meaning.

This layering enables the user’s rights to content to be evaluated in two ways:

  1. Changing the root key invokes the full re-evaluation workflow as a new license request must be made by the DASH client.

  2. Changing the leaf key invokes an evaluation of the rights granted by the license for the root key and processing of any additional policy attached to the leaf key. If the result of this evaluation indicates that the leaf key cannot be used, the DRM system will signal playback failure to the DASH client.

Changing the root key is equivalent to changing the content key in terms of MPD signaling, requiring a new period to be started. The leaf key can be changed in any media segment and does not require modification of the MPD. Leaf keys SHOULD NOT be changed within the same program. Changing leaf keys on a regular basis does not increase cryptographic security.

Note: A DASH service with a key hierarchy is sometimes referred to as using "internal key rotation".

The mechanism by which a set of leaf keys is made available based on a request for a root key is DRM system specific. Nevertheless, different DRM systems may be interoperable as long as they can each make available the required set of leaf keys using their system-specific mechanisms, using the same root key as the identifier for the same set of leaf keys.

When using a key hierarchy, the leaf keys are typically delivered in-band in the media segments, using moof/pssh boxes, together with additional/updated license policy constraints. The exact implementation is DRM system specific and transparent to a DASH client.

Different rows indicate root key changes. Color alternations indicate leaf key changes. A key hierarchy enables per-program access control even in scenarios where a license request is only performed once per day. The single license request makes available all the leaf keys that the user is authorized to use during the next epoch.

A key hierarchy is useful for broadcast scenarios where license requests are not possible at arbitrary times (e.g. when the system operates by performing nightly license updates). In such a scenario, this mechanism enables user access rights to be cryptographically enforced at program boundaries, defined on the fly by the service provider, while re-evaluating the access rights during moments when license requests are possible. At the same time, it enables the service provider to supply in-band updates to license policy (when supported by the DRM system).

Similar functionality could be implemented without a key hierarchy by using a separate content key for each program and acquiring all relevant licenses in advance. The advantages of a key hierarchy are:

12.10. Use of W3C Clear Key with DASH

Clear Key is a DRM system defined by W3C in [encrypted-media]. It is intended primarily for client and media platform development/test purposes and does not perform the content protection and content key protection duties ordinarily expected from a DRM system. Nevertheless, in DASH client DRM workflows, it is equivalent to a real DRM system.

A DRM system specific ContentProtection descriptor for Clear Key SHALL use the system ID e2719d58-a985-b3c9-781a-b030af78d30e and value="ClearKey1.0".

The dashif:laurl element SHOULD be used to indicate the license server URL. Legacy content MAY also use an equivalent Laurl element from the http://dashif.org/guidelines/clearKey namespace, as this was defined in previous versions of this document (the definition is now expanded to also cover non-clearkey scenarios). Clients SHOULD process the legacy element if it exists and dashif:laurl does not.

The license request and response format is defined in [encrypted-media].

W3C describes the use of the system ID 1077efec-c0b2-4d02-ace3-3c1e52e2fb4b in [eme-initdata-cenc] section 4 to indicate that tracks are encrypted with Common Encryption. However, the presence of this "common" pssh box does not imply that Clear Key is to be used for decryption. DASH clients SHALL NOT interpret a pssh box with the system ID 1077efec-c0b2-4d02-ace3-3c1e52e2fb4b as an indication that the Clear Key mechanism is to be used (nor as an indication of anything else beyond the use of Common Encryption).

An example of a Clear Key ContentProtection descriptor using laurl is as follows.

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" xmlns:dashif="https://dashif.org/">
	<Period>
		<AdaptationSet>
			<ContentProtection schemeIdUri="urn:uuid:e2719d58-a985-b3c9-781a-b030af78d30e" value="ClearKey1.0">
				 <dashif:laurl>https://clearKeyServer.foocompany.com</dashif:laurl>
				 <dashif:laurl>file://cache/licenseInfo.txt</dashif:laurl>
			</ContentProtection>
		</AdaptationSet>
	</Period>
</MPD>

Parts of the MPD structure that are not relevant for this chapter have been omitted - this is not a fully functional MPD file.

12.11. XML Schema for DASH-IF MPD extensions

The namespace for the DASH-IF MPD extensions is https://dashif.org/. This document refers to this namespace using the dashif prefix. The XML schema of the extensions is:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:dashif="https://dashif.org/"
    targetNamespace="https://dashif.org/">

    <xs:element name="laurl" type="xs:anyURI"/>
    <xs:element name="authzurl" type="xs:anyURI"/>
</xs:schema>

13. Annex B

Merge Annex B from 4.3 to live services chapter (if not already duplicated).

14. Annex: Dolby Vision streams within ISO BMFF

Where is this used? Why is it an annex? Consider restructuring to improve usefulness.

This Annex defines the structures for the storage of Dolby Vision video streams in a file format compliant with the ISO base media file format (ISOBMFF). Example file formats derived from the ISOBMFF include the Digital Entertainment Content Ecosystem (DECE) Common File Format (CFF) and the Protected Interoperable File Format (PIFF). Note that the file format defined here is intended to be potentially compliant with the DECE media specifications as appropriate.

The Dolby Vision decoder configuration record provides the configuration information that is required to initialize the Dolby Vision decoder.

The Dolby Vision Configuration Box contains the following information:

Box Type: ‘dvcC’

Container: DolbyVisionHEVCSampleEntry (‘dvhe’) or DolbyVisionHVC1SampleEntry (‘dvh1’)

Mandatory: Yes

Quantity: Exactly one

The syntaxes of the Dolby Vision Configuration Box and decoder configuration record are described below.

align(8) class DOVIDecoderConfigurationRecord
{
    unsigned int (8) dv_version_major;
    unsigned int (8) dv_version_minor;
    unsigned int (7) dv_profile;
    unsigned int (6) dv_level;
    bit (1) dv_metadata_present_flag;
    bit (1) el_present_flag;
    bit (1) bl_present_flag;
    const unsigned int (32)[5] reserved = 0;
}

class DOVIConfigurationBox extends Box(‘dvcC’)
{
    DOVIDecoderConfigurationRecord() DOVIConfig;
}

The semantics of the Dolby Vision decoder configuration record are described as follows.

dv_version_major - specifies the major version number of the Dolby Vision specification that the stream complies with. A stream compliant with this specification shall have the value 1.

dv_version_minor - specifies the minor version number of the Dolby Vision specification that the stream complies with. A stream compliant with this specification shall have the value 0.

dv_profile – specifies the Dolby Vision profile. Valid values are Profile IDs as defined in Table B.1 of Signaling Dolby Vision Profiles and Levels, Annex B.

dv_level – specifies the Dolby Vision level. Valid values are Level IDs as defined in Table B.2 of Signaling Dolby Vision Profiles and Levels, Annex B.

dv_metadata_present_flag – if 1 indicates that this track contains the supplemental enhancement information as defined in clause 10.4.2.2.

el_present_flag – if 1 indicates that this track contains the EL HEVC video substream.

bl_present_flag – if 1 indicates that this track contains the BL HEVC video substream.

Note: The settings for these semantic values are specified in Section A.7.1 Constraints on EL Track.

This section describes the Dolby Vision sample entries, which are used to describe tracks that contain substreams that cannot necessarily be decoded by HEVC-compliant decoders.

The Dolby Vision sample entries contain the following information:

Box Type: ‘dvhe’, ‘dvh1’

Container: Sample Description Box (‘stsd’)

Mandatory: Yes

Quantity: One or more sample entries of the same type may be present

The syntax for the Dolby Vision sample entries is described below.

class DolbyVisionHEVCSampleEntry() extends HEVCSampleEntry(‘dvhe’)
{
    DOVIConfigurationBox() config;
}

class DolbyVisionHVC1SampleEntry() extends HEVCSampleEntry(‘dvh1’)
{
    DOVIConfigurationBox() config;
}

A Dolby Vision HEVC sample entry shall contain a Dolby Vision Configuration Box as defined in C.2.2.

config - specifies the configuration information required to initialize the Dolby Vision decoder for a Dolby Vision EL track encoded in HEVC.

Compressorname in the base class VisualSampleEntry indicates the name of the compressor used, with the value “\013DOVI Coding” being recommended (\013 is 11, the length of the string “DOVI Coding” in bytes).

The brand ‘dby1’ SHOULD be used in the compatible_brands field to indicate that the file is compliant with the Dolby Vision UHD extensions as outlined in this document. The major_brand shall be set to an ISO-defined brand, e.g. ‘iso6’.

A Dolby Vision video stream can be encapsulated in a single file as a dual-track file containing separate BL and EL tracks. Each track has different sample descriptions.

For the visual sample entry box in an EL track, a DolbyVisionHEVCSampleEntry (dvhe) or DolbyVisionHVC1SampleEntry (dvh1) SHALL be used.

The visual sample entries SHALL contain an HEVC Configuration Box (hvcC) and a Dolby Vision Configuration Box (dvcC).

The EL track shall meet the following constraints:

The following table shows the box hierarchy of the EL track.

Sample table box hierarchy for the EL track of a dual-track Dolby Vision file

Note: This is not an exhaustive list of boxes.

For a dual-track file, the movie fragments carrying the BL and EL shall meet the following constraints:

The track fragment random access box (tfra) for the base and enhancement tracks shall conform to ISO/IEC 14496-12 (section 8.8.10) and meet the following additional constraint:

15. Annex: Signaling Dolby Vision profiles and levels

Where is this used? Why is it an annex? Consider restructuring to improve usefulness.

This Annex defines the detailed list of Dolby Vision profiles and levels and how to represent them in a string format. This string can be used for identifying Dolby Vision device capabilities and the type of Dolby Vision streams presented to a device through various delivery mechanisms such as HTML 5.0 and MPEG-DASH.

The Dolby Vision codec provides a rich feature set to support various ecosystems such as over-the-top (OTT) streaming, broadcast television and Blu-ray discs. The codec also supports many different device implementation types such as GPU-accelerated software implementations, full-fledged hardware implementations, and hardware-plus-software combinations. One of the Dolby Vision codec features allows choosing the type of backward compatibility, such as non-backward compatible or backward compatible with SDR. A Dolby Vision capable device may not have all the features or options implemented; hence it is critical that the device advertises its capabilities and the content server provides accurate Dolby Vision stream type information.

Following are the currently supported Dolby Vision profiles:

Dolby Vision profiles

Legend:

BL:EL

ratio of Base Layer resolution to Enhancement Layer resolution (when applicable)

BL/EL Full alignment

The Enhancement Layer (EL) GOP and Sub-GOP structures are fully aligned with the Base Layer (BL), i.e. the BL/EL IDRs are aligned and BL/EL frames are fully aligned in decode order, such that skipping or seeking is possible anywhere in the stream, not only at IDRs. A BL AU and an EL AU belonging to the same picture shall have the same POC (picture order count).

Encoder Recommendations

The following is the profile string naming convention: dv[BL codec type].[number of layers][bit depth][backward compatibility] [EL codec type][EL codec bit depth]

Components of a Dolby Vision @codecs string.

Notes:

  1. [EL codec type] and [EL codec bit depth] shall only be present if the EL codec type is different from the BL codec.

  2. Interlaced: There is no support for interlaced video at this time.

  3. Codecs other than HEVC or AVC may be supported in future.

The Dolby Vision level indicates the maximum frame rate and resolution supported by the device for a given profile. Typically there is a limit on the maximum number of pixels the device can process per second in a given profile; the level indicates the maximum pixel rate and the maximum bitrate supported in that profile. Since the maximum pixels per second is constant for a given level, the resolution can be reduced to achieve a higher frame rate and vice versa. Following are the possible levels:

Dolby Vision levels.

The following is the level string naming convention: [resolution][fps][high tier]

Components of a Dolby Vision level string.

It is recommended that the profile string and level string be joined in the following manner: [Profile String].[Level String]

Examples

dvav.per.fhd30

Dual-layer AVC, 8 bit, with enforcement of BL/EL GOP structure and POC alignment, Rec. 709 backward compatible, 1920x1080@30fps.

dvhe.stn.uhd30

Single-layer HEVC, 10 bit, non-backward compatible, 3840x2160@30fps.

The device capabilities can be expressed in many ways depending on the protocol used by the streaming service or VOD service. The device could maintain a list of supported capabilities in an array:

String[] capabilities = {"dvhe.dtr.uhd24", "dvhe.stn.uhd30"};

After receiving the manifest, the player could iterate over the stream types and check whether each stream type is supported by searching capabilities[], e.g. as sketched below.
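A minimal sketch of such a lookup (names and values are illustrative):

// Check each codecs string from the manifest against the device capability list.
const capabilities = ["dvhe.dtr.uhd24", "dvhe.stn.uhd30"];

function supportedStreams(manifestCodecs: string[]): string[] {
  return manifestCodecs.filter(codecs => capabilities.includes(codecs));
}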

When using HTTP, the device could send its capabilities via the User-Agent string of the HTTP request in the following manner:

Opera/9.80 (Linux armv71) Presto/2.12.407 Version/12.51 Model-UHD+dvhe.dtr.uhd24+dvhe.stn.uhd30/1.0.0 (Manufacturer name, Model)

A server program can search for +dv to determine whether Dolby Vision is supported, and can further identify the supported profiles and levels by parsing the characters following +dv. Multiple profile/level pairs can be listed, with + beginning each profile/level pair, e.g. as sketched below.
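A sketch of such server-side parsing (the pattern assumes the profile and level string shapes illustrated above):

// Extract Dolby Vision profile/level pairs (e.g. "dvhe.dtr.uhd24") from a
// User-Agent string in which each supported pair is prefixed with "+".
function dolbyVisionCapabilities(userAgent: string): string[] {
  return [...userAgent.matchAll(/\+(dv[a-z0-9]+\.[a-z0-9]+\.[a-z0-9]+)/g)]
    .map(match => match[1]);
}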

16. Annex: Display management message

Where is this used? Why is it an annex? Consider restructuring to improve usefulness.

A display management (DM) message contains metadata that provides dynamic information about the colour volume of the video signal. This metadata can be employed by the display to adapt the delivered HDR imagery to the capabilities of the display device. The information conveyed in this message is intended to be adequate for purposes corresponding to the use of SMPTE ST 2094-1 and ST 2094-10.

The syntax and semantics for DM_data() are defined in clause C.2.

The DM metadata is structured into the following syntax elements:

  • DM_data()

  • ext_dm_data_block()

  • ext_dm_data_block_payload()

This clause defines the semantics for DM_data().

For the purposes of the present clause, the following mathematical functions apply: Clip3(x, y, z) returns x if z < x, y if z > y, and z otherwise; Round(x) returns Sign(x) * Floor(Abs(x) + 0.5).

app_identifier

identifies an application in the ST 2094 suite.

app_version

specifies the application version in the application in the ST 2094 suite.

metadata_refresh_flag

when set equal to 1 cancels the persistence of any previous extended display mapping metadata in output order and indicates that extended display mapping metadata follows. The extended display mapping metadata persists from the coded picture to which the SEI message containing DM_data() is associated (inclusive) to the coded picture to which the next SEI message containing DM_data() and with metadata_refresh_flag set equal to 1 in output order is associated (exclusive) or (otherwise) to the last picture in the coded video sequence (inclusive). When set equal to 0 this flag indicates that the extended display mapping metadata does not follow.

num_ext_blocks

specifies the number of extended display mapping metadata blocks. The value shall be in the range of 1 to 254, inclusive.

dm_alignment_zero_bit

shall be equal to 0

ext_block_length[ i ]

is used to derive the size of the i-th extended display mapping metadata block payload in bytes. The value shall be in the range of 0 to 1023, inclusive.

ext_block_level[ i ]

specifies the level of payload contained in the i-th extended display mapping metadata block. The value shall be in the range of 0 to 255, inclusive. The corresponding extended display mapping metadata block types are defined in Table E.1.4. Values of ext_block_level[ i ] that are ATSC reserved shall not be present in bitstreams conforming to this version of the ATSC specification. Blocks using ATSC reserved values shall be ignored.

When the value of ext_block_level[ i ] is set equal to 1, the value of ext_block_length[ i ] shall be set equal to 5.

When the value of ext_block_level[ i ] is set equal to 2, the value of ext_block_length[ i ] shall be set equal to 11.

When the value of ext_block_level[ i ] is set equal to 5, the value of ext_block_length[ i ] shall be set equal to 7.

Definition of extended display mapping metadata block type.

When an extended display mapping metadata block with ext_block_level equal to 5 is present, the following constraints shall apply:

When the active area defined by the current extended display mapping metadata block with ext_block_level equal to 5 overlaps with the active area defined by preceding extended display mapping metadata blocks with ext_block_level equal to 5, all metadata of the extended display mapping metadata blocks with ext_block_level equal to 1 or 2 associated with the current extended display mapping metadata block with ext_block_level equal to 5 shall be applied to the pixel values of the overlapping area.

min_PQ specifies the minimum luminance value of the current picture in 12-bit PQ encoding. The value shall be in the range of 0 to 4095, inclusive. Note that the 12-bit min_PQ value with full range is calculated as follows:

min_PQ = Clip3(0, 4095, Round(Min * 4095))

where Min is MinimumPqencodedMaxrgb as defined in clause 6.1.3 of SMPTE ST 2094-10.

max_PQ specifies the maximum luminance value of current picture in 12-bit PQ encoding. The value shall be in the range of 0 to 4095, inclusive. Note that the 12-bit max_PQ value with full range is calculated as follows:

max_PQ = Clip3(0, 4095, Round(Max * 4095))

where Max is MaximumPqencodedMaxrgb as defined in clause 6.1.5 of SMPTE ST 2094-10.

avg_PQ specifies the midpoint luminance value of current picture in 12-bit PQ encoding. The value shall be in the range of 0 to 4095, inclusive. Note that the 12-bit avg_PQ value with full range is calculated as follows:

avg_PQ = Clip3(0, 4095, Round(Avg * 4095))

where Avg is AveragePqencodedMaxrgb as defined in section 6.1.4 of SMPTE ST 2094-10.

target_max_PQ specifies the maximum luminance value of a target display in 12-bit PQ encoding. The value shall be in the range of 0 to 4095, inclusive. The target_max_PQ is the PQ encoded value of TargetedSystemDisplayMaximumLuminance as defined in clause 10.4 of SMPTE ST 2094-1.

If there is more than one extended display mapping metadata block with ext_block_level equal to 2, those blocks shall not have duplicate target_max_PQ values.

trim_slope specifies the slope metadata. The value shall be in the range of 0 to 4095, inclusive. If trim_slope is not present, it shall be inferred to be 2048. Note that the 12-bit slope value is calculated as follows:

trim_slope = Clip3(0, 4095, Round((S - 0.5) * 4096))

where S is the ToneMappingGain as defined in clause 6.2.3 of SMPTE ST 2094-10.

trim_offset specifies the offset metadata. The value shall be in the range of 0 to 4095, inclusive. If trim_offset is not present, it shall be inferred to be 2048. Note that the 12-bit offset value is calculated as follows:

trim_offset = Clip3(0, 4095, Round((O + 0.5) * 4096))

where O is the ToneMappingOffset as defined in clause 6.2.2 of SMPTE ST 2094-10.

trim_power specifies the power metadata. The value shall be in the range of 0 to 4095, inclusive. If trim_power is not present, it shall be inferred to be 2048. Note that the 12-bit power value is calculated as follows:

trim_power = Clip3(0, 4095, Round((P - 0.5) * 4096))

where P is the ToneMappingGamma as defined in clause 6.2.4 of SMPTE ST 2094-10.

trim_chroma_weight specifies the chroma weight metadata. The value shall be in the range of 0 to 4095, inclusive. If trim_chroma_weight is not present, it shall be inferred to be 2048. Note that the 12-bit chroma weight value is calculated as follows:

trim_chroma_weight = Clip3(0, 4095, Round((CW + 0.5) * 4096))

where CW is the ChromaCompensationWeight as defined in clause 6.3.1 of SMPTE ST 2094-10.

trim_saturation_gain specifies the saturation gain metadata. The value shall be in the range of 0 to 4095, inclusive. If trim_saturation_gain is not present, it shall be inferred to be 2048. Note that the 12-bit saturation gain value is calculated as follows:

trim_saturation_gain = Clip3(0, 4095, Round((SG + 0.5) * 4096))

where SG is the SaturationGain as defined in clause 6.3.2 of SMPTE ST 2094-10.

ms_weight specifies the multiscale weight metadata. The value shall be in the range of -1 to 4095, inclusive. If ms_weight is not present, it shall be inferred to be 2048. Where ms_weight is equal to -1, the bit stream indicates ms_weight is unspecified. The 13-bit multiscale weight value is calculated as follows:

ms_weight = -1 OR Clip3(0, 4095, Round(MS * 4096))

where MS is the ToneDetailFactor as defined in clause 6.4.2 of SMPTE ST 2094-10.

active_area_left_offset, active_area_right_offset, active_area_top_offset, active_area_bottom_offset specify the active area of the current picture, as a rectangular region specified in picture coordinates. The values shall be in the range of 0 to 8191, inclusive. See also the UpperLeftCorner and LowerRightCorner definitions in ST 2094-1.

If active_area_left_offset, active_area_right_offset, active_area_top_offset, active_area_bottom_offset are not present, they shall be inferred to be 0.

The coordinates of the top left active pixel are derived as follows:

Xtop_left = active_area_left_offset

Ytop_left = active_area_top_offset

The coordinates of the top left active pixel are defined as the UpperLeftCorner in clause 9.2 of SMPTE ST 2094-1.

With Xsize being the horizontal resolution of the current picture and Ysize the vertical resolution of the current picture, the coordinates of the bottom right active pixel are derived as follows:

Xbottom_right = Xsize - 1 - active_area_right_offset

Ybottom_right = Ysize - 1 - active_area_bottom_offset

where Xbottom_right shall be greater than Xtop_left and Ybottom_right shall be greater than Ytop_left.

The coordinates of the bottom right active pixel are defined as the LowerRightCorner in clause 9.3 of SMPTE ST 2094-1.

ext_dm_alignment_zero_bit shall be equal to 0.

17. Annex: Composing metadata message

Where is this used? Why is it an annex? Consider restructuring to improve usefulness.

A composing metadata (CM) message contains the metadata needed to apply the post-processing process described in the ETSI ETCCM specification in order to recreate the HDR UHDTV pictures.

The syntax for CM_data() is shown in table D.1. The number of bits “v” used to represent each of the syntax elements of CM_data(), for which the parsing process is specified by the descriptor u(v), is defined in table D.2.

CM_data()
Specification of number of bits "v" for CM_data() syntax elements with descriptor u(v)

The definitions of the header parameter values are contained in ETCCM, Section 5.3.2, “CM Header Parameter Definitions”.

The definitions of the mapping parameter values are contained in ETCCM, Section 5.3.3, “CM Mapping Parameter Definitions”.

Parameter cm_alignment_zero_bit shall be equal to 0.

18. Annex: Sample Dual-layer MPD

Where is this used? Why is it an annex? Consider restructuring to improve usefulness.

Below is an example dual-layer MPD with a single video adaptation set containing both a base layer representation and an enhancement layer representation that depends on it:

<MPD xmlns="urn:mpeg:dash:schema:mpd:2011">
  <Period>
    <AdaptationSet subsegmentAlignment="true"  subsegmentStartsWithSAP="1" frameRate="24000/1001">
      <Representation mimeType="video/mp4" codecs="hvc1.2.100000000.L150.B0" id="base-layer"
          bandwidth="14156144" width="3840" height="2160">
        <BaseURL>BL_dual_track_BC.mp4</BaseURL>
        <SegmentBase indexRange="795-1210">
         <Initialization range="0-794"/>
        </SegmentBase>
      </Representation>
      <Representation mimeType="video/mp4" codecs="dvhe.dtr" id="enhancement-layer"
            dependencyId="base-layer" bandwidth="3466528" width="1920" height="1080">
        <BaseURL>EL_dual_track_BC.mp4</BaseURL>
        <SegmentBase indexRange="704-1119">
         <Initialization range="0-703"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
    <AdaptationSet mimeType="audio/mp4" codecs="ec-3" lang="und"
        subsegmentAlignment="true" subsegmentStartsWithSAP="1">
      <Representation id="2" bandwidth="192000">
        <AudioChannelConfiguration
          schemeIdUri="tag:dolby.com,2014:dash:audio_channel_configuration:2011" value="F801"/>
        <BaseURL>audio.mp4</BaseURL>
        <SegmentBase indexRange="652-875">
         <Initialization range="0-651"/>
        </SegmentBase>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>

19. Externally defined terms

adaptation set

See [MPEGDASH]

asset identifier

See [MPEGDASH]

CMAF track file

See [MPEGCMAF]

essential property descriptor

See [MPEGDASH]

index segment

See [MPEGDASH]

initialization segment

See [MPEGDASH]

supplemental property descriptor

See [MPEGDASH]

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.


References

Normative References

[CEA708]
Digital Television (DTV) Closed Captioning CEA-708-E. URL: https://standards.cta.tech/kwspub/published_docs/ANSI-CTA-708-E-Preview.pdf
[Dolby-TrueHD]
MLP (Dolby TrueHD) streams within the ISO Base Media File Format, version 1.0, September 2009.
[DolbyVision-ISOBMFF]
Dolby Vision streams within the ISO base media file format. URL: https://www.dolby.com/us/en/technologies/dolby-vision/dolby-vision-bitstreams-within-the-iso-base-media-file-format.pdf
[DTS9302J81100]
Implementation of DTS Audio in Media Files Based on ISO/IEC 14496.
[DTS9302K62400]
Implementation of DTS Audio in Dynamic Adaptive Streaming over HTTP (DASH).
[DVB-DASH]
ETSI TS 103 285 V1.2.1 (2018-03): Digital Video Broadcasting (DVB); MPEG-DASH Profile for Transport of ISO BMFF Based DVB Services over IP Based Networks. March 2018. Published. URL: http://www.etsi.org/deliver/etsi_ts/103200_103299/103285/01.02.01_60/ts_103285v010201p.pdf
[EME-INITDATA-CENC]
David Dorwin; et al. "cenc" Initialization Data Format. 15 September 2016. NOTE. URL: https://www.w3.org/TR/eme-initdata-cenc/
[ENCRYPTED-MEDIA]
David Dorwin; et al. Encrypted Media Extensions. 18 September 2017. REC. URL: https://www.w3.org/TR/encrypted-media/
[ETSI102114]
ETSI TS 102 114: DTS Coherent Acoustics; Core and Extensions with Additional Profiles. URL: https://www.etsi.org/deliver/etsi_ts/102100_102199/102114/01.04.01_60/ts_102114v010401p.pdf
[ETSI102366]
ETSI TS 102 366: Digital Audio Compression (AC-3, Enhanced AC-3) Standard. URL: https://www.etsi.org/deliver/etsi_ts/102300_102399/102366/01.04.01_60/ts_102366v010401p.pdf
[ETSI103190-1]
ETSI TS 103 190-1 V1.3.1 (2018-02): Digital Audio Compression (AC-4) Standard; Part 1: Channel based coding. February 2018. Published. URL: http://www.etsi.org/deliver/etsi_ts/103100_103199/10319001/01.03.01_60/ts_10319001v010301p.pdf
[ETSI103433-1]
ETSI TS 103 433-1: High-Performance Single Layer High Dynamic Range (HDR) System for use in Consumer Electronics devices; Part 1: Directly Standard Dynamic Range (SDR) Compatible HDR System (SL-HDR1). URL: https://www.etsi.org/deliver/etsi_ts/103400_103499/10343301/01.02.01_60/ts_10343301v010201p.pdf
[ISO14496-15]
Information technology — Coding of audio-visual objects — Part 15: Carriage of network abstraction layer (NAL) unit structured video in the ISO base media file format. September 2019. Published. URL: https://www.iso.org/standard/74429.html
[ISO14496-3-2009-AMD4-2013]
Information technology — Coding of audio-visual objects — Part 3: Audio — Amendment 4: New levels for AAC profiles. December 2013. Published. URL: https://www.iso.org/standard/63022.html
[ISO14496-30]
Information technology — Coding of audio-visual objects — Part 30: Timed text and other visual overlays in ISO base media file format. November 2018. Published. URL: https://www.iso.org/standard/75394.html
[ISO23000-19-2018-AMD2-2019]
Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media — Amendment 2: XHE-AAC and other media profiles. January 2019. Published. URL: https://www.iso.org/standard/74442.html
[ISO23001-8]
Information technology — MPEG systems technologies — Part 8: Coding-independent code points. May 2016. Withdrawn. URL: https://www.iso.org/standard/69661.html
[ISO23003-1]
Information technology — MPEG audio technologies — Part 1: MPEG Surround. February 2007. Published. URL: https://www.iso.org/standard/44159.html
[ISO23008-3]
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 3: 3D audio. February 2019. Published. URL: https://www.iso.org/standard/74430.html
[ISOBMFF]
Information technology — Coding of audio-visual objects — Part 12: ISO Base Media File Format. December 2015. International Standard. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c068960_ISO_IEC_14496-12_2015.zip
[ITU-R-BT.709]
Recommendation ITU-R BT.709-6 (06/2015), Parameter values for the HDTV standards for production and international programme exchange. 17 June 2015. Recommendation. URL: https://www.itu.int/rec/R-REC-BT.709/
[JWS]
M. Jones; J. Bradley; N. Sakimura. JSON Web Signature (JWS). May 2015. Proposed Standard. URL: https://tools.ietf.org/html/rfc7515
[JWT]
M. Jones; J. Bradley; N. Sakimura. JSON Web Token (JWT). May 2015. Proposed Standard. URL: https://tools.ietf.org/html/rfc7519
[MEDIA-FRAGS]
Raphaël Troncy; et al. Media Fragments URI 1.0 (basic). 25 September 2012. REC. URL: https://www.w3.org/TR/media-frags/
[MIXED-CONTENT]
Mike West. Mixed Content. 2 August 2016. CR. URL: https://www.w3.org/TR/mixed-content/
[MP4]
Information technology — Coding of audio-visual objects — Part 14: MP4 file format. Under development. URL: https://www.iso.org/standard/79110.html
[MPEG2TS]
Information technology — Generic coding of moving pictures and associated audio information — Part 1: Systems. June 2019. Published. URL: https://www.iso.org/standard/75928.html
[MPEGAAC]
Information technology — Coding of audio-visual objects — Part 3: Audio. September 2009. Published. URL: https://www.iso.org/standard/53943.html
[MPEGAVC]
Information technology — Coding of audio-visual objects — Part 10: Advanced video coding. Under development. URL: https://www.iso.org/standard/75400.html
[MPEGCENC]
Information technology — MPEG systems technologies — Part 7: Common encryption in ISO base media file format files. February 2016. Published. URL: https://www.iso.org/standard/68042.html
[MPEGCMAF]
Information technology — Multimedia application format (MPEG-A) — Part 19: Common media application format (CMAF) for segmented media. Under development. URL: https://www.iso.org/standard/79106.html
[MPEGDASH]
Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. August 2019. Published. URL: https://www.iso.org/standard/75485.html
[MPEGDASHCMAFPROFILE]
MPEG output document N18641: Working Draft of ISO/IEC 23009-1 4th edition, Amendment 1: Client event and timed metadata processing.
[MPEGHEVC]
Information technology — High efficiency coding and media delivery in heterogeneous environments — Part 2: High efficiency video coding. Under development. URL: https://www.iso.org/standard/75484.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC7232]
R. Fielding, Ed.; J. Reschke, Ed.. Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests. June 2014. Proposed Standard. URL: https://httpwg.org/specs/rfc7232.html
[RFC7233]
R. Fielding, Ed.; Y. Lafon, Ed.; J. Reschke, Ed.. Hypertext Transfer Protocol (HTTP/1.1): Range Requests. June 2014. Proposed Standard. URL: https://httpwg.org/specs/rfc7233.html
[RFC7807]
M. Nottingham; E. Wilde. Problem Details for HTTP APIs. March 2016. Proposed Standard. URL: https://tools.ietf.org/html/rfc7807
[RFC8446]
E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.3. August 2018. Proposed Standard. URL: https://tools.ietf.org/html/rfc8446
[SCTE128-1]
ANSI/SCTE 128-1: AVC Video Constraints for Cable Television Part 1 - Coding. URL: https://www.scte.org/SCTEDocs/Standards/ANSI_SCTE%20128-1%202018.pdf
[XLINK]
Steven DeRose; Eve Maler; David Orchard. XML Linking Language (XLink) Version 1.0. 27 June 2001. REC. URL: https://www.w3.org/TR/xlink/

Informative References

[3GPP26.116]
3GPP TS 26.116 (03/2016): Television (TV) over 3GPP services; Video Profiles.
[ATSC3]
ATSC Standard: A/300:2017 “ATSC 3.0 System”. URL: https://www.atsc.org/wp-content/uploads/2017/10/A300-2017-ATSC-3-System-Standard-1.pdf
[EBU-TT]
EBU TECH 3350: "EBU-TT Subtitling format definition". URL: https://tech.ebu.ch/docs/tech/tech3350.pdf
[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/
[HLS-LowLatency]
Protocol Extension for Low-Latency HLS (Preliminary Specification). URL: https://developer.apple.com/documentation/http_live_streaming/protocol_extension_for_low-latency_hls_preliminary_specification
[ISO23001-12]
Information technology — MPEG systems technologies — Part 12: Sample variants. December 2018. Published. URL: https://www.iso.org/standard/74431.html
[MEDIA-SOURCE]
Matthew Wolenetz; et al. Media Source Extensions™. 17 November 2016. REC. URL: https://www.w3.org/TR/media-source/
[MSPR-EncryptionModes]
PlayReady Content Encryption Modes. URL: https://docs.microsoft.com/en-us/playready/packaging/content-encryption-modes
[SCTE214-1]
ANSI/SCTE 214-1 2016: MPEG DASH for IP-Based Cable Services Part 1: MPD Constraints and Extensions. 2016. URL: http://scte.org/SCTEDocs/Standards/ANSI_SCTE%20214-1%202016.pdf
[SMPTE2052-1-2013]
SMPTE ST 2052-1:2013 "Timed Text Format (SMPTE-TT)". URL: https://doi.org/10.5594/SMPTE.ST2052-1.2013
[SMPTE2052-10]
SMPTE RP 2052-10:2013: Conversion from CEA-608 Data to SMPTE-TT. URL: https://www.smpte.org/sites/default/files/RP2052-10-2013.pdf
[SMPTE2052-11]
SMPTE RP 2052-11:2013: Conversion from CEA-708 Caption Data to SMPTE-TT. URL: https://www.smpte.org/sites/default/files/RP2052-11-2013.pdf
[TTML-IMSC1.1]
Pierre-Anthony Lemieux. TTML Profiles for Internet Media Subtitles and Captions 1.1. 8 November 2018. REC. URL: https://www.w3.org/TR/ttml-imsc1.1/
[TTML2]
Glenn Adams; Cyril Concolato. Timed Text Markup Language 2 (TTML2). 8 November 2018. REC. URL: https://www.w3.org/TR/ttml2/

Issues Index

Need to add a paragraph on baseline interoperability, if we have any.
We could benefit from some detailed examples here, especially as clock sync is such a critical element of live services; see the illustrative sketch at the end of this index.
What about period connectivity? #238
Update to match [MPEGDASH] 4th edition.
We need to clarify how to determine the right value for SAP_type. #235
Once we have a specific @earliestPresentationTime proposal submitted to MPEG we need to update this section to match. See #245. This is now done in [MPEGDASH] 4th edition - need to synchronize this text.
What exactly is metadata @codecs supposed to be? https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/290
An illustration here would be very useful.
https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/284
Why is the above a SHOULD? If it matters enough to signal, should it not be a SHALL? https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/286
This chapter already includes changes from #274
https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/333
What do relative BaseURLs do? Do they just incrementally build up the URL? Or are they ignored? This algorithm leaves it unclear, as it references only absolute BaseURLs. We should make it explicit.
The text here is technically correct but could benefit from being reworded in a simpler and more understandable way. If anyone finds themselves with the time, an extra pass over this would be helpful.
We need to clarify how to determine the right value for startsWithSAP. #235
Add a reference here to help readers understand what "IDR-like SAPs (i.e. SAPs of type 1 or 2)" are.
Allowing Representation@codecs to be absent might make it more difficult to make bitstream-switching-oblivious clients. If we require Representation@codecs to always be present, life could be made simpler for client developers.
What is the above talking about?
This section could use another pass to make it easier to read.
How do leap seconds tie into this? See #161
https://github.com/Dash-Industry-Forum/DASH-IF-IOP/issues/274
Insert a reference to encryption.
This and everything below needs to be updated to conform to the timing model.
Needs proper Bikeshed formatting and references
Check and align references in original text.
Is there something that goes into more depth about 404s? These statements need a better home.
Needs to be checked for conformance with the timing model.
Needs proper Bikeshed formatting and referencing
Needs deduplication of DASH concepts that are re-defined here.
What are "extensions"? Move this to features/constraints chapters?
Where does UHD fit? Why is it in a separate chapter? We should unify.
What is the correct scoping for the above requirement? Is the composition time requirement specific to H.264/H.265? Or does it apply to all bitstream switching video? Or does it apply to all bitstream switching, not only video?
What does the above requirement actually mean - what does an implementation have to do? Unclear right now.
ITU-T T.35 referenced above seems unrelated to the topic. What is the correct reference?
Is the payload_type sentence meant to be a requirement or a description of the referenced spec? What is the utility of this statement in IOP?
There is a bunch of stuff below with no obvious connection to UHD. Should this not also be in the non-UHD HEVC chapter?
The above paragraph on URL handling should be generalized to all sets of alternative URLs, but there does not seem to be a suitable chapter in v4.3. If such a chapter is created in v5, we could replace the above paragraph with a reference to the general URL handling guidelines.
Let’s come up with a good set of useful problem types we can define here, to reduce the set of problem types that must be defined in solution-specific scope.
Merge Annex B from 4.3 to live services chapter (if not already duplicated).
Where is this used? Why is it an annex? Consider restructuring to improve usefulness.
Where is this used? Why is it an annex? Consider restructuring to improve usefulness.
Where is this used? Why is it an annex? Consider restructuring to improve usefulness.
Where is this used? Why is it an annex? Consider restructuring to improve usefulness.
Where is this used? Why is it an annex? Consider restructuring to improve usefulness.
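
As a starting point for the clock synchronization examples requested above, the following is a minimal illustrative sketch only, not a normative part of this document. It assumes a UTCTiming element using the urn:mpeg:dash:utc:http-iso:2014 scheme, where the server returns a single ISO 8601 timestamp; the endpoint URL, the timeout, and the round-trip midpoint heuristic are placeholder choices, not DASH-IF recommendations.

```python
# Illustrative sketch: estimate the offset between the local clock and the
# service's wall clock from a UTCTiming source ("urn:mpeg:dash:utc:http-iso:2014").
# The endpoint URL used below is hypothetical; real services signal it in the MPD.
from datetime import datetime, timedelta, timezone
from urllib.request import urlopen

def clock_offset(utc_timing_url: str) -> timedelta:
    """Estimate (service wall clock - local clock), ignoring asymmetric network delay."""
    local_before = datetime.now(timezone.utc)
    body = urlopen(utc_timing_url, timeout=5).read().decode("ascii").strip()
    local_after = datetime.now(timezone.utc)
    # The http-iso scheme returns an ISO 8601 timestamp, e.g. "2019-08-01T12:00:00Z".
    server_time = datetime.fromisoformat(body.replace("Z", "+00:00"))
    # Use the midpoint of the round trip as the local reference instant.
    local_mid = local_before + (local_after - local_before) / 2
    return server_time - local_mid

# Hypothetical usage: align segment availability calculations to the service clock.
# offset = clock_offset("https://time.example.com/now")
# now_on_service_clock = datetime.now(timezone.utc) + offset
```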