DASH Player’s Application Events and Timed Metadata Processing Models and APIs (Community Review)

Living Document,

This version:
https://dashif.org/guidelines/Events-CR-v1.pdf
Issue Tracking:
GitHub
Editors:
DASH Industry Forum

CHANGE REQUEST

DASH-IF IOP

CR

rev

-

Current version:

V4.3

Status:

Draft

Internal Review

X

Community Review

Agreed


Title:

DASH Player’s Application Events and Timed Metadata Processing Models and APIs

Source:

DASH-IF IOP Event TF

Supporting Companies:

Qualcomm Incorporated, LG Electronics, Sony, Ericsson, Nomor Research, Unified Streaming, Tencent, <others>

Category:

A

Date:

2019-06-14

Use one of the following categories:
C
(correction)
A (addition of feature)
B (editorial modification)


Reason for change:

For the proper usage of Events and Timed Metadata distributed in DASH Media Presentations, APIs pertaining to subscription and notification delivery are beneficially defined between the DASH client and the application consuming the Events.

Summary of change:

Addition of a client processing model for Events

Consequences if not approved:

Inconsistent implementations

Sections affected:

New section X

Other comments:

This document contains several notes. Feedback during community review is welcome specifically on these topics.


Disclaimer:

This document is not yet final. It is provided for public review until the deadline mentioned below. If you have comments on the document, please submit comments by one of the following means:

Please add a detailed description of the problem and the comment.

Based on the received comments a final document will be published latest by the expected publication date below, integrated in a new version of DASH-IF IOP, if the following additional criteria are fulfilled:

  • All comments from community review are addressed
  • The relevant aspects for the Conformance Software are provided
  • Verified IOP test vectors are provided

Commenting Deadline:

July 31st, 2019

Expected Publication:

August 31st, 2019

1. DASH Player architecture for processing DASH events and timed metadata tracks

This Figure demonstrates a generic architecture of DASH Player including DASH Events and timed metadata tracks processing models.

DASH Player architecture including the inband Event and Application-related timed metadata handling

In the above figure:

  1. DASH Player processes the received MPD. The manifest information including the list of events schemes and values, and timed metadata track schemes are passed to Application.

  2. Application subscribes to the event and timed metadata track schemes in which it is interested, with the desired dispatch mode.

  3. If the manifest includes any MPD Events, the DASH Player parses them and appends them to the Event & Timed Metadata Buffer.

  4. Based on the MPD, the DASH Player manages the fetching and parsing of the Segments before appending them to the Media Buffer.

  5. Parsing a Segment includes:

    1. Parsing the high-level boxes such as Segment Index (sidx) and Event Message boxes, and appending Event Message boxes to the Event & Metadata Buffer.

    2. For an Application-related timed metadata track, extracting the data samples, and appending them to Event & Metadata Buffer.

    3. For media segments, parsing the segments and appending them to the Media Buffer.

  6. Event & Metadata Buffer is a FIFO buffer, passing the events and timed metadata samples to Event & Metadata Synchronizer and Dispatcher function.

  7. The DASH Player-specific Events are dispatched to DASH Player’s Control, Selection & Heuristic Logic, while the Application-related Events and timed metadata track samples are dispatched to the application as the following. If an Application is subscribed to a specific Event or timed metadata stream, dispatch the corresponding event instances or timed metadata samples, according to the dispatch mode:

    1. For on-receive dispatch mode, dispatch the Event information or timed metadata samples as soon as they are received (or no later than LAT).

    2. For on-start dispatch mode, dispatch the Event information or timed metadata samples at their associated presentation time, using the synchronization signal from the media decoder.

2. Event and Timed metadata sample timing models

2.1. Inband Event timing parameters

Figure 2 presents the timing of an inband Events along the media timeline:

The inband event timing parameter on the media timeline

As shown in Figure 2, every inband Event can be described by three timing parameters on the media timeline:

  1. Event Latest Arrival Time (LAT) which is the earliest presentation time of the Segment containing the Event Message box.

  2. Event Presentation/Start Time (ST) which is the moment in the media (MPD) timeline that the Event becomes active.

  3. Event duration (DU): the duration for which the Event is active

An inband Event is inserted in the beginning of a Segment. Since each media segment has an earliest presentation time equal to (LAT), LAT of the Segment carrying the Event Message box can be considered as the location of that box on the media timeline. DASH Player has to fetch and parse the Segment before or at its LAT (at LAT when it’s assumed that the decoding and rendering of the segment incurs practically zero delay). Therefore, the Event inserted in a Segment at its LAT time will be ready to be processed and fetched no later than LAT on the media timeline.

The second timing parameter is Event Presentation/Start Time (ST ). ST is the moment in the media timeline that the Event becomes active. This value can be calculated using the parameters included in the DashEventMessageBox.

The third parameter is Event Duration (DU ), the duration for which the Event is considered to be active. DU is also signaled in the Event Message box using a specific value.

2.2. Dash Event message box format and event timing parameters

Table 1 shows the DASHEventMessageBox emsg box format defined in MPEG DASH:

aligned(8) class DASHEventMessageBox extends FullBox (‘emsg’, version, flags = 0){
if (version==0) {
string scheme_id_uri;
string value;
unsigned int(32) timescale_v0;
unsigned int(32) presentation_time_delta;
unsigned int(32) event_duration;
unsigned int(32) id;
} else if (version==1) {
unsigned int(32) timescale_v1;
unsigned int(64) presentation_time;
unsigned int(32) event_duration;
unsigned int(32) id;
string scheme_id_uri;
string value;
}
unsigned int(8) message_data();
}

The emsg box format and parameters

Note: In the table above, parameters with timescale_v0 and timescale_v1 are same parameters. The additional suffixes are for purpose of clear refenencing in the equation below. These parameters are defined as timescale in [MPEGDASH].

The ST of an event can be calculated using values in its emsg box:

$$ST = \begin{cases} PeriodStart - \frac{SegmentBase@presentationTimeOffset}{SegmentBase@timescale} + LAT + \frac{presentation\_time\_delta}{timescale\_v0} \space \qquad version=0\\ PeriodStart - \frac{SegmentBase@presentationTimeOffset}{SegmentBase@timescale} + \frac{presentation\_time}{timescale\_v1}\qquad version=1 \end{cases} $$

Event Start Time of an inband event

Where PeriodStart is the corresponding Period‘s start time, and SegmentBase@presentationTimeoffset" and SegmentBase@timescale belong to the corresponding Represenation.

Note: ST is always equal to or larger than LAT in both versions of emsg.

Note: Since the media sample timescales might be different than emsg’s timescale, ST might not line up with a media sample if different timescales are used.

Note: If various Adaptation Sets carry the same events, different Adaptation Sets/Representations with different PTOs, the presentation_time_delta and/or presentation_time values might be different per Adaptation Set/Representation, i.e. the same emsg box can not be replicated over multiple Representations and/or Adaptations Sets. Therefore, the use of same PTOs cross Adaptation Sets/Representations which carry the same events is encouraged.

Note: In the case of CMAF, PeriodStart is the CMAF track’s earliest presentation time. If during the segment creation, this time is not known, it is recommeded to use the presentation_time_delta.

In this document, we use the following common variable names instead of some of above variables to harmonize parameters between Inband events, MPD events, and timed metadata samples:

2.3. MPD Events timing model

MPD Events carry a similar data model as inband Events. However, the former type is are carried in the MPD, under the Period elements. Each Period event has EventStream element(s), defining the EventStream@schemeIdUri, EventStream@value, EventStream@timescale and a sequences of Event elements. Each event may have Event@presentationTime, Event@duration, Event@id and Event@messageData attributes, as shown in Table 2.

Element or Attribute Name

Use

Description

 

EventStream

 

specifies event Stream

 

 

@xlink:href

O

specifies a reference to an external EventStream element

 

 

@xlink:actuate

OD

default:
onRequest

specifies the processing instructions, which can be either "onLoad" or "onRequest".

This attribute shall not be present if the @xlink:href attribute is not present.

 

 

@schemeIdUri

M

identifies the message scheme. The string may use URN or URL syntax. When a URL is used, it is recommended to also contain a month-date in the form mmyyyy; the assignment of the URL must have been authorized by the owner of the domain name in that URL on or very close to that date. A URL may resolve to an Internet location, and a location that does resolve may store a specification of the message scheme.

 

 

@value

O

specifies the value for the event stream element. The value space and semantics must be defined by the owners of the scheme identified in the @schemeIdUri attribute.

 

 

@timescale

O

specifies the timescale in units per seconds to be used for the derivation of different real-time duration values in the Event elements.

If not present on any level, it shall be set to 1.

 

 

@presentationTimeOffset

OD

Default: 0

specifies the presentation time offset of this Event Stream that aligns with the start of the Period. Any Event contained in this Event Stream is mapped to the Period timeline by using the Event presentation time adjusted by the value of the presentation time offset

The value of the presentation time offset in seconds is the division of the value of this attribute and the value of the @timescale attribute.

 

 

Event

0 ... N

specifies one event. For details see Table 35.

Events in Event Streams shall be ordered such that their presentation time is non-decreasing.

Key

For attributes: M=Mandatory, O=Optional, OD=Optional with Default Value, CM=Conditionally Mandatory

For elements: <minOccurs>...<maxOccurs> (N=unbounded)

Elements are bold; attributes are non-bold and preceded with an @.

 

 

Element or Attribute Name

Use

Description

 

 

 

Event

 

specifies an Event and contains the message of the event. The content of this element depends on the event scheme.  The contents shall be either:

��        A string, optionally encoded as specified by @contentEncoding

��        XML content using elements external to the MPD namespace

For new event schemes string content should be used, making use of Base 64 encoding if needed.

Note: The schema allows ��mixed�� content within this element however only string data or XML elements are permitted by the above options, not a combination.

 

 

 

 

@presentationTime

OD
default: 0

specifies the presentation time of the event relative to the start of the Period taking into account the @presentationTimeOffset of the Event Stream, if present.

The value of the presentation time in seconds is the division of the value of this attribute and the value of the @timescale attribute.

If not present, the value of the presentation time is 0.

 

 

 

 

@duration

O

specifies the presentation duration of the Event.

The value of the duration in seconds is the division of the value of this attribute and the value of the
@timescale attribute.

The interpretation of the value of this attribute is defined by the scheme owner.

If not present, the value of the duration is unknown.

 

 

 

 

@id

O

specifies an identifier for this instance of the event. Events with equivalent content and attribute values in the Event element shall have the same value for this attribute.

The scope of the @id for each Event is with the same @schemeIdURI and @value pair.

 

 

 

 

@contentEncoding

O

specifies whether the information in the body and the information in the @messageData is encoded.

If present, the following value is possible:

��        base64 the content is encoded as described in IETF RFC 4648 prior to adding it to the field.

If this attribute is present, the DASH Client is expected to decode the message data and only provide the decoded message to the application.

 

 

 

 

@messageData

O

specifies the value for the event stream element. The value space and semantics must be defined by the owners of the scheme identified in the @schemeIdUri attribute.

NOTE: the use of the message data is discouraged by content authors, it is only maintained for the purpose of backward-compatibility. Including the message in the Event element is recommended in preference to using this attribute. This attribute is expected to be deprecated in the future editions of this document.

Key

For attributes: M=Mandatory, O=Optional, OD=Optional with Default Value, CM=Conditionally Mandatory

For elements: <minOccurs>...<maxOccurs> (N=unbounded)

Elements are bold; attributes are non-bold and preceded with an @.

MPD Event elements

As is shown in Figure 3, each MPD Event has three associated timing parameters along the media timeline:

  1. The PeriodStart Time (LAT) of the Period element containing the EventStream element.

  2. Event Start Time (ST): the moment in the media timeline that a given MPD Event becomes active and can be calculated from the attribute <{Event@presentationTime}>.

  3. Event duration (DU): the duration for which the event is active that can be calculated from the attribute <{Event@duration}>.

Note that the first parameter is inherited from the Period containing the Events and only the 2nd and 3rd parameters are explicitly included in the Event element. Each EventStream also has EventStream@timescale to scale the above parameters.

Figure 3 demonstrates these parameters in the media timeline.

MPD events timing model

The ST of an MPD event can be calculated using values in its EventStream and Event elements:

$$ST = PeriodStart - \frac{EventStream@presentationTimeOffset}{EventStream@timescale} + \frac{Event@presentationTime}{EventStream@timescale}$$

Event Start Time of MPD event

In this document, we use the following common variable names instead of some of above variables to harmonize parameters between Inband events, MPD events, and timed metadata samples:

In which decode64() function is:

$$decode64(x) = \begin{cases} x\space\qquad\qquad\qquad\qquad\qquad \space \space \space \space @contentEncoding\space Not \space Present\\ base64 \space decoding \space of \space (x) \qquad @contentEncoding \space = \space base64 \end{cases} $$

decode64 function

Note that the DASH client shall Base64 decode the Event@messageData value if the received Event@contentEncoding value is base64.

2.4. Timed metadata sample timing model

An alternative way to convey information relating to a media is using timed metadata tracks. Timed metadata tracks are ISOBMFF formatted tracks that obey the following characteristics according to [ISOBMFF]:

  1. The sample description box stsd in the MovieBox SHALL contain a sampleEntry that is a URIMetaSampleEntry, to signal that the media samples contain metadata based on a urn in a URIBox to signal that scheme.

  2. The Handler Box hdlr has handler_type set to meta to signal the fact that the track contains metadata

  3. The null media header nmhd is used in the minf box

  4. Contain metadata (non media data relating to presentation) embedded in ISOBMFF samples

Figure 4 shows the timing model for a simple ISOBMFF timed metadata sample.

Timing parameters of a timed metadata sample on the media timeline

As shown in this figure, the metadata sample timing includes metadata sample presentation time (ST) and metadata sample duration (DU). Also one or more metadata samples are included in a segment with Segment earliest presentation time (LAT).

Note that the metadata sample duration can not go beyond DASH Segments/ISOBMFF fragment duration for fragmented metadata tracks, i.e. to the next fragment.

In this document, we use the following variable names instead of some of above variables to harmonize parameters between Inband events, MPD events, and timed metadata samples used in dispatach process:

3. Events and timed metadata sample dispatch timing modes

3.1. Dispatech timing

This figure shows two possible dispatch timing models for DASH events and timed metadata samples.
The Application events and timed metadata dispatch modes

In this figure, two modes are shown:

  1. on-receive Dispatch Mode: Dispatching at LAT or earlier. Since the segment carrying an emsg/metadata sample has to be parsed before (or assuming zero decode/rendering delay as the latest at) LAT on the media timeline, the event/metadata sample shall be dispatched at this time or before to Application in this mode. Application has a duration of ST-LAT for preparing for the event. In this mode, the client doesn’t need to maintain states of Application events or metadata samples either. Application may have to maintain the state for any event/metadata sample, its ST and DU, and monitor its activation duration, if it needs to. Application may also need to schedule each event/sample at its ST.

  2. on-start Dispatch Mode: Dispatching exactly at ST, which is the start/presentation time of the event/metadata sample. The DASH player shall dispatch the event to the application at the presentation time of the corresponding media sample, or in the case of the start of playback after that moment and during the event duration, at the earliest time within the event duration. In this mode, since Application receives the event/sample at its start/presentation time, it may need to act on the received data immediately.

Note: According to ISO/IEC 23009-1, the parameter duration has a different meaning in each dispatch mode. In the case of on-start, duration defines the duration starting from ST in which DASH Player shall disp atch the event exactly once. In the nromal playback, the player dispatches the event at ST. However if DASH Player for instance seek to a moment after ST and during the above duration, then it must dispatch the event immidiately. In the case of on-receive, duration is a property of event instance and is defined by the scheme_id owner.

3.2. The Dispatch Processing Model

3.2.1. Prerequisite

Application is subscribed to a specific event stream identified by a (scheme/value) pair with a specific dispatch_mode, either on start or on_receive, as described in § 4.1 Event and metadata track subscription.

The processing model varies depending on dispatch_mode.

3.2.2. Common process

DASH Player implements the following process:
  1. Parse the emsg/timed metadata sample and retrieve scheme_uri/(value).

  2. If Application is not subscribed to the scheme_uri/(value) pair, end the processing of this emsg.

3.2.3. on-receive processing

DASH Player implements the following process when dispatch_mode = on_receive:

3.2.4. on-start processing

DASH Player set ups an Active Event Table for each subscribed scheme_uri/(value) in the case of dispatch_mode = on_start. Active Event Table maintains a single list of emsg’s id that have been dispatched.

DASH Player implements the following process when dispatch_mode = on_start:

  1. Derive the event instance/metadata sample’s ST

  2. If the current media presentation time value is smaller than ST, then go to Step 5.

  3. Derive the ending time ET= ST + DU.

  4. If the current presentation time value is greater than ET, then end processing.

  5. In the case of event: Compare the event’s id with the entries of Active Event Table of the same scheme_uri/(value pair:

    • If an entry with the identical id value exists, end processing;

    • If not, add emsg’s id to the corresponding Active Event Table.

  6. Dispatch the event/metadata message_data at time ST, or immediately if current presentation time is larger then ST, as described in § 4 Prose description of APIs.

3.3. The event/metadata buffer model

Along with the media samples, the event instances and timed metadata samples are buffered. The event/metadata buffer should be managed with same scheme as the media buffer, i.e. as long as a media sample exists in the media buffer, the corresponding events and/or metadata samples should be maintained in the event/metadata buffer.

4. Prose description of APIs

The event/timed metadata API is an interface defined between a “DASH player” as defined in DASH-IF, or a “DASH client” as defined in 3GPP TS 26.247 or ISO/IEC 23009-1 and a device application in the exchange of subscription data and dispatch/transfer of matching DASH Event or timed metadata information between these entities. The Event/timed metadata API is shown at Figure 1.

Note: In this document, the term "DASH Player" is used.

The description of the API below is strictly functional, i.e. implementation-agnostic, is intended to be employed for the specification of the API in Javascript for the dash.js open source DASH Player, and in IDL such as the OMG IDL or WebIDL. For example, the subscribeEvent() method as defined below may be mapped to the existing on(type,listener,scope) method as defined for the dash.js under MediaPlayerEvents.

As part of this API and prior to any operations, DASH Player provides a list of scheme_id/(value) listed in MPD when it receives it. This list includes all events as well as scheme_id of all timed metadata tracks. At this point Application is aware of the possible events and metadata delivered by DASH Player.

4.1. Event and metadata track subscription

The subscription state diagram of DASH Player associated with the API is shown below in Figure 6:

State Diagram of DASH Player for the event/timed metadata API.

The scope of the above state diagram is the entire set of applicable events/timed metadata streams being subscribed/unsubscribed, i.e. it is not indicating the state model of DASH Player in the context of a single Event/timed metadata stream subscription/un-subscription.

The application subscribes to the reception of the desired event/timed metadata and associated information by the subscribeEvent() method. The parameters to be passed in this method are:

Note: ISO/IEC 23009-1 does not include amy explicit signaling for the desired dispatch mode in MPD or timed metadata track. In the current design, Application relay its desired dispatch mode to DASH Player when it subscribes to an event stream or timed metadata track. In this approach, the scheme owner should consider the dispatch mode as part of the scheme design and define whether any specific dispatch mode should be selected during the design of the scheme.

Note: (Editor’s Note-to be removed at the end of Community Review Period) If any service provider or application developer beleives an explicit signaling of dispatch mode is needed for some use-cases, they are requested to provide such use-case during Community Review Period of this document to DASH-IF for considering introducing a @dispatchMode attribute in MPD and submitting the request to MPEG.

the DASH-IF beleives an explicit signaling of the dispatch mode is benifitial and will request MPEG to add the support for it. Otherwise, either DASH-IF addes extensions or signaling of the dispatch mode would be considered out-of-band.

Upon successful execution of the event/timed metadata subscription call (for which DASH Player will return a corresponding acknowledgment), DASH Player shall monitor the source of potential Event stream information, i.e., the MPD or incoming DASH Segments, for matching values of the subscribed scheme_uri/(value). The parentheses around value is because this parameter may be absent in the event/timed metadata subscription call. When a matching event/metadata sample is detected, DASH Player invokes the function specified in the callbackFunction argument with the following parameters. It should additionally provide to the Application the current presentation time at DASH Player when performing the dispatch action. The parameters to be passed in this method are shown in Table 3 below:

API Parameter MPD event Inband emsg Metadata Data Type on-receiveon-start
scheme_id EventStream@schemeIdUri scheme_id_uri timed metadata track URI   Y Y
value EventStream@value value Y Y
presentation_time Event@presentationTime presentation_time timed metadata sample presentation time unsigned int(64)
in milliseconds
Y N
duration Event@duration event_duration timed metadata sample duration unsigned int(32)
in milliseconds
Y N
id Event@id id unsigned int(32) Y N
message_data Event@messageData message_data() timed metadata sample data in mdat unsigned int(8) x messageSize Y Y
Y= Yes, N= NO, O= Optional
Event/timed metadata API parameters and datatypes

When the duration of the event is unknown, the vairable duration shall be set to its maximum value (xFFFFFFFF = 4,294,967,295).

Note: In the case of ‘emsg’ version 0, DASH Player is expected to calculate presentation_time from presentation_time_delta.

In order to remove a listener the unsubscribeEvent() function is called with the following arguments:

If a specific listener is given in the callback_function argument, then only that listener is removed for the specified scheme_uri/(value). Omitting or passing null to the callback_function argument would remove all event listeners for the specified scheme_uri/(value).

5. Detailed processing

As shown in Figure 1, the event/metadata buffer holds the events or metadata samples to be processed. We assume that this buffer have same data structure to hold events or metadata. We use Table 3 to define this Event/Metadata Internal Object (EMIO):
event-metadata-internal-object {
string scheme_id_uri;
string value;
unsigned int(32) presentation_time;
unsigned int(32) duration;
unsigned int(32) id;
unsigned int(8) message_data();
}

The Event/Metadata Internal Object (EMIO)

The process for converting the received event/metadata sample to EMIO is as following:

  1. For MPD event

    • For each period:

    • Parse each EventStream

      • Get Eventstream common parameters

      • For each Event Stream:Parse each event

      • For each event

        • calculate presentation time and event duration

        • add it to EMIO

  2. For inband event

    • For each Segment

    • Parse event boxes as well as moof

    • calculate EPT of segment

    • For each event:

      • map emsg box parameters to EMIO

  3. For simple metadata samples

    • For each Segment

    • Parse moof

    • For each sample:

      • Parse the formant

      • map the data to EMIO

6. Externally defined terms

cmaf

See [MPEGDASH]

Event@contentEncoding

See [MPEGDASH]

Event@duration

See [MPEGDASH]

Event@id

See [MPEGDASH]

Event@messageData

See [MPEGDASH]

Event@presentationTime

See [MPEGDASH]

EventStream@schemeIdUri

See [MPEGDASH]

EventStream@timescale

See [MPEGDASH]

EventStream@value

See [MPEGDASH]

event_duration

See [MPEGDASH]

id

See [MPEGDASH]

message_data()

See [MPEGDASH]

presentation_time

See [MPEGDASH]

presentation_time_delta

See [MPEGDASH]

scheme_id_uri

See [MPEGDASH]

SegmentBase@presentationTimeoffset

See [MPEGDASH]

SegmentBase@timescale

See [MPEGDASH]

timescale

See [MPEGDASH]

value

See [MPEGDASH]

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Index

Terms defined by this specification

References

Normative References

[ISOBMFF]
Information technology — Coding of audio-visual objects — Part 12: ISO Base Media File Format. December 2015. International Standard. URL: http://standards.iso.org/ittf/PubliclyAvailableStandards/c068960_ISO_IEC_14496-12_2015.zip
[MPEGDASH]
Information technology — Dynamic adaptive streaming over HTTP (DASH) — Part 1: Media presentation description and segment formats. December 2019. Published. URL: https://www.iso.org/standard/79329.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119