Non-Normative Video Coding

To provide interoperability between different compliant codecs, a coding standard must fix the decoding procedure and the bitstream syntax and semantics. However, the encoding procedure and encoding tools do not have to be defined for video codecs, which is a very useful feature. Since standards always constrain codec design freedom, they should be as minimally constraining as possible so that the technology can keep evolving through the non-normative areas. The coding tools that are essential for a high-performing codec but are not normatively defined by the standard are called non-normative video coding tools; the non-normative areas correspond to the tools whose normative specification is not essential for interoperability. Conversely, the normative tools are those specified by the standard and whose specification is essential for interoperability. For example, video segmentation and bitrate control are non-normative tools, while the decoding process must be normative.

This strategy of specifying the minimum for maximum usability ensures that good use can be made of the continuous technical improvements in the relevant areas. The consequence is that better non-normative tools can always be adopted, even after the standard is finalised, and that competition can be relied upon to obtain ever better results. In fact, it is precisely through the non-normative tools that products will distinguish themselves, which only reinforces the importance of this type of tool and thus of bitrate control technologies. Non-normative video coding tools are very important because they make it possible to i) improve the RD performance of a standard defined long ago by adding more powerful encoding tools, which may be continuously developed without any negative impact on interoperability; and ii) improve the error resilience of a coded bitstream by adding resilience at encoding or by improving the decoded quality at decoding (error concealment). This also allows research competition between companies, since very different analysis tools may be used at the encoder while interoperability is guaranteed through the same bitstream syntax, possibly with rather different performance under similar conditions, e.g., different video quality for the same bitrate. Finally, this fact, and thus better non-normative coding tools, is behind the continuous improvement of the RD performance of standard video codecs, making the standard a ‘living entity’.

Considering the specific case of the H.264/AVC predictive encoder architecture, it can easily be concluded that the most important coding modules are located at the encoder, where the ‘brain’ of the codec lies. The better this ‘brain’ works, i.e., the more powerful the analysis tools are in taking good coding decisions and producing good auxiliary data, the better the RD performance of the codec will be. For the H.264/AVC video codec, which is the video coding standard with the most flexible coding syntax, the main analysis modules are:

1. Motion Estimation – Motion estimation plays a central role in exploiting the temporal correlation in any video codec. To appreciate how important motion estimation and compensation are, it is enough to note that most of the compression efficiency gains of recent years, notably for the MPEG standards, are related to the evolution of motion representation, e.g., the accuracy of the motion estimation, the variety of macroblock partitions, and the number of reference frames. This compression power has the side effect of a high encoding complexity, which may reach 80% of the total for some codecs and platforms, justifying the development of so-called fast motion estimation methods where the image is analysed in a non-exhaustive way to select the best motion vectors. It is important to stress that video coding standards do not define how motion estimation shall be performed, but it is well known that bad motion estimation implies bad RD performance (a minimal block-matching sketch is given after this list).

2. Reference Frame Selection – In the H.264/AVC standard, the usual one or two reference frames (for P and B slices) can be chosen from a set of candidate reference frames, which means that the encoder has to analyse the multiple candidates available to decide which frame(s) should become the reference frame(s) providing the best prediction for each macroblock. This is basically the same problem as in the previous item, brought to a larger analysis and optimisation space (the block-matching sketch after this list also loops over several candidate reference frames). Again, given the high complexity involved, fast reference frame selection methods are relevant.

3. Intra Prediction – Intra prediction in the H.264/AVC standard allows a macroblock to be very efficiently Intra encoded based on a prediction created from neighbouring, previously coded samples of the same frame. Some analysis is required so that the best Intra prediction mode is selected (see the intra mode decision sketch after this list).

4. Coding Mode Selection – Since there is a multiplicity of coding modes, macroblock partitions, and reference frames, the coding solution that performs best has to be selected. In the H.264/AVC reference software, this is accomplished through various RD optimisation (RDO) methods using RD cost functions which trade off complexity against optimality, but other types of analysis may be used since this is a non-normative process (a Lagrangian cost sketch is given after this list). For certain environments, e.g., battery-constrained ones, the RD optimisation approach may evolve into a rate-distortion-complexity (RDC) approach where the decoding complexity is also considered in the optimisation process.

5. Rate Control – If a certain output bitrate has to be met, the encoder must further decide on the bit allocations (e.g., at GOP, frame and macroblock level) and on the quantisation steps and their variation within and across frames, implying that some rate control method is available. This tool has a key impact on the final RD performance, since badly allocated resources are wasted resources; to avoid this, good analysis is also required (a simple buffer-driven sketch is given after this list).

6. Error Resilience – Since most transmission channels are error prone, an efficient way to improve the final video quality is to increase the error resilience of the transmitted bitstream. This may be done in several ways, notably by adding intra refreshing or redundant data, selecting adequate FMO (flexible macroblock ordering) configurations, etc. Moreover, since standards define the decoding process for ideal channels, decoders facing non-ideal channels have to become ‘cleverer’ by adding adequate error processing tools, which may significantly improve the final video quality. A very interesting class of such tools, known as error concealment tools, has the task of maximising the final video quality by making the best possible use of the correctly received and the corrupted data (a temporal concealment sketch is given after this list).
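The sketches below illustrate, in simplified form, the kind of non-normative encoder- and decoder-side analysis discussed in items 1–6. They are minimal examples under stated assumptions, not the methods of any particular codec or of the H.264/AVC reference software.

The first sketch corresponds to items 1 and 2: exhaustive block matching with the sum of absolute differences (SAD) criterion, searching over several candidate reference frames. Block size, search range and the SAD measure are simplifying assumptions; fast motion estimation methods replace the exhaustive loops with smarter search patterns.

<code python>
# Minimal full-search block matching over several candidate reference frames.
# Frames are 2-D numpy arrays of luma samples (hypothetical inputs).
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(int) - block_b.astype(int)).sum())

def full_search(current, references, by, bx, block=16, search_range=8):
    """Return (best_ref_index, (dy, dx), best_sad) for the block at (by, bx)."""
    cur_blk = current[by:by + block, bx:bx + block]
    best = (None, (0, 0), float('inf'))
    for ref_idx, ref in enumerate(references):          # reference frame selection
        for dy in range(-search_range, search_range + 1):
            for dx in range(-search_range, search_range + 1):
                y, x = by + dy, bx + dx
                # Stay inside the reference frame.
                if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                    continue
                cost = sad(cur_blk, ref[y:y + block, x:x + block])
                if cost < best[2]:
                    best = (ref_idx, (dy, dx), cost)
    return best
</code>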
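For item 3, the following sketch chooses among three simplified intra prediction modes (vertical, horizontal, DC) for a 4x4 block using SAD. Real H.264/AVC intra prediction offers more modes and works on reconstructed neighbouring samples; this only shows the kind of analysis an encoder performs.

<code python>
# Simplified intra mode decision for a 4x4 block (illustrative only).
import numpy as np

def intra_mode_decision(block, top_row, left_col):
    """block: 4x4 array; top_row: 4 samples above; left_col: 4 samples to the left."""
    candidates = {
        'vertical':   np.tile(top_row, (4, 1)),                 # copy the row above downwards
        'horizontal': np.tile(left_col.reshape(4, 1), (1, 4)),  # copy the left column rightwards
        'dc':         np.full((4, 4), (top_row.sum() + left_col.sum() + 4) // 8),
    }
    costs = {mode: int(np.abs(block.astype(int) - pred.astype(int)).sum())
             for mode, pred in candidates.items()}
    best_mode = min(costs, key=costs.get)
    return best_mode, candidates[best_mode], costs[best_mode]
</code>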
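For item 4, Lagrangian RD optimisation scores each candidate coding mode with the cost J = D + λ·R and keeps the cheapest one. The mode names, distortion and rate figures below are placeholders; in a real encoder they come from actually (or approximately) encoding the macroblock in each mode.

<code python>
# Illustrative Lagrangian mode decision: minimise J = D + lambda * R.

def rd_mode_decision(candidates, lagrange_multiplier):
    """candidates: list of (mode_name, distortion, rate_in_bits)."""
    best_mode, best_cost = None, float('inf')
    for mode, distortion, rate in candidates:
        cost = distortion + lagrange_multiplier * rate
        if cost < best_cost:
            best_mode, best_cost = mode, cost
    return best_mode, best_cost

# Hypothetical figures for one macroblock: SKIP is cheap in bits but distorted,
# Intra is accurate but expensive.
modes = [('SKIP', 1200.0, 1), ('Inter16x16', 400.0, 96), ('Intra4x4', 250.0, 310)]
print(rd_mode_decision(modes, lagrange_multiplier=5.0))   # -> ('Inter16x16', 880.0)
</code>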
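For item 5, a very crude rate control loop nudges the quantisation parameter up or down depending on how far the produced bits drift from the per-frame budget. The thresholds, step sizes and clipping range are arbitrary choices for the sketch; practical controllers use rate-quantisation models and virtual buffer constraints.

<code python>
# Illustrative per-frame QP adaptation towards a bit budget (not a real controller).

def update_qp(qp, bits_produced, bits_budget, qp_min=10, qp_max=51):
    if bits_produced > 1.1 * bits_budget:      # overspent: quantise more coarsely
        qp += 1
    elif bits_produced < 0.9 * bits_budget:    # underspent: spend bits on quality
        qp -= 1
    return max(qp_min, min(qp_max, qp))

# Example: 500 kbit/s at 25 frames/s gives a 20 kbit per-frame budget.
qp = 30
for frame_bits in [26000, 24000, 21000, 18000, 15000]:
    qp = update_qp(qp, frame_bits, bits_budget=20000)
    print(qp)
</code>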
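For item 6, one classic decoder-side error concealment strategy is temporal concealment: a lost macroblock is replaced by a motion-compensated block from the previous decoded frame, with the lost motion vector recovered as the median of the vectors of the correctly received neighbours. This is only one of many possible non-normative schemes (spatial interpolation is another); the function below is a sketch under that assumption.

<code python>
# Illustrative temporal error concealment with median motion vector recovery.
import numpy as np

def conceal_block(prev_frame, by, bx, neighbour_mvs, block=16):
    """Copy a block from prev_frame displaced by the median neighbour motion vector."""
    if neighbour_mvs:
        dy = int(np.median([mv[0] for mv in neighbour_mvs]))
        dx = int(np.median([mv[1] for mv in neighbour_mvs]))
    else:
        dy, dx = 0, 0                       # no neighbours survived: plain copy
    h, w = prev_frame.shape
    y = min(max(by + dy, 0), h - block)     # clip the displaced position to the frame
    x = min(max(bx + dx, 0), w - block)
    return prev_frame[y:y + block, x:x + block].copy()
</code>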
