Cinematic Look, Part 3: Dynamic Range
Images are all about light. Light is captured, transferred through the various storage and processing stages of the workflow and finally reproduced for viewing. The adventures of scene light on its way to the viewer of the final images have some implications for the cinematic look. More precisely, this article is about the dynamic range of the image capturing medium. The differences in the dynamic range of film and digital camera sensors are explained. We also get to talk a bit about transfer curves and gamma.
Scene dynamic range
Dynamic range and its transfer are among the most often misunderstood concepts in video and film, maybe because they are a bit technical. Dynamic range is the ratio between the largest and the smallest possible values of a signal. Here we are interested in the case when this signal is light. Scene dynamic range or scene contrast is the ratio between the luminance of the brightest whites and the darkest blacks in a scene. This ratio can get quite large in scenes with both bright sunlight and dark shadows.
Human vision has a curious characteristic. In order to accommodate large scene contrasts we don’t perceive light in a physically “correct” way. We see exponential luminance increments as linear increments. We perceive the change from 10 cd/m² to 20 cd/m² as similar to the change from 200 cd/m² to 400 cd/m². This means that a series of gray steps with luminances of 10, 20, 40, 80, 160,… cd/m² is perceived as uniformly changing, while in a series of gray steps with luminances of 10, 20, 30, 40, 50, 60,… cd/m² the perceived differences between steps get smaller. One important consequence of this logarithmic response of human vision to light is that the eye discerns small luminance differences in the darks better than in the highlights.
The logarithmic concept of light stops fits well with the workings of our vision and is widely adopted in photography. A surface is said to be one stop higher than another surface when the luminance of the first surface is twice the luminance of the second surface. So if a scene has a contrast ratio of 1000:1 it is said to have a dynamic range of around 10 stops (2^10 = 1024).
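To make the stop arithmetic concrete, here is a minimal Python sketch. The function names are mine and purely illustrative. It converts a luminance ratio to stops and measures the step sizes of the two gray series mentioned above.

```python
import math

def stops_between(low: float, high: float) -> float:
    """Number of stops (doublings of luminance) between two luminance values."""
    return math.log2(high / low)

def step_sizes(series):
    """Size of each step in a luminance series, expressed in stops."""
    return [round(stops_between(a, b), 2) for a, b in zip(series, series[1:])]

# A 1000:1 scene contrast is just under 10 stops.
print(round(stops_between(1, 1000), 2))        # 9.97

# Doubling series: every step is exactly one stop, so it looks uniform.
print(step_sizes([10, 20, 40, 80, 160]))       # [1.0, 1.0, 1.0, 1.0]

# Additive series: the steps shrink when measured in stops.
print(step_sizes([10, 20, 30, 40, 50, 60]))    # [1.0, 0.58, 0.42, 0.32, 0.26]
```

The shrinking numbers in the last line are the "perceived differences getting smaller" described above, expressed in stops.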
Film dynamic range
The dynamic range of film and digital sensors is usually smaller than that of high dynamic range scenes. Color reversal film also has a much smaller dynamic range than color negative film. For example, Kodak Ektachrome 5285, which is a reversal stock, has less than 9 stops of dynamic range. The captured dynamic range distribution varies a bit depending on the specific negative stock, but the latest negative stocks like Kodak Vision3 5219 have a dynamic range of over 14 stops. Color reversal film is usually much more saturated than negative film. Both high saturation and limited dynamic range make reversal film more of a specialty stock, appropriate for specific uses like ads or music videos. Movies are almost universally shot on negative film.
From the characteristic curve of film (Kodak 5219 in this example) we can note the following. There is a large linear part in the middle of the curve where equal exposure change results in equal density change. That’s where detail is captured uniformly and with the greatest tonal resolution. The slope of the curve in its straight part is called gamma. For most film negatives gamma is around 0.6. This means that one stop of light, or 0.3 log exposure, is represented by 0.3*0.6 = 0.18 density. So, in a way, film does dynamic range compression: as you can see from the chart, a spread of more than 14 stops (4.2 log exposure) is captured in a density range of less than 2.0 log D. Most of this is due to highlights and shadows compression as explained below. Note that film has different sensitivity to red, green and blue. This is taken care of during the printing process.
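The gamma arithmetic can be checked with a few lines of Python. This is only a sketch with illustrative names of my own; it treats the curve as a single straight line with slope 0.6, which the real characteristic curve is not.

```python
import math

# One stop is a doubling of exposure, i.e. about 0.301 in log10 exposure.
LOG10_PER_STOP = math.log10(2)

def density_per_stop(gamma: float) -> float:
    """Density change produced by one stop on the straight part of the curve."""
    return gamma * LOG10_PER_STOP

def straight_line_density_range(gamma: float, stops: float) -> float:
    """Density range needed if ALL stops sat on a straight line of this slope."""
    return density_per_stop(gamma) * stops

print(round(density_per_stop(0.6), 3))                 # 0.181
print(round(straight_line_density_range(0.6, 14), 2))  # 2.53
```

A purely straight line would need about 2.5 of density for 14 stops. Since the chart shows less than 2.0, the remainder has to come from the flattening of the curve at its two ends.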
I have also marked where 18% gray, 2% black and 90% white fall on the curve (for the green sensitivity curve). 18% gray or middle gray is what light meters use for light measurements. This is the shade of gray that falls perceptually in the middle of a black-to-white grayscale. 90% white is used as reference white in video and shows where diffuse white falls. Whites above this are generally specular highlights or in-frame lights. 2% black shows where the darkest detailed shadows fall. Below this, deep black with some tonal change is expected, but without real detail.
As we can see, there are around 3 stops below 2% where blacks are recorded, albeit compressed at the bottom and with less tonal resolution. And there are around 5 stops above 90% white for highlights. This is also the overexposure latitude. This latitude allows the cinematographer to overexpose in order to capture significant dark detail or to play with the look of the image during processing and printing. This allows for some contrast and grain modulation. Slight overexposure paired with pull processing (underdevelopment) and/or print down is common. The highest part of the curve is also compressed a bit, which means less tonal precision in this part. The point where shadows start to compress is called toe, and the point where highlights begin to roll is called shoulder.
It should be clear that the negative image is source material. If printed so that the curve is preserved, the image would appear very low contrast: washed out and unappealing. That’s why release printing is done on high contrast positive stocks with gamma in the range of 2.5 to 3.0. Combined with a negative gamma of around 0.6, this results in a print-through gamma of around 1.5 to 1.8. The print stock also does some further highlight compression through its toe. Blacks, on the other hand, are mostly unaffected due to the high maximum density over base of positive stocks.
The existence of a toe and a shoulder is the cause of one of the defining characteristics of film, and consequently, of the cinematic look. The relatively large dynamic range paired with the compression of the extremes is the reason for the pleasant look of material shot on film in terms of range distribution: highlights seemingly roll off forever without clipping and there is a notion of tonality in the deep shadows.
An interlude: gamma encoding and end-to-end gamma
Gamma is another often misunderstood area. The fact that the word is used for at least three different concepts in the image-making realm doesn’t help either. In the case of film gamma is the slope (or the tangent) of the linear part of the characteristic curve. In digital, gamma is used both as a synonym of transfer function or transfer curve, and as the value used for the exponent in the special case of power-law gamma encoding/decoding.
The dynamic range of the human eye is around 10 to 15 stops in a given moment of time, depending on lighting conditions. Displays and projection have smaller reproduction capabilities. Projection usually has intraframe contrast ratio of 150:1 or smaller. Good monitors may have intraframe contrast of around 1000:1. So the highlights and blacks compression above shoulder and below toe allows for squeezing a higher dynamic range into the smaller dynamic range of the reproduction system. It is, in essence, a case of tone mapping.
System gamma, end-to-end gamma or print-through gamma (in the film case) all describe the gamma of the whole process: from scene to the final deliverable. Replicating scene light would suggest system gamma of 1. But this is only true if the viewing conditions were equivalent to scene conditions in terms of light. This is rarely the case. Projection flare, low absolute projection luminance (less than 50 cd/m²) and the relatively dark viewing conditions lower display contrast significantly and make blacks appear brighter to the eye. The higher system gamma adds some contrast and combats these limitations. For example, film negative gamma of 0.6, intermediate film gamma of 1 and print film gamma of 3.0 lead to a composite gamma of 0.6 * 1.0 * 3.0 = 1.8. For the brighter viewing conditions in offices and homes an end-to-end gamma of around 1.2 is considered sufficient.
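Because the gammas of the individual stages simply multiply, the composite figure is easy to reproduce. A minimal Python sketch, with a helper name that is my own:

```python
from math import prod

def system_gamma(*stage_gammas: float) -> float:
    """End-to-end gamma of a chain of stages: the product of the stage gammas."""
    return prod(stage_gammas)

# Negative (0.6) -> intermediate (1.0) -> print stock (3.0)
print(round(system_gamma(0.6, 1.0, 3.0), 2))   # 1.8

# Negative (0.6) printed on a softer print stock (2.5)
print(round(system_gamma(0.6, 2.5), 2))        # 1.5
```

This only describes the straight parts of the curves. The toe and shoulder of each stage are not captured by a single number.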
Gamma encoding in digital images serves a different purpose. Consumer grade images are universally 8-bit. If light is encoded linearly the dark stops have very limited precision: 2 is a stop higher than 1, 4 is a stop higher than 2, 8 is a stop higher than 4, etc. There are almost no values to encode intermediate shades. On the other hand, there is an excessive amount of values in the upper end: in the top stop between 128 and 255, for example. So linear encoding is both inefficient and loses important information in the shadows. Power-law gamma encoding addresses this by applying a transform (usually a simple power function) to the input signal. The eye still needs linear light in order to see the correct image, so the display applies the inverse curve and linearizes the output. Decoding (inverse) gamma values between 2.2 (sRGB) and 2.6 (digital cinema) are used, depending on the expected viewing conditions.
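The redistribution of code values can be illustrated with a small Python sketch. It assumes a pure power-law encoding (real sRGB adds a short linear segment near black, ignored here), and the function name is hypothetical.

```python
def code_span_of_stop(stops_below_white: int, gamma: float, bits: int = 8) -> float:
    """Width, in code values, of one stop under a pure power-law encoding.

    stops_below_white = 0 is the top stop (from half of peak to peak),
    1 is the stop below it, and so on. gamma = 1.0 means linear encoding.
    """
    max_code = 2 ** bits - 1
    top = 2.0 ** (-stops_below_white)   # relative linear light at the top of the stop
    bottom = top / 2                    # one stop lower
    encode = lambda light: max_code * light ** (1.0 / gamma)
    return encode(top) - encode(bottom)

for stop in range(8):
    linear = code_span_of_stop(stop, gamma=1.0)
    power = code_span_of_stop(stop, gamma=2.2)
    print(f"stop -{stop}: linear {linear:6.1f} codes, gamma 2.2 {power:6.1f} codes")
```

With linear encoding the top stop takes half of all code values and each lower stop gets half as many as the one above it. The power-law curve takes values away from the top stops and hands them to the darker ones.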
Dynamic range of digital video
Digital sensors are more straightforward than film in terms of captured light representation. The quantized signal from the sensor’s analog to digital converter is linear. If a photosite (pixel) captures twice as much light as another photosite, its quantized value will be twice as large. Most DSLR cameras capture RAW images quantized to 14 bits. For a typical DSLR camera with slightly above 11 stops of dynamic range, 14 bits allow for some decent tonal resolution even with linear encoding. But things start to get complicated when the raw data have to be stuffed into fewer bits for recording.
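A quick way to see why 14 linear bits are workable for 11 stops, and 8 linear bits are not, is to count the integer code values that fall inside each stop. The Python sketch below is idealized: it ignores noise and black level offsets, and the function name is mine.

```python
import math

def linear_codes_per_stop(bits: int, stops: int) -> list[int]:
    """Count the integer code values inside each stop of a linear encoding.

    Index 0 is the top stop, index 1 the stop below it, and so on.
    """
    full_scale = 2 ** bits
    counts = []
    for k in range(stops):
        upper = full_scale / 2 ** k         # top of this stop (exclusive)
        lower = full_scale / 2 ** (k + 1)   # bottom of this stop (inclusive)
        counts.append(math.ceil(upper) - math.ceil(lower))
    return counts

print(linear_codes_per_stop(14, 11))
# [8192, 4096, 2048, 1024, 512, 256, 128, 64, 32, 16, 8]

print(linear_codes_per_stop(8, 11))
# [128, 64, 32, 16, 8, 4, 2, 1, 0, 0, 0]
```

In 14 bits even the eleventh stop still has a handful of values. In 8 bits the lowest stops have none at all.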
All DSLR cameras and consumer video cameras output 8-bit video. Stuffing 11+ stops of dynamic range into 8 bits can’t be done linearly simply because the coding space lacks resolution. The typical compromise results in a gamma encoded S-shaped (over stops/log exposure) transfer curve. The top 8 to 9 stops of the RAW dynamic range are selected for transfer because they are the cleanest. Some sort of a knee is usually implemented, with the highest 1 to 1.5 stops getting compressed. The knee is very similar to the shoulder of film. It simulates a roll-off in the highlights, slightly increases the overall dynamic range and also allows for a bit more tonal precision in the mids where the most important tones are. The resulting image is sufficiently contrasty and ready for the consumer display. But it is not really supposed to be post-processed.
Again, I have marked 2% black, 18% gray and 90% white on the dynamic range chart for the Canon DSLR Standard picture style. Note that there is around one stop over 90% white available for highlights. Compare this to the excessive overexposure latitude of film. Shadows are better represented although the low stops are lacking in tonal resolution. An attempt to contain the highlights on exposure will often result in crushed blacks in high contrast scenes.
This type of consumer-ready transfer function plus the limited dynamic range of early digital cameras have led to the notion that digital video is too contrasty, highlights are hard clipped and the blacks are crushed and lacking detail. This is exactly what many people mean when they say that an image looks “video-ish”.
Recent high-end digital cameras have much better dynamic range capabilities and rival the best film stocks. Access to the full dynamic range is enabled through either linear RAW video (12 bit or more) or some (near) logarithmic transfer function. Both linear RAW video and log video are production formats and require post-processing for presentation. The idea of log space video is to provide a near flat distribution of coding values over exposure. Such a distribution provides both the full camera dynamic range and better tonal precision in blacks and highlights. Thus log curves are close to film characteristic curves, allowing for easier intercutting of digital video and scanned film footage. For example, the Arri Log C transfer function encodes around 14 stops of dynamic range from the Arri Alexa camera. Similar transfer curves have been constructed for many cameras, including DSLRs. It is worth noting that fitting a large dynamic range into a limited coding space (such as 8 bits) results in limited tonal precision. This makes the practicality of true 8-bit log curves somewhat dubious. A 10-bit film scan allocates around 90 coding values per stop in the straight part of the characteristic curve, and 10-bit Arri Log C allocates around 75 values, whereas an 8-bit transfer curve like Technicolor’s CineStyle for Canon DSLR cameras allocates around 27 values per stop. That’s why low precision flat curves should be used with care and with understanding of the tonal precision trade-off. You can read more on 8-bit flat transfer curves here.
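The film scan figure can be roughly reproduced from the negative gamma discussed earlier. The Python sketch below rests on two assumptions of mine that are not stated above: that the 10-bit scan uses the Cineon convention of 0.002 density per code value, and, for the second function, that an idealized log curve spreads all code values evenly over the encoded stops. Real curves such as Log C or CineStyle are shaped differently, so their actual figures will differ from this estimate.

```python
import math

def film_scan_codes_per_stop(negative_gamma: float = 0.6,
                             density_per_code: float = 0.002) -> float:
    """Code values per stop on the straight part of a scanned negative.

    density_per_code = 0.002 is the assumed Cineon-style 10-bit scaling.
    """
    density_per_stop = negative_gamma * math.log10(2)
    return density_per_stop / density_per_code

def ideal_log_codes_per_stop(bits: int, stops: float) -> float:
    """Even split of the whole code range over the stops (idealized log curve)."""
    return 2 ** bits / stops

print(round(film_scan_codes_per_stop()))           # 90
print(round(ideal_log_codes_per_stop(10, 14), 1))  # 73.1
print(round(ideal_log_codes_per_stop(8, 14), 1))   # 18.3
```

Dropping from 10 to 8 bits divides the values per stop by four for the same number of stops, which is the tonal precision trade-off in a nutshell.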
The previous parts of the Cinematic Look series can be found here: Part 1 on Aspect Ratio, Depth of Field and Sensor Size, and Part 2 on Frame Rate and Shutter Speed. And the next part is on Film Grain.