Cinematic Look, Part 1: Aspect Ratio, Sensor Size and Depth of Field

With the advent of the digital SLR as a video capturing device in recent years, there has been a lot of raving on the internet about the “cinematic look” one can achieve with DSLRs. Cinematic look is often opposed to video look or TV look. On forums and blogs one can read both delusions and truth regarding this distinction. As is often the case with any hype – hype has the tendency to self-amplify – a lot of noise gets picked up and reiterated in such a discussion. This series of articles will attempt to examine in some detail the various characteristics of the cinematic look and then explore how they relate to the image of video capturing devices, including HDDSLRs. Hopefully, some myths will be cleared up in the process. This first part in the series focuses on aspect ratios and sensor sizes and the closely related topic of depth of field.

Before we start, let’s make it clear what “cinematic look” actually means. For us, cinematic look is what audiences have come to expect from a motion picture in terms of appearance, or in other words in terms of visual perception. This is the image we have been culturally conditioned to consider as cinematic through decades of exposure to movies. And this is what we are trying to replicate with digital cameras when aspiring to achieve the “cinematic look”. Note that the cinematic look is historically “film look” as movies were almost exclusively shot on film for some hundred years.

The aspect ratio

Silent films were generally shot in 4:3 (1.33:1) aspect ratio

For years wide format moving images were associated with cinema and the more squarish 1.33:1 format was linked to TVs. At least, that’s what we were conditioned to imagine. With the introduction of wide TVs things changed a bit. HD and full HD television screens now have an aspect ratio of 1.78:1 (1920×1080 pixels for full HD; 1280×720 pixels for HD). In comparison, the most popular cinema aspect ratios are currently 1.85:1 and 2.39:1 (often labeled 2.35:1 for historical reasons), for flat and anamorphic projection respectively. This means TVs are now much closer in geometric appearance to cinema screens. Cinematic ratios weren’t always like this, though. Silent films were 1.33:1. Talking pictures were 1.375:1 for some twenty years till 1952. For various reasons, around this time the widescreen revolution in cinema happened, with the aforementioned wider aspect ratios becoming prevalent.

Artistic reasons for choosing a specific aspect ratio notwithstanding, it is correct to assume that a wide image is nowadays associated with cinematic appearance. Up to some limit, at least. Incidentally, using anamorphic adapters on HD cameras may yield images with an extracted aspect ratio of up to double the HD ratio (or 3.56:1), which can be way too wide, so some cropping of the sides is recommended in such cases. But use of anamorphics on consumer and prosumer digital cameras is a topic for another article. The bottom line is, 1.78:1 video is already quite cinematic in geometric appearance. But if one so desires, it is safe to consider a slight cropping of the top and the bottom to bring it to 1.85:1 or even 2.39:1. Have a look at this article for more detailed thoughts on aspect ratio choice for a specific project.
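To put numbers on that cropping, here is a minimal Python sketch. The function name `letterbox_crop` and the rounding policy (keep the crop symmetric between top and bottom) are my own assumptions for illustration, not a standard; an editing application may round differently.

```python
def letterbox_crop(width, height, target_ratio):
    """Return (new_height, rows_removed_per_edge) for a top/bottom crop.

    Assumption: the crop is split equally between top and bottom, so the
    number of removed rows is forced to be even.
    """
    new_height = round(width / target_ratio)
    if new_height > height:
        raise ValueError("target ratio is narrower than the source frame")
    # Shrink by one row if needed so the removed rows split evenly.
    new_height -= (height - new_height) % 2
    return new_height, (height - new_height) // 2


for ratio in (1.85, 2.39):
    new_height, per_edge = letterbox_crop(1920, 1080, ratio)
    print(f"{ratio}:1 -> 1920x{new_height}, {per_edge} rows off top and bottom")
```

For full HD this removes only a couple of dozen rows per edge at 1.85:1, but well over a hundred per edge at 2.39:1.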

Cinema aspect ratios

Note that in the above two paragraphs we are talking about the aspect ratio of what you see on screen. This is not necessarily the same ratio as the recorded image, especially when the recording medium is film.

Sensor size and depth of field

In terms of aesthetics the most apparent property linked to sensor size or film frame size is depth of field. Popular understanding is that smaller sensors yield greater depth of field and, conversely, large sensors have less depth of field. Technically, if we shoot the same composition with two differently sized sensors, and:

1) with lenses covering the same angle of view, i.e. wider lens for the smaller sensor and longer lens for the larger sensor;
2) with the same camera-subject distance;
3) with the same aperture diameter;
4) enlarge the result to the same print or screen size (or resize to the same video pixel size, let’s say 1920×1080);
5) and use the same criterion for sharpness (i.e., the circle of confusion is proportional to the sensor size),

then both pictures will have practically the same depth of field (the match is exact in the usual approximation where the subject distance is much larger than the focal length). Your experience tells you otherwise? The tricky part is number 3). Same size aperture does NOT mean same f-number because the f-number equals the focal length divided by the aperture diameter. This means that for the conditions above, the longer lens (used on the larger sensor to achieve equal angle of view with the small sensor) will shoot at a bigger f-number in order to maintain the same physical aperture.

On the other hand, if we retain all conditions but change 3) to “at the same f-number” (which, by the way, should also, more or less, preserve the same exposure, considering we shoot at the same ISO), then the smaller sensor will indeed yield greater depth of field. This is because the wider lens (used for the smaller sensor) will have a smaller aperture opening at this equal f-number. So we can conclude that lenses with equal angles of view, shot at the same f-number on differently sized sensors (or film frames, for that matter) manifest different depth of field, with smaller sensors giving pictures with greater DOF.
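Both cases can be checked numerically. The sketch below uses the standard thin-lens depth of field formulas. The specific numbers are assumptions chosen only for illustration: a 35 mm lens at f/2.8 on Super 35, a subject at 3 m, and a circle of confusion of sensor width divided by 1500.

```python
def dof_limits(focal_mm, f_number, subject_mm, coc_mm):
    """Near and far limits of acceptable sharpness (thin-lens model)."""
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    near = subject_mm * (hyperfocal - focal_mm) / (hyperfocal + subject_mm - 2 * focal_mm)
    if subject_mm >= hyperfocal:
        return near, float("inf")
    far = subject_mm * (hyperfocal - focal_mm) / (hyperfocal - subject_mm)
    return near, far


def total_dof(focal_mm, f_number, subject_mm, coc_mm):
    near, far = dof_limits(focal_mm, f_number, subject_mm, coc_mm)
    return far - near


# Illustrative assumptions, not measurements.
SUBJECT_MM = 3000.0
COC_DIVISOR = 1500.0                 # circle of confusion = sensor width / 1500
S35_WIDTH, FF_WIDTH = 24.89, 36.0    # sensor widths in mm
S35_FOCAL = 35.0
FF_FOCAL = S35_FOCAL * FF_WIDTH / S35_WIDTH   # same horizontal angle of view
S35_F_NUMBER = 2.8
aperture_diameter = S35_FOCAL / S35_F_NUMBER  # physical aperture in mm

dof_s35 = total_dof(S35_FOCAL, S35_F_NUMBER, SUBJECT_MM, S35_WIDTH / COC_DIVISOR)

# Case 1: same aperture diameter, so the longer lens works at a bigger f-number.
dof_ff_same_diameter = total_dof(
    FF_FOCAL, FF_FOCAL / aperture_diameter, SUBJECT_MM, FF_WIDTH / COC_DIVISOR
)

# Case 2: same f-number on both formats.
dof_ff_same_f_number = total_dof(
    FF_FOCAL, S35_F_NUMBER, SUBJECT_MM, FF_WIDTH / COC_DIVISOR
)

print(f"Super 35, 35 mm at f/2.8:        {dof_s35:.0f} mm")
print(f"Full frame, same aperture size:  {dof_ff_same_diameter:.0f} mm")
print(f"Full frame, same f-number:       {dof_ff_same_f_number:.0f} mm")
```

With these inputs the first two figures agree to within about one percent, while the full-frame shot at the same f-number shows noticeably less depth of field.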

For this article we will ignore other properties related to format size. In the digital case these include dynamic range, sensitivity and noise (all three are, more precisely, connected to sensor pixel size). For film, different film frame sizes (but from the same emulsion) will show different grain sizes when projected on the same screen (or printed on the same release stock).

Camera aperture and projection aperture

Motion picture film cameras have a rectangular film gate in front of the negative, which defines the portion of the frame getting exposed. This portion of the negative (and often the gate itself) is called camera aperture. When projecting a release print in a movie theater another gate is used in front of the film called projection aperture. The projection aperture is slightly smaller than the camera aperture to allow some safety margin for imperfect alignment of the film roll. It is essentially a window in the camera aperture area. So the audience in the theater never sees the full image as it was recorded but a slightly cropped version. Camera and projection aperture may also vastly differ when different aspect ratios are involved in shooting and projection: for example, when a movie is shot in 1.33:1 but projected at 1.85:1. All this makes film frame area measurements somewhat ambiguous. In this text for consistency when talking about various film formats we will mean the standardized camera aperture unless “projection aperture” is explicitly stated.

Frame sizes and aspect ratios of popular film formats

A frame from Kodak Gold 35mm stills negative film

Stills photographers coming to videography sometimes wrongly assume that 35mm motion picture frames are the same as stills 35mm frames. A full frame for stills photography is sized 36mm x 24mm, or, rather, this is the exposed area of the frame (or the camera aperture). Note that film rolls are oriented horizontally in a stills camera with film perforations at the top and the bottom of the frame. On the other hand, film used for motion pictures is (usually) oriented vertically (perforations at the sides). For 35mm film the longer side is around 24mm and the shorter side around 18mm. The exact frame height depends on how narrow the frame line is. The standard negative pulldown (or film pulldown) for movies is 4 perforations per frame (4-perf): the camera sprocket wheels pull four perforations from the film roll for each frame.

35mm silent frame

35mm Academy format frame, shown with the space reserved for soundtrack

3-perf Super 35 frame

Techniscope is a 2-perf format

Silent film utilized the full area of the frame for recording images because there was no need to leave space on the negative for sound. The camera aperture of silent film was 24.89mm x 18.67mm (.980″ x .735″) with 1.33:1 aspect ratio.

For talkies the image area shrank in order to accommodate the soundtrack on the release print. In 1932 the camera aperture for sound pictures was set to 22.05mm x 16.03mm (.868″ x .631″), with the projection aperture set to 20.96mm x 15.24mm (.825″ x .600″) and 1.375:1 aspect ratio.

Widescreen formats varied a lot through the years. Anamorphic formats utilized a frame size similar to the Academy format, but in order to achieve widescreen ratios anamorphic lenses were used to squeeze the image while shooting and then unsqueeze it on projection. But we will leave anamorphic formats out for this article and focus on flat formats as they relate more easily to digital sensors. The current wide standard for shooting flat is Super 35. Super 35 is a production standard, meaning it gets resized when printed. This also means there is no need to leave space on the negative for sound as the printing is not 1:1. Super 35 was originally a 4-perf format sized 24.89mm x 18.67mm (.980″ x .735″). This is a 1.33:1 format so frames were matted down (to 1.85:1 or 2.39:1) for release. This also means a lot of the frame was wasted, so currently a 3-perf version sized 24.89mm x 13.87mm (.980″ x .546″) is used in order to maximize frame utilization. This saves around 1/4 of the stock length compared to 4-perf. 3-perf still gets cropped a bit when printed but wastes much less negative than 4-perf.

Various widescreen apertures have been used through the years. VistaVision was an 8-perf horizontal format developed by Paramount in the 1950s and similar to 35mm for stills. With a camera aperture sized 37.7mm x 25.17mm (1.485″ x .991″) it offered great image quality. Most productions were matted and printed down to standard size 1.85:1 format vertical prints for theatrical release. Despite the exceptional quality VistaVision didn’t catch on because of the higher stock costs in comparison to anamorphic formats. Since the 1960s it’s been used mostly for special effects work requiring greater resolution.

At the other end of the spectrum was Techniscope, introduced by Technicolor Italy in the early 1960s. This was a 2-perf production format meant to save film stock by sacrificing a bit of image quality. It used a camera aperture sized 22.05mm x 9.47mm (.868″ x .373″). Techniscope pictures were shot flat, then printed with a 2x vertical enlargement factor to be projected anamorphically. Being 2-perf, during production it used half the stock compared to 4-perf but resulted in larger grain and less clarity.
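The stock savings of the different pulldowns follow from simple arithmetic: 35mm film has 64 perforations per foot (16 four-perf frames). The sketch below assumes a running speed of 24 frames per second; the function name is my own.

```python
PERFS_PER_FOOT = 64  # 35mm film: 16 four-perf frames per foot


def feet_per_minute(perfs_per_frame, fps=24.0):
    """Length of 35mm stock consumed per minute of running time."""
    return perfs_per_frame * fps * 60 / PERFS_PER_FOOT


four_perf = feet_per_minute(4)
for perfs in (4, 3, 2):
    used = feet_per_minute(perfs)
    saving = 1 - used / four_perf
    print(f"{perfs}-perf: {used:.1f} ft/min, saves {saving:.0%} versus 4-perf")
```

This reproduces the figures in the text: 3-perf saves a quarter of the stock and 2-perf saves half.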

North by Northwest, like other Hitchcock movies from the second half of the 50's, was shot in VistaVision

Then there is also 16mm film, which is widely used for documentary, TV and occasionally for cinema work, especially for indie films. Super 16 (which is the analog of Super 35 in the 16mm world) has a camera aperture sized 12.52mm x 7.41mm (0.493″ x 0.292″). Recent award-winning films shot in Super 16 include The Hurt Locker, Black Swan and The Wrestler.

Digital sensors

For a long time TV cameras used analog pickup tubes to convert optical images into electric signals. The imaging area of the tube is usually 2/3 of the diameter of the tube. Standard tube diameters included 1 inch, 2/3 inch, 1/1.8 inch. You will probably notice the similarity with digital sensor size categories. Indeed, sensor sizes like 2/3″, 1/1.8″, 1/2.3″, Four-Thirds, etc. are named like this for historical reasons related to analog TV and video cameras. Each of these has an imaging diagonal roughly equal to the imaging diameter of a tube of that size (remember, the imaging diameter of the tube is about 2/3 of the overall tube diameter). For example, a typical 2/3″ sensor will have a diagonal of around 11 mm. This is roughly equal to 2/3 of 2/3″.
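The rule of thumb above is easy to express in code. This is only a rough sketch: the function name is my own, and real sensors deviate from the 2/3 rule by several percent, so treat the output as an estimate rather than a specification.

```python
MM_PER_INCH = 25.4


def approx_imaging_diagonal_mm(nominal_inches):
    """Estimate the imaging diagonal as 2/3 of the nominal tube diameter."""
    return nominal_inches * MM_PER_INCH * 2 / 3


for label, inches in [("1 inch", 1.0), ("2/3 inch", 2 / 3), ("1/1.8 inch", 1 / 1.8)]:
    print(f"{label}: roughly {approx_imaging_diagonal_mm(inches):.1f} mm diagonal")
```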

Modern HD digital video cameras normally use a 16:9 sensor. The typical aspect ratio of the digital sensor in a photo camera is either 3:2 (mimicking film stills) or 4:3. But when shooting video with a photo camera only a 16:9 portion of the sensor is used with pixels in the top and the bottom getting discarded. One consequence of this is that crop factors often used for comparison of photo camera sensors are not always accurate in terms of video. Modern video is predominantly widescreen and cropping practically always happens at the top and/or the bottom thus keeping the original width of the image unchanged. That’s why cinematographers and videographers often use the sensor width when comparing sensors in terms of DOF instead of the usual diagonal measurement as used in crop factors for stills.

Considering this, the following table lists some film format frame and digital sensor sizes with only the approximately 16:9 (or wider, where 16:9 is not applicable) area taken into account, sorted by width.

*Various film format and sensor sizes sorted by width. All sizes in millimeters.*
Sensor or film format	Frame size, 16:9 or wider area (width x height, mm)
Canon 5D Mark 2/3 (Full Frame)	36 x 20.3
Canon 1D Mark 4 (APS-H)	27.9 x 15.7
Super 35 (film)	24.89 x 13.87
Canon C300	24.6 x 13.8
Arri Alexa	23.76 x 13.365
Nikon D7000 (APS-C)	23.6 x 13.3
Sony Nex 5n	23.4 x 13.16
Canon 7D/60D/600D (APS-C)	22.3 x 12.5
Red Epic/Scarlet in 4K mode	22.12 x 12.44
Techniscope (film)	22.05 x 9.47
Panasonic GH2 in 16:9 mode	18.8 x 10.6
Super 16 (film)	12.52 x 7.03
Typical 2/3″ TV camera tube	8.8 x 4.95
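The 16:9 heights in the table, and a width-based comparison against Super 35, can be reproduced with a few lines of Python. Using Super 35 as the reference width is my own choice for illustration; the widths are taken from the table.

```python
SUPER_35_WIDTH_MM = 24.89


def height_16x9(width_mm):
    """Height of the 16:9 area that spans the full sensor width."""
    return width_mm * 9 / 16


def width_ratio_to_super35(width_mm):
    """Width-based crop comparison; values above 1 mean wider than Super 35."""
    return width_mm / SUPER_35_WIDTH_MM


for name, width in [
    ("Full Frame", 36.0),
    ("Canon APS-C", 22.3),
    ("Panasonic GH2", 18.8),
]:
    print(
        f"{name}: {width} x {height_16x9(width):.2f} mm, "
        f"{width_ratio_to_super35(width):.2f}x Super 35 width"
    )
```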

So how do we interpret these in terms of depth of field?
It is a common understanding that TV and video have relatively big apparent depth of field. And we can easily see why this is the case. First, TV cameras tend to use relatively slow zoom lenses with relatively small apertures. Second, and more important, as seen above they have a much smaller imaging area compared to both film and large digital sensors.

One can often read statements on forums like “I like Canon 5D Mark 2 because of its cinematic DOF”. Statements like this can be attributed to years of visual opposition between TV and cinema and the consequent automatic generalization: TV has lots of DOF, cinema has shallow DOF. This is not always true. The correct statement is “cinema can have shallower depth of field than TV”.

One look at the table shows that Full-Frame DSLRs actually have a much larger sensor than the typical widescreen motion picture frame (i.e. Super 35 and classic widescreen). This means they may demonstrate excessively shallow DOF compared to motion pictures shot on film when pictures are shot at the same f-number/exposure (see above). APS-C sized sensors are actually much closer to the typical film frame size. No wonder that digital cinema cameras that claim “Super 35” sized sensors actually utilize APS-C sensors. This doesn’t mean that APS-C sensors are better than Full Frame. There are other reasons to use sensors larger than APS-C: low light sensitivity, dynamic and color range, overall image crispness (this last one is often “lost in compression” in DSLR video). And the shallow depth of field fetish, of course.
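The same-aperture-diameter condition from the depth of field section gives a quick way to translate f-numbers between formats: for the same angle of view and subject distance, depth of field roughly matches when the f-number is scaled by the ratio of sensor widths. The helper below is a sketch of that rule of thumb, with a hypothetical function name; it ignores exposure, which does change with the f-number.

```python
def equivalent_f_number(f_number, from_width_mm, to_width_mm):
    """F-number on the target format giving roughly the same depth of field.

    Assumes the same angle of view, subject distance and viewing size.
    """
    return f_number * to_width_mm / from_width_mm


SUPER_35, FULL_FRAME, TWO_THIRDS = 24.89, 36.0, 8.8  # widths in mm

# Matching a Super 35 shot at f/2.8 on other formats:
print(f"Full frame: f/{equivalent_f_number(2.8, SUPER_35, FULL_FRAME):.1f}")
print(f"2/3 inch:   f/{equivalent_f_number(2.8, SUPER_35, TWO_THIRDS):.1f}")
```

In other words, a full-frame camera has to be stopped down by about one stop to look like Super 35 in terms of DOF, while a 2/3″ camera would need a lens of about f/1 to do the same.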

Once Upon a Time in the West (1968) screenshot

Sergio Leone shot Once Upon a Time in the West in Techniscope, with a frame size smaller than APS-C

A small non-technical digression. There is also the aesthetic side of DOF. Some of the greatest films in history sought to get deep focus by either using wide lenses exclusively or pouring tons of light on set and shooting at small apertures. Others ended up with relatively bigger DOF for technical reasons: smaller film frame sizes (compared to Super 35 and APS-C). It is telling (and a bit ironic) that Paramount promoted their VistaVision process as a deep focus vehicle, mostly based on the availability of a 28 mm lens – one of the widest at the time. It was not shallow focus that tempted filmmakers, but rather image clarity, depth and wide angle possibilities. Generally, movies are supposed to represent objects in relation to their surroundings. This requirement implies sufficient DOF in order to visualize these relations. Selective focus is certainly a great tool for isolating subject matter and commanding the viewer’s eye. But shallow DOF is just that: a means, not a goal. Very shallow DOF can surely be a good aesthetic for certain scenarios (recently Tinker Tailor Soldier Spy relied on shallow DOF, mostly by utilizing longer lenses), but these tend to be the exception, not the norm. So we can argue that shallow depth of field is not an implicit characteristic of cinema because there are lots of influential movies heavily utilizing deep focus shots. There are other properties more intimately associated with the cinematic look. Some of them are the focus of the next part in this series.

We can safely conclude that in the DSLR realm (and in the digital sensor world, in general) APS-C is the closest representation of the cinematic look in terms of DOF.