Digitization of Audio and Moving Images

Overview


Selecting the appropriate digitization specifications depends on knowing the format of audiovisual media. Instructions and photographs for identifying audio, video, and film formats can be found in the Preservation Self-Assessment Program (PSAP) Collection ID Guide (University of Illinois at Urbana-Champaign). Information about appropriate storage enclosures, orientation, and environmental conditions for preservation of original media can also be found in the PSAP guide.

 Recommended preservation file formats for audiovisual media are quickly evolving. These recommendations are based on preliminary research and pilot projects and will require regular review and updating.

 For each media category (audio, video, and film), file format recommendations are broken down into three file types, using terminology employed by FADGI:[1]


  • Archival Master File: high-quality, "best copy" preservation master file intended for long-term dark storage; may be used to create new production master and derivative files; replaces the original analog media object as the primary content source for preservation
  • Production Master File: intermediate-quality master for infrequent access, which could be an assembled or corrected version of the archival master file; could be used to generate new derivative files or provided as an access copy for users interested in reusing content in professional-quality video or sound productions
  • Derivative File: lower-quality access copy optimized for streaming or otherwise delivering a copy of the file to researchers; not an object of long-term preservation and may be replaced regularly due to format obsolescence

 

While this document includes recommendations for archival master, production master, and derivative files, not all projects will warrant the creation of all three file types. For projects where the content of the source media object is unique or particularly rare -- and the objective of digitization is long-term preservation -- all three file types should be created. Production masters may not be necessary for file formats where the differences between the archival master and the production master would be negligible. For projects where the content of the source media is not unique -- and the primary objective of digitization is access -- creating only a derivative file may be sufficient.

 

The decision whether to retain original media following preservation-quality digitization will vary based on the curatorial context and may depend on factors including condition, stability of the format, and artifactual value.

Audio

Recommendations for audio are based on a general profession-wide consensus on Broadcast Wave as the standard audio preservation file format.[2]

 

 

Audio (analog source)

  • Archival Master File: Broadcast Wave (BWF) wrapper; linear PCM uncompressed; 24-bit bit-depth; 96 kHz sampling rate
  • Production Master File: Broadcast Wave (BWF) wrapper; linear PCM uncompressed; 16-bit bit-depth; 44.1 kHz sampling rate
  • Derivative File: MPEG Audio Layer 3 (MP3); 256 kbps bit rate; 44.1 kHz sampling rate

Audio (digital source)

  • Archival Master File: Broadcast Wave (BWF) wrapper; native uncompressed data at original sample rate (typically 44.1 or 48 kHz) and bit-depth (typically 16-bit) with original embedded metadata maintained
  • Production Master File: n/a

 

Sound processing (e.g., noise reduction, compression, or limiting) should not be applied to archival master files. If sound processing is desired to address errors inherent to the original recording, it should be applied to a production master or derivative file rather than the archival master file. The transfer must capture all content recorded from the head to tail of each original tape.

 

For archival master files, metadata should be embedded according to FADGI Guidelines on Embedded Metadata in Broadcast WAVE Files.
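
To give a rough sense of the storage implied by the audio specifications above, the following sketch computes the uncompressed data volume per hour of recording. The function names are hypothetical, and the two-channel (stereo) default is an assumption; container and metadata overhead is ignored.

```python
SECONDS_PER_HOUR = 3600


def pcm_bytes_per_hour(sample_rate_hz, bit_depth, channels=2):
    """Uncompressed linear PCM payload for one hour of audio.

    channels=2 (stereo) is an assumption; mono sources halve the result.
    """
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * SECONDS_PER_HOUR


def constant_bitrate_bytes_per_hour(bits_per_second):
    """Payload for one hour of a constant-bit-rate stream such as MP3."""
    return bits_per_second // 8 * SECONDS_PER_HOUR


archival = pcm_bytes_per_hour(96_000, 24)       # 24-bit / 96 kHz
production = pcm_bytes_per_hour(44_100, 16)     # 16-bit / 44.1 kHz
derivative = constant_bitrate_bytes_per_hour(256_000)  # 256 kbps MP3

for label, size in [("archival", archival),
                    ("production", production),
                    ("derivative", derivative)]:
    print(f"{label}: {size / 1e9:.2f} GB per hour")
```

Under these assumptions, an hour of stereo audio at the archival specification is roughly three times the size of the production master and well over ten times the size of the MP3 derivative.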

Moving Image

Video

There is no current profession-wide consensus on a single preservation file format for video. According to the 2017 white paper from Indiana University’s Media Digitization and Preservation Initiative (MDPI), there are three primary standards in use:[3]

 

  • 10-bit, uncompressed, v210 codec, usually with a QuickTime wrapper
  • JPEG 2000, mathematically lossless profile, usually with an MXF wrapper
  • FFV1, a mathematically lossless format, with an AVI or Matroska wrapper

 

While all of these options are potentially viable, the recommendations below mirror the decisions made by MDPI; the same choices have also been implemented at the New York Public Library.[4] FFV1 and Matroska are both open-source technologies developed with preservation needs in mind and are undergoing the IETF (Internet Engineering Task Force) standards process. Because archival master files for digitized video can be massive, selecting a mathematically lossless format like FFV1 rather than an uncompressed format also yields smaller files, reducing the overall financial and environmental impact of long-term storage.[5]
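
As a rough illustration of the storage difference described above, the sketch below estimates the video payload for one hour of standard-definition material stored as uncompressed v210, then applies the "roughly 65% less data" figure reported by MDPI.[5] The 720x486 frame size and 30000/1001 frame rate are assumptions (NTSC standard definition), the helper names are hypothetical, and audio and container overhead are ignored. Actual FFV1 compression varies with content.

```python
from fractions import Fraction

# v210 pads each row to a multiple of 48 pixels, stored in 128 bytes.
V210_PIXELS_PER_GROUP = 48
V210_BYTES_PER_GROUP = 128


def v210_bytes_per_frame(width, height):
    """Bytes for one uncompressed 10-bit 4:2:2 frame in v210 packing."""
    groups_per_row = -(-width // V210_PIXELS_PER_GROUP)  # ceiling division
    return groups_per_row * V210_BYTES_PER_GROUP * height


def bytes_per_hour(bytes_per_frame, frame_rate):
    """Video payload for one hour at the given frame rate."""
    return int(bytes_per_frame * frame_rate * 3600)


# Assumed source characteristics: NTSC standard definition.
NTSC_FRAME_RATE = Fraction(30000, 1001)
uncompressed = bytes_per_hour(v210_bytes_per_frame(720, 486), NTSC_FRAME_RATE)

# "Roughly 65% less data" leaves about 35% of the uncompressed size.
ffv1_estimate = int(uncompressed * Fraction(35, 100))

print(f"v210 estimate: {uncompressed / 1e9:.1f} GB per hour")
print(f"FFV1 estimate: {ffv1_estimate / 1e9:.1f} GB per hour")
```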

 

 

Video (analog source)

  • Archival Master File: FFV1 version 3 in Matroska (MKV) wrapper (4:2:2 YUV, 10-bit bit depth) with linear PCM uncompressed audio encoding (24-bit bit-depth; 48 kHz sampling rate)
  • Production Master File: QuickTime ProRes 422 HQ
  • Derivative File: MPEG-4 H.264 video

Video (digital source)

  • Archival Master File: preserve original codec in Matroska (MKV) wrapper, with original embedded metadata maintained
  • Production Master File: n/a

 

For archival master files, all characteristics intrinsic to the broadcast standard of the source material will be preserved, including frame rate, pixel aspect ratio, interlacing, resolution, recording standard, and number of audio channels. Auxiliary information, such as original timecode and closed captioning, should also be maintained. Luma, chroma, and black levels will be adjusted for each video, on the playback equipment or time base corrector (TBC), to best represent the source material. The transfer must capture all content recorded from the head to tail of each original tape.
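
For readers who want to see how the analog-source archival master specification maps onto encoder settings, the sketch below assembles an FFmpeg command line. Using FFmpeg is an assumption, since this document does not prescribe an encoding tool. The function name, the slice count, and the file names are illustrative. The sketch only builds the argument list; it does not run the encoder.

```python
def ffv1_mkv_command(source, destination):
    """Build an FFmpeg argument list for an FFV1/Matroska archival master.

    Assumes FFmpeg is the encoder; slice count is an illustrative choice.
    """
    return [
        "ffmpeg",
        "-i", source,
        "-c:v", "ffv1",             # FFV1 video codec
        "-level", "3",              # FFV1 version 3
        "-g", "1",                  # every frame is a keyframe (intra-only)
        "-slices", "16",            # assumed slice count
        "-slicecrc", "1",           # per-slice CRCs for error detection
        "-pix_fmt", "yuv422p10le",  # 4:2:2 YUV, 10-bit
        "-c:a", "pcm_s24le",        # uncompressed linear PCM, 24-bit
        "-ar", "48000",             # 48 kHz audio sampling rate
        destination,
    ]


# Hypothetical file names, for illustration only.
command = ffv1_mkv_command("capture.mov", "archival_master.mkv")
print(" ".join(command))
```

The resulting list could be passed to `subprocess.run` in a workflow where FFmpeg is installed. The output file should still be checked against the required specifications afterward.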

Film

Digitization of film is still debated among film-preservation specialists; some practitioners insist that photochemical, film-to-film reproduction is the only acceptable method for preservation reformatting. Recommendations for film digitization are based on FADGI’s “Digitizing Motion Picture Film: Exploration of the Issues and Sample SOW.”[6]

 

 

Film

  • Archival Master File: 2K RAWcooked DPX (FFV1 version 3 in Matroska (MKV) wrapper); 10-bit LOG; RGB 4:4:4; with uncompressed linear PCM audio encoding (24-bit bit-depth; 96 kHz sampling rate)[7]
  • Production Master File: QuickTime ProRes 422 HQ
  • Derivative File: MPEG-4 H.264 video


For archival master files, all characteristics intrinsic to the original source film should be preserved, including aspect ratio (e.g., no cropping or matting of the image). Missing frames should not be interpolated or recreated. The transfer must capture all content recorded from the head to tail of each original reel.

 

For archival master files, metadata should be embedded according to FADGI Guidelines on Embedded Metadata in DPX Files.

 

Since digitization of film is not broadly considered sufficient as a preservation-quality replacement for the original, original film reels should generally be retained after scanning and kept in cold storage at ReCAP. (Film on cellulose nitrate stock may be an exception.)

Checksums and Validation

For each file intended for long-term preservation, a SHA-256 checksum should be generated upon creation; this checksum should be verified every time the file is transferred from one storage medium to another.
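
The checksum workflow described above can be sketched with Python's standard library. The function names and the chunk size are illustrative assumptions, not part of any prescribed tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks.

    Chunked reading keeps memory use flat for very large master files.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_transfer(recorded_checksum, copied_path):
    """Compare the checksum recorded at creation with the copied file."""
    return sha256_of_file(copied_path) == recorded_checksum.strip().lower()
```

In use, `sha256_of_file` would be run once when the file is created and the result stored alongside it; `verify_transfer` would then be run on the copy after each move to new storage.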

 

All files must pass JHOVE format validation, as well as validation of required technical specifications via MediaConch/MediaInfo. In addition to these automated quality control checks, digitized audiovisual files should also pass manual quality control checks. Specific processes will depend on the type of project, based on a risk assessment. Files may be checked comprehensively, or a sampling methodology may be employed.
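
Where a sampling methodology is chosen for manual quality control, one simple approach is a reproducible random sample. The sketch below is only an illustration: the function name, the default 10 percent rate, and the minimum of one file are assumptions, and the appropriate rate should follow from the project's risk assessment.

```python
import math
import random


def select_qc_sample(file_names, percent=10, minimum=1, seed=None):
    """Pick a random subset of files for manual quality control.

    percent and minimum are assumed defaults; passing a seed makes the
    selection repeatable so it can be documented and re-checked.
    """
    files = sorted(file_names)
    if not files:
        return []
    count = max(minimum, math.ceil(len(files) * percent / 100))
    count = min(count, len(files))
    rng = random.Random(seed)
    return sorted(rng.sample(files, count))


# Hypothetical batch of 50 archival masters.
batch = [f"item_{n:03d}.mkv" for n in range(50)]
print(select_qc_sample(batch, percent=10, seed=2024))
```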

In-House vs. Vendor-Sourced Digitization

Digitization of unique or rare audiovisual media, as well as media in particularly fragile formats, should be outsourced to specialized vendors. Vendors selected must demonstrate their ability to meet industry-standard practices for digitization of archival audiovisual media. The Mendel Music Library has equipment to support in-house digitization of most audio formats, with the exception of reel-to-reel tapes, and may be able to provide services to other library units upon request, depending on the scale of the project.

Captions and Transcripts

  • Video with significant audio content must have captions in the original language. We accept the VTT format for captions.
    • For a video in English named "video.mp4", the original-language captions should be in the same folder and named "video--original-language--eng.vtt". In this example, "eng" is an ISO 639-2 language code.
  • Audio with significant language content should have a transcript.
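
The caption naming convention above can be expressed as a small helper. The function name is hypothetical, and the three-letter language code is supplied by the caller.

```python
from pathlib import Path


def caption_path(video_path, language_code):
    """Return the expected original-language caption path for a video.

    The caption file sits in the same folder as the video and follows
    the pattern "<video name>--original-language--<code>.vtt".
    """
    video = Path(video_path)
    return video.with_name(
        f"{video.stem}--original-language--{language_code}.vtt"
    )


print(caption_path("video.mp4", "eng").name)
```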

Vendors

Some vendors who provide captioning and transcription services are:


[1] FADGI. Glossary: “Archival Master File,” “Production Master File,” “Derivative File.” http://www.digitizationguidelines.gov/glossary.php

[2] International Association of Sound and Audiovisual Archives (IASA), https://www.iasa-web.org/tc04/ingest-format; FADGI http://www.digitizationguidelines.gov/audio-visual/documents/IP_Fleischhauer_AudioVisual_Reformatting_isqv22no2.pdf; European Broadcasting Union (EBU) https://tech.ebu.ch/publications/tech3285

[3] Casey, Mike. “White Paper: Encoding and Wrapper Decisions and Implementation for Video Preservation Master Files.” Indiana University Media Digitization and Preservation Initiative (MDPI). March 2017. https://mdpi.iu.edu/doc/MDPIwhitepaperrev.pdf

[4] New York Public Library. AMI Digital Asset Technical Specifications. https://github.com/NYPL/ami-specifications

[5] According to the MDPI study, using FFV1 resulted in “roughly 65% less data than a comparable file using the v210 codec.”

[6] FADGI. “Digitizing Motion Picture Film: Exploration of the Issues and Sample SOW.” April 18, 2016. http://www.digitizationguidelines.gov/guidelines/FilmScan_PWS-SOW_20160418.pdf

[7] Depending on the curatorial context, there may be some cases in which the long-term preservation of a large DPX file is not warranted. In these cases, the standards for analog video digitization may be used instead.