Auto captions

What are auto captions?

Auto captions are text captions generated automatically by speech recognition software, synchronized to the audio track without manual transcription. Because machine-generated captions regularly contain errors, especially with accents, proper nouns, and background noise, most platforms and accessibility guidelines treat them as a starting draft that needs human review before publishing.

When you'd use it

  1. When you need a first-pass transcript of a talking-head video before editing the text manually.
  2. When you have a high volume of clips to caption and transcribing by hand would slow down publishing.
  3. When the speaker has a strong accent or uses brand-specific proper nouns that speech recognition often gets wrong.
  4. When your editing workflow requires a text layer you can proofread and correct before export.

Example

A food creator filming in a noisy kitchen may find that auto captions transcribe "sauté" as "so tay" and misspell a product name. One pass through the transcript to fix those errors takes about two minutes and keeps viewers from reading garbled text during a critical step.
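
A correction pass like the one described above can be partly scripted when the same misrecognitions recur across clips. The following is a minimal sketch, not any particular tool's method: the `CORRECTIONS` table, the `correct_caption` function, and the product name "BrandX" are all hypothetical, and a human should still read the result before publishing.

```python
import re

# Hypothetical correction table: misrecognized text -> intended text.
# "brand ex" / "BrandX" is an invented product name for illustration.
CORRECTIONS = {
    "so tay": "sauté",
    "brand ex": "BrandX",
}


def correct_caption(text, corrections=CORRECTIONS):
    """Replace known misrecognitions, matching whole words case-insensitively."""
    for wrong, right in corrections.items():
        # \b keeps "so tay" from matching inside a longer word such as "also".
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement avoids backslash interpretation in `right`.
        text = pattern.sub(lambda match, fixed=right: fixed, text)
    return text


# Example caption lines as they might come out of speech recognition.
cues = ["Now so tay the onions", "Add a spoon of brand ex sauce"]
fixed_cues = [correct_caption(cue) for cue in cues]
```

A table like this only catches errors you have already seen, so it shortens the manual review rather than replacing it.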

Use cases

  1. Generating a draft caption track for a founder interview before a human editor corrects the errors.
  2. Speeding up the caption workflow for a weekly social series with multiple episodes.
  3. Catching missed words and brand name misspellings in a product launch clip before posting.

FAQ

Are auto captions the same as closed captions?

Not exactly. "Auto captions" describes how the captions were generated (by machine). "Closed captions" describes how they are delivered (as a track separate from the video that viewers can toggle on or off). Auto-generated text is typically served as a closed caption track, but closed captions can also be created manually.
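
To make the "separate track" idea concrete, here is a rough sketch that writes caption cues to a WebVTT document, a common text format for sidecar caption tracks. The choice of WebVTT is an assumption for illustration (no format is named here), and the function names and sample cues are hypothetical.

```python
def format_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Serialize (start, end, text) cues into a WebVTT document string."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical cues: (start seconds, end seconds, caption text).
cues = [(0.0, 2.5, "Heat the pan."), (2.5, 5.0, "Now sauté the onions.")]
vtt = to_webvtt(cues)
```

Because the text lives in its own file rather than being burned into the picture, the same track works whether it was machine-generated or typed by hand.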
