How to generate captions for your on-demand video


In addition to displaying a livestream, Vito hubs support video on demand (VoD) content. This means you can upload pre-recorded videos and either make them playable immediately or schedule them to become available at a later date.

Whether you use this feature to add teaser content ahead of your event, or to upload conference talks for catch-up viewing post-event, you can make your videos more inclusive and accessible by including VTT captions. In this post, we’ll explain how. But first…

What are VTT captions?

Web Video Text Tracks Format (WebVTT, or just VTT) files are a specific type of text document with timestamps that video players can use to display a text overlay on a video. They’re used to provide captions and/or subtitles for the audio track of your video.
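
To make the format concrete, here is a minimal Python sketch that builds the text of a small VTT file. The function names and the two example cues are our own inventions for illustration. The parts that come from the WebVTT format itself are the WEBVTT header line, the blank line between blocks, and the hh:mm:ss.mmm --> hh:mm:ss.mmm timing line above each caption.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_vtt(cues: list[tuple[float, float, str]]) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, caption) tuples."""
    blocks = ["WEBVTT"]  # every VTT file starts with this header line
    for start, end, text in cues:
        timing = f"{format_timestamp(start)} --> {format_timestamp(end)}"
        blocks.append(f"{timing}\n{text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


# Hypothetical example cues, including a non-verbal audio cue.
example_cues = [
    (0.0, 2.5, "[UPBEAT MUSIC]"),
    (2.5, 6.0, "Hello and welcome to the show!"),
]
vtt_text = build_vtt(example_cues)
print(vtt_text)
```

Saving vtt_text to a file with a .vtt extension should give you something a video player can read, although real caption files will of course have many more cues.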

What’s the difference between captions and subtitles?

  • Captions are a text version of the dialogue in the video, and match the language of the video. They’re primarily aimed at people who are deaf and hard of hearing (though there are lots of other benefits, as we’ll see below), so you may also include non-verbal audio cues, such as “UPBEAT MUSIC”, for full context.
  • Subtitles allow you to provide translations in other languages for the dialogue in your video.

While you can use VTT files for both use cases, we’re going to focus specifically on captions in this post.

It’s also worth clarifying that Vito uses closed captions (CC), meaning the captions can be toggled on or off by each individual user according to their preference or needs. By contrast, open captions are part of the video itself, so they always show. If this is what you’d like for your videos, you can edit the captions into the video file before uploading it to Vito, but we recommend CC as it puts your users in control of their own experience (and also requires less of your time spent editing).

For more in-depth information about the technical specifics of VTT, check out the W3C WebVTT spec.

Why add captions?

It takes a little effort to add captions to your videos, so before we go any further, let’s look at why it’s worth it.

  • Captions make your content more accessible to a range of people with specific needs. Of course this includes people who are deaf and hard of hearing, but it can also include some people on the autism spectrum, or who have auditory processing disorder or various forms of ADHD, among others.

Convinced? OK great! So…

How do I add captions in Vito?

It’s super easy to attach VTT files to a video in Vito. Head to Content > Videos while Backstage, click Upload and drop or browse for your video file, and then when it’s uploaded, click the blue “Add caption or subtitle” button:

User interface screenshot from Vito showing the Add caption or subtitle button

From here, you can set the language and optional fields, and choose your captions file — save and you’re done.

User interface screenshot from Vito showing the .vtt file upload form

When enabled by the user, captions appear overlaid on the video like this:

User interface screenshot from Vito of a video with captions

So that’s how to add captions in Vito, but how do you generate the VTT files to begin with? We’ve got you covered.

Ways to generate caption files

There are a few different ways to generate captions, with varying degrees of accuracy, effort and cost.

Human captioner

The most reliable option, and naturally also the most expensive, is to hire a professional captioner or transcription service. They will be able to pick up nuances that AI (artificial intelligence) might miss, and also more accurately capture any specialist language, brand names and other proper nouns, as well as add markers to indicate who is speaking.

If your video will be livestreamed to begin with, and subsequently uploaded to be watched on demand, you may want to work with a professional captioning service like White Coat Captioning (and be sure to book well in advance!). Services like these will provide live captions at a URL which your participants can access in real time. In Vito, you can link out to this URL directly below the video so folx can follow along. Following the stream, you can retain the text file of the captions and attach it to the recording for future viewers to make use of.

Human captioning will, by far, get you the best results, and is the best option for truly accessible video content. However, if you’re operating on a shoestring budget, there are some inexpensive alternatives which will get you at least part way.

AI captioning services

Audio-to-text platforms like Thisten allow you to upload an audio or video file (or grant access to your mic for live recordings) and will generate automated transcripts with purportedly 95% accuracy. We don’t have any personal experience with these services, so we can’t vouch for this figure. Thisten doesn’t export captions in VTT format, so you’ll need to use a subtitle converter tool to finish the job if you go this route.
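
If you’d rather not rely on an online converter, the conversion itself can be quite simple. The Python sketch below assumes your export is in SubRip (SRT) format, which is our assumption for illustration and not something we’ve confirmed for Thisten or any other service. SRT and VTT are close cousins: the main differences are the WEBVTT header and the use of a full stop instead of a comma before the milliseconds in each timing line. The function name and sample captions are hypothetical.

```python
import re

# SRT timing lines use a comma before the milliseconds (00:00:01,000);
# WebVTT uses a full stop (00:00:01.000).
SRT_TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})",
    flags=re.MULTILINE,
)


def srt_to_vtt(srt_text: str) -> str:
    """Convert SubRip (SRT) caption text to WebVTT text."""
    # Normalise line endings and drop a byte order mark if present.
    text = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    # Only timing lines are rewritten; commas in the captions are untouched.
    text = SRT_TIMING.sub(r"\1.\2 --> \3.\4", text)
    # The numeric SRT counters are kept; WebVTT treats them as cue identifiers.
    return "WEBVTT\n\n" + text + "\n"


# Hypothetical sample input.
sample_srt = (
    "1\n00:00:01,000 --> 00:00:04,000\nHello, and welcome!\n\n"
    "2\n00:00:04,500 --> 00:00:07,250\n[APPLAUSE]\n"
)
converted = srt_to_vtt(sample_srt)
print(converted)
```

This only handles plain captions. Files with styling or positioning markup may need a dedicated converter tool after all.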

YouTube hack

If you need captions for VoD content only (not a livestream), then this trick has the enticing benefit of being totally free. However, the drawbacks are that it’s slow and the captions aren’t super reliable unless you do some manual editing, which we’d definitely recommend. It’s the technique that we currently use for our weekly show Ayo!, which we run on a limited budget of €0, because it’s a free event.

Here’s how it works:

  • Upload your video to YouTube (and set it to private if you don’t want the general public to have access to it).
  • Wait a while — typically overnight — for YouTube to generate the automated captions. The actual time depends on the length of your video.
  • Click to edit your video (under Channel content) and then on Subtitles in the left-hand menu.
  • Click to “Duplicate and Edit” the automatic subtitles and the text will open in a new editor.
  • At this point, it’s a good idea to copy the text into your word processor of choice so it’s easier to work with.
  • Now it’s time to edit. Everything will be lowercase, so we recommend going through manually to tidy up the transcript, correct any errors and add in punctuation and capitalization. You may use Find and Replace to make changes in bulk.
  • When you’re happy with the captions, copy and paste them back into the YouTube editor and hit publish. This will automatically sync timings, which is pretty magical!
  • Back on the menu screen, click on the three dots menu, then on Download, and then on .vtt.
  • Voila! You have a caption file that you can upload into Vito.

User interface screenshot from YouTube showing their Video subtitles screen
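
If you find yourself making the same bulk fixes every week, the Find and Replace step above can also be scripted. This is only a rough Python sketch of that idea, not part of YouTube’s tooling. The function name and the list of misheard words are hypothetical examples, so swap in the corrections that apply to your own content.

```python
import re


def apply_corrections(transcript: str, corrections: dict[str, str]) -> str:
    """Replace whole words or phrases, ignoring case in the transcript."""
    for wrong, right in corrections.items():
        # \b keeps us from changing letters in the middle of other words.
        pattern = re.compile(rf"\b{re.escape(wrong)}\b", flags=re.IGNORECASE)
        # A function replacement inserts the text literally, with no escapes.
        transcript = pattern.sub(lambda _match, fixed=right: fixed, transcript)
    return transcript


# Hypothetical examples of misheard names and lowercase words to fix.
corrections = {
    "veto": "Vito",
    "web vtt": "WebVTT",
    "i": "I",
}
raw = "welcome to veto today i will show you web vtt captions"
fixed = apply_corrections(raw, corrections)
print(fixed)
```

You would still need to read through the result by hand to add punctuation and catch one-off errors, but this can take care of the repetitive corrections before you paste the text back into the YouTube editor.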

As this post has shown, captions make your video content more accessible to a range of people, and there are multiple ways to generate them depending on your budget. We hope you found this rundown useful, and encourage you to share any captioning tips you have with us at @vitocommunity on Twitter.