How does YouTube transcription work?

Our system collects available transcriptions and uses them as metadata, as well as to enrich other keywords' mentions in the database. Read on for the details.

YouTube automatically generates captions for videos using speech recognition technology.

This feature is available for videos in various languages, though the accuracy can vary based on the clarity of speech, background noise, and the language spoken. YouTube provides tools for creators to edit automatic captions. This allows for the correction of any inaccuracies that the automatic system might have introduced.

Not all videos have automatic captions. The availability depends on the video's length, the clarity of the audio, and the language spoken.

Video creators can also upload their own caption files, ensuring higher accuracy and better representation of the spoken content.


How can I channel this data into Neticle Media Intelligence?


Whenever a new YouTube video is collected as a mention, the system checks whether a transcription is available for it. If so, the transcription is collected and stored as metadata of the video. This lets the system assign the mention to other keywords that appear in the transcription. In other words, NO YouTube video is collected solely on the basis of its transcription; collection depends only on the title and the description of the video. Once a video has been collected, however, its transcription is used to enrich NMI’s online database by assigning the one mention to multiple keywords.
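The rule above can be illustrated with a minimal Python sketch. It is only an illustration, not NMI's actual implementation: the Mention class, the function names and the simple case-insensitive substring matching are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Mention:
    # Hypothetical structure; the field names are assumptions for this sketch.
    title: str
    description: str
    transcription: Optional[str] = None


def contains(keyword: str, text: Optional[str]) -> bool:
    # Assumed matching rule: case-insensitive substring search.
    return bool(text) and keyword.lower() in text.lower()


def is_collected(mention: Mention, keywords: list) -> bool:
    # Collection looks only at the title and the description,
    # never at the transcription.
    return any(
        contains(k, mention.title) or contains(k, mention.description)
        for k in keywords
    )


def assign_keywords(mention: Mention, keywords: list) -> set:
    # A video that was not collected is never assigned to any keyword.
    if not is_collected(mention, keywords):
        return set()
    # Once collected, the transcription is searched as well, so one
    # mention can end up assigned to several keywords.
    texts = (mention.title, mention.description, mention.transcription)
    return {k for k in keywords if any(contains(k, t) for t in texts)}
```

In this sketch, a video titled "New phone review" whose transcription mentions a tablet would be assigned to both the "phone" and "tablet" keywords, while a video that mentions "phone" only in its transcription would not be collected at all.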


Important information: if a mention has no transcription at the time it is collected, no transcription will be stored as metadata, even if one is added later. If a video is deleted and re-uploaded with a transcription, the transcription will be collected.


When adding a transcription to a mention, the system prioritizes collection in the following order (set by our team):

  1. First, the system checks whether the author of the video has uploaded a manually created transcription. If so, that one is added to the mention (as it is expected to be the most accurate version).
  2. Second, YouTube’s AI-generated transcriptions are considered.
  3. If there is neither a manually created nor an AI-generated transcription, NMI checks whether an English version is available (created by anyone).
  4. If none of these are available, nothing will be collected.
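
The priority order above can be sketched in Python as follows. This is a hedged illustration only: the track dictionaries and their "source" and "language" fields, as well as the values "author_manual" and "auto_generated", are hypothetical names chosen for this example and do not describe NMI's or YouTube's real data model.

```python
from typing import Optional


def select_transcription(tracks: list) -> Optional[dict]:
    # Rules are tried in priority order; the first track matching
    # the highest-priority rule wins.
    rules = (
        lambda t: t["source"] == "author_manual",   # 1. uploaded by the author
        lambda t: t["source"] == "auto_generated",  # 2. YouTube's AI-generated captions
        lambda t: t["language"] == "en",            # 3. any English version
    )
    for rule in rules:
        for track in tracks:
            if rule(track):
                return track
    # 4. Nothing suitable is available, so nothing is collected.
    return None
```

For example, if a video has both an AI-generated track and a track uploaded by the author, this sketch returns the author's track regardless of the order in which the tracks are listed.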