Summarize YouTube videos from the command line

Thanks to recent innovations in LLMs, and the fantastic work of justine.lol, we now have open source large language models to play with. While these models cannot reason reliably, their capacity to compress language makes them useful for summarizing.

In this post, we will look at a few bash functions that transform a YouTube video into a summary that can be read quickly to get the gist of what is happening.

These functions can also be used to summarize any content that can be converted to plain text.

Download video transcripts

The model deals with text, so we download the video transcript using yt-dlp, an open source command line interface to YouTube and various other video providers, then do some cleaning with ex.

function download_video_transcript() {
  cd "$(mktemp -d)" || return
  # Download the auto-generated subtitles only, not the video itself
  yt-dlp --write-auto-sub --skip-download "$1"
  # An ex script that strips the WebVTT header (first 3 lines), inline
  # styling tags, timestamp lines, blank lines, and consecutive duplicates
  cat <<'EOF' >process.vim
silent! 1,3d
silent! g/<\/c>/d
silent! g/-->/d
silent! g/^ *$/d
silent! %!uniq
%p
quit!
EOF
  ex -s -c "source process.vim" *.vtt >clean_transcript.txt
  summarize-txt clean_transcript.txt
}

Note the option --write-auto-sub, which downloads the automatically generated subtitles. One could also use --write-sub, which downloads the author's subtitles, but some videos lack them, so I found this script more robust when always downloading the auto-generated ones. See this page for more subtitle options.
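To see what the cleanup step actually removes, here is a synthetic WebVTT fragment run through a grep pipeline equivalent to the ex script above (the real function uses ex so everything stays in one script):

```shell
# Synthetic WebVTT fragment (a real file would come from yt-dlp)
cat >sample.vtt <<'EOF'
WEBVTT
Kind: captions
Language: en

00:00:00.000 --> 00:00:02.000
hello<00:00:01.000><c> world</c>
hello world

00:00:02.000 --> 00:00:04.000
hello world
this is a test
EOF

# Equivalent of the ex script: drop the 3-line header, styled lines,
# timestamp lines, blank lines, then consecutive duplicate lines
tail -n +4 sample.vtt | grep -v '</c>' | grep -v -e '-->' \
  | grep -v '^ *$' | uniq >cleaned.txt
cat cleaned.txt
# prints:
# hello world
# this is a test
```

Auto-generated subtitles repeat each caption as it scrolls, which is why the final uniq pass matters as much as the tag stripping.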

Iteratively reduce the transcript into smaller summaries

The number of tokens a model can ingest is limited: if the transcript is too long, the model fails. To avoid this, we ask the model to iteratively summarize chunks of the transcript and assemble the results into a new, smaller text. The whole process is repeated as necessary.

function summarize-txt() {
  # An ex script that joins all lines into one. The heredoc delimiter is
  # quoted so bash does not expand the "$" in the vim range "1,$join"
  cat <<'EOF' >process2.vim
silent! 1,$join
w
quit!
EOF
  cp "$1" summary.txt
  while true; do
    size=$(wc -c <summary.txt)
    # Stop once the summary is small enough to read at a glance
    if [ "$size" -lt 1000 ]; then
      break
    fi
    # Split the current text into 10 kB chunks named xaa, xab, ...
    split -b 10000 summary.txt
    parts=(x*)
    rm summary.txt
    pwd # print the temp directory so intermediate files can be inspected later
    echo "# Summarizing ${size} bytes, split in ${#parts[@]} parts" | tee -a process.txt
    for part in "${parts[@]}"; do
      # Join the chunk into a single line before sending it to the model
      ex -s -c "source process2.vim" "$part"
      echo "## Summarizing ${part}" | tee -a process.txt
      summarize-txt-once-api "$part" | tee -a summary.txt | tee -a process.txt
      rm "$part"
    done
  done
}
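The loop leans on split(1) naming its output xaa, xab, …, which is why parts=(x*) picks the chunks back up. A quick illustration of the mechanics, independent of the model, run in an empty scratch directory:

```shell
cd "$(mktemp -d)" || exit

# Make a 25 kB file of dummy transcript text
yes "transcript line" | head -c 25000 >summary.txt

# Split into 10 kB chunks: xaa and xab hold 10000 bytes, xac the remainder
split -b 10000 summary.txt
parts=(x*)
echo "${#parts[@]} parts: ${parts[*]}"
# prints: 3 parts: xaa xab xac

# Concatenating the chunks in glob order restores the original file
cat x* >rebuilt.txt
cmp summary.txt rebuilt.txt && echo "identical"
```

Because -b splits on a byte count, a chunk boundary can fall mid-sentence; for a rough summary that is acceptable, and it keeps the script simple.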

Summarize a text via the llamafile API

To summarize a text via the API, we use a combination of jq and curl.

function summarize-txt-once-api() {
  # Request template in the OpenAI chat completion format; the llamafile
  # server runs a single model, so the model name is only a placeholder
  echo '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a summarization assistant, skilled in summarizing any complex text that the user sends with precise and terse output."
    },
    {
      "role": "user",
      "content": ""
    }
  ]
}' >content.json

  # Fill the user message with the raw text; jq takes care of JSON escaping
  jq --rawfile a "$1" '.messages[1].content = $a' content.json >content_updated.json
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @content_updated.json | jq -r ".choices[0].message.content"
}
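The --rawfile flag is what makes this robust: jq embeds the file as a properly escaped JSON string, whereas naive string interpolation into the template would break on the first quote or newline in the transcript. A minimal check:

```shell
# A transcript containing characters that would break hand-built JSON
printf 'He said "stop",\nthen left.\n' >sample.txt

# jq escapes the quotes and the newline when building the object
jq -n --rawfile a sample.txt '{"content": $a}' >payload.json

# Round-tripping through jq recovers the exact original text
jq -r '.content' payload.json
```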

Download and start a llamafile model

Head over to justine.lol’s huggingface profile and download a llamafile. As of 2024-04, Mixtral yields very good results on a MacBook M2 laptop.

There are various ways to invoke the model as a bash command, starting the server before summarizing and stopping it when done. The simplest is to download the model and launch it by hand:

wget https://huggingface.co/jartine/Mixtral-8x7B-v0.1.llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile
chmod +x mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile
./mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile --nobrowser
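One way to script the start/stop cycle is a pair of bash functions; this is a sketch that assumes the binary name above and the server's default port 8080:

```shell
# Start the llamafile server in the background and wait until it answers
start_model() {
  ./mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile --nobrowser &
  MODEL_PID=$!
  # Poll until the HTTP server responds (assumes the default port 8080)
  until curl -s -o /dev/null http://localhost:8080/; do
    sleep 1
  done
}

# Stop the background server once the summaries are done
stop_model() {
  kill "$MODEL_PID" 2>/dev/null || true
  wait "$MODEL_PID" 2>/dev/null || true
}
```

With these in place, a session becomes start_model, then one or more calls to download_video_transcript, then stop_model.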

Conclusion

By the end of this post, you should be able to run:

download_video_transcript "ytsearch:the PARA method for organizing one self"

and get a small summary.txt file that contains the gist of the video.

If the video was very rich, the size limit might have forced the model to omit important points. In any case, you can always inspect the file process.txt, which contains the earlier, longer versions of the summary.