Thanks to recent innovations with LLMs, and the fantastic work of justine.lol, we now have open source large language models to play with. While these models have no real reasoning abilities, their capacity to compress language makes them useful for summarizing.
In this post, we will look at a few bash functions that turn a YouTube video into a summary that can be read quickly to get the gist of what is happening.
These functions can also be used to summarize any content that can be converted to plain text.
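For instance, assuming the functions defined below are loaded and poppler's pdftotext utility is installed (it is not otherwise part of this post's toolchain), a PDF could be summarized the same way:
pdftotext paper.pdf paper.txt
summarize-txt paper.txt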
Download video transcripts
The model deals with text, so we download the video transcript using yt-dlp, an open source command line interface to YouTube and various other video providers; then we do some cleaning with ex, the line-editor mode of vim.
function download_video_transcript() {
  # Work inside a fresh temporary directory to avoid clobbering files
  cd "$(mktemp -d)" || return
  # Fetch only the auto-generated subtitles (.vtt), not the video itself
  yt-dlp --write-auto-sub --skip-download "$1"
  # ex script that strips the WebVTT header, timing tags, empty lines
  # and consecutive duplicates, then prints the cleaned buffer to stdout
  cat <<'EOF' >process.vim
silent! 1,3d
silent! g/<\/c>/d
silent! g/-->/d
silent! g/^ *$/d
silent! %!uniq
%p
quit!
EOF
  ex -s -c "source process.vim" *.vtt >clean_transcript.txt
  summarize-txt clean_transcript.txt
}
Note the option --write-auto-sub, which downloads the automatically generated subtitles. One could also use --write-sub, which downloads the author's subtitles, but some videos lack them, so I found this script more robust when always downloading the auto-generated ones. See this page for more subtitle options.
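If you want to check what a given video offers before choosing, yt-dlp can list the available subtitle tracks, and the download can be restricted to a single language so the cleaning step only sees one .vtt file (the URL below is a placeholder):
# List the manual and auto-generated subtitle tracks of a video
yt-dlp --list-subs "https://www.youtube.com/watch?v=..."
# Keep only the English auto-generated track
yt-dlp --write-auto-sub --sub-lang en --skip-download "https://www.youtube.com/watch?v=..."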
Iteratively reduce the transcript into smaller summaries
The number of tokens a model can ingest at once is limited: if the transcript is too long, the model fails. To avoid this, we ask the model to iteratively summarize chunks of the transcript and assemble the partial summaries into a shorter text, repeating the whole process as many times as necessary. The 10,000-byte chunks used below stay well within the context window (a common rule of thumb is around four characters per token, so each chunk is roughly 2,500 tokens).
function summarize-txt() {
  # ex script that joins the whole buffer into one line and writes it back,
  # so each chunk becomes a single long line of text; the heredoc is quoted
  # so that bash does not expand "$join"
  cat <<'EOF' >process2.vim
silent! 1,$join
w
quit!
EOF
  cp "$1" summary.txt
  while true; do
    size=$(wc -c summary.txt | awk '{print $1}')
    # Stop once the summary is short enough to read in one sitting
    if [ "$size" -lt 1000 ]; then
      break
    fi
    # Cut the current summary into 10,000-byte chunks named xaa, xab, ...
    split -b 10000 summary.txt
    parts=(x*)
    rm summary.txt
    pwd # show the temporary directory holding the intermediate files
    echo "# Summarizing ${size} bytes, split in ${#parts[@]} parts" | tee -a process.txt
    for part in "${parts[@]}"; do
      # Flatten the chunk to one line before sending it to the model
      ex -s -c "source process2.vim" "$part"
      echo "## Summarizing ${part}" | tee -a process.txt
      # Append each partial summary to the input of the next iteration
      summarize-txt-once-api "$part" | tee -a summary.txt | tee -a process.txt
      rm "$part"
    done
  done
}
Summarize a text via the llamafile API
To summarize a text via the API, we use a combination of jq and curl.
function summarize-txt-once-api() {
  # Request template in the OpenAI chat completions format; the model
  # name is a placeholder, the local server answers with whatever model
  # it loaded
  echo '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are a summarization assistant, skilled in summarizing any complex text that the user sends with precise and terse output."
    },
    {
      "role": "user",
      "content": ""
    }
  ]
}' >content.json
  # Inject the raw content of the file into the user message
  jq --rawfile a "$1" '.messages[1].content = ($a)' content.json >content_updated.json
  # Query the local llamafile server and extract the completion text
  curl -s http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d @content_updated.json | jq -r ".choices[0].message.content"
}
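Once the llamafile server from the next section is running, the function can be sanity-checked on its own; the sample file below is just an illustration:
echo "Bash is a Unix shell and command language, first released in 1989." >sample.txt
summarize-txt-once-api sample.txt
Download and start a llamafile model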
Head over to justine.lol's Hugging Face profile and download a llamafile. As of 2024-04, Mixtral yields very good results on a MacBook M2 laptop.
There are various ways to invoke the model as a bash command, starting and stopping it for each task. While that is convenient for one-off jobs, summarizing a long text issues many queries, so we instead start the model as a server: the model is loaded once and shared by all concurrent queries. To do so, we follow the instructions from the author's blog:
wget https://huggingface.co/jartine/Mixtral-8x7B-v0.1.llamafile/resolve/main/mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile
chmod +x mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile
./mixtral-8x7b-instruct-v0.1.Q5_K_M-server.llamafile --nobrowser
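Before launching long summarization jobs, it is worth checking that the server answers. A minimal request against the same endpoint used by summarize-txt-once-api should print a short completion:
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Say hello."}]}' \
  | jq -r ".choices[0].message.content"
Conclusion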
By the end of this post, you should be able to run:
download_video_transcript "ytsearch:the PARA method for organizing one self"
and get a small summary.txt file that contains the gist of the video.
If the video was very rich, the size threshold might have forced the model to omit important key points. In any case, you can always inspect the process.txt file, which contains the earlier, longer versions of the summary.