Running small language models locally on the CM4

I had been wondering whether it would be possible to run some local language models on the uConsole, but I never thought it would work: I had tried Llama 2 7B on an Intel N95 device last semester for a school project, and it was not ideal.

Luckily, I came across this video comparing the Pi 4 with the Pi 5 running Llama and realized that the newer Llama 3.2 1B requires significantly fewer resources to run.

Then I tried it on the uConsole, and it works: about 2 to 3 tokens per second, which is usable.
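
If anyone wants to reproduce the numbers, here is a minimal sketch, assuming the model is served through Ollama (the `llama3.2:1b` tag format suggests that is what is in use) with its default local API on port 11434. The prompt is just a placeholder; Ollama's response includes `eval_count` and `eval_duration`, which give tokens per second directly:

```python
import json
import urllib.request

# Ask the local Ollama server (default port 11434) for a completion.
# Assumes the model was already pulled, e.g. with `ollama run llama3.2:1b`.
payload = json.dumps({
    "model": "llama3.2:1b",
    "prompt": "Explain what a Compute Module 4 is in one sentence.",
    "stream": False,
}).encode()

req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Ollama reports eval_count (generated tokens) and eval_duration
# (nanoseconds), so tokens per second is a one-liner.
tps = result["eval_count"] / (result["eval_duration"] / 1e9)
print(f"{result['eval_count']} tokens in "
      f"{result['eval_duration'] / 1e9:.1f}s -> {tps:.2f} tok/s")
```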

But a CM5 would be so much better, so I am looking forward to it down the road.

I have also tried Phi-3 and Llama 3.2 3B. They are usable as well, but noticeably slower, around 1 to 1.5 tokens per second.
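
To compare the models side by side, the same API call can be wrapped in a loop. The model tags below are my assumption of the usual Ollama names; adjust them to whatever you have pulled:

```python
import json
import urllib.request

# Quick side-by-side benchmark: same prompt against each model that has
# been pulled with `ollama pull <tag>`; prints tokens/second for each.
MODELS = ["llama3.2:1b", "llama3.2:3b", "phi3"]
PROMPT = "Summarize the Raspberry Pi CM4 in two sentences."

def generate(model: str, prompt: str) -> dict:
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

for model in MODELS:
    result = generate(model, PROMPT)
    tps = result["eval_count"] / (result["eval_duration"] / 1e9)
    print(f"{model}: {tps:.2f} tok/s")
```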
