Does anyone understand this claim from the press release? > M4 Max supports up t...

kristianp · on Oct 30, 2024

q4 or q5 quantization.

Edit: Actually, you'd want q3 to fit a 200B model into 128GB of RAM. For example, this one is about 140GB: https://huggingface.co/lmstudio-community/DeepSeek-V2.5-GGUF...
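
For anyone who wants to check the arithmetic, here's a rough back-of-envelope sketch. It assumes the weights take exactly parameters × bits per weight, which is a simplification: real quantized formats also store scaling metadata, so the effective bits per weight are somewhat higher than the nominal number, and the KV cache and the OS need memory too. The function name `weights_gb` is just made up for illustration.

```python
def weights_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes.

    Assumption: every parameter costs exactly `bits_per_weight` bits,
    with no per-block metadata, KV cache, or runtime overhead.
    """
    total_bits = params_billions * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


if __name__ == "__main__":
    # Nominal footprint of a 200B-parameter model at several precisions.
    for bits in (16, 8, 5, 4, 3):
        print(f"{bits:>2} bits/weight: {weights_gb(200, bits):6.1f} GB")
```

On these assumptions, 16-bit weights come to about 400 GB, q4 to about 100 GB, and q3 to about 75 GB. A nominal q5 lands at about 125 GB, which leaves essentially nothing of the 128GB for anything else.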

cjbprime · on Oct 31, 2024

Wouldn't it be incredibly misleading to say you can interact with an LLM, when they really mean that you can lossy-compress it to like 25% of its size, where it becomes way less useful, and then interact with that?

(Isn't that kind of like saying you can do real-time 4k encoding when you actually mean it can do real-time 720p encoding and then interpolate the missing pixels?)

kristianp · on Nov 1, 2024

Yes, the size is much reduced, and you do have reduced quality as a result, but it isn't as bad as you're implying. Just a few days ago Meta released q4 versions of their Llama models. It's an active research topic.