So far, running LLMs has required a large amount of computing resources, mainly GPUs. Run locally on an average Mac, a simple prompt to a typical LLM takes ...
💪 FP8 compatibility! 🚀 Speeds up the whole process 🚀 Less VRAM consumption (still high; batch_size=1 max on an RTX 4090, I'm trying to fix that) 🛠️ Better benchmarks coming soon ...
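The announcement doesn't show how its FP8 mode works. A common way such a feature cuts VRAM is to store weights in `torch.float8_e4m3fn` (about half the footprint of FP16) and upcast them just-in-time for each matmul. Below is a minimal sketch under that assumption; `FP8Linear` and `convert_linears_to_fp8` are hypothetical names, not this project's API:

```python
# Minimal sketch (an assumption, not this project's actual code) of the
# common FP8 weight-storage trick: weights live in float8_e4m3fn and are
# upcast to bf16 only for the duration of each forward pass.
import torch
import torch.nn as nn

class FP8Linear(nn.Module):
    """nn.Linear whose weight is stored in FP8; compute runs in bf16."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        # One-time cast at load; float8_e4m3fn requires PyTorch >= 2.1.
        self.register_buffer(
            "weight_fp8", linear.weight.detach().to(torch.float8_e4m3fn)
        )
        self.bias = (
            None if linear.bias is None
            else nn.Parameter(linear.bias.detach().to(torch.bfloat16))
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Upcast just-in-time; the temporary bf16 copy is freed after use.
        w = self.weight_fp8.to(torch.bfloat16)
        return nn.functional.linear(x.to(torch.bfloat16), w, self.bias)

def convert_linears_to_fp8(module: nn.Module) -> nn.Module:
    """Recursively swap every nn.Linear for the FP8-storage version."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, FP8Linear(child))
        else:
            convert_linears_to_fp8(child)
    return module
```

With weights roughly halved relative to FP16, the remaining VRAM pressure comes mostly from activations, which grow with batch size; that would be consistent with the batch_size=1 ceiling on an RTX 4090 mentioned above.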