vLLM supports generative and pooling models across various tasks. If a model supports more than one task, you can set the task via the `--task` argument. For each task, we list the model architectures that have been implemented in vLLM.
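The task can also be set when loading a model offline through the Python API. A minimal sketch, assuming a recent vLLM where the `LLM` constructor accepts a `task` argument and pooling results expose an `embedding` field; the model name and the `"embed"` task value are illustrative, and the exact set of valid task names varies across vLLM versions:

```python
from vllm import LLM

# Illustrative sketch: the model name and the "embed" task value are
# assumptions; check your vLLM version for the supported task names.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

# Pooling tasks produce embeddings rather than generated text.
outputs = llm.encode(["Hello, my name is"])
for output in outputs:
    # One embedding vector per input prompt.
    print(len(output.outputs.embedding))
```

When no task is given, vLLM infers one automatically from the model architecture; setting it explicitly only matters for models that implement more than one task.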
The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, locally and in the cloud. Since its inception, the project has improved significantly thanks to many contributions, and it serves as the main playground for developing new features for the ggml library.