Deploying with NVIDIA Triton

The Triton Inference Server hosts a tutorial on deploying the facebook/opt-125m model with vLLM. For details, see https://github.com/triton-inference-server/tutorials/blob/main/Quick_Deploy/vLLM/README.md#deploying-a-vllm-model-in-triton
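As a rough orientation before reading the tutorial, the overall flow is: fetch the example model repository, then launch a Triton container that includes the vLLM backend. The sketch below is an assumption based on that linked README, not a copy of it; the container tag `<xx.yy>` is a deliberate placeholder for an NGC release, and the `model_repository` path is the layout the tutorial uses.

```shell
# Hedged sketch of the Triton + vLLM deployment flow from the linked
# tutorial. Verify tags and paths against the README before running.

# 1. Fetch the tutorials repo, which contains the example model repository.
git clone https://github.com/triton-inference-server/tutorials.git
cd tutorials/Quick_Deploy/vLLM

# 2. Start Triton with the vLLM backend, mounting the model repository.
#    <xx.yy> is a placeholder for an NGC release tag (see the tutorial).
docker run --gpus all --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v "$(pwd)/model_repository:/models" \
  nvcr.io/nvidia/tritonserver:<xx.yy>-vllm-python-py3 \
  tritonserver --model-repository /models
```

Once the server reports the model as READY, requests can be sent to Triton's HTTP endpoint on port 8000; the tutorial walks through a sample inference request.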