Tag: ml-engineering
All the articles with the tag "ml-engineering".
-
How We Cut ML Inference Latency by 40% on Kubernetes
The architecture behind our asynchronous model-serving platform at Instabase: async workers, RabbitMQ, multi-level caching, and sticky routing combined to cut inference latency by 40%.