# Benchmarks
Benchmarks for LiteLLM Gateway (Proxy Server) tested against a fake OpenAI endpoint.
Note: we're currently migrating to aiohttp, which has 10x higher throughput. We recommend using the `aiohttp_openai/` provider for load testing.

Use this config for testing:

```yaml
model_list:
  - model_name: "fake-openai-endpoint"
    litellm_params:
      model: aiohttp_openai/any
      api_base: https://your-fake-openai-endpoint.com/chat/completions
      api_key: "test"
```
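
Once the proxy is running with this config, you can sanity-check the route before load testing. The snippet below is a minimal sketch, assuming the proxy is listening locally on the default port 4000; the key `sk-1234` is a placeholder, not part of this doc's setup.

```python
# Minimal sanity check against the proxy before load testing.
# Assumptions: proxy running locally on port 4000 (LiteLLM's default);
# "sk-1234" is a placeholder - use whatever proxy key you configured, if any.
from openai import OpenAI

client = OpenAI(
    api_key="sk-1234",               # placeholder proxy key
    base_url="http://0.0.0.0:4000",  # LiteLLM Proxy address
)

response = client.chat.completions.create(
    model="fake-openai-endpoint",    # matches model_name in the config above
    messages=[{"role": "user", "content": "ping"}],
)
print(response.choices[0].message.content)
```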
## 1 Instance LiteLLM Proxy
In these tests, the median latency of calling the fake-openai-endpoint directly is 60ms.
| Metric | LiteLLM Proxy (1 Instance) |
|---|---|
| RPS | 475 |
| Median Latency (ms) | 100 |
| Latency overhead added by LiteLLM Proxy (ms) | 40 |
### Key Findings
- Single instance: 475 RPS @ 100ms latency
- 2 LiteLLM instances: 950 RPS @ 100ms latency
- 4 LiteLLM instances: 1900 RPS @ 100ms latency
## 2 Instances
Adding a second instance doubles the RPS while keeping the median latency in the 100ms-110ms range.
| Metric | LiteLLM Proxy (2 Instances) |
|---|---|
| RPS | 950 |
| Median Latency (ms) | 100 |
## Machine Spec used for testing
Each machine running LiteLLM had the following specs:
- 2 CPU
- 4GB RAM
## Logging Callbacks
### GCS Bucket Logging
Using GCS Bucket logging has no impact on latency or RPS compared to the basic LiteLLM Proxy.
| Metric | Basic LiteLLM Proxy | LiteLLM Proxy with GCS Bucket Logging |
|---|---|---|
| RPS | 1133.2 | 1137.3 |
| Median Latency (ms) | 140 | 138 |
### LangSmith Logging
Using LangSmith logging has no impact on latency or RPS compared to the basic LiteLLM Proxy.
| Metric | Basic LiteLLM Proxy | LiteLLM Proxy with LangSmith |
|---|---|---|
| RPS | 1133.2 | 1135 |
| Median Latency (ms) | 140 | 132 |
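
For context on what this callback does, here is a minimal sketch of enabling LangSmith logging with the LiteLLM Python SDK; on the proxy, the callback is enabled under `litellm_settings` in the config file instead. The model name and key below are placeholders, not part of the benchmark setup.

```python
# A minimal sketch (not the benchmark setup) of enabling the LangSmith callback
# with the LiteLLM Python SDK. On the proxy, the same callback is turned on in
# the config file under `litellm_settings` instead.
import os
import litellm

# Placeholder credential; set this to your real LangSmith value.
os.environ["LANGSMITH_API_KEY"] = "<your-langsmith-api-key>"

litellm.success_callback = ["langsmith"]  # log successful requests to LangSmith

response = litellm.completion(
    model="gpt-4o-mini",  # placeholder model, not part of the benchmark config
    messages=[{"role": "user", "content": "ping"}],
)
```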
## Locust Settings
- 2500 users
- 100 user ramp up
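
For reference, a locustfile matching these settings might look like the sketch below; the proxy address, API key, and pacing are assumptions rather than the exact benchmark harness. In Locust, the 100 user ramp up corresponds to the spawn rate.

```python
# locustfile.py - a minimal sketch of the load test described above.
# Assumptions (not from this doc): the proxy listens on http://0.0.0.0:4000
# and "sk-1234" stands in for whatever proxy key you have configured.
from locust import HttpUser, task, between


class LiteLLMProxyUser(HttpUser):
    # Small per-user wait; tune to match your target request rate.
    wait_time = between(0.5, 1)

    @task
    def chat_completion(self):
        self.client.post(
            "/chat/completions",
            headers={
                "Authorization": "Bearer sk-1234",  # placeholder proxy key
                "Content-Type": "application/json",
            },
            json={
                "model": "fake-openai-endpoint",  # model_name from the config above
                "messages": [{"role": "user", "content": "ping"}],
            },
        )
```

Run it with something like `locust -f locustfile.py --host http://0.0.0.0:4000 --users 2500 --spawn-rate 100`.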