Cost & Performance Optimization

Tips

  1. Use streaming for long responses
  2. Implement caching where appropriate
  3. Monitor token usage
  4. Set clear maximum token limits

Example Implementation

Here's a basic example using Python requests and caching:

import requests
from cachetools import TTLCache

# Create a cache holding up to 100 items, each expiring after 1 hour
cache = TTLCache(maxsize=100, ttl=3600)

# Headers configuration
headers = {
    "Content-Type": "application/json",
    "X-API-KEY": "your_api_key"
}

def get_response(prompt):
    # Check cache first
    if prompt in cache:
        return cache[prompt]

    # Make API request
    response = requests.post(
        "https://mintii-router-500540193826.us-central1.run.app/route/mintiiv0",
        headers=headers,
        json={
            "prompt": prompt,
            "max_tokens": 500
        }
    )
    # Raise on HTTP errors so failed responses are never cached
    response.raise_for_status()

    # Store in cache and return
    result = response.json()
    cache[prompt] = result
    return result
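The cache-first pattern above can be exercised without hitting the network. The sketch below substitutes a stub function for the API call and uses a plain dict in place of `TTLCache` for brevity (lookups work the same way); `fake_api` and `get_cached` are illustrative names, not part of the API:

```python
cache = {}
call_count = [0]  # counts how many times the "API" is actually hit

def fake_api(prompt):
    # Stand-in for the real HTTP request; increments the call counter
    call_count[0] += 1
    return {"text": "response to " + prompt}

def get_cached(prompt):
    # Same cache-first logic as get_response above
    if prompt in cache:
        return cache[prompt]
    result = fake_api(prompt)
    cache[prompt] = result
    return result

first = get_cached("hello")
second = get_cached("hello")  # served from cache; fake_api is not called again
```

After both calls, `call_count[0]` is still 1, confirming the second request never reached the backend.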

Best Practices

Token Management

  • Set appropriate max_tokens limits
  • Monitor token usage through response metrics
  • Implement rate limiting if needed
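If you need client-side rate limiting, a minimal sliding-window limiter can be built with the standard library alone. This is a sketch, not a library API; the class and method names are illustrative:

```python
import time

class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self.calls = []  # timestamps of recent calls

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps that have fallen outside the window
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window
            time.sleep(max(0.0, self.period - (now - self.calls[0])))
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=2, period=0.2)
start = time.monotonic()
for _ in range(3):
    limiter.acquire()  # the third call blocks until the window frees up
elapsed = time.monotonic() - start
```

Call `limiter.acquire()` before each request; the third call in a burst of three blocks for roughly the window length.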

Caching Strategy

  • Cache common queries
  • Set reasonable TTL (Time To Live)
  • Clear cache periodically
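The three points above (common-query caching, a reasonable TTL, and periodic clearing) are what `cachetools.TTLCache` gives you out of the box. To make the mechanism concrete, here is a stdlib-only sketch of a TTL cache that evicts lazily on read; the `TTLDict` name and its methods are illustrative, not an existing library:

```python
import time

class TTLDict:
    """Minimal TTL cache: entries older than `ttl` seconds expire on lookup."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.data = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self.data[key] = (value, time.monotonic())

    def get(self, key, default=None):
        item = self.data.get(key)
        if item is None:
            return default
        value, stored = item
        if time.monotonic() - stored > self.ttl:
            del self.data[key]  # expired: clear it lazily
            return default
        return value

d = TTLDict(ttl=0.2)
d.set("q", "cached answer")
fresh = d.get("q")      # within TTL: returns the value
time.sleep(0.3)
stale = d.get("q")      # past TTL: entry is evicted, returns None
```

Lazy eviction on read is the simplest form of "clear cache periodically"; a background sweep is only needed if expired entries must be reclaimed promptly.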

Response Optimization

  • Use concise prompts
  • Request only needed information
  • Handle streaming responses efficiently
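To handle streaming responses efficiently, accumulate text deltas as they arrive instead of buffering the raw stream. The sketch below parses server-sent-event style lines (`data: {...}` terminated by `data: [DONE]`); this framing and the `text` field are assumptions about the stream format, so check the router's actual streaming output. With `requests`, such lines would come from `requests.post(..., stream=True)` plus `response.iter_lines()`:

```python
import json

def parse_sse_lines(lines):
    """Join text deltas from 'data: {...}' lines into the full response.

    Assumes an SSE-style stream where each event carries a JSON payload
    with a 'text' field and the stream ends with 'data: [DONE]'.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        parts.append(json.loads(payload).get("text", ""))
    return "".join(parts)

# Simulated stream, in place of response.iter_lines(decode_unicode=True)
stream = [
    'data: {"text": "Hel"}',
    'data: {"text": "lo"}',
    'data: [DONE]',
]
full_text = parse_sse_lines(stream)
```

Processing line by line keeps memory flat regardless of response length, which matters most for the long responses streaming is meant for.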