Cost & Performance Optimization
Tips
- Use streaming for long responses
- Implement caching where appropriate
- Monitor token usage
- Set clear maximum token limits
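Monitoring token usage can be as simple as accumulating counts from each response. The sketch below assumes the response JSON contains a `usage` object with `prompt_tokens` and `completion_tokens` fields; those names are an assumption and may differ from what the router actually returns.

```python
# Minimal token-usage tracker. The "usage" field names are assumptions;
# adjust them to match the fields your responses actually contain.
class TokenUsageTracker:
    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0

    def record(self, response_json):
        # Accumulate counts from one response's (assumed) usage block
        usage = response_json.get("usage", {})
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens

tracker = TokenUsageTracker()
tracker.record({"usage": {"prompt_tokens": 12, "completion_tokens": 48}})
tracker.record({"usage": {"prompt_tokens": 5, "completion_tokens": 20}})
print(tracker.total_tokens)  # 85
```

Feeding every `response.json()` through a tracker like this makes it easy to spot prompts that are consuming more tokens than expected.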
Example Implementation
Here's a basic example using the Python `requests` library with an in-memory TTL cache:
```python
import requests
from cachetools import TTLCache

# Create cache with 100 items max and 1 hour TTL
cache = TTLCache(maxsize=100, ttl=3600)

# Headers configuration
headers = {
    "Content-Type": "application/json",
    "X-API-KEY": "your_api_key"
}

def get_response(prompt):
    # Check cache first
    if prompt in cache:
        return cache[prompt]

    # Make API request
    response = requests.post(
        "https://mintii-router-500540193826.us-central1.run.app/route/mintiiv0",
        headers=headers,
        json={
            "prompt": prompt,
            "max_tokens": 500
        }
    )
    response.raise_for_status()  # avoid caching error responses

    # Store in cache and return
    result = response.json()
    cache[prompt] = result
    return result
```
Best Practices
Token Management
- Set appropriate `max_tokens` limits
- Monitor token usage through response metrics
- Implement rate limiting if needed
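If you do add rate limiting, a token bucket is a common approach. The sketch below is illustrative only (not part of the router's API) and is not thread-safe; production code would need locking or an existing rate-limiting library.

```python
import time

# Simple token-bucket rate limiter: allows up to `rate` requests per
# second, with bursts of up to `capacity` requests. Illustrative sketch,
# not thread-safe.
class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate          # tokens replenished per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Replenish tokens based on elapsed time, then try to spend one
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=2)
results = [bucket.allow() for _ in range(3)]
print(results)  # burst of 2 allowed, third call denied
```

Call `bucket.allow()` before each API request and back off (or queue the request) when it returns `False`.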
Caching Strategy
- Cache common queries
- Set a reasonable TTL (time to live)
- Clear cache periodically
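To make the TTL and periodic-clearing ideas concrete, here is a minimal cache sketch using only the standard library; the `cachetools.TTLCache` used in the example above provides the same behavior with size-based eviction built in.

```python
import time

# Minimal TTL cache sketch (standard library only).
class SimpleTTLCache:
    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def purge(self):
        # Periodic cleanup: drop every expired entry at once
        now = time.monotonic()
        for key in [k for k, (_, exp) in self._store.items() if now >= exp]:
            del self._store[key]

cache = SimpleTTLCache(ttl=0.05)
cache.set("q", "cached answer")
print(cache.get("q"))   # "cached answer" (fresh hit)
time.sleep(0.06)
print(cache.get("q"))   # None (expired)
```

Calling `purge()` on a timer keeps expired entries from accumulating between lookups.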
Response Optimization
- Use concise prompts
- Request only needed information
- Handle streaming responses efficiently
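One way to handle streamed output efficiently is to process each chunk as it arrives rather than waiting for the full body. Whether the router streams responses, and in what chunk format, is an assumption here; the sketch below works on any iterable of text chunks, such as `response.iter_content(decode_unicode=True)` from `requests`.

```python
# Consume a stream of text chunks incrementally: hand each chunk to a
# callback as it arrives, then return the assembled full text.
def consume_stream(chunks, on_chunk=print):
    parts = []
    for chunk in chunks:
        if chunk:               # keep-alive chunks may be empty
            on_chunk(chunk)     # surface output to the user immediately
            parts.append(chunk)
    return "".join(parts)       # full text, assembled once at the end

collected = []
full = consume_stream(["Hello", ", ", "world"], on_chunk=collected.append)
print(full)  # Hello, world
```

Accumulating chunks in a list and joining once at the end avoids the quadratic cost of repeated string concatenation on long responses.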