Cost & Performance Optimization

Tips

  1. Use streaming for long responses
  2. Implement caching where appropriate
  3. Monitor token usage
  4. Set clear maximum token limits

Example Implementation

Here's a basic example using Python requests and caching:

import requests
from cachetools import TTLCache

# Create a cache holding up to 100 items, each expiring after 1 hour
cache = TTLCache(maxsize=100, ttl=3600)

# Headers configuration
headers = {
    "Content-Type": "application/json",
    "X-API-KEY": "your_api_key"
}

def get_response(prompt):
    # Check cache first
    if prompt in cache:
        return cache[prompt]

    # Make API request
    response = requests.post(
        "https://mintii-router-500540193826.us-central1.run.app/route/mintiiv0",
        headers=headers,
        json={
            "prompt": prompt,
            "max_tokens": 500
        }
    )
    # Raise on HTTP errors so failed responses are never cached
    response.raise_for_status()

    # Store in cache and return
    result = response.json()
    cache[prompt] = result
    return result
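The cache-first pattern above can be exercised without hitting the network. The sketch below substitutes a stub function for the API call and uses a plain dict in place of `TTLCache` for brevity (lookups work the same way); `fake_api` and `get_cached` are illustrative names, not part of the API:

```python
cache = {}
call_count = [0]  # counts how many times the "API" is actually hit

def fake_api(prompt):
    # Stand-in for the real HTTP request; increments the call counter
    call_count[0] += 1
    return {"text": "response to " + prompt}

def get_cached(prompt):
    # Same cache-first logic as get_response above
    if prompt in cache:
        return cache[prompt]
    result = fake_api(prompt)
    cache[prompt] = result
    return result

first = get_cached("hello")
second = get_cached("hello")  # served from cache; fake_api is not called again
```

After both calls, `call_count[0]` is still 1, confirming the second request never reached the backend.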

Best Practices

Token Management

  • Set appropriate max_tokens limits
  • Monitor token usage through response metrics
  • Implement rate limiting if needed
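If you need client-side rate limiting, a minimal sliding-window limiter can be built with the standard library alone. This is a sketch, not a library API; the class and method names are illustrative:

```python
import time

class RateLimiter:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls, period):
        self.max_calls = max_calls
        self.period = period
        self.calls = []  # timestamps of recent calls

    def acquire(self):
        now = time.monotonic()
        # Drop timestamps that have fallen outside the window
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            # Sleep until the oldest call leaves the window
            time.sleep(max(0.0, self.period - (now - self.calls[0])))
        self.calls.append(time.monotonic())

limiter = RateLimiter(max_calls=2, period=0.2)
start = time.monotonic()
for _ in range(3):
    limiter.acquire()  # the third call blocks until the window frees up
elapsed = time.monotonic() - start
```

Call `limiter.acquire()` before each request; the third call in a burst of three blocks for roughly the window length.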

Caching Strategy

  • Cache common queries
  • Set reasonable TTL (Time To Live)
  • Clear cache periodically
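The three points above (common-query caching, a reasonable TTL, and periodic clearing) are what `cachetools.TTLCache` gives you out of the box. To make the mechanism concrete, here is a stdlib-only sketch of a TTL cache that evicts lazily on read; the `TTLDict` name and its methods are illustrative, not an existing library:

```python
import time

class TTLDict:
    """Minimal TTL cache: entries older than `ttl` seconds expire on lookup."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.data = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self.data[key] = (value, time.monotonic())

    def get(self, key, default=None):
        item = self.data.get(key)
        if item is None:
            return default
        value, stored = item
        if time.monotonic() - stored > self.ttl:
            del self.data[key]  # expired: clear it lazily
            return default
        return value

d = TTLDict(ttl=0.2)
d.set("q", "cached answer")
fresh = d.get("q")      # within TTL: returns the value
time.sleep(0.3)
stale = d.get("q")      # past TTL: entry is evicted, returns None
```

Lazy eviction on read is the simplest form of "clear cache periodically"; a background sweep is only needed if expired entries must be reclaimed promptly.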

Response Optimization

  • Use concise prompts
  • Request only needed information
  • Handle streaming responses efficiently
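To handle streaming responses efficiently, accumulate text deltas as they arrive instead of buffering the raw stream. The sketch below parses server-sent-event style lines (`data: {...}` terminated by `data: [DONE]`); this framing and the `text` field are assumptions about the stream format, so check the router's actual streaming output. With `requests`, such lines would come from `requests.post(..., stream=True)` plus `response.iter_lines()`:

```python
import json

def parse_sse_lines(lines):
    """Join text deltas from 'data: {...}' lines into the full response.

    Assumes an SSE-style stream where each event carries a JSON payload
    with a 'text' field and the stream ends with 'data: [DONE]'.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        parts.append(json.loads(payload).get("text", ""))
    return "".join(parts)

# Simulated stream, in place of response.iter_lines(decode_unicode=True)
stream = [
    'data: {"text": "Hel"}',
    'data: {"text": "lo"}',
    'data: [DONE]',
]
full_text = parse_sse_lines(stream)
```

Processing line by line keeps memory flat regardless of response length, which matters most for the long responses streaming is meant for.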