Build a Secure AI API Gateway with Django for WordPress (JWT, Rate Limits, Cost Tracking)

Overview
This tutorial walks through implementing a secure AI API gateway in Django that your WordPress sites can call. It handles JWT auth, per-site rate limits with Redis, provider failover, streaming, and cost tracking. You’ll get a minimal WordPress client to invoke the gateway without exposing raw provider keys.

Architecture
– WordPress site(s) -> your Django Gateway (JWT) -> LLM provider(s)
– Redis for rate limiting and idempotency
– PostgreSQL for tenants, usage, and audit logs
– Optional signed webhooks from Gateway back to WordPress for async tasks

Prerequisites
– Python 3.11+, Django 5.x, djangorestframework
– Redis, Postgres
– Provider keys (e.g., OpenAI/Azure OpenAI/Anthropic)
– A domain with HTTPS (e.g., api.example.com)
– WordPress with REST API enabled

Django project setup
– django-admin startproject ai_gateway
– pip install djangorestframework PyJWT redis httpx pydantic python-dotenv
– Add rest_framework to INSTALLED_APPS

Models (ai/models.py)
from django.db import models

class Tenant(models.Model):
name = models.CharField(max_length=120)
slug = models.SlugField(unique=True)
jwt_secret = models.CharField(max_length=128) # per-tenant JWT secret
rate_limit_rpm = models.IntegerField(default=60)
rate_limit_rpd = models.IntegerField(default=5000)
active = models.BooleanField(default=True)

class UsageEvent(models.Model):
tenant = models.ForeignKey(Tenant, on_delete=models.CASCADE)
provider = models.CharField(max_length=32)
model = models.CharField(max_length=64)
input_tokens = models.IntegerField(default=0)
output_tokens = models.IntegerField(default=0)
cost_usd = models.DecimalField(max_digits=10, decimal_places=6, default=0)
request_id = models.CharField(max_length=64, db_index=True)
created_at = models.DateTimeField(auto_now_add=True)
status = models.CharField(max_length=16, default=’ok’) # ok|error|timeout

Settings (ai_gateway/settings.py)
– Configure DATABASES and CACHES (Redis)
– Add env flags
OPENAI_API_KEY=…
ANTHROPIC_API_KEY=…
DEFAULT_PROVIDER=openai
STREAM_DEFAULT=true

JWT auth (ai/auth.py)
import time, jwt
from django.http import JsonResponse
from .models import Tenant

def authenticate(request):
auth = request.headers.get(‘Authorization’, ”)
if not auth.startswith(‘Bearer ‘):
return None, JsonResponse({‘error’:’missing_bearer’}, status=401)
token = auth.split(‘ ‘)[1]
try:
# Decode without secret to get tenant slug
unverified = jwt.decode(token, options={“verify_signature”: False}, algorithms=[‘HS256’])
slug = unverified.get(‘sub’)
t = Tenant.objects.get(slug=slug, active=True)
payload = jwt.decode(token, t.jwt_secret, algorithms=[‘HS256’])
if payload.get(‘exp’, 0) tenant.rate_limit_rpm: return False, ‘rate_limited_minute’
if d_count > tenant.rate_limit_rpd: return False, ‘rate_limited_day’
return True, None

Provider proxy (ai/providers.py)
import httpx, os

class ProviderError(Exception): pass

async def call_openai(payload, stream=False):
headers = {“Authorization”: f”Bearer {os.environ[‘OPENAI_API_KEY’]}”}
url = “https://api.openai.com/v1/chat/completions”
async with httpx.AsyncClient(timeout=30) as client:
resp = await client.post(url, headers=headers, json=payload)
if resp.status_code >= 400:
raise ProviderError(resp.text)
return resp

async def call_anthropic(payload, stream=False):
headers = {“x-api-key”: os.environ[‘ANTHROPIC_API_KEY’], “anthropic-version”:”2023-06-01″}
url = “https://api.anthropic.com/v1/messages”
async with httpx.AsyncClient(timeout=30) as client:
resp = await client.post(url, headers=headers, json=payload)
if resp.status_code >= 400:
raise ProviderError(resp.text)
return resp

async def provider_call(provider, payload, stream=False):
if provider == ‘openai’: return await call_openai(payload, stream)
if provider == ‘anthropic’: return await call_anthropic(payload, stream)
raise ProviderError(‘unsupported_provider’)

Cost estimation (ai/costs.py)
# Simple example; update with current pricing
PRICES = {
(‘openai’,’gpt-4o-mini’): (0.00015, 0.0006), # input, output per 1k tokens
(‘anthropic’,’claude-3-haiku’): (0.00025, 0.00125),
}

def estimate_cost(provider, model, in_toks, out_toks):
key = (provider, model)
if key not in PRICES: return 0.0
inp, outp = PRICES[key]
return round((in_toks/1000.0)*inp + (out_toks/1000.0)*outp, 6)

Views (ai/views.py)
import json, uuid, asyncio
from django.views import View
from django.http import JsonResponse, StreamingHttpResponse
from .auth import authenticate
from .limits import check_limits
from .models import UsageEvent
from .providers import provider_call, ProviderError
from .costs import estimate_cost

class ChatView(View):
async def post(self, request):
tenant, err = authenticate(request)
if err: return err
ok, reason = check_limits(tenant, ‘chat’)
if not ok: return JsonResponse({‘error’:reason}, status=429)

body = json.loads(request.body.decode(‘utf-8’))
provider = body.get(‘provider’, ‘openai’)
model = body.get(‘model’, ‘gpt-4o-mini’)
messages = body.get(‘messages’, [])
stream = bool(body.get(‘stream’, False))
request_id = body.get(‘request_id’) or str(uuid.uuid4())

payload = {‘model’: model, ‘messages’: messages, ‘stream’: stream}

try:
resp = await provider_call(provider, payload, stream=stream)
data = resp.json()
# Token counts: adapt to provider response
in_toks = data.get(‘usage’, {}).get(‘prompt_tokens’, 0)
out_toks = data.get(‘usage’, {}).get(‘completion_tokens’, 0)
cost = estimate_cost(provider, model, in_toks, out_toks)
UsageEvent.objects.create(
tenant=tenant, provider=provider, model=model,
input_tokens=in_toks, output_tokens=out_toks,
cost_usd=cost, request_id=request_id, status=’ok’
)
return JsonResponse({‘id’: request_id, ‘provider’: provider, ‘model’: model, ‘data’: data})
except ProviderError as e:
UsageEvent.objects.create(
tenant=tenant, provider=provider, model=model,
input_tokens=0, output_tokens=0, cost_usd=0, request_id=request_id, status=’error’
)
return JsonResponse({‘error’:’provider_error’,’detail’:str(e)}, status=502)
except Exception as e:
return JsonResponse({‘error’:’gateway_error’,’detail’:str(e)}, status=500)

URLs (ai/urls.py)
from django.urls import path
from .views import ChatView
urlpatterns = [ path(‘v1/chat’, ChatView.as_view(), name=’chat’) ]

Project urls (ai_gateway/urls.py)
from django.urls import path, include
urlpatterns = [ path(‘api/’, include(‘ai.urls’)) ]

Create a tenant (Django shell)
from ai.models import Tenant
import secrets
Tenant.objects.create(name=’My WP Site’, slug=’mysite’, jwt_secret=secrets.token_urlsafe(48), rate_limit_rpm=30, rate_limit_rpd=3000)

Issue a JWT (server-side script)
import jwt, time
slug=’mysite’; secret=’PASTE_TENANT_SECRET’
token = jwt.encode({‘sub’: slug, ‘exp’: int(time.time())+3600}, secret, algorithm=’HS256′)
print(token)

Test with curl
curl -X POST https://api.example.com/api/v1/chat
-H “Authorization: Bearer YOUR_JWT”
-H “Content-Type: application/json”
-d ‘{“provider”:”openai”,”model”:”gpt-4o-mini”,”messages”:[{“role”:”user”,”content”:”Hello”}]}’

WordPress minimal client (functions.php or small plugin)
function aigw_chat($prompt) {
$endpoint = ‘https://api.example.com/api/v1/chat’;
$token = getenv(‘AIGW_JWT’); // or store in wp_options securely
$body = array(
‘provider’ => ‘openai’,
‘model’ => ‘gpt-4o-mini’,
‘messages’ => array(
array(‘role’ => ‘system’, ‘content’ => ‘You are a helpful assistant.’),
array(‘role’ => ‘user’, ‘content’ => $prompt),
),
‘request_id’ => wp_generate_uuid4()
);
$resp = wp_remote_post($endpoint, array(
‘headers’ => array(‘Authorization’ => ‘Bearer ‘ . $token, ‘Content-Type’ => ‘application/json’),
‘body’ => wp_json_encode($body),
‘timeout’ => 30,
));
if (is_wp_error($resp)) return ‘Gateway error.’;
$code = wp_remote_retrieve_response_code($resp);
$json = json_decode(wp_remote_retrieve_body($resp), true);
if ($code !== 200) return ‘Error: ‘ . sanitize_text_field($json[‘error’] ?? ‘unknown’);
// Extract text depending on provider shape (example for OpenAI)
$msg = $json[‘data’][‘choices’][0][‘message’][‘content’] ?? ”;
return wp_kses_post($msg);
}

Add shortcode
add_shortcode(‘aigw’, function($atts, $content=”){
$prompt = $content ?: ($atts[‘q’] ?? ‘Say hi.’);
return aigw_chat($prompt);
});

Usage in WordPress
[aigw]Write a 1-sentence welcome message for new subscribers.[/aigw]

Optional: signed webhooks to WordPress
– For long tasks, post results back to WP with an HMAC header.
– WordPress verifies signature before saving.

Django webhook signer (utils)
import hmac, hashlib, base64, os
WEBHOOK_SECRET = os.environ.get(‘WEBHOOK_SECRET’,”)

def sign_body(body_bytes):
sig = hmac.new(WEBHOOK_SECRET.encode(), body_bytes, hashlib.sha256).digest()
return ‘sha256=’ + base64.b64encode(sig).decode()

WP verify webhook (endpoint handler)
$raw = file_get_contents(‘php://input’);
$sig = $_SERVER[‘HTTP_X_SIG’] ?? ”;
$calc = ‘sha256=’ . base64_encode(hash_hmac(‘sha256’, $raw, getenv(‘AIGW_WEBHOOK_SECRET’), true));
if (!hash_equals($calc, $sig)) { status_header(401); exit; }

Operational guidance
– Run Django with ASGI (uvicorn/daphne) behind Nginx. Enable HTTPS.
– Store provider keys and JWT secrets in env vars or a secrets manager.
– Use Redis for limits and request dedupe. Add a 60s idempotency key on request_id.
– Log all 4xx/5xx with request_id. Ship logs to your SIEM.
– Monitor cost by aggregating UsageEvent daily. Alert on anomalies.
– Backoff and failover: if provider_error, retry with alternate provider/model when safe.
– Implement per-tenant model allowlist if needed.

Performance tips
– Reuse HTTP clients when streaming; consider httpx.AsyncClient lifespan.
– Compress responses via Nginx gzip. Set reasonable timeouts.
– Cache static system prompts by hash to reduce payload size.
– For high volume, move UsageEvent writes to a queue (Celery) with buffered inserts.

Security checklist
– Per-tenant JWT secrets, short token TTLs.
– CORS locked to your WP origins.
– Validate model names against a whitelist.
– Strip PII if relaying user content to third parties where required.
– Rotate secrets regularly and audit admin access.

Next steps
– Add embeddings and RAG endpoints.
– Implement function/tool calling with a registry and execution sandbox.
– Expose batch endpoints with async job status and webhooks.

AI Guy in LA