This post walks through a production-ready pattern for running AI inference from WordPress without exposing API keys to the browser. We’ll build a secure REST endpoint, add caching and rate limits, handle retries and webhooks, and log everything for observability.
Use case examples:
– Generate product descriptions or summaries from authenticated admin screens
– Enrich form submissions (classify, route, extract fields)
– Power a custom block or dashboard tool that fetches AI results server-side
High-level architecture
– Frontend (block/admin page) → WP REST endpoint (server) → AI provider (OpenAI/Anthropic/etc.)
– Caching at the WordPress layer (transient or object cache)
– Rate limiting per user/site to prevent abuse
– Background jobs for long-running tasks via Action Scheduler
– Optional webhooks from AI provider back to WordPress
– Audit logs stored in a custom table with PII minimization
Prerequisites
– WordPress 6.4+
– PHP 8.1+
– Persistent object cache (Redis/Memcached) recommended
– Action Scheduler plugin or equivalent job runner
– Environment configuration for secrets (wp-config.php or environment variables)
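Keeping the key out of the database is the one non-negotiable. A minimal sketch for wp-config.php (the constant name OPENAI_API_KEY matches the lookup used later in the plugin):

```php
// wp-config.php — keep this file out of version control.
// Prefer a real environment variable; the constant is a fallback.
if (!defined('OPENAI_API_KEY')) {
    define('OPENAI_API_KEY', getenv('OPENAI_API_KEY') ?: '');
}
```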
1) Minimal plugin scaffold
Create wp-content/plugins/ai-secure-proxy/ai-secure-proxy.php
&lt;?php
/**
 * Plugin Name: AI Secure Proxy
 * Description: Server-side proxy for AI inference with caching, rate limiting, and audit logs.
 */

if (!defined('ABSPATH')) exit;

class AI_Secure_Proxy {
    const NAMESPACE  = 'ai-secure-proxy/v1';
    const CAPABILITY = 'edit_posts'; // adjust to your use case
    const OPT_PREFIX = 'aisp_';
    const RATE_LIMIT = 60;   // requests per user per hour
    const CACHE_TTL  = 3600; // seconds

    public function __construct() {
        add_action('rest_api_init', [$this, 'register_routes']);
        register_activation_hook(__FILE__, [$this, 'install']);
    }

    public function install() {
        global $wpdb;
        $table = $wpdb->prefix . 'aisp_logs';
        $charset = $wpdb->get_charset_collate();
        $sql = "CREATE TABLE IF NOT EXISTS $table (
            id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
            user_id BIGINT UNSIGNED NULL,
            route VARCHAR(191) NOT NULL,
            req_hash CHAR(64) NOT NULL,
            tokens_in INT UNSIGNED DEFAULT 0,
            tokens_out INT UNSIGNED DEFAULT 0,
            duration_ms INT UNSIGNED DEFAULT 0,
            status_code INT DEFAULT 0,
            error TEXT NULL,
            created_at DATETIME DEFAULT CURRENT_TIMESTAMP
        ) $charset;";
        require_once ABSPATH . 'wp-admin/includes/upgrade.php';
        dbDelta($sql);
    }
    public function register_routes() {
        register_rest_route(self::NAMESPACE, '/infer', [
            'methods' => WP_REST_Server::CREATABLE,
            'permission_callback' => [$this, 'can_access'],
            'callback' => [$this, 'handle_infer'],
            'args' => [
                'prompt' => ['required' => true, 'type' => 'string', 'minLength' => 1, 'maxLength' => 8000],
                'model' => ['required' => false, 'type' => 'string', 'default' => 'gpt-4o-mini'],
                'cache' => ['required' => false, 'type' => 'boolean', 'default' => true],
            ],
        ]);

        register_rest_route(self::NAMESPACE, '/webhook', [
            'methods' => WP_REST_Server::CREATABLE,
            'permission_callback' => '__return_true',
            'callback' => [$this, 'handle_webhook'],
        ]);
    }
    public function can_access(WP_REST_Request $req) {
        // Require a logged-in user with the capability.
        // For public use, implement a signed HMAC header instead.
        return current_user_can(self::CAPABILITY);
    }

    private function rate_key($user_id) {
        return self::OPT_PREFIX . 'rate_' . $user_id . '_' . gmdate('YmdH');
    }

    private function is_rate_limited($user_id) {
        $key = $this->rate_key($user_id);
        $count = (int) wp_cache_get($key, '', false, $found);
        if (!$found) $count = (int) get_option($key, 0);
        return $count >= self::RATE_LIMIT;
    }

    private function bump_rate($user_id) {
        $key = $this->rate_key($user_id);
        $count = (int) get_option($key, 0) + 1;
        update_option($key, $count, false);
        wp_cache_set($key, $count, '', 3600);
    }
    public function handle_infer(WP_REST_Request $req) {
        $user_id = get_current_user_id();
        if ($this->is_rate_limited($user_id)) {
            return new WP_Error('rate_limited', 'Rate limit exceeded', ['status' => 429]);
        }

        $prompt = trim($req->get_param('prompt'));
        $model = sanitize_text_field($req->get_param('model'));
        $use_cache = (bool) $req->get_param('cache');

        // Hash to cache against normalized inputs
        $req_hash = hash('sha256', json_encode(['m' => $model, 'p' => $prompt]));
        $cache_key = self::OPT_PREFIX . 'c_' . $req_hash;

        if ($use_cache) {
            $cached = wp_cache_get($cache_key);
            if ($cached !== false) {
                $this->log($user_id, '/infer', $req_hash, 0, 0, 1, 200, null);
                return new WP_REST_Response(['cached' => true, 'result' => $cached], 200);
            }
        }

        $start = microtime(true);
        $resp = $this->call_ai_provider($model, $prompt);
        $duration_ms = (int) round((microtime(true) - $start) * 1000);

        if (is_wp_error($resp)) {
            $this->log($user_id, '/infer', $req_hash, 0, 0, $duration_ms, 500, $resp->get_error_message());
            return $resp;
        }

        $result = [
            'text' => $resp['text'] ?? '',
            'tokens_in' => $resp['tokens_in'] ?? 0,
            'tokens_out' => $resp['tokens_out'] ?? 0,
            'model' => $model,
        ];

        if ($use_cache && !empty($result['text'])) {
            wp_cache_set($cache_key, $result, '', self::CACHE_TTL);
        }

        $this->bump_rate($user_id);
        $this->log($user_id, '/infer', $req_hash, (int) $result['tokens_in'], (int) $result['tokens_out'], $duration_ms, 200, null);

        return new WP_REST_Response(['cached' => false, 'result' => $result], 200);
    }
    private function call_ai_provider($model, $prompt) {
        // Secrets via environment or wp-config
        $api_key = getenv('OPENAI_API_KEY') ?: (defined('OPENAI_API_KEY') ? OPENAI_API_KEY : '');
        if (!$api_key) return new WP_Error('config_error', 'Missing AI API key', ['status' => 500]);

        $body = [
            'model' => $model,
            'input' => $prompt,
        ];

        $attempts = 0;
        $max_attempts = 3;
        $last_err = null;

        while ($attempts < $max_attempts) {
            $attempts++;
            // Endpoint shown for OpenAI's Responses API; swap in your provider's URL.
            $response = wp_remote_post('https://api.openai.com/v1/responses', [
                'timeout' => 15,
                'headers' => [
                    'Authorization' => 'Bearer ' . $api_key,
                    'Content-Type' => 'application/json',
                ],
                'body' => wp_json_encode($body),
            ]);

            if (is_wp_error($response)) {
                $last_err = $response;
            } else {
                $code = wp_remote_retrieve_response_code($response);
                $data = json_decode(wp_remote_retrieve_body($response), true);

                // Simple provider-agnostic parse — adjust the field paths
                // to your provider's response schema.
                if ($code >= 200 && $code < 300) {
                    $text = $data['output'][0]['content'][0]['text']
                        ?? ($data['choices'][0]['message']['content'] ?? '');
                    $tokens_in  = (int) ($data['usage']['input_tokens']  ?? ($data['usage']['prompt_tokens'] ?? 0));
                    $tokens_out = (int) ($data['usage']['output_tokens'] ?? ($data['usage']['completion_tokens'] ?? 0));
                    return ['text' => $text, 'tokens_in' => $tokens_in, 'tokens_out' => $tokens_out];
                }

                // Retry on 429/5xx with exponential backoff
                if (in_array($code, [429, 500, 502, 503, 504], true)) {
                    $last_err = new WP_Error('ai_retry', 'Transient AI error: ' . $code);
                } else {
                    return new WP_Error('ai_error', 'AI provider error: ' . $code, ['status' => $code, 'data' => $data]);
                }
            }

            // Backoff
            usleep((int) (pow(2, $attempts - 1) * 200000)); // 200ms, 400ms, 800ms
        }

        return $last_err ?: new WP_Error('ai_error', 'Unknown AI error', ['status' => 500]);
    }
    public function handle_webhook(WP_REST_Request $req) {
        // Optional: verify HMAC signature from provider
        $shared = getenv('AI_WEBHOOK_SECRET') ?: '';
        $sig = $req->get_header('x-aisig');
        if ($shared && $sig) {
            $calc = hash_hmac('sha256', $req->get_body(), $shared);
            if (!hash_equals($calc, $sig)) {
                return new WP_Error('forbidden', 'Invalid signature', ['status' => 403]);
            }
        }
        // Process event (store job result, update post meta, etc.)
        do_action('aisp_webhook_received', $req->get_json_params());
        return new WP_REST_Response(['ok' => true], 200);
    }
    private function log($user_id, $route, $req_hash, $tin, $tout, $dur_ms, $code, $error) {
        global $wpdb;
        $table = $wpdb->prefix . 'aisp_logs';
        $wpdb->insert($table, [
            'user_id' => $user_id ?: null,
            'route' => $route,
            'req_hash' => $req_hash,
            'tokens_in' => $tin,
            'tokens_out' => $tout,
            'duration_ms' => $dur_ms,
            'status_code' => $code,
            'error' => $error,
        ], ['%d', '%s', '%s', '%d', '%d', '%d', '%d', '%s']);
    }
}

new AI_Secure_Proxy();
2) Secure frontend usage
– Admin page or block should never expose provider keys.
– Call the endpoint with wp.apiFetch or fetch, including nonce.
Example:
const res = await wp.apiFetch({
  path: '/ai-secure-proxy/v1/infer',
  method: 'POST',
  data: { prompt, model: 'gpt-4o-mini', cache: true }
});
3) Authentication options
– Logged-in only (capability-based): safest for back-office tools.
– Public usage: require HMAC-signed requests.
– Client sends X-Sign: HMAC_SHA256(body, PUBLIC_CLIENT_SECRET).
– Server validates before processing.
– Consider IP allowlists for server-to-server.
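For the public-usage option, the signature check can live in a permission callback so unsigned requests never reach the handler. A sketch, assuming the client sends an X-Sign header containing a hex HMAC-SHA256 of the raw body (AISP_CLIENT_SECRET is an illustrative constant name):

```php
// Hypothetical permission callback for HMAC-signed public requests.
public function can_access_signed(WP_REST_Request $req) {
    $secret = defined('AISP_CLIENT_SECRET') ? AISP_CLIENT_SECRET : '';
    $sig = (string) $req->get_header('x-sign');
    if (!$secret || !$sig) {
        return new WP_Error('forbidden', 'Missing signature', ['status' => 403]);
    }
    $calc = hash_hmac('sha256', $req->get_body(), $secret);
    // hash_equals() compares in constant time, avoiding timing attacks.
    return hash_equals($calc, $sig)
        ? true
        : new WP_Error('forbidden', 'Invalid signature', ['status' => 403]);
}
```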
4) Caching strategy
– Key by normalized inputs and model.
– Use persistent object cache for speed and TTL-based eviction.
– Invalidate on model/version changes or when content sources update.
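One cheap way to handle the invalidation point: bake a version salt into the key, so bumping a single constant expires everything at once. A sketch (AISP_CACHE_VERSION is an assumed constant, not part of the plugin above):

```php
// Versioned cache key: bump AISP_CACHE_VERSION to invalidate all entries.
$version = defined('AISP_CACHE_VERSION') ? AISP_CACHE_VERSION : '1';
$req_hash = hash('sha256', wp_json_encode([
    'v' => $version,
    'm' => $model,
    'p' => trim($prompt), // normalize inputs before hashing
]));
$cache_key = 'aisp_c_' . $req_hash;
```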
5) Rate limiting
– Example above uses per-user/hour counters.
– For high-traffic public endpoints, use Redis INCR with TTL or a token bucket.
– Return 429 with Retry-After.
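The Redis variant can be a fixed-window counter in a few lines. A sketch using the phpredis extension (a reachable Redis connection is assumed; limit and window are illustrative):

```php
// Fixed-window rate limit: one counter per user per minute.
function aisp_is_rate_limited(Redis $redis, int $user_id, int $limit = 60): bool {
    $key = 'aisp:rate:' . $user_id . ':' . gmdate('YmdHi');
    $count = $redis->incr($key);
    if ($count === 1) {
        $redis->expire($key, 60); // set the window TTL once, on first hit
    }
    return $count > $limit;
}
```

On a limit hit, return the 429 with a Retry-After header so well-behaved clients back off until the window resets.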
6) Background jobs
– For long prompts or batch operations, offload to Action Scheduler:
– POST creates a job and returns job_id.
– Worker picks up job, calls AI, stores result in post meta or custom table.
– Optional webhook updates job status when provider supports async.
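The job flow above can be sketched with Action Scheduler's `as_enqueue_async_action()`; the hook name, option keys, and polling contract are illustrative, not part of the plugin code in section 1:

```php
// Create a job and hand it to Action Scheduler for background processing.
function aisp_create_job($user_id, $prompt, $model) {
    $job_id = wp_generate_uuid4();
    update_option('aisp_job_' . $job_id, ['status' => 'queued'], false);
    as_enqueue_async_action('aisp_run_job', [$job_id, $user_id, $prompt, $model], 'aisp');
    return $job_id; // return to the client for status polling
}

// Worker: runs off-request, so long AI calls don't block the HTTP response.
add_action('aisp_run_job', function ($job_id, $user_id, $prompt, $model) {
    $result = /* call_ai_provider($model, $prompt) from section 1 */ null;
    update_option('aisp_job_' . $job_id, ['status' => 'done', 'result' => $result], false);
}, 10, 4);
```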
7) Webhooks
– Verify HMAC signature.
– Idempotency: store last seen event IDs to avoid duplicate processing.
– Enqueue work; do not block the webhook handler.
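The idempotency and enqueue points together might look like this inside the webhook handler; the `event_id` field name is an assumption about the provider's payload:

```php
// Deduplicate by event ID, then enqueue instead of processing inline.
$event = $req->get_json_params();
$event_id = sanitize_text_field($event['event_id'] ?? '');
if ($event_id) {
    $seen_key = 'aisp_evt_' . hash('sha256', $event_id);
    if (get_transient($seen_key)) {
        return new WP_REST_Response(['ok' => true, 'duplicate' => true], 200);
    }
    set_transient($seen_key, 1, DAY_IN_SECONDS); // remember for 24h
}
as_enqueue_async_action('aisp_process_event', [$event], 'aisp');
return new WP_REST_Response(['ok' => true], 200);
```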
8) Observability
– Log request hash, tokens, duration, status code.
– Build admin UI with filters by user/date/status.
– Emit structured JSON logs or use error_log for quick triage in staging.
– Add metrics: P95 latency, cache hit ratio, 4xx/5xx rates.
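Basic error-rate and latency figures can come straight from the logs table created in section 1, for example:

```php
// Last 24h of traffic: volume, 5xx count, and average latency.
global $wpdb;
$table = $wpdb->prefix . 'aisp_logs';
$row = $wpdb->get_row(
    "SELECT COUNT(*) AS total,
            SUM(status_code >= 500) AS errors,
            AVG(duration_ms) AS avg_ms
     FROM {$table}
     WHERE created_at > UTC_TIMESTAMP() - INTERVAL 1 DAY"
);
```

MySQL has no built-in percentile function, so P95 latency is usually computed in PHP over a sorted sample or in an external metrics store.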
9) Security checklist
– Store API keys in environment or wp-config, never the DB.
– Enforce capability or signed requests.
– Validate and length-limit inputs.
– Strip PII where possible before sending to provider.
– Set conservative timeouts; implement retries with backoff.
– Disable indexing for admin tools; use HTTPS everywhere.
10) Performance notes
– Keep endpoint thin; avoid loading unnecessary WP subsystems.
– Enable OPcache and a persistent object cache.
– Batch multiple small prompts into one call when possible.
– Circuit breaker: if upstream is failing consistently, short-circuit for a cooldown period.
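The circuit breaker can be sketched with transients as shared state; the threshold and cooldown values are placeholders to tune:

```php
// Trip after N consecutive failures; skip upstream calls while "open".
function aisp_circuit_open(): bool {
    return (bool) get_transient('aisp_circuit_open');
}

function aisp_record_failure(int $threshold = 5, int $cooldown = 120): void {
    $fails = (int) get_transient('aisp_circuit_fails') + 1;
    set_transient('aisp_circuit_fails', $fails, 600);
    if ($fails >= $threshold) {
        set_transient('aisp_circuit_open', 1, $cooldown); // cooldown period
    }
}

function aisp_record_success(): void {
    delete_transient('aisp_circuit_fails'); // reset on any success
}
```

Call `aisp_circuit_open()` before `call_ai_provider()` and return a fast 503 while the breaker is tripped.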
11) Deployment tips
– Separate staging keys and webhooks.
– Run a load test (k6/Locust) with realistic prompts.
– Monitor Redis hit ratio and PHP-FPM slow logs.
– Backup logs table and rotate old entries.
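Rotation can be a daily cron event that prunes old rows; the 30-day retention is an example, and in a real plugin the schedule is best registered on activation:

```php
// Daily prune of log rows older than 30 days.
add_action('aisp_prune_logs', function () {
    global $wpdb;
    $table = $wpdb->prefix . 'aisp_logs';
    $wpdb->query("DELETE FROM {$table} WHERE created_at < UTC_TIMESTAMP() - INTERVAL 30 DAY");
});

if (!wp_next_scheduled('aisp_prune_logs')) {
    wp_schedule_event(time(), 'daily', 'aisp_prune_logs');
}
```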
Extending the pattern
– Add SSE endpoint for streaming tokens to an admin tool.
– Implement content moderation via a pre-check model before saving results.
– Build a “dry run” mode for previews without committing changes.
This approach keeps AI logic server-side, with guardrails for cost, performance, and security—ready for production in WordPress environments.