Cloudflare Workers AI β€” Llama 3.3 70B Now Generally Available

Cloudflare has promoted Meta’s Llama 3.3 70B Instruct model to GA on Workers AI. Customers can hit it via the OpenAI-compatible /v1/chat/completions endpoint, with no rate-limit changes announced.

Why it matters

  • 70B-class inference at the edge, billed per token
  • Drop-in OpenAI compatibility simplifies code migration
  • Pairs well with Vectorize for retrieval-augmented flows

Caveats

  • Cold start on first request to a region can spike to ~3s
  • 8K context window β€” still small for long-doc pipelines