Cloudflare Workers AI β Llama 3.3 70B Now Generally Available
Cloudflare has promoted Metaβs Llama 3.3 70B Instruct model to GA on Workers AI. Customers can hit it via the OpenAI-compatible /v1/chat/completions endpoint, with no rate-limit changes announced.
Why it matters
- 70B-class inference at the edge, billed per token
- Drop-in OpenAI compatibility simplifies code migration
- Pairs well with Vectorize for retrieval-augmented flows
Caveats
- Cold start on first request to a region can spike to ~3s
- 8K context window β still small for long-doc pipelines