Phala Cloud Documentation — Confidential AI on TEE

Overview

The on-demand Confidential AI API is a secure, OpenAI-compatible interface for running AI models inside a GPU TEE (Trusted Execution Environment). You pay per request and manage no infrastructure. Because inference runs with hardware-level privacy protection, user data remains confidential while it is processed. Browse the available confidential AI models to pick one for your application.

For dedicated GPU resources with hourly pricing, see Dedicated Models. Both options use the same API with identical features; the only differences are billing and resource allocation.

Prerequisites

Before you begin, make sure your account balance is at least $5, which is required to create an API key. To add funds, go to Dashboard and click Deposit. Then navigate to Dashboard → Confidential AI API and click Enable, create your first API key, and click the key to copy it.

Once you have an API key, you can start making requests to the Confidential AI API.

Make Your Secure Request

Replace <API_KEY> with your actual API key. The examples below use phala/qwen3.5-27b; use List Models to choose a model for your workload.

# Install OpenAI SDK: `pip3 install openai`

from openai import OpenAI

client = OpenAI(
    api_key="<API_KEY>",
    base_url="https://api.redpill.ai/v1",
)

response = client.chat.completions.create(
    model="phala/qwen3.5-27b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "What is your model name?"},
    ],
)

print(response.choices[0].message.content)
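
Because the API is OpenAI-compatible, the same call can also be made without the SDK. The sketch below builds the equivalent HTTP request with Python's standard library only. It assumes the chat endpoint sits at the standard OpenAI path `/chat/completions` under the base URL used above, and that the response follows the OpenAI chat completion shape; the helper names are illustrative, not part of any official client.

```python
import json
import urllib.request

BASE_URL = "https://api.redpill.ai/v1"  # same base URL as the SDK example


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for a chat completion."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",  # assumed: standard OpenAI-compatible path
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumed: OpenAI-style response with choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

To use it, pass your key, a model ID, and the same `messages` list as in the SDK example to `build_chat_request`, then hand the result to `send_chat_request`. Keeping request construction separate from sending makes the wire format easy to inspect before any data leaves your machine.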

Available Models

Confidential AI models are available through several GPU TEE providers. The live catalog is authoritative; query it before hardcoding model IDs:

curl https://api.redpill.ai/v1/models \
  -H "Authorization: Bearer <API_KEY>"

To list Phala-backed models only:

curl https://api.redpill.ai/v1/models/phala \
  -H "Authorization: Bearer <API_KEY>"
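
The two curl calls above can be mirrored in Python when model discovery needs to happen inside an application. This is a minimal sketch using only the standard library; it assumes the catalog response uses the OpenAI-style list shape `{"data": [{"id": ...}]}`, which should be confirmed against a real response, and the function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "https://api.redpill.ai/v1"


def models_url(provider=None):
    """Return the catalog URL, optionally scoped to one provider (e.g. "phala")."""
    return f"{BASE_URL}/models/{provider}" if provider else f"{BASE_URL}/models"


def extract_model_ids(payload):
    """Pull model IDs out of a catalog response.

    Assumes the OpenAI-style list shape: {"data": [{"id": "..."}, ...]}.
    """
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_model_ids(api_key, provider=None, timeout=30):
    """Query the live catalog and return the model IDs it reports."""
    request = urllib.request.Request(
        models_url(provider),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return extract_model_ids(json.load(resp))
```

Calling `fetch_model_ids(api_key, provider="phala")` corresponds to the second curl command; omitting `provider` corresponds to the first.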

The following tables list the model families added in the RedPill model catalog update, grouped by provider. Pricing and availability can change; use the API response for production routing.

Phala Provider

Model ID	Context	Modality	Pricing (input/output per 1M tokens)
`phala/qwen3.5-27b`	262K	Text	$0.30 / $2.40
`phala/qwen3-vl-30b-a3b-instruct`	128K	Vision + Text	$0.20 / $0.70
`qwen/qwen3-embedding-8b`	32K	Embeddings	$0.01 / $0
`phala/gemma-3-27b-it`	53K	Vision + Text	$0.11 / $0.40
`phala/glm-4.7-flash`	202K	Text	$0.10 / $0.43
`phala/gpt-oss-20b`	131K	Text	$0.04 / $0.15
`phala/qwen-2.5-7b-instruct`	32K	Text	$0.04 / $0.10
`phala/qwen2.5-vl-72b-instruct`	128K	Vision + Text	$0.40 / $1.20
`phala/uncensored-24b`	32K	Text	$0.20 / $0.90
`sentence-transformers/all-minilm-l6-v2`	512	Embeddings	$0.005 / $0

phala/qwen2.5-vl-72b-instruct is a legacy alias that may route to phala/qwen3-vl-30b-a3b-instruct. Prefer the canonical ID returned by /v1/models.
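
One way to act on this advice is to check a configured model ID against the live catalog at startup and fall back to another ID when it is not listed. The sketch below is a hypothetical helper, not part of the API: it takes the IDs already fetched from /v1/models as a plain list and returns the first candidate that the catalog actually contains.

```python
def choose_model(preferred, catalog_ids, fallbacks=()):
    """Return the first candidate the live catalog lists, or raise if none is."""
    available = set(catalog_ids)
    for candidate in (preferred, *fallbacks):
        if candidate in available:
            return candidate
    raise LookupError(
        f"none of {[preferred, *fallbacks]} is listed in the live catalog"
    )
```

Failing loudly when no candidate is listed is a deliberate choice here: it surfaces a retired or renamed model ID at startup instead of at the first user request.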

NearAI Provider

Model ID	Context	Modality	Pricing (input/output per 1M tokens)
`z-ai/glm-5`	203K	Text	$1.20 / $3.50
`deepseek/deepseek-chat-v3.1`	164K	Text	$1.05 / $3.10
`openai/gpt-oss-120b`	131K	Text	$0.10 / $0.49
`qwen/qwen3-30b-a3b-instruct-2507`	262K	Text	$0.15 / $0.55
`z-ai/glm-4.7`	131K	Text	$0.85 / $3.30

Chutes Provider

Model ID	Context	Modality	Pricing (input/output per 1M tokens)
`z-ai/glm-5.1`	203K	Text	$1.21 / $4.20
`moonshotai/kimi-k2.6`	262K	Text + Image	$1.09 / $4.60
`qwen/qwen3.5-397b-a17b`	262K	Text	$0.55 / $3.50
`qwen/qwen3-coder-next`	262K	Text	$0.18 / $1.20
`minimax/minimax-m2.5`	197K	Text	$0.20 / $1.38
`xiaomi/mimo-v2-flash`	262K	Text	$0.10 / $0.30
`deepseek/deepseek-v3.2`	164K	Text	$0.32 / $0.48
`moonshotai/kimi-k2.5`	262K	Text + Image	$0.60 / $3.00

Tinfoil Provider

Model ID	Context	Modality	Pricing (input/output per 1M tokens)
`qwen/qwen3-coder-480b-a35b-instruct`	262K	Text	$2.00 / $2.00
`moonshotai/kimi-k2-thinking`	262K	Text	$2.00 / $2.00
`deepseek/deepseek-r1-0528`	163K	Text	$2.00 / $2.00
`meta-llama/llama-3.3-70b-instruct`	131K	Text	$2.00 / $2.00

TEE coverage and attestation support vary by provider and model. For production verification, test Attestation Report with the exact model ID you plan to use.
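
The per-1M-token prices in the provider tables above translate into a per-request cost with simple arithmetic. The sketch below assumes billing is strictly linear in input and output tokens, with no minimum charge or rounding, which the tables do not state; treat the result as an estimate. `Decimal` is used so small amounts are not distorted by floating-point error.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # prices are quoted per 1M tokens


def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Estimate the USD cost of one request from per-1M-token prices."""
    in_cost = Decimal(input_tokens) * Decimal(input_price) / TOKENS_PER_UNIT
    out_cost = Decimal(output_tokens) * Decimal(output_price) / TOKENS_PER_UNIT
    return in_cost + out_cost


# phala/qwen3.5-27b is listed at $0.30 input / $2.40 output per 1M tokens.
cost = estimate_cost(10_000, 2_000, "0.30", "2.40")
print(f"estimated cost: ${cost}")
```

Passing prices as strings keeps the listed values exact; for embedding models, which list an output price of $0, pass `"0"` as `output_price`.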

Verify Your AI is Running Securely

After you make a request, use Request Signature to fetch the signature for that response. Then fetch a fresh Attestation Report with the returned signing_address to bind the response to TEE evidence.
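
The flow above can be sketched as a small helper that ties the two lookups together. It does not name endpoint paths or perform any cryptographic checks; instead it takes two caller-supplied functions that wrap Request Signature and Attestation Report. The argument names, the use of a chat ID to identify the response, and the presence of a `signing_address` field in the attestation report are all assumptions made for illustration and should be checked against the API reference.

```python
def normalize_address(address):
    """Lower-case a hex address and drop an optional 0x prefix for comparison."""
    text = address.strip().lower()
    return text[2:] if text.startswith("0x") else text


def bind_response_to_tee(chat_id, model, fetch_signature, fetch_attestation):
    """Link one response to TEE evidence via its signing address.

    fetch_signature(chat_id, model) and fetch_attestation(model, signing_address)
    are caller-supplied callables wrapping the Request Signature and
    Attestation Report endpoints. Both are assumed to return dicts that carry
    a "signing_address" field.
    """
    signature = fetch_signature(chat_id, model)
    signer = signature["signing_address"]
    # Request a fresh report for the address that signed this response.
    report = fetch_attestation(model, signer)
    if normalize_address(report["signing_address"]) != normalize_address(signer):
        raise ValueError("attestation report does not cover the response signer")
    return signature, report
```

This only confirms that the report and the signature refer to the same signing address. Verifying the signature over the response and validating the hardware evidence in the report are separate steps that this sketch leaves out.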

Next Steps

Use the API reference and feature guides for the next step:

Chat Completions documents the core request and response shape.
List Models shows how to discover models programmatically.
Embeddings covers embedding model calls.
Tool Calling helps you call tools from your AI models.
Images and Vision helps you use image-capable models.
Structured Output helps you get JSON responses.
Streaming helps you consume streaming responses.
Playground helps you test models in a private environment.
