By the end of this lesson, you will have a working Next.js API route that accepts text and returns a one-sentence summary from an LLM. You will also understand how all three major providers (OpenAI, Anthropic, Google) structure their APIs, so you are never locked in.
Here is the finished product --- a /api/summarize endpoint:
// app/api/summarize/route.ts
export async function POST(request: Request) {
  const { text } = await request.json()

  if (!text || typeof text !== 'string') {
    return Response.json({ error: 'Missing text field' }, { status: 400 })
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: 'Summarize the following text in one sentence.' },
        { role: 'user', content: text }
      ],
      temperature: 0,
      max_tokens: 100
    })
  })

  if (!response.ok) {
    const error = await response.json()
    return Response.json(
      { error: error.error?.message ?? 'LLM request failed' },
      { status: 502 }
    )
  }

  const data = await response.json()
  return Response.json({
    summary: data.choices[0].message.content,
    tokensUsed: data.usage.total_tokens
  })
}
Copy that into your project. Run npm run dev. Hit the endpoint with curl or Postman. You have a working AI feature. Now let us understand every line.
You need three things to make an LLM API call: an API key, a model name, and a messages array.
Get your keys from each provider's developer console.
Store them in a .env.local file. Never hardcode API keys in your source code.
# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
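A missing or misspelled key produces a confusing `Bearer undefined` authentication failure deep inside the request. One way to fail fast instead is a small guard like the following `requireEnv` helper --- an illustrative pattern, not part of the lesson's endpoint:

```typescript
// Illustrative helper (not part of the lesson's code): read an env var
// and throw a clear error at startup instead of sending "Bearer undefined".
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`)
  }
  return value
}

// Inside the route you would then write:
// const apiKey = requireEnv('OPENAI_API_KEY')
```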
Every LLM API uses the same core concept: a messages array. Each message has a role and content.
const messages = [
  {
    role: 'system',
    content: 'You are a concise summarizer. Respond in one sentence.'
  },
  {
    role: 'user',
    content: 'Summarize this: The global AI market is projected to reach $1.8 trillion by 2030, driven primarily by enterprise adoption of generative AI tools for content creation, code generation, and customer service automation.'
  }
]
The three roles: system sets the model's behavior and constraints, user carries your end user's input, and assistant holds the model's earlier replies in a multi-turn conversation. Here is the same request sent to each provider, starting with OpenAI:
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages,
    temperature: 0,
    max_tokens: 150
  })
})

const data = await response.json()
const answer = data.choices[0].message.content
const tokensUsed = data.usage // { prompt_tokens, completion_tokens, total_tokens }
The same call against Anthropic:

const response = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.ANTHROPIC_API_KEY!,
    'anthropic-version': '2023-06-01'
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 150,
    system: messages[0].content, // Anthropic uses a separate system field
    messages: [{ role: 'user', content: messages[1].content }],
    temperature: 0
  })
})

const data = await response.json()
const answer = data.content[0].text
const tokensUsed = data.usage // { input_tokens, output_tokens }
And against Google's Gemini API:

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${process.env.GOOGLE_API_KEY}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      systemInstruction: {
        parts: [{ text: messages[0].content }]
      },
      contents: [{
        role: 'user',
        parts: [{ text: messages[1].content }]
      }],
      generationConfig: {
        temperature: 0,
        maxOutputTokens: 150
      }
    })
  }
)

const data = await response.json()
const answer = data.candidates[0].content.parts[0].text
const tokensUsed = data.usageMetadata // { promptTokenCount, candidatesTokenCount }
Notice the differences: each provider has its own endpoint structure, authentication pattern, and response format. The underlying concept --- messages in, text out --- is the same. This is why abstractions like the Vercel AI SDK exist, and why you will adopt one in the next lesson.
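As a small taste of what such an abstraction does, here is a hedged sketch (the name `normalizeUsage` is my own, not any SDK's API) that collapses the three usage shapes shown above into one:

```typescript
// Illustrative only: normalizeUsage is a made-up helper, not an SDK function.
// It maps each provider's usage object to a single { inputTokens, outputTokens } shape.
type Usage = { inputTokens: number; outputTokens: number }

function normalizeUsage(provider: 'openai' | 'anthropic' | 'google', raw: any): Usage {
  switch (provider) {
    case 'openai': // { prompt_tokens, completion_tokens, total_tokens }
      return { inputTokens: raw.prompt_tokens, outputTokens: raw.completion_tokens }
    case 'anthropic': // { input_tokens, output_tokens }
      return { inputTokens: raw.input_tokens, outputTokens: raw.output_tokens }
    case 'google': // { promptTokenCount, candidatesTokenCount }
      return { inputTokens: raw.promptTokenCount, outputTokens: raw.candidatesTokenCount }
  }
}
```

The client never needs to know which provider answered --- the same idea, applied to the full response, is the exercise at the end of this lesson.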
Tokens are not words. They are subword units --- roughly 0.75 words per token in English. The response from every provider includes a usage object that tells you exactly how many tokens were consumed.
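If you want a rough pre-flight estimate before making a call, the 0.75-words-per-token rule of thumb turns into a one-liner. This is an approximation only --- the exact count comes from the provider's tokenizer, reported back in the usage object:

```typescript
// Rough heuristic based on ~0.75 English words per token.
// For exact counts, rely on the usage object in the provider's response.
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length
  return Math.ceil(words / 0.75)
}
```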
Here is how to estimate cost from a response:
function estimateCost(
  usage: { promptTokens: number; completionTokens: number },
  pricing: { inputPerMillion: number; outputPerMillion: number }
) {
  const inputCost = (usage.promptTokens / 1_000_000) * pricing.inputPerMillion
  const outputCost = (usage.completionTokens / 1_000_000) * pricing.outputPerMillion
  return { inputCost, outputCost, total: inputCost + outputCost }
}
// GPT-4o-mini pricing (as of early 2025)
const cost = estimateCost(
  { promptTokens: 85, completionTokens: 32 },
  { inputPerMillion: 0.15, outputPerMillion: 0.60 }
)
// { inputCost: 0.00001275, outputCost: 0.0000192, total: 0.00003195 }
// That's about $0.00003 — roughly 31,000 requests per dollar.
Track this from day one. Small per-request costs compound fast at scale.
Temperature controls randomness. It is the single most important generation parameter.
Same prompt, different temperatures:
// temperature: 0 (three runs)
// "The global AI market will reach $1.8T by 2030, driven by enterprise generative AI adoption."
// "The global AI market will reach $1.8T by 2030, driven by enterprise generative AI adoption."
// "The global AI market will reach $1.8T by 2030, driven by enterprise generative AI adoption."
// temperature: 0.8 (three runs)
// "AI is on track to become a $1.8 trillion industry by 2030 as businesses embrace generative tools."
// "Enterprise adoption of generative AI for content and code is propelling the AI market toward $1.8T by 2030."
// "By 2030, generative AI adoption across enterprises could push the global AI market to $1.8 trillion."
For most product features --- summarization, data extraction, customer support --- start at temperature 0 and increase only if the outputs feel too rigid.
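One way to keep that guidance in code is a small lookup of starting points. These values are illustrative defaults for this lesson, not official recommendations --- tune them against your own outputs:

```typescript
// Illustrative starting temperatures per task type — adjust for your product.
const DEFAULT_TEMPERATURE: Record<string, number> = {
  extraction: 0,    // deterministic, structured output
  summarization: 0, // consistent summaries across runs
  support: 0.3,     // slight variation, still on-script
  creative: 0.8     // varied phrasing and ideas
}

function temperatureFor(task: string): number {
  return DEFAULT_TEMPERATURE[task] ?? 0 // unknown tasks default to deterministic
}
```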
Let us go back to the summarize endpoint from the top of this lesson and break down why each piece exists.
Input validation: The if (!text || typeof text !== 'string') check prevents empty or malformed requests from burning API credits.
Error handling: The if (!response.ok) block catches provider errors (rate limits, bad keys, model outages) and returns them to the client as a 502 with a readable message, rather than failing with an unhandled exception.
Token tracking: Returning tokensUsed alongside the summary gives you cost visibility from your first request. You will build on this habit in every lesson.
The route works. But it has a problem: the user waits for the entire response to generate before seeing anything. For short summaries, that is fine. For longer generations --- chat, analysis, reports --- the wait kills the experience.
Modify the summarize endpoint to accept a provider parameter and call the appropriate API:
const { text, provider = 'openai' } = await request.json()
// Switch on provider to call OpenAI, Anthropic, or Google
// Normalize the response so the client always gets { summary, tokensUsed }
This forces you to handle three different response shapes and collapse them into one interface --- exactly what the Vercel AI SDK does for you in the next lesson.
You have a working API call that waits for the full response. In the next lesson, you turn this into a real-time experience with Streaming Chat and the AI SDK --- the user sees each word as the model generates it, and you replace all this raw fetch boilerplate with two functions.