By the end of this lesson, you will have a working Next.js API route that accepts text and returns a one-sentence summary from an LLM. You will also understand how all three major providers (OpenAI, Anthropic, Google) structure their APIs, so you are never locked in.
Here is the finished product --- a /api/summarize endpoint:
// app/api/summarize/route.ts
export async function POST(request: Request) {
  const { text } = await request.json()

  if (!text || typeof text !== 'string') {
    return Response.json({ error: 'Missing text field' }, { status: 400 })
  }

  const response = await fetch('https://api.openai.com/v1/chat/completions', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
    },
    body: JSON.stringify({
      model: 'gpt-4o-mini',
      messages: [
        { role: 'system', content: 'Summarize the following text in one sentence.' },
        { role: 'user', content: text }
      ],
      temperature: 0,
      max_tokens: 100
    })
  })

  if (!response.ok) {
    const error = await response.json()
    return Response.json(
      { error: error.error?.message ?? 'LLM request failed' },
      { status: 502 }
    )
  }

  const data = await response.json()
  return Response.json({
    summary: data.choices[0].message.content,
    tokensUsed: data.usage.total_tokens
  })
}
Copy that into your project. Run npm run dev. Hit the endpoint with curl or Postman. You have a working AI feature. Now let us understand every line.
You need three things to make an LLM API call: an API key, a model name, and a messages array.
Get your keys from each provider's developer console.
Store them in a .env.local file. Never hardcode API keys in your source code.
# .env.local
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
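A missing or misspelled key produces a confusing `Bearer undefined` authentication failure deep inside the request. One way to fail fast instead is a small guard like the following `requireEnv` helper --- an illustrative pattern, not part of the lesson's endpoint:

```typescript
// Illustrative helper (not part of the lesson's code): read an env var
// and throw a clear error at startup instead of sending "Bearer undefined".
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing environment variable: ${name}`)
  }
  return value
}

// Inside the route you would then write:
// const apiKey = requireEnv('OPENAI_API_KEY')
```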
Every LLM API uses the same core concept: a messages array. Each message has a role and content.
const messages = [
  {
    role: 'system',
    content: 'You are a concise summarizer. Respond in one sentence.'
  },
  {
    role: 'user',
    content: 'Summarize this: The global AI market is projected to reach $1.8 trillion by 2030, driven primarily by enterprise adoption of generative AI tools for content creation, code generation, and customer service automation.'
  }
]
The three roles: system sets the model's behavior and constraints, user carries your end user's input, and assistant holds the model's earlier replies in a multi-turn conversation. Here is the same request sent to each provider, starting with OpenAI:
const response = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',
    messages,
    temperature: 0,
    max_tokens: 150
  })
})

const data = await response.json()
const answer = data.choices[0].message.content
const tokensUsed = data.usage // { prompt_tokens, completion_tokens, total_tokens }
The same call against Anthropic:

const response = await fetch('https://api.anthropic.com/v1/messages', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': process.env.ANTHROPIC_API_KEY!,
    'anthropic-version': '2023-06-01'
  },
  body: JSON.stringify({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 150,
    system: messages[0].content, // Anthropic uses a separate system field
    messages: [{ role: 'user', content: messages[1].content }],
    temperature: 0
  })
})

const data = await response.json()
const answer = data.content[0].text
const tokensUsed = data.usage // { input_tokens, output_tokens }
And against Google's Gemini API:

const response = await fetch(
  `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.0-flash:generateContent?key=${process.env.GOOGLE_API_KEY}`,
  {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      systemInstruction: {
        parts: [{ text: messages[0].content }]
      },
      contents: [{
        role: 'user',
        parts: [{ text: messages[1].content }]
      }],
      generationConfig: {
        temperature: 0,
        maxOutputTokens: 150
      }
    })
  }
)

const data = await response.json()
const answer = data.candidates[0].content.parts[0].text
const tokensUsed = data.usageMetadata // { promptTokenCount, candidatesTokenCount }
Notice the differences: each provider has its own endpoint structure, authentication pattern, and response format. The underlying concept --- messages in, text out --- is the same. This is why abstractions like the Vercel AI SDK exist, and why you will adopt one in the next lesson.
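As a small taste of what such an abstraction does, here is a hedged sketch (the name `normalizeUsage` is my own, not any SDK's API) that collapses the three usage shapes shown above into one:

```typescript
// Illustrative only: normalizeUsage is a made-up helper, not an SDK function.
// It maps each provider's usage object to a single { inputTokens, outputTokens } shape.
type Usage = { inputTokens: number; outputTokens: number }

function normalizeUsage(provider: 'openai' | 'anthropic' | 'google', raw: any): Usage {
  switch (provider) {
    case 'openai': // { prompt_tokens, completion_tokens, total_tokens }
      return { inputTokens: raw.prompt_tokens, outputTokens: raw.completion_tokens }
    case 'anthropic': // { input_tokens, output_tokens }
      return { inputTokens: raw.input_tokens, outputTokens: raw.output_tokens }
    case 'google': // { promptTokenCount, candidatesTokenCount }
      return { inputTokens: raw.promptTokenCount, outputTokens: raw.candidatesTokenCount }
  }
}
```

The client never needs to know which provider answered --- the same idea, applied to the full response, is the exercise at the end of this lesson.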
Tokens are not words. They are subword units --- roughly 0.75 words per token in English. The response from every provider includes a usage object that tells you exactly how many tokens were consumed.
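If you want a rough pre-flight estimate before making a call, the 0.75-words-per-token rule of thumb turns into a one-liner. This is an approximation only --- the exact count comes from the provider's tokenizer, reported back in the usage object:

```typescript
// Rough heuristic based on ~0.75 English words per token.
// For exact counts, rely on the usage object in the provider's response.
function estimateTokens(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length
  return Math.ceil(words / 0.75)
}
```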
Here is how to estimate cost from a response:
function estimateCost(
  usage: { promptTokens: number; completionTokens: number },
  pricing: { inputPerMillion: number; outputPerMillion: number }
) {
  const inputCost = (usage.promptTokens / 1_000_000) * pricing.inputPerMillion
  const outputCost = (usage.completionTokens / 1_000_000) * pricing.outputPerMillion
  return { inputCost, outputCost, total: inputCost + outputCost }
}
// GPT-4o-mini pricing (as of early 2025)
const cost = estimateCost(
  { promptTokens: 85, completionTokens: 32 },
  { inputPerMillion: 0.15, outputPerMillion: 0.60 }
)
// { inputCost: 0.00001275, outputCost: 0.0000192, total: 0.00003195 }
// That's about $0.00003 — roughly 31,000 requests per dollar.
Track this from day one. Small per-request costs compound fast at scale.
Temperature controls randomness. It is the single most important generation parameter.
Same prompt, different temperatures:
// temperature: 0 (three runs)
// "The global AI market will reach $1.8T by 2030, driven by enterprise generative AI adoption."
// "The global AI market will reach $1.8T by 2030, driven by enterprise generative AI adoption."
// "The global AI market will reach $1.8T by 2030, driven by enterprise generative AI adoption."
// temperature: 0.8 (three runs)
// "AI is on track to become a $1.8 trillion industry by 2030 as businesses embrace generative tools."
// "Enterprise adoption of generative AI for content and code is propelling the AI market toward $1.8T by 2030."
// "By 2030, generative AI adoption across enterprises could push the global AI market to $1.8 trillion."
For most product features --- summarization, data extraction, customer support --- start at temperature 0 and increase only if the outputs feel too rigid.
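One way to keep that guidance in code is a small lookup of starting points. These values are illustrative defaults for this lesson, not official recommendations --- tune them against your own outputs:

```typescript
// Illustrative starting temperatures per task type — adjust for your product.
const DEFAULT_TEMPERATURE: Record<string, number> = {
  extraction: 0,    // deterministic, structured output
  summarization: 0, // consistent summaries across runs
  support: 0.3,     // slight variation, still on-script
  creative: 0.8     // varied phrasing and ideas
}

function temperatureFor(task: string): number {
  return DEFAULT_TEMPERATURE[task] ?? 0 // unknown tasks default to deterministic
}
```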
Let us go back to the summarize endpoint from the top of this lesson and break down why each piece exists.
Input validation: The if (!text || typeof text !== 'string') check prevents empty or malformed requests from burning API credits.
Error handling: The if (!response.ok) block catches provider errors (rate limits, bad keys, model outages) and returns them to the client as a 502 with a readable message, rather than failing with an unhandled exception.
Token tracking: Returning tokensUsed alongside the summary gives you cost visibility from your first request. You will build on this habit in every lesson.
The route works. But it has a problem: the user waits for the entire response to generate before seeing anything. For short summaries, that is fine. For longer generations --- chat, analysis, reports --- the wait kills the experience.
Modify the summarize endpoint to accept a provider parameter and call the appropriate API:
const { text, provider = 'openai' } = await request.json()
// Switch on provider to call OpenAI, Anthropic, or Google
// Normalize the response so the client always gets { summary, tokensUsed }
This forces you to handle three different response shapes and collapse them into one interface --- exactly what the Vercel AI SDK does for you in the next lesson.
You have a working API call that waits for the full response. In the next lesson, you turn this into a real-time experience with Streaming Chat and the AI SDK --- the user sees each word as the model generates it, and you replace all this raw fetch boilerplate with two functions.