API Integration with FastAPI

AI Bootcamp


Mork Mongkul

Table of Contents

  1. What is an API?
  2. Why FastAPI?
  3. Installation & Project Structure
  4. Core Concepts — Routes, Pydantic, HTTP Methods
  5. Serving an ML Model
  6. ML Pipeline as an API
  7. Deployment — Streamlit, Hugging Face, React/TS
  8. Best Practices

What is an API?

An API (Application Programming Interface) is a contract between systems — it defines how they communicate.

In ML context:

  • Your model lives on a server
  • A client (UI, app, script) sends data
  • The server runs inference and returns predictions
  • Neither side needs to know the other’s internals

Client (React / Streamlit)
        │
        │  HTTP Request
        ▼
  ┌─────────────┐
  │  FastAPI    │
  │  /predict   │
  └──────┬──────┘
         │
         ▼
   ML Model / Pipeline
         │
         ▼
  JSON Response → Client

REST API — Key HTTP Methods

Method   Usage                 Example
GET      Retrieve data         /models, /health
POST     Send data / predict   /predict, /train
PUT      Update a resource     /model/{id}
DELETE   Remove a resource     /model/{id}

For ML APIs, POST /predict is the most common pattern — the client sends features, the server returns a prediction.

Why FastAPI?

Why FastAPI for ML?

FastAPI is:

  • Built on Starlette (web) + Pydantic (data validation)
  • Async-ready — handles concurrent requests efficiently
  • Auto-generates docs at /docs (Swagger) and /redoc
  • Type-safe — validates inputs before they reach your model
  • Used in production by Microsoft, Uber, Netflix

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.post("/items/")
def create_item(item: Item):
    return {"name": item.name}

FastAPI validates item automatically — wrong types return clear errors.

Installation & Project Structure

Installation

Install FastAPI with standard dependencies:

pip install "fastapi[standard]"

Run your app in development:

fastapi dev main.py

Run in production:

fastapi run main.py
# or with uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Visit http://127.0.0.1:8000/docs for the interactive Swagger UI.

Core Concepts

Pydantic — Input & Output Schemas

Define strict input/output contracts for your ML endpoints:

from pydantic import BaseModel, Field
from typing import Literal

# Input schema
class PredictRequest(BaseModel):
    age: int = Field(..., ge=0, le=120, description="Age of the person")
    income: float = Field(..., gt=0, description="Annual income in USD")
    education: Literal["high_school", "bachelor", "master", "phd"]

# Output schema
class PredictResponse(BaseModel):
    prediction: int
    label: str
    confidence: float
    model_version: str

Pydantic rejects invalid inputs before they reach your model — no manual validation needed.
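To see this in action, instantiating the schema with out-of-range data raises a ValidationError before any model code runs (the schema is repeated here so the sketch is self-contained):

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class PredictRequest(BaseModel):
    age: int = Field(..., ge=0, le=120)
    income: float = Field(..., gt=0)
    education: Literal["high_school", "bachelor", "master", "phd"]

# Valid input parses cleanly
ok = PredictRequest(age=30, income=55000.0, education="master")

# Out-of-range age is rejected with a structured error report
try:
    PredictRequest(age=-5, income=55000.0, education="master")
except ValidationError as e:
    print(len(e.errors()), "validation error(s)")  # 1 validation error(s)
```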

Path, Query & Body Parameters

Path parameter — part of the URL:

@app.get("/models/{model_id}")
def get_model(model_id: int):
    return {"model_id": model_id}

Query parameter — optional filter:

@app.get("/predictions/")
def list_predictions(limit: int = 10,
                     skip: int = 0):
    return {"limit": limit, "skip": skip}

Request body — JSON payload:

@app.post("/predict/")
def predict(request: PredictRequest):
    result = model.predict([
        request.age,
        request.income
    ])
    return {"prediction": result}

Use body for ML inputs — it supports complex nested structures.

HTTP Status Codes & Error Handling

from fastapi import FastAPI, HTTPException, status

app = FastAPI()

@app.post("/predict/", status_code=status.HTTP_200_OK)
def predict(request: PredictRequest):
    if model is None:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="Model is not loaded"
        )
    try:
        result = model.predict([[request.age, request.income]])
        return {"prediction": int(result[0])}
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

Code  Meaning
200   OK — prediction returned
422   Unprocessable Entity — invalid input schema
500   Internal Server Error — inference failed
503   Service Unavailable — model not loaded

Serving an ML Model

Loading the Model at Startup

Use lifespan to load your model once when the server starts:

from fastapi import FastAPI
from contextlib import asynccontextmanager
import joblib

ml_models = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load model on startup
    ml_models["classifier"] = joblib.load("models/classifier.pkl")
    print("Model loaded successfully")
    yield
    # Clean up on shutdown
    ml_models.clear()

app = FastAPI(lifespan=lifespan)

Never load the model inside the endpoint function — that reloads it on every request.

A Complete Prediction Endpoint

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Prediction API", version="1.0.0")

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: int
    confidence: float
    model_version: str = "1.0.0"

model = joblib.load("models/classifier.pkl")

@app.post("/predict/", response_model=PredictResponse)
def predict(request: PredictRequest):
    X = np.array(request.features).reshape(1, -1)
    prediction = int(model.predict(X)[0])
    confidence = float(model.predict_proba(X).max())
    return PredictResponse(
        prediction=prediction,
        confidence=round(confidence, 4)
    )

@app.get("/health/")
def health():
    return {"status": "ok", "model_loaded": model is not None}

ML Pipeline as an API

Full ML Pipeline Architecture

Raw Input (JSON)
      │
      ▼
┌─────────────────┐
│ Pydantic Schema │  ← validates & parses input
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Preprocessing  │  ← scaling, encoding, feature engineering
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   ML Model      │  ← sklearn, PyTorch, HuggingFace
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Postprocessing │  ← decode labels, format output
└────────┬────────┘
         │
         ▼
JSON Response → Client

Pipeline Endpoint Example

from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.pipeline import Pipeline
import joblib
import pandas as pd

app = FastAPI()

# Load the full sklearn pipeline (preprocessor + model)
pipeline: Pipeline = joblib.load("models/pipeline.pkl")

class InputData(BaseModel):
    age: float
    income: float
    education: str
    occupation: str

class OutputData(BaseModel):
    prediction: str
    confidence: float

@app.post("/predict/", response_model=OutputData)
def predict(data: InputData):
    df = pd.DataFrame([data.model_dump()])
    pred = pipeline.predict(df)[0]
    proba = pipeline.predict_proba(df).max()
    return OutputData(
        prediction=str(pred),
        confidence=round(float(proba), 4)
    )

Saving the full sklearn pipeline (scaler + encoder + model) keeps the API clean — one pipeline.predict() call handles everything.
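Such a pipeline might be built and saved like this. A sketch with a toy dataset mirroring the InputData schema; the real training data, columns, and estimator would differ:

```python
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data with the same columns the API will receive
df = pd.DataFrame({
    "age": [25, 40, 35, 50],
    "income": [30000, 80000, 55000, 90000],
    "education": ["bachelor", "master", "bachelor", "phd"],
    "occupation": ["tech", "finance", "tech", "finance"],
})
y = [0, 1, 0, 1]

# Numeric columns are scaled, categorical columns one-hot encoded
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["education", "occupation"]),
])

pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", LogisticRegression()),
])
pipeline.fit(df, y)

# One artifact for the API to load: preprocessing and model together
Path("models").mkdir(exist_ok=True)
joblib.dump(pipeline, "models/pipeline.pkl")
```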

Async Endpoint for Heavy Inference

For deep learning models or long inference times, keep the endpoint async and offload the synchronous model call to a thread pool:

from fastapi import FastAPI
from pydantic import BaseModel
import asyncio

app = FastAPI()

class InferenceRequest(BaseModel):
    text: str

@app.post("/predict/async/")
async def predict_async(request: InferenceRequest):
    # Run the heavy model in a thread pool to avoid blocking the event loop
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        None, model.predict, request.text
    )
    return {"prediction": result}

Use run_in_executor to run synchronous ML inference without blocking FastAPI’s async event loop.
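The effect can be seen without FastAPI at all: two blocking calls offloaded with run_in_executor overlap instead of running back-to-back. A self-contained sketch with a simulated model:

```python
import asyncio
import time

def slow_predict(text: str) -> str:
    time.sleep(0.2)  # simulate blocking inference
    return f"label-for:{text}"

async def main():
    loop = asyncio.get_running_loop()
    # Both calls run in the default thread pool concurrently
    return await asyncio.gather(
        loop.run_in_executor(None, slow_predict, "a"),
        loop.run_in_executor(None, slow_predict, "b"),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.2s, not 0.4s: the calls overlapped
```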

Deployment

Deployment Options Overview

Platform             Best For              Complexity
Streamlit            Quick internal demos  Low
Hugging Face Spaces  Public ML demos       Low
Docker + Cloud       Production APIs       Medium
React / TypeScript   Custom production UI  High

Option 1 — Streamlit Frontend

Streamlit calls your FastAPI backend via requests:

# streamlit_app.py
import streamlit as st
import requests

st.title("ML Prediction App")

age = st.slider("Age", 18, 80, 30)
income = st.number_input("Annual Income", value=50000.0)

if st.button("Predict"):
    response = requests.post(
        "http://localhost:8000/predict/",
        json={"age": age, "income": income}
    )
    if response.status_code == 200:
        result = response.json()
        st.success(f"Prediction: {result['prediction']}")
        st.metric("Confidence", f"{result['confidence']:.2%}")
    else:
        st.error("API error — check your FastAPI server")

Run both together:

fastapi dev app/main.py &
streamlit run streamlit_app.py

Option 2 — Hugging Face Spaces

Deploy FastAPI on HF Spaces with a Dockerfile:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# HuggingFace Spaces requires port 7860
EXPOSE 7860

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]

# requirements.txt
fastapi[standard]
scikit-learn
joblib
pandas
numpy

Push to a HF Space repo — Spaces automatically builds and deploys your container. Set Space SDK to Docker.

Option 3 — React / TypeScript Frontend

Call FastAPI from a React component:

// src/api/predict.ts
const API_URL = import.meta.env.VITE_API_URL || "http://localhost:8000"

export interface PredictRequest {
  age: number
  income: number
  education: string
}

export interface PredictResponse {
  prediction: number
  confidence: number
  model_version: string
}

export async function predict(data: PredictRequest): Promise<PredictResponse> {
  const response = await fetch(`${API_URL}/predict/`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  })
  if (!response.ok) {
    throw new Error(`API error: ${response.statusText}`)
  }
  return response.json()
}

React Component — Calling the API

// src/components/PredictForm.tsx
import { useState } from "react"
import { predict, PredictResponse } from "../api/predict"

export default function PredictForm() {
  const [result, setResult] = useState<PredictResponse | null>(null)
  const [loading, setLoading] = useState(false)

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault()
    setLoading(true)
    try {
      const data = await predict({ age: 30, income: 60000, education: "bachelor" })
      setResult(data)
    } catch (err) {
      console.error(err)
    } finally {
      setLoading(false)
    }
  }

  return (
    <form onSubmit={handleSubmit}>
      <button type="submit" disabled={loading}>
        {loading ? "Predicting..." : "Predict"}
      </button>
      {result && <p>Prediction: {result.prediction} ({(result.confidence * 100).toFixed(1)}%)</p>}
    </form>
  )
}

Enable CORS in FastAPI so the browser allows requests from your React dev server.

Enabling CORS in FastAPI

Required when your frontend (React, TypeScript) and API run on different origins:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:5173",   # Vite dev server
        "http://localhost:3000",   # Create React App
        "https://your-app.vercel.app",  # production
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)

In production, never use allow_origins=["*"] — always specify exact origins.

Best Practices

API Best Practices for ML

Structure:

  • Version your API: /api/v1/predict/
  • Always define response_model on endpoints
  • Use status_code explicitly
  • Add a /health/ endpoint for monitoring
  • Load models at startup, not per-request

Validation:

  • Use Pydantic Field() with ge, le, gt constraints
  • Return clear error messages with HTTPException
  • Validate output shapes before returning

Performance:

  • Use async def for I/O-bound tasks
  • Use run_in_executor for CPU-bound inference
  • Cache model artifacts in memory
  • Add request logging with middleware

Security:

  • Never expose raw stack traces in production
  • Use environment variables for secrets (.env)
  • Add API key authentication for public endpoints

API Versioning & Routers

Organise endpoints with APIRouter for scalability:

# app/routers/predict.py
from fastapi import APIRouter
router = APIRouter(prefix="/api/v1", tags=["predictions"])

@router.post("/predict/", response_model=PredictResponse)
def predict(request: PredictRequest):
    ...

@router.get("/health/")
def health():
    return {"status": "ok"}

# app/main.py
from fastapi import FastAPI
from app.routers import predict

app = FastAPI(
    title="ML API",
    description="Production ML Prediction API",
    version="1.0.0"
)

app.include_router(predict.router)

The auto-generated docs at /docs will group endpoints by tags automatically.

Questions?

Instinct Institute

Mork Mongkul