API Integration with FastAPI

AI Bootcamp


Mork Mongkul

Table of Contents

  1. What is an API?
  2. Why FastAPI?
  3. Installation & Project Structure
  4. Core Concepts — Routes, Pydantic, HTTP Methods
  5. Serving an ML Model
  6. ML Pipeline as an API
  7. Deployment — Streamlit, Hugging Face, React/TS
  8. Best Practices

What is an API?

An API (Application Programming Interface) is a contract between systems — it defines how they communicate.

In ML context:

  • Your model lives on a server
  • A client (UI, app, script) sends data
  • The server runs inference and returns predictions
  • Neither side needs to know the other’s internals

Client (React / Streamlit)
        │
        │  HTTP Request
        ▼
  ┌─────────────┐
  │  FastAPI    │
  │  /predict   │
  └──────┬──────┘
         │
         ▼
   ML Model / Pipeline
         │
         ▼
  JSON Response → Client

REST API — Key HTTP Methods

Method   Usage                 Example
GET      Retrieve data         /models, /health
POST     Send data / predict   /predict, /train
PUT      Update a resource     /model/{id}
DELETE   Remove a resource     /model/{id}

For ML APIs, POST /predict is the most common pattern — the client sends features, the server returns a prediction.

Why FastAPI?

Why FastAPI for ML?

FastAPI is:

  • Built on Starlette (web) + Pydantic (data validation)
  • Async-ready — handles concurrent requests efficiently
  • Auto-generates docs at /docs (Swagger) and /redoc
  • Type-safe — validates inputs before they reach your model
  • Used in production by Microsoft, Uber, Netflix

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Item(BaseModel):
    name: str
    price: float

@app.post("/items/")
def create_item(item: Item):
    return {"name": item.name}

FastAPI validates item automatically — wrong types return clear errors.

Installation & Project Structure

Installation

Install FastAPI with standard dependencies:

pip install "fastapi[standard]"

Run your app in development:

fastapi dev main.py

Run in production:

fastapi run main.py
# or with uvicorn directly
uvicorn main:app --host 0.0.0.0 --port 8000 --workers 4

Visit http://127.0.0.1:8000/docs for the interactive Swagger UI.

Core Concepts

Pydantic — Input & Output Schemas

Define strict input/output contracts for your ML endpoints:

from pydantic import BaseModel, Field
from typing import Literal

# Input schema
class PredictRequest(BaseModel):
    age: int = Field(..., ge=0, le=120, description="Age of the person")
    income: float = Field(..., gt=0, description="Annual income in USD")
    education: Literal["high_school", "bachelor", "master", "phd"]

# Output schema
class PredictResponse(BaseModel):
    prediction: int
    label: str
    confidence: float
    model_version: str

Pydantic rejects invalid inputs before they reach your model — no manual validation needed.
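To see this in action, instantiating the schema with out-of-range data raises a ValidationError before any model code runs (the schema is repeated here so the sketch is self-contained):

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class PredictRequest(BaseModel):
    age: int = Field(..., ge=0, le=120)
    income: float = Field(..., gt=0)
    education: Literal["high_school", "bachelor", "master", "phd"]

# Valid input parses cleanly
ok = PredictRequest(age=30, income=55000.0, education="master")

# Out-of-range age is rejected with a structured error report
try:
    PredictRequest(age=-5, income=55000.0, education="master")
except ValidationError as e:
    print(len(e.errors()), "validation error(s)")  # 1 validation error(s)
```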

Path, Query & Body Parameters

Path parameter — part of the URL:

@app.get("/models/{model_id}")
def get_model(model_id: int):
    return {"model_id": model_id}

Query parameter — optional filter:

@app.get("/predictions/")
def list_predictions(limit: int = 10,
                     skip: int = 0):
    return {"limit": limit, "skip": skip}

Request body — JSON payload:

@app.post("/predict/")
def predict(request: PredictRequest):
    result = model.predict([
        request.age,
        request.income
    ])
    return {"prediction": result}

Use body for ML inputs — it supports complex nested structures.

HTTP Status Codes & Error Handling

from fastapi import FastAPI, HTTPException, status

app = FastAPI()

@app.post("/predict/", status_code=status.HTTP_200_OK)
def predict(request: PredictRequest):
    if model is None:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="Model is not loaded"
        )
    try:
        result = model.predict([[request.age, request.income]])
        return {"prediction": int(result[0])}
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

Code  Meaning
200   OK — prediction returned
422   Unprocessable Entity — invalid input schema
500   Internal Server Error — inference failed
503   Service Unavailable — model not loaded

Serving an ML Model

Loading the Model at Startup

Use lifespan to load your model once when the server starts:

from fastapi import FastAPI
from contextlib import asynccontextmanager
import joblib

ml_models = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load model on startup
    ml_models["classifier"] = joblib.load("models/classifier.pkl")
    print("Model loaded successfully")
    yield
    # Clean up on shutdown
    ml_models.clear()

app = FastAPI(lifespan=lifespan)

Never load the model inside the endpoint function — that reloads it on every request.

A Complete Prediction Endpoint

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Prediction API", version="1.0.0")

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: int
    confidence: float
    model_version: str = "1.0.0"

model = joblib.load("models/classifier.pkl")

@app.post("/predict/", response_model=PredictResponse)
def predict(request: PredictRequest):
    X = np.array(request.features).reshape(1, -1)
    prediction = int(model.predict(X)[0])
    confidence = float(model.predict_proba(X).max())
    return PredictResponse(
        prediction=prediction,
        confidence=round(confidence, 4)
    )

@app.get("/health/")
def health():
    return {"status": "ok", "model_loaded": model is not None}

ML Pipeline as an API

Full ML Pipeline Architecture

Raw Input (JSON)
      │
      ▼
┌─────────────────┐
│ Pydantic Schema │  ← validates & parses input
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Preprocessing  │  ← scaling, encoding, feature engineering
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│   ML Model      │  ← sklearn, PyTorch, HuggingFace
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Postprocessing │  ← decode labels, format output
└────────┬────────┘
         │
         ▼
JSON Response → Client

Pipeline Endpoint Example

from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.pipeline import Pipeline
import joblib
import pandas as pd

app = FastAPI()

# Load the full sklearn pipeline (preprocessor + model)
pipeline: Pipeline = joblib.load("models/pipeline.pkl")

class InputData(BaseModel):
    age: float
    income: float
    education: str
    occupation: str

class OutputData(BaseModel):
    prediction: str
    confidence: float

@app.post("/predict/", response_model=OutputData)
def predict(data: InputData):
    df = pd.DataFrame([data.model_dump()])
    pred = pipeline.predict(df)[0]
    proba = pipeline.predict_proba(df).max()
    return OutputData(
        prediction=str(pred),
        confidence=round(float(proba), 4)
    )

Saving the full sklearn pipeline (scaler + encoder + model) keeps the API clean — one pipeline.predict() call handles everything.
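Such a pipeline might be built and saved like this. A sketch with a toy dataset mirroring the InputData schema; the real training data, columns, and estimator would differ:

```python
from pathlib import Path

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data with the same columns the API will receive
df = pd.DataFrame({
    "age": [25, 40, 35, 50],
    "income": [30000, 80000, 55000, 90000],
    "education": ["bachelor", "master", "bachelor", "phd"],
    "occupation": ["tech", "finance", "tech", "finance"],
})
y = [0, 1, 0, 1]

# Numeric columns are scaled, categorical columns one-hot encoded
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["education", "occupation"]),
])

pipeline = Pipeline([
    ("preprocess", preprocessor),
    ("model", LogisticRegression()),
])
pipeline.fit(df, y)

# One artifact for the API to load: preprocessing and model together
Path("models").mkdir(exist_ok=True)
joblib.dump(pipeline, "models/pipeline.pkl")
```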

Async Endpoint for Heavy Inference

For deep learning models or long inference times, keep the endpoint async and offload the synchronous model call to a thread pool:

from fastapi import FastAPI
from pydantic import BaseModel
import asyncio

app = FastAPI()

class InferenceRequest(BaseModel):
    text: str

@app.post("/predict/async/")
async def predict_async(request: InferenceRequest):
    # Run the heavy model in a thread pool to avoid blocking the event loop
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        None, model.predict, request.text
    )
    return {"prediction": result}

Use run_in_executor to run synchronous ML inference without blocking FastAPI’s async event loop.
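The effect can be seen without FastAPI at all: two blocking calls offloaded with run_in_executor overlap instead of running back-to-back. A self-contained sketch with a simulated model:

```python
import asyncio
import time

def slow_predict(text: str) -> str:
    time.sleep(0.2)  # simulate blocking inference
    return f"label-for:{text}"

async def main():
    loop = asyncio.get_running_loop()
    # Both calls run in the default thread pool concurrently
    return await asyncio.gather(
        loop.run_in_executor(None, slow_predict, "a"),
        loop.run_in_executor(None, slow_predict, "b"),
    )

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")  # roughly 0.2s, not 0.4s: the calls overlapped
```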

Deployment

Deployment Options Overview

Platform             Best For              Complexity
Streamlit            Quick internal demos  Low
Hugging Face Spaces  Public ML demos       Low
Docker + Cloud       Production APIs       Medium
React / TypeScript   Custom production UI  High

Option 1 — Streamlit Frontend

Streamlit calls your FastAPI backend via requests:

# streamlit_app.py
import streamlit as st
import requests

st.title("ML Prediction App")

age = st.slider("Age", 18, 80, 30)
income = st.number_input("Annual Income", value=50000.0)

if st.button("Predict"):
    response = requests.post(
        "http://localhost:8000/predict/",
        json={"age": age, "income": income}
    )
    if response.status_code == 200:
        result = response.json()
        st.success(f"Prediction: {result['prediction']}")
        st.metric("Confidence", f"{result['confidence']:.2%}")
    else:
        st.error("API error — check your FastAPI server")

Run both together:

fastapi dev app/main.py &
streamlit run streamlit_app.py

Option 2 — Hugging Face Spaces

Deploy FastAPI on HF Spaces with a Dockerfile:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# HuggingFace Spaces requires port 7860
EXPOSE 7860

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]

# requirements.txt
fastapi[standard]
scikit-learn
joblib
pandas
numpy

Push to a HF Space repo — Spaces automatically builds and deploys your container. Set Space SDK to Docker.

Option 3 — React / TypeScript Frontend

Call FastAPI from a React component:

// src/api/predict.ts
const API_URL = import.meta.env.VITE_API_URL || "http://localhost:8000"

export interface PredictRequest {
  age: number
  income: number
  education: string
}

export interface PredictResponse {
  prediction: number
  confidence: number
  model_version: string
}

export async function predict(data: PredictRequest): Promise<PredictResponse> {
  const response = await fetch(`${API_URL}/predict/`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  })
  if (!response.ok) {
    throw new Error(`API error: ${response.statusText}`)
  }
  return response.json()
}

React Component — Calling the API

// src/components/PredictForm.tsx
import { useState } from "react"
import { predict, PredictResponse } from "../api/predict"

export default function PredictForm() {
  const [result, setResult] = useState<PredictResponse | null>(null)
  const [loading, setLoading] = useState(false)

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault()
    setLoading(true)
    try {
      const data = await predict({ age: 30, income: 60000, education: "bachelor" })
      setResult(data)
    } catch (err) {
      console.error(err)
    } finally {
      setLoading(false)
    }
  }

  return (
    <form onSubmit={handleSubmit}>
      <button type="submit" disabled={loading}>
        {loading ? "Predicting..." : "Predict"}
      </button>
      {result && <p>Prediction: {result.prediction} ({(result.confidence * 100).toFixed(1)}%)</p>}
    </form>
  )
}

Enable CORS in FastAPI so the browser allows requests from your React dev server.

Enabling CORS in FastAPI

Required when your frontend (React, TypeScript) and API run on different origins:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:5173",   # Vite dev server
        "http://localhost:3000",   # Create React App
        "https://your-app.vercel.app",  # production
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)

In production, never use allow_origins=["*"] — always specify exact origins.

Best Practices

API Best Practices for ML

Structure:

  • Version your API: /api/v1/predict/
  • Always define response_model on endpoints
  • Use status_code explicitly
  • Add a /health/ endpoint for monitoring
  • Load models at startup, not per-request

Validation:

  • Use Pydantic Field() with ge, le, gt constraints
  • Return clear error messages with HTTPException
  • Validate output shapes before returning

Performance:

  • Use async def for I/O-bound tasks
  • Use run_in_executor for CPU-bound inference
  • Cache model artifacts in memory
  • Add request logging with middleware

Security:

  • Never expose raw stack traces in production
  • Use environment variables for secrets (.env)
  • Add API key authentication for public endpoints

API Versioning & Routers

Organise endpoints with APIRouter for scalability:

# app/routers/predict.py
from fastapi import APIRouter
router = APIRouter(prefix="/api/v1", tags=["predictions"])

@router.post("/predict/", response_model=PredictResponse)
def predict(request: PredictRequest):
    ...

@router.get("/health/")
def health():
    return {"status": "ok"}

# app/main.py
from fastapi import FastAPI
from app.routers import predict

app = FastAPI(
    title="ML API",
    description="Production ML Prediction API",
    version="1.0.0"
)

app.include_router(predict.router)

The auto-generated docs at /docs will group endpoints by tags automatically.

Questions?

Instinct Institute

Mork Mongkul