AI Bootcamp
An API (Application Programming Interface) is a contract between systems — it defines how they communicate.
In an ML context:
Client (React / Streamlit)
│
│ HTTP Request
▼
┌─────────────┐
│ FastAPI │
│ /predict │
└──────┬──────┘
│
▼
ML Model / Pipeline
│
▼
JSON Response → Client
| Method | Usage | Example |
|---|---|---|
| GET | Retrieve data | `/models`, `/health` |
| POST | Send data / predict | `/predict`, `/train` |
| PUT | Update a resource | `/model/{id}` |
| DELETE | Remove a resource | `/model/{id}` |
For ML APIs, `POST /predict` is the most common pattern — the client sends features, the server returns a prediction.
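Concretely, such an exchange might look like this (illustrative payloads; the field names follow the schemas used later in this lesson, and the `label` value is hypothetical):

```http
POST /predict HTTP/1.1
Content-Type: application/json

{"age": 35, "income": 52000.0, "education": "bachelor"}

HTTP/1.1 200 OK
Content-Type: application/json

{"prediction": 1, "label": "approved", "confidence": 0.87, "model_version": "1.0.0"}
```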
FastAPI is a modern, high-performance Python web framework: it validates request data automatically with Pydantic and generates interactive documentation at `/docs` (Swagger UI) and `/redoc`.

Install FastAPI with standard dependencies:
Run your app in development:
Run in production:
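The commands themselves were not captured in the notes; with the current FastAPI CLI they would look roughly like this (assuming the app object lives in `app/main.py`):

```shell
# Install FastAPI plus the standard extras (uvicorn server, CLI)
pip install "fastapi[standard]"

# Development: auto-reload on code changes, binds to 127.0.0.1
fastapi dev app/main.py

# Production: no reload, binds to 0.0.0.0
fastapi run app/main.py
```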
Visit `http://127.0.0.1:8000/docs` for the interactive Swagger UI.
ml-api/
├── app/
│ ├── __init__.py
│ ├── main.py # FastAPI app entry point
│ ├── routers/
│ │ ├── predict.py # prediction endpoints
│ │ └── health.py # health check endpoints
│ ├── schemas/
│ │ └── prediction.py # Pydantic input/output models
│ ├── models/
│ │ └── classifier.pkl # saved ML model
│ └── pipeline/
│ └── preprocess.py # feature engineering
├── requirements.txt
├── Dockerfile
└── README.md
Separating routers, schemas, and pipeline keeps the codebase clean and testable.
Define strict input/output contracts for your ML endpoints:
from pydantic import BaseModel, Field
from typing import Literal

# Input schema
class PredictRequest(BaseModel):
    age: int = Field(..., ge=0, le=120, description="Age of the person")
    income: float = Field(..., gt=0, description="Annual income in USD")
    education: Literal["high_school", "bachelor", "master", "phd"]

# Output schema
class PredictResponse(BaseModel):
    prediction: int
    label: str
    confidence: float
    model_version: str

Pydantic rejects invalid inputs before they reach your model — no manual validation needed.
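To see that rejection in action, the schema can be exercised directly (redeclared here so the snippet stands alone):

```python
from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class PredictRequest(BaseModel):
    age: int = Field(..., ge=0, le=120)
    income: float = Field(..., gt=0)
    education: Literal["high_school", "bachelor", "master", "phd"]

# Valid input parses cleanly
ok = PredictRequest(age=30, income=55000.0, education="master")

# Invalid input raises ValidationError before any model code runs
try:
    PredictRequest(age=-5, income=55000.0, education="master")
except ValidationError as e:
    print(f"rejected with {len(e.errors())} error(s)")
```

In a FastAPI endpoint this same failure is converted automatically into a 422 response.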
Path parameter — part of the URL:
Query parameter — optional filter:
from fastapi import FastAPI, HTTPException, status

app = FastAPI()

@app.post("/predict/", status_code=status.HTTP_200_OK)
def predict(request: PredictRequest):
    if model is None:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail="Model is not loaded"
        )
    try:
        result = model.predict([[request.age, request.income]])
        return PredictResponse(prediction=int(result[0]))
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=str(e)
        )

| Code | Meaning |
|---|---|
| 200 | OK — prediction returned |
| 422 | Unprocessable — invalid input schema |
| 503 | Service unavailable — model not loaded |
| 500 | Internal server error |
Use lifespan to load your model once when the server starts:
from fastapi import FastAPI
from contextlib import asynccontextmanager
import joblib

ml_models = {}

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load model on startup
    ml_models["classifier"] = joblib.load("models/classifier.pkl")
    print("Model loaded successfully")
    yield
    # Clean up on shutdown
    ml_models.clear()

app = FastAPI(lifespan=lifespan)

Never load the model inside the endpoint function — that reloads it on every request.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import joblib
import numpy as np

app = FastAPI(title="ML Prediction API", version="1.0.0")

class PredictRequest(BaseModel):
    features: list[float]

class PredictResponse(BaseModel):
    prediction: int
    confidence: float
    model_version: str = "1.0.0"

model = joblib.load("models/classifier.pkl")

@app.post("/predict/", response_model=PredictResponse)
def predict(request: PredictRequest):
    X = np.array(request.features).reshape(1, -1)
    prediction = int(model.predict(X)[0])
    confidence = float(model.predict_proba(X).max())
    return PredictResponse(
        prediction=prediction,
        confidence=round(confidence, 4)
    )

@app.get("/health/")
def health():
    return {"status": "ok", "model_loaded": model is not None}

Raw Input (JSON)
│
▼
┌─────────────────┐
│ Pydantic Schema │ ← validates & parses input
└────────┬────────┘
│
▼
┌─────────────────┐
│ Preprocessing │ ← scaling, encoding, feature engineering
└────────┬────────┘
│
▼
┌─────────────────┐
│ ML Model │ ← sklearn, PyTorch, HuggingFace
└────────┬────────┘
│
▼
┌─────────────────┐
│ Postprocessing │ ← decode labels, format output
└────────┬────────┘
│
▼
JSON Response → Client
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.pipeline import Pipeline
import joblib
import pandas as pd

app = FastAPI()

# Load the full sklearn pipeline (preprocessor + model)
pipeline: Pipeline = joblib.load("models/pipeline.pkl")

class InputData(BaseModel):
    age: float
    income: float
    education: str
    occupation: str

class OutputData(BaseModel):
    prediction: str
    confidence: float

@app.post("/predict/", response_model=OutputData)
def predict(data: InputData):
    df = pd.DataFrame([data.model_dump()])
    pred = pipeline.predict(df)[0]
    proba = pipeline.predict_proba(df).max()
    return OutputData(
        prediction=str(pred),
        confidence=round(float(proba), 4)
    )

Saving the full sklearn pipeline (scaler + encoder + model) keeps the API clean — one `pipeline.predict()` call handles everything.
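The notes don't show how `models/pipeline.pkl` is produced. One way to build and save such a pipeline, sketched on toy data with the same four input features:

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Preprocess numeric and categorical columns differently
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["education", "occupation"]),
])

pipeline = Pipeline([
    ("prep", preprocessor),
    ("model", RandomForestClassifier(random_state=42)),
])

# Toy training data (replace with your real dataset)
X = pd.DataFrame({
    "age": [25.0, 40.0, 35.0, 50.0],
    "income": [30000.0, 80000.0, 55000.0, 90000.0],
    "education": ["bachelor", "master", "bachelor", "phd"],
    "occupation": ["engineer", "manager", "engineer", "manager"],
})
y = ["no", "yes", "no", "yes"]

pipeline.fit(X, y)
joblib.dump(pipeline, "pipeline.pkl")  # models/pipeline.pkl in the project layout
```

Because preprocessing is fitted inside the pipeline, the same scaling and encoding are applied identically at training time and at serving time.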
For deep learning models or long inference times, keep the endpoint `async` and offload the blocking call to a thread pool:

from fastapi import FastAPI
from pydantic import BaseModel
import asyncio

app = FastAPI()

class InferenceRequest(BaseModel):
    text: str

@app.post("/predict/async/")
async def predict_async(request: InferenceRequest):
    # `model` is assumed to be loaded at startup (see the lifespan pattern above).
    # Run the heavy synchronous call in a thread pool to avoid blocking.
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(
        None, model.predict, request.text
    )
    return {"prediction": result}

Use `run_in_executor` to run synchronous ML inference without blocking FastAPI's async event loop.
| Platform | Best For | Complexity |
|---|---|---|
| Streamlit | Quick internal demos | Low |
| Hugging Face Spaces | Public ML demos | Low |
| Docker + Cloud | Production APIs | Medium |
| React / TypeScript | Custom production UI | High |
Streamlit calls your FastAPI backend via requests:
# streamlit_app.py
import streamlit as st
import requests

st.title("ML Prediction App")

age = st.slider("Age", 18, 80, 30)
income = st.number_input("Annual Income", value=50000.0)

if st.button("Predict"):
    response = requests.post(
        "http://localhost:8000/predict/",
        json={"age": age, "income": income}
    )
    if response.status_code == 200:
        result = response.json()
        st.success(f"Prediction: {result['prediction']}")
        st.metric("Confidence", f"{result['confidence']:.2%}")
    else:
        st.error("API error — check your FastAPI server")

Deploy FastAPI on HF Spaces with a Dockerfile:
# requirements.txt
fastapi[standard]
scikit-learn
joblib
pandas
numpy
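The Dockerfile itself is not shown in the notes; a minimal sketch for a Docker Space (HF Spaces serves on port 7860; the `app.main:app` import path is assumed from the project layout above):

```dockerfile
FROM python:3.11-slim

WORKDIR /code

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Hugging Face Spaces routes traffic to port 7860
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "7860"]
```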
Push to an HF Space repo — Spaces automatically builds and deploys your container. Set the Space SDK to Docker.
Call FastAPI from a React component:
// src/api/predict.ts
const API_URL = import.meta.env.VITE_API_URL || "http://localhost:8000"

export interface PredictRequest {
  age: number
  income: number
  education: string
}

export interface PredictResponse {
  prediction: number
  confidence: number
  model_version: string
}

export async function predict(data: PredictRequest): Promise<PredictResponse> {
  const response = await fetch(`${API_URL}/predict/`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(data),
  })
  if (!response.ok) {
    throw new Error(`API error: ${response.statusText}`)
  }
  return response.json()
}

// src/components/PredictForm.tsx
import { useState } from "react"
import { predict, PredictResponse } from "../api/predict"

export default function PredictForm() {
  const [result, setResult] = useState<PredictResponse | null>(null)
  const [loading, setLoading] = useState(false)

  const handleSubmit = async (e: React.FormEvent) => {
    e.preventDefault()
    setLoading(true)
    try {
      const data = await predict({ age: 30, income: 60000, education: "bachelor" })
      setResult(data)
    } catch (err) {
      console.error(err)
    } finally {
      setLoading(false)
    }
  }

  return (
    <form onSubmit={handleSubmit}>
      <button type="submit" disabled={loading}>
        {loading ? "Predicting..." : "Predict"}
      </button>
      {result && <p>Prediction: {result.prediction} ({(result.confidence * 100).toFixed(1)}%)</p>}
    </form>
  )
}

Enable CORS in FastAPI so the browser allows requests from your React dev server.
Required when your frontend (React, TypeScript) and API run on different origins:
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=[
        "http://localhost:5173",        # Vite dev server
        "http://localhost:3000",        # Create React App
        "https://your-app.vercel.app",  # production
    ],
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE"],
    allow_headers=["*"],
)

In production, never use `allow_origins=["*"]` — always specify exact origins.
Structure:
- Version your API: `/api/v1/predict/`
- Declare `response_model` on endpoints
- Set `status_code` explicitly
- Add a `/health/` endpoint for monitoring

Validation:
- Constrain inputs with `Field()` (`ge`, `le`, `gt`)
- Return errors with `HTTPException`

Performance:
- Use `async def` for I/O-bound tasks
- Use `run_in_executor` for CPU-bound inference

Security:
- Keep secrets in environment variables (`.env`)

Organise endpoints with `APIRouter` for scalability:
The auto-generated docs at `/docs` will group endpoints by `tags` automatically.
Instinct Institute
Mork Mongkul
API Integration with FastAPI | AI Bootcamp