Introduction
Sentiment analysis is one of the most practical applications of natural language processing (NLP). Businesses use it to monitor brand reputation, analyze customer feedback, triage support tickets, and gauge public opinion on social media. Despite its power, building a sentiment analysis service does not require a machine learning PhD or months of model training. With the right tools, you can stand up a production-grade REST API in a single afternoon.
In this tutorial you will build a fully functional sentiment analysis API using three technologies: FastAPI for the web framework, Hugging Face Transformers for the pre-trained NLP model, and Python to glue everything together. The final API will accept a text string, classify it as positive or negative, and return a confidence score, all in under 50 lines of code.
By the end of this guide you will understand how to load a transformer model at startup, expose it through a typed REST endpoint, handle errors gracefully, and prepare the service for deployment. Every code snippet is copy-paste ready, so you can follow along on your own machine.
Prerequisites
Before you begin, make sure you have the following ready:
- Python 3.9 or later installed on your system.
- Basic familiarity with Python functions, type hints, and pip.
- A general understanding of REST APIs (HTTP methods, JSON request/response).
- A terminal or command prompt and a code editor of your choice.
- Roughly 500 MB of free disk space for the model weights (downloaded automatically on first run).
Step 1: Set Up the Project
Start by creating a dedicated project directory and a Python virtual environment. Virtual environments isolate your project dependencies from system-wide packages, which prevents version conflicts and keeps your setup reproducible.
mkdir sentiment-api && cd sentiment-api
python -m venv .venv
source .venv/bin/activate    # On Windows: .venv\Scripts\activate
Next, install the three core dependencies. fastapi is the web framework, uvicorn is the ASGI server that will run it, and transformers provides access to thousands of pre-trained NLP models. The torch package is the deep learning backend that Transformers relies on for inference.
pip install fastapi uvicorn transformers torch
Once the installation finishes, freeze your dependencies into a requirements file so that anyone else (or your deployment pipeline) can replicate the environment exactly:
pip freeze > requirements.txt
Step 2: Create the FastAPI App
FastAPI is a modern, high-performance Python web framework built on top of Starlette and Pydantic. It generates interactive API documentation automatically, validates request bodies through type annotations, and supports asynchronous request handling out of the box. Refer to the FastAPI documentation for a deeper dive into its capabilities.
Create a file called main.py in your project root. This single file will contain the entire application. Start with the imports, the Pydantic models for request and response validation, and a basic health-check endpoint:
from contextlib import asynccontextmanager

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from transformers import pipeline

# ---------------------------------------------------------------------------
# Pydantic models – these define and validate the API contract
# ---------------------------------------------------------------------------
class SentimentRequest(BaseModel):
    """Request body: the text to analyse."""
    text: str = Field(
        ...,
        min_length=1,
        max_length=5000,
        examples=["I absolutely love this product!"],
        description="The text string to classify.",
    )

class SentimentResponse(BaseModel):
    """Response body: label + confidence score."""
    text: str
    label: str = Field(..., description="POSITIVE or NEGATIVE")
    score: float = Field(
        ..., ge=0.0, le=1.0, description="Confidence between 0 and 1"
    )

# ---------------------------------------------------------------------------
# Application lifespan – load the model once at startup
# ---------------------------------------------------------------------------
classifier = None  # will hold the pipeline after startup

@asynccontextmanager
async def lifespan(app: FastAPI):
    """Load the sentiment model when the server starts."""
    global classifier
    classifier = pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )
    yield  # application runs here
    classifier = None  # cleanup on shutdown

app = FastAPI(
    title="Sentiment Analysis API",
    version="1.0.0",
    description="Classify text as POSITIVE or NEGATIVE.",
    lifespan=lifespan,
)

# ---------------------------------------------------------------------------
# Routes
# ---------------------------------------------------------------------------
@app.get("/health")
async def health_check():
    """Return service status and model availability."""
    return {
        "status": "healthy",
        "model_loaded": classifier is not None,
    }

@app.post("/analyze", response_model=SentimentResponse)
async def analyze_sentiment(payload: SentimentRequest):
    """Classify the input text and return a sentiment label with confidence."""
    if classifier is None:
        raise HTTPException(status_code=503, detail="Model not loaded yet.")
    result = classifier(payload.text)[0]  # returns [{label, score}]
    return SentimentResponse(
        text=payload.text,
        label=result["label"],
        score=round(result["score"], 4),
    )
Let us break down what is happening in this file. The lifespan context manager is FastAPI's recommended pattern for running startup and shutdown logic. Inside it, we instantiate the Hugging Face pipeline exactly once, which downloads the model weights on the first run and caches them locally. The /analyze endpoint receives a POST request, feeds the text into the classifier, and returns the label and confidence score.
Pydantic's Field constraints handle input validation automatically. If a client sends an empty string or a text longer than 5,000 characters, FastAPI returns a 422 Unprocessable Entity response with a clear error message. There is no need to write manual validation logic.
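To see this in action without sending any real traffic, you can exercise the validation layer with FastAPI's TestClient. This is a minimal sketch: it assumes the httpx package is installed (TestClient depends on it) and deliberately skips the lifespan, so the model is never loaded and only validation runs.
from fastapi.testclient import TestClient

from main import app

# Instantiating TestClient outside a `with` block skips the lifespan,
# so only the request-validation layer is exercised here.
client = TestClient(app)

resp = client.post("/analyze", json={"text": ""})  # violates min_length=1
print(resp.status_code)                 # 422
print(resp.json()["detail"][0]["msg"])  # Pydantic's explanation of the failed constraint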
Step 3: Understanding the Sentiment Model
The model powering this API is the DistilBERT SST-2 model, a distilled version of BERT fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset. DistilBERT retains about 97% of BERT's language understanding while being 60% faster and 40% smaller. That makes it an excellent choice for a production API where latency matters.
The Hugging Face Transformers library abstracts away all the complexity of tokenization, model loading, and inference. The pipeline("sentiment-analysis") call handles four things behind the scenes: it downloads the tokenizer and model weights, tokenizes the input text into sub-word tokens, runs a forward pass through the neural network, and maps the output logits to a human-readable label with a softmax confidence score.
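To make that abstraction concrete, here is a rough sketch of the same steps written out by hand with the lower-level Transformers classes. It loads the identical checkpoint explicitly and applies the softmax yourself; the example sentence is arbitrary.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)                    # tokenizer download/cache
model = AutoModelForSequenceClassification.from_pretrained(model_name)   # model weights download/cache

inputs = tokenizer("I absolutely love this product!", return_tensors="pt")  # sub-word tokenization
with torch.no_grad():
    logits = model(**inputs).logits                                      # forward pass
probs = torch.softmax(logits, dim=-1)[0]                                 # logits -> probabilities
label_id = int(probs.argmax())
print(model.config.id2label[label_id], round(float(probs[label_id]), 4)) # e.g. POSITIVE 0.9999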
Here is a quick standalone script you can run to verify the model works before integrating it into the API. This is useful for debugging model issues in isolation:
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

samples = [
    "I love this product! It exceeded all my expectations.",
    "Terrible experience. The service was slow and unhelpful.",
    "The weather is okay today, nothing special.",
]

for text in samples:
    result = classifier(text)[0]
    print(f"{result['label']} ({result['score']:.4f}): {text}")
Running this script produces output like the following. Notice how the model assigns high confidence to clearly positive or negative statements, and a lower (but still decisive) score to the more neutral sentence:
POSITIVE (0.9999): I love this product! It exceeded all my expectations.
NEGATIVE (0.9998): Terrible experience. The service was slow and unhelpful.
POSITIVE (0.9234): The weather is okay today, nothing special.
Step 4: Run and Test the API
Start the server with Uvicorn. The --reload flag enables hot-reloading during development, so the server restarts automatically whenever you save a change to main.py:
uvicorn main:app --reload --host 0.0.0.0 --port 8000
The first time you start the server, Hugging Face will download the model weights (approximately 260 MB). Subsequent starts load the model from the local cache in a few seconds. Once you see the line Uvicorn running on http://0.0.0.0:8000, the API is ready to receive requests.
Open a new terminal window and test the health check endpoint first:
curl http://localhost:8000/health
You should see a JSON response confirming the model is loaded:
{"status": "healthy", "model_loaded": true}
Now send a POST request to the /analyze endpoint with a sample text:
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"text": "FastAPI makes building APIs a breeze! Absolutely fantastic framework."}'
The API responds with the original text, a sentiment label, and a confidence score:
{
"text": "FastAPI makes building APIs a breeze! Absolutely fantastic framework.",
"label": "POSITIVE",
"score": 0.9999
}
Try a negative example to confirm the model differentiates correctly:
curl -X POST http://localhost:8000/analyze \
-H "Content-Type: application/json" \
-d '{"text": "This is the worst customer service I have ever encountered."}'
{
"text": "This is the worst customer service I have ever encountered.",
"label": "NEGATIVE",
"score": 0.9998
}
FastAPI also auto-generates interactive documentation. Open http://localhost:8000/docs in your browser to access the Swagger UI, where you can test the endpoint directly from the browser without writing any curl commands.
Model Caching Saves Time
Hugging Face Transformers caches downloaded models in ~/.cache/huggingface/ by default. On subsequent server starts, the model loads from this local cache instead of re-downloading. You can control the cache directory by setting the HF_HOME environment variable, which is useful for Docker builds or shared CI environments. For example: export HF_HOME=/opt/models/cache
Error Handling and Edge Cases
A production API must handle errors gracefully. The current implementation already covers several cases through Pydantic validation and the health check, but let us add a global exception handler and a batch endpoint for processing multiple texts at once. Update main.py by appending the following code after the existing routes:
from fastapi.responses import JSONResponse

class BatchRequest(BaseModel):
    """Request body for analysing multiple texts at once."""
    texts: list[str] = Field(
        ...,
        min_length=1,
        max_length=32,
        description="A list of 1-32 text strings.",
    )

class BatchResponse(BaseModel):
    results: list[SentimentResponse]

@app.post("/analyze/batch", response_model=BatchResponse)
async def analyze_batch(payload: BatchRequest):
    """Classify multiple texts in a single request."""
    if classifier is None:
        raise HTTPException(status_code=503, detail="Model not loaded yet.")
    raw_results = classifier(payload.texts)
    results = [
        SentimentResponse(
            text=text,
            label=res["label"],
            score=round(res["score"], 4),
        )
        for text, res in zip(payload.texts, raw_results)
    ]
    return BatchResponse(results=results)

@app.exception_handler(Exception)
async def global_exception_handler(request, exc):
    """Catch unhandled exceptions and return a clean JSON error."""
    return JSONResponse(
        status_code=500,
        content={
            "detail": "An unexpected error occurred. Please try again later."
        },
    )
The batch endpoint accepts a JSON array of up to 32 texts and returns results for all of them in a single response. Hugging Face's pipeline natively supports batch inputs, which means the model processes the batch more efficiently than sending 32 individual requests. The global exception handler ensures that even unexpected runtime errors return a clean JSON payload instead of an HTML traceback, which would confuse API consumers.
Test the batch endpoint with this curl command:
curl -X POST http://localhost:8000/analyze/batch \
-H "Content-Type: application/json" \
-d '{"texts": ["I love sunny days.", "Traffic jams are frustrating.", "The meeting was productive."]}'
Going Further: Preparing for Production
The API you have built is fully functional, but moving it to production requires a few more considerations. Below are the most impactful improvements you can make.
Authentication and rate limiting. Protect your endpoint by adding API key authentication through FastAPI's dependency injection system. You can store API keys in environment variables and validate them using a Depends() parameter. For rate limiting, consider packages like slowapi which integrate directly with FastAPI and use Redis or in-memory stores to track request counts per client.
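To illustrate the dependency-injection approach, here is a minimal sketch you could append to main.py. The X-API-Key header name, the SENTIMENT_API_KEY environment variable, and the /analyze/secure route are all hypothetical choices made for this example, not part of the API built above.
import os
from typing import Optional

from fastapi import Depends, HTTPException, Security
from fastapi.security import APIKeyHeader

# Hypothetical header name chosen for this sketch.
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=False)

async def require_api_key(api_key: Optional[str] = Security(api_key_header)) -> None:
    """Reject the request unless X-API-Key matches the SENTIMENT_API_KEY env var."""
    expected = os.environ.get("SENTIMENT_API_KEY")
    if not expected or api_key != expected:
        raise HTTPException(status_code=401, detail="Invalid or missing API key.")

# Attach the dependency to a route; a protected twin of /analyze is used as an example.
@app.post("/analyze/secure", response_model=SentimentResponse,
          dependencies=[Depends(require_api_key)])
async def analyze_secure(payload: SentimentRequest):
    return await analyze_sentiment(payload)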
Docker deployment. Containerizing the API makes deployment reproducible across environments. Here is a minimal Dockerfile that packages the entire application:
FROM python:3.11-slim
WORKDIR /app
# Install dependencies first for better layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Pre-download the model during the build so startup is instant
RUN python -c "from transformers import pipeline; pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')"
COPY main.py .
EXPOSE 8000
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]
Build and run the container with two commands:
docker build -t sentiment-api .
docker run -p 8000:8000 sentiment-api
Performance optimization. For higher throughput, consider these strategies. First, use ONNX Runtime by exporting the model to ONNX format, which can provide a 2-3x speed improvement on CPU. Second, if you have a GPU available, pass device=0 to the pipeline constructor to run inference on CUDA. Third, run multiple Uvicorn workers behind a load balancer. Since each worker loads its own copy of the model, make sure your server has enough RAM. A typical deployment uses 2-4 workers on a machine with 8 GB of RAM.
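As a quick illustration of the second strategy, moving inference to a GPU is a one-argument change to the pipeline call inside the lifespan function. This sketch assumes a CUDA-capable GPU and a CUDA-enabled build of PyTorch.
from transformers import pipeline

# Same pipeline as in lifespan(), now placed on the first CUDA GPU.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=0,  # 0 = first GPU; omit the argument (or pass device=-1) to stay on CPU
)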
Monitoring and logging. Add structured logging using Python's built-in logging module or a library like structlog. Log every request with the input length, the predicted label, the confidence score, and the latency in milliseconds. This data is invaluable for debugging model drift and identifying slow requests. Integrate with Prometheus or Datadog for real-time dashboards and alerting on error rates.
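As a sketch of what that could look like, the existing /analyze handler can be rewritten to emit one structured log line per request. The logger name and the key=value format below are arbitrary choices for illustration.
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sentiment_api")  # arbitrary logger name

@app.post("/analyze", response_model=SentimentResponse)
async def analyze_sentiment(payload: SentimentRequest):
    """Same endpoint as before, now instrumented with a per-request log line."""
    if classifier is None:
        raise HTTPException(status_code=503, detail="Model not loaded yet.")
    start = time.perf_counter()
    result = classifier(payload.text)[0]
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info(
        "analyze length=%d label=%s score=%.4f latency_ms=%.1f",
        len(payload.text), result["label"], result["score"], latency_ms,
    )
    return SentimentResponse(
        text=payload.text,
        label=result["label"],
        score=round(result["score"], 4),
    )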
Model Limitations
The DistilBERT SST-2 model is trained on English movie reviews. It performs well on general English text but may be less accurate on domain-specific jargon, sarcasm, or non-English languages. For production use cases with specialized text, consider fine-tuning the model on your own labeled dataset using Hugging Face's Trainer API.
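For orientation only, here is a rough sketch of what such a fine-tuning run could look like with the Trainer API. It assumes the datasets package is installed and that hypothetical train.csv and test.csv files exist with a text column and an integer label column (0 = negative, 1 = positive); the hyperparameters are placeholders, and a real project would add evaluation metrics and careful data splits.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Hypothetical CSV files with "text" and integer "label" columns.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="finetuned-sentiment",      # placeholder output directory
    num_train_epochs=3,
    per_device_train_batch_size=16,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
)
trainer.train()
trainer.save_model("finetuned-sentiment")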
Related Reading
Continue learning with these related articles:
- Deploy this model to production with Docker
- Fine-tune your own custom model with LoRA
- Top open-source AI tools for developers
Key Takeaways
Here is what you accomplished and the main lessons from this tutorial:
- You built a complete sentiment analysis REST API in under 50 lines of core code, demonstrating that modern Python tooling makes NLP accessible without deep ML expertise.
- FastAPI's type-annotated design provides automatic request validation, interactive documentation, and high performance with minimal boilerplate.
- Hugging Face's pipeline abstraction lets you swap models by changing a single string parameter. You can upgrade from DistilBERT to a larger, more accurate model without rewriting any application code.
- Loading the model at server startup (via the lifespan pattern) avoids the overhead of loading it on every request, which would add several seconds of latency.
- Batch processing improves throughput significantly when clients need to classify multiple texts, since the model can process them in a single forward pass.
- For production, always containerize with Docker, add authentication, implement rate limiting, and set up monitoring. These practices apply to any ML-powered API, not just sentiment analysis.
Sentiment analysis is just the beginning. The same architecture you built here, a FastAPI wrapper around a Hugging Face pipeline, works for text summarization, named entity recognition, question answering, translation, and dozens of other NLP tasks. Swap the pipeline task and model name, adjust the Pydantic schemas, and you have a new API ready to deploy.
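As a closing illustration, swapping in a different task is a one-line change to the pipeline call; the named-entity-recognition checkpoint below is just one publicly available example, and the Pydantic response schema would need to change to match its output.
from transformers import pipeline

# Same pattern, different task: named entity recognition instead of sentiment analysis.
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))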