Vespa Learn
This is a self-paced course that teaches you how to build search, recommendation, and RAG applications with Vespa. You will go from zero to a working e-commerce search engine with hybrid retrieval and machine learning ranking, building it up one piece at a time across six modules.
This is not a replacement for the Vespa documentation. Think of it as the guided path that teaches you how things work and why, with links to the docs when you want the full reference.
What you will build
Every module has a hands-on lab. The labs are not isolated exercises. They build a single e-commerce search application that grows with you as you learn. You start with a minimal schema and 10 products. By the end of the course, you have a production-shaped search pipeline with hybrid retrieval, faceted navigation, and a trained ML reranker.
Here is what the application looks like at each stage:
| Lab | What you add | End state |
|---|---|---|
| 1 | Vespa setup, minimal schema, first queries | Running app with 10 products |
| 2 | More fields, BM25, partial updates, filtering, sorting | Rich schema with 20 products |
| 3 | Application package structure, query profiles | Production-shaped package |
| 4 | Multi-signal ranking, match-features, grouping | Ranking + faceted navigation |
| 5 | Embedding model, vector search, hybrid with RRF | Hybrid search (BM25 + vectors) |
| 6 | LightGBM reranker trained on Vespa features | Full ML-powered pipeline |
Course structure
The course is organized into six modules. Each one builds on the one before it.
Module 1: What is Vespa
Get oriented. You will learn what Vespa is, how its architecture works, when it is the right choice, and how to set it up with Docker or Vespa Cloud. The lab gets your first Vespa application running with a product schema, sample data, and basic queries.
Module 2: Vespa basics
Learn the core building blocks. Schemas define your data model with fields, indexing modes, and match settings. The feeding API gets data into Vespa. YQL is the query language you use to retrieve it. You will understand how index, attribute, and summary work together, and how document summaries control what comes back in results. The lab extends your schema with text search, filtering, sorting, and partial updates.
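As a rough illustration of how a query reaches Vespa, the sketch below builds a JSON body for the query API with a YQL statement, the user's text, and a hit limit. The schema name `product`, the `price` field, and the `http://localhost:8080` endpoint are assumptions made for this example; the labs define the actual schema and endpoints.

```python
import json
import urllib.request

# Assumed local endpoint; Vespa Cloud uses a different URL plus credentials.
SEARCH_URL = "http://localhost:8080/search/"


def build_query(text, max_price=None, hits=10):
    """Build a query API body for a hypothetical `product` schema."""
    yql = "select * from product where userQuery()"
    if max_price is not None:
        # Range filters like this typically run against an attribute field.
        yql += f" and price < {float(max_price)}"
    return {"yql": yql, "query": text, "hits": hits}


def run_query(body, url=SEARCH_URL):
    """POST the body to a running Vespa instance and return the parsed JSON."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


body = build_query("running shoes", max_price=50)
print(json.dumps(body, indent=2))
```

Calling `run_query(body)` only works once a Vespa instance is up and a matching schema is deployed, which is what the Module 1 and Module 2 labs set up.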
Module 3: Application packages and deployment
Understand how Vespa applications are structured and deployed. An application package bundles schemas, configuration, models, and query profiles into a single deployable unit. You will learn how services.xml defines your cluster topology and how configuration changes apply live without downtime. The lab structures your app as a proper package with query profiles for different clients.
Module 4: Ranking fundamentals
This is where search gets interesting. Rank profiles control how documents are scored. You will learn how to combine BM25 text relevance with business signals like ratings, freshness, and in-stock status. You will use match-features to see exactly why one result ranks above another. You will also learn grouping for faceted navigation: category counts, price range buckets, and aggregations. The lab adds multi-signal ranking and grouping queries to your application.
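The idea of multi-signal ranking can be sketched outside Vespa in a few lines of Python. In Vespa itself this logic would live in a rank profile as a ranking expression, with match-features returning the inputs. The weights, the normalizations, and the function name `score_product` below are all made up for illustration and are not taken from the course labs.

```python
def score_product(bm25, rating, days_old, in_stock, weights=None):
    """Combine text relevance with business signals (illustrative weights)."""
    w = weights or {"bm25": 1.0, "rating": 0.5, "freshness": 0.3, "in_stock": 1.0}
    features = {
        "bm25": bm25,
        "rating": rating / 5.0,                       # map 0..5 stars to 0..1
        "freshness": 1.0 / (1.0 + days_old / 30.0),   # decays as the product ages
        "in_stock": 1.0 if in_stock else 0.0,
    }
    total = sum(w[name] * value for name, value in features.items())
    # Returning the features alongside the score mimics what match-features
    # gives you: the inputs that explain why a result ranked where it did.
    return total, features


a_score, a_features = score_product(bm25=4.0, rating=4.0, days_old=0, in_stock=True)
b_score, b_features = score_product(bm25=5.0, rating=2.0, days_old=90, in_stock=False)
print(a_score, a_features)
print(b_score, b_features)
```

Here product B has the stronger text match, but product A ranks higher once rating, freshness, and stock status are added in. Comparing the two feature dictionaries shows which signals made the difference.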
Module 5: Vector search
Move beyond keywords. This module covers tensors, embedding models, approximate nearest neighbor search with HNSW indexes, and hybrid search that combines BM25 and vector retrieval. You will learn how reciprocal rank fusion merges signals in the global phase. The lab adds a built-in E5 embedding model to your application and implements hybrid search with RRF.
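Reciprocal rank fusion is simple enough to sketch in plain Python. Each document earns 1 / (k + rank) from every ranked list it appears in, and the contributions are summed. The sketch below assumes k = 60, a commonly used value, and uses made-up product IDs. It illustrates the formula and is not Vespa's implementation.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; rank positions start at 1."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


bm25_ranking = ["p3", "p1", "p7"]      # hypothetical keyword results
vector_ranking = ["p1", "p9", "p3"]    # hypothetical nearest-neighbor results
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print([doc_id for doc_id, _ in fused])
```

Because only rank positions are used, BM25 scores and vector distances never need to be put on a common scale. That is the main reason RRF is a convenient default for hybrid search.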
Module 6: Re-ranking and learning to rank
Take ranking to the next level with machine learning. You will learn Vespa's multi-phase ranking pipeline, how to train and deploy GBDT models (LightGBM, XGBoost), how to run neural re-rankers (cross-encoders, ColBERT) via ONNX, and how to collect training data from production queries. The module also covers performance tuning: threading, attribute storage modes, and graceful degradation. The lab trains a LightGBM model on features logged from your application and deploys it as a second-phase re-ranker.
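The core idea behind multi-phase ranking can be sketched as follows: score every matched document with a cheap function, then spend the expensive model only on the top few. The `model_score` values below stand in for predictions from a trained model such as LightGBM, and the function and field names are hypothetical. Vespa's actual pipeline runs per content node and has more moving parts than this.

```python
def two_phase_rank(docs, first_phase, second_phase, rerank_count=2):
    """Score every doc cheaply, then re-score only the top `rerank_count`."""
    ordered = sorted(docs, key=first_phase, reverse=True)
    head, tail = ordered[:rerank_count], ordered[rerank_count:]
    # Only the head pays for the expensive model; the tail keeps first-phase order.
    head = sorted(head, key=second_phase, reverse=True)
    return head + tail


docs = [
    {"id": "p1", "bm25": 3.0, "model_score": 0.90},
    {"id": "p2", "bm25": 5.0, "model_score": 0.20},
    {"id": "p3", "bm25": 1.0, "model_score": 0.99},
]
ranked = two_phase_rank(
    docs,
    first_phase=lambda d: d["bm25"],          # cheap text-relevance score
    second_phase=lambda d: d["model_score"],  # stand-in for a model prediction
)
print([d["id"] for d in ranked])
```

Note that `p3` has the best model score but never reaches the second phase, because its first-phase score is too low. This is the trade-off behind choosing a re-rank count: a larger window costs more per query but lets the model rescue more documents.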
Prerequisites
You should be comfortable with:
- Python for the labs (we use PyVespa and standard data science libraries)
- Basic search concepts like relevance and ranking (helpful, but not required)
You also need a running Vespa instance. You have two options:
- Docker for running Vespa on your local machine. You need Docker Desktop with at least 4 GB of RAM.
- Vespa Cloud for a fully managed setup with no local infrastructure. Sign up at console.vespa-cloud.com and use the free dev environment.
Both paths use the same Vespa CLI and the same commands for deploying, feeding, and querying. The labs show instructions for both side by side. Pick whichever fits your workflow; you can switch at any time.
No prior Vespa experience is needed. That is what this course is for.
How to use this course
Work through the modules in order. Each one assumes you have completed the ones before it, and the labs build on the same application.
Read the chapters to understand the concepts, study the configuration examples to see how things are set up, and run the labs to get hands-on experience. If you already know some of the earlier material, skim the labs to make sure your application matches what later modules expect.
If you are running Vespa locally, keep your Docker container running between labs. Your data and configuration persist across sessions. If you need to stop the container and come back later, run docker start vespa to pick up where you left off. If you are using Vespa Cloud, your application stays running in the dev environment (it auto-expires after 14 days of inactivity, but you can extend it from the console).
Getting help
- Vespa documentation for the complete reference on all features
- Vespa Community on Slack for questions
- Vespa GitHub for issues and source code
- Vespa Blog for tutorials, benchmarks, and deep dives