Prompt Experiments
Prompt Experiments lets you test a prompt version from Prompt Management against a Dataset of inputs and expected outputs. This way, you can verify that a change yields the expected outputs and does not cause regressions, and you can analyze the results of different prompt experiments side-by-side.
Optionally, you can use LLM-as-a-Judge Evaluators to automatically score the responses against the expected outputs and analyze the results on an aggregate level.
This is a no-code feature within Langfuse. You can run more complex experiments via the Langfuse SDKs/API. Follow this guide to get started.
Key benefits
- Feedback loop: Quickly iterate on prompts by running experiments and directly comparing evaluation results side-by-side.
- Regression prevention: When changing a prompt, run an experiment to confirm that the new version does not degrade outputs on your test cases.
Overview
Requirements
For prompt experiments to work correctly, you must ensure:
- Your prompt contains at least one variable using the {{variableName}} syntax
- Variable names in your prompt must exactly match the keys in your dataset items' input
- Dataset items must have their input formatted as valid JSON
Variable Mapping Example
The following example demonstrates how prompt variables are mapped to dataset item inputs:
Prompt:
You are a Langfuse expert. Answer based on:
{{documentation}}
Question: {{question}}
Dataset Item:
{
  "documentation": "Langfuse is an LLM Engineering Platform",
  "question": "What is Langfuse?"
}
In this example:
- The prompt variable {{documentation}} maps to the JSON key "documentation"
- The prompt variable {{question}} maps to the JSON key "question"
- Both keys must exist in the dataset item's input JSON for the experiment to run successfully
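To make the mapping concrete, the following sketch shows how a dataset item's input JSON fills the prompt variables. The substitute_variables helper is a hypothetical illustration of the {{variableName}} substitution that Langfuse performs internally; it is not part of the Langfuse SDK.

import re

def substitute_variables(prompt_template: str, item_input: dict) -> str:
    # Replace each {{variableName}} with the matching key from the dataset item input
    def replace(match: re.Match) -> str:
        key = match.group(1)
        if key not in item_input:
            raise KeyError(f"Dataset item input is missing key: {key}")
        return str(item_input[key])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", replace, prompt_template)

prompt_template = "You are a Langfuse expert. Answer based on:\n{{documentation}}\nQuestion: {{question}}"
item_input = {
    "documentation": "Langfuse is an LLM Engineering Platform",
    "question": "What is Langfuse?",
}
print(substitute_variables(prompt_template, item_input))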
Setup
If you already have a dataset and a prompt, you can skip the following steps.
In Prompt Experiments, the items of a dataset are mapped to the variables of the prompt. In the following example, the variables (documentation and question) are mapped to the input of the dataset item, which is a JSON object. The expected output contains a reference answer for the given dataset item.
Configure LLM connection
Prompt Experiments executes LLM calls within Langfuse, so you need to configure an LLM connection in the project settings.
Supported LLM providers
- OpenAI or OpenAI-compatible providers (e.g. LiteLLM, Google Vertex AI)
- Anthropic
- Azure OpenAI
- AWS Bedrock
Create a dataset
Create a dataset with the inputs and expected outputs that you want to test your prompt on.
langfuse.create_dataset(
    name="<dataset_name>",
    # optional description
    description="My first dataset",
    # optional metadata
    metadata={
        "author": "Alice",
        "date": "2022-01-01",
        "type": "benchmark"
    }
)
See low-level SDK docs for details on how to initialize the Python client.
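As a minimal sketch, assuming the langfuse Python package is installed and the standard API key environment variables are set, initializing the client looks like this:

from langfuse import Langfuse

# Reads LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY and LANGFUSE_HOST from the environment
langfuse = Langfuse()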
Create dataset items with test cases
Dataset items include the input variables that should be inserted into the prompt.
The input must be a JSON object where each key exactly matches a variable name in your prompt. For example, if your prompt contains {{question}}, your dataset item's input JSON must have a "question" key.
Example Dataset Item with variables
Input:
{
  "question": "What is Langfuse?",
  "documentation": "Langfuse - the LLM Engineering Platform"
}
Expected output:
Langfuse is the LLM Engineering Platform.
langfuse.create_dataset_item(
    dataset_name="<dataset_name>",
    # input must be a JSON object whose keys match the prompt variables
    input={
        "question": "What is Langfuse?",
        "documentation": "Langfuse - the LLM Engineering Platform"
    },
    # optional reference answer used for evaluation
    expected_output="Langfuse is the LLM Engineering Platform.",
    # metadata, optional
    metadata={
        "model": "llama3",
    }
)
See low-level SDK docs for details on how to initialize the Python client.
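If you keep your test cases in a list, you can loop over them to create the dataset items. A sketch, assuming an initialized langfuse client and an existing dataset named "<dataset_name>":

test_cases = [
    {
        "input": {
            "question": "What is Langfuse?",
            "documentation": "Langfuse - the LLM Engineering Platform",
        },
        "expected_output": "Langfuse is the LLM Engineering Platform.",
    },
    # add further test cases here
]

for case in test_cases:
    langfuse.create_dataset_item(
        dataset_name="<dataset_name>",
        input=case["input"],
        expected_output=case["expected_output"],
    )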
Create a prompt with variables
Use {{variables}} to insert the dataset variables into the prompt during experiments.
Each {{variableName}} in your prompt must have a corresponding key in your dataset items' input JSON. The names must match exactly (case-sensitive).
Example Prompt
You are a Langfuse expert. Please answer questions based on the following documentation:
DOCUMENTATION
{{documentation}}
{{question}}
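You can create this prompt in the Langfuse UI or via the SDK. A sketch, assuming an initialized langfuse client; the prompt name "qa-langfuse-docs" is only an example:

langfuse.create_prompt(
    name="qa-langfuse-docs",
    prompt=(
        "You are a Langfuse expert. Please answer questions based on the following documentation:\n"
        "DOCUMENTATION\n"
        "{{documentation}}\n"
        "{{question}}"
    ),
    labels=["production"],
)

# Optionally verify that the variables resolve as expected
prompt = langfuse.get_prompt("qa-langfuse-docs")
print(prompt.compile(
    documentation="Langfuse is an LLM Engineering Platform",
    question="What is Langfuse?",
))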
Run a prompt experiment
Now that we have set up a prompt version and a dataset, we can run a prompt experiment in Langfuse for each prompt version that we want to test.
When viewing the prompt details or a dataset, you can start a prompt experiment directly from the UI.
Select the prompt version, dataset, and model configuration that you want to test. Before running the experiment, you will see whether the prompt variables match the dataset variables.
Troubleshooting: If you see a warning about mismatched variables, ensure that:
- Every {{variable}} in your prompt has a matching key in your dataset items' input JSON
- The names match exactly (including case sensitivity)
- Your dataset input is valid JSON
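If you want to check the mapping yourself before running an experiment, a small helper like the following can compare the prompt's variables against a dataset item's input keys. Both functions are hypothetical illustrations, not part of the Langfuse SDK:

import re

def extract_prompt_variables(prompt_template: str) -> set:
    # Collect all {{variableName}} placeholders from the prompt template
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", prompt_template))

def check_item_against_prompt(prompt_template: str, item_input: dict) -> None:
    # Report prompt variables that have no matching key in the dataset item input
    missing = extract_prompt_variables(prompt_template) - set(item_input.keys())
    if missing:
        print(f"Missing keys in dataset item input: {sorted(missing)}")
    else:
        print("All prompt variables are covered by the dataset item input.")

check_item_against_prompt(
    "You are a Langfuse expert. Answer based on:\n{{documentation}}\nQuestion: {{question}}",
    {"question": "What is Langfuse?"},  # missing "documentation"
)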