Upsert Prompt
Create a Prompt or update it with a new version if it already exists.
Prompts are identified by their ID or their path. The parameters (i.e. the prompt template, temperature, model, etc.) determine the versions of the Prompt.
You can provide version_name and version_description to identify and describe your versions.
Version names must be unique within a Prompt - attempting to create a version with a name that already exists will result in a 409 Conflict error.
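As a sketch of what an upsert call can look like over HTTP in Python (the endpoint URL, the X-API-KEY header, and all example values below are illustrative assumptions, not guaranteed details; check the authentication docs for specifics):

```python
import requests

# Sketch of an upsert request. The endpoint and X-API-KEY header are
# assumptions for illustration; substitute your real API key and path.
resp = requests.post(
    "https://api.humanloop.com/v5/prompts",
    headers={"X-API-KEY": "YOUR_API_KEY"},
    json={
        # Locates (or creates) the Prompt in the Humanloop filesystem.
        "path": "examples/qa-bot",
        "model": "gpt-4",
        # ChatTemplate: a list of messages; {{question}} is an input variable.
        "template": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Answer the question: {{question}}"},
        ],
        "temperature": 0.7,
        # Optional: name and describe this version. Re-sending a
        # version_name that already exists yields a 409 Conflict.
        "version_name": "qa-v1",
        "version_description": "Initial QA prompt",
    },
)
resp.raise_for_status()
prompt = resp.json()
```

If no Prompt exists at the given path it is created; if one exists and the parameters differ from every stored version, a new version is added.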
Headers
Request
The model instance used, e.g. gpt-4. See supported models.
Path of the Prompt, including the name. This locates the Prompt in the Humanloop filesystem and is used as a unique identifier. For example: folder/name or just name.
The template contains the main structure and instructions for the model, including input variables for dynamic values.
For chat models, provide the template as a ChatTemplate (a list of messages), e.g. a system message, followed by a user message with an input variable. For completion models, provide a prompt template as a string.
Input variables should be specified with double curly bracket syntax: {{input_name}}.
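To make the two template shapes concrete, here is an illustrative sketch; {{question}} and {{topic}} are hypothetical input variables:

```python
# Chat models: a ChatTemplate, i.e. a list of messages.
chat_template = [
    {"role": "system", "content": "You are a concise research assistant."},
    {"role": "user", "content": "Answer the question: {{question}}"},
]

# Completion models: a single prompt template string.
completion_template = "Write a short summary about {{topic}}."
```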
The maximum number of tokens to generate. Provide max_tokens=-1 to dynamically calculate this limit from the length of the prompt.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
The string (or list of strings) after which the model will stop generating. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the generation so far.
Number between -2.0 and 2.0. Positive values penalize new tokens based on how frequently they appear in the generation so far.
The format of the response. Only {"type": "json_object"} is currently supported for chat.
Guidance on how many reasoning tokens the model should generate before creating a response to the prompt. OpenAI reasoning models (o1, o3-mini) expect an OpenAIReasoningEffort enum. Anthropic reasoning models expect an integer, which signifies the maximum reasoning token budget.
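As an illustrative sketch of the two conventions (the model identifiers and the serialized enum values are assumptions; OpenAI documents its effort levels as low, medium, and high):

```python
# OpenAI reasoning models: an effort-level enum, assumed here to
# serialize as OpenAI's "low" / "medium" / "high" strings.
openai_reasoning = {"model": "o3-mini", "reasoning_effort": "medium"}

# Anthropic reasoning models: an integer maximum token budget
# for reasoning before the response is produced.
anthropic_reasoning = {"model": "claude-3-7-sonnet-latest", "reasoning_effort": 4096}
```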
Response
The model instance used, e.g. gpt-4. See supported models.
The template contains the main structure and instructions for the model, including input variables for dynamic values.
For chat models, provide the template as a ChatTemplate (a list of messages), e.g. a system message, followed by a user message with an input variable. For completion models, provide a prompt template as a string.
Input variables should be specified with double curly bracket syntax: {{input_name}}.
The maximum number of tokens to generate. Provide max_tokens=-1 to dynamically calculate this limit from the length of the prompt.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass.
The string (or list of strings) after which the model will stop generating. The returned text will not contain the stop sequence.
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the generation so far.
Number between -2.0 and 2.0. Positive values penalize new tokens based on how frequently they appear in the generation so far.
The format of the response. Only {"type": "json_object"} is currently supported for chat.
Guidance on how many reasoning tokens the model should generate before creating a response to the prompt. OpenAI reasoning models (o1, o3-mini) expect an OpenAIReasoningEffort enum. Anthropic reasoning models expect an integer, which signifies the maximum reasoning token budget.
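Continuing the upsert sketch above, the documented fields can be read straight off the response body (a minimal sketch, assuming the body is a JSON object keyed by the field names in this reference):

```python
prompt = resp.json()       # response from the upsert sketch above
print(prompt["model"])     # e.g. "gpt-4"
print(prompt["template"])  # the stored chat template or prompt string
```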