ResponsesCreateRequest

Request schema for creating a new agent response. Configures the agent's model, input, tools, output format, and behavior. Supports both foreground (blocking) and background (asynchronous) execution with optional streaming.

Example Usage

```typescript
import { ResponsesCreateRequest } from "@meetkai/mka1/models/components";

const value: ResponsesCreateRequest = {
  model: "Model T",
};
```
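Beyond the required `model` field, a request usually sets a few of the optional fields described under Fields. The sketch below shows one way a background request might be assembled. `RequestSketch` and `buildBackgroundRequest` are hypothetical names: the type is a local stand-in that mirrors only a subset of the documented fields so the example runs without the SDK installed, and it is not the SDK's own `ResponsesCreateRequest` type. The `instructions` text and the `maxOutputTokens` value are arbitrary illustrations.

```typescript
// Hypothetical local stand-in for a subset of ResponsesCreateRequest.
// Field names follow the Fields table; this is not the SDK type itself.
type RequestSketch = {
  model: string;
  input?: string;
  instructions?: string;
  background?: boolean;
  store?: boolean;
  stream?: boolean;
  maxOutputTokens?: number;
};

// Assemble a request that returns immediately and is processed in the background.
function buildBackgroundRequest(model: string, input: string): RequestSketch {
  return {
    model,
    input,
    instructions: "Answer concisely.", // illustrative system instruction
    background: true, // return a response ID right away
    store: true, // assumption: results must be stored to be fetched via GET /responses/{id}
    stream: false, // a single complete object rather than SSE
    maxOutputTokens: 512, // illustrative upper bound on output length
  };
}

const request = buildBackgroundRequest(
  "Model T",
  "Summarize the latest release notes.",
);
console.log(JSON.stringify(request, null, 2));
```

Keeping `store` explicit here is a design choice rather than a documented requirement: the table lists `true` as its default, so omitting it should behave the same way.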

Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| `background` | boolean | | Whether to run the model response in the background asynchronously. When true, the request returns immediately with a response ID while the agent processes in the background. Use GET /responses/{id} to retrieve results later. When false, the request blocks until completion. Defaults to false. |
| `conversation` | components.ResponsesCreateRequestConversation | | The conversation that this response belongs to. Can be a conversation ID string or a conversation object. Used to maintain context and history across multiple agent interactions. Optional - omit for one-off interactions. |
| `include` | components.Include[] | | Additional fields to include in the response output. Allows requesting specific nested data like web search sources, code interpreter outputs, computer screenshots, file search results, input images, output logprobs, or reasoning content. These fields may have performance or cost implications. |
| `input` | components.ResponsesCreateRequestInput | | The user input to send to the agent. Can be a simple text string for basic queries, or an array of item objects containing text, images, files, or audio for multimodal interactions. This is the main content the agent will respond to. |
| `instructions` | string | | System or developer instructions inserted into the model's context before user input. Use this to guide the agent's behavior, set personality, define constraints, or provide domain-specific knowledge. Acts as a persistent system message for this response. |
| `maxOutputTokens` | number | | The maximum number of tokens the model can generate in its output. Sets an upper bound to control costs and response length. The actual output may be shorter if the model finishes naturally. Must be a positive integer. |
| `maxToolCalls` | number | | The maximum total number of tool calls the agent can make during this response. Useful for controlling execution time and preventing infinite loops. Applies to all built-in tools like web search, file search, code interpreter, etc. Must be a positive integer. |
| `metadata` | Record<string, string> | | N/A |
| `model` | string | ✔️ | The ID of the model to use for generating the response. Can be an OpenAI model (e.g., 'gpt-4', 'gpt-4-turbo', 'o1'), Anthropic model (e.g., 'claude-3-5-sonnet'), Google model, or a custom registered model. Required field. |
| `parallelToolCalls` | boolean | | Whether to allow the agent to execute multiple tool calls in parallel. When true, the agent can make concurrent tool calls for efficiency. When false, tools are called sequentially. Defaults to true for better performance. |
| `previousResponseId` | string | | The unique ID of a previous response to continue from. Used for multi-turn conversations to maintain context and history. The agent will have access to all previous interactions in the chain. |
| `prompt` | components.PromptTemplate | | N/A |
| `promptCacheKey` | string | | A stable identifier used for caching prompts and context. Helps reduce costs and latency by reusing cached prompt processing. Useful for prompts that are used repeatedly with minor variations. |
| `reasoning` | components.ReasoningConfiguration | | N/A |
| `safetyIdentifier` | string | | A stable identifier used to help detect and prevent policy violations across multiple requests. Helps the system identify patterns of abuse or misuse. Useful for compliance tracking and safety monitoring. |
| `serviceTier` | components.ServiceTier | | Specifies the processing tier for serving the request. 'auto' lets the system choose, 'default' uses standard processing, 'flex' allows flexible scheduling for lower priority, 'priority' provides faster processing. Defaults to 'auto'. |
| `store` | boolean | | Whether to store the generated response for later retrieval. When true, the response is saved and can be retrieved via GET /responses/{id}. When false, the response is not persisted after generation. Defaults to true. |
| `stream` | boolean | | If set to true, the response data will be streamed using Server-Sent Events (SSE) for real-time updates as the agent generates the response. When false, the response is returned as a single complete object. Defaults to false. |
| `streamOptions` | components.ResponsesCreateRequestStreamOptions | | Additional options for configuring streaming behavior when stream is enabled. |
| `temperature` | number | | Controls randomness in the model's output. Higher values (e.g., 1.5-2.0) make output more random and creative, lower values (e.g., 0.0-0.5) make it more focused and deterministic. Must be between 0 and 2. Defaults to 1 for balanced behavior. |
| `text` | components.TextConfiguration | | N/A |
| `toolChoice` | components.ToolChoice | | N/A |
| `tools` | components.ResponsesCreateRequestTools[] | | Array of tool definitions that the agent can use during response generation. Supports built-in tools (web search, file search, code interpreter, computer use, image generation) and custom tools (functions, MCP, local shell). Each tool definition specifies its type and configuration. |
| `topLogprobs` | number | | The number of most likely tokens to return at each position along with their log probabilities. Must be between 0 and 20. Useful for understanding model confidence and exploring alternative outputs. |
| `topP` | number | | Nucleus sampling parameter. The model considers only the tokens with top_p cumulative probability. Lower values (e.g., 0.1) make output more focused, higher values (e.g., 0.9) allow more diversity. Must be between 0 and 1. Defaults to 1. Alternative to temperature. |
| `truncation` | components.Truncation | | The truncation strategy for handling inputs that exceed the model's context window. 'auto' automatically truncates old messages to fit, 'disabled' returns an error if context is too long. Defaults to 'disabled'. |
| `user` | string | | DEPRECATED: A unique identifier for the end-user. Use safety_identifier for policy violation detection and prompt_cache_key for caching instead. This field is maintained for backwards compatibility. |
| `presencePenalty` | number | | Controls the penalty applied to new tokens based on their existing frequency in the text. Higher values (e.g., 1.0) penalize new tokens that appear in the text more frequently, reducing repetition. Lower values (e.g., 0.0) allow more frequent tokens. Must be between 0 and 1. Defaults to 0.5 for balanced behavior. |
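Several of the numeric fields above carry documented bounds (`temperature` between 0 and 2, `topP` and `presencePenalty` between 0 and 1, `topLogprobs` between 0 and 20, and positive integers for `maxOutputTokens` and `maxToolCalls`). A client may want to check these before sending a request. The following is a minimal sketch of such a check; `NumericFields` and `checkBounds` are hypothetical helpers written for this example, not part of the SDK, and whether the SDK or the server performs the same validation is not stated here.

```typescript
// Hypothetical local type covering only the bounded numeric fields from the table.
type NumericFields = {
  temperature?: number;
  topP?: number;
  topLogprobs?: number;
  presencePenalty?: number;
  maxOutputTokens?: number;
  maxToolCalls?: number;
};

// Return a list of human-readable problems; an empty list means every
// provided field falls inside its documented range.
function checkBounds(fields: NumericFields): string[] {
  const problems: string[] = [];

  // Inclusive [min, max] ranges as stated in the field descriptions.
  const ranges: Array<[keyof NumericFields, number, number]> = [
    ["temperature", 0, 2],
    ["topP", 0, 1],
    ["topLogprobs", 0, 20],
    ["presencePenalty", 0, 1],
  ];
  for (const [name, min, max] of ranges) {
    const value = fields[name];
    // The negated comparison also rejects NaN.
    if (value !== undefined && !(value >= min && value <= max)) {
      problems.push(`${name} must be between ${min} and ${max}`);
    }
  }

  // Fields documented as "must be a positive integer".
  for (const name of ["maxOutputTokens", "maxToolCalls"] as const) {
    const value = fields[name];
    if (value !== undefined && !(Number.isInteger(value) && value > 0)) {
      problems.push(`${name} must be a positive integer`);
    }
  }

  return problems;
}

console.log(checkBounds({ temperature: 2.5, topLogprobs: 21, maxToolCalls: 0 }));
```

Omitted fields are skipped rather than flagged, since every field checked here is optional in the table.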