This function wraps a prompt with settings that ensure the LLM response is a valid JSON object, optionally matching a given JSON schema.

The function works with all models and providers through text-based handling, but also supports the native JSON settings of the OpenAI and Ollama API types (see argument 'type'). This makes it easy to switch between providers with different levels of JSON support while ensuring the results are in the correct format.
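
For example, the same wrapped prompt can be reused across providers (a minimal sketch, assuming both an Ollama and an OpenAI provider are configured):

prompt <- "How can I solve 8x + 7 = -23?" |>
  answer_as_json(type = "auto")

result_ollama <- prompt |> send_prompt(llm_provider_ollama())
result_openai <- prompt |> send_prompt(llm_provider_openai())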

Usage

answer_as_json(
  prompt,
  schema = NULL,
  schema_strict = FALSE,
  schema_in_prompt_as = c("example", "schema"),
  type = c("text-based", "auto", "openai", "ollama", "openai_oo", "ollama_oo")
)

Arguments

prompt

A single string or a tidyprompt() object.

schema

A list which represents a JSON schema that the response should match. See the example below and your API's documentation for more information on defining JSON schemas. Note that the schema should be a list (R object) representing a JSON schema, not a JSON string (use jsonlite::fromJSON() and jsonlite::toJSON() to convert between the two).
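
For example, a schema can be converted between its JSON string and R list representations like so (a minimal sketch using jsonlite):

# JSON string -> R list suitable for the 'schema' argument
schema_list <- jsonlite::fromJSON('{"type": "object", "properties": {"x": {"type": "number"}}}')

# R list -> JSON string
jsonlite::toJSON(schema_list, auto_unbox = TRUE)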

schema_strict

If TRUE, the provided schema will be strictly enforced. This option is passed as part of the schema when using type "openai" or "ollama"; when using the other types, it is passed to jsonvalidate::json_validate().
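
A minimal sketch of the underlying validation step (this assumes the jsonvalidate package is installed; note that strict mode requires the "ajv" engine):

schema <- '{
  "type": "object",
  "properties": {"x": {"type": "number"}},
  "required": ["x"],
  "additionalProperties": false
}'
jsonvalidate::json_validate('{"x": 1}', schema, engine = "ajv", strict = TRUE)
#> [1] TRUE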

schema_in_prompt_as

When providing a schema and using type "text-based", "openai_oo", or "ollama_oo", this argument specifies how the schema is included in the prompt (a short sketch follows this list):

  • "example" (default): The schema will be included as an example JSON object (tends to work best). r_json_schema_to_example() is used to generate the example object from the schema

  • "schema": The schema will be included as a JSON schema

type

The way that JSON response should be enforced:

  • "text-based": Instruction will be added to the prompt asking for JSON; when a schema is provided, this will also be included in the prompt (see argument 'schema_in_prompt_as'). JSON will be parsed from the LLM response and, when a schema is provided, it will be validated against the schema with jsonvalidate::json_validate(). Feedback is sent to the LLM when the response is not valid. This option always works, but may in some cases may be less powerful than the other native JSON options

  • "auto": Automatically determine the type based on 'llm_provider$api_type'. This does not consider model compatibility and could lead to errors; set 'type' manually if errors occur; use 'text-based' if unsure

  • "openai" and "ollama": The response format will be set via the relevant API parameters, making the API enforce a valid JSON response. If a schema is provided, it will also be included in the API parameters and also be enforced by the API. When no schema is provided, a request for JSON is added to the prompt (as required by the APIs). Note that these JSON options may not be available for all models of your provider; consult their documentation for more information. If you are unsure or encounter errors, use "text-based"

  • "openai_oo" and "ollama_oo": Similar to "openai" and "ollama", but if a schema is provided it is not included in the API parameters. Schema validation will be done in R with jsonvalidate::json_validate(). This can be useful if you want to use the API's JSON support, but their schema support is limited

Note that the "openai" and "ollama" types may also work for other APIs with a similar structure.
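
For example, to use an OpenAI-type API's JSON mode while validating the schema locally in R (a sketch; 'my_schema' is again a placeholder for a schema list like the one in the examples below):

"How can I solve 8x + 7 = -23?" |>
  answer_as_json(schema = my_schema, type = "openai_oo") |>
  send_prompt(llm_provider_openai())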

Value

A tidyprompt() with an added prompt_wrap() which will ensure that the LLM response is a valid JSON object.

Examples

base_prompt <- "How can I solve 8x + 7 = -23?"

# This example will show how to enforce JSON format in the response,
#   with and without a schema, using the 'answer_as_json()' prompt wrap.
# If you use type = 'auto', the function will automatically detect the
#   best way to enforce JSON based on the LLM provider you are using.
# Note that the default type is 'text-based', which will work for any provider/model.

#### Enforcing JSON without a schema: ####

if (FALSE) { # \dontrun{
  ## Text-based (works for any provider/model):
  #   - Adds request to prompt for a JSON object
  #   - Extracts JSON from textual response (feedback for retry if no JSON received)
  #   - Parses JSON to R object
  json_1 <- base_prompt |>
    answer_as_json() |>
    send_prompt(llm_provider_ollama())
  # --- Sending request to LLM provider (llama3.1:8b): ---
  # How can I solve 8x + 7 = -23?
  #
  # You must format your response as a JSON object.
  # --- Receiving response from LLM provider: ---
  # Here is the solution to the equation formatted as a JSON object:
  #
  # ```
  # {
  #   "equation": "8x + 7 = -23",
  #   "steps": [
  #     {
  #       "step": "Subtract 7 from both sides of the equation",
  #       "expression": "-23 - 7"
  #     },
  #     {
  #       "step": "Simplify the expression on the left side",
  #       "result": "-30"
  #     },
  #     {
  #       "step": "Divide both sides by -8 to solve for x",
  #       "expression": "-30 / -8"
  #     },
  #     {
  #       "step": "Simplify the expression on the right side",
  #       "result": "3.75"
  #     }
  #   ],
  #   "solution": {
  #     "x": 3.75
  #   }
  # }
  # ```


  ## Ollama:
  #   - Sets 'format' parameter to 'json', enforcing JSON
  #   - Adds request to prompt for a JSON object, as is recommended by the docs
  #   - Parses JSON to R object
  json_2 <- base_prompt |>
    answer_as_json(type = "auto") |>
    send_prompt(llm_provider_ollama())
  # --- Sending request to LLM provider (llama3.1:8b): ---
  # How can I solve 8x + 7 = -23?
  #
  # You must format your response as a JSON object.
  # --- Receiving response from LLM provider: ---
  # {"steps": [
  #   "Subtract 7 from both sides to get 8x = -30",
  #   "Simplify the right side of the equation to get 8x = -30",
  #   "Divide both sides by 8 to solve for x, resulting in x = -30/8",
  #   "Simplify the fraction to find the value of x"
  # ],
  # "value_of_x": "-3.75"}


  ## OpenAI-type API without schema:
  #   - Sets 'response_format' parameter to 'json_object', enforcing JSON
  #   - Adds request to prompt for a JSON object, as is required by the API
  #   - Parses JSON to R object
  json_3 <- base_prompt |>
    answer_as_json(type = "auto") |>
    send_prompt(llm_provider_openai())
  # --- Sending request to LLM provider (gpt-4o-mini): ---
  # How can I solve 8x + 7 = -23?
  #
  # You must format your response as a JSON object.
  # --- Receiving response from LLM provider: ---
  # {
  #   "solution_steps": [
  #     {
  #       "step": 1,
  #       "operation": "Subtract 7 from both sides",
  #       "equation": "8x + 7 - 7 = -23 - 7",
  #       "result": "8x = -30"
  #     },
  #     {
  #       "step": 2,
  #       "operation": "Divide both sides by 8",
  #       "equation": "8x / 8 = -30 / 8",
  #       "result": "x = -3.75"
  #     }
  #   ],
  #   "solution": {
  #     "x": -3.75
  #   }
  # }
} # }



#### Enforcing JSON with a schema: ####

# Make a list representing a JSON schema,
#   which the LLM response must adhere to:
json_schema <- list(
  name = "steps_to_solve", # Required for OpenAI API
  description = NULL, # Optional for OpenAI API
  schema = list(
    type = "object",
    properties = list(
      steps = list(
        type = "array",
        items = list(
          type = "object",
          properties = list(
            explanation = list(type = "string"),
            output = list(type = "string")
          ),
          required = c("explanation", "output"),
          additionalProperties = FALSE
        )
      ),
      final_answer = list(type = "string")
    ),
    required = c("steps", "final_answer"),
    additionalProperties = FALSE
  )
  # The 'strict' parameter is set via the 'schema_strict' argument of 'answer_as_json()'
)
# Note: when you are not using an OpenAI API, you can also pass just the
#   internal 'schema' list object to 'answer_as_json()' instead of the full
#   'json_schema' list object
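
# For example, this wraps the prompt using only the inner schema list
#   (a sketch; no request is sent here):
prompt_inner_schema <- base_prompt |>
  answer_as_json(schema = json_schema$schema)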

# Generate example R object based on schema:
r_json_schema_to_example(json_schema)
#> $steps
#> $steps[[1]]
#> $steps[[1]]$explanation
#> [1] "..."
#> 
#> $steps[[1]]$output
#> [1] "..."
#> 
#> 
#> 
#> $final_answer
#> [1] "..."
#> 

if (FALSE) { # \dontrun{
  ## Text-based with schema (works for any provider/model):
  #   - Adds request to prompt for a JSON object
  #   - Adds schema to prompt
  #   - Extracts JSON from textual response (feedback for retry if no JSON received)
  #   - Validates JSON against schema with 'jsonvalidate' package (feedback for retry if invalid)
  #   - Parses JSON to R object
  json_4 <- base_prompt |>
    answer_as_json(schema = json_schema) |>
    send_prompt(llm_provider_ollama())
  # --- Sending request to LLM provider (llama3.1:8b): ---
  # How can I solve 8x + 7 = -23?
  #
  # You must format your response as a JSON object.
  #
  # Your JSON object should match this example JSON object:
  #   {
  #     "steps": [
  #       {
  #         "explanation": "...",
  #         "output": "..."
  #       }
  #     ],
  #     "final_answer": "..."
  #   }
  # --- Receiving response from LLM provider: ---
  # Here is the solution to the equation:
  #
  # ```
  # {
  #   "steps": [
  #     {
  #       "explanation": "First, we want to isolate the term with 'x' by
  #       subtracting 7 from both sides of the equation.",
  #       "output": "8x + 7 - 7 = -23 - 7"
  #     },
  #     {
  #       "explanation": "This simplifies to: 8x = -30",
  #       "output": "8x = -30"
  #     },
  #     {
  #       "explanation": "Next, we want to get rid of the coefficient '8' by
  #       dividing both sides of the equation by 8.",
  #       "output": "(8x) / 8 = (-30) / 8"
  #     },
  #     {
  #       "explanation": "This simplifies to: x = -3.75",
  #       "output": "x = -3.75"
  #     }
  #   ],
  #   "final_answer": "-3.75"
  # }
  # ```


  ## Ollama with schema:
  #   - Sets 'format' parameter to 'json', enforcing JSON
  #   - Adds request to prompt for a JSON object, as is recommended by the docs
  #   - Adds schema to prompt
  #   - Validates JSON against schema with 'jsonvalidate' package (feedback for retry if invalid)
  json_5 <- base_prompt |>
    answer_as_json(json_schema, type = "auto") |>
    send_prompt(llm_provider_ollama())
  # --- Sending request to LLM provider (llama3.1:8b): ---
  # How can I solve 8x + 7 = -23?
  #
  # You must format your response as a JSON object.
  #
  # Your JSON object should match this example JSON object:
  # {
  #   "steps": [
  #     {
  #       "explanation": "...",
  #       "output": "..."
  #     }
  #   ],
  #   "final_answer": "..."
  # }
  # --- Receiving response from LLM provider: ---
  # {
  #   "steps": [
  #     {
  #       "explanation": "First, subtract 7 from both sides of the equation to
  #       isolate the term with x.",
  #       "output": "8x = -23 - 7"
  #     },
  #     {
  #       "explanation": "Simplify the right-hand side of the equation.",
  #       "output": "8x = -30"
  #     },
  #     {
  #       "explanation": "Next, divide both sides of the equation by 8 to solve for x.",
  #       "output": "x = -30 / 8"
  #     },
  #     {
  #       "explanation": "Simplify the right-hand side of the equation.",
  #       "output": "x = -3.75"
  #     }
  #   ],
  #   "final_answer": "-3.75"
  # }

  ## OpenAI with schema:
  #   - Sets 'response_format' parameter to 'json_object', enforcing JSON
  #   - Adds json_schema to the API request; the API enforces JSON adhering to the schema
  #   - Parses JSON to R object
  json_6 <- base_prompt |>
    answer_as_json(json_schema, type = "auto") |>
    send_prompt(llm_provider_openai())
  # --- Sending request to LLM provider (gpt-4o-mini): ---
  # How can I solve 8x + 7 = -23?
  # --- Receiving response from LLM provider: ---
  # {"steps":[
  # {"explanation":"Start with the original equation.",
  # "output":"8x + 7 = -23"},
  # {"explanation":"Subtract 7 from both sides to isolate the term with x.",
  # "output":"8x + 7 - 7 = -23 - 7"},
  # {"explanation":"Simplify the left side and the right side of the equation.",
  # "output":"8x = -30"},
  # {"explanation":"Now, divide both sides by 8 to solve for x.",
  # "output":"x = -30 / 8"},
  # {"explanation":"Simplify the fraction by dividing both the numerator and the
  # denominator by 2.",
  # "output":"x = -15 / 4"}
  # ], "final_answer":"x = -15/4"}
} # }