
New Features and Performance Improvements in Pydantic v2.7

Sydney Runkle
10 mins
2024/04/11

Pydantic v2.7 is now available! This release is our biggest since v2.0, with a focus on performance improvements and highly requested new features. This release also features the work of over 30 new contributors! In this post, we'll cover the highlights of the release.

You can see the full changelog here.

Pydantic's JSON parser offers support for partial JSON parsing. This capability allows the parser to read input until it encounters invalid syntax, making a best-effort attempt to return a JSON object that accurately represents the valid portion of the input. Exposed via the from_json method, this feature is especially valuable for processing streaming outputs from Large Language Models (LLMs), which often generate partial JSON objects that traditional parsers cannot handle without errors.

This is especially helpful for validating LLM outputs: LLMs often return a partial JSON object that isn't syntactically valid JSON, and in the past it wasn't possible to parse such a response without a JSON parsing error. Now, you can enable partial JSON parsing to parse the response, and then validate the parsed object against a Pydantic model with model_validate.

Here's a simple example:

from pydantic_core import from_json

partial_json_data = '["aa", "bb", "c'  # (1)!

try:
    result = from_json(partial_json_data, allow_partial=False)
except ValueError as e:
    print(e)  # (2)!
    #> EOF while parsing a string at line 1 column 15

result = from_json(partial_json_data, allow_partial=True)
print(result)  # (3)!
#> ['aa', 'bb']
  1. The JSON list is incomplete - it's missing a closing "]
  2. When allow_partial is set to False (the default), a parsing error occurs.
  3. When allow_partial is set to True, part of the input is deserialized successfully.
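Pairing partial parsing with model validation, as described above, looks like this. The `Forecast` model here is a hypothetical example; the key idea is that fields missing from the truncated stream fall back to their defaults:

```python
from pydantic import BaseModel
from pydantic_core import from_json


class Forecast(BaseModel):
    city: str = 'unknown'
    temperature: float = 0.0


# A truncated LLM response - the "temperature" value never arrived
streamed = '{"city": "London", "temperature'

# Partial parsing drops the incomplete key and keeps the complete pair
partial = from_json(streamed, allow_partial=True)
print(partial)
#> {'city': 'London'}

# Validate whatever portion of the stream has arrived so far
forecast = Forecast.model_validate(partial)
print(forecast)
#> city='London' temperature=0.0
```

Each time a new chunk of the stream arrives, you can re-parse and re-validate to get a progressively more complete model instance.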

You can learn more about integrating Pydantic with your LLM work from some of our blog posts.

For more information, check out the docs for this new feature!

Pydantic offers support for SecretStr and SecretBytes types, which are used to represent sensitive data. We've extended this support to include a generic Secret base type, which can be used to create custom secret types.

For example, you could create a SecretSalary type that wraps an integer salary value and customizes the display of the secret value like so:

from pydantic import BaseModel, Secret


class SecretSalary(Secret[int]):
    def _display(self) -> str:
        return '$******'


class Employee(BaseModel):
    name: str
    salary: SecretSalary


employee = Employee(name='John Doe', salary=100_000)

print(repr(employee))
#> Employee(name='John Doe', salary=SecretSalary('$******'))

print(employee.salary)
#> $******

print(employee.salary.get_secret_value())
#> 100000

If you're satisfied with a more generalized repr output, you can use this even more concise version, where the Secret type is directly parametrized with no need for the subclass:

from pydantic import Secret, TypeAdapter

ta = TypeAdapter(Secret[int])

my_secret_int = ta.validate_python(123)
print(my_secret_int)
#> **********

print(my_secret_int.get_secret_value())
#> 123

This feature is incredibly extensible and can be used to create custom secret types for a wide variety of base types.
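Secrets also stay masked at serialization time. Here's a sketch (using an assumed ApiConfig model) showing that a Python-mode dump keeps the wrapper object intact, while a JSON-mode dump emits only the masked placeholder:

```python
from pydantic import BaseModel, Secret


class ApiConfig(BaseModel):
    endpoint: str
    api_key: Secret[str]


config = ApiConfig(endpoint='https://example.com', api_key='sk-sensitive')

# Python-mode dump keeps the Secret wrapper; the raw value is still reachable
print(config.model_dump()['api_key'].get_secret_value())
#> sk-sensitive

# JSON-mode dump emits only the masked placeholder - the secret never leaks
print(config.model_dump_json())
#> {"endpoint":"https://example.com","api_key":"**********"}
```

This makes it safe to log or return serialized models without scrubbing secret fields by hand.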

Explore the usage docs to learn more!

One of the most highly requested features in Pydantic (ever) is the ability to mark fields as deprecated. Thanks to the hard work of @Viicos, this feature has been realized!

Marking a field as deprecated will result in:

  1. A runtime deprecation warning emitted when accessing the field
  2. The deprecated parameter being set to true in the generated JSON schema

The deprecated parameter can be set to any of:

  • A string, which will be used as the deprecation message.
  • An instance of the warnings.deprecated decorator (or the typing_extensions backport).
  • A boolean, which will be used to mark the field as deprecated with a default 'deprecated' deprecation message.

Here's a simple example:

from pydantic import BaseModel, Field


class Model(BaseModel):
    deprecated_field: int = Field(deprecated=True)

print(Model.model_json_schema()['properties']['deprecated_field'])
#> {'deprecated': True, 'title': 'Deprecated Field', 'type': 'integer'}
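The runtime warning from point 1 can be captured explicitly. Here's a minimal sketch using a hypothetical LegacyModel with a string deprecation message:

```python
import warnings

from pydantic import BaseModel, Field


class LegacyModel(BaseModel):
    old_field: int = Field(deprecated='old_field is deprecated, use new_field instead')


instance = LegacyModel(old_field=1)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    _ = instance.old_field  # accessing the field emits the warning

print(caught[0].message)
#> old_field is deprecated, use new_field instead
```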

The docs for this feature delve into more details about the various ways to mark and customize deprecated fields.

In v1, Pydantic used serialization with duck-typing by default. In an attempt to improve security, Pydantic v2 switched away from this approach.

In Pydantic v2.7, we've reintroduced serialization with duck typing as an opt-in feature via a new serialize_as_any runtime flag. This behavior was already available in earlier v2.X versions via the SerializeAsAny annotation, but that required annotating each field individually. The new serialize_as_any flag lets you enable duck-typed serialization for all fields in a model with a single flag.
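For comparison, here's a sketch of the per-field SerializeAsAny annotation approach (the OuterModel below is an assumed example), where only the annotated field opts in to duck-typed serialization:

```python
from pydantic import BaseModel, SerializeAsAny


class User(BaseModel):
    name: str


class UserLogin(User):
    password: str


class OuterModel(BaseModel):
    # Only this specific field opts in to duck-typed serialization
    as_any: SerializeAsAny[User]
    as_user: User


user_login = UserLogin(name='John Doe', password='some secret')
outer = OuterModel(as_any=user_login, as_user=user_login)

print(outer.model_dump())
#> {'as_any': {'name': 'John Doe', 'password': 'some secret'}, 'as_user': {'name': 'John Doe'}}
```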

Here's an example showcasing the basic usage of the setting:

from pydantic import BaseModel, TypeAdapter


class User(BaseModel):
    name: str


class UserLogin(User):
    password: str


ta = TypeAdapter(User)
user_login = UserLogin(name='John Doe', password='some secret')

print(ta.dump_python(user_login, serialize_as_any=False))  # (1)!
#> {'name': 'John Doe'}

print(ta.dump_python(user_login, serialize_as_any=True))  # (2)!
#> {'name': 'John Doe', 'password': 'some secret'}
  1. This is the default behavior - fields not present in the schema are not serialized.
  2. With serialize_as_any set to True, fields not present in the schema are serialized.

We've upgraded the documentation for serialization with duck typing. This section, in particular, covers the new serialize_as_any runtime flag.

Pydantic previously supported context in validation, but not in serialization. With the help of @ornariece, we've added support for using a context object during serialization.

Here's a simple example, where we use a unit provided in the context to convert a distance field:

from pydantic import BaseModel, SerializationInfo, field_serializer


class Measurement(BaseModel):
    distance: float  # in meters

    @field_serializer('distance')
    def convert_units(self, v: float, info: SerializationInfo):
        context = info.context
        if context and 'unit' in context:
            if context['unit'] == 'km':
                v /= 1000  # convert to kilometers
            elif context['unit'] == 'cm':
                v *= 100  # convert to centimeters
        return v

measurement = Measurement(distance=500)

print(measurement.model_dump())  # no context
#> {'distance': 500.0}

print(measurement.model_dump(context={'unit': 'km'}))  # with context
#> {'distance': 0.5}

print(measurement.model_dump(context={'unit': 'cm'}))  # with context
#> {'distance': 50000.0}

This feature is powerful as it further extends Pydantic's flexibility and customization capabilities when it comes to serialization.

See the documentation for more information.
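For symmetry, here's a sketch of the validation-side analogue of this pattern, which Pydantic already supported via the context parameter of model_validate:

```python
from pydantic import BaseModel, ValidationInfo, field_validator


class Measurement(BaseModel):
    distance: float  # stored in meters

    @field_validator('distance')
    @classmethod
    def normalize_units(cls, v: float, info: ValidationInfo) -> float:
        context = info.context
        if context and context.get('unit') == 'km':
            v *= 1000  # convert kilometers to meters
        return v


measurement = Measurement.model_validate({'distance': 0.5}, context={'unit': 'km'})
print(measurement.distance)
#> 500.0
```

With v2.7, the same context-passing mechanism now works in both directions: normalize on the way in, convert on the way out.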

Pydantic uses PyO3 to connect our core Rust code to Python. Upgrading to PyO3 0.21 under the hood brings a significant performance improvement to Pydantic, as seen in these benchmarks.

For detailed information on the improvements and changes in PyO3 0.21, check out this blog post from David Hewitt, a Rust 🤝 Python expert!

Pydantic now uses SIMD instructions for integer and string JSON parsing on aarch64 (ARM) platforms.

Enum validation and serialization logic was moved to pydantic-core, which is written in Rust. This migration results in a ~4x speedup for enum validation and serialization.

jiter, Pydantic's JSON parser, now has a fast path for creating ASCII Python strings. This change results in a ~15% performance improvement for Python string parsing.

Pydantic's JSON parser offers support for configuring how Python strings are cached during JSON parsing and validation. Memory usage increases slightly when caching strings, but it can improve performance significantly, especially in cases where certain strings are repeated frequently.

The cache_strings setting (in model config or as an argument to from_json) can take any of the following values:

  • True or 'all' (the default): cache all strings
  • 'keys': cache only dictionary keys
  • False or 'none': no caching
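Here's a sketch of both configuration points described above, using an assumed User model:

```python
from pydantic import BaseModel, ConfigDict
from pydantic_core import from_json

# Per-call: cache only dictionary keys while parsing this payload
data = from_json('{"name": "John Doe", "role": "admin"}', cache_strings='keys')
print(data)
#> {'name': 'John Doe', 'role': 'admin'}


# Per-model: disable string caching for all validation on this model
class User(BaseModel):
    model_config = ConfigDict(cache_strings=False)

    name: str


print(User.model_validate_json('{"name": "John Doe"}'))
#> name='John Doe'
```

The 'keys' mode is a good middle ground for payloads where keys repeat heavily (e.g. lists of objects) but values are mostly unique.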

Learn more about this feature here.

With these new features and performance improvements, Pydantic v2.7 is the fastest and most feature-rich version of Pydantic yet. If you have any questions or feedback, please open a GitHub discussion. If you encounter any bugs, please open a GitHub issue.