
Rustdoc JSON in 2023

31 December 2023

It’s that time of year again: folks are writing about what happened on their projects this year and looking ahead to the next. I did this last year, and it was a super useful exercise for me personally, and also a good record of what changed, without having to trawl through GitHub.

What’s Rustdoc JSON?

(note: You can skip this section if you’re already familiar with it.)

Rustdoc JSON is an unstable feature for rustdoc that allows generating machine-readable JSON output describing the API of a crate (instead of the normal human-readable HTML output). If you think of rustdoc like a compiler from a crate to a description of its API 1, this is an alternative target.

This allows tools to reason mechanically about a Rust API. It’s the underlying data source for roogle, cargo-check-external-types, pavex, and many more.

Format Changes

The most user-facing changes this year were changes to the JSON format itself. We made 5 of them:

  1. #106354: Variant was split into Variant and VariantKind, so the enum discriminant can always be reported. Previously, it could only be reported for a plain enum variant (i.e. one with no fields or braces).
  2. #109410: Support inherent associated types.
  3. #111427: Serialize all enums using external tagging. This changed the JSON representation of the data, but it’s the same after deserialization into Rust values. Doing this is more consistent 2, and allows (de)serializing to non-self-describing formats, such as postcard and bincode 3. This can give a significant performance improvement. Special thanks to Luca Palmieri for the heroics in landing this, as it required changing every file in the test suite.
  4. #115078: Rename Typedef to TypeAlias to be more in line with standard terminology.
  5. #119246: Add is_object_safe field to Trait.

That makes this year significantly more stable than last year, when we changed the format 13 times.
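
To make the external tagging change (#111427) a bit more concrete, here is a minimal sketch of what that encoding looks like in the JSON. It’s written in Python purely so it runs without dependencies; rustdoc itself does this through serde in Rust. The helper names (to_external, from_external) and the example payloads are made up for illustration and are not part of rustdoc-types.

```python
import json


def to_external(variant, payload=None):
    """Encode an enum value using external tagging (serde's default)."""
    # A variant with no data is just its name as a string. A variant with
    # data is a single-key object: {variant name: payload}.
    return variant if payload is None else {variant: payload}


def from_external(value):
    """Decode an externally tagged value into (variant, payload)."""
    if isinstance(value, str):
        return value, None
    if isinstance(value, dict) and len(value) == 1:
        ((variant, payload),) = value.items()
        return variant, payload
    raise ValueError(f"not an externally tagged enum: {value!r}")


# Hypothetical values, for illustration only.
unit = to_external("public")
with_data = to_external("function", {"has_body": True})

print(json.dumps(unit))       # "public"
print(json.dumps(with_data))  # {"function": {"has_body": true}}
```

The useful property is that the variant name always comes before the payload, so a reader never has to buffer a whole object and inspect it to work out which variant it is looking at. That is what makes the same data model usable with non-self-describing formats.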

rustdoc-types Release Scare and Ownership Transfer

On the opposite end of the spectrum, the changes to the rustdoc-types crate shouldn’t be user-visible at all (touch wood), but are no less important for the long-term health of the project.

The canonical, upstream definition of the rustdoc-json format lives in src/rustdoc-json-types/ of the rust repo. It’s used as a dependency by librustdoc, and by some in-tree test tooling. However, it can’t be directly used by 3rd party code, as the in-tree crate isn’t published to crates.io.

To ease adoption, I created the rustdoc-types crate. It’s a somewhat automated repackaging of the in-tree rustdoc-json-types crate onto crates.io. Most consumers of rustdoc-json (AFAICT) use it via this crate. However, despite its importance, it’s a personal project. It lives in my GitHub account, and only I have the permissions to publish new versions to crates.io.

This is mostly transparent to users, who can think of rustdoc-types as being the same as the canonical in-tree representation of the format. However, this relies on me running a shell script to update and publish the crate. Normally this isn’t a problem, as I tend to review all the changes to rustdoc-json, and am automatically pinged when someone makes a PR changing it.

However, we risked breaking this illusion with #115078. It got into the merge queue the night before I was about to leave for a week-long camping trip. If it’d been merged while I was in a field with no internet, a new version of rustdoc-types wouldn’t have been published, and users would have been broken. Fortunately for us, the bors queue was quick that night, so I could publish the new version in the morning, just before I left. However, this was a close call, and no-one’s eager for it to happen again.

After some discussion on Zulip, we decided the right thing to do would be to make rustdoc-types owned by the Rustdoc Team (instead of me personally). This means that someone else would be able to make releases if I can’t for whatever reason. It also provides succession planning for when I inevitably stop working on Rust at some point.

To do this, I’ve written and opened an RFC, which contains motivation, as well as the logistical details of the ownership transfer. Once this gets merged, and the crate gets moved, we shouldn’t have to worry about this happening again.

Good Chats on Zulip

This section is half so you can see how the sausage gets made when designing stuff, and half so I can find these links more easily in the future.

The Metaformat, and documenting signatures that rely on nightly features

Link. The core questions here are “How much support should rustdoc JSON give to nightly language features?” and “How should they be versioned?”. The conclusion is that for now, we should version them like regular language features (to keep format_version as an unambiguous description of which schema was used to serialize).

The main idea that came out of this was the “metaformat”. Rustdoc JSON has had a number of different formats over the years, but the way they were versioned and released has been the same. The idea here is that in addition to thinking about the format we stabilize, we should also think about how we can change the format after stabilization. The metaformat refers to the design of a series of formats that use the same mechanism for communicating changes 4.

The conclusion is that the existing metaformat is fine for now, but probably not suitable for stabilization.

Changing the metaformat is much more disruptive to the ecosystem than changing the format. This is because existing tools rely on the metaformat to detect whether they’re using the correct format, and even to support multiple format versions at once. Therefore, we can be quite free to change the format, as the ecosystem is used to it, and has mechanisms to minimize disruption. But a metaformat change would break all these mechanisms, and would be much more unexpected. Therefore, we should aim to only change the metaformat once, and to a metaformat that we believe we can stabilize.
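
As a rough illustration of what “relying on the metaformat” means for a tool, here is a minimal sketch that checks format_version at the root of the document before touching anything else. It’s in Python so it runs on the standard library alone. The function name, the exception type, and the set of supported versions are all hypothetical; real tools will pick their own.

```python
import json

# Hypothetical: the format versions this imaginary tool knows how to read.
SUPPORTED_FORMAT_VERSIONS = frozenset({27, 28})


class UnsupportedFormatVersion(Exception):
    """Raised when the document's format_version can't be handled."""


def load_rustdoc_json(text, supported=SUPPORTED_FORMAT_VERSIONS):
    """Parse rustdoc JSON, refusing versions this tool doesn't understand."""
    doc = json.loads(text)
    # The only thing relied on before the version is known is the
    # metaformat: a format_version field at the root of the JSON object.
    if not isinstance(doc, dict) or "format_version" not in doc:
        raise UnsupportedFormatVersion("no format_version field at the root")
    version = doc["format_version"]
    if version not in supported:
        raise UnsupportedFormatVersion(
            f"format_version {version} is not one of {sorted(supported)}"
        )
    # A tool supporting several versions would dispatch on `version` here.
    return version, doc
```

Everything after the version check can change freely between format versions, because the tool finds out which schema applies before reading anything else. If the version field itself moved or changed meaning, this check would silently stop working, which is why a metaformat change is so much more disruptive.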

Stabilization Requirements

We also talked a couple of times 5 about the path to stabilization. The core blockers are:

  1. Long term metaformat, that allows adding new language concepts (that don’t exist yet) without breaking users.
  2. Reliable cross-crate ID lookup. (See the first part of this issue for details.)
  3. Move rustdoc-types into T-Rustdoc ownership.
  4. Ensure everything’s fully documented.
  5. Ensure core (and popular crates) produces correct output under jsondoclint 6.

The first two will require significant design work. How to do the rest is clearer, but they may well also throw a wrench into the works. I don’t want to speculate on a timeline, but I’d not hold my breath on all this getting done in anything less than ~2 years.

cargo-semver-checks/trustfall Test Suite

Rustdoc JSON has a test suite that’s built using JSONPath to write assertions about the contents of the JSON. Someone was wondering if it made sense to complement this with trustfall-driven tests, potentially based on the cargo-semver-checks or trustfall-rustdoc-adapter suites.

We concluded that this wouldn’t be a good idea, as every format change would then also require rewriting the trustfall code, which would add a significant barrier. In addition, the higher-level invariant checks (the ones that can be run on every document, i.e. not assertions about the presence of specific items) can already be written inside of jsondoclint, which is much simpler to understand and modify.
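
For a sense of what a document-wide invariant check looks like, here is a minimal sketch of a dangling-ID check. This is not jsondoclint’s actual implementation (that is written in Rust and covers far more). It assumes the document has an index map and a paths map keyed by ID, and that each item may carry a links map whose values are IDs; the function name dangling_ids and the sample document are made up for illustration.

```python
def dangling_ids(doc):
    """Return (item id, link text, target id) for links that resolve nowhere."""
    index = doc.get("index", {})
    # An ID counts as resolvable if it appears in either index or paths.
    known = set(index) | set(doc.get("paths", {}))
    dangling = []
    for item_id, item in index.items():
        for text, target in (item.get("links") or {}).items():
            # JSON object keys are always strings, so compare as strings.
            if str(target) not in known:
                dangling.append((item_id, text, str(target)))
    return sorted(dangling)


# Hypothetical, heavily trimmed document for illustration.
sample = {
    "index": {
        "0:0": {"links": {"Foo": "0:1", "Bar": "0:9"}},
        "0:1": {"links": {}},
    },
    "paths": {},
}

print(dangling_ids(sample))  # [('0:0', 'Bar', '0:9')]
```

A check like this holds for any crate’s output, so it never needs updating when a test crate changes, only when the format itself does.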

Conclusion

??? IDK ???. Some stuff happened in 2023. Some of it was Rustdoc JSON related. Does this post even need a conclusion? Probably.

If you have questions or comments on this post, I’d love to hear them. You can reply to me on the Fediverse, open a discussion on GitHub, or send me an email.

Thanks to jyn and Predrag for their feedback on drafts of this post. Any and all mistakes are solely my own.


  1. I find this is the most helpful way to think about rustdoc. It’s an alternative backend for rustc, albeit one that forks off much earlier in the compilation pipeline, and doesn’t produce executables/libraries.

  2. Previously we had an ad-hoc mix of 3 different ways of serializing enums to JSON.

  3. To be clear, there are no plans for rustdoc itself to emit a binary format. However, it allows 3rd party tools to easily convert the JSON to some other format that they themselves can load.

  4. That would make the current (and only so far) metaformat be “we have a field called format_version at the root of the JSON object, that is incremented on every change”.

  5. This latter one was in 2022, but it’s still relevant today, so cut me some slack.

  6. A testing tool to find dangling IDs and other invalid output in Rustdoc JSON.

