31 December 2022
It’s that time of year again, when everyone is writing excellent summary articles about what happened this year, and goals for the next one. I figured I should do the same for rustdoc JSON.
If you haven’t heard of it yet, rustdoc JSON is an unstable feature for rustdoc that allows rustdoc to output a JSON description of a crate’s API, as opposed to the stable HTML output.
This allows tools to be written that reason about an API without them having to interface directly with the (even more unstable) rustc APIs, and that benefit from all the processing and cleanup that rustdoc does.
The biggest user-facing change has been the number of changes to the JSON format itself. Versions [1] 11 to 23 were released this year [2].
- Header struct by:
  - ABI an enum, instead of being stringly typed.
  - HashSet<Qualifiers> with 3 bools (const_, unsafe_, and async_).
  - ABI field into Header, as they always occur together.
- synthetic (generated from impl Trait in argument position).
- type_ over ty.
- Trait call it implementations, (not implementors), because the Id’s are the impl blocks, not the types that impl the trait.
- is_stripped field to Module.
- dyn Trait as a separate variant to the Type enum. This allows HRTBs to be reported, and is also more principled as it doesn’t use Type::ResolvedPath for both concrete types (struct, enum, union) and dyn traits.
- Path struct, instead of using Type::ResolvedPath.
- discriminant of enum variants. [3]
- #[doc(hidden)].
- Struct’s fields inside the StructKind enum, to better support ordering and #[doc(hidden)].
- impls to Primitives.

While making this many changes (on average about 2 a month) may seem disruptive, there are several things that make it less of a burden for users:
- cargo-check-external-types first attempts to deserialize just the format version, and bails if that doesn’t match. This means the user receives an error about the version of nightly being wrong, which is much more useful and actionable than an error about a missing or unknown JSON field.
- Because of the way serde_json works, adding a field won’t break old code, nor will removing an enum variant. This means that many of the smaller changes may not actually require users to update.

Format version 16, introduced in #99287, merits its own discussion, as it was a much deeper change to how the format represents Rust code, and fixed a lot more bugs.
The root of the problem is that each new file in Rust is its own module [4]. This means that if each type goes in its own file (which is a pub mod), then the type name is duplicated in the module name.
E.g. if there’s a library called collections that’s laid out like
collections/
├── Cargo.lock
├── Cargo.toml
└── src
    ├── lib.rs
    ├── list.rs
    ├── map.rs
    └── set.rs
And written like
// collections/src/lib.rs
pub mod list;
pub mod set;
pub mod map;
// collections/src/list.rs
pub struct List;
// collections/src/set.rs
pub struct Set;
// collections/src/map.rs
pub struct Map;
Then users of the library see paths like collections::list::List, which needlessly duplicates “list”. To avoid this, code like this tends to get written as
// collections/src/lib.rs
mod list;
mod set;
mod map;
pub use list::List;
pub use set::Set;
pub use map::Map;
// collections/src/list.rs
pub struct List;
// collections/src/set.rs
pub struct Set;
// collections/src/map.rs
pub struct Map;
And the user now sees List as collections::List, which is much nicer. It is as if the library author instead just wrote
// collections/src/lib.rs
pub struct List;
pub struct Set;
pub struct Map;
But it also allows the separate types to be in their own files, which is much nicer for the library author.
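The re-export pattern can be tried out in a single file by using inline modules in place of the separate files. This is only a minimal sketch: the derives and the build_all helper are additions for illustration and are not part of the collections example.

```rust
// Inline modules stand in for the files of the example `collections` crate.
mod collections {
    // Private modules: their names are not part of the public path.
    mod list {
        #[derive(Debug, PartialEq)]
        pub struct List;
    }
    mod set {
        #[derive(Debug, PartialEq)]
        pub struct Set;
    }
    mod map {
        #[derive(Debug, PartialEq)]
        pub struct Map;
    }

    // Re-exports: the only way code outside `collections` can name the types.
    pub use self::list::List;
    pub use self::map::Map;
    pub use self::set::Set;
}

// Hypothetical helper showing that the short paths are usable from outside.
fn build_all() -> (collections::List, collections::Set, collections::Map) {
    (collections::List, collections::Set, collections::Map)
}

fn main() {
    let (list, set, map) = build_all();
    println!("{:?} {:?} {:?}", list, set, map);
    // `collections::list::List` would not compile here, as `list` is private.
}
```

The types are defined in list, set and map, but the only paths that name them from outside are the re-exported ones.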
Rustdoc goes to a lot of effort to make the code with private mods and pub uses look like it was all written in one file. In particular it sometimes “inline”s items into the locations where they are used, by replacing a pub use of an item with the item being used.
While this is great for the HTML output, it caused boundless problems for JSON. The most canonical example is
mod style {
    pub struct Color;
}
pub use style::Color;
pub use style::Color as Colour;
In the HTML output, both Color and Colour are created as separate pages, with no indication that they are the same item. In fact, it is the same result as if
pub struct Color;
pub struct Colour;
was written.
In JSON this would crash, as two different items were created with the same ID, triggering an assertion failure.
The fix for this in JSON is to not inline, and instead report the root module as having two items, both of which are imports of the same struct item. The struct item isn’t a member of any module, and is only accessible via the imports. While this would be an unacceptable UI issue for HTML, in JSON it’s better to report the true nature of the code than to try to clean it up with inlines.
Changing this fixed a major source of issues for rustdoc JSON, and made the output far less likely to ICE (hit an internal compiler error).
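To make the difference concrete, here is a rough sketch of the two ways of representing the Color/Colour example. The Item enum, the numeric IDs and the helper functions are all hypothetical simplifications made up for this illustration; they are not the real rustdoc-json-types definitions, and the error below stands in for the assertion failure.

```rust
use std::collections::HashMap;

// Hypothetical, heavily simplified stand-ins for the real format's types.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Item {
    Module { items: Vec<u32> },
    Struct { name: String },
    Import { name: String, target: u32 },
}

/// Inserts an item, refusing to overwrite an existing ID.
fn insert_unique(index: &mut HashMap<u32, Item>, id: u32, item: Item) -> Result<(), String> {
    if index.contains_key(&id) {
        return Err(format!("duplicate ID {}", id));
    }
    index.insert(id, item);
    Ok(())
}

/// Inlining: each `pub use` becomes a copy of the struct, keeping the struct's ID.
fn build_inlined() -> Result<HashMap<u32, Item>, String> {
    let mut index = HashMap::new();
    insert_unique(&mut index, 0, Item::Module { items: vec![1, 1] })?;
    insert_unique(&mut index, 1, Item::Struct { name: "Color".to_string() })?;
    insert_unique(&mut index, 1, Item::Struct { name: "Colour".to_string() })?;
    Ok(index)
}

/// No inlining: the root module holds two imports, both pointing at one struct.
fn build_with_imports() -> Result<HashMap<u32, Item>, String> {
    let mut index = HashMap::new();
    insert_unique(&mut index, 0, Item::Module { items: vec![2, 3] })?;
    insert_unique(&mut index, 1, Item::Struct { name: "Color".to_string() })?;
    insert_unique(&mut index, 2, Item::Import { name: "Color".to_string(), target: 1 })?;
    insert_unique(&mut index, 3, Item::Import { name: "Colour".to_string(), target: 1 })?;
    Ok(index)
}

/// Collects the target ID of every import in the index, in sorted order.
fn import_targets(index: &HashMap<u32, Item>) -> Vec<u32> {
    let mut targets: Vec<u32> = index
        .values()
        .filter_map(|item| match item {
            Item::Import { target, .. } => Some(*target),
            _ => None,
        })
        .collect();
    targets.sort();
    targets
}

fn main() {
    println!("inlined: {:?}", build_inlined().map(|index| index.len()));
    let index = build_with_imports().expect("IDs are unique");
    println!("import targets: {:?}", import_targets(&index));
}
```

In this sketch the struct is listed in the index but not in the root module's items, which matches the description above: it is only reachable through the two imports.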
Another nice user-facing change this year was including the docs for std (and friends) as a rustup component. Because std is special in that it isn’t built like normal dependencies, but is magically made available by cargo and rustc, its JSON [5] docs can’t be produced by cargo like they can for normal dependencies. Therefore they need to be shipped by rustup.
- rust-json-docs to bootstrap
- rust-docs-json and try to add to rustup build-manifest
- ./x doc library/core/ --json panicking if HTML docs weren’t built.

Making this work took several attempts, but now that this is all done, anyone can run rustup component add --toolchain nightly rust-docs-json to get the docs for std, alloc, core, test, and proc_macro in the share/doc/rust/json/ directory of the rustup toolchain directory. Rustup then automatically keeps them up to date with the nightly toolchain.
While these were the big user-facing improvements, there were also many internal improvements, particularly around the test tooling.
Rustdoc JSON is currently tested with two tools. The first, jsondocck, reads comments from the test files which contain assertions about the JSON output, and checks that the output matches the assertions. The assertions are written in JsonPath, and let you check that the output has (and doesn’t have) the values that you expect.
E.g. src/test/rustdoc-json/reexport/reexport_method_from_private_module.rs currently looks like
// @set impl_S = "$.index[*][?(@.docs=='impl S')].id"
// @has "$.index[*][?(@.name=='S')].inner.impls[*]" $impl_S
// @set is_present = "$.index[*][?(@.name=='is_present')].id"
// @is "$.index[*][?(@.docs=='impl S')].inner.items[*]" $is_present
// @!has "$.index[*][?(@.name=='hidden_impl')]"
// @!has "$.index[*][?(@.name=='hidden_fn')]"
mod private_mod {
    pub struct S;
    /// impl S
    impl S {
        pub fn is_present() {}
        #[doc(hidden)]
        pub fn hidden_fn() {}
    }
    #[doc(hidden)]
    impl S {
        pub fn hidden_impl() {}
    }
}
pub use private_mod::*;
It checks that the struct S has an impl block whose only method is is_present, and that hidden_impl and hidden_fn aren’t mentioned.
Over this year, two major changes landed in jsondocck that make writing these tests much nicer.
- @ismany to jsondocck to do a setwise comparison.
- jsondocck.

Between them, they mean a test like:
pub struct S;
/// the impl
impl S {
    pub fn foo() {}
    pub fn bar() {}
}
// @set foo = name_of_test.rs "$.index[*][?(@.name=='foo')].id"
// @set bar = - "$.index[*][?(@.name=='bar')].id"
// @count - "$.index[*][?(@.docs=='the impl')].inner.items[*]" 2
// @has - "$.index[*][?(@.docs=='the impl')].inner.items[*]" $foo
// @has - "$.index[*][?(@.docs=='the impl')].inner.items[*]" $bar
can be rewritten to be
// @set foo = "$.index[*][?(@.name=='foo')].id"
// @set bar = "$.index[*][?(@.name=='bar')].id"
// @ismany "$.index[*][?(@.docs=='the impl')].inner.items[*]" $foo $bar
which is much nicer.
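As a rough illustration of why one @ismany line can replace a @count plus several @has lines, the checks can be modelled as plain functions over the list of values a JsonPath query matched. This is a sketch of the semantics as described in this post, not jsondocck's actual code; the function signatures and the ID strings are made up.

```rust
/// `@has`: at least one matched value equals `expected`.
fn has(matched: &[&str], expected: &str) -> bool {
    matched.iter().any(|m| *m == expected)
}

/// `@count`: the query matched exactly `n` values.
fn count(matched: &[&str], n: usize) -> bool {
    matched.len() == n
}

/// `@ismany`: setwise comparison, so the order of the values is ignored.
/// Assumes neither side contains duplicates, which holds for item IDs.
fn ismany(matched: &[&str], expected: &[&str]) -> bool {
    matched.len() == expected.len()
        && matched.iter().all(|m| expected.contains(m))
        && expected.iter().all(|e| matched.contains(e))
}

fn main() {
    // Hypothetical IDs standing in for the values of `$foo` and `$bar`.
    let items = ["0:3", "0:4"];
    let old_style = count(&items, 2) && has(&items, "0:3") && has(&items, "0:4");
    let new_style = ismany(&items, &["0:4", "0:3"]);
    println!("old: {}, new: {}", old_style, new_style);
}
```

Both styles accept the same inputs here, but the single setwise check only has to spell out the JsonPath query once.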
The other tool that’s used is one that checks that all Ids mentioned are present in the index (or paths). Originally this was a Python script called check_missing_items.py, but in #101809, it was replaced with jsondoclint, a Rust rewrite. This had many advantages, such as being able to use rustdoc-json-types to keep up with format changes, and exhaustively matching on kinds, leading to more bugs being caught.
Interestingly, these bugs all had to be fixed before the tool could be landed, and in doing so, check_missing_items.py was fixed so it could catch them if they regressed before jsondoclint landed. Despite this, it was still great to get rid of it, and replace it with a much more maintainable tool.
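The core check is simple to state: every Id that some item mentions must appear in the index or in paths. A minimal sketch of that idea follows; the Item struct, the numeric IDs and the function names are invented for illustration and are much simpler than what jsondoclint and rustdoc-json-types actually use.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical, simplified item: only the IDs it mentions are kept.
struct Item {
    refs: Vec<u32>,
}

/// Returns every mentioned ID that is in neither `index` nor `paths`.
fn missing_ids(index: &HashMap<u32, Item>, paths: &HashSet<u32>) -> BTreeSet<u32> {
    index
        .values()
        .flat_map(|item| item.refs.iter().cloned())
        .filter(|id| !index.contains_key(id) && !paths.contains(id))
        .collect()
}

/// A tiny index: item 0 mentions 1 and 2, item 1 mentions 3.
fn sample_index() -> HashMap<u32, Item> {
    let mut index = HashMap::new();
    index.insert(0, Item { refs: vec![1, 2] });
    index.insert(1, Item { refs: vec![3] });
    index
}

fn main() {
    let index = sample_index();
    // ID 2 is not in the index, but is known through `paths`.
    let paths: HashSet<u32> = [2].iter().cloned().collect();
    println!("missing: {:?}", missing_ids(&index, &paths));
}
```

In this sample, ID 3 is mentioned by item 1 but defined nowhere, so it is the only one reported.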
However, as with any big rewrite, there were bound to be bugs, and this was no exception. In particular, a number of false positives were introduced for code patterns not covered by the test suite. They were only unearthed when jsondoclint was run on core.json, which isn’t currently done in CI, but should be [6]. These were fixed, and tests were added.
- Typedef as valid kind for Type::ResolvedPath
- use-ing enum variants and glob use-ing enums.

Another longstanding issue that was partially addressed this year is the relative lack of tests. This year the rustdoc-json suite has grown from 26 to 98 tests [7]. (For what it’s worth, in the same time period, the main rustdoc suite [8] went from 484 to 586 tests.)
This was achieved in part through dedicated test-adding PRs [9], but mainly thanks to the good habit of always adding tests when changing behaviour, which we were lucky to inherit from the wider Rust project.
The final change for 2022 was the vast, vast number of bug fixes [10]. The fact that we were able to make so many fixes is a testament to how many users are reporting issues. This is mainly driven by tools that make use of rustdoc JSON, and in particular cargo-public-api and cargo-semver-checks have driven a lot more eyes towards the code.
The other major source of bug reports was running with crater, which, while it can only find assertion failures, makes up for this with sheer volume. One thing I want to look into next year is running the jsondoclint tool in crater, so it can catch missing IDs, instead of just internal assertions failing.
2022 was a good year for rustdoc JSON. The format is better; the code is more reliable; the tests are more numerous and easier to write; and there are more users depending on it. All this was made possible by many people working on and around the format. In particular, I’d like to thank Alex Kladov, Didrik Nordström, Guillaume Gomez, Jacob Hoffman-Andrews, Joseph Ryan, Jynn Nelson, León Orell Valerian Liehr, Luca Palmieri, Martin Nordholts, Matthias Krüger, Michael Goulet, Michael Howell, Noah Lev, Predrag Gruevski, QuietMisdreavus, Rune Tynan, Tyler Mandry, and Urgau for their invaluable contributions.
Hopefully next year we can continue to improve at this solid pace. My main goal is to improve the way cross-crate ID lookup works, but there’s also more work to be done to fix more bugs, further flesh out the test suite, and increase performance. I’ll write more about these in a future post.
EDIT(2023-09-06): This ended up being an issue, not a post. You can find it here
If you want to hear about that when it comes out, or just generally want to be notified the next time I have something to share online, you can find me in the Fediverse @aDot@treehouse.systems. If you have questions or comments on this post, I’d love to hear them on github.
[1] At the root level of the output, there’s a field called format_version, which gets increased by 1 every time we change the definition of the types that get serialized.
[2] Assuming nothing gets released on New Year’s Eve.
[3] It turns out this support isn’t great. While writing this post, I realised that we only support discriminants on unit variants. This restriction has been lifted, and I’ve filed an issue and intend to fix it in the new year.
[5] Or HTML for that matter. Rustup has long distributed HTML docs for std as a component.
[6] Hopefully I’ll talk more about this in an upcoming post about my goals for rustdoc JSON next year.
[7] Measured on bbdca4c (most recent commit as of the time of writing) and 1e6ced3 (last change to src/test/rustdoc-json in 2021). Number of tests measured with fd -e rs | rg -v "auxiliary" | wc -l.
[8] This is only for the src/test/rustdoc/ suite, and doesn’t include ui, gui and std-json. But these are much smaller, and I’m trying to make a point about the rate of growth and the size of a mature test suite, not provide exact numbers.
[9] #93660, #94861, #98166, #98548, #99479, #101634, #101701, #103065, #105027, #105063
[10] #92860, #93132, #93954, #97599, #98053, #98195, #98390, #98577, #98611, #98681, #100299, #100325, #100582, #100630, #101106, #101204, #101633, #101722, #101770, #101914, #103653, #105182