[engine] Native scheduler implementation

Review Request #4270 — Created Sept. 28, 2016 and submitted

3821, 4030
benjyw, jsirois, mateor, nhoward_tw, yujiec

This is the preliminary review of a native (in rust) implementation of the inner loop of the v2 engine.

See the Pants Native Engine document for motivation for the use of an additional language.

  • Add a native implementation of most of {scheduler,nodes,selectors}.py, with access to the native binary provided by binary_utils in engine/subsystems/native.py
  • Replace legacy/graph.py's usage of ProductGraph.walk with an explicit request for transitive dependencies implemented via SelectDependencies(.., transitive=True). This avoids leaking the implementation details of the ProductGraph, and keeps the API boundary smaller.

The interface between the native code and python is largely defined in src/rust/engine/src/lib.rs and src/python/pants/engine/subsystem/native.py, which (respectively) define the C-compatible public API, and the CFFI definitions to consume it.


See https://github.com/pantsbuild/pants/labels/native for a list of tests that are being temporarily disabled in order to expedite landing this.

  1. I'm wicked excited about this! That said, I fly out of town tomorrow and don't get back until late on 10/5 so I'm not going to get a chance to review this in the short term. I'll step back from this review as a result but I'll be eagerly looking at the code when I get back to my desk.
  1. Fine, I will learn enough Rust to review this... :)

    Meanwhile, I added some comments to the google doc.

  1. incredibly, incredibly stellar work on this Stu! overall, looks awesome - the rust code is very clean and clear mod the initial overhead of learning rust's syntax and mechanics (which wasn't bad given their docs and the prior frame of reference from the python implementation). really liking rust and stoked about the potential it brings to the table for pants overall.

    had a handful of misc comments and one general ask: overall, I found some sections of the rust code to be a little undercommented for my taste. I think esp since rust is super new to most folks in the pants' community, it may make sense to edify this a bit and err on the side of being overly verbose - particularly for some of the "small footprint that packs a lot of punch" statements. there were at least a few times when a naturally stated comment could've concisely explained something that otherwise took a bit of referencing to piece together, so overall it seems like more healthy commenting/docstring summary style comments would go a long way in making the code more approachable for rust newcomers.

    also, it looks like this change will regress the engine a couple of areas (traces, possibly invalidation? etc), so it'd be great to carve out tickets for those to track the gaps that we'll need to backfill.


  2. pants.ini (Diff revision 2)

    could this get sync'd out to a bintray dir for general use vs hardcoding your home dir?

    1. Better - in follow-up short-term changes a prep_command can be used in the right places to run rustc. This might be inspiration - uses prep_commands to ensure aurora's custom thrift binary is compiled and used: https://github.com/apache/aurora/compare/master...jsirois:jsirois/pants/thrift/local_binary?diff=split&expand=1&name=jsirois%2Fpants%2Fthrift%2Flocal_binary

  3. src/python/pants/engine/engine.py (Diff revision 2)


  4. src/python/pants/engine/engine.py (Diff revision 2)

    any specific reason for halving this?

    2*cpu seems like a fairly common practice to take advantage of modern systems w/ hyperthreading etc - but maybe there's added overhead to this approach w/ the engine?

    1. Deleted in favor of doubling down on moving the engine loop to rust: https://github.com/pantsbuild/pants/issues/4024

  5. src/python/pants/engine/fs.py (Diff revision 2)

    consider functools.update_wrapper here and above (unless you specifically need alternate naming):

    p = functools.partial(func, project_tree)
    functools.update_wrapper(p, func)
    return p
    1. Thanks, good to know... I do want the explicit naming here though.

  6. src/python/pants/engine/legacy/graph.py (Diff revision 2)

    probably another candidate for assert + AssertionError?

  7. src/python/pants/engine/scheduler.py (Diff revision 2)

    nit: wrapped param indents +/-1 throughout

  8. src/python/pants/engine/scheduler.py (Diff revision 2)

    this might make more sense as an assert?

  9. src/python/pants/engine/scheduler.py (Diff revision 2)


    1. This is now implemented.

  10. src/python/pants/engine/scheduler.py (Diff revision 2)

    this comment is still probably useful?

    1. Agreed... restored.

  11. src/python/pants/engine/scheduler.py (Diff revision 2)

    I'm guessing this is useful during development, but maybe you could just wire it up to an env var that would dump this to the specified file ala PANTS_PROFILE?

    1. Added a subsystem.native option to visualize each run into a specified directory (--native-engine-visualize-to=$dir)... ie:

      PANTS_NATIVE_ENGINE_VISUALIZE_TO=$dir ./pants --enable-v2-engine list ::
  12. src/python/pants/engine/selectors.py (Diff revision 2)

    nit: indent

  13. should these be using cffi "new style callbacks" (http://cffi.readthedocs.io/en/latest/using.html#extern-python-new-style-callbacks) vs the "legacy" style?

    1. Maybe... talked about this a bit here: https://docs.google.com/document/d/1C64MreDeVoZAl3HrqtWUVE-qnj3MyWW0NQ52xejjaWw/edit#heading=h.of0mlrd2gsjd but it imposes another build step, so it's not a crystal clear choice. Opened https://github.com/pantsbuild/pants/issues/4036

  14. src/python/pants/engine/subsystem/native.py (Diff revision 2)

    the general idiom for using reserved words/builtins is a _ suffix, e.g. id_ vs _id - here, it looks like you're trying to use the "private" convention which reads odd for locally scoped vars.

    1. Thanks. Think I got all of these.

  15. src/python/pants/engine/subsystem/native.py (Diff revision 2)

    should this preserve order in the case of merge=True?

    1. Good eye! We had a unit test that it was causing to fail, so it was fixed before merging. =D

  16. src/rust/engine/README.md (Diff revision 2)

    might be useful to quickly doc how to install rust (and thus cargo) here for folks who may not be familiar.

    1. The builtin bootstrap that John implemented should handle this in the default case. There is more to do around releases, but John is going to tackle that on https://github.com/pantsbuild/pants/issues/4035

  17. src/rust/engine/src/core.rs (Diff revision 2)

    is this necessary given the bare block?

    if so, what purpose does it serve?

    1. I looked it up. Eq has no methods. It's a marker to the compiler that comparisons for that type are equivalence relations, ie that == and != are strict inverses, and are reflexive, symmetric and transitive. You can have the compiler do it for you using [#derive(Eq)], but that requires all of the fields to be PartialEq.


    2. Yep!

  18. src/rust/engine/src/graph.rs (Diff revision 2)


  19. src/rust/engine/src/graph.rs (Diff revision 2)

    should these other string literals get the b"" treatment too?

    1. The other ones are preprocessed by format_args! to convert them into dynamically sized byte strings... in cases where the b is prefixed, it is because the compiler requires it (since the method requires raw/ascii bytes). Strings may not be easy in rust, but they're hard to do wrong!

  20. src/rust/engine/src/lib.rs (Diff revision 2)


  21. src/rust/engine/src/lib.rs (Diff revision 2)


  22. src/rust/engine/src/lib.rs (Diff revision 2)

    it's not clear how well println would work in the case of e.g. the daemon + stdio redirection - could this pass the error in some form back across the calling boundary and let the python side decide how to handle?

    1. Yea, it should. Will add a TODO here, as I don't think the convention for errors at the boundary is established yet... relates to https://github.com/pantsbuild/pants/issues/4025 prolly.

  23. src/rust/engine/src/nodes.rs (Diff revision 2)

    I wonder if it'd make sense to adopt a styleguide policy of explicit return in these cases (e.g. as used below) just for consistency/readability sake?

    1. Not really in favor... there are already multiple signals that this is a return value. For one: no semi-colon. This value could only be being returned here, as otherwise the compiler would complain about a statement being used in a position that is expecting a value.

  24. src/rust/engine/src/scheduler.rs (Diff revision 2)


    1. The blocks of a match statement always end in commas (the compiler enforces it).

  25. src/rust/engine/src/selectors.rs (Diff revision 2)


  2. src/rust/engine/src/core.rs (Diff revision 2)
    ... and we don't want to box these off to the heap?
    1. Yea, but the issue is more that it is infinitely recursive. The type of a type of a type, etc.

  3. src/rust/engine/src/core.rs (Diff revision 2)

    How about leveraging Default?, This does it I think, a bit more standard, saves some code:

    #[derive(Clone, Copy, Debug, Default, Eq, Hash, PartialEq)]
    pub struct Id {
        key: u64,
    pub type TypeId = Id;
    #[derive(Clone, Copy, Debug)]
    pub struct Value {
        handle: *const libc::c_void,
        type_id: TypeId,
    impl Default for Value {
        fn default() -> Self {
            Value {
                handle: ptr::null() as *const libc::c_void,
                type_id: TypeId::default(),
    #[derive(Clone, Copy, Debug, Default)]
    pub struct Key {
        key: Id,
        value: Value,
        type_id: TypeId,
    1. Ironically, I think I like default more for this case than for the scheduler/graph/etc, despite the fact that a "default" key could potentially be ambiguous. Unsure why! But yea, will add.

  4. src/rust/engine/src/core.rs (Diff revision 2)
    Talked about this offline, but it sounds like these accessors are intentional to buy future flexibility - may make sense to `#[inline]` at some point but it'd also be nice to not go down the road too early and be looking over our shoulders at ghosts all the time.
  5. src/rust/engine/src/externs.rs (Diff revision 2)
    2x /,/, / based on formatting in core.rs anyway.
    1. I wonder if early adoption of https://github.com/rust-lang-nursery/rustfmt or similar would make sense?

    2. Done.

  6. src/rust/engine/src/graph.rs (Diff revision 2)
    It looks like this can derive Default. Its not much, but `ensure_entry_internal` could save some boilerplate and only specify the 2 non-default fields in the new entry case if this had a Default impl.
    1. AFAICT, to use Default, there must be a fully specified set of defaults, which you then selectively replace:

      let options = SomeOptions { foo: 42, ..Default::default() };

      I don't think there is a reasonable default Node, so I think that Default would be a stretch here.

  7. src/rust/engine/src/graph.rs (Diff revision 2)
    Easier for now, or do you have a method you're using to determine when to bail? This is public, so my gut is wary of unchecked unwraps by default.
    1. The implication was that EntryIds are only "created" by the graph, and so given an EntryId, it must have come from the graph (rather than just being a random value generated somewhere else). But since this is just a type alias for usize, I don't think we're doing enough to enforce that. Will change EntryId to a struct tuple with private fields, to help enforce that it isn't just some random value.

      It is still possible that graph/filesystem invalidation happening outside of the scheduler lock could mean that an entry does not exist: https://github.com/pantsbuild/pants/blob/40c214aa1c044a02b5290aa3ec7cc9570dbe24ad/src/python/pants/engine/scheduler.py#L61-L62 ... but I don't know of a good technique to defend against that yet.

  8. src/rust/engine/src/lib.rs (Diff revision 2)
    Why can't these be done inline above where RawNodes is initialized?
    1. This is an unsafe operation, doing a thing that is not otherwise possible in rust. AFAIK, it is impossible for a rust struct to safely reference itself, as that causes a cycle in the ownership graph.

      Because we need a raw pointer to a value in the struct, we start by initializing it to a meaningless value (since we don't know where it will be in memory during struct construction), and then afterwards we update it to be valid. Will expand this comment.

  9. src/rust/engine/src/lib.rs (Diff revision 2)
    The Box functions used below pretty well stand in for the comment.
  2. src/rust/engine/src/nodes.rs (Diff revision 2)
    I have no useful advice quickly, but is there not any way to collapse the match here and those below with the right use of default trait implementations and parameterization?
  3. src/rust/engine/src/scheduler.rs (Diff revision 2)
    Can Scheduler derive Default?
  1. Thanks Stu - happy to see this get going and land when you're ready.
  2. src/rust/engine/README.md (Diff revision 2)
    This link is bad.
    1. Fixing in followup.

  2. dylib is OSX-specific - lib?
  3. Bot why do this at all and not just use the globally configured values? This is a (platform specific) binary, its just not executable.
    1. Hm... this was cloned from another user of binary_util... it might not be necessary?

    2. I suspect not.
    3. Erm, no its needed - sorry about that.  But sticking with 'bin/native-engine' or 'lib/native-engine' would still be better than using 'dylib'
  2. build-support/bin/ci.sh (Diff revision 8)
    Right or wrong, running through the pex was intentional to prove out pants in the form used by Twitter/Square (and maybe Foursquare - not sure how they run). A comment or an issue to circle back would be good.
    1. Good point... will open an issue.

  3. build-support/bootstrap_native (Diff revision 8)

    What's the motivation for using a nonstandard binary path? For one (minor), this allows garbage to grow in the native-engine/ dir. I'd think a small smih binary added to the native-engine that could print its version would be enough to eliminate the readonly VERSION=0.0.1 if that had something to do with this.

    1. I experienced issues where the engine was not re-bootstrapping, but I now suspect that they were related to the use of a symlink into the working copy (which might have been git-cleaned or moved since then).

      Yes, this will cause growth of the cache directory for pants developers. But I always feel very uncomfortable using the mutability of binaries to accomplish stuff like this (ditto mutable=True on ivy artifacts).

Review request changed

Status: Closed (submitted)

Change Summary:

Merged as 40c214aa1c044a02b5290aa3ec7cc9570dbe24ad