Add process-level buildroot validation to NailgunExecutor.

Review Request #3393 — Created Jan. 29, 2016 and submitted

2659, 2855
jsirois, nhoward_tw, patricklaw, stuhood

In the case of stale pids for nailgun processes, when a given pid is checked for liveness and the new process is also a java process it can pass the process executable name check (java == java) but not actually be a nailgun process owned by the buildroot. Because the NailgunExecutor fingerprint also does not match a given new fingerprint (None != $NEW_SHA), the current logic treats this as a "I am running but have a new fingerprint and thus need to restart" case and attempts to kill the stale pid before starting a new nailgun. This can cause unintended behavior such as permissions related exceptions as seen in #2659 or worse, the killing of unrelated java processes.

This change addresses this case by adding an inline check for a matching nailgun buildroot argument in the NailgunExecutor.is_alive() check.

Fixes #2659.

CI is green @

Additional manual testing:

While monitoring with ps:

1) `./pants clean-all && ./pants compile testprojects/src/java::` in `~/dev/pants` which starts nailgun owned by `~/dev/pants`.
2) `cp -avf ~/dev/pants ~/dev/pants2 && rm -rf ~/dev/pants2/.pants.d` to wipe cache without removing `.pids` (creating a stale pid situation)
3) `cd ~/dev/pants2 && ./pants compile testprojects/src/java::`

Verified that new nailgun processes are launched for the second buildroot and the first buildroot's processes are untouched.

  • 0
  • 0
  • 1
  • 1
  • 2
Description From Last Updated
  1. Looks pretty good. I've got a couple comments.

  2. I feel like this could cause issues with tests that set their own buildroots if multiple are used within the same process.

    1. This is existing code that has been around since before I started working on pants - I've only just changed the argument name in this review. Wouldn't the types of tests you describe be running as integration tests anyway - i.e. in completely separate pants contexts?

      How is this any different than just using get_buildroot() directly as we do throughout the pants codebase? I'm not sure I see the case you're describing.

    2. Most direct usages are within a method scope. There are only two direct calls in static scopes in the whole project. This one and the one in tests/python/pants_test/android/tasks/ That said, I think it's only an issue for tests that run nailgun that are not integration tests. There are a few of those, but since we turn off nailgun in travis, it shouldn't affect ci.

      The problem wasn't introduced by this change, it's just that I noticed it while looking at it. Wrote up an issue

    3. ah, I see what you mean. I'll punt on this RB as out of scope and circle back to clean up separately for #2859 - thanks for filing.

  3. Please add a test for this.

  1. Yay!

  2. return not (
      process or
      # Check for walkers.
      process.status() == psutil.STATUS_ZOMBIE or
      # Check for stale pids.
      self.process_name and self.process_name != or
      # Extended checking.
      extended_check and not extended_check(process)

    I believe that's equivalent and easier to read, given the cases in this method. If it's True, we return False. If it's False, we return True, but that's what we would have done anyway in the fall-through below (now you can remove that too).

    And if there's an exception the behavior is unchanged.

    1. sgtm, but I think we still need the not in the top-most check (not process), otherwise it'll skip the remaining checks in the boolean expression as process is expected to be truthy if the pid exists. and afaict, the two lower and cases would still need parens.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

thanks Nick & Patrick! this is in @ f9c03075bcf5bebeb9bea1d21d2489d2b4c3b9f0