Add timeouts to tests

Review Request #2919 — Created Oct. 2, 2015 and submitted

benjyw, jsirois, kwlzn, patricklaw, stuhood

Add timeouts to tests.

- A per-target test timeout can be configured in the BUILD file with timeout=SECONDS
- Because pants may run many targets tests as once, the timeouts for all the tests are summed for the entire run of tests timeout
- If one of the targets has no timeout, then all the tests run with no timeout
- A default timeout can be configured with the --timeout-default option
- Timeouts can be [disabled]/enabled with --[no-]timeouts
- Sets the default timeout to 1 minute for java & python tests in pants.ini

./pants test tests/python/pants_test:all
./pants test tests/python/pants_test/backend/jvm/tasks:junit_run_integration

- Added tests:
* tests/python/pants_test/util:timeout
* tests/python/pants_test/backend/python/tasks:pytest_run
* tests/python/pants_test/backend/jvm/tasks:junit_run
* tests/python/pants_test/backend/core/tasks:test_task_mixin

Pull Request:

  • 0
  • 0
  • 21
  • 4
  • 25
Description From Last Updated
  2. nit: we prefer using .format() style string formatting in OSS pants.

  3. We don't usually update the copyright year for existing files.

    1. I created test_task_mixin in the previous RB. I'll correct the year in that one, where I create it.

  4. can you please specify the unit of the timeout (e.g. seconds, minutes, aeons...)

  5. nit: whitespace

  6. I'm not sure this is what we want. If we set a default timeout of 600 seconds, we get 600 * the number of targets we're running tests on?

    1. The core problem with my current implementation is that I didn't wrap each target's test with the timeout, I only wrapped execute(), which runs all the targets' tests. Therefore if you have 3 targets, each one has a timeout of 60 seconds, you end up with a 180 second timeout on all three tests, rather than 60 second timeouts on each test.

      I'm not sure if you're addressing this issue directly or something related to default timeouts.

      My previous attempt at an implementation which would wrap each target specifically required mucking about in, which was pretty ugly. I think it would also be hard to expose, in a language-neutral manner, to the new test task mixin each individual test target run rather than just the call to execute().

      I think what I have is a reasonable first step, but not a great solution.

    2. If this is going to be such a coarse grained timer, then let's just set it absolutely, not try to scale by the number of test targets.

    3. There are a few ways to do this -- appreciate everyone's feedback on what they prefer. It will be easier explained with an example.

      Suppose we have three targets, one has a timeout configured of 10 minute, another 4 minutes, and the last has no timeout set, and the default timeout is 5 minutes.

      1) Use 0 for targets without a timeout, if at least one target has a timeout configured:
      - 14 minutes (14 minutes for the targets with timeouts, 0 for the target without a timeout)
      2) Use the default timeout for targets without a timeout:
      - 19 minutes (14 minutes for the targets with timeouts, default timeout for the target without a timeout)
      3) Use the default timeout for the entire run, if at least one target has no timeout configured:
      - 5 minutes
      4) Use the max of the default timeout and the total of all targets with timeouts configured
      - 14 minutes
      5) Use the max timeout of all the targets, using considering the default timeout for targets without a timeout set
      - 10 minutes

      The current implementation is (2). I'm good with (2) and (4), but willing to be convinced about (1), (3), (5), or any other options that I may have missed.

    4. Sorry if I miss anything, but why not make the default target configurable? Meaning - supplied via command line arg.

      Please let's not do a "sum timeout". Timeout is per target, never as a sum for a bunch of targets.

    5. I want what Itay described, but if not then maybe the maximum of all the timeouts (5)

    6. I want what Itay wants as well, but I think that is intractable, given that the language specific code is in charge of running the tests for each target so I don't know how possible it is to get in there and get that specific, without introducing tons of language-specific timeout code, which is what we've been trying to avoid here in the first place with the test superclass and whatnot.

      The max is reasonable too. Itay what do you think?

    7. If we are having limited timeout support, what I would like is just the way to set a global timeout for the entire pants run and thats it. Having lots of confusing per-target overrides only not to enforce a per-target timeout is just too confusing IMHO.

    8. @Eric: I think executing individual targets independently is up to the language implementations at the moment: Garrett improved the situation there for junit, and if we can improve it further, we should. But unlike bazel/blaze (where this idea came from), pants does not invoke each junit test target individually (currently). As we refactor toward that approach, a sum seems reasonable.

    9. Also, note that the "10 minute" value is a strawman, and is not actually what you'd use. More realistic would be the 10 second range by default, with local overrides for more ridiculous cases. And in that case, a sum is not too scary.

    10. A global pants timeout is a good idea, that's something we're implementing as well. However, we're doing that outside of pants. I don't think it really makes sense to put a global pants timeout inside pants, because if pants itself hangs, then the timeout won't trigger anyway.

    11. I'll drop my objections assuming I need to do the same thing and implement a global timeout in a wrapper script.

  7. Could you shorten this value? this one test adds 5 seconds to our test run.

  2. src/python/pants/backend/core/targets/ (Diff revision 2)

    It's very convenient having this as a pure mixin, but I think using tags for this is a bit of an abuse. This isn't really a string, and I could also imagine it being a property that junit_run might want to invalidate based on.

    Also, not sure I like the usage of a target macro here: this should be a real parameter (in the conversation we had, I was suggesting target macros for internal usage: not upstream).

    1. I see, I misunderstood. I've changed it to be a regular parameter.

  3. Please include kwlzn on this review. He's put some thought into timeouts.

  4. IMO, namespace these:


  2. pants.ini (Diff revision 3)

    Should this be closer to 1 minute? I don't think pants itself has any individual test targets that take longer than a minute.

    1. Yeah I wasn't sure what to put in here, so I put in 10 minutes.

      This depends to a certain degree how we resolve the other issue of max vs sum etc.

      If the timeout set here is supposed to cover all targets (ie if we choose the max approach), then maybe 10 minutes is appropriate, but if we're summing, then probably 1 minute (or less?) is appropriate. (It's true that in the current rev I have 10 minutes and I'm doing a sum, so I should change either this or that.)

  3. I think that giving this a constructor means it is no longer a mixin.

    How about just having the timeout property be abstract, such that implementations have to fill it in some way?

    1. Some research leads me to believe that this is ok: : "There is nothing special about the init method with respect to inheritance and the MRO, it all applies the same to all methods in a class."

      All the tests succeed, so this isn't smashing any other constructors that need to be called.

      I feel like this is cleaner than doing an abstractproperty, but as noted before, I'm new to python multiple inheritance & mixins, so I'm happy doing this with an abstractproperty, if you'd like.

    2. I am new to it as well, so I won't pose a strong objection. If Benjy or Patrick could chime in, it would be great.

    3. I'd like Patrick to chime in as well, but I think this is problematic. It's true that init methods are just methods, but unlike other methods, we must always explicitly call superclass init in our init, and that's where things can get sticky: You don't know where you are in the MRO, so you don't know what args your MRO superclass init method requires. You can get away with the **kwargs trick for a while, but it's fragile. For one thing, none of your callers can pass in *args, and you're relying on a lot of discipline around where things are mixed in so that they end up in a good place in the MRO. I'm pretty sure that mixins should never have init methods.

      But please reach out to Patrick for a more definitive opinion.

  2. return getattr(target, 'timeout', None)

  3. Prefer to short circuit and early return when self.get_options().timeouts is False

  4. The if here will trigger one timeout = 0 which seems like a thing someone might do, even though it's crazy. Prefer if blah is None

    1. My intent was that timeout=0 mean no timeout, not that it would be zero timeout. I.e. it's the same as not setting a timeout. That seems to be fairly standard practice, for example pytest-timeout.

    2. Dropping this, but raised a separate issue that I think will clean up these few lines of code.

  5. This is weird. You should have already filtered this above, is it possible to trigger this line of code? If so it'd be good to see a unit test confirming it does what you expect.

    1. If the default timeout is 0 or None, and there are targets without a timeout configured, then the timeout would be disabled. I've added more comments to make this more clear.

    2. Ditto above.

  6. This raises KeyboardInterrupt? Crazy

    1. Yeah this is ugly. I've changed it so that the Timeout raises a TimeoutReached exception, which is caught.

  2. Since the return short circuits, I think it makes it more readable to remove the else and tab back the rest of the code.

  3. This list comp bothers me, mostly because a ternary statement within a list comp reads at first like a filter.

    Indeed, I think it would make sense to collect with the filter (but not the else), then if the list is empty, return the default (or None). If the list isn't empty, then a real timeout was set and it makes sense to return the sum.

    1. Thanks for all the feedback Patrick.

      This next diff isn't exactly what you suggest, because I believe the logic we are looking for is different, which relates to the other discussion. After thinking further and considering that the path forward on this would be to give us finer granularity timeouts, I do believe that the logic as currently implemented is the right behavior.

      With the logic you suggest, targets with no timeout set don't get included in the timeout sum, even if there is a default.

      In the current logic, their contribution to the sum is the default timeout, if one is set. If there is no default, then if at least one target has no timeout, then the entire run has no timeout.

      This diff splits out the single list comprehension into two list comprehensions to make it more clear what is going on.

    2. Agreed, the second diff clarifies the logic. I had one last nit to get rid of the ternary, opened an issue below.

  4. src/python/pants/util/ (Diff revision 5)

    if isinstance(type, KeyboardInterrupt): ...

    ... though I'm not sure if there are actually subclasses. Also I strongly recommend naming type something like type_ so it doesn't shadow a builtin.

    1. For some reason isinstance(type_, KeyboardInterrupt) is not working, but type_ is KeyboardInterrupt is working.

    2. Drive by - but if type_ is like its name says, its not an instance but a type, so isinstance(type_, type) == True and what you want is issubclass(type_, KeyboardInterrupt); although, your is solution is probably fine here.

    3. Oh, because type_ is a type (who knew!). So you want issubclass

    4. ah right, of course. Thanks John!

  2. timeout or timeout_default for timeout in timeouts gets the same behavior but avoids the ternary.

  1. Despite the drive-by I'll step away from this review.
  2. src/python/pants/util/ (Diff revision 7)

    I think this should raise if the timer's action has been triggered.

    There's no guarantee that the main thread is the thread that the timer started on. In which case, the thread that was executing won't see an error with the current implementation. Depending on how the threads are scheduled, that could result in unexpected behavior.

    1. Yeah I was concerned about this at first but figured that since all the tests worked, it was ok.

      This new version is modified to not catch KeyboardInterrupt but simply check if _triggered was set, and it that's true, then it will raise. Of course if the interrupt doesn't work, then the long-running task won't actually get interrupted, so that is bad, but maybe this is the best option? I could not find any way to interrupt a specific thread using python's thread timer.


  2. Periods at the ends of sentences. Please remember this for future reviews, as this isn't the first time I've mentioned it. Look at the code around you for guidance.

  3. src/python/pants/util/ (Diff revision 9)

    First line of docstring goes immediately after the triple-quote.

    Please pay attention to this kind of stuff.

  4. src/python/pants/util/ (Diff revision 9)

    Raises KeyboardInterrupt.

  5. Tests that rely on system clock behavior are both sketchy and slow.

    I recommend allowing plugging a fake clock implementation into Timeout (one where you set the "current time" instead of the system clock), and using it in this test.

    1. I've mocked threading.Timer in and, but I don't know if that is possible in and Do you have any suggestions for how I could mock it in those tests?


  1. Thanks Sameer.

  2. src/python/pants/backend/core/tasks/ (Diff revision 10)

    For many users, the 'help' is the only explanation of a feature... would be good to expand a bit more on the effects of these options here.

    Also, the timeout is set in the BUILD file, yes. But it would be clearer to say that you mean the timeout set on the target.

    1. Fixed, and further explanation added.

  3. src/python/pants/backend/core/tasks/ (Diff revision 10)

    Would it be easier to have 0 be the default/magic value that indicates "no timeout", and remove the --timeouts flag? Would prevent this ambiguity.

    1. That would actually be different behavior. If --timeouts is true and the default timeout is zero, then targets with no timeouts would still have no timeout, but targets with timeout will still have a timeout, as long as there are no targets in the same run without a timeout. If --timeouts is false, then no targets will have a timeout enabled. The text I wrote in the help should make that fairly clear.

  2. src/python/pants/util/BUILD (Diff revision 10)

    This dep is in the wrong place. I assume you meant to put it on the test target?

  3. src/python/pants/util/ (Diff revision 10)

    Could you implement this whole thing more simply as a contextmanager? This is the idiomatic way to creat generators.

    def timeout(secs):
      # Set up and start timer here.
      # Check timer status here.

    Or will this not work for some reason?

    1. I didn't realize that contextmanager was preferred, so I made a quick attempt to do it with contextmanager just now, but I believe the code is cleaner and easier this way. I think it more accurately represents what the various elements of the logic are. It would also make the mocking harder.

    2. Writing generators manually is error-prone, and @contextmanager really is the recommended way of doing it.

      I'll patch in this RB and see if I can hack up the kind of thing I'm thinking of in a few minutes.

    3. OK, after 10 minutes of messing around with this, I think your version with explicit __enter__ and __exit__ is fine. The @contextmanager version would be simpler in some ways, but it would have two complications: 1) It would have to deal with the KeyboardInterrupt explicitly (because the contextmanager will re-raise it at the yield point), and 2) You can't overwrite a local variable in an inner function, so it would need to be in some container.

      I.e., I can't do this:

      triggered = False
      def handle_timeout():
        triggered  = True  # This is a separate var that shadows the outer one.

      We'd have to do something like:

      triggered = [False]
      def handle_timeout():
        triggered[0]  = True

      And that's just ugly.

      I have a couple of remaining comments, and then I will shipit this.

    4. Thanks.

      I was seeing this KeyboardInterrupt getting raised in my implementation, but I didn't understand why it was happening. That makes sense now.

  4. threading.Timer is an internal implementation detail of Timeout. It's sort-of OK for its own unittest to know that (although I might have passed the Timer implementation in explicitly, with a default of threading.Timer), but it's definitely not good for this test to know about that level of detail. Ideally this test should mock out Timeout itself. Ditto the two other tests you mentioned.

  5. tests/python/pants_test/util/BUILD (Diff revision 10)

    You need that dep on mox here.

  2. src/python/pants/util/ (Diff revision 10)

    This should pass some useful error message to the base class.

  3. src/python/pants/util/ (Diff revision 10)

    surely raise TimeoutReached()? I'm surprised this works.

    1. raise can take either the class or the instance ( I've changed it to pass a useful message per your other suggestion, so it's not relevant anymore.

    2. Ah I see, this only works by accident, it's shorthand for (ExceptionType, <empty arg list>), but you should definitely always raise ExceptionType(<arg list>).

  4. This is very confusing to me. Why should the test code call the handler explicitly? That doesn't seem like a good emulation of what "real" code would do. But maybe I'm not fully grokking this code. I find magic mocking systems hopelessly distracting and confusing, and avoid them when possible.

  1. OK, just a couple more small things, then shipit! Thanks for being diligent about responding to comments.

    1. Thanks! I appreciate all the detailed feedback and help with this.

  2. src/python/pants/util/ (Diff revision 11)

    Add a comment explaining that this var is unprotected by a mutex because boolean get/set is atomic in any Python implementation we're ever likely to care about.

    It's important to emphasize to the reader that you have thought about the concurrency aspects here. Otherwise someone comes along and naively changes your bool to some more complicated type, and things break...

  3. I would move this out to the top level of the module (and name it _make_fake_timer) so that the nested class's self doesn't shadow the outer class's self. It's needlessly confusing.

    I really like this explicit fake timer, instead of some mocking magic, though.

    1. I moved FakeTimer out to the top level of the module, but I left _make_fake_timer in the class, because I need to store a ref to the FakeTimer that got created. It still cleans up the self shadowing confusion.

  2. src/python/pants/util/ (Diff revisions 11 - 12)

    A) Please end a sentence with a period.

    B) This comment isn't quite right. Python says nothing about atomicity. Python implementations do. In practice, all Python implementations we care about will support this.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as 93edf2f0f1dc397cd6fa5cace520ed2d8f4a3261