Ability to filter list options.

Review Request #3997 — Created June 10, 2016 and submitted

benjyw
pants
pants-reviews
nhoward_tw, stuhood, zundel

Generalizes the "EXTEND" functionality of ListValueComponent to "MODIFY". Each ListValueComponent now stores a list of appends and a list of filters, and these get applied when the final val is requested.

Introduces two new forms of syntax for values of list options:

  1. -['foo', 'bar'] filters those values from the running list computation.
  2. ...,...,... splits the value on the commas and merges the results of evaluating the ....

The latter allows you to both append to and filter a list in, say, pants.ini:

list_option: -[1,2,3],+[4,5,6]

Added tests of the new syntax and functionality. All options tests pass. Full CI running here: http://jenkins.pantsbuild.org/job/pantsbuild/job/pants/branch/PR-3583/

  • 0
  • 0
  • 4
  • 0
  • 4
Description From Last Updated
  1. 
      
  2. does this work? would '|' gets interpreted as pipe in cli?

    [tw-mbp-yic pants (fiddle_3997)]$ ./pants --listy=|+[8,9]|-[6]|
    > 
    

    seems that they need to be quoted
    ./pants options --jvm-test-junit-options='|+[8,9]|-[6]|'

    1. They need to be quoted on the CLI, but not in the test.

  3. 
      
  1. 
      
  2. src/python/pants/option/custom_types.py (Diff revision 1)
     
     
     

    one liner instead?

    1. Ooops yes, that was from when I had some debug printing going on. Fixed.

  3. 
      
  1. I don't know what the | characters in the syntax adds here. Wouldn't +[a,b]-[c] work just as well?

    1. That would be harder to parse. E.g., +[1,-2]-[-3].

      Today we could do it with a regex, because we don't allow lists of lists. But if we ever do...

      The bars make this really easy to parse in all current and future cases (except for strings with bars in them, which I have declared are not supported by fiat...)

      If you feel strongly about it, I can replace this with a more complicated parser, but that seems like overkill for the benefit.

      However I will add this to the docsite, forgot to do so in this change.

  2. 
      
  1. 
      
  2. src/python/pants/option/custom_types.py (Diff revision 3)
     
     

    I see how pipes make it easy here, but I think this is just assuming that the | character won't appear within a string. If you wanted to assume that a | character did appear in a string, I don't think it would be any easier to parse than adding real logic that understood the syntax of the list.

    1. Now that I think of it, since strings can also contain square brackets, and they need not be balanced, regexes won't suffice. That "real logic" requires at least a context-free language. This seems like massive overkill.

    2. I'm pretty comfortable saying that if a need ever arises to use pipes in strings in lists while using this syntax then we'll fix it. Note that you can still use pipes in strings in lists in all other cases. Just not if you want to use this syntax with them.

    3. That said, I'm checking if there's something simple that can be done by parsing into a Python AST.

    4. Alas, there's nothing simple. Both the st and ast modules yield very complicated structures, and reasoning about them would be pretty error-prone.

    5. What if we were to allow two entries:

      --list-opt default is ['a', 'b']

      list_opt:-['a']
      list_opt:+['c']
      

      Yieleds ['b', 'c']

    6. That was my original desire, but how would we do that in config or env vars? This would only work for cmd-line flags.

    7. I don't know if its overkill or not, but there's the start of using a parser builder to be able to do this: https://rbcommons.com/s/twitter/r/3999/

    8. I'm not a fan of the surrounding |s, but dipping into parsers seems pretty heavy weight at this juncture. Having a reasonable looking syntax that we can imagine that we can parse unambiguously (at some point in the future if it becomes a problem) is the important part.

      To match the syntax that we have on the CLI for tags, etc., comma separated would make the most sense to me... ie: '+[1,2],-[3,4]'.

      In the meantime, one thing that might make regexes reasonably non-ambiguous (even without surrouding | tokens), would be to match the entire series of required tokens. For comma delimited, that would be something like r'\]\W*,\W*[+-]\['.

    9. The thing about the pipes is that you can tell that this is how you need to parse this value just by looking at the first character.

      Plus, pipes are rare characters that are unlikely to be encountered in practice, but commas are ubiquitous in lists, so this is much more error prone. Although it's true that commas do look nicer. And the regex you propose is unlikely to be defeated by real-world inputs.

    10. I haven't looked at the parser change yet, but changing our config parser is super out-of-scope for this change. And that would be a pretty big sledgehammer to crack a pretty small nut (and still doesn't help with env vars).

    11. The idea in this change isn't to affect the config parser. You operate on the string that you get back from the config parser so we don't have to introduce a gross syntax that we are I predict we are later going to regret. I haven't added in the + and - operator yet, so its just a POC of how to integrate a context free grammar into pants.

    12. Like I said, I haven't looked at it. But being able to support two instances of the same key in the config file, per your example of

      list_opt:-['a']
      list_opt:+['c']
      

      means changing the config parser itself, no? Or can the standard ConfigParser handle multiple instances of the same key? My browsing of its docs implied that it could not.

    13. Changed the syntax to the comma-separated one Stu suggested. Added copious tests for it, incuding an @xfail test to point out where the regex falls down.

    14. OK. Does the RB description need updating?

    15. Probably :)

  3. 
      
  1. I find the behavior here a little confusing. It feels more like a filtering function than a subtraction function in that the subtractions are accumulated and not overwritten unless the replace syntax is used. It could be used to create list option values that can't be added unless the replacement syntax is used.

    For example, if the environment specifies a subtraction, then a flag must use replacement syntax in order to drop the subtraction. Without it, attempts to add the value will be silently ignored.

    I feel like the docs should note that somehow, or it may be tricky to use. We could also warn if later additions were subtracted earlier, though that could get noisy. If it did that, at least the user would know that their option value was eaten by a filter specified somewhere else.

    I think if we want it to work this way, the behavior should also be specified in a test, maybe one like

    check([1,2,3,6,7], './pants --listy=8',
          env={'PANTS_GLOBAL_LISTY': '|+[6,7]|-[8]'}...
    
    1. Good point. We can fix this though, by checking for this case on addition and adjusting for it. Will add a test case and make it pass.

      The intention is for it to be a modification that doesn't survive a replacement operation, just like addition.

    2. Hmm, not as simple as I thought, because subtraction and addition aren't actually symmetrical (adding is actually appending, so order matters, subtraction is as you say more of a filter, and order cannot matter). I will change the terminology and clarify the docs appropriately.

      The use-case here is to be able to tweak defaults (such as the list of enabled backends) without having to repeat the included values. I don't want to get too bogged down in making this very general.

  2. src/docs/options.md (Diff revision 3)
     
     

    We could document the subtraction interaction with later additions like this.

    Subtractions are applied to values after the last override, so later additions may be ignored if they were previously subtracted.
    
    + `--foo=-[2] --foo=2` will yield `[1]`.
    
    They also remove all matching values. So if a subtracted value appears more than once, all instances will be removed.
    
    + `--foo=1 --foo=1 --foo=2 --foo=-[1]` will yield `[2, 2]`.
    
  3. src/python/pants/option/custom_types.py (Diff revision 3)
     
     

    Could you add a test that ensures multiple instances of a matching element are removed?

  4. Does -8 here mean subtract 8, or does it mean add -8?

    1. Ooops, that was supposed to be -[8].

  5. 
      
  1. Made a bunch of changes. PTAL.

  2. 
      
  1. 
      
  2. src/docs/options.md (Diff revision 4)
     
     

    trailing

  3. src/docs/options.md (Diff revision 4)
     
     
     
     

    Is this only within the same context (ie, passing the options twice on the same commandline), or across the entire parse (ie, passing the option once in pants.ini and once on the commandline)? The latter would be unfortunate, the former is probably fine.

    But I suppose there is always the escape hatch of fully replacing the underlying value... so probably not a blocker.

    1. The latter. I tried to clarify this in the docs by using the terminology of appends and filters, and the fact that filters take precedence.

  4. src/python/pants/option/custom_types.py (Diff revision 4)
     
     
     
     
     
     

    Does this need to strip or ignore whitespace?

    1. No, we always acted on the first character(s) to determine how to interpret the value.

  5. tests/python/pants_test/option/test_options.py (Diff revision 4)
     
     
     
     
     

    Hm, this is a bummer.

    Is there a step that we could execute between the execution of each rank that would allow for the expected behaviour?

    It seems like currently we're concatentating all actions across all ranks in order, as opposed to executing all operations within a rank, finalizing the rank into a value, and then using the result as the input to the beginning of the execution of the next rank.

    I think the same thing applies to config files... if I have two, I'd want the second one to be able to add a thing that the first had removed.

    1. But, more generally, I wonder about the need for associativity here. Do we lose ordering somewhere? If we're able to preserve ordering, it seems like simply executing all steps in precisely the order they're received from each rank (without doing any sort of aggregating) would work.

    2. I really, really, really don't want to make this that complicated... In the custom type we know nothing of ranks.

      I think that trying to make this general-purpose is making the great the enemy of the good. There's a specific use case this is designed to support, namely the ability to use your pants.ini to remove entries from a default value (e.g., to make it easy to not register default backends that you don't care about). So this is quite a bit more general than that, but I don't see the need to support contrived use-cases. I think using the language of "appends" and "filters" (h/t Nick) should help explain that you can't override a filter.

  6. 
      
  1. Ship It!
  2. 
      
  1. Ship It!
  2. 
      
  1. Won't block this: it's an improvement. But if it's possible to treat the "impossible to re-add a value" as a weakness with an associated github issue, that would be good.

    Thanks!

    1. I just don't necessarily agree that it's a weakness. For every use case for which this is desirable we could come up with one where it isn't.

    2. Although it might be more intuitive, true.

  2. 
      
Review request changed

Status: Closed (submitted)

Change Summary:

a5c25d2b17740d449de3bb24780ad1b5d4e63f7f

  1. Submited as a5c25d2b17740d449de3bb24780ad1b5d4e63f7f. Thanks all!

  2. 
      
Loading...