Simplify pants configuration

Review Request #67 — Created March 6, 2014 and discarded

travis
commons
pants-reviews
jsirois, lahosken, patricklaw
Simplify pants configuration by introducing single location to define built-in options. We can generate documentation from this file too so users know what's tunable. This was inspired by http://svn.apache.org/viewvc/hive/trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java?view=markup which I used in a former life and it worked quite well.

Many currently required pants configuration keys (primary offender: workdirs) could have sensible defaults. Migrate one such option to have a sensible default.

If folks are okay with this approach I'll migrate things en masse. This is a step towards my long-term goal of "touch pants.ini" being sufficient to use pants in a repo.


  • 0
  • 0
  • 4
  • 0
  • 4
Description From Last Updated
JS
  1. This collides with just posted: https://rbcommons.com/s/twitter/r/69/
    See what you think of that - part of yak shaves to get internal pants build/publish automated.
    1. https://rbcommons.com/s/twitter/r/69/ looks good and we should ship that. Any objection to this general idea after merging in your change? The key part is centralizing the "tuneable stuff" in one place so its easy to keep track of and generating docs from.
    2. I like the idea of being able to generate docs but this mechanism does not scale.  The base package is only available to pants and not extensions - like we have at Twitter.  Can you come up with something that works for the core - few fixed options (run tracker, reporting server, defualt machine and clone cache root dirs, ...) - and plugin tasks/targets?  [1]
      
      [1] https://github.com/pantsbuild/pants/issues/5
    3. To loop this back to a pants-devel thread from yesterday - we can't have base depending on tasks for example - even if not via import and only conceptually.
    4. Its not in this review, but you could imagine plugins having a defaults.py file that defines the pants.ini options used by that plugin. Stuff in base does not depend on tasks/targets.
      
      The generated docs would be a union of all the options defined in defaults.py that we ship with pants itself. Other plugins wouldn't have their docs on the main site, so this is not problematic.
      
      Conceptually, I think this is pretty straightforward. We have a PantsOption class that's now part of the plugin API. Pants core has a defaults.py file for core options. Extensions could have their own defaults.py files if needed. Re: actually making this change, today the code is not organized around the plugin model described in issue 5. I'd prefer not to block on a big, eventual change when this could happen in the near-term and actually be an improvement over what we have today, and would be simple to migrate if/when we do the extension work.
      
      Thoughts?
    5. The idea of 5 is that every Task and Target type not needed to bootstrap pants is an extension.
      
      I'm all for reducing the amount of non-defaulted config today with no delay.  But this change does more/different and it doesn't end us in a world where we get the doc for pants I think your aiming for.  For example, none of the java task config keys would be part of that doc, all those tasks are extensions.
      
      I'm also a big non-fan of a bag like this.  Its a central registry of keys that only helps doc-gen.  It hurts code organization since the file has global knowledge / deps by design (for some def. of global).  It would be nice to not have to resort to this - as we didn't for @manual.buildict - which is non-global.
      
      Its probably best to add another Twitter reviewer and a Foursquare reviewer to get perspective.  This feels like an anti-pattern to me but I could be being blind. 
  2. 
      
TR
JS
  1. This generally seems more sane to me.  I like it.
    Please do add other reviewers though and get a Foursquare eyeball on this.
  2. 
      
TR
LA
  1. I surely do like it when a thingy's { definition, use, doc } are all close together like this. I won't spend as much time working with these thingies as other folks, so you wanna hear their opinions... as you're doing anyhow, yay.
  2. 
      
BE
  1. Sorry for the tardy review. Regularizing configs and flags is great! See comments inline.
    1. Thanks for the feedback. Sounds like folks are onboard with this approach.
      
      The review in its current state was to illustrate the proposed change. I'll clean this up and post the version ready for final review.
      
      Onward towards doc'd options!
  2. src/python/twitter/pants/base/config.py (Diff revision 2)
     
     
    Check for other is None?
  3. src/python/twitter/pants/base/config.py (Diff revision 2)
     
     
    buildroot isn't actually configurable, right? It's always whatever get_buildroot() returns. It's seeded as a config file var for interpolation, but you can't change it, so I'm not sure it's correct for it to be a ConfigOption.
  4. Add real help.
  5. s/List/Like/. 
    
    Or better yet, get rid of that preamble. Everyone knows what a config file is.
  6. 
      
TR
BE
  1. Before I dive into the code, a general question/comment: How does this interact with command-line flags?
    
    Here's how I was imagining this will work: We distinguish between config options, and how those options get set. That is, code only knows about the config options it registers. Then, the config initialization code munges option values from the following, in this order (later overrides earlier): pants.ini entries, <workspace>/.pantsrc entries, <homedir>/.pantsrc entries, environment vars, command-line flags. 
    
    [E.g., for every config key foo.bar.baz we autogenerate a command-line flag --foo-bar-baz]
    
    Now no code anywhere else knows anything about "pants.ini" or about flags. It certainly doesn't define any flags. It just interacts with the config. Is this what you're thinking too?
    
    In the future we could cleanly support other config formats, e.g., JSON, because only the init code would need to change.
    
    There may be a handful of flags that are not configs (e.g., --help) but ideally we'd allow those to be proper config too (even if that has the slightly odd effect of help=true in your pants.ini being the same as --help).
    1. Today there's zero relationship between command-line flags and the config. I believe pants has been this way since day 1.
      
      I've done a similar thought experiment about how to merge command-line flags and the config, but have not played around with an implementation. There are some wrinkles, such as how to deal with https://github.com/twitter/commons/blob/master/pants.ini#L141 or https://github.com/twitter/commons/blob/master/src/python/twitter/pants/tasks/jar_publish.py#L269 .
      
      This is how pants works today, and likely will for some time, so I'd prefer to set ourselves up to document reality rather than block on a hypothetical future reworking of the config system. In many cases while documenting we fix the underlying issues when what would need doc'd is nuts, but in this case it would be a decent size project. Also, the existing system works fine. Why allow configuration in JSON when there's zero issues with the current format? Providing multiple ways of doing the same thing sounds like a misfeature to me.
    2. Maybe I'm misunderstanding the scope of your work, but aren't you in fact reworking the config system? 
      
      The existing system doesn't work fine at all. We manually define cmd-line flags all over the place, and then repeat everywhere the logic of overriding pants.ini config with flags. The JSON mention was just an example of the flexibility this would provide. Feel free to ignore it. The real problem we have today is the parallel proliferation of config keys and flags, and the logic to combine them.
    3. The goal of this change is making a targeted change to how values are read from pants.ini so we can (a) document how keys are currently used, and (b) provide sensible defaults to help get to a world where no options are required. I'm not sure if we'll actually get to (b) or if some small number of options are required, but that's the general direction I'd like to go.
      
      I'm surprised you're pushing back here. Documenting the pants config file, which today is entirely cowboy town, is a good thing, no? I'm very much interested in the pants onboarding experience and needing a lot of undocumented, required pants.ini keys has been identified as a big issue for new users so I'd like to solve that problem without getting too much into the weeds.
    4. I'm not pushing back, the change looks fine in principle (haven't dug into the code yet). I'm trying to understand the scope, and how this interacts with the wider change.
    5. Gotcha. The overall intent is optimizing the new pants user experience. Right now its "copy a bunch of files into your repo from an existing repo." I'd like to change that to "touch pants.ini" and you're done. Ideally pants.ini would have no required options, and all required resources are bundled with pants, and you can choose to override if needed, but nothing extra is required.
      
      That's the high-level direction I'd like to go. Getting started with pants should be easy peasy.
  2. 
      
BE
  1. 
      
  2. src/python/twitter/pants/base/config.py (Diff revision 3)
     
     
    Why is this a ConfigOption, since it can't be changed anyway? That seems confusing. 
    1. This feels like the most natural way of handling "buildroot", since its present in the config and consumers 
      
      * 'buildroot' is readable from the config, and this allows access just like other options.
      * 'buildroot' will show up in the docs.
      
      I think the main thing is communicating to folks this is present in the config but not settable. I attempted to communicate that in the help string.
      
      What's your proposal for how to structure this in a better way? I can think of:
      
      (a) Have folks use `get_buildroot` directly. This feels awkward because 'buildroot' is in the config so why shouldn't we use it from the config?
      
      (b) Read 'buildroot' from the config, but with `config.get` rather than `config.get_option`. This feels awkward because this one key is special.
      
      This option is unique in that its no settable by users it needs to be special cased. This approach seems cleanest. What's your proposal?
    2. It's not really "present in the config". It's a seeded constant you can use in interpolations in pants.ini, but it is not itself a ConfigOption. In that sense it's no different from user or homedir, which you're not handling in that way. 
      
      Maybe there should be a different class for this: ConfigConstant or something. But of your alternatives, I have no problem with (a), because I don't see buildroot as being "in the config", any more than user or homedir.
      
      (b) I agree is awkward.
  3. Maybe this should be a full path, not a path relative to pants_workdir. I like using interpolation for this in pants.ini:
    
    %(buildroot)s/foo 
    
    is clearer than
    
    foo
    1. This was a conscious decision to restrict directories 'pants_workdir'. If we allow folks to specify arbitrary paths that will inevitably happen and likely cause confusion. This is analogous to pants putting its derived output files in 'target'.
      
      Actually, for a lot of these workdirs I'd prefer to eliminate the workdir option entirely and use well-known keys. I can't think of a compelling reason for "reports" to be customizable.
      
      I'm trying to look at these configs from the eye of a user who's a lot more interested in their code than their build system. If we're thoughtful about what knobs are provided we'll simplify things for users and make using pants delightful rather than overwhelming.
      
      What's the use case for creating derived output outside pants_workdir?
    2. ...analogous to **maven** putting its output in target...
    3. Well, you can still put '..' in the relative path, so you have to (and sort-of do, in _cautious_rmdir) check for the workdir not being under pants_workdir.
      
      However, I agree that I'm not sure any of these should be customizable anyway. Maybe we should just hard-code them, and make specific ones customizable if/when a true need arises?
  4. Ditto - should be full path, not relative to pants_workdir?
    1. Same comment as above.
  5. 
      
TR
  1. Discarding as this diff is no longer valid as pants has moved to pantsbuild. Reposting as https://rbcommons.com/s/twitter/r/183
  2. 
      
TR
Review request changed

Status: Discarded

Loading...