Minimal Haskell plugin for `pants`

Review Request #2975 — Created Oct. 16, 2015 and submitted

Gabriel439
pants
ggonzalez/haskell-plugin-1
2377
2cce687...
pants-reviews
jsirois, stuhood

This is a minimum viable subset of a larger plugin that I have been working on
here:

https://github.com/pantsbuild/pants/compare/master...Gabriel439:ggonzalez/haskell-plugin

The easiest way to review this is to read the `contrib/haskell/README.md` that
this is included within this change. You can also view the rendered `README.md`
at:

https://github.com/Gabriel439/pants/blob/ggonzalez/haskell-plugin/contrib/haskell/README.md

Note that the `README.md` describes the full set of changes that I have already
made on my larger branch, not just the changes within this review board, but
it's probably better that way since you'll have more context for reviewing this
smaller change.

This change provides:

* New `pants` targets for Haskell projects
* Support for `./pants compile` on these targets
* Example targets

I wasn't sure whether or not to include the integration tests since @jsirois
asked me to keep this small. I can expand this review board further to add one
of the integration tests I already wrote if people would like to see that, too.

$ ./pants compile contrib/haskell/examples/3rdparty:headless-project
$ ./pants compile contrib/haskell/examples/3rdparty:stack-project
$ ./pants test contrib/haskell/tests/python/pants_test/contrib/haskell/subsystems

Link to travis build: https://travis-ci.org/pantsbuild/pants/builds/92311084

  • 0
  • 0
  • 13
  • 0
  • 13
Description From Last Updated
ST
  1. Nice! Only made it halfway through the readme, but this looks like a good start.

  2. contrib/haskell/README.md (Diff revision 1)
     
     
     
     
     
     
     
     

    thrice diligent

  3. contrib/haskell/README.md (Diff revision 1)
     
     
     

    project vs package discrepancy here? either way, confusing.

    1. Yeah, this is due to my own confusion and it's not correct. I'll change it to:

      "By default, all dependencies are downloaded from Hackage unless you explicitly
      override them to point to other non-Hackage sources

      One example of a non-Hackage source is a package hosted only on Github such as
      the following pipes-tar package"

  4. contrib/haskell/README.md (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     

    neat.

  5. contrib/haskell/README.md (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     

    This won't necessarily help your fellow devs though. Should get this in the pants cache as well. See below.

  6. contrib/haskell/README.md (Diff revision 1)
     
     
     

    Should consider making the package name optional, and having it default to the target name.

  7. contrib/haskell/README.md (Diff revision 1)
     
     
     
     
     

    This doesn't quite make sense to me.

    1. This was a bit of confusion on my part. I updated the README to clarify that these stackage targets will be useful as dependencies once I add support for also generating *.cabal files for cabal targets (because that's where the actual dependency information is used). I just hadn't thought that far ahead when I first wrote that.

  8. contrib/haskell/README.md (Diff revision 1)
     
     
     
     

    It's not the first thing that escapes from the workspace dir, but it would be good to follow up with them to fix that.

    In particular, we don't want to have to trust that that directory is 100% concurrency safe when things are running in distributed environments.

    1. I opened https://github.com/commercialhaskell/stack/issues/1178 to follow up with them on this and they indicated that at least ~/.stack was overridable and that they had no objection to a pull request to implement the ~/.stack-work override. So I will try to incorporate at least the ~/.stack override into this review board and then work separately on a pull request against stack to add the ~/.stack-work override functionality.

  9. contrib/haskell/README.md (Diff revision 1)
     
     
     
     

    These are not necessarily mutually exclusive. Elsewhere, we use the pattern of a Subsystem that holds your defaults for these parameters... that was if you configure a default resolver you need not repeat it on each target (but you can add it on a target if you want to override the default).

    See https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/codegen/subsystems/thrift_defaults.py which provides defaults for thrift_library targets.

    1. Thanks for pointing that out! I'll take a stab at implementing it using the Subsystem approach

  10. ws

    Also, unless you can imagine installing multiple "stack" related tasks in the compile goal, you might consider simplifying to just compile.stack

    1. Can you clarify what you mean? Right now it already is also registered under compile.stack-build.

      I tried .install('compile.stack'), but doesn't seem to work (even when I explicitly run ./pants compile.stack) because it is interpreting compile.stack as the entire goal, and not a combination goal/task scope.

  11. contrib/haskell/src/python/pants/contrib/haskell/targets/BUILD (Diff revision 1)
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     
     

    IMO, we've been a bit too fine-grained with regard to python targets in the past. This whole directory should probably just be one target and logical package.

  12. These should be included in the target's Payload, which is what contributes to their hash to cause rebuilding of changes.

    cf https://github.com/pantsbuild/pants/blob/master/contrib/go/src/python/pants/contrib/go/targets/go_remote_library.py#L107

  13. Something somewhere should indicate that this is abstract... maybe an abstract execute method? but at the very least a pydoc.

  14. dependencies isn't transitive... to get the transitive closure, you'll want target.closure().

    Also, if you don't validate the types of your deps during construction you should probably do it somewhere here (or filter to isinstance(t, HaskellPackage)

  15. The target selection and looping feels like something that should happen in the caller.

    Also, you'll want to use a self.invalidated block to get proper build invalidation: see https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/jvm/tasks/resources_task.py#L55 for an example.

  16. Rather than a temporary directory, you should enable cache_target_dirs (see Automatic here: http://pantsbuild.github.io/dev_tasks.html#enabling-artifact-caching-for-tasks), and then store everything relevant for a target in its results_dir.

  17. better wrapping please

  18. pants.ini (Diff revision 1)
     
     

    prefer alpha here and below

    1. Can you clarify what you mean by "alpha"?

    2. Never mind, I just realized that you mean "alphabetize".

    3. Fixed

  19. 
      
JS
  1. I got to README last but saw Stu's comments there so stopped short.  Right now the README does not go with the commit since it refers to flags that don't exist and more.  I'm probably happy to turn a blind eye to that since you're retroactively sending this out for review in a layered way.
  2. contrib/haskell/README.md (Diff revision 1)
     
     
    Kill trailing ws.
  3. contrib/haskell/README.md (Diff revision 1)
     
     
    Kill trailing ws.
  4. `targets={` - no spaces around = for kwargs: https://www.python.org/dev/peps/pep-0008/#id21
  5. Dicts are formatted like {'a': 42} - colon immendiately adjacent to key, followed by 1 space, followed by value (if not line wrapped), see: https://www.python.org/dev/peps/pep-0008/#pet-peeves
  6. Some funny whitespace here:
    s/  , action=StackBuild  )/, action=StackBuild)/
  7. Alpha, but also this would be better as a single globs('*.py') target.

  8. Personal preference, bu un-needed.  A docstring is enough to hold the place for the body.
  9. Using a `globs('*.py')` here would be just fine.  It would also be preferrable to structure the targets/BUILD in the same way, as a single target.  We have lots of bad examples of single file targets, but they are the way they are for legacy reasons and will eventually ~all be cleaned up.
  10. Alphabetize please, in the order sort would.

  11. ... target(s).""" - the main idea being the period.  See https://www.python.org/dev/peps/pep-0257/#one-line-docstrings
  12. Please stay consistent with ' or " unless an embedded ' or " forces you to switch to avoid escaping. As a rule, pants code prefers ' like you used above in register_options

    Also - stack_task is blocing per HaskellPackage target_root, so with 2 or more and --file-watch things seem odd. I'm fine with a plan, but the plan should be TODO'd and possibly issued.

  13. Dir shadows a builtin - Reviewboard is helpful here with the black-bold - prefer directory or path or anything else.
  14. Why is pass OK? - deserves a comment or TODO-style comment since the control structure here is odd.
    1. This was changed as per Stu's suggestions. I restructed the stack_task function (and documented it better)

  15. Its probably worth naming this haskell_package to be clear you were handed in something already type checked. This should also be made "private" with a leading underscore in the method name as should the helpers above.

    1. This is actually a configuration for the entire Haskell project, which might span several Haskell packages. Can you clarify what you mean?

  16. Prefer a multiline string with formatting:

    from textwrap import dedent
    ...
            raise TaskError(dedent("""
              Every package in a Haskell build graph must use the same resolver.
    
              Root target : {root}
                - Resolver: {root_resolver}
              Dependency  : {dep}
                - Resolver: {dep_resolver}
              """).strip().format(root=target.address.spec,
                                  root_resolver=target.resolver,
                                  dep=dependency.address.spec,
                                  dep_resolver=dependency.resolver))
    
  17. No extra alignement, just cabal_packages = filter(StackTask.is_cabal, packages)

  18. Its hard to draw the line, but this is close to wanting a template.  Here's an example where the template is inline in the code which would work well here:
    https://github.com/pantsbuild/pants/blob/master/contrib/go/src/python/pants/contrib/go/tasks/go_buildgen.py#L150-L179
    https://github.com/pantsbuild/pants/blob/master/contrib/go/src/python/pants/contrib/go/tasks/go_buildgen.py#L314-L319
    1. I'll give this a try. Still in progress

  19. Please add method doc.  This returns an iterator which is interesting (and currently unused).  That deserves docs as does the fact this runs only over target roots.
    1. This function was restructured and documented, re-review it

  20. You can resolve via self. instead of StackTask. - when you never expect to subclass and override its a matter of personal preference.

  21. More dir built-in shadowing - please re-name.

    1. N/A after restructuring to use pants cache

  22. 100 cols!

    Also this should plumb through a work unit for proper logging and time tracking. An example is here: https://github.com/pantsbuild/pants/blob/master/contrib/node/src/python/pants/contrib/node/tasks/node_task.py#L79-L89

    In that example command.run is ~subprocess.check_call with the command array curried.

  23. Prefer self.context.log.error(...) to print(...) here.

    1. I fixed this and the next coment by using TaskError and expanded the TaskError's message to be fully self-contained

  24. Tasks should raise TaskError or a subclass

    1. Fixed (see immediately above)

  25. pants.ini (Diff revision 1)
     
     
    Alpha please for all 3 lists in this file.
  26. 
      
MA
  1. I love seeing new language support!

    As a non-Haskell person I appreciated the README but somewhere around the halfway mark it transitioned from a Haskell readme to a Pants design doc. Some of that was useful to me as a reviewer but probably doesn't have much use for a plugin consumer.

    The --file-watch flag is constantly monitoring previously compiled files and recompiling them if they are being edited, and if so triggers a rebuild? I don't know if I think that flag deserves its own task, as I say below. Pants has really good support for dependency management and caching, I am a little concerned that the stack tool may be too inflexible to take proper advantage of all that, especially if you are dependent upon patches going upstream.

  2. This is a deprecated import.

  3. As it currently stands, I think it would make more sense to move this entirely into StackTask. It's weird for the execute method to have all of its work in a subclass.

  4. The Yaml construction is per-target work and does something outside of this tasks's stated function. I would think of this work as a candidate to be split out to a YamlTask and have the StackTask consume its products.

    Right now this task is somewhat overloaded, imo.

  5. Why a temporary directory? You will lose the output. Put it in pants.d and then you can eventually work in the artifact cache so that you aren't as dependent on the stack cache.

  6. Wrap this subprocess call in a workunit and then you can get a bunch of logging for free. There is good examples in most of the existing pants tasks.

  7. Don't use a bare except, this will catch everything (including things like ctrl-c KeyboardInterrupt). Catch Exception at worst, and prefer to be even more specific. If you only catch exactly the exceptions you want to then outside exceptions can bubble up to their true owners.

  8. 
      
GA
GA
GA
GA
JS
  1. I'm not going to get to this today, but will have this as my morning coffee on Wednesday.
  2. 
      
JS
  1. Thanks Gabriel.
    
    I'm ready to ship after this - LGTM for the basis to iterate on.
  2. FYI, you can just say `self.is_haskell_package`, the resolution chain will look up to super and classmethods and staticmethods are accessible from self.
  3. Kill the leading and trailing spaces and end with period.
  4. This is already part of your hierarchy via Task, no need to add this (Task extends AbstractClass which is magic for make my metaclass be ABCMeta please).
  5. The summary sentence should be pulled up on the 1st line, then blank line like you have, then rest of the paragraphs.  The style pep for docstrings is 257: https://www.python.org/dev/peps/pep-0257/#multi-line-docstrings
  6. Its more work in python than you may like, but pydoc via sphinx domains supports type info that most advanced editor environments understand. Going forward you can, after a blank line and ending the doc:

    :param [name]: Describe the parameter sentence.
    :type [name]: :class:`[fully qualified type name]`
    ...
    :returns: Describe what is returned sentence.
    :rtype: :class:`[fully qualified type name]`
    :raises: :class:`[fully qualified type name]` when ...
    

    There are affordances for builtin types, ie:

    :param bool fast: `True` to go fast.
    

    The type and name and description in 1 line in this case instead of 2.

    Some docs are here if you are interested in maintaining type information for ~public APIs in future RBs. See here for some docs - its all slightly confusing because sphinx is very general and not python specific: http://sphinx-doc.org/domains.html#info-field-lists

    1. I love any sort of type annotations! I went ahead and added them.

  7. The codebase favors """ instead of ''', although there is use of both:

    $ git grep '"""' | wc -l
    4524
    $ git grep "'''" | wc -l
    141
    
  8. Passing the command line as a string to new_workunit via the cmd kwarg will help out folks debugging, so new_workunit(..., cmd=' '.join(args)) Next comment explains this a bit more.

  9. Right here there is one more bit of plumbing, the workunit can give you streams and then the output of the subprocess plumbs through the reporting structure. Looks like:

    subprocess.check_call(args, stdout=workunit.output('stdout'), stderr=workunit.output('stderr'))
    

    If you ever run ./pants server and then look for items with gears - those are command executions and will have the full command line and the captured streams for inspection post-facto. Can be very useful.

  10. This is redundant - you can just leave the method still abstract since its defined as abstract in your bases.
  11. 
      
ST
  1. Thanks! Will merge when Sirois is satisfied... or he can.

  2. contrib/haskell/BUILD (Diff revision 5)
     
     
     
     
     
     

    Can kill these now! They're inferred automatically. Yay.

  3. This should be a python_library.

  4. 
      
GA
JS
  1. Thanks Stu - submitted @ https://github.com/pantsbuild/pants/commit/5d14bb156b78873734687fc975bf589f8af25c73
    1. This is a lie, meant this comment for another review.  Definitely not submitted.
  2. 
      
JS
  1. Stack bootstrap support is over here: https://rbcommons.com/s/twitter/r/3095/
    1. Noting that this RB passes stack flags like --stack-yaml=... and --local-bin-path that RB 3095 does not support passing. There is a TODO to allow this in RB 3095 and my intent is you'll take that up and bend StackDistribution to real needs.

    2. Yeah, I will ensure that the behavior of this review board stays the same after merging in your changes, including those flags.

  2. 
      
OS
  1. 
      
  2. contrib/haskell/README.md (Diff revision 6)
     
     

    so, what happens if two different targets depend on each other, but use different resolvers?

    1. The command will fail with an informative error message. See stack_task.py:

            if target.resolver != dependency.resolver:
              raise TaskError("""
      Every package in a Haskell build graph must use the same resolver.
      
      Root target : {root}
        - Resolver: {root_resolver}
      Dependency  : {dep}
        - Resolver: {dep_resolver}
      """
      
    2. I also want to point out that I did try your idea of putting the resolver configuration in pants.ini. The issue was that pants.ini configuration is a special case of registering an optional argument and I believe is no way to make a pants.ini field mandatory (it can only be optional) for the same reason that pants forbids making command-line options mandatory. My rough understanding of pants architectural idioms is that mandatory data goes in targets and optional data goes in optional registered "arguments" which can be derived from command-line/ini-files/environment. However, I could be wrong.

    3. sounds good, I guess. Though I do worry about dependency hell on this.

    4. so, and I don't know basically anything about pants internals, I recall some fields, for instance for scala compiler versions, which I thought were set in the pants.ini. This seems to be close analog to resolver version to me. I wonder why we would do these two things differently.

    5. So one thing I could do is actually do the registration through an optional argument (which can be set in pants.ini) and then just fail with an informative error message if the argument is not found anywhere.

      I'm actually torn both ways: on one hand, my main concern with this is that you can't locally tell whether or not a project can be built successfully purely from the target/source; you require non-local information to reconstruct a build. On the other hand, I also believe that invalid states should be unrepresentable.

      Thinking about this more, I think the issue here is that I'm conflating packages with projects at the target level. If there were a target-level distinction between packages and projects then you would only need one project target at the root of your graph (which would have a resolver field) and that could depend on any number of package targets (which would not have resolver fields).

      What do you think?

    6. Can projects depend on projects? I guess not. If that's the case that projects can't depend on projects, and resolver is an arg to project, I guess that's okay.

      That said, failing if the ini does not have the resolver set, I don't think is that bad. There are a lot of other ways you can fail (I assume you need stack installed?). Also, it won't be that delicate. So, this variable will be set, and then very infrequently changed, so it is not a risky runtime failure in that it could happen at any time. And it is appealing to keep the build as simple as possible.

      On the other hand, there are really only two things I can see wanting to build: artifacts to share code (e.g. something that could be uploaded to hackage), binaries to run. Those things, binaries and libraries, could be the things that use resolvers, I guess.

      Of course, uploading libraries is a whole can of worms because then the version range for dependencies probably comes into play.

    7. Yeah, it's accurate to say that projects can't depend on other projects. I initially wasn't sure if that's how stack behaved, but I just checked this by testing what stack does if a source package that is also a project depends on another source package that is also a project and they both have stack.yaml files for their respective projects with different resolvers. stack ignores the project configuration for the dependency project and only uses the project configuration for the root project.

      In other words, we need three target types, but more like this (using more descriptive names to avoid confusion):

      • haskell_project - the project-level target that just specifies the resolver
      • haskell_hackage_package - a package-level target that points to a version range for a package on Hackage
      • haskell_source_package - a package-level target that just points to a source package

      ... and their relationship would look like this:

      data HaskellProject = HaskellProject
          { resolver        :: Resolver
          , hackagePackages :: [Hackage],
          , sourcePackages  :: [Cabal]
          }
      
      data Hackage = Hackage
          { name         :: String
          , versionRange :: VersionRange
          }
      
      data HaskellSource = HaskellSource
          { name         :: String
          , path         :: FilePath
          -- Later on, also add the following field once we support building `*.cabal` files for the user
          , dependencies :: [Hackage]
          }
      

      ... and you could only specify haskell_project targets on the command-line. Any other target will fail with an informative error message explaining why.

      How about that?

    8. Oops, I forgot to rename the types in the example code after giving more descriptive names to the targets:

      data HaskellProject = HaskellProject
          { resolver        :: Resolver
          , hackagePackages :: [HaskellHackagePackage],
          , sourcePackages  :: [HaskellSourcePackage]
          }
      
      data HaskellHackagePackage = HaskellHackagePackage
          { name         :: String
          , versionRange :: VersionRange
          }
      
      data HaskellSourcePackage = HaskellSourcePackage
          { name         :: String
          , path         :: FilePath
          -- Later on, also add the following field once we support building `*.cabal` files for the user
          , dependencies :: [HaskellHackagePackage]
          }
      
    9. This looks like a good way to go to me.

      Not to tackle now, but what about thrift support (or any code-gen such as protobuf etc...)? What would that look like? In that case, I assume that you could make a HaskellSourcePackage that depends on a thrift target, or HaskellSourcePackage = Thrift FilePath | HaskellSource

    10. Yeah, I think Haskell generated from thrift is treated as "just another source package"

    11. Alright, I implemented pretty much exactly as described (now there is a target for projects and you only specify the resolver once there and not on packages).

      There was only one extra detail: I also added a HaskellStackagePackage target which is there for two reasons:

      • For use in later auto-generation of *.cabal BUILD files
      • To supply as a package argument to stack on the command line so that you can, for example, load Stackage packages in the REPL in a "headless project" (i.e. one without a source package)

      For example, if you wanted to load the pipes package in the REPL, you would create two target:

      haskell_stackage_package(name='pipes')
      
      haskell_project(name='my-project',
        resolver='lts-3.1',
        dependencies=[':pipes'],
      )
      

      ... and then (in a future review board once I add the repl goal) you could load pipes in the REPL by doing:

      $ ./pants repl path/to/build/file:my-project
      

      ... and under the hood that will translate to:

      $ stack --resolver=lts-3.1 ghci pipes
      

      Also, all other package dependencies will behave the same way, meaning that the package names of source and Hackage packages will also be supplied on the stack command line.

  3. 
      
GA
GA
GA
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as c1b3b14787188819b8613629814824f1618761b3

Loading...