fix cloc, filemap, list-owners and changed when they come across deferred_sources

Review Request #3830 — Created May 5, 2016 and discarded

cheister
pants
fix-deferred-targets
3326
pants-reviews
jsirois, stuhood, zundel
fix cloc, filemap, list-owners and changed when they come across deferred_sources

Travis CI: https://travis-ci.org/pantsbuild/pants/builds/128563441

This should fix the error:

Exception message: Field requires a call to populate() before this method can be called.

When calling changed, cloc, filedep or list-owners on a target that depends on a deferred_source target

It seemed like having the tasks require deferred_sources in a prepare method would be the better way to fix this

  @classmethod
  def prepare(cls, options, round_manager):
      ...
    round_manager.require_data('deferred_sources')

But that only worked for cloc, and did not work for the failing test in cloc

  • 0
  • 0
  • 0
  • 1
  • 1
Description From Last Updated
CH
  1. 
      
  2. It seemed like this call to sources_relative_to_buildroot() would also need a skip_deferred_sources=True but I could not find a test case that would cause it to fail.

    1. I found a test case that exercises this and added it to test_list_owners.

  3. 
      
JS
  1. > But that only worked for cloc, and did not work for the failing test in cloc
    
    I'm not sure what the right path is, but that at least would have been the 0 surprises path... I think - this stuff is always subjective.
    Can you expand on the difficulties you encountered?  If only in tests, and if you agree fetching sources should be don for any task operating on sources, the issues should be worked through it seems to me.
    If the issue was in the production code, it still is likely worth pushing through.
    1. One way to avoid the subjective call might be to ask what the new engine will do.  This change should probably try to do the same if at all possible to avoid semantic disruption.
    2. The problem with cloc in particular was I could add round_manager.require_data('deferred_sources') to def prepare and cloc would work if ran ./pants cloc from the commandline but the def test_counts_on_deferred_target test would still fail. The change in this PR to add for source in target.sources_relative_to_buildroot(skip_deferred_sources=True): fixed the problem for both the commandline and the test so I went with that fix instead.

      How do I find out what the new engine will do for this case?

    3. Stu will chime in - he's leading the new engine effort.
    4. if you think about it, most folks probably don't want generated code counted in cloc so skipping over deferred sources instead of requiring them to be generated/unpacked is probably the right thing to do instead of adding the require_data()

    5. I honestly have 0 clues what folks will think and if they'll be of one mind - I am demonstrably poor at this.
      I drilled in on this since the description makes it sound like Chris wanted to go the `require_data()` but only technical difficulties stopped him; ie: he seems ambivalent, but with a 1st preference for `require_data()`.
      I'm more generally concerned about the non-scalability of either solution - though I haven't dug in to see if I can offer a suggestion.  Its suboptimal that every task using sources has to know deferred sources exist in the wild and deal with them or blow up.
    6. I agree I don't think people would expect generated code to count in cloc numbers.

      I would prefer the require_data() solution since it means changing fewer of the build_graph internals and is more obvious but I'm not sure why it doesn't work all of the time.

      I share your concern about this solution, it feels a bit piecemeal to patch the commands that might come across a deferred source and test their behavior. I've only "fixed" the four I found in this change, I'm not sure if others commands are affected. It gives me feeling like something more fundamental might need to be fixed in the build_graph?

    7. So, the reason why the 'require_data()' alone doesn't work in our environment is because the 'deferred_sources' product isn't enough for our use case that uses a prep_command() to generate sources. It is impossible to know when to schedule because the tasks that make the deferred_sources product type don't know they have to wait on the prep_command() to run as well. In fact, I think the prep_command thing we have been doing is a pretty big hack.

      An alternative solution that Chris and I discussed was to not use the 'skip_deferred_sources' argument, but to instead just always assume if sources havne't been popluated to silently (or with a debug message) populate them with blank sources. The problem I have with that solution is that it is really the same issue that John originally points out - new devs have so somehow be awaare of this issue, but the behaviour of pants being broken will be so sublte as to go unnoticed. At least this way there is an exception being thrown that says something about deferred sources and we know right away. That's why I gave it a Ship It! but we should probably add more documentation around using the sources_relative_to_buildroot() that point out this pitfall more clearly to future devs that want to start using the method.

    8. Deferred sources will be much easier to accomplish in the new engine, and are one of the first things we'd like to tackle (since they're a very isolated case of ivy usage). There are multiple examples of providing Sources for a target in the exp.examples package, but doing them for real is blocked by ivy support: see https://github.com/pantsbuild/pants/issues/3336

      And yea, anything involved with prep_command is hacky, let alone mutating unrelated targets in the graph.

    9. The narrow question is will commands like filemap see deferred sources in the new engine or not.  Or is this opt-in/opt-out per task?  Basically what will the natural default be if any.  My suggestion is Chris tries to match that if possibly to avoid disruption.
    10. The narrow question is will commands like filemap see deferred sources in the new engine or not.

      Yes: by default I expect that commands like filemap will see deferred sources in the new engine, unless we choose to explicitly filter generated sources. But... as implemented, deferred sources don't act like generated sources.


      In terms of matching existing expectations, it occurs to me that the clearest way to make these "act like generated code" is to pretend that they were codegen'd, and create a synthetic target for them. It would be a bit of a re-architecture of deferred sources, but essentially, have a unpacked_sources target with a lang parameter (similar to thrift_library), and have the UnpackJars task create a synthetic target of the appropriate type paired to the unpacked_sources target.

      So my recommendation here would be an overhaul. I expect that it would be a smaller change than this review is already, and would be happy to help.

    11. We need something to keep Pants from throwing exceptions for pants changed, and we can't commit to an overhaul at this point. pants changed is an intrinsic part of the solution for determining what targets need to be re-compiled. We need some kind of fix so we can completely expunge the maven logic from CI.

    12. @stuhood: If I go to the trouble to modify deferred sources to create a synthetic target I will have to:
      1) Create another task that has to somehow know to run early in the build graph, before cloc or others will run
      2) Then I'll have a graph full of targets with unresolved deferred sources... I still won't able to call sources_relative_to_buildroot(), each task will have to know about deferred sources and filter them out.

      I'm not sure how all that work improves on this patch? It still requires a-priori knowledge of deferred sources and in a less obvious way.

    13. 1) Create another task that has to somehow know to run early in the build graph, before cloc or others will run

      It would be modifying the existing UnpackJars task to act more like SimpleCodegen, a task which injects a synthetic target derived from some target (rather than mutating an existing target to add sources).

      2) Then I'll have a graph full of targets with unresolved deferred sources... I still won't able to call sources_relative_to_buildroot(), each task will have to know about deferred sources and filter them out.

      The point is that it is exactly like codegen... the special case is already pre-implemented: we identify the generated sources due to the fact that they are attached to synthetic targets. No need for any additional special cases: we already have it in the form of support for synthetic targets.

    14. I think this will work and I am on board. I will follow up soon with another RB. In the interim we've cherry picked this change into our local build.

    15. I gave this some thought this weekend, but was otherwise distracted by the compiler performance issue. @stuhood does this sound like what you were thinking of?

      There is something that makes this different from the way we generate synthetic targets codegen which I wasn't sure how to tackle. In codegen, we create new synthetic targets, usually of type java_library() or python_library(). "

      java_protobuf_library() depends on synthetic java_library()

      But that won't be the case here. How are we supposed to know what kind of target to make? (there is a hook in each codegen plugin that creates a specific type of synthetic target for each codegen target it encounters). Does that mean we have to have special logic for each type of target that is to support DeferredSources?

      Based on that, I thought that the synthetic targets that we generate from the deferred sources task should I think just be clones of the original targets:

      java_protobuf_library() depends on synthetic java_protobuf_library(with real sources)

      If we do this, its going to make codegen more complex - it is going to have to know to only run code generation on the synthetic version of java_protobuf_library(). And what do we do with the old java_protobuf_library? It has nothing in the sources field, do we just skip over it?

    16. Generating multiple different types of targets is already supported in scrooge, for example. See:

      https://github.com/pantsbuild/pants/blob/master/contrib/scrooge/src/python/pants/contrib/scrooge/tasks/scrooge_gen.py#L112

      ...which uses the language name to decide which target type to construct.

      Other than that, passing through the rest of the end user kwargs to the new target should work.

    17. Let me be more concrete here.

      If you have a resources() target, what kind of target are you supposed to generate? Another resources() target I guess. What is supposed to happen to the original resources() target that has no sources? Does it create an empty jar? Is it ignored?

      If you have a java_tests() target that gets it sources from something else, what is supposed to happen to the original java_tests() target that has no sources. That is currently an error. Do we need to add special logic now to skip it? Is it going to print out "0 tests run"

      I don't feel like the alternative to this patch is very well thought out. Its going to create all kinds of special cases that are going to be difficult to test and handle. Pants is currently very broken when you use deferrred sources, I want to go ahead and land this patch and not refactor every task that could possibly run into a target with deferrred sources.

    18. Sync'ed up with Stu and he clarified the design for me a bit. We discussed a syntax that looks like this on slack:

      remote_sources(name='proto',  dest=java_protobuf_library,
        sources_target = ':proto-source-set', 
        args= {
            imports = [],
            dependencies = [
             ...
            ],
            platform = '1.7',
            provides = artifact(org='com.squareup.external.protos',
                            name='foo-proto',
                            repo=square,),  # see squarepants/plugin/repo/register.py
        },
      )
      

      the java_protobuf_library target would be created as a synthetic target. The sources parameter would be set to the unpacked sources. The other arguments listed in args would be passed down to the constructor for generating the synthetic target.

    19. But, if it's possible to just introduce the appropriate product dependencies to these tasks as a stopgap before #3336, that would be preferable. As mentioned in slack, we haven't seen the changed goal be affected.

    20. Its not going to be acceptable solution for Square's internal repo to add a dependency on code generation for these tasks. It would tack on a long delay to run code generation for the entire repo before we run every single CI job.

  2. 
      
CH
ZU
  1. 
      
  2. Using add_to_build_file is considered the "old way" to do things, but this is a relatively complex relationship of targets, I think its OK to write it this way. It would be possible to re-write this in terms of self.make_target() if you wanted to pursue it.

    1. I tried using self.make_target() the first time I wrote the tests but this block of code in self.make_target() does not let you create the deferred sources dependencies

          # TODO(John Sirois): This re-creates a little bit too much work done by the BuildGraph.
          # Fixup the BuildGraph to deal with non BuildFileAddresses better and just leverage it.
          for traversable_dependency_spec in target.traversable_dependency_specs:
            traversable_dependency_address = Address.parse(traversable_dependency_spec,
                                                           relative_to=address.spec_path)
            traversable_dependency_target = self.build_graph.get_target(traversable_dependency_address)
            if not traversable_dependency_target:
              raise ValueError('Tests must make targets for traversable dependency specs ahead of them '
                               'being traversed, {} tried to traverse {} which does not exist.'
                               .format(target, traversable_dependency_address))
            if traversable_dependency_target not in target.dependencies:
              self.build_graph.inject_dependency(dependent=target.address,
                                                 dependency=traversable_dependency_address)
              target.mark_transitive_invalidation_hash_dirty()
      
    2. Sounds like its not worth it then.

  3. Looks like you are modeling this after a script we have in our repo that invokes a bunch of npm commands and dumps the output into a directory to be bundled into the output.

    This may raise flags with others reading the code as there is a contrib module that you can use for npm, which is a detail that may detract from understanding why we are doing this.

    If this is just a placeholder name, just call it 'generate-assets' (just to avoid confusion)

    Another way to generate these deferred sources fields is to use the unpack_jars() target, whicih is maybe the use case for DeferredSources that most non-Square pants devs are familiar with. Maybe that would be less confusing.

  4. seems like this target will fail if compile-npm isn't found on the path. (maybe also change the name to generate-assets or something like I said
    above)

    If you don't want to change to unpack_jars() like I suggested above, maybe add a comment that we don't actaully expect this prep command to be run, its just there to demonstrate the from_target() use which causes a target with DeferredSources to be instantiated in the build graph.

  5. Same comment as previous tests, maybe use unpack_jars() as an example instead to avoid confusion.

  6. 
      
CH
ZU
  1. Ship It!
  2. 
      
ZU
  1. 
      
  2. src/python/pants/build_graph/target.py (Diff revision 3)
     
     

    I would like to add a more detailed note to future callers of this method. (also add a note in has_sources() that points to this note)

    Note: Not all sources are available immedately after the build graph is construted.  Some tasks  create source files (unpack_jars for example) and  are effectively code generators.  These tasks should create a `deferred_sources` product type that your task can depend on.
    
    If you intend to call this from a task that runs without somehow depend on output from the `compile` or `resolve` goals, you may want to pass `skip_deferred_sources=True` to avoid having to wait on these other tasks to run.  If you do want your task to return these sources, make sure your task depends on the `deferred_sources` product so that it gets scheduled to run after these sources are populated.
    

    The fact that this alone doesn't work for Square is not worth mentioning. The prep_command hack that we use to generate sources is a band-aid solution that I don't think is worth explaining.

  3. 
      
CH
GM
  1. Instead of all this plumbing, why not just add a fake version of the src/python/pants/core_tasks/deferred_sources_mapper.py task that populates all deferred sources fields with empty sources?

    1. How would you suggest we get that scheduled? Or would we just manually invoke it from each of these tasks where we are adding the 'skip_deferred_sources'?

    2. You could either invoke it as a manual utility function, or schedule it as a task by having these tasks require a product it produces.

      The former would be safer if it is likely that these tasks will ever be invoked in the same invocation as other tasks which actually do care about the deferred sources.

  2. I believe it's preferred to use self.make_target for things like this unless there's a specific reason against it.

    1. Nevermind, just saw your back-and-forth with Eric about this.

  3. 
      
CH
Review request changed

Status: Discarded

Change Summary:

Closed in favor of https://rbcommons.com/s/twitter/r/4014/

Loading...