Use the z.jar files on the zinc classpath if they exist

Review Request #3955 — Created May 30, 2016 and discarded

zundel
pants
zundel/compile-with-zjars
3529
pants-reviews
benjyw, gmalmquist, nhoward_tw, stuhood

Added feature to use the z.jar files on the zinc classpath if they exist.

This decreases the compile time of our biggest target from 6m30s to 1m24s.

Ran this through our CI builder and tested locally. Dramatic compile time improvements all around.

CI is green at https://travis-ci.org/pantsbuild/pants/builds/134164353

  • 2
  • 0
  • 7
  • 0
  • 9
Description From Last Updated
As mentioned in my comment to Benjy: we create these jars as we finish compiling each target, so there should ... stuhood
They always exist. The "previous compile" part isn't accurate, because the current compile creates them for upstream targets before compiling ... stuhood
  1. It seems to work find in our Java repo, but this patch fails pretty miserably on scala code. I'd be happy for someone to take this over if they like. I am just ecstatic for having found this scale of perf improvement!

    1. nm, it does work on scala builds. There was some other bug.

    2. Excellent! That makes more sense...

      So now I'm wondering if we ever need to check for the existence of the zjar. We should only put successfully built targets on the upstream classpath, and those should always have a zjar. No?

  2. 
      
  1. That is an amazing improvement! I can't see why this wouldn't work for scala, but that might be because I'm not clear on where these z.jar files get created.

    1. I can't help but think that the performance improvement in our CI builder is postponing the heat death of the universe by a measurable amount.

      My guess is that the z.jar files are somehow incomplete for the scala build or that there is sometimes more than just one jar created, but I haven't dug into it in the least.

  2. First sentence needs rephrasing.

    Also, 'z.jar' isn't really a known term. Better to explain what these are?

    1. I think I should also land this behind a flag and add some tests, at least until everyone can prove it out.

    2. I did a little bit of explanation. I left the name 'zjar' in for now for lack of creativity.

    3. +1 for a flag

  3. Maybe a more expressive name, like reduced_cp_entries? The meaning of out is ambiguous.

  4. If e isn't in the mapping then this will fail on zjar not being defined.

    Generally this control flow is odd. Wouldn't this do what you want?

    reduced_cp_entries = OrderedSet(zjar_mapping.get(e, e) for e in cp_entries)
    

    And you can do the jar existence check when you build the mapping. Or maybe it doesn't need checking? Doesn't zinc create these jars, and doesn't it always do so?

    1. I think this may be the "scala" bug, nothing really to do with scala at all I guess.

      It is also a waste of time to re-check them over and over, but you can't do the jar existence check when you build the mapping before the compile. Many of those jars are waiting to get built.

  5. 
      
  1. 
      
  2. Ah, right. Then you can replace the zjar_mapping.get(e, e) with a lambda that does the check.

  3. 
      
  1. So this doesn't work with Scala? I'm surprised.

  2. Maybe start with default=False?

    Also, maybe explain in the help that this means use jar files instead of loose classfile dirs.

  3. Add a comment explaining why this check has to happen here, and not when creating zjar_mapping.

  4. 
      
  1. This will make incremental compilation ineffictive, unfortunately. Zinc will no longer have analysis for the classes in the jars, and so it will treat them as 3rdparty/external dependencies, which disables most of incremental compilation.

    To confirm, benchmark compiling once cleanly, and then compiling an edit to an inner library.

    1. Where are these jars created?

    2. We create them: https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/jvm/tasks/jvm_compile/jvm_compile.py#L728-L729


      I began working on a patch (about a year ago: see slack) to add support to sbt for analyzing jars, and nearly had it working before other priorities came up: https://github.com/sbt/sbt/compare/1.0.x...twitter-forks:stuhood/output-jars

    3. Sounds like this could be a big perf win, at least for Java. So we should probably go ahead with this, behind a flag that defaults to off, and with appropriate warnings.

    4. Go ahead with this change, I mean. The patch would be great too, if and when you can get around to it :)

  2. 
      
  1. 
      
    1. Created https://docs.google.com/a/twitter.com/document/d/1mfPHBcyAqAJKhH03ZEtRj78fk0tYD9OppkBDOi3Deuw/edit?usp=sharing to explain some past efforts in this area.

    2. Moved to https://docs.google.com/document/d/1ivAqh7QFelFZ0rUUkslyhOcsC8pAYGN9w09ePvHf9IY/edit?usp=sharing

  2. This will affect the analysis such that incremental compiles will not work, so it should definitely be in the fingerprint.

    1. I was wondering if there may be a place for this substitution even with incremental turned on. Even when using incremental mode, if a VTS is valid before the compilation step starts, or if no code is actually changed during the compile, could it be that we could substitute the z.jar file on the classpath for any element of the classpath that isn't going to be recompiled?

    2. I think that that would fully invalidate targets downstream of the valid target... ie, cause incremental compile to fail for them.

      As mentioned in the doc though, it might be possible to invert that and use the jars for targets that were cleanly compoiled, but then switch to using loose directories for the first incremental compile? Then, all targets which had never been editted in some environment would be represented by jars, and any that were incremental compiled would use directories.

      One issue though is that I don't think that whether something was an incremental compile or not is currently recorded in its fingerprint, so I'm a bit worried that that would still cause strange interactions with the cache...

    3. This may be incompatible with incremental compile, but the performance benefits when compiling our Java codebase are so large that we can stand to not have the incremental option on any more. Is there a way we can accept this patch and then iterate as you find out more about the incremental solution?

  3. They always exist from a previous compile.

  4. As mentioned in my comment to Benjy: we create these jars as we finish compiling each target, so there should be a much cleaner option available to locate them than attempting to infer their location.

    1. I'm not sure what you mean by infering their location, I'm getting the location straight out of the the compile_context. I'm only checked the existence of the file because I was concerend there might be some kind of race condition between threads or some reason why it wouldn't create a z.jar (e.g. no files in the directory)

  5. 
      
  1. 
      
  2. They are, in fact, in products.

  3. 
      
  1. Ship It!
  2. 
      
  1. Ship It!
  2. 
      
  1. Ship It!
  2. 
      
  1. 
      
  2. They always exist. The "previous compile" part isn't accurate, because the current compile creates them for upstream targets before compiling downstream targets.

    So the existence checks shouldn't be necessary; if we ever failed to find one it would be an error in this method: https://github.com/pantsbuild/pants/blob/ef3c4f54d0a80e78f958bf5905313a41df6f196d/src/python/pants/backend/jvm/tasks/jvm_compile/jvm_compile.py#L755-L769

    1. I had a version where I put in an 'else' case with a log message and it was hit on every target for the current one we are compiling. I could add more logic here to try and exclude the current module being compiled I suppose, but I'd have to pass more context into this fucntion.

      I've run out of time to work on this until next Wednesday. If someone else wants to land it in master as is or modify it and land it that is fine by me.

  3. 
      
  1. Moved to: https://rbcommons.com/s/twitter/r/3982/

  2. 
      
Review request changed

Status: Discarded

Change Summary:

See https://rbcommons.com/s/twitter/r/3982/ which supercedes this patch

Loading...