Fix java zinc isolated compile analysis corruption

Review Request #2325 — Created June 4, 2015 and submitted

jrosenfield
pants
jrosenfield/zinc-isolated-analysis_tmp
1626, 1639
2339
4e4a4de...
pants-reviews
jsirois, stuhood, zundel

Fix java zinc isolated compile analysis corruption described github issue #1626

Add use of temporary directories to create_compile_jobs
Add --delete-scratch to jvm_compile default options
Add ensure_analysis_tmpdir to jvm_compile_strategy to create the temporary analysis directory in both strategies

Travis CI in green build: https://travis-ci.org/pantsbuild/pants/builds/65623817

  • 0
  • 0
  • 1
  • 0
  • 1
Description From Last Updated
JS
  1. The summary and description will be the commit message they are too focused on the impl to be very useful when reading the log.  Its probably better to focus on the issue fixed - this commit ensures analysis files are handled atomically to prevent corrupted read state.  Then maybe go into the details of temporary dirs, but that's probably less important log info.  The style info in https://github.com/pantsbuild/pants/issues/1626 is more on-point.
  2. This 2-step is awkward and both strategies in fact define the same `self.analysis_tmpdir` path.  Since the impls don't actually care what the path is, just that they know its value, how about have  self.ensure_analysis_tmpdir() both create and return a tmpdir of its choosing?  The nicest API would probably just be to use a @memoized property for that which encasulates the whole idea of ensure.
    1. Since both strategies, did you want me to put the new analysis_tmpdir() w/ the ensure logic in super? They have different analysis_dir's so I could make it as an abstract property that's memoized in the different strategies...

    2. My fault. When I looked over Jessica's shoulder at the problem I was looking at self._analysis_dir which differs between the two implementations and suggested she make an @abstractproperty

  3. It does look like it can go. Its called by JvmCompile via self.check_artifact_cache only when there are at least invalid_vts present. The call above in prepare_compile is also guarded by the existence of invalid_vts and that above call always preceeds this one.

  4. This is also racy if tmp_analysis_file and compile_context.analysis_file are on different filesystems.  The problem is solved though over here: https://github.com/pantsbuild/pants/blob/master/src/python/pants/util/fileutil.py
    1. So, in other words, if the ctrl-C happens in the middle of this copy, we could still be left with a truncated analysis file.

  5. copypasta from `post_process_cached_vts` above
  6. Single blank line between methods: https://www.python.org/dev/peps/pep-0008/#blank-lines
  7. isort does not like this extra blank line.  You can install commit hooks so you find this out on the client side with `./build-support/bin.setup.sh`, see: http://pantsbuild.github.io/howto_contribute.html#getting-pants-source-code
  8. 
      
JR
  1. 
      
  2. Stuhood: Can you explain what you meant by this?

    1. I was just wondering whether it was necessary to create the .portable analysis file in a temp directory rather than in place. But with a bit better understanding now, I can say that it is not necessary. Will remove this comment the next time I touch this. (or you can!)

    2. Good to know! I can take care of it while cleaning up the extra code under the _empty_analysis_cleanup decorator.

  3. 
      
JR
JR
ZU
  1. 
      
  2. I noticed that the travis-ci build is failing. The appropriate dep to pull in pants.util.fileutil needs to be added to the appropriate target in the src/python/pants/backend/jvm/tasks/jvm_compile/BUILD file

  3. I don't think we need this here. Its already handled in __init__ in the superclass.

  4. Nit: this step doesn't really have to be atomic. If something fails here then we'll make a new copy next time.

  5. 
      
JS
  1. 
      
  2. The passed vts is guaranteed to have one target - it may help debugging if the tmpdir just used the vts single target id instead of a UUID.  Also guaranteed unique, but clear which target's analysis it is.
  3. Consider killing this private variable assigment and instead just renaming the `ensure_analysis_tmpdir` method to `_analysis_tmpdir` and turning it into a @memoized_property.  The advantage is not creating a tmpdir when there are no invalid targets.  In general avoiding real work in constructors is a good thing and in particular its been a push for pantsbuild/pants.
    1. I was advising her against this in that when you invoke a property, its generally not expected to have side effects, like creating a directory and adding a shutdown hood.

    2. OK - a @memoized_method then.  Have everyone call ensure...().  I guess there is a tension here.  What you describe is true, but also encapsulating control in 1 spot is generally desirable too.  
      As you both see fit.
    3. Is the memoized_method code threadsafe? These analysis files are used from the compile worker pools. What would you think about putting this in prepare_compile?

    4. Ah yes - definitely not thread safe.  Using prepare_compile to initialize the attribute is probably best.
    5. Okay, I'm adding it to pre_compile so it's only called once.

  4. 
      
JR
JR
JS
  1. 
      
  2. name is not safe/unique, its just the part of the target address after :, you really have to use target.id, which is globally unique for all targets in a pants workspace.
  3. 
      
JR
JR
JS
  1. This LGTM pending travis-ci.  You might want to formally add Nick Howard in place of Stu since Stu's away to Scala Days.
    1. Thanks John, I will work with Jessica to get this patched in.

  2. 
      
ZU
  1. Thanks Jessica! Committed as 92abf54. Please mark this review as submitted and close the associated github issues/pr's.

  2. 
      
JR
JR
JR
JR
Review request changed

Status: Closed (submitted)

Change Summary:

Committed as 92abf54. I'm learning a lot from the work I'm doing but not as much as I am from code reviews. Thank you John, Eric and Nick for your help!

ST
  1. Thanks for the fix!

  2. Hm... this was the intended goal of the _empty_analysis_cleanup decorator, so perhaps that code is no longer necessary?

    1. _empty_analysis_cleanup is intended to clean up a half-written or unparsable analysis file, so I agree, it might be unecessary if this was the only cause of that problem.

  3. 
      
Loading...