Separate the ivy resolution and ivy repository cache directories.

Review Request #3436 — Created Feb. 7, 2016 and updated

patricklaw
pants
2897
pants-reviews
benjyw, gmalmquist, jsirois, nhoward_tw, stuhood
Currently, the ivy cache directories are unified and by default
live in a shared folder under the user's home directory (~/.ivy2/pants).
While this is fine for fetched artifacts (there is a lock strategy),
it is explicitly documented by Ivy to be unsafe for concurrent
access (http://ant.apache.org/ivy/history/2.4.0/concept.html#cache).
Since multiple processes running pants in separate workspaces
should not interfere with one another, the resolution cache should
therefore be split out.

One option is to always set the default ivy cache (both resolution
and repository) to a workspace-local directory.  This is certainly
safe, but it is not performant since a clean of the workspace
(or clean-all if the directory is under the workdir) will result in
all downloaded artifacts being deleted.  This is probably okay for
small projects, but not projects with large numbers of transitively
resolved artifacts.

This change favors backwards compatibility by leaving the unsafe
strategy as the default, but adds a new option to control where
the resolution dir lives.  With a properly set up ivysettings.xml,
this gives us the best of both worlds (correctness vs performance)
and paves the way for always separating these two directories.

The caveat is that this resolution dir must be known to pants, since
pants must extract and parse the resolution reports after invoking
Ivy.  Thus, if pants plumbs this directory, Ivy _must_ use it or
the resolution post-processing will fail.  Unfortunately, the
setting cannot be plumbed directly.  Instead we pass in a JVM
property which the user is required to consume properly in their
ivysettings.xml.

By default, an ivysettings.xml is not required.  Therefore we make sure
to error out if a custom resolution directory is set but a custom
ivysettings.xml is not.  Because of the flexible nature of ivy
configurations, it is not feasible to simply parse and lint against
the use of the property (ivy has "includes" and the provided settings.xml
might just be a thin wrapper over others).

In the future we should capture the relevant structures that ivy
settings can express and generate the ivysettings.xml ourselves always.
Alternatively, splitting resolution and fetching in a way that fully
captures both resolution and fetching in the pants artifact cache
might make the (currently unperformant) strategy of always keeping
resolution and repository directories within the current workspace
the more desirable option.

CI is green: https://travis-ci.org/pantsbuild/pants/builds/107644742
Added tests for the new error conditions.

  • 2
  • 0
  • 0
  • 0
  • 2
Description From Last Updated
How about making the if else explicit here? ie # If '-cache' is in the provided args, it is assumed ... NH nhoward_tw
I think we should copy the jvm_options as well to ensure we aren't modifying something a caller might have a ... NH nhoward_tw
PA
ZU
  1. 
      
  2. You've mentioned it is difficult to warn ahead of time if a custom ivysettings.xml doesn't have the right settinsg for resolution dir, but you have an opportunity in IvyUtils.parse_xml_report() to provide a hint as to what might be wrong if the report can't be found. (If you saw that it wound up in ivy_cache_dir, that would be a big hint about what was going wrong.)

    1. Agreed, though I don't think we should go as far as looking at the potential other spot (it could be there, but it might also be misconfigured such that it's not there). I'll add a stronger note to this error pointing out the likely cause.

    2. Done, though in a different place. It's not an error for the path to be missing here, but it's an error at src/python/pants/backend/jvm/ivy_utils.py:350, and that's the error that triggers when a misconfiguration exists.

  3. 
      
PA
PA
Review request changed
ZU
  1. There is still some documentation to be updated at https://pantsbuild.github.io/setup_repo.html#tell-pants-about-your-artifact-repository

  2. 
      
BE
  1. +! for generating ivysettings.xml, which is a longstanding TODO.

    +100 for using the artifact cache for both resolution reports and downloaded artifacts, and not rely on ivy caches at all.

  2. Would it be more accurate to say "The directory where Ivy stores artifact version resolutions." ?

    "uses for resolves" could mean many things.

  3. src/python/pants/ivy/ivy.py (Diff revision 3)
     
     

    It's more than just a scratch space, right? The ivy report, for example, is an output, not an intermediate thing.

  4. src/python/pants/ivy/ivy.py (Diff revision 3)
     
     

    The distinction between args and jvm_args is a bit confusing to the casual reader. The convention in most places is to call these args. Maybe you could create cache_args and concat them with args at the point of use?

    1. How about calling jvm_args actual_args, since its purpose is to contain the actual args to use for the command invocation?

  5. 
      
NH
  1. 
      
  2. src/python/pants/ivy/ivy.py (Diff revision 3)
     
     
     
     
     

    How about making the if else explicit here?

    ie

    # If '-cache' is in the provided args, it is assumed that the caller is taking full control
    # over caching conventions, including passing the appropriate '-settings' and
    # JVM properties for repository/resolution directories if custom ones are defined.
    if '-cache' in args:
      actual_args = args[:] # or some other copy mechanism
    else:
      actual_args = ['-cache', self._ivy_cache_dir]
      # ...
      actual_args.extend(args)
    

    Also, should it be an error to pass an explicit -cache when resolution_dir, ivy_settings et al have been set? Or perhaps the error message could note that it's possible to manually override the settings using args?

    Should this check look for both -cache and -settings? Or is it sufficient to look for just -cache? Maybe we should add a debug log line here, noting that we're passing through the args because -cache is present to make it easier to troubleshoot.

  3. src/python/pants/ivy/ivy.py (Diff revision 3)
     
     

    I think we should copy the jvm_options as well to ensure we aren't modifying something a caller might have a reference to.

  4. 
      
JS
  1. Aha - this did never make it in.
    I was looking for split cache functionality for CI just now and could not find it.
    I'll setup CI to use non-shared (ie: not in $HOME/.ivy2/...) ivy caches for now but I am interested in your intent with this RB.
  2. 
      
Loading...