Improve detect-duplicates to detect duplicates in internal dependencies

Review Request #394 — Created May 22, 2014 and submitted

ity
pants
pants-reviews
areitz, jsirois, tejal, travis
Improve detect-duplicates to detect duplicate classes and resources in internal and external dependencies
$ ./pants goal binary src/java/com/twitter/common/application:test-application -ldebug

18:06:14 00:00 [main]
               (To run a reporting server: ./pants goal server)
18:06:14 00:00   [bootstrap]
18:06:14 00:00   [setup]
18:06:14 00:00     [parse]
               Preparing goals in round 1
.......
                    ===== For target JvmBinary(BuildFileAddress(src/java/com/twitter/common/application/BUILD, test-application)):
                   Duplicate classes and/or resources detected in artifacts: (u'com.twitter.common-application-0.0.72.jar', u'src.java.com.twitter.common.application.test-application.jar')
                        com/twitter/common/application/Application.class
               SUCCESS
  • 0
  • 0
  • 3
  • 0
  • 3
Description From Last Updated
JS
  1. Please add the pants-reviews group.
  2. 
      
IT
JS
  1. 
      
  2. src/python/pants/base/payload.py (Diff revision 1)
     
     
    Please revert the extra space between def and __init__
  3. This seems well worth a knob.  We're trying to kill knobs, but if this is the default value for a config settable option, that seems just fine to me.
    1. makes sense - changed to option
  4. os.path.split(qualified_file_name)[1] -> os.path.basename(qualified_file_name)
    .split('.')[0] -> os.path.splitext(...)[0] # behavior is different, but I think more correct for ie: dependencies.the.dotted.filename.txt
  5. Internal deps will never come out correct if confs != None, only _unexcluded_dependencies will work for these as things stand.  Since you can't actually do an exclude on an internal jar, its a bit of overkill as well, although afaict no harm done.
    
    You might consider instead using the 'classes_by_target' and 'resources_by_target' product mappings.  You'd walk the binary and collect a mapping from target -> classes.  Then when you hit a dup for internal vs external you could provide the more useful to the reader target name that dups with the external jar name.  The jar names for internal jars are otherwise hidden and not directly related to things the end user types in BUILD files.
    
    An example of requesting 'classes_by_target' and then retrieving them is in JarCreate:
      request products: https://github.com/pantsbuild/pants/blob/master/src/python/pants/tasks/jar_create.py#L94
      retrieve products: https://github.com/pantsbuild/pants/blob/master/src/python/pants/tasks/jar_create.py#L154
    
    I'm fine with just doing the jar thing for internal classes if you want disagree with the product map approach or wish to circle back to that.  As it stands though the confs != None case is busted fwict and the internal jar listing logic belongs in detect_duplicates - its specialized and different from what jvm_binary_task does.
    1. I like the product map approach - in fact, I would have preferred to use it if I had found it earlier. I didn't like the obfuscation of internal dependencies by the jar names at all, a target would be much better here. Thanks for surfacing.
  6. 
      
TE
  1. 
      
  2. 1 empty line between methods.
  3. 
      
IT
JS
  1. This looks on the right track to me.  Some smaller stuff and its ready to go.
  2. src/python/pants/tasks/detect_duplicates.py (Diff revisions 1 - 2)
     
     
    s/to"/to: "/ - you may want to join the default EXCLUDED_FILES on '\n  ' as well, but IIRC OptParse may not respect this extra formatting.
  3. src/python/pants/tasks/detect_duplicates.py (Diff revisions 1 - 2)
     
     
     
    Just access self.context via super
  4. src/python/pants/tasks/detect_duplicates.py (Diff revisions 1 - 2)
     
     
    Just set the option default=EXCLUDED_FILES and make the option "append" - this way you get a non-None list with the defaults if no flags are set.
    1. s/option/action/
  5. src/python/pants/tasks/detect_duplicates.py (Diff revisions 1 - 2)
     
     
    A list comprehension with a side-effect is against the spirit, just loop and mutate.  This applies in several spots.
  6. 
      
IT
JS
  1. 1 last issue to fix before submitting.
  2. src/python/pants/tasks/detect_duplicates.py (Diff revisions 2 - 3)
     
     
    Your default value is a list, but the option type is defaulting to string.  You need to specify a format for the string in the help text - comma-sep is probably fine, and then parse the value into a list if supplied.
    1. fixed and changed the default to be the same.
  3. src/python/pants/tasks/detect_duplicates.py (Diff revisions 2 - 3)
     
     
    or no for loop and just:
      artifacts_by_file_name[file_name].update(targets)
  4. 
      
IT
Review request changed

Status: Closed (submitted)

Change Summary:

(This code is in the repo, but the file has been moved around since it was submitted)

Loading...