Add java_thrift_library fingerprint strategy

Review Request #2265 — Created May 24, 2015 and submitted

chrischen
pants
5ec1b23...
pants-reviews
areitz, benjyw, fkorotkov, ity, jinfeng, jsirois, patricklaw, stuhood, tejal, zundel

This rb adds fingerprint strategy for java_thrift Library. It fixes the pants cache bug - java thrift library does not include language, compiler and rpc style in its hash key generation. As a result, if any of those values changes, pants will throw an error due to cache.

For example, if a java_thrift_library target was compiled, but then changed its rpc style from sync to fingle, and compile again. Since hash didn't change, pants doesn't invalidate the target. Pants would eventually return an error because it couldn't find the file at the cached location.
To provide a more specific error, it would try to find a file like
.pants.d/gen/scrooge/java-finagle/gen-file-map-by-target/src.thrift.com.twitter.bouncer.bounce-action-thrift-java
but the actual file is .pants.d/gen/scrooge/java-sync/gen-file-map-by-target/src.thrift.com.twitter.bouncer.bounce-action-thrift-java

See the relevent discussion here https://groups.google.com/forum/#!topic/pants-devel/9I2oiTNmOhg

Please discard RB #2258.

https://github.com/pantsbuild/pants/pull/1590
./pants test contrib/scrooge/tests/python/pants_test/contrib/scrooge/tasks
and manually verified invalidation by changing rpc_style and verified pants cache worked correctly when pants target didn't change.

  • 0
  • 0
  • 1
  • 0
  • 1
Description From Last Updated
CH
TE
  1. LGTM.

    Please open this to pants-review.

  2. 
      
CH
CH
CH
  1. Hoping to get this rb in today. Your help would be very much apperciated.

    1. Please sign-up for pants-reviews [1] with czqlfg@yahoo.com - since you're not the RB emails blackholed
      
      [1] https://groups.google.com/forum/#!members/pants-reviews
    2. thanks John! I see that myself in https://groups.google.com/forum/#!members/pants-reviews now.

    3. email doesn't seem to be working even after I joined the group. Try a different email which is also a member.

    4. I believe it does work.  The 1st post is moderated, after we approve for a given address that address can post going forward.  You should be fine now.
  2. 
      
ZU
  1. 
      
  2. But wait... if these were payload fields you wouldn't have to make a custom fingerprint strategy at all. Maybe that would be a simpler (better?) approach?

    1. =Eric
    2. I thought of the same initially, before I relized that the thrift options can change after init. See https://github.com/pantsbuild/pants/blob/9bd9f7fd6942526bcb69200376ff35426e61c6a3/src/python/pants/backend/codegen/targets/java_thrift_library.py#L61

      One potential option was to bring in the options.for_global_scope() to the constructor, but fingerprint strategy seemed to be better. Again, I am new to pants code base, so please feel free to educate me:)

    3. Fixed my membership so that the email is sent when ticket is updated. Please see my comment above. Thanks!

  3. 
      
ST
  1. This seems expedient, but it's not awesome. Adding those parameters to the payload in the constructor would be best, but AFAIK, the options aren't available to target constructors at the moment.

    Please document (in both places) why we're using a figerprintstrategy rather than using the payload directly. A cleanup of task/target payloads seems inevitable, and leaving behind a clear explanation for the choice would be helpful during that process.

    1. One bit of not awesome that can be fixed right now is to simplify the JavaThriftLibrary accessors and drop the options parameter for each of compiler|language|rpc_style.  The only caller(s) will be tasks that have options and no how they want to default things.
      Towards that end, I'd even argue fingerprint strategy is the correct long term home.  If a task does not care about these options - imagine a *.thrift file linter! - it should not suffer them being in the default target payload.
    2. Basically this is the same problem as objects with a built-in equals.  The meaning of equals is often context dependent, and this is certainly the case for many tasks given the same target.
    3. I agree. Here is what I will do:
      1) Add comments on why FingerPrintStrategy
      2) Simplify JavaThriftLibrary accessors by removing the options and moving the default value logic into ScroogeGen

    4. If a task does not care about these options - imagine a *.thrift file linter! - it should not suffer them being in the default target payload.

      Agreed. One refactoring I could imagine then is that FingerPrintStrategies could instead be "filters" of payload fields that a Task cares about. Then by default ALL inputs to a Target would be included in its default identity, but Tasks could select a subset of filters. The default of "opt-in" to payload fields is fragile and subject to this kind of bug.

    5. I think its only fragile because we support a default fingerprint at all.  If all tasks were forced to supply a fingerprint strategy to get access to an InvalidationCheck then the fragile goes away.  You must supply a fingerprint startegy of exactly the fields you use.
    6. I disagree with John's assertion above. If you made each task provide an explicit fingerprint strategy it would be fragile in a different way, in the sense that it is easy to omit important fields when modifying the class. It would be persistent 'gotcha' in modifying Pants code and if missed would create a difficult to detect problem with cache invalidation.

    7. Omit important field when modifying which class?  If TaskA operates on TargetOne and only cares about / uses its foo and bar fields and then someone adds eight more new fields to TargetOne - TaskA doesn;t care and its implementation is still correct since it doesn't use the other fields.
    8. But - to your point - it would be wonderful if target fields were no accessible at all.  If the invalidation framework was all that had access to fields, the contract would be - hand me your strategy and I hand you back a function that can access just the fields that are tracked by the strategy.  This would force correct invalidation semantics.  A task couldn't reach out behind the invalidator and query an untracked field at all.
  2. 
      
CH
JS
  1. 
      
  2. Just make all these these static methods instance methods since they all need self.context.options - no need to pass it in.

    Really these 3 options should be housed in a Subsystem and then ScroogeGen and ApacheThriftGen could use the subsystem to query these properties, leaving the option export and access encapsulated in the Subsystem.

    An example of a Subsystem is here:

    To use it in tasks, you'd just:

      @classmethod
      def global_subsystems(cls):
        return (ThriftLibrarySettings,)
    
      ...
      thrift_library_settings = ThriftLibrarySettings.global_instance()
      language = thrift_library_settings.get_language(target)
      ...
    

    If you're up for that switch - its where things should really be. If not, please add a TODO.

    1. Oops - the example subsystem link for real: https://github.com/pantsbuild/pants/blob/master/src/python/pants/backend/jvm/subsystems/scala_platform.py#L13-13
      Its just an object with a register_options - you'd then encapsulate access to those options and expose properties and methods the public could use.

      Also - you'd want to add migration entries for the 3 option moves since they'd now be in the namespace of the new Subsystem. You do that by adding 3 lines here: https://github.com/pantsbuild/pants/blob/master/migrations/options/src/python/migrate_config.py
      They'd look like this if you chose 'thrift' as your Subsystem's scope_qualifier:

      ('DEFAULT', 'thrift_default_compiler'): ('thrift', 'default_compiler'),
      
  3. 
      
ST
  1. Please include John's TODO about moving the options to a subsystem.

  2. Can you break this up into multiple named conditions? The wrapping hides the meaning a bit.

  3. 
      
CH
JS
  1. Ship It!
  2. 
      
JS
  1. In master @ https://github.com/pantsbuild/pants/commit/e790c8403322b8155e54de72f6e62ad57d78ab01
    
    Please mark this review submitted.
  2. 
      
JS
  1. 
      
  2. My bad for not demanding a green CI.
    
    BUG: self is not needed, but since its a parameter it needs to be passed below on line 131.
    
    I'm following up with a fix.
    1. Sorry, we need submitqueue:)
      Since I didn't see any fix, I have put a fix in CI https://github.com/pantsbuild/pants/pull/1606

    2. https://rbcommons.com/s/twitter/r/2288/
  3. 
      
CH
Review request changed

Status: Closed (submitted)

Loading...