A stats db.

Review Request #2797 — Created Sept. 11, 2015 and submitted

fkorotkov, jsirois
Stores timing stats in a structured way, instead of as json.
Future changes will do interesting visualizations and analytics
on this data.

Tightened up some uses of RunInfo, as previously we were inadvertently
overwriting the id with None. Also ensures that ids are unique, by
appending a uuid to them (but we still prefix them with a timestamp
so that they sort by time).

CI passes: https://travis-ci.org/pantsbuild/pants/builds/79771073.
CI still passes after addressing code review comments: https://travis-ci.org/pantsbuild/pants/builds/79883116.

  1. All small stuff noted.
  2. src/python/pants/stats/statsdb.py (Diff revision 1)
    I'm not sure if this warrants artisinal float safe round-up a ms for >= .5 remainder, but I think that's what its for... it was fun to noodle.
    1. That is indeed what it's for. I like to be precise... I moved this method into StatsDBImpl though, and made it _private, as it's not used anywhere else.
  3. src/python/pants/stats/statsdb.py (Diff revision 1)

    If this is only intended to be used via global_instance you might make this a classmethod that internaizes that bit so a user can just say StatsDB.get_db().

    1. Hmmm, I'm of two minds on this. Do we do something similar elsewhere? Part of me thinks that subsystem instance choice should be explicit. StatsDB.global_instance().get_db() acknowledges that we're getting the same object back each time. StatsDB.get_db() isn't quite as obvious about this. For example, the former is more likely to make the caller think about thread-safety (not an issue in this case, but as a rule).

    2. Here's an example - I did not want anyone to be able to say - reasonably - I need special binaries: https://github.com/pantsbuild/pants/blob/master/src/python/pants/binaries/binary_util.py#L47-L68
      We already support multiple URLs and multiple versions per binary.
  4. src/python/pants/stats/statsdb.py (Diff revision 1)
    The naming is a bit odd since StatsDBImpl doesn't impl a StatsDB interface, which is what you'd expect.  I know java and factry and butt of jokes, but StatsDB really is a StatsDBFactory and StatsDBImpl really is a StatsDB from what I can tell.
    1. You are correct. So renamed.

  5. src/python/pants/stats/statsdb.py (Diff revision 1)
    Can connection and cursor be _hidden?  I'm fine with no pydoc if they are and it seems like StatsDBImpl is meant to be used only by the insert and get_XXX APIs.
    1. Good idea. Done.

  6. src/python/pants/stats/statsdb.py (Diff revision 1)
    Can StatsDB call this before handing back the StatsDBImpl so that any operation order on the returned StatsDBImpl just works?
    1. Good idea. Done.

  7. assertEqual is smart enough to call assertListEqual behind the scenes for you.
    1. Changed. So why does assertListEqual exist?

    2. I have no idea - its odd.  I've seen a few folks using assertListEqual and not said anything, but this time I dug a bit to see what was up.
  8. Sorting both lists would be more robust to weirdness, here and below.
  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

Submitted as 8000f328aef82d0babb2f85d8b06ee77f1ba42f9.

  1. Thanks John! Submitted as 8000f328aef82d0babb2f85d8b06ee77f1ba42f9.