Fix unicode parsing of ini files.

Review Request #3595 — Created March 22, 2016 and submitted

benjyw, zundel
We encountered an annoying problem upgrading pants in our repo.
Apparently we had a unicode left-quotation mark (\u2018) inside a
line comment. The line comment happened to be in a list. This
caused the option parsing to fail when trying to call
`value.startswith('[')`, which generated a cryptic error about not
being able to decode the string with ascii. This was especially
troublesome because there was no indication of which option the
error was in (and trying to get python to print out the string
caused it to fail with the same error just trying to format the
string for inclusion in a print message).

Added a test to that fails without this change, and passes with this change.

CI went green here:

  • 0
  • 0
  • 2
  • 2
  • 4
Description From Last Updated
  1. Ship It!
  2. A non-sneaky character might make for a better example...

  2. src/python/pants/option/ (Diff revision 1)

    Why not do the ensure_text here?

    1. Because the lines above that elif value.startswith('[') or value.startswith('('): will still trigger the error.

    2. Hmm, I think you've found a bug then. I imagine these tests that assume that the value is a string should be under the clause that tests for string type.

  3. Please use """, not '''
  2. src/python/pants/option/ (Diff revision 3)

    Actually, now that I think of it, does this belong here, or in (or somewhere else earlier than this).

    I ask because this only fixes the problem for list-typed options. Couldn't we run into similar issues elsewhere? A dict, say?

    1. Hmm, possibly. I can try and see where the stack trace leads me.

    2. Dicts don't seem to have this problem. I tried making an invalidation report with

      console_tool_output_format: {
          'RUN':'UNINDENTED', # ☺

      and it worked fine.

    3. Ooh, booleans do though. This broke things, even with this patch.

      strict_deps: True # ☺
    4. Okay. So the situation is this:

      (1) Unicode characters in lists cause option parsing to fail.
      (2) Dicts don't care about unicode characters, for unknown reasons.
      (3) Putting a comment of any nature after a boolean fails, regardless of encoding.

      I can move the ensure_text into, but AFAICT no other option types benefit from it, and it actually breaks the unit test because bypasses most of the options system to construct lists and dicts.

    5. Hmm, why would changing break the unittests? Not sure I follow.

      OK, so no harm in this change (although I may follow it up with the bugfix I mentioned) at any rate.

  1. Ship It!
Review request changed

Status: Closed (submitted)

Change Summary:

In 9f312e7b7d5d8ffea3abf356cb3733962aa476f3. Thanks Eric, Stu, and Benjy!