-
-
src/python/pants/java/nailgun_client.py (Diff revision 1) Could this go deeper? into
def iter_chunks
perhaps?
Fix unicode string on stdout causing taskerror
Review Request #3944 — Created May 28, 2016 and updated
Information | |
---|---|
ity | |
pants | |
3517, 3532 | |
Reviewers | |
pants-reviews | |
jsirois, kwlzn, stuhood |
- No encoding was set on payload which caused the encoding error detailed in the ticket
- Encode payload string with utf-8 before it gets written to stdout
- Added a new test which fails without this change
- Ran ci.sh locally
-
-
tests/python/pants_test/java/test_nailgun_client.py (Diff revision 1) this strikes me as odd - IIUC you redirect
sys.stdout
to aStringIO
thenprint()
(implicitly to theStringIO
referenced bysys.stdout
) and then retrieve the value?but as near as I can tell, the whole point here is just to test the default encoding type of
print()
?if so, could you not achieve the same thing with:
out_s = StringIO() print(self.content, file=out_s) self.stdout_content = out_s.getvalue().strip()
or even simpler by mocking the behavior of
print()
sans files:self.stdout_content = self.content.encode(sys.stdout.encoding)
(but this may all be moot after the comment below)
-
tests/python/pants_test/java/test_nailgun_client.py (Diff revision 1) seems to me like the scope of these tests should probably continue to test against the raw byte stream as written by
write_chunk()
(e.g. the wire format) vs stashing and comparing against a unicode conversion this early?then, as Stu said - you could move the chunk payload encoding to 'utf-8' deeper into
NailgunProtocol.iter_chunks
and then test the unicode reads using that directly - e.g. in the context the client would read and encode from the byte stream toutf-8
on the read/client-side of the protocol.
-
-
tests/python/pants_test/java/test_nailgun_client.py (Diff revision 1) happy to move the testing out to different call stack in the code but not sure I clearly understand this.
write_chunk() and iter/read_chunk() as I see are doing the right thing. write_chunk() encodes unicode to bytes before writing it out and iter/read_chunk() decodes bytes to unicode after reading.
it makes sense to encode in bytes before stdout(..) since stdout of a unicode string causes an implicit encoding - output is always bytes.
so maybe I am missing something here -- why would we encode again in iter/read(..)?
-
-
src/python/pants/java/nailgun_protocol.py (Diff revision 2) I'd be a fan of explicitly passing
return_bytes=False
here with an explanation; ie: the payload, which becomesworking_dir
orcommand
can contain unicode chars as parts of path names or command argument values.
That said, there is 1 caller, and that caller doesn't use these tupe slots at all:$ git grep -n "\.parse_request" -- src/ src/python/pants/pantsd/pailgun_server.py:62: _, _, arguments, environment = NailgunProtocol.parse_request(self.request)
So can the new
return_bytes
parameter just be dropped wholesale and the bytes never decoded?