Improve error handling for nailgun client connection attempts
Review Request #2869 - Created Sept. 23, 2015 and submitted
|benjyw, jsirois, mateor, nhoward_tw, stuhood|
- Use 127.0.0.1 vs localhost for nailgun default host. this prevents a situation where a broken /etc/hosts can affect pants ability to connect to the nailgun (currently raises an unhandled and generic socket.gaierror).
- Add a specialized NailgunClient.NailgunConnectionError exception type.
- Convert NailgunClient.try_connect() from socket.connect_ex() to socket.connect() with added catching of socket.gaierror which is not covered by socket.connect_ex().
- Make NailgunClient.try_connect() raise a helpful NailgunConnectionError exception on failure vs returning None.
- Propagate the helpful NailgunConnectionError in both NailgunClient.call() and NailgunExecutor.ensure_connectable().
- Improve debug logging.
https://travis-ci.org/pantsbuild/pants/builds/81720419 + manual testing. Before/After for a broken /etc/hosts: Exception message: [Errno 8] nodename nor servname provided, or not known vs Exception message: Problem connecting to nailgun server at localhost:55261: gaierror(8, 'nodename nor servname provided, or not known') Before (with resident ng)/Before (without resident ng)/After for an unconnectable nailgun: Exception message: Failed to connect to ng server. vs Exception message: 'NoneType' object has no attribute 'close' vs Exception message: Problem connecting to nailgun server at 127.0.0.1:9999: error(61, 'Connection refused')
Address John's comment + fix a quick off-by-one error in retry counting in NailgunClient.ensure_connectable() that was incurring an additional retry.
Revision 2 (+41 -23)
LGTM, but I'm starting to feel like it would be good to have tests for the NailgunExecutor. This patch appears to also fix a bug where it would make max+1 connection attempts rather than just max, which points to that code not being properly exercised in tests.
Also, should ensure_connectable / try_connect use the connect_timeout?