Allow failover for remote cache

Review Request #3374 - Created Jan. 26, 2016 and submitted

Peiyu Wang
nhoward_tw, patricklaw, stuhood, zundel

Today pants can have multiple remote cache hosts, either through config or
through the resolve process. Within each task, pants first decides on a best
URL and sticks with that same URL for the rest of the task run.

The problem is that if this URL dies, there is no failover even though other
URLs are still available. Cache instances dying turns out to be not that
uncommon: the most recent incident at Twitter was caused by a memory leak,
made worse by bloated artifacts, so hosts died a few times a day. When this
happens, pants falls back to local compilation, which takes forever.

This review does not attempt to implement load balancing, and it does not even
try to re-ping (because running the pinger is expensive). It only addresses a
very specific problem: an individual host dying during a single pants run. In
that case pants can now fail over to the 2nd or 3rd instance.

It does not handle the case where all URLs die, but as long as at least one
instance is alive, pants can now eventually fail over to it.
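The failover behaviour described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and is not the code committed in this review: `FailoverUrlSelector`, `fetch_with_failover`, the `max_failures` threshold, and the choice to treat the builtin `ConnectionError`/`TimeoutError` as "host is dead" are assumptions made for illustration. It also assumes the URL list is already ordered best-first by the one-time ping, and, like the change itself, it never re-pings.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


class NoAvailableUrlError(Exception):
    """Raised once every candidate URL has been marked dead."""


class FailoverUrlSelector:
    """Hypothetical helper: stick with one URL until it fails `max_failures`
    times in a row, then move on to the next candidate. URLs are assumed to
    be pre-sorted best-first; nothing is ever re-pinged."""

    def __init__(self, urls: Sequence[str], max_failures: int = 3) -> None:
        if not urls:
            raise ValueError("at least one URL is required")
        if max_failures < 1:
            raise ValueError("max_failures must be >= 1")
        self._urls: List[str] = list(urls)
        self._max_failures = max_failures
        self._index = 0      # position of the URL currently in use
        self._failures = 0   # consecutive failures on the current URL

    @property
    def current(self) -> str:
        if self._index >= len(self._urls):
            raise NoAvailableUrlError("all remote cache URLs have failed")
        return self._urls[self._index]

    @contextmanager
    def select(self) -> Iterator[str]:
        url = self.current  # raises NoAvailableUrlError if every URL is dead
        try:
            yield url
        except (ConnectionError, TimeoutError):
            self._failures += 1
            if self._failures >= self._max_failures:
                # Give up on this host and fail over to the next one.
                self._index += 1
                self._failures = 0
            raise
        else:
            self._failures = 0  # a success resets the consecutive-failure count


def fetch_with_failover(selector: FailoverUrlSelector,
                        fetch: Callable[[str], T]) -> T:
    """Retry `fetch` until it succeeds or every URL has been exhausted."""
    while True:
        try:
            with selector.select() as url:
                return fetch(url)
        except (ConnectionError, TimeoutError):
            continue  # the selector may have moved on to another URL
```

Counting consecutive failures before switching hosts keeps a single transient error from abandoning a healthy cache, while the loop is still guaranteed to terminate: each failure moves the selector closer to the end of the list, and `NoAvailableUrlError` is raised once no instance is left. Note that real HTTP clients may raise their own exception types (for example, `requests` has its own `ConnectionError` that is not the builtin one), so the set of exceptions treated as failures would need adjusting.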


Participants: Yi Cheng, Eric Ayers, Benjy Weinberger, Nick Howard (Twitter),
Peiyu Wang, Patrick Lawson
Review request changed

Status: Closed (submitted)

Change Summary:

Committed as d6c25992a62bcd7d8d324ac54569a8110e507331