lgtm! one quick thought on simplifying the de-duping implementation.
any reason to not just use
t.c.collections.OrderedSet here instead of the for loop?
>>> from twitter.common.collections import OrderedSet
>>> l = ['p', 'a', 'n', 't', 's', 'b', 'u', 'i', 'l', 'd', 'p', 'a', 'n', 't', 's']
>>> tuple(OrderedSet(l))
('p', 'a', 'n', 't', 's', 'b', 'u', 'i', 'l', 'd')
Remove Duplicates in File System tasks in v2 Engine
Review Request #4096 — Created July 19, 2016 and submitted
Currently, when performing file system tasks in the v2 engine (such as glob matching), we keep duplicate paths.
This can cause errors. For example, if two glob patterns defined in the "sources" field of a target both match the same Scala file, that file is passed to the compiler twice, triggering errors like "X is already defined as class X".
In fact, we have no use cases that require keeping duplicates, but we do have cases like the above where removing them saves us a lot of trouble. In addition, the v1 engine does not produce duplicate paths, so removing them makes the transition from v1 to v2 smoother.
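To illustrate the overlap (a hypothetical sketch using stdlib fnmatch, not the engine's actual matching code; the file names and patterns are made up):

```python
import fnmatch

# Hypothetical source tree; paths are illustrative only.
files = ['src/Main.scala', 'src/Util.scala', 'src/Util.java']

# Two glob patterns from a "sources" field that both match Util.scala.
patterns = ['src/*.scala', 'src/Util.*']

# Naively concatenating per-pattern matches keeps the duplicate,
# so Util.scala would be handed to the compiler twice.
matched = [f for p in patterns for f in files if fnmatch.fnmatch(f, p)]
# matched == ['src/Main.scala', 'src/Util.scala', 'src/Util.scala', 'src/Util.java']
```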
In this review:
1. Enhanced logic in merge_paths (fs.py) to remove duplicates while preserving the order of paths.
2. Added a test case.
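A minimal sketch of order-preserving de-duplication along the lines described (the signature and the auxiliary set are assumptions; the actual change lives in merge_paths in fs.py):

```python
def merge_paths(*path_lists):
    """Merge lists of paths, dropping duplicates while preserving first-seen order.

    Sketch only: the real merge_paths in fs.py may differ in signature and return type.
    """
    seen = set()
    merged = []
    for paths in path_lists:
        for path in paths:
            if path not in seen:
                seen.add(path)
                merged.append(path)
    return tuple(merged)

# Duplicates across inputs are dropped; order of first appearance is kept.
merge_paths(['a.scala', 'b.scala'], ['b.scala', 'c.scala'])
# -> ('a.scala', 'b.scala', 'c.scala')
```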
+1 on using OrderedSet (more for clearer intent)
thanks Yujie! this is in master @ 454dc92602b789dd7819388bb9c844f11b28408c
please mark this RB as submitted when you get a chance.