[engine] Use scandir, and preserve symlink paths in output

Review Request #3991 — Created June 9, 2016 and submitted

stuhood
pants
3497, 3575
pants-reviews
kwlzn, nhoward_tw, wisechengyi

This patch fixes an issue where symlinked directories and files were resolved to their canonical names during execution. That meant that a glob for d.ln/2.txt (where d.ln was a symlink to a directory a) might result in a match for the path a/2.txt. Instead, users expect that the symlink is treated as if it had the same type as the underlying file.

Additionally, this review switches to using scandir to compute directory listings, which should result in 1 syscall per directory listing, rather than N+1.

  • Convert Path into a holder for a symbolic name and the Stat that underlies it.
  • Generate directory listings via scandir: fewer syscalls, and fewer FilesystemNodes.
  • Satisfy a point lookup for a literal Path using the directory listing that contains it (to allow the same underlying syscall to be reused there).
  • Preserve file ordering in EagerFilesetWithSpec, and cover with a test (resolves #3497.)

http://jenkins.pantsbuild.org/job/pantsbuild/job/pants/branch/PR-3575/5/

KW
  1. lgtm!

  2. src/python/pants/base/project_tree.py (Diff revision 1)
     
     
     

    could probably drop the list() conversion here since _filter_ignored can directly consume the generator.

  3. src/python/pants/base/project_tree.py (Diff revision 1)
     
     
     
     
     
     

    missing doc for entry_list param

  4. src/python/pants/base/project_tree.py (Diff revision 1)
     
     

    seems like this could also consume a generator vs list, e.g.:

    ...self.ignore.match_files((path for path, _ in prefixed_entry_list))...
    
    1. In both this and the other case, the risk is that if the consumer you're passing the value to changes their behaviour such that they consume you twice, it can cause an error... and in fact, _filter_ignored suffered from this bug at one point. But when it's explicit in the method signature, it's ok I suppose.

  5. src/python/pants/engine/fs.py (Diff revision 1)
     
     

    "which taking the underlying type"?

  6. src/python/pants/engine/fs.py (Diff revision 1)
     
     
     

    looks like #3169 is closed - should this TODO get cleaned up or updated?

  7. 
      
NH
  1. lgtm, modulo the comments from Kris.

  2. 
      
ST
ST
ST
Review request changed

Status: Closed (submitted)

Change Summary:

Merged as 387bb116c96863aaf2e07fff278a363e15bbfd34

Loading...