UTF-8 issue: find doesn't find all your files

Public bug announcement: Beware that GNU find in findutils 4.4.2 (as shipped on Ubuntu Lucid) will not find all your files if it's run in a UTF-8 locale: even if the file is there, find may silently skip printing its name. Workaround: if any of your file names may contain non-ASCII bytes, use LC_CTYPE=C find instead of find. (If LC_ALL is set, it overrides LC_CTYPE, so use LC_ALL=C find in that case.)


$ echo $LC_CTYPE
$ ls foo*
ls: cannot access foo*: No such file or directory
$ perl -e 'die if !open F, ">", "foo\x80bar"'
$ ls foo*
$ find -type f
$ find -name 'foo*'
$ LC_CTYPE=C find -name 'foo*'

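Possible explanation: The file name matcher (-name) won't match a file if its name cannot be decoded in the character encoding of the current locale (LC_CTYPE). The byte 0x80 is a UTF-8 continuation byte and cannot appear on its own, so foo\x80bar is not valid UTF-8, and GNU find 4.4.2 will not match it against 'foo*' in a UTF-8 locale.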
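To check whether a directory tree contains names that could trigger this, something like the following sketch may help. It is only an illustration, not part of the original report: the function name list_invalid_utf8_names is made up, it assumes iconv is installed and that no file name contains a newline, and it relies on iconv exiting with a non-zero status when its input is not valid UTF-8.

```shell
#!/bin/sh
# Hypothetical helper: print every path under a directory (default: .)
# whose name is not valid UTF-8.
# Assumption: no file name contains a newline character.
list_invalid_utf8_names() {
    # Run find in the C locale so that it treats names as plain bytes.
    LC_ALL=C find "${1:-.}" -print | while IFS= read -r name; do
        # iconv fails on input that is not valid UTF-8.
        if ! printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            printf '%s\n' "$name"
        fi
    done
}
```

Calling list_invalid_utf8_names /some/dir prints one path per line. An empty result suggests that the names in that tree all decode cleanly as UTF-8.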

This behavior is surprising and possibly dangerous, especially in automated shell scripts: a script that relies on find -name can silently skip files without reporting any error.
