Retrieving Files Recursively in Java

admin  

Sometimes I need to recursively read all the files in a directory. I used to rely on the DirectoryScanner class from Apache Ant (located in ant.jar in my case). Its advantage is that it can filter files using the well-known asterisk (wildcard) matching. Recently I had to search a directory containing a very large number of files, and DirectoryScanner turned out to be quite slow. So I used plain Java instead (the java.io.File and java.io.FilenameFilter classes), for two reasons. The first is that the speed can be improved quite easily. The second is that reporting progress on the command line gives the impression that it takes less time.

Here is the snippet. Note that the ArrayList should be instantiated with a sufficient capacity. Otherwise, as more and more files are added, the ArrayList is repeatedly reallocated to increase its capacity, and each reallocation copies the existing entries, which adds overhead as the number of files grows. I suspect something similar happens in the Apache DirectoryScanner class:

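import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Arrays;

public static void main(String[] args) 
{
    // Pre-size the list so it is not reallocated while files are added.
    ArrayList<String> files = new ArrayList<String>(1000000);
    getRecursiveFiles(new File("d:\\temp"), files);
    
    System.out.println("Found Files: " + files.size());
}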

public static void getRecursiveFiles(File dir, ArrayList<String> files)
{
    // Report progress: files collected so far and the directory being scanned.
    System.out.println(files.size() + " " + dir.getAbsolutePath());
    
    // Plain files directly inside dir. Note that list() returns bare file
    // names (not full paths), and null if dir cannot be read.
    String[] localFiles = dir.list(new FilenameFilter() {
        @Override
        public boolean accept(File current, String name) {
            return new File(current, name).isFile();
        }
    });
    
    if (localFiles != null && localFiles.length > 0)
        files.addAll(Arrays.asList(localFiles));
    
    // Subdirectories directly inside dir.
    String[] directories = dir.list(new FilenameFilter() {
        @Override
        public boolean accept(File current, String name) {
            return new File(current, name).isDirectory();
        }
    });
    
    if (directories != null)
    {
        for (String directory : directories)
        {
            // File(parent, child) avoids hard-coding the "\\" separator.
            getRecursiveFiles(new File(dir, directory), files);
        }
    }
}
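
The snippet above lists each directory twice (once for files, once for subdirectories) and stores bare file names. As a possible variation, here is a minimal sketch that lists each directory only once with listFiles() and stores absolute paths instead. The class name SinglePassScan and the method name collectFiles are made up for this example, and I have not benchmarked it against the version above, so treat it as an idea to try rather than a guaranteed speed-up.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
public class SinglePassScan {

    // Collects the absolute paths of all regular files under dir.
    public static void collectFiles(File dir, List<String> out) {
        // listFiles() returns null if dir is not a directory or cannot be read.
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;
        }
        for (File entry : entries) {
            if (entry.isDirectory()) {
                collectFiles(entry, out);   // descend into the subdirectory
            } else if (entry.isFile()) {
                out.add(entry.getAbsolutePath());
            }
        }
    }

    public static void main(String[] args) {
        // Scan the directory given on the command line, or the current one.
        File root = new File(args.length > 0 ? args[0] : ".");
        List<String> files = new ArrayList<String>();
        collectFiles(root, files);
        System.out.println("Found Files: " + files.size());
    }
}
```

Like the original, this follows whatever isDirectory() reports, so a directory tree containing symbolic link cycles would need extra handling.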

Below is the DirectoryScanner version. It looks more elegant and lets you add include and exclude filters, but when you expect more than 100,000 files it becomes quite slow. I couldn't find a quick way to make it fast, so for such cases I'll use the plain Java approach shown above (if you know how, please share it in a comment).

import org.apache.tools.ant.DirectoryScanner;

public static void main(String[] args) 
{
    DirectoryScanner scanner = new DirectoryScanner();
    // Match .zip files at any depth below the base directory.
    scanner.setIncludes(new String[]{"**\\*.zip"});
    scanner.setBasedir("d:\\temp");
    scanner.setCaseSensitive(false);
    scanner.scan();
    // Paths are returned relative to the base directory.
    String[] files = scanner.getIncludedFiles();
    System.out.println("Found Files: " + files.length);
}
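
If you want asterisk matching without DirectoryScanner, one option worth trying (assuming Java 7 or later) is the standard java.nio.file API: Files.walkFileTree to walk the tree and a PathMatcher with a glob pattern to filter. The sketch below is only an illustration. The class name GlobScan and the method name findFiles are made up, I have not measured how it compares with DirectoryScanner on large directories, and whether glob matching is case sensitive depends on the platform.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this sketch.
public class GlobScan {

    // Returns all regular files under root whose file name matches the glob, e.g. "*.zip".
    public static List<Path> findFiles(Path root, String glob) throws IOException {
        final PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        final List<Path> matches = new ArrayList<Path>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // Match against the file name only, so "*.zip" applies at any depth.
                if (attrs.isRegularFile() && matcher.matches(file.getFileName())) {
                    matches.add(file);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Scan the directory given on the command line, or the current one.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> files = findFiles(root, "*.zip");
        System.out.println("Found Files: " + files.size());
    }
}
```

By default walkFileTree does not follow symbolic links, which differs from the java.io.File version above.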