Home > java, maven > Transparently improve Java 7 mime-type recognition with Apache Tika

Transparently improve Java 7 mime-type recognition with Apache Tika

Java 7 comes with the method java.nio.file.Files#probeContentType(path) to determine the content type of a file at the given path. It returns a mime type identifier. The implementation actually looks at the file content and inspects so-called “magic” byte sequences, which is more reliable than just trusting filename extensions.

However, the default implementation included in Java 7 seems to be platform dependent and not very complete. For example, for me it did not even recognize an mp3 file as audio/mpeg. Fortunately, the Open Source library Apache Tika provides more comprehensive mime type detection and seems to be platform independent.

As shown below, you can register a simple Tika based FileTypeDetector implementation with the Java Service Provider Interface (SPI) to transparently enhance the behaviour of java.nio.file.Files#probeContentType(path). As soon as the resulting jar is in your classpath, the SPI mechanism wil pick up our implementation class and Files.probeContentType(..) will automatically use it behind the scenes.

Maven dependency

        <dependency>
            <groupId>org.apache.tika</groupId>
            <artifactId>tika-core</artifactId>
            <version>1.4</version>
        </dependency>

FileTypeDetector.java

package net.doepner.file;

import java.io.IOException;
import java.nio.file.Path;

import org.apache.tika.Tika;

/**
 * Detects the mime type of files (ideally based on marker in file content)
 */
public class FileTypeDetector extends java.nio.file.spi.FileTypeDetector {

    private final Tika tika = new Tika();

    @Override
    public String probeContentType(Path path) throws IOException {
        return tika.detect(path.toFile());
    }
}

Service Provider registration

To register the implementation with the Java Service Provider Interface (SPI), you need to have a plaintext file /META-INF/services/java.nio.file.spi.FileTypeDetector in the same jar that contains the class net.doepner.file.FileTypeDetector. The text file contains just one line with the fully qualified name of the implementing class:

net.doepner.file.FileTypeDetector

With Maven, you simply create the file src/main/resources/META-INF/services/java.nio.file.spi.FileTypeDetector containing the line shown above.

See the ServiceLoader documentation for details about Java SPI.

Advertisements
Categories: java, maven
  1. flexic
    September 9, 2013 at 16:41

    Nice post, was looking for a solution as probeContentType doesn’t appear to be implemented on OS X yet. Any idea how you would register the service provider in Play! Framework which uses sbt, not Maven?

    • September 10, 2013 at 09:45

      Have you seen the bug report and the comment about ~/.mime-types?
      http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=8008345

      To register the implementation with the Java Service Provider Interface (SPI), you need to have a plaintext file /META-INF/services/java.nio.file.spi.FileTypeDetector in the same jar that contains the class net.doepner.file.FileTypeDetector. The text file contains just one line with the fully qualified name of the implementing class.

      To use Tika without Maven you just have to make sure that tika-core and all its dependencies are in the classpath.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: