HTML / JSP / Servlets / JavaMail / Oracle / CSV / Excel: UTF-8 to Unicode them all

Recently I wrote a web app that

  • Lets the user enter a greeting message with a subject and body
  • Sends an HTML email (“ecard”) to recipients
  • Stores info about sent messages in Oracle
  • Reports on recently sent messages on an admin page (HTML table)
  • Provides the report as downloadable CSV files (often opened in M$ Excel)
  • Provides an RSS feed about recently sent messages

One goal was to allow any Unicode characters for subject and body text and make sure that web form, servlets, JSP pages, emails, database records and CSV files all support that (no garbled characters anywhere, no data loss through charset conversions).

So here is what I did:

JSP and HTML pages

At the top of the JSP pages:

<!DOCTYPE html>

<%@ page contentType="text/html;charset=UTF-8" %>

In every HTML and JSP page, within the <head> section:

    <meta charset="UTF-8"/>

Servlet filter

In WEB-INF/web.xml:

    <filter>
        <filter-name>UTF8Filter</filter-name>
        <filter-class>net.doepner.servlet.Utf8Filter</filter-class>
    </filter>
    <filter-mapping>
        <filter-name>UTF8Filter</filter-name>
        <url-pattern>/*</url-pattern>
    </filter-mapping>

In net/doepner/servlet/Utf8Filter.java:

package net.doepner.servlet;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import java.io.IOException;

/**
 * Makes sure that we use UTF-8 for all requests and responses
 */
public class Utf8Filter implements Filter {

    @Override
    public void init(FilterConfig fc) throws ServletException {
        // nothing to do
    }

    @Override
    public final void doFilter(ServletRequest request,
                               ServletResponse response,
                               FilterChain chain)
            throws IOException, ServletException {
        request.setCharacterEncoding("UTF-8");
        response.setCharacterEncoding("UTF-8");
        chain.doFilter(request, response);
    }

    @Override
    public void destroy() {
        // nothing to do
    }
}
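
To see why the request.setCharacterEncoding("UTF-8") call matters, here is a minimal sketch using only the JDK, no servlet container. It assumes the browser submits the form as UTF-8 and that the container would otherwise fall back to ISO-8859-1, which is the servlet default when nothing else is configured. The class and method names are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetMismatchDemo {

    /** Encodes text as UTF-8 bytes, the way a browser submits a UTF-8 form. */
    static byte[] submittedBytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes raw request bytes with the given charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        String original = "Gr\u00fc\u00dfe"; // "Grüße"
        byte[] raw = submittedBytes(original);

        // Wrong charset: every UTF-8 byte becomes its own character
        String garbled = decode(raw, StandardCharsets.ISO_8859_1);
        // Right charset: the original text comes back unchanged
        String intact = decode(raw, StandardCharsets.UTF_8);

        System.out.println(garbled.equals(original)); // prints "false"
        System.out.println(intact.equals(original));  // prints "true"
    }
}
```

The two non-ASCII letters take two bytes each in UTF-8, so the wrongly decoded string ends up with seven characters instead of five.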

Sending Email

In the code that sends the email (using the javax.mail API):

        final MimeBodyPart htmlPart = new MimeBodyPart();
        htmlPart.setContent(template.getHtml(msg), "text/html;charset=utf-8");

        final Multipart multiPart = new MimeMultipart("alternative");
        multiPart.addBodyPart(htmlPart);

        final MimeMessage email =
                new MimeMessage(Session.getDefaultInstance(properties));

        // setting the sender and recipient is omitted here for brevity

        email.setSubject(msg.getSubject(), "UTF-8");
        email.setContent(multiPart);

        Transport.send(email);
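
Mail headers are ASCII-only on the wire, which is why the charset argument to setSubject matters: a non-ASCII subject has to be wrapped as a MIME "encoded word" (RFC 2047). The following is a simplified JDK-only sketch of the Base64 ("B") variant, not JavaMail's actual code. JavaMail may pick the "Q" variant instead and also takes care of folding long headers, which this sketch does not. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class EncodedWordSketch {

    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    /** Wraps a subject as a single RFC 2047 "B" encoded word (no folding). */
    static String encodeSubject(String subject) {
        byte[] utf8 = subject.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    /** Reverses encodeSubject; only understands this sketch's own output. */
    static String decodeSubject(String encodedWord) {
        if (!encodedWord.startsWith(PREFIX) || !encodedWord.endsWith(SUFFIX)) {
            throw new IllegalArgumentException("not a UTF-8 B encoded word");
        }
        String base64 = encodedWord.substring(
                PREFIX.length(), encodedWord.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(base64),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String header = encodeSubject("Gr\u00fc\u00dfe"); // "Grüße"
        System.out.println(header);
        System.out.println(decodeSubject(header).equals("Gr\u00fc\u00dfe")); // prints "true"
    }
}
```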

Oracle database

For Unicode support in Oracle, do one of the following:

  1. Use NLS_CHARACTERSET = AL32UTF8 and regular VARCHAR2 columns
  2. Or use NVARCHAR2 column types.

I used approach 1. I haven’t actually tried approach 2 myself.

Here is a useful query to see current charset settings:

SELECT * FROM nls_database_parameters nls 
         WHERE nls.parameter LIKE '%CHAR%SET%';
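
One thing to watch with AL32UTF8 and VARCHAR2: unless a column is declared with CHAR semantics, its length limit counts bytes, not characters, and non-ASCII characters take two to four bytes each in UTF-8. Here is a small sketch for checking a value on the Java side before inserting it. The helper names and the idea of validating in application code are my assumptions, not part of the original app.

```java
import java.nio.charset.StandardCharsets;

class Utf8ColumnFit {

    /** Number of bytes the string occupies when encoded as UTF-8. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True if the string fits a column limited to maxBytes bytes. */
    static boolean fitsInBytes(String s, int maxBytes) {
        return utf8Length(s) <= maxBytes;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("abc"));              // prints "3"
        System.out.println(utf8Length("\u00fc"));           // "ü", prints "2"
        System.out.println(utf8Length("\u20ac"));           // euro sign, prints "3"
        System.out.println(fitsInBytes("\u20ac\u20ac", 5)); // prints "false"
    }
}
```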

CSV generation

See my earlier blog post about CSV generation in a Servlet using my CsvWriter utility class.

The important bits are:

private static final char BYTE_ORDER_MARK = (char) 0xfeff;

Put that character (the so-called “BOM”, short for byte order mark) at the very beginning of the response content. A UTF-8 writer sends it as the three bytes EF BB BF. Some applications (like M$ Excel) will otherwise not detect the UTF-8 encoding correctly.

Do this on the writer object from the getWriter() method on the servlet response:

// The BOM is required so that Excel will recognize UTF-8
// characters properly, i.e. all non-ASCII letters, etc.
writer.print(BYTE_ORDER_MARK);
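
To check what actually ends up in the response, this self-contained sketch writes the BOM and some CSV text through a UTF-8 writer into a byte array. It stands in for the servlet response, assuming the response writer uses UTF-8 as set up by the filter. The class name and the csvBytes helper are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

class BomBytesDemo {

    private static final char BYTE_ORDER_MARK = (char) 0xfeff;

    /** Returns the bytes a UTF-8 writer produces for BOM + CSV content. */
    static byte[] csvBytes(String csvBody) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(
                new OutputStreamWriter(out, StandardCharsets.UTF_8));
        writer.print(BYTE_ORDER_MARK); // must come before any other output
        writer.print(csvBody);
        writer.flush();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = csvBytes("a,b\n");
        // prints "EF BB BF"
        System.out.println(String.format("%02X %02X %02X",
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF));
    }
}
```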

RSS feed

I generate the RSS feed with a JSP page. Just make sure you have this at the top of the page:

<?xml version="1.0" encoding="UTF-8"?>
<%@ page contentType="text/xml;charset=UTF-8" %>
  1. March 28, 2013 at 22:36