
Default charset

Loom applies a default UTF-8 encoding to all responses with content type text/*, application/x-javascript, or application/x-json. This is the official recommendation to overcome limitations of current standards, but it can be changed using Config.setDefaultCharset.

When parsing requests, the framework will apply a UTF-8 charset unless otherwise specified.

What follows are the details to be considered for any modifications of the default behavior.

Content-type and charset

The content type and charset are sent together in the same HTTP header:

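Request (for example, a form submission):

    Content-Type: application/x-www-form-urlencoded; charset=UTF-8

Response (for example, an HTML page):

    Content-Type: text/html; charset=UTF-8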

Separate values can be set on the server side using response.setContentType() and response.setCharacterEncoding(). Setting only the charset has no effect if the content type has not been set.
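To illustrate how both values travel in one header, here is a minimal sketch in plain Java that extracts the charset from a Content-Type value and builds such a value from its two parts. The class and method names are hypothetical and are not part of Loom or the servlet API; the UTF-8 fallback is an assumption that mirrors the default described on this page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Loom or the servlet API.
final class ContentTypeCharset {

    /** Returns the charset declared in a Content-Type value, or UTF-8 if none is declared. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            // Parameters follow the media type, separated by ';'
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String param = parts[i].trim();
                // Case-insensitive match on the parameter name
                if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = param.substring(8).trim().replace("\"", "");
                    // Throws an unchecked exception for unknown or illegal charset names
                    return Charset.forName(name);
                }
            }
        }
        // Assumed fallback, mirroring the UTF-8 default described above
        return StandardCharsets.UTF_8;
    }

    /** Builds a header value that carries both the media type and the charset. */
    static String headerValue(String mediaType, Charset charset) {
        return mediaType + "; charset=" + charset.name();
    }
}
```

In a servlet, the same two pieces of information would be passed to response.setContentType() and response.setCharacterEncoding() instead of being concatenated by hand.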

Default charset

According to section 6 of RFC 2854, the default charset for HTTP is ISO-8859-1, while the default for MIME parts (when using multipart/form-data) is US-ASCII.

To complicate matters, the default encoding applied by the browser is the same as that of the last received response. This means it may be UTF-8 or ISO-8859-1, depending on whether this is the first request or not. To simplify things, Loom uses UTF-8 for both requests and responses.
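The effect of a sender and a receiver disagreeing on the charset can be reproduced with a few lines of plain Java. This is only a sketch; the class and method names are hypothetical, and the sample word is written with Unicode escapes so that the encoding of the source file itself does not matter.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; illustrates a UTF-8 / ISO-8859-1 mismatch.
final class CharsetMismatchDemo {

    /** Encodes the text as UTF-8 and decodes the resulting bytes as ISO-8859-1. */
    static String transcodeWrongly(String text) {
        byte[] sent = text.getBytes(StandardCharsets.UTF_8);   // what the sender wrote
        return new String(sent, StandardCharsets.ISO_8859_1);  // what the receiver assumed
    }

    public static void main(String[] args) {
        // U+00F1 becomes two bytes in UTF-8, which ISO-8859-1 reads as two separate characters
        System.out.println(transcodeWrongly("ma\u00f1ana"));
    }
}
```

Using a single charset on both sides, as Loom does with UTF-8, avoids this class of problem.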

Request encoding

As a rule of thumb, you should avoid using international characters with GET requests, and instead use a POST request that includes an HTTP header stating the encoding being used.

When using international characters with GET requests, the request.setCharacterEncoding() method should be invoked before asking for any request parameter. This can be done using CharacterEncodingFilter (included with Spring) or by adding a parameter with the current encoding (this is what Google does).
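The reason the encoding has to be known before any parameter is read is that percent-encoded bytes in a query string carry no charset information of their own. The sketch below shows this in plain Java with java.net.URLDecoder; it does not use the servlet API, and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; shows that the same query string decodes differently per charset.
final class QueryParamDecoding {

    /** Decodes one percent-encoded component using the given charset name. */
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName, e);
        }
    }

    /** Splits a query string into name/value pairs and decodes each with the given charset. */
    static Map<String, String> parseQuery(String query, String charsetName) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing '&'
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name, charsetName), decode(value, charsetName));
        }
        return params;
    }
}
```

Parsing "q=ma%C3%B1ana" with "UTF-8" yields the intended word, while parsing it with "ISO-8859-1" yields two garbled characters in place of the accented one. Once a parameter has been decoded with the wrong charset, the container will not decode it again.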

To get the complete story, check the excellent chapter in the Jetty FAQ.

Request encoding is rarely an issue unless you are implementing an interface where locale-specific characters are introduced in form fields (such as Google Maps).

Response encoding

If you do not specify a content type, there will be no Content-Type HTTP header in the response, and the browser will apply its default character encoding. Encodings should be specified for text/html responses, not for images or other binary data.

Note that response encoding can also be set by using the <page-encoding> element in web.xml:
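A minimal sketch of such a declaration, using the standard jsp-config section of the deployment descriptor. The url-pattern value is an assumption chosen for illustration; adjust it to the JSP files of your application.

```xml
<jsp-config>
  <jsp-property-group>
    <!-- Assumed pattern: apply the setting to every JSP file -->
    <url-pattern>*.jsp</url-pattern>
    <page-encoding>UTF-8</page-encoding>
  </jsp-property-group>
</jsp-config>
```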

However, this applies only to Forward resolutions. Other resolutions (JsonResolution, StringResolution, etc.) need to set their own Content-Type headers.

The Java standard

For responses that have set the Content-Type to a text value (text/html, text/plain, text/javascript...), the server will guess the encoding to apply based on the locale sent by the browser and the web server configuration. For example, a Spanish request will trigger an ISO-8859-1 response.

Note for Tomcat users

Tomcat disables this standard servlet behavior by default. To make encoding work, add -Dorg.apache.catalina.STRICT_SERVLET_COMPLIANCE=true to your startup script. If you don't, the server response will not include the charset, even if the browser has provided an Accept-Language header.

JavaScript and charset

Scripts that include non-ASCII characters should be served with a Content-Type header that declares their charset. Static JavaScript files are already stored with a concrete encoding, in which case you should check that your server sets the same encoding for the response.

IE6 will apply the encoding of the HTML page to any subsequent JavaScript file, ignoring any charset header that is sent with the file. The charset attribute can still be used on the script tag to override this.

Force the encoding from HTML

You can force the browser to interpret the server response using a specific charset by setting acceptCharset on your <form> tags or charset on your <script> tags. This is not recommended, since it is the responsibility of the response, not the page author, to specify the encoding to use.

The META tag

Using this tag is not a good idea, since the server may be using a different encoding to write the response. It is better to configure the encoding using only HTTP.

Locale default encoding

The browser sends its list of preferred response encodings, but the server does not use it. Instead, the server applies a default list of locale-to-encoding mappings when writing the response. Depending on the locale supplied by the browser, it may decide to use a specific encoding.

To get a more accurate description of how the Locale affects the response encoding, see the Jetty Response.setLocale() javadoc.
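The locale-to-encoding lookup described above can be sketched in plain Java as follows. This is not how any particular container implements it: the class name, the single table entry (taken from the Spanish example on this page) and the simplified Accept-Language parsing, which ignores q-values, are all assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a locale-to-encoding table; real containers ship their own.
final class LocaleCharsetMapping {

    private static final Map<String, Charset> BY_LANGUAGE = new HashMap<>();
    static {
        // Illustrative entry only, matching the Spanish example in the text
        BY_LANGUAGE.put("es", StandardCharsets.ISO_8859_1);
    }

    /** Returns the charset mapped to the locale's language, or the fallback if none is mapped. */
    static Charset charsetFor(Locale locale, Charset fallback) {
        Charset mapped = BY_LANGUAGE.get(locale.getLanguage());
        return mapped != null ? mapped : fallback;
    }

    /** Takes the first language range of an Accept-Language value, ignoring q-values. */
    static Locale firstLocale(String acceptLanguage) {
        String first = acceptLanguage.split(",")[0].split(";")[0].trim();
        return Locale.forLanguageTag(first);
    }
}
```

Note that the lookup is driven by the language the browser asks for, not by the charsets it says it accepts, which is why the result can differ from what the page author expects.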

The usual suspects

The encoding can be broken at any layer of your application. You should check that the entire system is fine:

  • the encoding used to compile your JSPs, which will affect any non-externalized string which will be copied to the generated java source files
  • the encoding of your database
  • any external file stored in your filesystem: dummy SQL scripts used to load your database, properties files, etc

Java Strings are stored in memory in Unicode representation, so anything displayed fine in the debugger should be serialized properly.
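A String that looks correct in memory still has to be written with a charset that can represent all of its characters. The following sketch, with hypothetical class and method names, checks whether a value survives being encoded and decoded with a given charset; it relies on String.getBytes(Charset) substituting a replacement byte for characters the charset cannot encode.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; checks whether text can be written losslessly in a charset.
final class SerializationCheck {

    /** Number of bytes the text occupies once encoded with the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    /** True if encoding and then decoding with the same charset returns the original text. */
    static boolean survivesRoundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset); // unmappable characters are replaced
        return new String(bytes, charset).equals(text);
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // euro sign, not representable in ISO-8859-1
        System.out.println(survivesRoundTrip(euro, StandardCharsets.UTF_8));
        System.out.println(survivesRoundTrip(euro, StandardCharsets.ISO_8859_1));
    }
}
```

A check like this can help narrow down whether a garbled character was lost when the data was read or when it was written.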

To debug encoding-related issues, we recommend installing the LiveHttpHeaders Firefox plugin or Wireshark (formerly known as Ethereal) to see what is going on.

