Issue with Japanese trackback

I have been testing the TrackBack module and noticed that Japanese trackbacks were not displayed correctly. It appeared that the encoding was not being recognized properly.

In my case, the problem had several layers. First, the source blog that pings the trackback does not send its encoding. The TrackBack module uses the encoding supplied by the source; when none is supplied, it relies on its own internal algorithm to determine the encoding.
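As a rough illustration of what "sending the encoding" can look like, here is a minimal sketch that reads a charset parameter from a Content-Type header value. Whether the TrackBack module actually takes the encoding from this header is an assumption on my part, and example_charset_from_content_type() is a hypothetical name, not a function of the module.

```php
<?php
// Hypothetical helper, not part of the TrackBack module: pull the charset
// parameter out of a Content-Type header value, or return null if absent.
function example_charset_from_content_type(?string $contentType): ?string
{
    if ($contentType === null) {
        return null;
    }
    // Matches e.g. "application/x-www-form-urlencoded; charset=EUC-JP"
    // and also quoted values such as charset="utf-8".
    if (preg_match('/;\s*charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // No charset declared: the caller has to guess the encoding itself.
    return null;
}
```

A null result corresponds to the situation in this post: the pinging blog declares nothing, so the receiver has to fall back on detection.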

The second issue was a bug in that algorithm. TrackBack checks the locale of the site, but the locale is not set (it is empty!). Without a locale, it falls back to English (actually ISO-8859-1, the standard Latin encoding). As a result, the trackback request from the Japanese blog was never converted correctly.
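The fallback described above can be sketched roughly as follows. This is not the module's actual code: example_candidate_encodings() and the 'ja' locale key are hypothetical, and only the Japanese encoding list and the ISO-8859-1 default come from what I observed.

```php
<?php
// Hypothetical mapping from a site locale to candidate encodings.
// The Japanese list is the one the module uses; the 'ja' key is illustrative.
function example_candidate_encodings(string $locale): array
{
    $byLocale = [
        'ja' => ['ISO-2022-JP', 'EUC-JP', 'SJIS'],
    ];
    // An empty or unknown locale ends up on the Latin-1 default, which is
    // what happens on a site where no locale is set.
    return $byLocale[$locale] ?? ['ISO-8859-1'];
}
```

With an empty locale the function returns only ISO-8859-1, so Japanese input never gets a chance to be recognized.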

Next, I modified the code to set the locale to Japanese, yet the trackback still was not converted correctly. TrackBack specifies three Japanese encodings, ISO-2022-JP, EUC-JP, and SJIS, as the candidate encodings for Japanese. All three are popular and should work. PHP's mb_detect_encoding() function is supposed to detect which encoding a given string uses. For some unknown reason, it always returned SJIS even though the trackback string was ISO-2022-JP. The function could not detect it correctly.

The solution was to specify 'auto' as the encoding. I don't know why, but instead of listing the three Japanese encodings, simply asking mb_detect_encoding() to detect the encoding on its own seems to work.
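A minimal sketch of the 'auto' approach, assuming the mbstring extension is loaded and that the site stores text as UTF-8. The function name example_trackback_to_utf8() and the ISO-8859-1 fallback parameter are hypothetical. Setting mb_language('Japanese') is also my assumption: the list that 'auto' expands to depends on the mbstring language setting, so the sketch sets it explicitly rather than relying on php.ini.

```php
<?php
// Hypothetical helper: guess the encoding of an incoming ping with 'auto'
// and convert the text to UTF-8.
function example_trackback_to_utf8(string $raw, string $fallback = 'ISO-8859-1'): string
{
    $previous = mb_language();      // remember the current mbstring language
    mb_language('Japanese');        // assumption: lets 'auto' cover the Japanese encodings
    $detected = mb_detect_encoding($raw, 'auto');
    mb_language($previous);         // restore the earlier setting

    // mb_detect_encoding() returns false when no candidate matches.
    $from = ($detected !== false) ? $detected : $fallback;
    return mb_convert_encoding($raw, 'UTF-8', $from);
}
```

Detection results can differ between PHP versions, so it is worth testing with real pings from the blogs you expect to receive.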

With those modifications, trackbacks from Japanese blogs are now shown correctly in the right sidebar, though you may see squares instead of Japanese characters if you don't have Japanese fonts installed.

References:
http://cl.pocari.org/2005-07-10-1.html (Japanese)
http://labs.gmo-media.jp/archive/21 (Japanese)
http://je-pu-pu.jp/blog/archives/2005/02/mb_detect_encod.html (Japanese)
