Joi Ito's conversation with the living web.
Olivier on the left and Karl on the right
Had lunch with Karl Dubost, the Conformance Manager of the W3C, and Olivier Thereaux, who also works for the W3C. I met Olivier once when he dropped by Moda when I was spinning records. Karl was visiting Japan and working with Olivier. Karl's well known to many of my friends and it was cool meeting him after having heard about him from so many people.

Karl's job at the W3C includes making sure that the standards and the processes "conform" and are well-formed. He's kind of the standards guy for the standards guys.

We talked about RSS, open APIs and the balance between simple standards with low barriers to entry and strict, consensus-based standards which have a higher "cost" associated with them. Karl talked about how they were trying to make some of the processes at the W3C simpler. I think we all agreed that the best process for standardization really depends on the stage and the type of standard.

We talked about the difficulty of getting developers in English-speaking countries to think about internationalization issues. We agreed that we needed to keep pushing people to use UTF-8. I think we have gotten past some of the initial negative reaction to UTF-8 in Japan, and now we need developers in the US to start using UTF-8 so that our efforts (I personally didn't do much) will have been worth it. Both Karl and I have our blogs set to UTF-8.


Bonjour Karl et Olivier! Comment allez-vous? I got to meet these guys @ WWW2002 in Honolulu. It was a birds of a feather session for Webmasters .. I think?
I bumped into Olivier @ Moda also. We kept staring at each other and thinking "I've seen this bloke before".
Such a small world. And so linked.

On encoding, how big is UTF-8 in Japan? For web applications it seems that S-JIS or EUC is still the majority.

After our discussion about UTF-8 I actually checked my own blog and noticed that MT had set it to latin-1 by default.

And thus began my long struggle to switch everything to UTF-8 while remaining valid (I was in fact all the more motivated as the latin-1 encoded entry titles made kung-log, which I started using recently, complain).

So I did what the doc said, changed a line in mt.cfg to read PublishCharset UTF-8, and changed my apache config to serve that part of the site as UTF-8. So far so good.

Then, since I couldn't find a way to just iconv (that's a tool to convert things from one charset to another, very nifty...) the whole content, I had to remove all the accents from my entries (fortunately for me, not having a French keyboard, I'm mostly using entities for accents, so that wasn't too painful). Kung-log was now happily retrieving past entries, and the HTML validator told me I was a good boy. So far so good.
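
For anyone wondering what that kind of conversion boils down to, here is a minimal Python sketch of re-encoding text from latin-1 to UTF-8. It is only an illustration of the idea behind iconv, not a way to convert an MT database; the function name latin1_to_utf8 and the sample word are made up for the example.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode raw bytes from latin-1 (ISO-8859-1) to UTF-8.

    Hypothetical helper: decode the bytes as latin-1 to get text,
    then encode that text again as UTF-8.
    """
    return data.decode("latin-1").encode("utf-8")


# "cafe" with an acute accent on the e, as latin-1 bytes (e-acute is 0xE9).
entry = b"caf\xe9"
converted = latin1_to_utf8(entry)

# In UTF-8 the same character takes two bytes (0xC3 0xA9).
print(converted)
```

Plain ASCII characters come out unchanged; only the accented ones grow to two bytes.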

Then I tried putting the accents back with the MT interface but it would not work as expected (I really don't know why, either it's my browser POSTing as latin-1 even though the MT form is properly set as UTF-8 or it's a bug in MT). So I tried adding the accents with Kung-log, and it works, except that the MT interface (but not the actual site) shows them as "bizarre" entities (and that, I think, is a bug).
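
One common way accents get mangled like this (not necessarily what happened here) is that bytes written in one charset get read back in another. A small Python sketch of both directions, with made-up function names:

```python
def misread_utf8_as_latin1(text: str) -> str:
    """Show what UTF-8 bytes look like when wrongly decoded as latin-1."""
    return text.encode("utf-8").decode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# An e-acute sent as UTF-8 but read as latin-1 turns into two characters:
# A-tilde followed by the copyright sign.
garbled = misread_utf8_as_latin1("\u00e9")
print(garbled)

# The other way round: a lone latin-1 e-acute byte is not valid UTF-8 at all.
print(is_valid_utf8(b"\xe9"))
```

Seeing pairs of odd characters where one accent should be usually points to the first case; a decoding error or replacement character usually points to the second.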

Anyway, for all of you having a blog, here is the 100-point question: your blog is (for many of you at least) in English and uses characters in the us-ascii range... So why use latin-1? On the other hand, if you serve it as UTF-8, at least you'll be future proof (think of all these nifty funky characters you can insert in your blog and then brag about).
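
The reason the switch costs nothing for us-ascii text is that those characters are encoded as the very same bytes in latin-1 and in UTF-8. A quick Python check, with a hypothetical helper name:

```python
def same_bytes_in_both(text: str) -> bool:
    """Return True if text encodes to identical bytes in latin-1 and UTF-8."""
    try:
        return text.encode("latin-1") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text has characters that latin-1 cannot represent at all.
        return False


print(same_bytes_in_both("Hello, World Wide Web"))  # pure us-ascii
print(same_bytes_in_both("caf\u00e9"))              # contains an e-acute
```

So a blog that only ever used us-ascii is already valid UTF-8; it is only the accented characters that need converting.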

And for most people not using accents, switching from MT's default (too bad the default is not UTF-8 actually) to UTF-8 should be absolutely painless. A nice and cheap way to tell the *World Wide* Web you love it :).

Cheers (and hi! Chris).


That Karl is everywhere! Soon we'll have to publish books in the style of "Where is Waldo" (en français: où est Charlie?). And just like Waldo, he wears a lot of shirts with stripes ;-)

(A small world getting smaller...:)

So on the UTF-8 front, I'm convinced by the above, and am reconfiguring the FOAF weblog at to be in UTF-8. Is there a simple/easy guide somewhere for the Apache directives needed on the server?


Ah cool, seems to be just what I need. I think the site should be ok now...

