Joi Ito's Web

Joi Ito's conversation with the living web.

IDNs (Internationalized Domain Names) have been the subject of a great deal of discussion. IDNs are a way to allow non-ASCII scripts to be used in URLs. There are a number of difficulties with IDNs. One is that there are letters or punctuation marks that look similar to normal ASCII characters or punctuation. This allows people to spoof other URLs and use them to fool users and, for instance, steal their banking information. The other criticism is whether people really need them. The argument (which until recently I agreed with) is that everyone in the world reads ASCII, so can't people at least type the URLs in ASCII?
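To make the spoofing concern concrete, here is a minimal Python sketch. The two labels are hypothetical examples chosen for illustration: one is plain ASCII, the other swaps in a Cyrillic letter that most fonts render identically. It relies only on the standard library, whose built-in "idna" codec follows the older IDNA 2003 rules, so treat it as an illustration rather than a description of what any particular browser or registry does.

```python
import unicodedata

# Hypothetical example labels: plain ASCII "paypal", and a look-alike whose
# second character is CYRILLIC SMALL LETTER A (U+0430) instead of Latin "a".
real = "paypal"
spoof = "p\u0430ypal"

def describe(label):
    """Return the Unicode name of every character in a label."""
    return [unicodedata.name(ch) for ch in label]

def is_ascii_only(label):
    """True if every character of the label is in the ASCII range."""
    return all(ord(ch) < 128 for ch in label)

# The strings look alike on screen but are different sequences of code points.
print(real == spoof)          # False
print(describe(spoof)[1])     # CYRILLIC SMALL LETTER A

# On the wire, the look-alike becomes a different ASCII label starting "xn--".
print(real.encode("idna"))
print(spoof.encode("idna"))
```

A simple check such as `is_ascii_only` can flag a suspicious label, but it would also flag every legitimate IDN, which is part of why the problem is hard.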

Fellow board member Hualin Qian said that the Chinese were using IDNs via a browser plugin and that since most Chinese read only Chinese web pages, it seemed to be doing quite well. I would have to concur. I think one thing that we forget is that the type of people who come to ICANN meetings and argue about this stuff tend to speak multiple languages, care about what is going on in other languages, and are trying to get everything perfect. We are not the norm. I remember when we set up Infoseek Japan, we decided to index only Japanese pages. I argued that we should index English pages, but I was overruled by the people who said most Japanese don't read English web pages.

Many of the problems of IDNs come from trying to do multiple languages at the same time or languages one can't read. The biggest difficulty is implementing them in gTLDs like .com or .org. I think that if we focus on helping the country-level TLDs (ccTLDs) get going with IDNs in their own native languages, we would be solving the problem for 80% or so of the people. My concern is holding up the ability of these people to use IDNs because we can't find the perfect solution for the edge cases.

This is philosophically opposed to my "Global Voices" position, which focuses on building bridges between cultures and languages, but I believe that the benefit for the digital divide of getting something running soon is worth it. Also, once we have a lot of people using IDNs in different regions, I'm sure we can use this experience to come up with more creative ways to solve the more difficult IDN problems.

Again, this is my personal opinion and not any sort of consensus of staff or the board of ICANN. I am mainly pointing this out because until this meeting, my position (privately) was "why the hell do we need IDNs?" On the other hand, I think we are moving forward and the discussions during this meeting in MdP were very helpful.


Really interesting post!!

Recently, I gave out some business cards with my name joã printed on them. My friend Naruna (who is Brazilian) brought this to my attention, 'You are going to confuse people, they can't type the ã on their keyboards!'

Of course, she was referring to folks around here who don't have Portuguese characters installed on their keyboards.

However, this reminded me of when I post from Flickr to Blogger and many of the characters get funky. Or when I send out an e-mail with my name João or receive an email from my dad in Brazil on my cellphone, and it looks like garbage..
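One common way this kind of garbage arises, sketched here in Python, is text written out as UTF-8 and then read back as if it were Latin-1. Whether that is what actually happened between Flickr, Blogger, or the phone described above is an assumption; the snippet only shows the mechanism.

```python
name = "João"

# The sender encodes the name as UTF-8: "ã" becomes the two bytes C3 A3.
raw = name.encode("utf-8")

# A receiver that wrongly assumes Latin-1 turns each byte into its own character.
garbled = raw.decode("latin-1")
print(garbled)  # JoÃ£o

# As long as no bytes were lost, reversing the wrong step recovers the name.
recovered = garbled.encode("latin-1").decode("utf-8")
print(recovered)  # João
```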

Let's see what develops, but I think you are right about the digital divide and to just get something up instead of fussing over the details.

Joa~bambu - I think you illustrate a huge problem possibly without even realizing it.

For instance - even though I have Chinese, Japanese, and Korean input methods installed, I do not have a keyboard layout installed for Portuguese, and even if I did, I'd have no idea how to find the tilde-a character on the keyboard.

The result: I would be extremely unlikely to go to your website if I got your card, unless I thought you were really really important for some reason.

In other words, I'm not going to google your name and go searching for a link I can click on just because I can't type in your URL. This would be doubly hard for someone who doesn't understand Chinese but is given a Chinese URL.

What's the result? URLs become useless to speakers of languages with significantly different orthographies. I think this is a big problem. It significantly impacts the ability of people to communicate their URLs to others via business cards, etc. The non-speaker must find a link to the site or they're out of luck (or have to go googling)...

Do we really want it to be so hard? Imagine that was instead "新华网.com", or even "新华网.公司"... Instead of being able to type it in and see the "English" button in the upper left to go to the English version of the site, now the English speaker is completely unable to go directly to the site. On the other hand, almost all languages have been romanized at some point or another, and almost all countries are somewhat familiar with roman letters...

I think moving to such domains will create a big barrier between linguistic subsections of the net... Even more so than would naturally exist. Bad idea, in my opinion.

Well, it's interesting. The reality is that there isn't really 'one world'. In reality, we have about 5 or 6 spheres, separated by language. Broadly, you could say there is:

The English-Speaking Sphere.

The Spanish and Portuguese Speaking Sphere.

The Western European Sphere.

The Russian-speaking Sphere.

The Chinese-speaking Sphere.


(Of course there are loads more, but these are the most economically significant ones that come to mind for me.)

The reality is that for the average joe on the street, he or she isn't really likely to come into contact with stuff outside his or her own sphere. How many Chinese people are really going to go looking at English-language websites? Some, but not very many. How many English or American people will look at Chinese-language websites? Close enough to none.

Of course, widespread use of IDNs will make this phenomenon much more ingrained. I don't know if there is much we can do about it though. Internet governance should have ideals, but it also has to be tuned into reality.

The argument made by Trevor wasn't a new one - we heard it a lot when we were doing IDN in the IETF back in 2000. The bottom line is that IDN is not about bridging across cultures - it is about helping each culture bridge its own digital divide. Antoin is absolutely right about native-language user behavior - how many of you here follow Chinese/Japanese/Korean blogs?

By the argument Trevor made, we shouldn't have invented or used Unicode at all, because even if you can get to my website, you can't read what is on it. Taken to the extreme, it means everyone should use only one language, and that implicitly means English. That Anglo-Saxon-centric argument isn't going to go down well in a world where far more people speak Chinese natively than English.

"But... but you are building a Tower of Babel!" said the other side. "You are fragmenting the Internet." But they failed to understand that while the old Internet may have come from an English-only community, the reality is that the world is already fragmented by language, and that is going to be reflected on the Internet.

So after a while, I gave up arguing with these folks because it is hard to get either side to change position - my shortest answer is "yes, and hence you are not the target user of IDN".

Anyway, Joi, glad to know you changed your position :-)

I really don't know if this is an issue or not. If a person wants to have a web page in Chinese, s/he will do this. If I read Chinese and I have the user interface and keyboard to go there, I'll go there to read the web page/site.

The individual will decide what is best for themselves.

The impact is: how do I link to you? Let's say "", which on my website is in English. Do I also have a link which says "新华网.com", or even "新华网.公司"? That will represent another layer that is saying: hey, the link you're going to go to is Chinese.

...oops, was that Japanese??? Who knows, eh!! I really don't know. The inverse logic should also be used for every other 1-1 language or the 1-2-many scenarios.

IDN will actually create compartmentalization of the World Live Web. Thus the COMMUNITY will become clusters of hives: sites, bloggers, URLs, etc. under the IDN entity. Though totally self-sufficient in all ways, yet isolated by virtue of IDN itself.

What is the vision -- to create compartmentalization so as to empower smaller selves, or to diversify the spectrum to sustain HumanityV2.0, which is a single WWW sphere?? Bigger does not always mean better. Albeit, bigger does not mean stronger!! Remember the David and Goliath story??

I see what you're saying James, but I don't think anyone could reasonably argue that we should take things to that extreme. Your argument about English is really a reductio ad absurdum.

All I'm suggesting is keeping domains in ASCII, and nothing else. They wouldn't have to be English, but would have to be in ASCII...

I do see your other point though - if the goal is to promote internet use within each culture, that's fine. That's what this policy does.

But we all see, I think, that it breeds a new sort of confusion and division. It makes parts of the web effectively inaccessible to groups of people...

What if, for instance, when dialing a phone number to a foreign country, you had to dial in that country's particular numerical orthography? Chinese, Arabic, Hindi, and Thai all have their own, for example. I wouldn't be able to call people in those countries without an intermediary, and that's tantamount to what this is...

@ Trevor --

Great point you made! I was careful enough to put under the title "joãobambu" which was a smaller italicized script.

Now, I look at it from the perspective of a Romance-language speaker who knows several languages, is an old-school hacker/BBSer, and knows the ALT codes for such symbols (HOLD DOWN ALT on the keyboard and hit 0227 = ã, etc).
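For readers wondering where 0227 comes from: on a Western-configured Windows machine, ALT codes typed with a leading zero are generally looked up in the Windows-1252 code page, where byte 227 is ã. The Python sketch below only checks that table lookup; the claim about how Windows interprets the keystrokes is stated from memory and should be treated as an assumption.

```python
def alt_code_char(code, codepage="cp1252"):
    """Return the character that a single byte value maps to in a code page."""
    return bytes([code]).decode(codepage)

# Byte 227 (hex E3) in Windows-1252 is the character the comment mentions.
print(alt_code_char(227))             # ã

# In Windows-1252 this byte value also matches the Unicode code point of "ã".
print(ord("ã"))                       # 227

# The same byte in a different code page (here the old DOS cp437) is another
# character entirely, which is why such codes are not portable.
print(alt_code_char(227, "cp437"))
```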

However, if someone gave me a business card in Mandarin that had symbols for things other than "Bamboo" or "Beer", I wouldn't be able to put it in a URL!

You hit the nail on the head and I completely concur...

It is true that having a domain name with (say) Chinese characters in it alienates you from the English-speaking community. That is likely to be important for someone like James, who does a lot of work internationally. But it isn't going to be that big an issue for, say, a plumber in Guangdong, who only speaks Chinese and only really deals with Chinese-speaking customers and suppliers. For him, having an ASCII domain name is more likely to put off his customers (at least once IDNs have become established). Saying that everyone should learn and get used to English characters for the sake of standardization is not going to fly. You might as well suggest that we should standardize on Chinese characters.

Remember that we, the global digerati, are in the minority here. Most people just want to use the Internet to buy locally-delivered services from people within their own sphere.

The other thing to remember is that this is essentially a done deal. The genie is out of the bottle. IDNs are out there, and there's not much that can be done to hold them back. ICANN can counsel, advise and maybe admonish now and again, but as far as I can see, it can't really hold back second-level IDNs.

Trevor, ah, the "dialing" argument. We get that a lot too. (I suppose no argument is new to me anymore ;-)

If I want you to 'dial' me, then I would have an ASCII-only domain name with appropriate English content you can read.

But if I am, so what if you can 'dial' it? You are unlikely to know what it says there anyway (unless you know Chinese). So in this case, why not make it easier for the millions of Chinese out there who know it as 新华网.公司?

And ASCII-only isn't an option - not every language has a standard romanization. Chinese has at least 3 that I know of (although the most commonly used is hanyu pinyin), Korean has 2, and even Japanese romaji isn't particularly standardized - is 後藤 Gotoo or Gotou or Goto?

There's a lot to say for the CJK group... however, I don't think ICANN really wanted to make IDN work since the Montreal meetings, at which the IDN guidelines were agreed, approved, and signed by the registries (not including Verisign, however). I believe IDN is a TOPIC only so that ICANN can stay in the discussion about the globalization of the Internet. Too much WSIS stuff going on now...

To be frank, if I were the user, I would pay for the ASCII domain name, and then the fee for search engine rank. But I would love to see my local registry providing domain names in the language that I/my father/my government can read and access.

I don't think there are many local content issues in places like Singapore, Hong Kong, or Taiwan. What's on the portals is also in today's newspaper or next month's magazine. But things are still different in inland China.

IDN allows URL spoofing. So do ASCII URLs. Ito-san, I surely hope that you can describe it in a clearer way (I can see what Hotta-san looks like now~~)

BTW, where the heck are the ICANN at-large folks now? Again, too much WSIS brainwashing?

correction...."make IDN work since the Montreal meetings". should be "after the Montreal meetings". my apology.

The introduction of IPv6 will sooner or later correct the worldwide imbalance in the allocation of globally routable IP addresses. For domain names, however, an antiquated TLD-based infrastructure perdures, whereby TLDs must be assigned as de facto monopolies to single entities, with the associated wrangling over the choice of the assignees. There is thus, IMHO, a need to be much more proactive, and expend the effort to actively migrate the worldwide DNS to a more balanced structure that is unencumbered with commercial monopolies.

IDNs, for example, offer the opportunity and excuse to re-architect in a language-independent manner the current, flawed, TLD-based structure. As DeNIC's recent .net registry proposal exemplifies, there are entities across the world who are able and willing to operate a large-scale and reliable infrastructure for the provision of DNS-related IT services.

I envision a DNS structure where all registrations, including IDN ones, could take place at the root level, with the resulting (huge) master registry maintained by a non-profit organization like ICANN. This authoritative root zone would be replicated to name servers whose setup and operation would be delegated via a competitive international bidding process to, say, 4 commercial entities — e.g. Verisign, DeNIC... — with an explicit focus on geographical diversity. Four-year operational contracts could be awarded in a staggered fashion, with one awardee's contract expiring every year, and thus being re-assignable via a new tender process, taking into account cost ceiling commitments and the incumbent's operational performance.

I've architected three TLDs back in ancient, pre-ICANN times, and thus like to think that I'm somewhat familiar with the issues ;-). Looking at the situation today, I am a bit disappointed with the stagnation and lack of real evolution of the DNS structure in the past decade, while the Internet's importance has so much changed across the whole planet... Some out-of-the-box, global issues-aware thinking to shake up the inertia would be welcome, IMHO ;-P

OK, I'm largely engaged in this ICANN-centric universe and diverse community as a non-tech word-based individual who studies ICANN's Agreements and expects coherent and predictable outcomes from those processes.

So forgive me if I seem ignorant at a technical level, and yet I start to question the whole word-based approach, or at least the centrality words have been given in technological processes that actually operate through number strings.

I expect some to laugh at the "innocence" of what I am about to say, but I have found this thread on IDNs particularly interesting and I'm grateful to each contributor. Perhaps someone can provide me with a few answers or opinions:

What I don't understand is this:

If there's an enormous number of numerical addresses, why can't we just use some basic software so that £%$$*&£$.%£$ is keyed in in China on a Chinese keyboard and the software simply makes it resolve to a specific number? Provided that number, or even a sub-stratum of that number, is reserved (in the same way that a number is reserved for us when we register a domain name in ASCII), then what problem should there be? And if people outside the "Chinese keyboard" community can't type the Chinese characters, then they just type in the numbers?

Why have we become so obsessed with domain names? We are actually talking number-strings.

Can you explain to me, what is the problem with this?

Or I just set up (which, as it happens, I registered) and then enable the entire Chinese nation to type (via software translator) and or let's make it simple and safer: and the whole Chinese internet (or a networked community) operates under a Richard or Chinese Registry, except one layer down.

Though I don't suppose you need to go down to that 3rd level at all.

If a software character translator simply converted the Chinese characters into a specific number, then Chinese people could use the DNS on closer to their own terms.
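Something close to this "software translator" does exist in the IDN standards, with one difference: the characters are converted to an ASCII-only label rather than directly to a number, and the ordinary DNS then resolves that label to a numeric address. The Python sketch below uses the standard library's "idna" codec (which implements the older IDNA 2003 rules) on the Chinese name quoted earlier in this thread. The helper names `to_ascii` and `to_unicode` are made up for this illustration, and no network lookup is performed.

```python
def to_ascii(domain):
    """Convert a Unicode domain name to its ASCII-compatible form."""
    return domain.encode("idna").decode("ascii")

def to_unicode(ascii_domain):
    """Convert an ASCII-compatible domain name back to Unicode."""
    return ascii_domain.encode("ascii").decode("idna")

name = "新华网.公司"   # example name taken from the discussion above

ace = to_ascii(name)
print(ace)               # two ASCII labels, each starting with "xn--"
print(to_unicode(ace))   # the original Chinese name again

# A plain ASCII name passes through unchanged.
print(to_ascii("example.com"))
```

The user types their own characters, software converts them, and the existing DNS only ever sees ASCII. The conversion is reversible, so a browser can display the native form.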

As one correspondent has said, could we do away with TLDs altogether and just type in and register numbers (or our local characters converted to any number) direct at a "number" root/registry/whatever?

If we just let number-strings be number strings and allowed software to do the talking, then why would we worry about TLDs at all?

It seems so straightforward that I presume it's already happening.

I presume some Chinese entrepreneur has created such an interface... after all, most Chinese websites are in non-ASCII characters already... so with a simple software solution, the whole Chinese internet system must surely be typing in their own characters for DNS access.

The same or similar software could then be extended to every nation's sets of characters.

You don't need ICANN policy to dictate to individual communities or networks (because ICANN can't control everyone's software anyway). You just need to get on and do it.

Then maybe instead of a hierarchical ICANN empire, you have a network of networks, and people just choose which networks they want to participate in.

You operate access to numbers via your own character-set key-in, which triggers the software, and in you go. The numbers remain universal, but different networks meet different needs.

The door's open, because it's only people (or software) feeding in numbers. Why worry about TLDs or ICANN pretending to get in the way of that process? We wouldn't stop talking to each other "inside" the websites. We'd just let a multiplicity of peoples or networks operate with their own characters and characteristics.

Because actually, whatever useful interfaces we choose to implement with words or whatever, I'm still left thinking...

A number is just a number.

Or am I being incredibly stupid!?



The newly-approved domain names, completely in Chinese characters (including the ‘dot’ character) that we offer do not suffer from these two problems. They are already widely usable in China, and authorized by the Ministry of Information Industry (MII) of the People's Republic of China.

These ideas of westerners saying we all have to use roman-character URLs are shit. You guys are so arrogant. I don't care if America has hegemonic power, or how long Europe has been powerful. 1/4 of the world is Chinese, and China's economy has been the fastest growing in the world these years. Why the fuck would we want to use YOUR characters for OUR website names? If you can't read or type Chinese, tough. It's not our fault westerners are lazy. Go call George Bush to invade us. Do you know that the names of all the countries in the world can be written out in Chinese characters? Do you know we can write both Arabic and Romance language names in Chinese characters, just like you can "romanize" other languages? I'll bet you never even CONSIDERED it. How would you like it if Chinese and other Asians were chatting about how all DNS should be "sinicized" into Chinese characters and that American websites like had to be written as 麦当劳.公司.美国? Chew on that thought for a while.

English is the accepted language of the sciences, and whether you like it or not, it will continue to be. We will never wake up one day to the entire globe speaking and reading Chinese. Arabic numerals are the global standard, so I will not try to push the rest of the world to shift to Roman or Egyptian numerals, and besides, you can't make a 7-segment LED display Farsi. Standards are created for a reason: to simplify things. Imagine if you had to go to three different stores just to find a lightbulb with the proper threading for non-standard sockets. Stop dragging your heels and slowing down the advancement of things by hanging on to ancient ways. I want people to put up websites in whatever language they like, perhaps even ones that need special 'decoder-glasses' to decipher; the sky should be the limit. But let's not reinvent a system that already has such global momentum. Let's not start discussing how we are going to add 2 more rows of Japanese buttons to the