For the last several days, I haven't been able to access the English Wikipedia from home. This has happened in the past. The reason is that the DNS server my ISP provides is returning an error when looking up en.wikipedia.org.

dig en.wikipedia.org
;; Truncated, retrying in TCP mode.
;; communications error to 210.130.232.1#53: end of file
The odd thing is that jp.wikipedia.org and other Wikipedia subdomains resolve. Also, when I try another DNS server, en.wikipedia.org resolves. The DNS server I am using, DNS.CDN-JAPAN.COM (210.130.232.1), is run by IIJ, my ISP. Has anyone else had similar problems, either with other domains on IIJ or with en.wikipedia.org on other DNS servers?

Sorry for this obscure and geeky post, but being prevented from using Wikipedia has become extremely irritating.

UPDATE: en, fr, nl, de, pl fail. ja, eo, ko, es, zh work. It appears that the ones which are failing use geodns.

UPDATE 2: I caught up with a senior guy from my ISP IIJ at the Internet Association meeting yesterday and explained the problem to him. He said that MAYBE it is because they are running a load balancing thing that might interact weirdly with geodns. He's looking into it for me.

UPDATE 3: I got a response from my ISP. They said that the "AUTHORITY SECTION" was being included in the reply, making the response longer than 512 bytes and forcing the lookup to be retried over TCP instead of UDP. They said they thought my firewall was blocking TCP responses from DNS. They changed the setting on the nameserver so that it no longer adds the AUTHORITY SECTION, and now it appears to work for me... I've asked them to give me another domain whose response is longer than 512 bytes so I can see if I can replicate the error...
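To make the mechanism in UPDATE 3 concrete, here is a minimal Python sketch (not IIJ's or BIND's code) of the DNS header bits involved. Without EDNS0, a DNS reply over UDP is limited to 512 bytes; when an answer does not fit, the server sets the TC ("truncated") flag and the client, like dig above, retries over TCP. The `reply_over_udp` helper is hypothetical and simplified: it drops every section and returns only the header, whereas real servers may keep part of the reply. The authority count inside `counts` is where extra AUTHORITY SECTION records would show up.

```python
import struct

UDP_LIMIT = 512   # classic DNS-over-UDP size limit when EDNS0 is not in use
TC_FLAG = 0x0200  # "truncated" bit in the 16-bit header flags word
AA_FLAG = 0x0400  # "authoritative answer" bit

def header_fields(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS header: ID, two flag bits and the four counts."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 bytes long")
    msg_id, flags, qd, an, ns, ar = struct.unpack("!HHHHHH", message[:12])
    return {
        "id": msg_id,
        "truncated": bool(flags & TC_FLAG),
        "authoritative": bool(flags & AA_FLAG),
        # question / answer / authority / additional record counts
        "counts": (qd, an, ns, ar),
    }

def reply_over_udp(full_reply: bytes, limit: int = UDP_LIMIT) -> bytes:
    """Hypothetical, simplified server: pass the reply through if it fits,
    otherwise return just the header with TC set and all sections dropped."""
    if len(full_reply) <= limit:
        return full_reply
    msg_id, flags = struct.unpack("!HH", full_reply[:4])
    return struct.pack("!HHHHHH", msg_id, flags | TC_FLAG, 0, 0, 0, 0)

def should_retry_over_tcp(udp_reply: bytes) -> bool:
    """A client seeing TC=1 retries over TCP (dig: 'Truncated, retrying in TCP mode')."""
    return header_fields(udp_reply)["truncated"]
```

For example, a 600-byte reply comes back as a 12-byte header with TC set, so `should_retry_over_tcp` returns True; trimming records until the reply is at most 512 bytes makes it return False, and the client never needs the TCP path at all.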


13 Comments

Possible causes I can think of:

1) Packet filtering UDP Port 53. Ummm, very unlikely

2) Packet size is greater than 512 bytes which is forcing the system to retry as TCP. Unlikely, as the packets returned here in North America are far smaller than 512 bytes.

3) The responding name server is a Windows implementation of BIND. They have a nasty habit of sending the occasional empty trailing packet, and firewalls hate that.

Just some ideas...

smp

You've got the default Ecto Technorati tag "puppy" in your post :)

Thanks Gen. Fixed the tag.

Stephen: The weird thing is that I cannot find any other queries that produce the same error on this DNS server. Also, IIJ is one of the largest ISPs in Japan...

This might be stating the obvious but... have you tried calling IIJ to ask them about it? I was with @nifty when I was living in Tokyo. Whenever I had a problem I would call them, and they were always very helpful, usually identifying and fixing the problem quite quickly. I presume IIJ would be similar (after all, service in Japan is generally much better than in Western countries).

This is definitely an issue with your DNS server. It does not answer DNS queries on TCP port 53; it just closes the connection.

When a DNS answer is small enough, it is sent via UDP, which works. That is why other domains resolve.
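A minimal Python sketch, for illustration only, of what a client such as dig sends when it falls back to TCP: the same query message it would send over UDP, preceded by a 2-byte big-endian length (RFC 1035, section 4.2.2). A server that accepts the TCP connection and then closes it leaves the client with nothing to read, which is consistent with the "end of file" error in the post. The message ID 0x1234 and the helper names are arbitrary choices for this example, and no network I/O is performed.

```python
import struct

QTYPE_A = 1    # address record
QCLASS_IN = 1  # Internet class

def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels plus a terminating zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        if not label:
            continue  # the root name "." has no labels at all
        raw = label.encode("ascii")
        if len(raw) > 63:
            raise ValueError(f"label too long: {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name: str, qtype: int = QTYPE_A, msg_id: int = 0x1234) -> bytes:
    """Standard query with RD (recursion desired, 0x0100) set and one question."""
    header = struct.pack("!HHHHHH", msg_id, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!HH", qtype, QCLASS_IN)

def frame_for_tcp(message: bytes) -> bytes:
    """Over TCP, each DNS message is preceded by its length as a 2-byte big-endian integer."""
    return struct.pack("!H", len(message)) + message

def read_framed(received: bytes) -> bytes:
    """Extract one DNS message from the bytes read off a TCP connection."""
    if len(received) < 2:
        # Models a server that accepts the connection and closes it without replying.
        raise EOFError("connection closed before a length prefix arrived")
    (length,) = struct.unpack("!H", received[:2])
    body = received[2:2 + length]
    if len(body) < length:
        raise EOFError("connection closed in the middle of the message")
    return body
```

Passing `"."` to `encode_name` yields a single zero byte, which is how the root name in a query like `dig . ns` is encoded.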

Have you considered running your own caching DNS server?
If your main machine is an Apple PowerBook running OSX, then it already has the software required (BIND) to make you independent of your ISP/Temporary Office/Hotel's DNS servers. Here's the step-by-step approach:


1) If not already done, enable the “root” account using Applications/Utilities/NetInfo Manager


2) Open a Terminal window, and "su" to root


3) Edit /etc/hostconfig with pico/ed/vi/emacs/whatever and replace this line:
DNSSERVER=-NO-
with
DNSSERVER=-YES-
This will ensure that BIND a.k.a. /usr/sbin/named will be automatically launched the next time you restart your computer


4) Start the BIND software manually (saves a reboot)
# cd /tmp
# /usr/sbin/named


5) On OSX, /etc/resolv.conf is generally a symlink to /var/run/resolv.conf. The /var/run file is created by OSX based on the DNS server information supplied by your ISP's DHCP servers. This symlink must thus be replaced with a static resolv.conf file, pointing to your own machine at 127.0.0.1
## Save the symlink, in case you want to restore OSX's default behavior
# mv /etc/resolv.conf /etc/resolv.conf.bak
## Create a new /etc/resolv.conf file
# cat>/etc/resolv.conf
nameserver 127.0.0.1

# chmod 644 /etc/resolv.conf


6) Check that your PowerBook is now using its own DNS server (127.0.0.1) to resolve names
# dig en.wikipedia.org.
Blah
Blah
;; SERVER: 127.0.0.1#53(127.0.0.1)



To stop using your own DNS server, and restore OSX's default DHCP-specified DNS behavior, just do:
# /bin/mv -f /etc/resolv.conf.bak /etc/resolv.conf
Put "DNSSERVER=-NO-" in your /etc/hostconfig file
Kill the /usr/sbin/named process

To tell "cat" that there's an EOF, an invisible "Ctrl-D" is needed as the second line after the "nameserver 127.0.0.1" line in step 5, of course.
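If typing into `cat` and remembering the Ctrl-D is awkward, the edits in steps 3 and 5 can also be scripted. This is a hypothetical Python sketch, not something that ships with OS X. It assumes you run it with root privileges (for example via sudo), and it takes file paths as parameters so you can try it on scratch copies before touching /etc/hostconfig and /etc/resolv.conf. It refuses to write through a symlink, because step 5 relies on the symlink having been moved aside first.

```python
import os

def set_hostconfig_flag(text: str, key: str, value: str) -> str:
    """Return hostconfig text with KEY=value set, e.g. DNSSERVER=-YES- (step 3).
    Other lines are left untouched; the key is appended if it is missing."""
    lines = text.splitlines()
    prefix = key + "="
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            lines[i] = prefix + value
            replaced = True
    if not replaced:
        lines.append(prefix + value)
    return "\n".join(lines) + "\n"

def write_resolv_conf(path: str, nameserver: str = "127.0.0.1") -> None:
    """Write a one-line resolv.conf and set its mode to 644 (step 5)."""
    if os.path.islink(path):
        # Writing through the symlink would modify its target
        # (on OS X, /var/run/resolv.conf) instead of replacing it.
        raise RuntimeError(f"{path} is still a symlink; move it aside first")
    with open(path, "w") as f:
        f.write(f"nameserver {nameserver}\n")
    os.chmod(path, 0o644)
```

For example, read /etc/hostconfig, pass its text to `set_hostconfig_flag(text, "DNSSERVER", "-YES-")` and write the result back, then call `write_resolv_conf("/etc/resolv.conf")` after the `mv` in step 5.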

I did email IIJ. I guess I should call them too.

MV: That's a good idea. I should try running bind on my machine... Thanks for the step-by-step.

OK. Now I have more questions than answers. I did what you said and when I did a dig en.wikipedia.org it timed out...

I changed resolve.conf to 204.69.234.1, which is UltraDNS, and when I did a dig, the dig pointed to UltraDNS and resolved en.wikipedia.org fine. When I tried to access it through my browser, though, it said it couldn't find en.wikipedia.org. Do my applications not look in resolve.conf? Where does my machine store the nameserver that it gets from DHCP?

Sorry about my ignorance. I suppose I should just go Google around... and I realize the irony of an ICANN board member not being able to figure out bind on his own. ;-) Having said that, I'd been meaning to get bind up and running, so this is as good an opportunity as any.

Hmm... It looks like I'm going to the Internet Association of Japan meeting today so maybe I'll try to snag someone from IIJ there...

note: one does not have to enable root to do this on OSX. Just use the "sudo" command before the command which needs root if you are an administrator on your machine. For example "sudo vi /etc/hostconfig". I run a caching DNS server at home on my G4 Cube using 10.2.8 and have never enabled the root account.

slightly related: If you are behind a PIX firewall running the 6.3 code (or lower) you may need to disable "DNS fixup" due to the long DNS packets scenario described above. It seems that there are LOTS of crappy name servers out there...

Your Mac will automatically synthesize a dynamic /var/run/resolv.conf file, based on the DNS hints it receives from your ISP's DHCP server.
The apps running on the Mac normally use the operating system's standard resolver libraries instead of rolling their own DNS routines. A change in /etc/resolv.conf (which generally is a symlinked alias to /var/run/resolv.conf) should thus be automatically taken into account by your apps. You might, however, need to quit apps like web browsers and relaunch them after making changes to your resolver configuration file, because some apps "cache" the results of their previous DNS queries.
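Applications usually don't parse resolv.conf themselves; they call the system resolver (in Python, for instance, through `socket.getaddrinfo`). As a rough illustration of what the classic resolver reads, here is a minimal Python sketch that pulls the `nameserver` entries out of resolv.conf text in file order. It assumes the traditional BIND resolv.conf format (comment lines start with `#` or `;`) and does not model how OS X itself merges in DHCP-supplied servers.

```python
def nameservers_from_resolv_conf(text: str) -> list[str]:
    """Return the nameserver addresses listed in resolv.conf text, in file order.

    Blank lines and comment lines (starting with '#' or ';') are skipped, as are
    other directives such as 'domain' or 'search'."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers

def read_nameservers(path: str = "/etc/resolv.conf") -> list[str]:
    """Read nameservers from a file; a missing file yields an empty list."""
    try:
        with open(path) as f:
            return nameservers_from_resolv_conf(f.read())
    except FileNotFoundError:
        return []
```

Note that the file name matters: a file saved under a slightly different name is simply never read.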

Here are some troubleshooting procedures for your personal DNS server setup:

1) Kill the named process (if it's already running), so that we can start with a clean DNS cache.
# ps auxw | fgrep named
root 472 0.0 0.1 76728 1204 ?? Ss 7:13AM 0:00.01 /usr/sbin/named
# kill 472

2) Launch a new /usr/sbin/named process
# cd /tmp
# /usr/sbin/named

3) Is the /usr/sbin/named process running?
# ps auxw | fgrep named
root 495 0.0 0.1 76728 1204 ?? Ss 7:13AM 0:00.01 /usr/sbin/named
OK, the new named process is running with process ID 495

4) Can this “named” on our own machine (127.0.0.1) respond to our DNS queries?
# dig @127.0.0.1 . ns
Blah
;; ANSWER SECTION:
. 518400 IN NS M.ROOT-SERVERS.NET.
. 518400 IN NS A.ROOT-SERVERS.NET.
. 518400 IN NS B.ROOT-SERVERS.NET.
. 518400 IN NS C.ROOT-SERVERS.NET.
. 518400 IN NS D.ROOT-SERVERS.NET.
. 518400 IN NS E.ROOT-SERVERS.NET.
. 518400 IN NS F.ROOT-SERVERS.NET.
. 518400 IN NS G.ROOT-SERVERS.NET.
. 518400 IN NS H.ROOT-SERVERS.NET.
. 518400 IN NS I.ROOT-SERVERS.NET.
. 518400 IN NS J.ROOT-SERVERS.NET.
. 518400 IN NS K.ROOT-SERVERS.NET.
. 518400 IN NS L.ROOT-SERVERS.NET.

dig should normally return the names of the official (ICANN-sanctioned ;-) 13 name servers (A..M) authoritative for the DNS root "."


5) If the above “dig” times out with a “no servers could be reached” message even though the named process is running, we need to check whether DNS traffic (UDP port 53) is coming back to your Mac (it might be blocked, say, by your DSL router's firewall config, by a router's flaky UDP NAT implementation, or by your ISP's traffic filters...)
We can use the Mac as a simple network sniffer with the “tcpdump” command. If your Mac uses its built-in Ethernet port, we'll point tcpdump to the “en0” interface. If, instead, your Mac communicates with the Internet with an Airport wireless card, we must tell tcpdump to listen to the “en1” interface.
Open a second Terminal window in which to run tcpdump, while we do some dig tests in the first Terminal window.
## Launch tcpdump in the second Terminal window, monitoring, say, the Airport interface
# tcpdump -i en1 udp port 53
## In the first Terminal window, send a DNS query to your named process
# dig @127.0.0.1 . ns

Normally, tcpdump should see at least two UDP/53 packets: the one sent by your Mac's named to a randomly chosen root name server (say, B.ROOT-SERVERS.NET) and the reply from same.
07:22:16.932775 IP 192.168.1.5.49634 > 202.12.27.33.53: 51946% [1au] NS? . (28)
07:22:17.054610 IP 202.12.27.33.53 > 192.168.1.5.49634: 51946*- 13/0/14 NS B.ROOT-SERVERS.NET.,[|domain]

You can now terminate tcpdump by entering Ctrl-C in the second Terminal window.


6) If tcpdump doesn't see a reply packet coming from a root name server, then something is wrong with your router/firewall/NAT/ISP... Troubleshooting that would be outside the scope of this comment :-(


7) If tcpdump sees the reply coming from a root name server, then your “dig” should in principle succeed, and show you the names of the 13 root name servers (A..M.ROOT-SERVERS.NET). Your named is running fine, and we can start to use it by pointing /etc/resolv.conf to it.


8) Verify the syntax of your /etc/resolv.conf file. It must contain only this one line:
nameserver 127.0.0.1
Its Unix permissions should be 644
Note the file name: It's "/etc/resolv.conf", not "/etc/resolve.conf"


9) Run “dig” without explicitly specifying the server with @127.0.0.1. That simple “dig” will then rely on your default resolver library configuration, making it a good predictor for all the other apps' (e.g. web browsers') behavior.
# dig en.wikipedia.org.
Blah
;; ANSWER SECTION:
en.wikipedia.org. 600 IN CNAME rr.gdns.wikimedia.org.
rr.gdns.wikimedia.org. 600 IN CNAME rr.knams.wikimedia.org.
rr.knams.wikimedia.org. 3600 IN A 145.97.39.133
rr.knams.wikimedia.org. 3600 IN A 145.97.39.134
Blah
;; Query time: 783 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)

Note that, strictly speaking, en.wikipedia.org's DNS entry doesn't seem to be RFC-compliant, as it uses a two-level CNAME redirection, which is generally considered a no-no. A CNAME (Canonical Name) should point to a directly resolvable DNS symbol (with an A record), instead of a second CNAME. I would thus have preferred seeing the following answer. Maybe that's why some DNS servers like IIJ's act flaky when trying to resolve en.wikipedia.org :-|
;; ANSWER SECTION:
en.wikipedia.org. 600 IN CNAME rr.knams.wikimedia.org.
rr.knams.wikimedia.org. 3600 IN A 145.97.39.133
rr.knams.wikimedia.org. 3600 IN A 145.97.39.134
Blah


10) If the dig fails, something is definitely wrong or non-standard with your OSX setup or your karma. Troubleshooting that would be, I'm afraid, outside the scope of a blog comment ;-)
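The two-hop CNAME chain in the en.wikipedia.org answer above can be followed with a small loop. This is a toy Python sketch over an in-memory table, not a real resolver; the records are copied from that dig output. Each CNAME hop restarts the lookup under the new name until A records turn up, and `max_hops` (an arbitrary limit chosen for the example) bounds the loop so that a CNAME cycle cannot run forever.

```python
# Toy record table copied from the dig output above (addresses as of that answer).
RECORDS = {
    "en.wikipedia.org.": {"CNAME": "rr.gdns.wikimedia.org."},
    "rr.gdns.wikimedia.org.": {"CNAME": "rr.knams.wikimedia.org."},
    "rr.knams.wikimedia.org.": {"A": ["145.97.39.133", "145.97.39.134"]},
}

def resolve(name: str, records: dict, max_hops: int = 8) -> tuple[list[str], list[str]]:
    """Follow CNAME records until A records are found.

    Returns (names visited, addresses). Raises LookupError on a dead end or
    when more than max_hops CNAMEs are followed (e.g. a CNAME loop)."""
    chain = [name]
    for _ in range(max_hops + 1):
        entry = records.get(name, {})
        if "A" in entry:
            return chain, entry["A"]
        if "CNAME" not in entry:
            raise LookupError(f"no A or CNAME record for {name}")
        name = entry["CNAME"]
        chain.append(name)
    raise LookupError(f"gave up after {max_hops} CNAME hops (possible loop)")
```

With the single-level answer preferred above, the first entry would point straight at rr.knams.wikimedia.org. and `resolve` would report one hop instead of two.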

Joi,

As a quick fix, you can probably also add any hostname and its associated IP address to your /etc/hosts file and it should override any other settings.

Alternatively, if you use a home router or IP masquerading box, you could probably also do this in the configuration settings there, and the change would work on your entire internal network. Note: it's a hack, but being hard-wired, it's immune to spoofing, which is getting more and more common.

I often do this for debugging web sites - it works like a charm...
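The /etc/hosts override can be sketched as follows. This is a minimal Python illustration, not the actual OS X lookup code: it parses hosts-file text into a name-to-address table and consults it before falling back to DNS, assuming the common "hosts first, then DNS" order (the real order is system-configurable). The address in the example is the one from the dig output above and may well have changed since; a pinned entry has to be updated by hand if the site moves, which is part of why this is a hack.

```python
from typing import Callable

def parse_hosts(text: str) -> dict[str, str]:
    """Map each hostname and alias in hosts-file text to its IP address.

    Text after '#' is a comment; this sketch keeps the first entry seen for a name."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)
    return table

def lookup(name: str, hosts: dict[str, str], dns_lookup: Callable[[str], str]) -> str:
    """Answer from the hosts table if possible, otherwise defer to DNS."""
    return hosts.get(name.lower().rstrip(".")) or dns_lookup(name)
```

For example, a hosts file containing the line `145.97.39.133 en.wikipedia.org` makes `lookup("en.wikipedia.org", ...)` return that address without ever calling the (broken) DNS function.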


About this Archive

This page is an archive of recent entries in the Business and the Economy category.
