Joi Ito's Web

Joi Ito's conversation with the living web.

I learned Python (thanks to Sen) in a week. I wrote a birthday script, a script to scrape blogshares and put the shareholders in my sidebar and even wrote a vcard handler. I was on a roll. Then... Sifry sent me some Technorati stuff to mess with. XML? Cool, should be easy. I was just about to do the Parsing XML section of Dive Into Python anyway. Great! ...not

Dive Into Python
As I was saying, actually parsing an XML document is very simple: one line of code. Where you go from there is up to you.
So 2 hours later, I have 4 different installations of Python on my PowerBook and one on my FreeBSD machine, and I can't get Mark's first example to work:
>>> xmldoc = minidom.parse('~/diveintopython/common/py/kgp/binary.xml')
I've just about given up. The O'Reilly Python & XML book is cryptic, I've googled around and tried a bunch of stuff, and I'm totally frustrated. I guess I thought I was becoming a programmer, but I'm just a wimpy little script kiddie. >sigh<

So, for those of you who are interested in how far I've gotten: I did see a post by Mark saying that the Python that comes with OS X doesn't have the necessary XML libraries, so I downloaded PyXML. Well, when I try to install it, it says "NameError: name 'distutils' is not defined".

On my FreeBSD box the Python error is:

Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/", line 1915, in parse
    return expatbuilder.parse(file)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/", line 924, in parse
    result = builder.parseFile(fp)
  File "/usr/local/lib/python2.2/site-packages/_xmlplus/dom/", line 207, in parseFile
    parser.Parse(buffer, 0)
xml.parsers.expat.ExpatError: undefined entity: line 119, column 366


Python kicks ass as a language, but its problem is and always has been lack of proper documentation.

Actually, it wasn't Mark's fault. I was using a "not well-formed" XML file and that was confusing me and Python. I guess I just haven't gotten the hang of figuring out the error messages yet, and being overly anxious, I always use different test files than the ones provided... I added a DOCTYPE to the XML and it parsed. Then I hit another speedbump with Unicode, but Mark explained that well and I'm off again! Phew!

Strange, I don't think you need a DOCTYPE tag in order to create valid XML. Sounds like a bug in the Python parser? Or am I missing something?


I guess I should have said a DTD declaration... or maybe I'm being ignorant/stupid again.
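
For what it's worth, here is a minimal sketch of what seems to be going on, assuming the file used a named entity such as &nbsp; that XML does not predefine (the post does not say which entity it was). Without a DOCTYPE that declares it, expat reports "undefined entity"; with a declaration in the internal subset, the same text parses. The two sample documents and the try_parse helper are made up for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical documents for illustration; not the file from the post.
WITHOUT_DTD = "<post>Joi&nbsp;Ito</post>"
WITH_DTD = (
    '<!DOCTYPE post [<!ENTITY nbsp "&#160;">]>'
    "<post>Joi&nbsp;Ito</post>"
)


def try_parse(text):
    """Return the parsed document, or the ExpatError if parsing fails."""
    try:
        return minidom.parseString(text)
    except ExpatError as err:
        return err


failed = try_parse(WITHOUT_DTD)   # &nbsp; is never declared
parsed = try_parse(WITH_DTD)      # &nbsp; is declared in the DOCTYPE

print(type(failed).__name__, failed)
print(repr("".join(node.data for node in parsed.documentElement.childNodes)))
```

So the parser is probably not buggy: a DOCTYPE is not required for well-formed XML, but any entity beyond the five built-in ones (&lt;, &gt;, &amp;, &apos;, &quot;) has to be declared somewhere.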

Perhaps there is help to be found at either of these two Python wikis:


PythonWiki (in German)

About XML parsing with the Python that comes with OS X:

The default install does not work because it lacks the Expat parser (and maybe some other things?).
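
A quick way to check whether a given Python install has the Expat bindings is to try importing them. This is only a sketch: the helper name has_expat is made up here, and a successful check does not guarantee that every XML feature works.

```python
def has_expat():
    """Return True if the Expat bindings can be imported and a parser created."""
    try:
        from xml.parsers import expat
        expat.ParserCreate()
    except ImportError:
        return False
    return True


print("expat available:", has_expat())
```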

What I did to solve this:

- Install the latest beta of Python from source (it was Python 2.3a1 when I did it; there is a beta available now).

Python 2.3 comes with more stuff for XML parsing.
Everything went well and all the examples from Dive Into Python worked.

Note that installing PyXML 0.8.2 does not work either :-(


After asking for help, I got an email from John Jackson, who said:

"I had the same problem with PyXML 0.8.2 on Mac OS X. I tried the 0.7.1 version and that ran without the distutils problem..."

Wooo :-)

So:

Install the latest beta of Python, or install an older version of PyXML (0.7.1 rather than 0.8.2).


Thanks for the links, Jonathan, and thanks for the tip, JY! I'm going to go get PyXML 0.7.1 now. ;-)

I wrote a Python-licensed module that helps your xml.sax scripts parse XML anywhere, including on the copy of Python 2.2 bundled with Jaguar (Mac OS X 10.2), which lacks expat.

If xml.sax is missing or nonfunctional, it adapts the older, slower xmllib to provide a compatible interface. A test suite verifies call-by-call equivalence no matter which module ends up being used.

Namespace- and Unicode-savvy. Tested down to Python 1.5.2 and up to Python 2.3 (as of this writing).
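
The fallback idea described above can be sketched roughly like this. It is not the module's actual code: xmllib no longer exists in current Python, so the sketch takes the fallback as a caller-supplied function, and the names sax_available, TagCounter and count_tags are made up for illustration.

```python
import xml.sax
from xml.sax import SAXReaderNotAvailable
from xml.sax.handler import ContentHandler


def sax_available():
    """Return True if xml.sax can build a working parser."""
    try:
        xml.sax.make_parser()
    except (ImportError, SAXReaderNotAvailable):
        return False
    return True


class TagCounter(ContentHandler):
    """Count start tags, as a stand-in for real handler logic."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_tags(text, fallback=None):
    """Count elements with xml.sax, or hand off to `fallback` if it is unusable."""
    if not sax_available():
        if fallback is None:
            raise RuntimeError("no SAX parser and no fallback supplied")
        return fallback(text)
    handler = TagCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count


print(count_tags("<a><b/><c/></a>"))
```

The point of the design is that the calling script only ever sees one interface (here, count_tags) and does not need to know which parser did the work.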