New project: Denormalizer

So…

I was working on South integration for a project the other day and realized that I wanted to denormalize some things.

Like keeping the number of pages and the number of errors for a site update in the update record, or keeping the title in the site update page record rather than sucking it from the saved HTML, that kind of thing.

And, I realized that, when things go wrong, there should be an automated way of fixing/updating the denormalized data.

Hence, the Denormalizer.

The basic idea is to specify the denormalized pieces of data in your database in a format resembling a South migration. You provide rules that say “run this query on that database, and stick the results over here.” Every denormalized piece of data has a rule for creating it.

Then, whenever things get hosed, or you’re just feeling insecure about the state of the data, put it into “maintenance mode” and have the Denormalizer go and count things up, extract other things (like the title, above) and fix up the denormalized fields.

Any time you want to make a new denormalized data chunk, just specify the rules for it, and off you go…it’ll be just like it was always there being dutifully updated along with the “normal” fields.
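
A minimal sketch of what such a rule might look like, assuming SQLite and a made-up schema. The `Rule` class, the `rebuild` function, and the table and column names (`site_update`, `page`, `page_count`) are all hypothetical; nothing here is an existing Denormalizer API.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One denormalized field and the query that can always rebuild it."""
    table: str    # table holding the denormalized column
    column: str   # the denormalized column itself
    query: str    # correlated subquery computing the value for one row


# Hypothetical rule: each site_update row caches how many pages it has.
RULES = [
    Rule(
        table="site_update",
        column="page_count",
        query="SELECT COUNT(*) FROM page WHERE page.update_id = site_update.id",
    ),
]


def rebuild(conn, rules):
    """Recompute every denormalized column from its rule."""
    with conn:  # one transaction: every rule is applied, or none is
        for rule in rules:
            conn.execute(
                f"UPDATE {rule.table} SET {rule.column} = ({rule.query})"
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE site_update (id INTEGER PRIMARY KEY, page_count INTEGER);
        CREATE TABLE page (id INTEGER PRIMARY KEY, update_id INTEGER);
        INSERT INTO site_update (id, page_count) VALUES (1, 99), (2, NULL);
        INSERT INTO page (update_id) VALUES (1), (1), (2);
        """
    )
    rebuild(conn, RULES)  # the stale 99 and the missing value both get fixed
    print(conn.execute("SELECT id, page_count FROM site_update ORDER BY id").fetchall())
```

Because each rule is just data plus a query, adding a new denormalized field would mean adding one more `Rule` and running `rebuild` once to backfill it.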

Pretty cool idea; wonder if I’ll ever get to implement it…

Stupid MySQL Python with Stupid 64 bit MySQL on Stupid 64 bit Snow Leopard

So…

I’ve decided to go with MySQL for my latest product for a variety of reasons.

To build MySQL support for Python, you have to have the MySQL headers and such available.

Even though I installed from the MySQL supported binary for OS X 10.6, the MySQL-python installer couldn’t find the support files.

To get it to work, I had to edit `site.cfg` in the `MySQL-python-1.2.3c1` directory to uncomment line 13 and edit it to read:

mysql_config = /usr/local/mysql/bin/mysql_config

The comment in the code says, above that:

# The path to mysql_config.
# Only use this if mysql_config is not on your PATH, or you have some weird
# setup that requires it.

Well, isn’t that special…
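
Before editing `site.cfg`, it can help to check whether `mysql_config` is reachable at all. A small sketch using only the standard library; `find_mysql_config` is a hypothetical helper, and the default directory is simply the one from the `site.cfg` line above.

```python
import os
import shutil


def find_mysql_config(extra_dirs=("/usr/local/mysql/bin",)):
    """Return the path to mysql_config, or None if it cannot be found.

    Looks on PATH first, then in each directory in extra_dirs.
    """
    found = shutil.which("mysql_config")
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, "mysql_config")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path = find_mysql_config()
    if path is None:
        print("mysql_config not found; set it explicitly in site.cfg")
    else:
        print("mysql_config = " + path)
```

If the script finds the file only through `extra_dirs`, the directory is not on your PATH, which is exactly the case the `site.cfg` comment describes.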

Then, to add insult to stupidity:

  File "/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/site-packages/django/db/backends/mysql/base.py", line 13, in 
    raise ImproperlyConfigured("Error loading MySQLdb module: %s" % e)
django.core.exceptions.ImproperlyConfigured: Error loading MySQLdb module: dynamic module does not define init function (init_mysql)

I found some instructions that suggested trashing the build directory, then executing the following (I broke it into separate lines):

# export ARCHFLAGS="-arch x86_64"
# python setup.py build
# sudo python setup.py install

This, however, gave me the same error on import.

So, I trashed `build`, `dist`, and `*.egg-info` and tried again.

Same shit.

So, I dug a little deeper. Of course, there’s always the suggestion that one build MySQL from source, but that just seems melodramatic.

It’s not complaining about MySQL, it’s complaining about the module not defining an init function.

I hunted down this thread, which has comments right up to yesterday, so obviously I’m not the only one arguing with this:

http://cd34.com/blog/programming/python/mysql-python-and-snow-leopard/

So far, the solutions all seem to say to run Python in 32 bit mode but that just seems idiotic. I’ve got 64 bit everything and this library just doesn’t seem to have correct build instructions. I’m not going to start crippling my system in the hopes of making it limp along; that’s just stupid.
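
One thing worth ruling out with an import error like this is an architecture mismatch between the interpreter and the compiled extension. The sketch below only reports how the running interpreter was built; it is a general diagnostic, not a confirmed fix for the problem above, and `interpreter_bits` is a name made up for the example.

```python
import platform
import struct
import sys


def interpreter_bits():
    """Return 64 or 32 depending on the pointer size of this interpreter."""
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python is running as a %d-bit process" % interpreter_bits())
    print("machine:", platform.machine())
    print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```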

Of course, it’s after 1am and so am I right now so I think I’ll stop…for now.

How to Work On Multiple Twisted Branches at the Same Time

So…

I’m very interested in the new HTTP/1.1 functionality that’s pending review in Twisted in two different branches.

They’re both about new Twisted Web Client functionality and HTTP/1.1, and are documented in TwistedWebClient.

Don’t download them, but they are:

expressive-http-client-886-4

and:

high-level-web-client-3987

The main Twisted version control repository is currently Subversion, which isn’t particularly good at merging things or, at least, I’ve never been very happy with the way it works.

So, I asked on the #twisted IRC channel how one would go about working on those two branches simultaneously.

The following conversation ensued:

ssteinerX: Does anyone know how to checkout twisted-branch-expressive-http-client-886-4 and twisted-branch-high-level-web-client-3987 in such a way as they can be used together?
ssteinerX: 3987 depends on the stuff in 886-4 but they’re in completely separate branches
ssteinerX: I forget (thankfully) how to even use svn other than a simple checkout…
ivan: I use git and merge everything into my personal branch
ssteinerX: ivan: have you merged those two particular branches successfully?
ivan: yes
ssteinerX: Is it something you could possibly post (or have posted) to github?
ivan: my git svn mirror of unmodified Twisted code is at ludios.net
ivan: git svn fetch, make a branch, merge 886-4, merge 3987

So…taking that information, here’s how to make a local branch that contains everything from those two branches together:

First, get whatever is the current “Twisted-svn-git-to…” archive from ludios.net.

I use wget like:

	# wget http://ludios.net/mirror/Twisted-whatever-the-heck

Unarchive that, then change to the directory and update it with:

	# git svn fetch

That will pull the latest changes from the svn server right into your git repository.

Then, make a branch of your own to work on. I called mine 886-4+3987 since that’s what it is…

	# git branch 886-4+3987
	# git checkout 886-4+3987
	# git merge expressive-http-client-886-4
	# git merge high-level-web-client-3987

And there you have it. Everything in both branches, working in one checkout.
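
If you end up doing this for more than one pair of branches, the same steps can be scripted. A rough sketch, assuming `git` and `git svn` are installed and that you run it inside the unpacked mirror; `merge_plan` and `run_plan` are hypothetical helpers, and `git checkout -b` stands in for the separate `git branch` and `git checkout` commands above.

```python
import subprocess


def merge_plan(new_branch, branches):
    """Return the git commands that build new_branch out of branches."""
    plan = [["git", "svn", "fetch"], ["git", "checkout", "-b", new_branch]]
    plan += [["git", "merge", branch] for branch in branches]
    return plan


def run_plan(plan, cwd="."):
    """Run each command in order, stopping at the first failure."""
    for command in plan:
        subprocess.run(command, cwd=cwd, check=True)


if __name__ == "__main__":
    plan = merge_plan(
        "886-4+3987",
        ["expressive-http-client-886-4", "high-level-web-client-3987"],
    )
    # Print the plan instead of running it; call run_plan(plan) to execute.
    for command in plan:
        print(" ".join(command))
```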

You can make a new virtualenv, install this branch to it with the normal `python setup.py develop` and go about your business with your new Twisted!

Distribute/distutils Sprint #1, QA Manager Update

We recently had our first Distribute/distutils sprint.

Tarek basically laid out the project:

1. Brief presentation of the project

  • Distribute is a fork of setuptools
  • it’s composed of two branches 0.6 and 0.7
  • 0.6 is a drop-in replacement for setuptools, and is not the topic of today’s sprint
  • 0.7 is a complete rewrite; the tool of our dreams
  • 0.7 general philosophy:
    • be a good distutils citizen (no more patches)
    • keep all the good bits of setuptools, but rewrite them, except easy_install
    • try to find the right place in the packaging ecosystem w.r.t. pip, distutils

We went over other things as well, but that was the basic gist of the sprint.

The tasks for the sprint are outlined on the Python moin wiki at Distutils Distribute Sprint.

I ended up leading the QA effort but, since we’re starting from scratch, most
of my work will be in the future, making sure we have full code coverage and,
more importantly (and more widely useful), making sure we have buildbots to test everything.

Since we don’t have infinite hardware or financial resources available, my
plan is to make it so that we can fire up buildbots at will, on cloud servers,
get results, and shut them down ’till next time.

So…that’s the status update from the Distributils QA Team Leader.

Python Distribute/distutils

Distribute is a fork of the oft-maligned setuptools project, which has fallen into disrepair after much neglect (and a single-committer system). Up until the most recent bug fix (0.6c11, of 2009-10-20), it had not been touched in over a year.

The leader of the Distribute project is the indefatigable Tarek Ziadé of Expert Python Programming fame (see mini-review @ end of post).

Distribute is a ‘friendly fork’ in that PJE, maintainer of setuptools, has pretty much said “Please get me out of the software distribution business!”, even so much as offering stewardship to a couple of people who are not Tarek (they cleverly declined).

The first Distribute/distutils sprint is scheduled for October 20th at 7pm Paris Time (GMT +2). I’ll be there.

The day before the sprint (October 19th), I had gotten curious about the distutils testing procedure and had found Tarek’s distutils buildbot project.

It wouldn’t run on my newly fired up Rackspace Cloud server so, in the course of the day, we got the distutils-buildbot set up with a Fabric script file that uploaded the various components required for a full, multi-version buildbot.

By the end of the day, with a single:

# fab -H you.re.hos.tip doItAll

You’d have a fully functional buildbot in about 5 minutes or so.

Also, on the same day, I finally released the first early version of my Fabkit which is a set of utility functions for use with Fabric. These are real-world script chunks that can be pulled in and used within a Fabric file just by referencing them. I had to get it out the door ’cause I sure wasn’t going to rewrite all that stuff for the buildbot script!

Also, on the same day, I mostly finished my article for Python Magazine about using Fabric with Rackspace Cloud Servers to fire up on-the-fly testing servers.

Buildbots, Fabric, Rackspace, Testing? Anyone see a pattern here?

More on that in my next post…

Expert Python Programming is my most highly recommended Python book of all time for anyone with more than a few months of Python under their belt. It’s well written and shows a complete picture of a real, working Python programmer’s daily toolset with enough information to actually use those tools effectively. I was a big fan before I even met Tarek and, now that I know him, I recommend it even more; he really does walk the walk. (see my affiliate link, you pay the same price at Amazon, I make about a dollar)

Eclipse for Python Dev!?

Ok, so I’m finally officially sick of using TextMate. Seems like every time I turn around, I find some new annoyance and, after five years of waiting for a new version, I don’t think I really care any more.

I bought the recent upgrade to BBEdit and it’s definitely an improvement, but with some of the new coding standards I’m working within (always using Pyflakes, Pylint, nosetests, pythoscope etc.) I really need a factory, not a tweezy text editor and a command line.

So, I finally bit the bullet and installed Eclipse from here, and pydev from within Eclipse (Help->Install New Software, http://pydev.org/updates/).

I’ll be writing more on this (especially the new toolchain), but, for now, the most important thing is where to set the comment color to green italic text.

[Screenshot: Screen shot 2009-10-11 at 12.32.55 PM.png]

Snow Leopard vs. virtualenv – easy_install virtualenv==dev != latest

So there I was, merrily plooking along with my various Python projects and had occasion to make a new virtual environment using `mkvirtualenv` from Doug Hellmann’s excellent virtualenvwrapper.

And it hung.

I ctrl-C’d out after a few minutes and tried again. Hung.

Figured it might be a Snow Leopard thing, so I did a quick:

	# easy_install virtualenv==dev 

Figuring that’d get me the latest version.

Same thing.

Poked around in the source looking for a clue for a minute, then did the obvious; Googled for the error message.

Which led me to this post.

Turns out that easy_install grabs from a subversion repository that’s not quite up to date with the new code up on bitbucket.

To quote that post:

Turns out that triggers an install from the Subversion repository at colorstudy.com which *doesn’t* have the Snow Leopard fix, but is also labeled as version 1.3.4dev. So I guess I was chasing my tail a bit.
I should have done this:

> easy_install http://bitbucket.org/ianb/virtualenv/get/tip.zip

That gets the virtualenv with fix I was after, and indeed does work.

So, the lesson is: in this time of projects moving off of their own little subversion repositories and onto bitbucket and github, and easy_install, out of the box, supporting only subversion and CVS (which I won’t dignify with a link), check your assumptions about which version of what you’ve got installed; sometimes things LIE!
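
A version string alone would not have caught the case above, since both copies were labeled 1.3.4dev, so it is worth checking where a package was installed from as well. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8 and later); `installed_info` is a hypothetical helper name.

```python
from importlib import metadata


def installed_info(name):
    """Return (version, install location) for a distribution, or None."""
    try:
        dist = metadata.distribution(name)
    except metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves to the directory the distribution lives in.
    return dist.version, str(dist.locate_file(""))


if __name__ == "__main__":
    print("virtualenv:", installed_info("virtualenv"))
```

The location will not tell you which repository the code came from, but it does show which of several installed copies Python is actually picking up.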

Hopefully, this will save someone else some time and trouble.

P.S.
Speaking of subversion, I’m hoping to get the setuptools Mercurial plugin working sometime soon since most of my new projects are on Mercurial, but I’ll probably wait until I convert over to Distribute, which may get it built in sooner rather than later.

Basecamp Access in Python

We’ve been using Basecamp for a while now and I finally had enough stuff in it that I needed a better way to view it than a web browser.

Browsers are fine but they’re not the ideal interface for everything in the world despite what the “Web 2.0” evangelidiots will try to sell you. Yes, you can get around much of the annoyance of the page refresh modality of the browser with Ajax tricks but there comes a point where the metaphor is just wrong.

Also, Basecamp is missing a few things, like project templates, which I desperately need. We do a “Simple Site Audit.” It’s the same for *every* site. Same To-Do lists, same Writeboards (for whitepapers), same **everything**. Can’t be done with stock Basecamp.

So… I need to get at my Basecamp stuff in Python.

Basecamp, coming from 37Signals, inventors of Ruby on Rails, has a very Ruby-centric view of the universe, and all of the demonstration code using the API (the one file, that is) is Ruby. The guts of the API are all XML instead of JSON, which would be much more Ajax (and Python) friendly, but whatever…

So, I started poking around with it using their sample code.

Let me just say, well written Ruby, Perl, Python, Awk, whatever, it’s all pretty much the same stuff and I’ve used every one of ’em to write actual paid-for jobs.

So I went poking around in the API example code.

I’m not sure if it’s the language itself or just the idioms that have developed within the Rails universe but I found the API code incredibly annoying to read. Not that I can’t figure it out, just that it takes longer than it should. And is annoying.

It’s almost like everything’s on backwards, and for no good reason I can understand.

Here’s a beaut, right from the API:

    def [](name)
      name = dashify(name)
      case @hash[name]
      when Hash then 
        @hash[name] = if (@hash[name].keys.length == 1 && @hash[name].values.first.is_a?(Array))
          @hash[name].values.first.map { |v| Record.new(@hash[name].keys.first, v) }
        else
          Record.new(name, @hash[name])
        end
      else
        @hash[name]
      end
    end

Now remember, this is supposed to be code demonstrating “best practices” use of an API.

This is the actual indentation, as written.

After reading it over a couple of times it became apparent what it does and also that it suffers from being “clever.” Clever as in “stupid.”

I’m not sure why people write code like that, and I’m really not sure how a company like 37Signals lets it get out, *especially as the official API to their main subscription product*, but there you have it.

I rewrote this in Python, in about the same number of lines, maybe three or five more.
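
The Python version isn’t shown in this post, so the following is only a guess at what such a rewrite might look like, not the actual code. `dashify` and `Record` mirror the names in the Ruby snippet; their bodies here are assumptions, and the parsed XML is assumed to arrive as plain dicts and lists.

```python
def dashify(name):
    # Assumption: the Ruby helper turns underscores into the dashes used
    # by the XML element names ("todo_lists" -> "todo-lists").
    return str(name).replace("_", "-")


class Record:
    """Wraps one parsed XML element, stored as a plain dict."""

    def __init__(self, type_name, data):
        self.type = type_name
        self.data = data

    def __getitem__(self, name):
        name = dashify(name)
        value = self.data.get(name)  # missing keys give None, like Ruby's nil

        # Plain values (and results wrapped on an earlier call) pass through.
        if not isinstance(value, dict):
            return value

        # A dict with exactly one key holding a list is a collection:
        # wrap each item, typed by that single key.
        if len(value) == 1:
            child_type, children = next(iter(value.items()))
            if isinstance(children, list):
                wrapped = [Record(child_type, child) for child in children]
                self.data[name] = wrapped  # cache, as the Ruby version does
                return wrapped

        # Any other dict is a single nested record.
        wrapped = Record(name, value)
        self.data[name] = wrapped
        return wrapped


if __name__ == "__main__":
    project = Record("project", {
        "name": "Simple Site Audit",
        "todo-lists": {"todo-list": [{"id": 1}, {"id": 2}]},
    })
    print(project["name"], [item["id"] for item in project["todo_lists"]])
```

Each of the three cases gets its own early return, so the reader never has to hold a `case`, an assignment, and an `if` expression in their head at once.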

I showed a programmer friend of mine, whose main language is Objective-C/Cocoa the original version and he said, and I quote:

What the fuck?

When I showed him my Python version he said, and I quote:

Oh, that makes sense.

Me personally, I’ll take “Oh, that makes sense” over “What the fuck?” every time.

This is not to say that Ruby is a bad language or that all Ruby code is bad, or that all Ruby programmers write inscrutable crap but /s/Ruby/Perl/ and we’d be having almost exactly the same discussion. I’m not sure what it is but the inscrutable crap code seems to be drawn to Ruby and Perl like incomprehensible bugs are drawn to Visual Basic.

I have a whole theory about the finite number of bugs in the universe and how having them all attracted to Visual Basic is actually a good thing since it keeps them out of real programs…

Since there doesn’t seem to be a complete, working Python API wrapper for Basecamp, I guess I’ll have to write one; I really need project templates.

When I do, I’ll try to sell it to 37 Signals, then it will be available for the world. If they don’t buy it, I’ll just have to figure out what to do then… In the meantime, if you need a clean Python wrapper for the Basecamp API, just give a shout. I’m sure we can work something out.

What a pain in the ass…

S

The WSSW Stack

Choosing The Stack

Ok, so I’ve been plooking around with various web frameworks, even languages, for a couple of years now.

Now, while starting WebSauce Software for real, it’s time to choose a standard toolset. This is what we are going to use to produce our software until further notice.

Unless there’s a compelling reason to change, this is what we’re using.

If something great comes along to replace a component then fine, but it’ll have to be pretty damn good for us to switch.

If it’s great, we’ll switch.

Adapt or die!

First a little history.

We got into the web business about 7 years ago after 25 years of general purpose contract programming which overlapped with about 10 years of software publishing.

I started consulting in about 1982, started publishing software in about 1986, stopped publishing software in 1994, and retired from the software business, sort of, in 1995, had my first son in 2002, and went back to work in 2004-ish.

I did some consulting between 1995 and 2004, but only a handful of really complex, challenging jobs. I was not making a living, I was just taking on work I liked and wanted to do.

When I went back to work, I didn’t know exactly what type of work would be coming up and I wasn’t too worried about it. I’ve always managed to keep busy.

Unfortunately, I had been out of the loop for almost 10 years so most of my old consulting contract clients were gone, companies changed hands, engineers at those companies moved around to parts unknown. In short, I didn’t really have any contacts any more.

So, I rented an office and hung my shingle out to see what would happen.

People kept asking me if we did websites.

So, I said we did.

Now, it’s not that we hadn’t done websites before that for ourselves or for customers, but we weren’t in the business of making websites for other people.

So, now we were, and we did.

Lots of them.

We grew, hired people, had clients, had a stream of new clients, a few big clients, I wrote some nice tools for in-house use that made us more efficient than other companies, we learned the web development business and everything was hunky-dunky.

Except…

I hate making new websites for people who don’t already have them.

They have unrealistic expectations of what the site can do for them and especially, how much it should cost. At least people with existing sites have an idea what things cost, and know what the site is doing or not doing for them.

Improving an existing site is way better, for us. Less friction, better
results all’round.

What I do like…

Fixing existing sites

Fixing up an existing site is a blast. We get to leverage all of our cool tools and, because of those tools, we’re very efficient at it. Because of our efficiency, clients get a better deal than they did from their prior company, which makes us look good and, since almost everything is automated, we make good profit margins.

Best part? I get paid to spend time ploinking on the tools we use to do customer jobs more efficiently which is the most fun for me.

Doing SEO

Getting sites to rank well in the Search Engines, making sure that their customers can do useful things with their website, and generally helping our customers serve their customers better.

My software engineering background has allowed me to write some tools that do things in this area that nobody else has. We’ll be publishing some of them soon. We’ll let you know ;-).

Writing web applications

Things that are kind of like desktop applications but run in a browser and do things that are appropriately web based. We’ve done SalesForce.com integration, custom database editing applications for real estate brokers, inventory control and management against existing, legacy databases that just need a new view to be more useful than they already are, all kinds of stuff. Love it.

How I’ve Written All This Stuff

I’ve written utilities for doing the repetitive parts of SEO and also written web applications for various purposes for clients and for in-house needs.

I was always hunting for the best development toolset both for client applications and for our own internal tools.

I’ve gone through a lot of tools.

So I tried…in no particular order

and God knows how many other frameworks, version control systems, WSGI components, templating languages, and chunks and parts of various solutions.

So…I’ve finally settled

So, after all that trial and error, here’s my toolset.

This is what I’m using from now on unless there’s a compelling reason to use something else. Most of the bigger tools (Django, for example) have or are developing plug-in parts for things like the templating system so these choices are not as rigid as having this list might imply.

Linux Distribution: Ubuntu

I’ve used just about every Linux distribution at one time or another. We host lots of sites on the CentOS series, and I think one of our in-house boxes is SUSE. Then I started using Ubuntu since it seemed to be the one most of the documentation for the tools I was using was written for. I figured there must be some reason for that since it was just too pervasive to be a coincidence. Not a coincidence. It just works better. All of our cloud servers are now fired up with Ubuntu 9.04 server configuration and I run Kubuntu (I absolutely hate Gnome, love KDE). I’m envious of the MacOS-X Aqua theme, only for Gnome so far, but it’s not enough of a reason to switch to Gnome.

Ubuntu has been rock solid, and apt-get blows away any other system package tool I’ve used (yum, nasty RPMs, etc.).

Language: Python

The language I always come back to. I’ve tried other languages. Seems like I’ve tried every other language at one time or another. Last time I counted it was, like, 40 or something including dialects of Basic, Pascal, C, C++, Delphi, various Assembly languages, Perl, Ruby, Awk, SmallTalk, Lisp, Sed, Haskell, and many, many others I can’t even remember. I don’t remember who said it but Python really is executable pseudo code.

VCS: Mercurial (hg)

Up until a few months ago, we were Subversion users. I feel dirty even saying it, now. We used Perforce for one job but I hated it the whole time. I always found Subversion annoying; especially trying to merge branches.

The centralized repository always gave me an uncomfortable feeling I never identified until I started using Git on an Open Source project I was working on.

The first time I did a merge, I was hooked. It was painless and it wasn’t a trivial merge either. I had to manually resolve one conflict out of 30 or so changes. It took five minutes. It would have taken all day in Subversion and I would have been swearing the whole time. I was leaning toward Git, not having used any of the other likely suspects much until this announcement.

Then, there’s Google’s support which double sealed the deal.

Since the main Python repository is going to be Mercurial, and since that will likely drive adoption on other projects that have yet to move out of Subversion, and since Mercurial is written in Python, it would be silly to use anything else since there’s really little obviously superior about any other DVCS.

Mercurial is also sure to get lots of loving attention and will pass Git in short order in any area where it’s currently lagging. Fortunately, Git, Mercurial, and Bazaar are similar enough that it’ll be easy enough to switch around when needed.

WebSauce’s projects will all be DVCS’d in Mercurial and I’ll document the setup as soon as I get around to it. The setup, that is…

Desktop App Development: Cocoa/Objective-C

I tried writing my first OS X Application for publication using Python and PyObjc. I had a working prototype but, even with expert help, couldn’t get it to run anywhere but my development system. Next app is pure Objective-C and, if I need Python for something, I’ll run it as an external process and work on getting the results back some way other than by running inside the main application space.

Web Framework: Django

I may not like some of the parts of the Django stack so much but it all hangs together well and, if I get sufficiently dissatisfied with any particular part, I’m sure there will be a way to “fix” it on my own checkout and submit a patch. I’m pretty sure most of the Django pieces are pluggable to some extent and, where they’re not, it would be good of me to help make them so. That’s what Open Source is all about, right?

Web ToolKit: Twisted

Twisted does so many things, and our applications need so many of them, that it’d be silly not to use the grandfather of all things Python and Web.

Sure, it’s a little hard to wrap your head around in the beginning, and there are parts that are dark, deep, and mysterious, but I’ve been hanging around on the mailing list and IRC channel and I’m confident that if I run into a problem, and do my research before asking for help, that I’ll be able to get any problem solved in relatively short order.

Because so much of the rest of our apps require Twisted services, we’re going to run our Django app using Twisted’s WSGI unless we run into problems. Then we’ll fall back to either CherryPy’s WSGI, Apache’s mod_python, or Apache’s mod_wsgi. Whatever, not a big deal.

Documentation Language: Restructured Text

The documentation format of Python that can be easily converted to everything else.

It’s human-readable in source form, intuitive, and is everywhere in all the tools I use.

No brainer.

Other Tools

The stack really isn’t worth anything unless you can deploy it.

For that, I’m relying on several other Python based tools:

Paste

I’m only using the directory template creation of Paste. Paste is, for the most part, overgrown and under-focused, but the directory templating works well enough for now.

virtualenv, virtualenvwrapper

These allow me to set up an isolated Python environment in which to run my applications. Keeps all the cruft out of the system and gives me an attainable target to deploy.
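
A quick way to confirm that a script really is running inside one of these isolated environments, sketched with the standard library only. `in_virtualenv` is a hypothetical helper; the `sys.real_prefix` check covers older virtualenv releases, and `sys.base_prefix` covers `venv` and newer virtualenv releases.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter is running inside a virtual environment."""
    # Old virtualenv releases set sys.real_prefix; venv and newer virtualenv
    # releases make sys.base_prefix differ from sys.prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
    print("sys.prefix:", sys.prefix)
```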

zc.buildout

Allows creation of a completely self-contained app. Virtualenv’s great for development, but this wraps it all up in a one-stop-shopping bundle.

fabric

Makes deployment as simple as writing a Python script that does what you want to distribute an application to wherever you want to deploy it.

Sphinx

The documentation tool used on the Python project itself. You can set up a documentation structure in one command, write your docs in reStructuredText, and have it in html, latex, and several other formats in a flash.

github/BitBucket/LaunchPad

Not really part of the deployment stack but from having worked on several open source projects on github with git, I think it’s about the best there is right now. I’m still interested in looking at BitBucket and I’m contributing to a few projects there as well but Github seems to be more mature and has a much more informative and useful interface. LaunchPad is very ambitious, and seems well thought out and pretty all-encompassing. Unfortunately, the only backend it supports is Bazaar. Yuck.

Basecamp

We’ve been using Basecamp for a while now for project management. It’s not perfect but it is the best shared system we’ve found. We’ve tried Google Docs and got addicted to shared documents but the rest of the system doesn’t provide any project management functionality so things tended to get lost in there since there was no way to indicate what was to be done next. Basecamp also has shared documents (Writeboards) and also ToDo Lists and Milestones which make it possible to keep a project moving.

FogBugz

We’re currently using FogBugz to track our bugs in the OS X product that we’re untangling the Python code from and it really is a great bug tracking system.

We’ve been focused mostly on Basecamp so it will be interesting to see how well they integrate or whether we move to another system for this functionality. An obvious choice would be Trac and, with the buildout script, maybe it won’t be so abominable to install.

For now, that’s it.

I’ll be updating this as I update the toolset but this is it, for now…

It Sounds Gross But It’s Tasty: yolk

It sounds gross but yolk is not a gooey, salmonella laced food product; it’s a utility for finding out what the heck’s available in your currently active Python installation.

In my previous post I mentioned Ian Bicking’s virtualenv and Doug Hellmann’s virtualenvwrapper.

One of the reasons for using virtualenvs in the first place is to avoid the confusion and conflict that can arise from having multiple versions of multiple libraries spewed into the available “import space” of a Python app.

Just today, I was working on using CherryPy’s WSGI server to serve up a Django application and had a version conflict in my system Python that was loading an old version of CherryPy.

I ran easy_install cherrypy a couple of times and realized that something was making sure that the 2.x version was going to be loaded even though I’d just installed the new one again. It took me about 20 minutes to find and fix the problem, removing the package and screwing around with .pth files.

I saw a comment on the virtualenvwrapper article mentioning yolk.

I installed and ran it and I gotta’ say, the sheer number and number of versions of libraries and crap in my machine’s default Python environment is staggering, disgusting, unworkable, and embarrassing.

Pre-virtualenv, I would just install shit, try it, and either keep using and updating it or just forget it was there.

So…there is a gross-out factor with yolk.

Do this, in this order, then get to cleaning up before someone gets sick! Use your system Python for this test and prepare to be grossed out with how much cruft is in your system Python that’s slowing down your apps and just generally bogging things down.

Clean up that yolk!

	# easy_install yolk
	# yolk -l
	# python
	>>> import sys
	>>> sys.path
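
For a rough equivalent of that listing without installing anything, the standard library’s `importlib.metadata` (Python 3.8 and later) can enumerate what is visible on `sys.path`. This is only a sketch, not how yolk itself works; `find_duplicates` and `installed_pairs` are hypothetical helpers that flag packages present in more than one version, like the CherryPy mess above.

```python
import sys
from collections import defaultdict
from importlib import metadata


def find_duplicates(pairs):
    """Given (name, version) pairs, return names found in more than one version."""
    versions = defaultdict(set)
    for name, version in pairs:
        versions[name.lower()].add(version)
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


def installed_pairs():
    """Yield (name, version) for every distribution visible on sys.path."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with broken or missing metadata
            yield name, dist.version


if __name__ == "__main__":
    for name, found in sorted(find_duplicates(installed_pairs()).items()):
        print(name, "->", ", ".join(found))
    print(len(sys.path), "entries on sys.path")
```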

Cleanup? That’s the topic for another post but if your environment looks anything like mine, it’s in need of some serious help…