Solving Unicode Problems in Python 2.7

One of the toughest things to get right in a Python program is Unicode handling. If you’re reading this, you’re probably in the middle of discovering this the hard way.

Unicode handling is difficult in Python mainly because the existing terminology is confusing, and because many potentially problematic cases are handled transparently. As a result, many people never have to learn what’s really going on, until they suddenly run into a brick wall when they need to handle data containing characters outside the ASCII character set.
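
To make that "transparent" handling concrete, here is a minimal sketch of the hidden step. When Python 2 combines a byte string with a unicode string, it quietly decodes the bytes with the ASCII codec. The helper name implicit_coerce is hypothetical and exists only for illustration; it spells out that decode explicitly, so the snippet behaves the same way whether or not the interpreter does the coercion for you.

```python
# Illustrative sketch only: implicit_coerce is a made-up helper that performs,
# explicitly, the ASCII decode Python 2 applies behind the scenes when a byte
# string meets a unicode string.

def implicit_coerce(byte_string):
    """Decode bytes with the ASCII codec, as Python 2 does implicitly."""
    return byte_string.decode('ascii')

ascii_bytes = b'cafe'
utf8_bytes = b'caf\xc3\xa9'  # UTF-8 bytes for "cafe" with an accented final e

# Pure-ASCII data decodes without complaint, so the hidden step goes unnoticed.
ok = implicit_coerce(ascii_bytes)

# The first non-ASCII byte is where the brick wall appears.
try:
    implicit_coerce(utf8_bytes)
    hit_wall = False
except UnicodeDecodeError:
    hit_wall = True

# Naming the real encoding explicitly avoids the problem.
decoded = utf8_bytes.decode('utf-8')
```

The same bytes that fail under the ASCII codec decode cleanly once the actual encoding is named, which is why code can appear to work for a long time and then break on the first accented character.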

If you’ve just run into the Python 2 Unicode brick wall, here are three steps you can take to start thinking about strings and Unicode the right way…