While much of data cleaning is performed before loading data in a database (especially for one-time, ad hoc analyses), there is a way to do it, continuously (like once a week or once a day), once the data is in its final database. It consists of adding look-up tables to help with the messy fields.
When I was working at eBay, there was a database of clients from around the world. Some clients had names in a foreign language, containing accents and special characters. Somehow, it made some SQL joins very tricky. We created a lookup table of names, matching different spelling of a company name, to a standardized name and client ID. Think about names such as M.I.T and MIT that represent the same entity but can be spelled differently. It also helps dealing with duplicate records.This old trick allows you to do fuzzy matching, and the size of the lookup tables (updated daily) was manageable.
What do you think of this idea? Of course the best solution is to use this system, together with traditional cleaning techniques, if possible. But in systems where data is automatically uploaded and updated on a daily basis, lookup tables are very helpful.