Fun with regular expressions

Regular expressions are among the most powerful text manipulation features around. Most dynamic websites that you see around the web are based on programming languages that incorporate regular expression support, such as PHP and Perl. Harnessing this power for file renaming tasks, however, is not so straightforward.

A Better Finder Rename has supported regular expressions for a couple of years now. When the feature was first introduced, I tried to help users figure out what was going on by providing a preview of the various substitution groups of the first file to be renamed. This was miles better than no special preview at all, but fell short of what I really had in mind.

Version 7.0 was supposed to have the “new improved” regular expression preview, but the scope of the release just kept growing, mostly due to the mountains of great feedback that I received from my private beta testing group (thanks guys!). Still, version 7.0 was a great step forward, even with one or two planned features not making it. Fans of the program will know that I add a new feature pretty much every month and have done so for the past 10 years, so the day had to come when the missing feature would finally make it. Mission accomplished with version 7.3.5, out today.

So why are regular expressions so hard? And why are they so powerful? The answer to both is the same: it’s programming with text. Programming is hard; programming is powerful. Regular expressions are a “pattern manipulation language” and their syntax is both super “tight” and profoundly cryptic. Just the way that Unix geeks love things to be.

Manipulating text with “reg ex” is a matter of first identifying groups of letters (or “symbols”) in the existing text (i.e. the current file name) and then rearranging the groups and perhaps adding letters (“symbols”) to the “output”. Let’s see a brief example:

[Screenshot: regx_preview_thumb1.gif]

As you can see, the current name is “hello world” and we simply want to swap the two words.
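If you would like to try this swap outside the program, here is a minimal sketch using Python's `re` module. It is only an illustration: the regular expression engine in A Better Finder Rename may differ in its details, the function name `swap_words` is made up for this example, and matching the whole name with `fullmatch` is my assumption about what "the name matches the pattern" means.

```python
import re

# The pattern from the example: two groups separated by a space.
PATTERN = re.compile(r"(.*) (.*)")


def swap_words(name: str) -> str:
    """Hypothetical helper: swap the text before and after a space."""
    match = PATTERN.fullmatch(name)
    if match is None:
        # No space in the name, so the pattern does not match:
        # leave the name untouched.
        return name
    # "\2 \1" means: group 2, a space, then group 1.
    return match.expand(r"\2 \1")


print(swap_words("hello world"))            # world hello
print(swap_words("the_mouse"))              # the_mouse (unchanged)
print(swap_words("the cat and the mouse"))  # mouse the cat and the
```

Note the last line: with more than one space in the name, the first `.*` grabs as much as it can, so the split happens at the last space rather than the first.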
First we need to identify the two words. We do this with the pattern “(.*) (.*)”. What does this mean? Well, a “.” is a placeholder for any symbol, a “ ” (space) matches a literal space, and a “*” means that there may be 0 or more occurrences of the preceding symbol. So “.* .*” is a pattern that matches any text that has a space somewhere. For instance, “the cat”, “the mouse” and “the cat and the mouse” all match the pattern “.* .*” because they have a space somewhere. “the_mouse” does not match because it does not have a space.

A Better Finder Rename will simply do nothing for a current file name that does not match the pattern. In other words, the “the_mouse” file will be left untouched.

In the screenshot you can see that our pattern is not just “.* .*”, but “(.*) (.*)”. The parentheses match nothing themselves; they simply enclose a substitution group. These substitution groups can be used in the substitution expression to refer back to what was matched. Each substitution group has a “name”: \1 is the first substitution group, \2 the second, etc. A Better Finder Rename supports up to 8 substitution groups.

In our example, we split “hello world” into two substitution groups: \1, which is “hello”, and \2, which is “world”. In the substitution field we have put “\2 \1”, which translates into “the contents of the second substitution group, followed by a space, followed by the first substitution group”. In other words, “swap the first and the second words” of the file name.

This is of course barely scratching the surface of regular expressions. Things become more interesting when you start “programming” in earnest. Say we want to swap the position of the numbers at the end of some image files in a “clever” way:

[Screenshot: regex_full_preview_2.gif]

We are only interested in the numbers, so we say “match all uppercase or lowercase letters” at the beginning of the name up to the first number and put all the numbers into the substitution group \1.
We then use the numbers substitution group in our substitution expression. Voilà.

Obviously, it isn’t really possible to cover regular expressions in detail in a blog entry. Whole books have been written on the subject. The manual page for the feature has some more details on the syntax, the built-in support and a selection of further reading materials, including some useful books.

Most of you will go “oh no, this is far too complicated for me” at this point. That’s OK, this is an advanced feature for advanced users: there are plenty of easy-to-use features in the program and you can perform the most common file renaming operations without regular expressions. For the “advanced users” amongst you, however, this will alert you to the presence of the feature and might motivate you to learn a little more.
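For those who do want to experiment, the numbers example can also be sketched in Python. The exact pattern and substitution from the screenshot are not spelled out in the text above, so everything here is an assumption made for illustration: the pattern `[A-Za-z]*([0-9]+)`, the substitution `\1_image`, the sample names and the function name `numbers_first` are all hypothetical.

```python
import re

# Assumed pattern: any run of letters, then the digits captured as group \1.
PATTERN = re.compile(r"[A-Za-z]*([0-9]+)")


def numbers_first(name: str) -> str:
    """Hypothetical helper: rebuild the name around the captured digits."""
    match = PATTERN.fullmatch(name)
    if match is None:
        # Names that do not fit "letters then digits" are left alone.
        return name
    # Assumed substitution: the digits, followed by a fixed suffix.
    return match.expand(r"\1_image")


print(numbers_first("IMG0042"))   # 0042_image
print(numbers_first("holiday"))   # holiday (no digits, unchanged)
print(numbers_first("IMG_0042"))  # IMG_0042 (underscore is not a letter, unchanged)
```

The sketch works on the name without its extension to keep things short; a real rename would need to decide how to treat “.jpg” and friends.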