PowerShell Problem Solver: PowerShell String Parsing with Regular Expressions

In a recent article, PowerShell Problem Solver: PowerShell String Parsing with Substrings, I showed you some ways to parse strings with PowerShell. For many PowerShell beginners, splitting strings works just fine. Eventually you’ll realize that you want more control, and this is where regular expressions come into play. I am not going to try to teach you regular expressions from scratch. There is an entire chapter in the second edition of PowerShell in Depth that covers regular expressions in PowerShell. You should also take a few minutes to look at the help topic about_Regular_Expressions.

A regular expression is a way of using a pattern to describe some piece of data. Granted, coming up with the pattern can be time consuming. Let’s see if our string challenge can help shed some light on the subject. If you recall, I am starting with a string, presumably from a log.

$s = "Mailbox:9WJKDFH-FS349-1DSDS-OIFODJFDO-7F21-FC1BF02EFE26 (O'Hicks, Jeffery(X.))"

The goal is to extract O’Hicks, Jeffery(X.) from the string. I’ve intentionally modified my name to throw in a different character because you might face something similar. The examples I am going to show you should also work for simpler strings. If you know for a fact what your data will look like, you might even be able to get by with simpler patterns. But enough chat. The simplest way to test whether a string contains a matching pattern is with the -match operator.

$s -match "\S+,\s\S+"

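The text to the right of the -match operator is the regular expression pattern. Here’s how it breaks down. Note that the escape sequences are case-sensitive, so \S and \s mean different things, even though -match itself ignores case when it compares literal text: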

  • \S means get any non-whitespace character
  • + means get one or more instances of the preceding, e.g. non-whitespace character
  • , means a literal comma
  • \s means a single whitespace character
  • \S+ is a repeat of the first part.

The -match operator returns True or False. If True, you can look at the built-in $matches variable to see what matched.
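
As a minimal sketch of that check (the variable name $found is mine, not part of the original example), you can test the pattern and then read the full match from index 0 of $matches:

```powershell
$s = "Mailbox:9WJKDFH-FS349-1DSDS-OIFODJFDO-7F21-FC1BF02EFE26 (O'Hicks, Jeffery(X.))"

# -match returns True or False; on success PowerShell fills in $matches
if ($s -match "\S+,\s\S+") {
    # Index 0 always holds the complete text that matched the pattern
    $found = $matches[0]
}

$found
# (O'Hicks, Jeffery(X.))
```

$matches is only updated when a match succeeds, so checking the result first avoids reading a stale value left over from an earlier test.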

Using the -match operator in Windows PowerShell. (Image Credit: Jeff Hicks)

This is similar to what I came up with by splitting the string in the previous article.

As before, I can now parse out the relevant part of the string.

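# $matches[0] holds the full text of the match as a single string
$t = $matches[0]
# Drop the leading ( and the trailing )
$t.Substring(1,$t.Length-2)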

Parsing a relevant part of the string in Windows PowerShell. (Image Credit: Jeff Hicks)

The thing about regular expression patterns is that they float to match anywhere in the string, unless you use anchors. You can also fine-tune your pattern:

$s -match "\b\S+,\s\S+\(\w{1}\.\)"

There are a few additions here.

  • \b indicates a word boundary, which is the position between a word character and a non-word character. Here it falls between the opening parenthesis and the O, so the parenthesis is left out of the match
  • \w means a word character: a letter, digit, or underscore
  • {1} means exactly one of the preceding, e.g. word, characters
  • \. means a literal period. The period is a special regular expression character, so I need to escape it with a \ so that the regular expression engine treats it literally.
  • \( and \) mean literal parentheses. These are special characters, so if you mean a ( or a ) you need to escape it.

As you can see, now I have exactly the result I need without any additional parsing.

Our modified command with the -match operator in Windows PowerShell. (Image Credit: Jeff Hicks)
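
The anchors mentioned earlier deserve a quick illustration. This is a small sketch of my own rather than part of the original example, and the result variable names are only there for illustration. I use single-quoted patterns here so that PowerShell never tries to read the $ anchor as the start of a variable name:

```powershell
$s = "Mailbox:9WJKDFH-FS349-1DSDS-OIFODJFDO-7F21-FC1BF02EFE26 (O'Hicks, Jeffery(X.))"

# Without an anchor the pattern can match anywhere in the string
$anywhere = $s -match 'Jeffery'      # True

# ^ pins the match to the start of the string
$atStart  = $s -match '^Jeffery'     # False
$prefix   = $s -match '^Mailbox:'    # True

# $ pins the match to the end of the string
$atEnd    = $s -match '\)$'          # True
```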

I can get the value from $matches.values. Still with me? Let’s spin your head a bit more, and let me show you the REGEX object. This object starts out as a regular expression pattern.

[regex]$rx="\(.*\)"

This is a variation on what I used before.

  • \( is a literal opening parenthesis
  • . means any single character
  • * means zero or more instances of the preceding, so .* is a way of saying anything and everything
  • \) is a literal closing parenthesis
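
One consequence of .* is worth seeing in action: the * quantifier is greedy, so the match runs to the last closing parenthesis in the string rather than the first. The sketch below is my own illustration. The lazy variant (.*?) is not used elsewhere in this article and is shown only for contrast:

```powershell
$s = "Mailbox:9WJKDFH-FS349-1DSDS-OIFODJFDO-7F21-FC1BF02EFE26 (O'Hicks, Jeffery(X.))"

# Greedy: .* grabs as much as it can, so the match ends at the final )
$greedy = [regex]::Match($s, '\(.*\)').Value
# (O'Hicks, Jeffery(X.))

# Lazy: .*? grabs as little as it can, so the match ends at the first )
$lazy = [regex]::Match($s, '\(.*?\)').Value
# (O'Hicks, Jeffery(X.)
```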

The REGEX object gives you a bit more control. Pipe $rx to Get-Member and you should see something like this:

Using the REGEX object in Windows PowerShell. (Image Credit: Jeff Hicks)

If you merely want to test and see if the pattern matches in the string, then you can do this:

$rx.IsMatch($s)

You can also use the Match method:

$rx.Match($s)

Using the Match method in Windows PowerShell. (Image Credit: Jeff Hicks)

As I did with -match, I can get the value and parse it.
Grabbing the value with the Match method in Windows PowerShell. (Image Credit: Jeff Hicks)
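
Here is a sketch of what that parsing might look like, assuming the same $s and pattern as above. The variable names $m and $name are my own choices:

```powershell
$s = "Mailbox:9WJKDFH-FS349-1DSDS-OIFODJFDO-7F21-FC1BF02EFE26 (O'Hicks, Jeffery(X.))"
[regex]$rx = "\(.*\)"

# Match() returns a Match object describing the first match in the string
$m = $rx.Match($s)

if ($m.Success) {
    # Value is the matched text; Length is the number of characters matched.
    # Skip the opening ( and stop before the closing )
    $name = $m.Value.Substring(1, $m.Length - 2)
}

$name
# O'Hicks, Jeffery(X.)
```

Unlike the -match operator, the Match method always returns an object, so the Success property is the thing to check before using the value.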

As you can see it all comes down to the pattern.

Since I’ve already found a pattern that matches exactly what I want without any additional parsing, I might as well use that.

[regex]$rx="\b\S+,\s\S+\(\w{1}\.\)"
$rx.Match($s).value

Or if you want to show off your head-spinning PowerShell skills, try this one-liner:

([regex]"\b\S+,\s\S+\(\w{1}\.\)").Match($s).Value

The advantage of a regular expression is that you can find matching patterns within strings without having to worry about how long the string is or how it might be formatted. You still need to know your data, and it must be consistent. If some lines show the data I want as “O’Hicks, Jeffery(X.)” while others show “O’Hicks, Jeffery{X.}” or “O’Hicks, Jeffery X”, your regular expression pattern will need to be a bit more complicated.
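
To give a sense of what that extra complexity might look like, here is one possible pattern that accepts all three of those forms. Treat it as a sketch under the assumption that these are the only variations in your data. The $samples array and the $results variable are hypothetical:

```powershell
# Hypothetical sample lines covering the three formats described above
$samples = "O'Hicks, Jeffery(X.)", "O'Hicks, Jeffery{X.}", "O'Hicks, Jeffery X"

# (?: ... | ... ) groups two alternatives without capturing them.
# After the first name, accept either
#   ( or {, one word character, a period, then ) or }
# or
#   whitespace followed by a single word character
[regex]$rx = '\b\S+,\s\w+(?:[({]\w\.[)}]|\s\w\b)'

# Collect the matched text for each sample line
$results = foreach ($line in $samples) {
    $rx.Match($line).Value
}

$results
```

Each sample should come back unchanged because the pattern covers the whole name in every format. Note that this sketch would also accept a mismatched pair such as (X.}, which may or may not matter for your data.
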
I know many IT pros are new to regular expressions and find them difficult, but like anything, it simply takes practice. So the next time you need to parse a string for some nugget of information, see if regular expressions can make your life easier. I strongly believe that if you want to be taken seriously as a PowerShell professional, you need to develop at least some basic proficiency with regular expressions. We’ll wrap up this mini-series next time with another aspect of regular expressions: named captures.