Nuts and Bolts: Regex in PowerShell
Lead image: Photo by JJ Ying on Unsplash.com

Regular expressions and metacharacters in PowerShell

Patterns

Almost all programming and scripting languages allow the use of regular expressions, but many professionals still believe regex is a relic from ancient times. With specific examples, we show how useful and meaningful regex can be in PowerShell.

By Thomas Bär, Frank-Michael Schlede

When administrators worked exclusively at the command line, they could impress the ordinary user with the endless rows of cryptic letter and number combinations (e.g., (\d{1,4}\.){4}(\d{1,4})), which then changed entries in text files as if by magic.

Even though system administrators today do a large part of their work with graphical tools that provide a convenient interface, the use of regular expressions (regex) significantly facilitates the work. This is true, in particular, when you need to automate and simplify tasks with the use of PowerShell scripts.

PowerShell with Regex

If you have already developed or used some PowerShell scripts, you will typically have come into contact with regular expressions – even if you were perhaps not aware of it. The following example illustrates this very well:

$An_Array = @('somethingno1', 'somethingno2','morestuff')
$An_Array | Where-Object {$_ -match 'something'}

Here, you first create an array of strings and then launch a query that only displays the first two elements of the array, because the third element does not match the 'something' pattern. The -match operator can also be used without the Where-Object cmdlet. Thus, calling:

'somethingno1' -match 'something'

returns the value True because the search pattern was found in the string, whereas calling:

'somethingno1' -match 'nothing'

logically returns False. The -replace operator also works with regular expressions such as

'The book is good' -replace 'The book', 'The ITA book'

which then returns the string The ITA book is good. The -replace operator compares, finds the matching string The book, and replaces it with The ITA book before output. Thus, the purpose of regular expressions can be summarized as follows: They are mainly used for making comparisons or replacing values and characters. In addition to operators for direct comparison of values, such as -eq (equals) and -gt (greater than), the comparison operators category includes the similarity operators -like and -notlike, the replacement operator -replace, and the match operators -match and -notmatch. Of these, -replace, -match, and -notmatch can handle regex. The two following calls produce exactly the same output on screen:

> Get-Service | where {$_.status -like "running"}
> Get-Service | where {$_.status -match "running"}

Please note that this query is not case sensitive; it does not distinguish between upper- and lowercase. Both calls will find services whose status is displayed as running or Running. If you need a comparison that is explicitly case sensitive, use the -cmatch operator. To make it clear to anyone reading your script that you do not want to differentiate between upper- and lowercase, use the -imatch operator, which works in the same way as -match.

Both calls display all of the services that are active (running) on the system, which can initially confuse PowerShell beginners. However, the -like operator understands only simple wildcards, most notably the asterisk (*), which stands for any number of characters; it does not interpret regex metacharacters. Comparisons can therefore be made in a far more accurate and meaningful way by using -match with the help of metacharacters. Metacharacters are most responsible for the bad reputation of regular expressions, because they make your command line look like hieroglyphics.
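The difference between the two operators only becomes visible when the string contains more than the search term. The following lines are a minimal sketch; the three status strings are made up for illustration and do not come from Get-Service.

```powershell
# Hypothetical status strings (not real Get-Service output).
$states = 'Running', 'NotRunning', 'Stopped'

# -like without wildcards must cover the whole string (case insensitive).
$likeExact = $states | Where-Object { $_ -like 'running' }

# -like with the * wildcard also accepts leading characters.
$likeWild = $states | Where-Object { $_ -like '*running' }

# -match looks for the pattern anywhere in the string.
$matchAny = $states | Where-Object { $_ -match 'running' }

# -cmatch is case sensitive: no element contains a lowercase 'running'.
$matchCase = $states | Where-Object { $_ -cmatch 'running' }
```

In this sketch, -like 'running' returns only Running, whereas -match 'running' also returns NotRunning, and -cmatch 'running' returns nothing at all.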

Patterns and Metacharacters

Regular expressions are patterns (character strings) that describe data. Such an expression always represents a certain type of data in the search pattern and often includes metacharacters. Some of the most important of these characters used in PowerShell scripts are:

. ^ $ [ ] { } * ? + \

This list is not exhaustive and only reflects a small selection of the metacharacters available in PowerShell. In many cases, you want to determine whether a string that stands for a file name starts with a specific letter or has a particular extension. Three metacharacters known as quantifiers are used here: the asterisk *, plus +, and question mark ?. Each quantifier applies to the character directly in front of it. The asterisk stands for a character that occurs any number of times, or not at all, which means the expression will be true even if the character you are looking for is not in the string. In contrast, the plus sign stands for a character that occurs at least once. Finally, the question mark stands for a character that occurs once or not at all. Thus, the call

> 'something.txt' -match 'i*'

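returns the value True. However, this is not because the i was found: The i* pattern stands for no or any number of i characters, so it already succeeds with an empty match at the very start of the string. In this type of search pattern with an * metacharacter, it therefore does not matter where the i is found, or whether the string contains one at all. The following call also returns True: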

> 'Thatistheone.txt' -match 'i*'
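
Because a quantifier applies to the character directly in front of it, i* means "zero or more i characters" and therefore matches any string at all. The following sketch compares the three quantifiers; the file names other than something.txt are invented for the example.

```powershell
$name = 'something.txt'

# i* succeeds immediately with an empty match at position 0.
$starResult = $name -match 'i*'
$starMatch  = $matches[0]              # empty string

# A string without any i still satisfies i*.
$starNoI = 'abc.txt' -match 'i*'

# i+ requires at least one i.
$plusResult = $name -match 'i+'
$plusMatch  = $matches[0]              # the single i in 'something'
$plusNoI    = 'abc.txt' -match 'i+'

# u? makes the u optional: zero or one occurrence, but not two.
$optional = 'color.txt', 'colour.txt', 'colouur.txt' |
    Where-Object { $_ -match 'colou?r' }
```

If the goal is to test whether a character occurs at all, the plain pattern 'i' or the + quantifier is the safer choice.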

It would make more sense to determine whether the letter i, for which you are searching, occurs at the start of the character string. To do so, use the ^ metacharacter (circumflex accent or hat). After calling

> 'itssomething.txt' -match '^i'

the shell returns True, whereas the call

> 'Thatistheone.txt' -match '^i'

returns the value False. If you are looking for a character at the end of the string, you can use the dollar sign $, which must then be placed at the end of the pattern. Thus,

> 'something.txt' -match 't$'

returns the value True. You can also see what was matched by reading the $matches variable, which -match creates and fills automatically with a hash table of the results. Enter

> 'something.txt' -match 't$'
True

> $matches
Name                           Value
0                              t

to check the $matches variable.
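
The two anchors and the $matches variable can be combined in a script as follows. This is a minimal sketch; the file list, including image.png, is an assumption made for the example.

```powershell
# Hypothetical file names for the demonstration.
$files = 'itssomething.txt', 'Thatistheone.txt', 'image.png'

# ^ anchors the pattern at the start of the string.
$startsWithI = $files | Where-Object { $_ -match '^i' }

# $ anchors the pattern at the end of the string.
$endsWithT = $files | Where-Object { $_ -match 't$' }

# After a successful -match, $matches[0] holds the matched text.
$isMatch     = 'something.txt' -match 't$'
$matchedText = $matches[0]

# $matches is an ordinary hash table.
$isHashtable = $matches -is [hashtable]
```

Storing $matches[0] in a variable of your own right after the comparison is a cautious habit, because the next successful -match overwrites the hash table.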

Letters and Numbers

The regular expressions in PowerShell use character classes, such as those that are also available in Microsoft .NET Framework 3.5. If you want to use the -match operator to determine whether, for example, a string contains a word character (a letter, digit, or underscore), then use

> $Teststring='Programming'
> $Teststring -match "\w"

which returns True. In this case, it is important that you type a lowercase w after the escape character \, which is used here to keep the w from being interpreted as an ordinary literal character by the regex engine. In contrast to PowerShell's normal behavior, upper- and lowercase are distinguished here. If you use the call

> $Teststring -match "\W"

PowerShell checks for non-word characters, that is, anything other than letters, digits, and the underscore. The expression would be True if the shell were to come across a space or punctuation mark in the string, for example. However, because Programming consists entirely of letters, a value of False is returned. A failed comparison does not update $matches, so reading the variable in this case still shows the character P found by the previous \w call. PowerShell always scans the string from left to right and stops as soon as the pattern matches. This also works when looking for digits, which you can do with the following call (Figure 1):

> $Teststring='Programm456ing'
> $Teststring -match "\d"
True

> $matches
Name                           Value
0                              4

The shell stops the comparison after finding the first digit in the string. Because this is not practical in many cases, the behavior of the comparison can be changed by adding the + quantifier.

> $Teststring -match "\d+"
True

> $matches
Name                           Value
0                              456
Figure 1: When searching for digits, the shell stops at the first match. If + is added, the entire run of consecutive digits is found.
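
The character classes discussed in this section can be compared side by side in a short script. The following lines are a sketch under the assumption that they run in a fresh session; the string 'Hello again' is added only to show a non-word character.

```powershell
$letters = 'Programming'
$mixed   = 'Programm456ing'

# \w matches the first word character (letter, digit, or underscore).
$wordHit  = $letters -match '\w'
$wordChar = $matches[0]

# \W finds nothing in a string made up of letters only ...
$nonWordHit = $letters -match '\W'
# ... and a failed match leaves the previous content of $matches in place.
$staleMatch = $matches[0]

# A space is a non-word character.
$spaceHit  = 'Hello again' -match '\W'
$spaceChar = $matches[0]

# \d stops at the first digit; \d+ takes the whole run of digits.
$null = $mixed -match '\d'
$firstDigit = $matches[0]
$null = $mixed -match '\d+'
$digitRun = $matches[0]
```

Because $matches keeps its old content after a failed comparison, it is advisable to evaluate the Boolean result of -match first and read $matches only if that result is True.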

Curly brackets help improve precision by letting you specify how many times a character should occur in the string. The basic syntax of such a call is {min. no. of characters,max. no. of characters}. If you just have one number between the curly brackets, PowerShell looks for exactly this number of consecutive characters. Because the pattern is not anchored, a longer run still yields True, but the match contains only the first two digits,

> $Teststring -match "\d{2}"
True

> $matches
Name                           Value
0                              45

whereas calling

> $Teststring -match "\d{2,3}"

returns True if the string contains a run of at least two digits; the match then comprises at most three of them (here, 456).
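
How the counts in curly brackets interact with the rest of the string is easiest to see in a direct comparison. The following sketch reuses the test string from the article; the string '12345' is an additional assumption for the anchored case.

```powershell
$Teststring = 'Programm456ing'

# {2} consumes exactly two digits, even though three are available.
$null = $Teststring -match '\d{2}'
$exactlyTwo = $matches[0]

# {2,3} consumes at least two and at most three digits.
$null = $Teststring -match '\d{2,3}'
$twoToThree = $matches[0]

# {4} fails because the string has no run of four digits.
$fourDigits = $Teststring -match '\d{4}'

# Without anchors, a longer run of digits still satisfies {2,3} ...
$unanchored = '12345' -match '\d{2,3}'
# ... whereas ^ and $ make the upper limit binding for the whole string.
$anchored = '12345' -match '^\d{2,3}$'
```

The upper limit therefore only restricts how much the match consumes; to reject longer input altogether, the pattern needs the ^ and $ anchors.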

Checking Input

Along with the metacharacters that anchor a comparison at the beginning (^) and at the end ($) of a string, these quantifiers allow a slightly more practical example in the form of checking an input string:

> $UserInput = Read-Host "Please enter three digits followed by at least 4 and at most 8 letters!"

Once the user has entered a value, you can check the input for correctness with the following regular expression (Figure 2). Note that \w also accepts digits and the underscore, so the check is slightly more lenient than the prompt suggests:

> $UserInput -match '^\d{3}\w{4,8}$'
Figure 2: Although checking input is certainly not rocket science, this example demonstrates the principle.

If you still think this is very cryptic, simply try it out. This type of query is certainly not suitable for checking a password; however, it can be used to catch incorrect input in shell scripts.
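
In a script, such a check is easier to reuse if it is wrapped in a small function. The following is a minimal sketch; the function name Test-InputFormat and the sample values are hypothetical and only serve to exercise the pattern from the example above.

```powershell
# Hypothetical helper: returns True if the text consists of three digits
# followed by four to eight word characters and nothing else.
function Test-InputFormat {
    param([string]$Text)
    return $Text -match '^\d{3}\w{4,8}$'
}

$valid        = Test-InputFormat '123abcd'        # three digits, four letters
$tooFewDigits = Test-InputFormat '12abcd'         # only two leading digits
$tooShort     = Test-InputFormat '123abc'         # only three letters
$tooLong      = Test-InputFormat '123abcdefghi'   # nine letters
$leadingText  = Test-InputFormat 'x123abcd'       # does not start with a digit

# \w also accepts digits, so seven digits pass the check as well.
$digitsOnly   = Test-InputFormat '1234567'
```

The single-quoted pattern keeps PowerShell from trying to expand the $ sign as a variable. If only letters should be allowed after the digits, \w would have to be replaced by a character range in square brackets.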

This example shows that the curly brackets can be used to check for a specific number of matches. In contrast, with square brackets [], you can specify a set or range of characters. If you specify several characters within the brackets, the condition is true if at least one of those characters is found in the object examined:

> $Teststring ='What is it To you, dude?'
> $Teststring -match "[wy]"
True

As already mentioned, the shell cancels the comparison as soon as the first match is found. It is also possible to specify a span of the alphabet within which (in this case) the sought-after letter should fall (Figure 3):

> $Teststring -match "[p-t]"
True

> $matches
Name                           Value
0                              t
Figure 3: By default, PowerShell does not make case-sensitive comparisons, even in the current 5.0 version. If you need case sensitivity, you have to use the -cmatch operator.

This command finds the first character between p and t inclusive. Here, the t in What is the first match. In other scripting languages, you can use regular expressions of this type to run case-sensitive comparisons (e.g., by specifying the range as [P-T]). Unfortunately, in many discussions on the web, you will find false statements telling you that this approach will also work with -match in PowerShell.

At the time of writing this article in February 2017 under version 5, as found in Windows 10 and Windows Server 2016, the -match operator could not make case-sensitive comparisons. Additionally, a brief test with the current alpha version 6.0.0.16 showed no change in this behavior. If you want to restrict your search to uppercase, you not only have to specify the range as [P-T], but you also have to use the previously mentioned -cmatch operator.
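
The effect of the operator on character ranges can be checked with the test string used above. The following lines are a short sketch; the variable names are chosen for illustration.

```powershell
$Teststring = 'What is it To you, dude?'

# -match ignores case: [P-T] already hits the lowercase t in 'What'.
$null = $Teststring -match '[P-T]'
$insensitive = $matches[0]

# -cmatch respects case: the first uppercase letter from P to T is the T in 'To'.
$null = $Teststring -cmatch '[P-T]'
$sensitive = $matches[0]

# With -cmatch, [wy] skips the uppercase W and finds the y in 'you'.
$null = $Teststring -cmatch '[wy]'
$lowerOnly = $matches[0]
```

Writing the range in uppercase thus has no effect on its own; the choice between -match and -cmatch decides whether case matters.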

Finally, I'll look at a special feature of metacharacters when used in PowerShell that has certainly contributed to the bad reputation of regular expressions. Some of these metacharacters have different meanings and thus different effects, depending on where they are used. One of these characters is the aforementioned circumflex accent ^, which allows you to find matching patterns at the start of a string. However, if you use this character within square brackets, it negates the formulated search pattern. The call

> $Teststring='Hello678 again'
> $Teststring -match "[^a-z]"
True

> $matches
Name                           Value
0                              6

discovers that the first character in the string that is not a letter is the number 6. To find the whole run of non-letters in the string, use the

> $Teststring -match "[^a-z]+"

variant; the output in the $matches variable is then 678, followed by the space after it, because a space is not a letter either.
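
The two meanings of the circumflex can be tried out with the same test string. The following sketch also shows two possible ways, chosen here as assumptions, to keep the trailing space out of the match.

```powershell
$Teststring = 'Hello678 again'

# Inside square brackets, ^ negates the class: first character that is not a letter.
$null = $Teststring -match '[^a-z]'
$firstNonLetter = $matches[0]

# With +, the match covers the whole run of non-letters, including the space.
$null = $Teststring -match '[^a-z]+'
$nonLetterRun = $matches[0]

# Adding the space to the negated class excludes it from the match.
$null = $Teststring -match '[^a-z ]+'
$withoutSpace = $matches[0]

# If only digits are wanted, \d+ states that directly.
$null = $Teststring -match '\d+'
$digitsOnly = $matches[0]

# Outside the brackets, ^ is still the start anchor: the string begins with a letter.
$startsWithNonLetter = $Teststring -match '^[^a-z]'
```

On screen, the trailing space in the [^a-z]+ result is easy to overlook, which is why a comparison of the string lengths can help when testing such patterns.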

Conclusions

If you take a close look at the use of regular expressions in the context of metacharacters in PowerShell scripts, you will quickly discover the enormous opportunities hiding there, despite some pitfalls. Moreover, the world of regex holds significantly more special characters and options than I could possibly introduce in a single article. The old programming adage also applies: The quickest way to learn is by trial and error.