Wednesday, January 25, 2012

Regular Expressions: Using Lookaheads to Group and Grab Exactly the Text You Want

My previous post used zero-width lookahead and lookbehind assertions to grab some text from a gnarly-looking string, so I thought I'd follow up with a quick post on how that works.  It's not as complicated as the name sounds.

I had this string, from which I wanted to extract the domain and username:


I know that I want the text between the double-quotes immediately following the words "Domain" and "Name".  I decided on this approach:

$string -match '(?<=Domain\=")(?<domain>[^"]+).*(?<=Name\=")(?<name>[^"]+)'

The (?<domain>...) and (?<name>...) parts are, as described in the previous post, named groups, which will be captured and assigned in the automatic variable $matches with those names (e.g. $matches.domain).  The (?<=Domain\=") and (?<=Name\=") parts are the zero-width lookbehind assertions.

So what are they good for?  You can use lookaheads and lookbehinds if you want to make sure that a specific pattern comes before or after the pattern you want to capture, but don't actually want that pattern to be captured.  They look like groups, but will not be added to $matches.
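
To make that difference concrete, here is a small sketch comparing an ordinary group with the lookaround versions.  The $sample string is a made-up stand-in (the real string from my environment is not reproduced here), and the variable names are just for illustration.

```powershell
# Hypothetical sample string, used only for this illustration.
$sample = 'Domain="MYDOMAIN",Name="adminuser"'

# Ordinary group: the prefix is consumed, so it becomes part of the overall
# match and is also stored as numbered group 1.
$null = $sample -match '(Domain=")([^"]+)'
$withGroup   = $matches[0]     # Domain="MYDOMAIN
$prefixGroup = $matches[1]     # Domain="

# Lookbehind: the prefix must be there, but it is not part of the match
# and nothing extra is added to $matches.
$null = $sample -match '(?<=Domain=")[^"]+'
$withLookbehind  = $matches[0]      # MYDOMAIN
$lookbehindCount = $matches.Count   # 1 (only entry 0)

# Lookbehind plus lookahead: the text must also be followed by '",Name=',
# which is likewise left out of the match.
$null = $sample -match '(?<=Domain=")[^"]+(?=",Name=)'
$withBoth = $matches[0]             # MYDOMAIN
```

Note that $matches is overwritten by each successful -match, which is why the values are copied into their own variables right away.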

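A lookbehind assertion looks like this:

(?<=pattern)

A lookahead assertion looks like this:

(?=pattern)

Ah, but what if I want to make sure that a certain pattern does not follow my group?  Just replace the equals sign with an exclamation point, like so:

(?!pattern)

The same swap works for a lookbehind, which makes sure that a certain pattern does not come before my group:

(?<!pattern)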

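Here is a quick sketch of the negative forms in action.  The account names are invented for the example, and it relies on the fact that -match, when given an array on the left, returns the elements that match.

```powershell
# Hypothetical account names, made up for this example.
$accounts = @('adminuser', 'svc_backup', 'jsmith', 'svc_sql')

# Negative lookahead: at the start of the string, the next characters
# must not be 'svc_'.  Keeps adminuser and jsmith.
$notService = $accounts -match '^(?!svc_)'

# Negative lookbehind: 'user' must not be immediately preceded by 'admin'.
# Keeps poweruser only.
$plainUser = @('adminuser', 'poweruser') -match '(?<!admin)user'
```

Neither assertion consumes any text, so they can be combined with whatever pattern actually needs to be captured.
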
So let's break down what my regex does:

# Assert that the pattern 'Domain\="' comes immediately before this position,
# but do not capture it or include it in the match.
(?<=Domain\=")

# Immediately following it, capture one or more characters that are not the 
# double-quote character and name this group "domain"
(?<domain>[^"]+)

# Match zero or more of any character (except a newline).
.*

# Assert that the pattern 'Name\="' comes immediately before this position,
# but do not capture it or include it in the match.
(?<=Name\=")

# Immediately following it, capture one or more characters that are not the 
# double-quote character and name this group "name"
(?<name>[^"]+)

Tuesday, January 24, 2012

Named Groups in Regular Expressions

I don't know how I went this long without discovering named groups in regular expressions, but I'm genuinely excited about them (yes, I'm a nerd).

First, a quick recap of the most common way to use regular expressions in PowerShell.  Let's say I have a string like the one below (sorry it isn't a simpler example, but this is literally something I ran into today).  I got it by querying the local administrators of a system using SCCM.  The problem is, I want it in domain\user format.


My first thought was to do something like this:

$string -match '(?<=domain\=")([^"]+).*(?<=name\=")([^"]+)'

It evaluates to True on my test string, so I go look at $matches:


Name                           Value
----                           -----
2                              adminuser
1                              MYDOMAIN
0                              MYDOMAIN",Name="adminuser

Okay, I've captured my groups, but I notice something strange.  Why is $matches a hashtable instead of an array?  Because of named groups, that's why.
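
A quick way to confirm this is to inspect $matches directly.  This is only a sketch: $sample is a hypothetical stand-in for my real string, and the unnamed groups are the same ones used above.

```powershell
# Hypothetical stand-in for the real string.
$sample = 'Domain="MYDOMAIN",Name="adminuser"'
$null = $sample -match '(?<=domain=")([^"]+).*(?<=name=")([^"]+)'

$typeName = $matches.GetType().Name    # Hashtable
$first    = $matches[1]                # MYDOMAIN
$second   = $matches[2]                # adminuser
$hasKey2  = $matches.ContainsKey(2)    # True: numbered groups are integer keys
```

Because -match is case-insensitive by default, the lowercase 'domain' and 'name' in the pattern still match 'Domain' and 'Name' in the string.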

To create a named group, you put the parentheses around it just like normal, but you add '?<groupname>' immediately after the opening parenthesis.  This stores the capture under the name 'groupname'.  Let's try that with the above example:

$string -match '(?<=domain\=")(?<domain>[^"]+).*(?<=name\=")(?<name>[^"]+)'


Name                           Value
----                           -----
name                           adminuser
domain                         MYDOMAIN
0                              MYDOMAIN",Name="adminuser

It makes my regex a little longer, but it is so much easier, when I go to use the values I've collected, to remember $matches.domain and $matches.name instead of $matches[1] and $matches[2].
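
From there, getting to the domain\user format I was after is one more line.  A minimal sketch, again with a hypothetical $sample string in place of the real one, and $account as a name I picked for the result:

```powershell
# Hypothetical stand-in for the real string.
$sample = 'Domain="MYDOMAIN",Name="adminuser"'

if ($sample -match '(?<=domain=")(?<domain>[^"]+).*(?<=name=")(?<name>[^"]+)') {
    # Build domain\user from the two named captures.
    $account = '{0}\{1}' -f $matches.domain, $matches.name   # MYDOMAIN\adminuser
}
```

Wrapping the -match in an if means $matches is only read after a successful match, so a stale result from an earlier match can't sneak in.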

Tuesday, January 10, 2012

Harnessing the Power of PowerShell to Load-balance Sophos Servers

At work we have a decent-sized Sophos installation.  This means that we have to use message relays to manage the status traffic back and forth between the Enterprise Console and the clients.  I recently discovered that although I could use groups to point client updates to their local server for updating, the message routers weren't affected.  As a result almost all clients ended up using the same server as a message relay.  I confirmed with my TAM that this feature is by design, so I set out to fix it with a script.  What I ended up with is basically what you see below.

A few things worthy of note:

  • I've pretty much standardized on using that logging boilerplate for most of my scripts.  It makes it easy to log errors and insert debug statements into the code as I'm writing, so that I can always set -loglevel to 'debug' later when troubleshooting.
  • I made the caller pass the name of the mrinit.conf file so that I could create one small SCCM package for the script with all five different mrinit.conf files.
  • If you decide to do this, don't use the mrinit.conf file from the root of the package directory on the Update server.  There should be an mrinit.conf file in the rms subfolder.  Use that one.  If it isn't there, then you might not be configured to use a message relay, and this script won't help you until you are.

I do the QA and testing for my organization, and I make no guarantees that this script will work for yours.  Sophos is a temperamental beast, so do your own due diligence: test it, QA it, and make whatever modifications it takes to work in your environment.  You may also wish to consult with your Sophos TAM before undertaking a project like this.