Thursday, September 25, 2008

Pipes, Loops, and Exploding Memory Usage

Background

Recently I wrote a script to automate some reports that are a huge pain.  I was pretty pleased with myself when I finished, but when I ran it, it kept going...and going...  It was taking a really long time, which might not have been strange because there was a lot of data, but then I popped up Task Manager and noticed that powershell.exe was using 1GB of RAM and climbing.  Clearly I had a problem with the design of the script.  What shocked me was that I was able to fix it by replacing a foreach loop with a pipe to ForEach-Object, and now my powershell.exe process never uses more than 55MB of RAM.


Passing Objects Down the Pipe

One of the cool things about pipes is that as data is generated by a cmdlet or function, it is passed down the pipe to the next one without having to wait for all of the data to finish being generated.

Consider the following:

C:\PowerShell> dir c:\ -include *.log -recurse | % {$_.FullName}

As each file matching the pattern is found, its full name is written immediately.  Now let's try it with a foreach loop:

foreach ($file in (dir c:\ -include *.log -recurse)) {
  $file.FullName
}

This time we have to wait for the entire hard drive to be scanned before any output appears, and we'll use a lot more memory.  Why?  Because when you use parentheses, the expression between them is evaluated BEFORE the loop is processed.  This is essentially the same as the following:

$files = dir c:\ -include *.log -recurse

foreach ($file in $files) {
  $file.FullName
}

Most things you use PowerShell for probably won't involve enough data for this to become a huge issue.  In my case I was querying Systems Management Server for inventory information on tens of thousands of computers, so it really started to impact the other programs I was running.
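If you want to watch this happen on your own machine, here's a minimal sketch (using the automatic $PID variable, which holds the current shell's process ID) for checking how much memory powershell.exe is using before and after each version:

# Report the current PowerShell process's working set, in MB.
# $PID is the automatic variable containing this shell's process ID.
(Get-Process -Id $PID).WorkingSet64 / 1MB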


Planning Ahead

As you're creating your scripts, try to be conscious of where you're using piped commands vs. loops, and consider how your script would change if you refactored the code to do it the other way.  I tend to use loops more when I'm writing scripts because they are generally more readable and easier to update for the next poor sap who has to edit my code, but whichever way you choose to get the job done, it's important to understand your script's flow of execution.

Some questions I try to ask myself when I think I'm done with my scripts:
  • Where am I causing my script to stop and collect the entire input pipeline before continuing?  (Sorting in the middle of the pipeline is the classic example of this; see the sketch after this list.)  Does it matter?
  • What variables am I declaring at the top level that could be moved into a smaller scope, so they are cleaned up automatically when that scope exits?
  • What is the impact on readability?
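
Here's the sorting case as a minimal sketch, built on the earlier dir example.  The first pipeline streams its results; the second can't, because Sort-Object has to collect every object before it can emit the first one:

# Streams: each matching file name prints as soon as it is found.
dir c:\ -include *.log -recurse | % {$_.FullName}

# Blocks: Sort-Object gathers the entire input first, so nothing
# prints until the whole drive has been scanned.
dir c:\ -include *.log -recurse | Sort-Object Length | % {$_.FullName}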

Sunday, September 21, 2008

I'm Not Dead

I've been really busy at work, but keep an eye out for a real post very soon. If you want to check out some PowerShell scripts, head on over to http://poshcode.org, where you can find PowerShell scripts submitted by users.

There has been talk in the community about trying to come up with a CPAN for PowerShell.  This is not it, but it's a start.  For one thing, CPAN grew enough that they could enforce readability and technique requirements by having someone actually review each module.  PowerShell is still too young for that.

I decided to post my SMS.psm1 module for making command-line management of SMS easier, so here goes nothing.  I'll do a post on how to use it later.

Note: You need PowerShell v2 CTP2 in order to use this.  Copy it into %userprofile%\Documents\WindowsPowerShell\Packages\SMS\.

Thursday, September 4, 2008

PowerShell Team Blog: Text Output Is Not a Contract


I just wanted to bring people's attention to this post from Jeffrey Snover, whose feet I am not worthy to wash with my perfume-soaked hair**.

It just reiterates the mindset shift that Perlers and Pythonistas have to keep in mind when transitioning to or integrating with PowerShell: cmdlets pass objects down the pipeline, and the text you see on screen is just formatting applied at the end, not something to parse.
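
As a quick illustration of the idea (my sketch, not from the post): instead of parsing columns out of formatted text the way you would with ps and grep, you filter and project the objects' properties directly:

# No text parsing: filter processes on the WorkingSet64 property itself,
# then select just the properties you care about.
Get-Process | Where-Object { $_.WorkingSet64 -gt 100MB } | Select-Object Name, Id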



** I know, too much information.

Monday, September 1, 2008

Putting the Fun in Functions

I just wanted to make a quick post to point out two neat features of functions that I left out of the last post since it was getting a little long: piping to functions, and autocomplete for functions.


AutoComplete for Cmdlets and Functions

You're probably used to using the TAB key to autocomplete file names, but have you noticed that you can autocomplete cmdlet and function names, too?  This comes in handy a lot, since I don't have all of the standard cmdlets' names memorized yet.  Just start typing the name of a function or cmdlet and hit TAB.  If the name that comes up isn't what you're looking for, just keep hitting TAB and you'll cycle through the available options.

For example:

PS C:\>  out-

Will give you, if you keep tabbing:
  • Out-Clipboard
  • Out-Default
  • Out-File
  • Out-GridView (super cool, I didn't know about this one)
  • Out-Host
  • Out-Null
  • Out-Printer
  • Out-String


Piping to Functions

Piping to functions is really easy.  Anything piped to a function shows up inside it in the automatic $input enumerator.  You can just add a loop in your function to cycle through the values in $input and voila!

Take the following example**:

# Get-Count
# Gets the number of objects in the input pipeline.
#
# Returns:
#   An int with the count
#

function Get-Count {
  $i = 0

  foreach ($obj in $input) {
    $i++
  }

  Write-Output $i
}
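
Usage works like any cmdlet.  The 1..10 range makes an easy sanity check, and the log file path below is just an example:

PS C:\> 1..10 | Get-Count
10

PS C:\> Get-Content C:\Windows\WindowsUpdate.log | Get-Count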


** "But Tojo," you're thinking, "Doesn't Measure-Object do the same thing?"  Indeed it does, but it's much slower in my experience because it also has a lot of extra bells and whistles that I don't need if I just want to see how many lines are in a file, etc.