Another question I heard at Microsoft Ignite 2015, and one that I come across frequently, is about the best way to handle large-scale tasks. In other words, what is the most efficient way to handle a large task that might take a very long time to complete? My answer will sound like your Mom: “It depends.” Almost always the question is about how to create a multi-threaded solution to a PowerShell problem. There’s no single right way to accomplish this goal. Although I am going to demonstrate a variety of techniques, much will depend on your task, how you phrase it, and your PowerShell skill level.
To demonstrate, I have a typical IT pro task. Let’s grab all of the errors from the system event log from a number of computers in the domain. The first inclination is to simply use Get-Eventlog.
$computers = "chi-dc01","chi-dc02","chi-dc04","chi-hvr2","chi-core01","chi-web02"
$data = Get-Eventlog -LogName System -EntryType Error -ComputerName $computers
Assuming all of the computers are online, this will work. In my domain, $data contains over 12,000 entries and took about eight minutes to complete. If I have other work to do, then perhaps eight minutes isn’t that big a deal. Don’t forget you could also use Start-Job to kick this off in the background and collect the results later. But let’s say you want the data now. What are your options?
The first step is to identify the potential bottleneck. What part of the PowerShell expression is most likely responsible for determining how long the command will run? Be aware that you might have more than one possibility. In this example, I would say the ComputerName parameter is the bottleneck. As written, Get-Eventlog has to process each computer sequentially. To make this more efficient, I need to find a way to query each computer in parallel.
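If you want numbers rather than a hunch, Measure-Command will time an expression for you. Here is a minimal sketch of the idea. The server names are hypothetical, and Start-Sleep is only a stand-in for a slow per-computer query, so the figures it produces say nothing about real event log performance.

```powershell
# Hypothetical names; replace with your own computers
$names = "server1","server2","server3"

# Time a sequential loop. Start-Sleep stands in for a slow remote query.
$elapsed = Measure-Command {
    foreach ($name in $names) {
        Start-Sleep -Milliseconds 200
    }
}

"Sequential run took {0:N1} seconds" -f $elapsed.TotalSeconds
```

Because each pass waits for the previous one to finish, the total grows with the number of names, which is the pattern to look for when hunting a bottleneck.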
One option is to spin off each computer into a separate background job.
$jobs = @()
foreach ($computername in $computers) {
$jobs += Start-Job -ScriptBlock { Param($computername) Get-Eventlog -LogName System -EntryType Error -ComputerName $computername} -ArgumentList $computername
}
Each job’s scriptblock runs the same Get-Eventlog command but against a different computer. The scriptblock has a separate scope, so I pass the computername as a parameter. The end result is a background job for each computer. Now I can wait for all the jobs to complete and collect the results.
$data = $jobs | Wait-Job | Receive-Job -Keep
This is a little better. This took about six and a half minutes in my tests.
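Jobs stay in your session until you remove them, and a failed job will not stop the others, so it is worth checking job state and cleaning up afterwards. The following is only a sketch: the numbers 1 through 3 and the Start-Sleep call are stand-ins for computer names and a real query, and it captures the output of foreach directly instead of appending to an array.

```powershell
# Start one job per item. The scriptblock body is a stand-in for a real query.
$jobs = foreach ($n in 1..3) {
    Start-Job -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds 200
        "result $n"
    } -ArgumentList $n
}

# Wait for everything to finish and collect the output
$results = $jobs | Wait-Job | Receive-Job

# Note any jobs that ended in the Failed state before removing them
$failed = $jobs | Where-Object { $_.State -eq 'Failed' }

# Remove the finished jobs from the session
$jobs | Remove-Job
```

If $failed is not empty, those are the jobs to investigate before you trust the collected data.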
Another approach that you will find if you search for PowerShell performance is the use of runspaces. Creating a background job is one way of using runspaces and perhaps the easiest because it shields you from all the .NET details. However, you can dive in if you are so inclined. Be aware that going this route is definitely advanced, and adding features like error handling is a bit trickier.
I’ll start by initializing an empty array that will eventually hold all of my runspaces. I’m going to create one for each computer.
$pool=@()
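For each computer name, I’m going to:
1. Create and open a new runspace.
2. Create a PowerShell pipeline and assign the runspace to it.
3. Add the Get-Eventlog command and its parameters to the pipeline.
4. Invoke the pipeline asynchronously and add the result to the pool.
Here’s the code: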
foreach ($computername in $computers) {
    #much harder to incorporate error handling
    $run = [runspacefactory]::CreateRunspace()
    $run.Open()
    $pipe = [powershell]::Create()
    #Add the runspace to the PowerShell pipeline
    $pipe.Runspace = $run
    #add the command to the pipeline
    $pipe.Commands.AddCommand("Get-Eventlog")
    #add the parameters
    $pipe.Commands.AddParameter("Logname","System")
    $pipe.Commands.AddParameter("EntryType","Error")
    $pipe.Commands.AddParameter("Computername",$Computername)
    $pool += $pipe.BeginInvoke() | Add-Member -Membertype Noteproperty -name Powershell -value $pipe -PassThru
} #foreach
The BeginInvoke() method starts the pipeline asynchronously and writes an async result handle (an IAsyncResult object) to the pipeline. I also add a property to that handle that holds the corresponding PowerShell pipeline. You’ll see why in a moment.
I loop, waiting for all of the runspaces to complete.
#keep looping while any handle reports that it is not yet complete
While ($pool.IsCompleted -contains $false) {
Start-Sleep -Milliseconds 200
}
Once finished, I can enumerate the pool and get the results using the EndInvoke() method.
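#start with an empty array so earlier results are not mixed in
$data = @()
$pool | foreach {
$data += $_.PowerShell.EndInvoke($_)
}
I have to pass the async result handle returned by BeginInvoke() as a parameter to EndInvoke() on the PowerShell pipeline object. The last part is the cleanup you should do. Because $run is reassigned on every pass through the loop, it only refers to the last runspace created, so I close each runspace through its pipeline object.
$pool | foreach {
$_.PowerShell.Runspace.Close()
$_.PowerShell.Runspace.Dispose()
$_.PowerShell.Dispose()
}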
This worked much better and completed in two and a half minutes. It took me quite a bit of time to work out the details since we are on the edge of PowerShell scripting and veering into .NET programming. There are probably a few things that can be done to improve this approach, but I’d rather use traditional PowerShell solutions.
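One possible refinement, which I did not include in the timings above, is a runspace pool. A pool caps how many runspaces run at once, which matters when the list of computers is long. Treat this as a sketch under assumptions: the items 1 through 6, the limit of 3, and the doubling scriptblock are all hypothetical stand-ins for computer names and a real query.

```powershell
# Hypothetical work items; in practice these would be computer names
$items = 1..6

# A pool that runs at most 3 runspaces at a time (an arbitrary limit)
$rsPool = [runspacefactory]::CreateRunspacePool(1, 3)
$rsPool.Open()

# Queue one pipeline per item and keep each pipeline with its async handle
$handles = foreach ($item in $items) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $rsPool
    # The scriptblock is a stand-in for the real work
    [void]$ps.AddScript({ param($n) Start-Sleep -Milliseconds 200; $n * 2 }).AddArgument($item)
    [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

# EndInvoke() blocks until that pipeline finishes, then returns its output
$results = foreach ($h in $handles) {
    $h.PowerShell.EndInvoke($h.Handle)
    $h.PowerShell.Dispose()
}

# Closing the pool releases every runspace it created
$rsPool.Close()
$rsPool.Dispose()
```

With a pool there is a single object to close at the end instead of one runspace per computer, and the upper limit keeps a long computer list from starting everything at once.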
One such solution is to leverage a PowerShell workflow. There is a lot of associated overhead, but workflows support a -Parallel parameter with Foreach. Someday I’m hoping this feature makes its way into the language but for now you need to use a workflow like this:
Workflow TestMe {
    Param([string[]]$Computers)
    foreach -parallel ($computername in $computers) {
        Get-Eventlog -LogName System -EntryType Error -PSComputerName $computername
    }
} #end workflow
Once defined, I can call it.
$data = TestMe $computers
The workflow is running locally and is essentially spinning up new runspaces for each computername. This is very speedy and finished my test in 24 seconds. Nice, but it takes some time to set up.
I think the best approach, at least for this situation, is to take advantage of PowerShell remoting. Why not run the Get-Eventlog command simultaneously on all of the remote computers? This is still a one-line command:
$data = Invoke-Command {Get-Eventlog -LogName System -EntryType Error} -ComputerName $computers
If you wanted, you could even use the -AsJob parameter. This approach kicks off multiple sessions and directs the results to my computer. This took about 24 seconds to retrieve 12,000+ event log entries! I don’t think enough IT pros think about leveraging remoting for performance purposes. I’m trying to get people to stop thinking about managing one thing at a time and start thinking about managing a lot of things at one time. I think this is a great example.
Now, after all of my demonstrations, let me also say that sometimes PowerShell is not the right tool for the job. If your task is that complicated or performance-sensitive, you might be better off looking for another solution. If you have to write a PowerShell script that is full of .NET code, you might as well break out Visual Studio and build a .NET tool or look for third-party software solutions. PowerShell will never perform as well as a compiled application, but that’s OK. For the majority of IT pros, PowerShell is an easy-to-use management tool that scales very well. Sometimes you have to be a little creative, as I’ve shown here.