Saturday, August 8, 2009

String Manipulation: Splitting and Joining Strings

I think it's time to get back to some more basic things, so let's get to it.

The split()method of a string object lets you define a delimiter and "split" the string up into multiple pieces, which are returned as an array.

Let's say I do this:

PS C:\Users\tojo2000> $string = "one two three four"
PS C:\Users\tojo2000> $string.split(" ")

As it turns out, though, split() splits on a single whitespace character by default.


PS C:\Users\tojo2000> $string = "one two three four six"
PS C:\Users\tojo2000> $string.split()


What's the practical use for this? Let's say you have a tab-separated values file called people.txt that looks like this:

Doe John California
Doe Jane Texas
Neuman Alfred Nebraska

And you want the output to be First Last, Location. You could do it like so:

PS C:\Users\tojo2000> Get-Content people.txt |
>> %{$data = $_.split("`t"); Write-Output "$($data[1]) $($data[0]), $($data[2])"}
John Doe, California
Jane Doe, Texas
Alfred Neuman, Nebraska

But what if you have a file like this called people2.txt (let's say the first part is some kind of ID)?

2323:Doe John California
827:Doe Jane Texas
982982:Neuman Alfred Nebraska

Now you have two delimiters, but the split() method of a string object takes a string as an argument, so you would have to split twice in order to get each part of the string split into an array. If only there was some way to split on a regular expression... Oh, wait, there is.

PS C:\Users\tojo2000> Get-Content people2.txt |
>> %{$data = [regex]::split($_, '\t|:'); Write-Output "$($data[2]) $($data[1]), $($data[3])"}
John Doe, California
Jane Doe, Texas
Alfred Neuman, Nebraska

('\t|:' is a regular expression for a tab character or a colon)

What if you have an array, and you want to make a string out of it? Then you can join the strings together using join(). Now there are two ways to join a list of strings, depending on whether you're using PowerShell v1 or v2.

The first way works in both v1 and v2:

PS C:\Users\tojo2000> $buncha_strings = 'one', 'two', 'three', 'four', 'five', 'six'
PS C:\Users\tojo2000> [string]::join(' and a ', $buncha_strings)
one and a two and a three and a four and a five and a six

So what practical use would there be? For a completely contrived example, maybe you want to create a new version of people2.txt called people3.txt that uses pipe characters to separate the fields. We'll split the strings up into an array just like before, but then we'll join them back together the way we want them.

PS C:\Users\tojo2000> Get-Content people2.txt |
>> %{$data = [regex]::split($_, '\t|:'); [string]::join('|', $data)} > people3.txt
PS C:\Users\tjo2000> Get-Content .\people3.txt

There is one more way that you can join a list of strings, but this method only works in v2: the -join operator. It works the same way as the other join(), but the syntax is slightly different.

For example, here is how you would join a list of strings by tab characters:

PS C:\Users\tojo2000> $list_of_strings -join "`t"

So to wrap this all up, let me give a real-world example. I had a SQL query that I wanted to run that would retrieve the SCCM information for a specific list of computers. I needed to update the list, and I had the computers in a file called machines.txt, with one name per line.

Here's how I slurped up the file to create the new query string:

$query = @"
SELECT * FROM v_R_System
WHERE Netbios_Name0
IN ('{0}')
"@ -f ((Get-Content machines.txt) -join "', '")


Anonymous said...

Even simple article can be informative. I like your last example - will modify my current script :)

mikecrowley said...

Nice post, this helped me tonight. I did find however that
"string" -split '[AB]'
allows you to use 2 delimiters w/o regex.

Anshul said...

My output is different when using the second example.
I am using v3.0
Output am getting is when i use write-host and not write-output

Please explain the flow

# ppl.txt is

2323:Doe John California
827:Doe Jane Texas
982982:Neuman Alfred Nebraska

#command executed is

$data = gc C:\appl\ppl.txt | %{$data = [regex]::split($_, '\t|:'); Write-host "$($data[2]) $($data[1]), $($data[3])"}


Doe John California,
Doe Jane Texas,
Neuman Alfred Nebraska,

McPeppr said...

Thanks, that helped. I'd like to use the BootUpTime as a Timestamp for a session based logfile:

$LBUT = Get-WmiObject -Class Win32_OperatingSystem –ComputerName localhost |
Select-Object -Property LastBootUpTime |
Format-Table -HideTableHeaders |
$LBUT = $LBUT.split(".")[0]
write $LBUT

tabing said...

Location :
Name : slave-pos(redirected 2)
PrinterState : 0
PrinterStatus : 3
ShareName :
SystemName : NWTL-XCHANGE1

need help. How can I get the string "slave-pos (redirected 2)" ?

Rauno M├Ągi said...

I converted MAC address to OID compatible dotted decimal this way. It can be used with : or - as a separator.

[string]::join('.', ("00:15-5d-55-55-55".Split("-").Split(":") | foreach {[convert]::toint16($_,16)}) )

Anonymous said...

My source text file looks like this:-

2015-09-19 05:15:12,050 - DEBUG [ImportThread] - Added file watcher for \\05rnb-plmkrs01\asruns32\outboundforvision\sssel2
2015-09-19 05:15:12,050 - DEBUG [ImportThread] - FileProcessor constructor finished
2015-09-19 07:05:00,752 - DEBUG [6828] - New file has been created: \\05rnb-plmkrs01\asruns32\outboundforvision\blitz\20150918_BLITZ.LOG
2015-09-19 07:05:00,846 - DEBUG [ImportThread] - Created job record: 3098953 in the database for file: \\05rnb-plmkrs01\asruns32\outboundforvision\animal\20150918_ANIMAL.LOG

I would like to write a PS script to filter the above log file as follows:

Logic finds lines with "Added file watcher" and strips out certain strings to read:-
2015-09-19 05:15:12,050 - outboundforvision\sssel2

Then skips the next few lines and logic finds created job record and outputs certain string values on this line.
2015-09-19 07:05:00,846 - Created job record: 3098953 -outboundforvision\animal\20150918_ANIMAL.LOG