Hey PowerShell Guy !,How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?
from the Hey Scripting Guy article :
How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?
gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.split('=')[0] -Value $_.split('=')[1]}{$r}} | export-csv -NoTypeInformation
What ?! can you say that again more slowly ?
ok, here is how I came to this, Its not so bad I did in in a couple of minutes from the PowerShell console :
I saved the example from the original article as tab.txt
Get-content to read it :
[PoSH]> gc tab.txt
Cre Rec Name=Jack Address=5 XYZ Drive Phone=555-4567
Cre Rec Name=Jill Address=7 XYZ Drive Phone=555-6547
Cre Rec Name=Jake Address=9 XYZ Drive Phone=555-9876
[PoSH]> gc tab.txt |% {$_.split("`t")}
Cre Rec
Name=Jack
Address=5 XYZ Drive
Phone=555-4567
Cre Rec
Name=Jill
Address=7 XYZ Drive
Phone=555-6547
Cre Rec
Name=Jake
Address=9 XYZ Drive
Phone=555-9876
[PoSH]> gc tab.txt |% {$_.split("`t")} |? {$_ - match '='}
You must provide a value expression on the right-hand side of the '-' operator.
At line:1 char:40
+ gc tab.txt |% {$_.split("`t")} |? {$_ - <<<< match '='}
[PoSH]> gc tab.txt |% {$_.split("`t")} |? {$_ -match '='}
Name=Jack
Address=5 XYZ Drive
Phone=555-4567
Name=Jill
Address=7 XYZ Drive
Phone=555-6547
Name=Jake
Address=9 XYZ Drive
Phone=555-9876
[PoSH]>
After that split on [tab] , and filter the parts that contain the = sign , everytime using the UP arrow to get back former line.
Note that % is an alias for foreach-object and % an alias for where-object, as they are more handy for interactive use like this.
In the next step I turn the lines into objects with there fields as noteproperties :
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}}
Name Address Phone
---- ------- -----
Jack 5 XYZ Drive 555-4567
Jill 7 XYZ Drive 555-6547
Jake 9 XYZ Drive 555-9876
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}} | gm
TypeName: System.Object
Name MemberType Definition
---- ---------- ----------
Equals Method System.Boolean Equals(Object obj)
GetHashCode Method System.Int32 GetHashCode()
GetType Method System.Type GetType()
ToString Method System.String ToString()
Address NoteProperty System.String Address=5 XYZ Drive
Name NoteProperty System.String Name=Jack
Phone NoteProperty System.String Phone=555-4567
[PoSH]>
If you follow my blog and did read the former posts about textscraping from Websites : Tag : TextScraping on my blog, it will be clear why I picked this as I use it also in the babynames and scorelist posts.
but I will work it out a bit more below for how it is used this case because this way of working can look a bit strange as you are not used to it,
Basicly I start with the same as the scripting guy did split the fields on the = sign ,
[PoSH]> "name=Jack".split('=')[0]
name
[PoSH]> "name=Jack".split('=')[1]
Jack
[PoSH]> $o = New-Object object
[PoSH]> $o | Add-Member -Name "name" -Value "Jack" -MemberType NoteProperty
[PoSH]> $o | gm
TypeName: System.Object
Name MemberType Definition
---- ---------- ----------
Equals Method System.Boolean Equals(Object obj)
GetHashCode Method System.Int32 GetHashCode()
GetType Method System.Type GetType()
ToString Method System.String ToString()
name NoteProperty System.String name=Jack
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {"New-Object"}{ $_.split('=')}{"Output object"}}
New-Object
Name
Jack
Address
5 XYZ Drive
Phone
555-4567
Output object
New-Object
Name
Jill
Address
7 XYZ Drive
Phone
555-6547
Output object
New-Object
Name
Jake
Address
9 XYZ Drive
Phone
555-9876
Output object
[PoSH]>
but after that I make a new empty Object and add custom properties for every field on a line, I use the Begin and End blocks of a foreach-object loop to create a new object for every line and after that to output it.
Now we have the inputfile as a collection of objects we can just use the export-Csv Cmdlet provided in PowerShell to export it to CSV
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}} | export-csv
cmdlet export-csv at command pipeline position 3
Supply values for the following parameters:
Path: tab.csv
[PoSH]> gc tab.csv
#TYPE System.Object
Name,Address,Phone
Jack,"5 XYZ Drive",555-4567
Jill,"7 XYZ Drive",555-6547
Jake,"9 XYZ Drive",555-9876
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}} | export-csv tab.csv -NoTypeInformation
[PoSH]> gc tab.csv
Name,Address,Phone
Jack,"5 XYZ Drive",555-4567
Jill,"7 XYZ Drive",555-6547
Jake,"9 XYZ Drive",555-9876
[PoSH]> Import-Csv tab.csv
Name Address Phone
---- ------- -----
Jack 5 XYZ Drive 555-4567
Jill 7 XYZ Drive 555-6547
Jake 9 XYZ Drive 555-9876
[PoSH]> Import-Csv tab.csv | format-list name
Name : Jack
Name : Jill
Name : Jake
[PoSH]>
so now you have it in CSV format and as CSV support in PowerShell is great you can get it back into an Object to work with it very easy.
you can see this method of first convering input to objects is very powerfull, not as hard as it seems at first, is consistant, very quick and works for many situation so there is a lot of Return on Investment in learning it.
At any rate, it’s always a treat to watch your PowerShell mow down the opposition. He blew his first oneliner right by several REPL loops, and completely combined several more different task with his flexible Pipeline.
Enjoy,
Greetings /\/\o\/\/