Add to Technorati Favorites
Welcome to ThePowerShellGuy.com Sign in | Join | Help

Hey PowerShell Guy !,How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?

from the Hey Scripting Guy article :

How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?

 

gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.split('=')[0] -Value $_.split('=')[1]}{$r}} | export-csv -NoTypeInformation

 

What ?! can you say that again more slowly ?

ok, here is how I came to this, Its not so bad I did in in a couple of minutes from the PowerShell console :

I saved the example from the original article as tab.txt

Get-content to read it :

 

[PoSH]> gc tab.txt                                                                                                                          
Cre Rec Name=Jack       Address=5 XYZ Drive     Phone=555-4567                                                                              
Cre Rec Name=Jill       Address=7 XYZ Drive     Phone=555-6547                                                                              
Cre Rec Name=Jake       Address=9 XYZ Drive     Phone=555-9876                                                                              
[PoSH]> gc tab.txt |% {$_.split("`t")}                                                                                                      
Cre Rec                                                                                                                                     
Name=Jack                                                                                                                                   
Address=5 XYZ Drive                                                                                                                         
Phone=555-4567                                                                                                                              
Cre Rec                                                                                                                                     
Name=Jill                                                                                                                                   
Address=7 XYZ Drive                                                                                                                         
Phone=555-6547                                                                                                                              
Cre Rec                                                                                                                                     
Name=Jake                                                                                                                                   
Address=9 XYZ Drive                                                                                                                         
Phone=555-9876                                                                                                                              
[PoSH]> gc tab.txt |% {$_.split("`t")} |? {$_ - match '='}                                                                                  
You must provide a value expression on the right-hand side of the '-' operator.                                                             
At line:1 char:40                                                                                                                           
+ gc tab.txt |% {$_.split("`t")} |? {$_ -  <<<< match '='}                                                                                  
[PoSH]> gc tab.txt |% {$_.split("`t")} |? {$_ -match '='}                                                                                   
Name=Jack                                                                                                                                   
Address=5 XYZ Drive                                                                                                                         
Phone=555-4567                                                                                                                              
Name=Jill                                                                                                                                   
Address=7 XYZ Drive                                                                                                                         
Phone=555-6547                                                                                                                              
Name=Jake                                                                                                                                   
Address=9 XYZ Drive                                                                                                                         
Phone=555-9876                                                                                                                              
[PoSH]>                                                                                                     

After that split on [tab] , and filter the parts that contain the = sign , everytime using the UP arrow to get back former line.

Note that % is an alias for foreach-object and % an alias for where-object, as they are more handy for interactive use like this.

In the next step I turn the lines into objects with there fields as noteproperties :

 

[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}}                                                                                                     
                                                                                                                                            
Name                                           Address                                        Phone                                         
----                                           -------                                        -----                                         
Jack                                           5 XYZ Drive                                    555-4567                                      
Jill                                           7 XYZ Drive                                    555-6547                                      
Jake                                           9 XYZ Drive                                    555-9876                                      
                                                                                                                                            
                                                                                                                                            
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}} | gm                                                                                                
                                                                                                                                            
                                                                                                                                            
   TypeName: System.Object                                                                                                                  
                                                                                                                                            
Name        MemberType   Definition                                                                                                         
----        ----------   ----------                                                                                                         
Equals      Method       System.Boolean Equals(Object obj)                                                                                  
GetHashCode Method       System.Int32 GetHashCode()                                                                                         
GetType     Method       System.Type GetType()                                                                                              
ToString    Method       System.String ToString()                                                                                           
Address     NoteProperty System.String Address=5 XYZ Drive                                                                                  
Name        NoteProperty System.String Name=Jack                                                                                            
Phone       NoteProperty System.String Phone=555-4567                                                                                       
                                                                                                                                            
                                                                                                                                            
[PoSH]>                                                                                                   

 

If you follow my blog and did read the former posts about textscraping from Websites : Tag : TextScraping on my blog, it will be clear why I picked this as I use it also in the babynames and scorelist posts.

but I will work it out a bit more below for how it is used this case because this way of working can look a bit strange as you are not used to it,

 Basicly I start with the same as the scripting guy did split the fields on the = sign ,

 

[PoSH]> "name=Jack".split('=')[0]                                                                                                           
name                                                                                                                                        
[PoSH]> "name=Jack".split('=')[1]                                                                                                           
Jack                                                                                                                                        
[PoSH]> $o = New-Object object                                                                                                              
[PoSH]> $o | Add-Member -Name "name" -Value "Jack" -MemberType NoteProperty                                                                 
[PoSH]> $o | gm                                                                                                                             
                                                                                                                                            
                                                                                                                                            
   TypeName: System.Object                                                                                                                  
                                                                                                                                            
Name        MemberType   Definition                                                                                                         
----        ----------   ----------                                                                                                         
Equals      Method       System.Boolean Equals(Object obj)                                                                                  
GetHashCode Method       System.Int32 GetHashCode()                                                                                         
GetType     Method       System.Type GetType()                                                                                              
ToString    Method       System.String ToString()                                                                                           
name        NoteProperty System.String name=Jack                                                                                            
                                                                                                                                            
                                                                                                                                            
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {"New-Object"}{ $_.split('=')}{"Output object"}}                                
New-Object                                                                                                                                  
Name                                                                                                                                        
Jack                                                                                                                                        
Address                                                                                                                                     
5 XYZ Drive                                                                                                                                 
Phone                                                                                                                                       
555-4567                                                                                                                                    
Output object                                                                                                                               
New-Object                                                                                                                                  
Name                                                                                                                                        
Jill                                                                                                                                        
Address                                                                                                                                     
7 XYZ Drive                                                                                                                                 
Phone                                                                                                                                       
555-6547                                                                                                                                    
Output object                                                                                                                               
New-Object                                                                                                                                  
Name                                                                                                                                        
Jake                                                                                                                                        
Address                                                                                                                                     
9 XYZ Drive                                                                                                                                 
Phone                                                                                                                                       
555-9876                                                                                                                                    
Output object                                                                                                                               
[PoSH]>                                                                                                    

but after that I make a new empty Object and add custom properties for every field on a line, I use the Begin and End blocks of a foreach-object loop to create a new object for every line and after that to output it.

Now we have the inputfile as a collection of objects we can just use the export-Csv Cmdlet provided in PowerShell to export it to CSV  

[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}} | export-csv                                                                                        
                                                                                                                                            
cmdlet export-csv at command pipeline position 3                                                                                            
Supply values for the following parameters:                                                                                                 
Path: tab.csv                                                                                                                               
[PoSH]> gc tab.csv                                                                                                                          
#TYPE System.Object                                                                                                                         
Name,Address,Phone                                                                                                                          
Jack,"5 XYZ Drive",555-4567                                                                                                                 
Jill,"7 XYZ Drive",555-6547                                                                                                                 
Jake,"9 XYZ Drive",555-9876                                                                                                                 
[PoSH]> gc tab.txt |% {$_.split("`t") |? {$_ -match '='} |% {$r = new-object object} {$r | add-Member -memberType noteProperty -name $_.spli
t('=')[0] -Value $_.split('=')[1]}{$r}} | export-csv tab.csv -NoTypeInformation                                                             
[PoSH]> gc tab.csv                                                                                                                          
Name,Address,Phone                                                                                                                          
Jack,"5 XYZ Drive",555-4567                                                                                                                 
Jill,"7 XYZ Drive",555-6547                                                                                                                 
Jake,"9 XYZ Drive",555-9876                                                                                                                 
[PoSH]> Import-Csv tab.csv                                                                                                                  
                                                                                                                                            
Name                                           Address                                        Phone                                         
----                                           -------                                        -----                                         
Jack                                           5 XYZ Drive                                    555-4567                                      
Jill                                           7 XYZ Drive                                    555-6547                                      
Jake                                           9 XYZ Drive                                    555-9876                                      
                                                                                                                                            
                                                                                                                                            
[PoSH]> Import-Csv tab.csv | format-list name                                                                                               
                                                                                                                                            
                                                                                                                                            
Name : Jack                                                                                                                                 
                                                                                                                                            
Name : Jill                                                                                                                                 
                                                                                                                                            
Name : Jake                                                                                                                                 
                                                                                                                                            
                                                                                                                                            
                                                                                                                                            
[PoSH]>                                                                                                    

 

  so now you have it in CSV format and as CSV support in PowerShell is great you can get it back into an Object to work with it very easy.

you can see this method of first convering input to objects is very powerfull, not as hard as it seems at first, is consistant, very quick and works for many situation so there is a lot of Return on Investment in learning it. 

At any rate, it’s always a treat to watch your PowerShell mow down the opposition. He blew his first oneliner right by several REPL loops, and completely combined several more different task with his flexible Pipeline.

Enjoy,

 

Greetings /\/\o\/\/ 

Published Saturday, March 24, 2007 5:45 AM by admin

Comments

# re: Hey PowerShell Guy !,How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?

I wrote A Import-Delimited Cmdlet to solve this problem since I seam to be doing it all the time.

Wednesday, June 13, 2007 8:04 AM by Adam

# re: Hey PowerShell Guy !,How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?

How do you output multiple variables to the same csv file?  I've been trying and I don't see an append option and it keeps overwriting what I output.

Thursday, September 27, 2007 2:54 PM by ambersed

# re: Hey PowerShell Guy !,How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?

i have a newbie question :

I want to read a flat ASCII text file

filename  \\UNC_path\filename.txt

field_1  position 1 thru 5

space

field_2   position 7 thru 12

space

field_3   position 14 thru 25

end of record

read file

make each field into a XML node

output as an xml file

Thursday, January 10, 2008 9:29 AM by yitzstokes

# re: Hey PowerShell Guy !,How Can I Parse a Tab-Delimited File and Then Save That as a Comma-Separated Values File?

Re: Fixed position import by: yitzstokes

"I want to read a flat ASCII text file

filename  \\UNC_path\filename.txt..."

I used powershell and entered

[POS]>"bob"|gm

...

Substring...

I am betting you could use Substring instead of Split to accomplish your goals.

-Brooke

Monday, February 04, 2008 8:50 PM by bhedrick
Anonymous comments are disabled