
Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

And another Hey, Scripting Guy! translation to PowerShell

How Can I Tally Up All the Words Found in a Text File?

Hey, Scripting Guy! While browsing the Internet I found a script that showed me how to get a list of all the unique words in a text file. That’s useful, but I’d like to go one step further: how can I determine the number of times each of those words occurs?

In PowerShell both of those articles are very simple to do. We also use Split to get an array of words, which is already a bit easier in PowerShell.

After that we have the great PowerShell object tools such as select, group and sort, so from there on it is a .. piece of "Cake-Meel":

 

$s = "I saw the cat. The cat was black."

# Remove Chars
 
",",".","!","?",">","<","&","*","=","`n" |% {$s = $s.replace($_,' ')}

# Array of words, spaces removed

$w = $s.Split() |? {$_.Length -gt 0 }

# Unique words

$w | select -Unique

# Tally

$w | group

# Sort and format

$W | group | sort name | ft name,count -AutoSize

 

You can see that for the count we can just switch from select -Unique to group.

The output looks like this:

 

PoSH> $s = "I saw the cat. The cat was black."                                                                          
PoSH>                                                                                                                   
PoSH> # Remove Chars                                                                                                    
PoSH>                                                                                                                   
PoSH> ",",".","!","?",">","<","&","*","=","`n" |% {$s = $s.replace($_,' ')}                                             
PoSH> $w = $s.Split() |? {$_.Length -gt 0 }                                                                             
PoSH>                                                                                                                   
PoSH> # Unique words                                                                                                    
PoSH>                                                                                                                   
PoSH> $w | select -Unique                                                                                               
I                                                                                                                       
saw                                                                                                                     
the                                                                                                                     
cat                                                                                                                     
The                                                                                                                     
was                                                                                                                     
black                                                                                                                   
PoSH>                                                                                                                   
PoSH> # Tally                                                                                                           
PoSH>                                                                                                                   
PoSH> $w | group                                                                                                        
                                                                                                                        
Count Name                      Group                                                                                   
----- ----                      -----                                                                                   
    1 I                         {I}                                                                                     
    1 saw                       {saw}                                                                                   
    2 the                       {the, The}                                                                              
    2 cat                       {cat, cat}                                                                              
    1 was                       {was}                                                                                   
    1 black                     {black}                                                                                 
                                                                                                                        
                                                                                                                        
PoSH>                                                                                                                   
PoSH> # Sort and format                                                                                                 
PoSH>                                                                                                                   
PoSH> $W | group | sort name | ft name,count -AutoSize                                                                  
                                                                                                                        
Name  Count                                                                                                             
----  -----                                                                                                             
black     1                                                                                                             
cat       2                                                                                                             
I         1                                                                                                             
saw       1                                                                                                             
the       2                                                                                                             
was       1                                                                                                             
                                                                                                                        
                                                                                                                        
PoSH>                                                 

 

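It almost looks too simple .. doesn't it?

Also, this accounts for The and the being the same word, because group compares names case-insensitively by default (select -Unique does not, which is why it lists both).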
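
To make the difference in case handling concrete, here is a small sketch using the same sample sentence. It relies on Group-Object comparing case-insensitively by default and on its -CaseSensitive switch; the variable names $ignoreCase and $matchCase are only for illustration.

```powershell
# Same sample sentence as above; only the period needs replacing here
$s = "I saw the cat. The cat was black."
$w = $s.Replace('.', ' ').Split() | Where-Object { $_.Length -gt 0 }

# Default grouping ignores case: 'the' and 'The' share one group
$ignoreCase = $w | Group-Object

# -CaseSensitive keeps 'the' and 'The' in separate groups
$matchCase = $w | Group-Object -CaseSensitive

"{0} groups ignoring case, {1} groups matching case" -f @($ignoreCase).Count, @($matchCase).Count
```

If the tally should treat capitalised words as distinct, adding -CaseSensitive to the group step is the only change needed.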

 

Enjoy,

Greetings /\/\o\/\/

Published Tuesday, February 27, 2007 2:23 PM by admin

Comments

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

Man, it is tough trying to Google characters! :)

I was wondering what |? means?

Wednesday, February 28, 2007 3:26 PM by LonerVamp

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

? is an alias for Where-Object (just as % is an alias for ForEach-Object).
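
Since these symbols are hard to search for, here is a short sketch of how to look them up from within PowerShell and how the short and long forms compare. Get-Alias with -Definition lists the aliases that point at a given cmdlet; the $words sample is made up for illustration.

```powershell
# List every alias that points at Where-Object and ForEach-Object
Get-Alias -Definition Where-Object, ForEach-Object

# These two pipelines are equivalent: both keep only the non-empty strings
$words = 'cat', '', 'dog'
$short = $words |? { $_.Length -gt 0 }
$long  = $words | Where-Object { $_.Length -gt 0 }
```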

Greetings /\/\o\/\/

Thursday, March 01, 2007 7:18 AM by MoW

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

I ran this on both vista and x64 and get the following error ...

PS C:\> .\wordid.ps1

Method invocation failed because [System.Object[]] doesn't contain a method named 'replace'.

At C:\wordid.ps1:5 char:61

+ ",",".","!","?",">","<","&","*","=","`n" |% {$s = $s.replace( <<<< $_,' ')}

Method invocation failed because [System.Object[]] doesn't contain a method named 'Split'.

At C:\wordid.ps1:9 char:14

+ $w = $s.Split( <<<< ) |? {$_.Length -gt 0 }

PS C:\>

Any help is greatly appreciated ... I need to analyze documents for duplicate content and this script looks like a tool I can use. The problem is, I am not a programmer and I am a novice with PowerShell.

Thursday, March 01, 2007 11:16 AM by EdT

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

How did you fill $s?

It might be an array of lines.

Try

$s = $s | Out-String

and then run the Replace line again.
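
A sketch of why the error appears and how the fix works, assuming $s was filled with Get-Content (the comment above does not say how it was filled). Get-Content returns one string per line, so a multi-line file yields an array, and an array has no Replace or Split method. The temporary file here is only a stand-in for the real document.

```powershell
# Stand-in for the real document: a temporary two-line text file
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value "I saw the cat.", "The cat was black."

# Get-Content returns one string per line, so this is an array
$lines = Get-Content $path
$lines.GetType().Name

# Out-String joins the lines into one string, so Replace and Split work again
$s = $lines | Out-String
$s.GetType().Name

# Same tally pipeline as in the post, shortened to the one character needed here
$w = $s.Replace('.', ' ').Split() | Where-Object { $_.Length -gt 0 }
$w | Group-Object | Sort-Object Name | Format-Table Name, Count -AutoSize

# Clean up the temporary file
Remove-Item $path
```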

Greetings /\/\o\/\/

Thursday, March 01, 2007 2:31 PM by MoW

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

In Unix there is a built-in command for this: 'wc'

Friday, March 02, 2007 2:17 AM by Nikhil

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

$SpecialChars = ",",".","!","?",">","<","&","*","=","`n",":"

$SpecialChars |% {$s = $s.replace($_,' ')}

This will solve your problem.
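
Rather than growing the character list one symbol at a time, another option is to split on anything that is not a word character. This is a different technique from the Replace loop above, sketched here with the -split operator (available from PowerShell 2.0 on) and the regex class \W. Note that \W also splits on apostrophes and hyphens, so a word like wasn't ends up as two entries.

```powershell
$s = "I saw the cat: the cat was black, wasn't it?"

# Split on runs of non-word characters, then drop empty entries
$w = $s -split '\W+' | Where-Object { $_.Length -gt 0 }

# Tally, most frequent words first
$w | Group-Object | Sort-Object Count -Descending | Format-Table Name, Count -AutoSize
```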

Friday, March 02, 2007 2:19 AM by Nikhil

# re: Hey, PowerShell Guy ! How Can I Tally Up All the Words Found in a Text File?

@Nikhil

> In Unix there is a built-in command for this: wc

PowerShell has one too, but this post is about more than just a word count.

$s | measure-object -word
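
To show the difference: Measure-Object -Word gives a single total, much like wc -w, while the group pipeline from the post gives a count per word. A small sketch with the sample sentence, where $total and $tally are illustrative names:

```powershell
$s = "I saw the cat. The cat was black."

# Total number of words, comparable to wc -w
$total = ($s | Measure-Object -Word).Words

# Per-word tally, which a plain word count cannot give
$tally = $s.Replace('.', ' ').Split() | Where-Object { $_.Length -gt 0 } | Group-Object

"Total words: $total, distinct words: $(@($tally).Count)"
```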

Greetings /\/\o\/\/

Friday, March 02, 2007 2:34 AM by MoW