Add to Technorati Favorites
Welcome to ThePowerShellGuy.com Sign in | Join | Help

Hey PowerShell, how popular is this Baby Name ?

When I did see this blogpost : Baby Names, Nameless Keys, and Mumbling , I also decided to "scrape" the Social Security Administration's Popular Baby Names site

I already did some webpage scraping like this in PowerShell : generate a random list of surnames . (Yes,Yes breaking rule seven ;-) )

but here we do a bot more processing, and make easy to work with the data we did scrape of the webpage.

While going on from the technique from that post, as this time the layout of the table is a bit different, also I get some more columns and convert it to an array of Objects so we can query easy on the data in PowerShell and also be able to export the data to other applications.

The Table on the Webpage looks like this :

 

After the text scraping and processing we can use it in PowerShell like this :

 

PoSH> $list | where {$_.name -eq 'Marc'}                                                                                
                                                                                                                        
                                                                                                                        
Name    : Marc                                                                                                          
Count   : 6335                                                                                                          
Percent : 0.0507 %                                                                                                      
Sex     : Male                                                                                                          
Rank    : 289                                                                                                           
                                                                                                                        
                                                                                                                        
                                                                                                                        
PoSH> $list | where {$_.rank -eq 1}                                                                                     
                                                                                                                        
                                                                                                                        
Name    : Jacob                                                                                                         
Count   : 179896                                                                                                        
Percent : 1.4409 %                                                                                                      
Sex     : Male                                                                                                          
Rank    : 1                                                                                                             
                                                                                                                        
Name    : Emily                                                                                                         
Count   : 149420                                                                                                        
Percent : 1.2525 %                                                                                                      
Sex     : Female                                                                                                        
Rank    : 1                                                                                                             
                                                                                                                        
                                                                                                                        
                                                                                                                        
PoSH> $list | where {$_.Percent -gt 1} | Format-Table -AutoSize                                                         
                                                                                                                        
Name         Count Percent  Sex    Rank                                                                                 
----         ----- -------  ---    ----                                                                                 
Jacob       179896 1.4409 % Male      1                                                                                 
Michael     165257 1.3236 % Male      2                                                                                 
Joshua      151094 1.2102 % Male      3                                                                                 
Emily       149420 1.2525 % Female    1                                                                                 
Matthew     148038 1.1857 % Male      4                                                                                 
Andrew      131862 1.0562 % Male      5                                                                                 
Christopher 129095 1.0340 % Male      6                                                                                 
Joseph      126394 1.0124 % Male      7                                                                                 
Daniel      125929 1.0086 % Male      8                                                                                 
Madison     123729 1.0372 % Female    2                                                                                 
                                                                                                                        
                                                                                                                        
PoSH> $list | Export-Csv -not nameslist.csv ; ii nameslist.csv                                                          
PoSH>                                               

 And export to Csv to do some Excel analyzing :

 

Ok, how does this work :

First we get the Source of the Page as HTML :

 

$decade = 2000

$wc = new-Object System.Net.WebClient
$nl = $wc.DownloadString("http://www.ssa.gov/OACT/babynames/decades/names$($decade)s.html"

 

Extract the data

As the data in the Table on the website is a but Awkward formatted ( Male and Female in one row ) we need to do some processing, to keep the Regex simple I decided to just grab all the name,percent and score cells, in one long array :

 

# Process list

$r = [regex]'="15%">(.*?)</td>'
$m = $r.Matches($nl)

As you can see here, this is not realy usefull yet, as it is not realy structured :

 

PoSH> $r = [regex]'="15%">(.*?)</td>'                                                                                   
PoSH> $m = $r.Matches($nl)                                                                                              
PoSH> $m.count                                                                                                          
6000                                                                                                                    
PoSH> $m[0]                                                                                                             
                                                                                                                        
                                                                                                                        
Groups   : {="15%">Jacob</td>, Jacob}                                                                                   
Success  : True                                                                                                         
Captures : {="15%">Jacob</td>}                                                                                          
Index    : 7697                                                                                                         
Length   : 17                                                                                                           
Value    : ="15%">Jacob</td>                                                                                            
                                                                                                                        
                                                                                                                        
                                                                                                                        
PoSH> $m[0..10] |% {$_.groups[1].value}                                                                                 
Jacob                                                                                                                   
179,896                                                                                                                 
1.4409                                                                                                                  
Emily                                                                                                                   
149,420                                                                                                                 
1.2525                                                                                                                  
Michael                                                                                                                 
165,257                                                                                                                 
1.3236                                                                                                                  
Madison                                                                                                                 
123,729                                                                                                                 
PoSH>                                             

 

 So we need to do some more processing,

For this we loop trough the array make new objects, and adding the properties using add-member, as we know the order of the objects (Male name, Count, Percent,Female name,Count,Percent) I first get the Male name and  use the MoveNext() Method of the $foreach variable, that is provided in a Foreach loop, here to skip iterations of the loop .

Also note that I convert the Count to an Integer (to be able to do a good sort on it later ) , and format the Percent at 4 digits behind comma.

 

# Convert to arrray of Objects

$script:list = @()

Foreach ($i in 0..($m.count -1) ) {

  # Male

  $Record = new-Object -typename System.Object
  $Record | add-Member -memberType noteProperty -name Name -Value $m[$i].groups[1].value
  [void] $foreach.MoveNext()
  $Record | add-Member -memberType noteProperty -name Count -Value ([int]($m[$foreach.current].groups[1].value))
  [void] $foreach.MoveNext()
  $Record | add-Member -memberType noteProperty -name Percent -Value ("{0:p4}" -f (([double]$m[$foreach.current].groups[1].value) / 100))
  $Record | add-Member -memberType noteProperty -name Sex -Value Male
  $script:list += $record

  # Female

  [void] $foreach.MoveNext()
  $Record = new-Object -typename System.Object
  $Record | add-Member -memberType noteProperty -name Name -Value $m[$foreach.current].groups[1].value
  [void] $foreach.MoveNext()
  $Record | add-Member -memberType noteProperty -name Count -Value ([int]($m[$foreach.current].groups[1].value))
  [void] $foreach.MoveNext()
  $Record | add-Member -memberType noteProperty -name Percent -Value ("{0:p4}" -f (([double]$m[$foreach.current].groups[1].value) / 100))
  $Record | add-Member -memberType noteProperty -name Sex -Value Female
  $script:list += $record

}

 

Allready does look better ;-) :

 

PoSH> $list | select -First 5                                                                                           
                                                                                                                        
Name                                                  Count Percent                       Sex                           
----                                                  ----- -------                       ---                           
Jacob                                                179896 1.4409 %                      Male                          
Emily                                                149420 1.2525 %                      Female                        
Michael                                              165257 1.3236 %                      Male                          
Madison                                              123729 1.0372 %                      Female                        
Joshua                                               151094 1.2102 %                      Male                          
                                                                                                                        
                                                                                                                        
PoSH>                                              

Add Ranking :

As we lost the ranking during this proces we are going to re-generate that by sorting the list and adding the Rank Property :

 

# Sort 

$list = $list | sort -desc count

# Add rank

$list | where {$_.Sex -eq 'Male'} |% {$rank = 0} {$rank++;$_ | add-Member -memberType noteProperty -name Rank -Value $rank}
$list | where {$_.Sex -eq 'Female'} |% {$rank = 0} {$rank++;$_ | add-Member -memberType noteProperty -name Rank -Value $rank}

 

 And then we are done and can use this list to do some Babyname statistics ;-)

 and you can see how high your name was in this list.

I hope this post showed how you can convert unstructured data into structured data in PowerShell and how easy it is after that to work with it in PowerShell and to Export this data.

I added the complete code also as an attachement to this post.

Enjoy,

Greetings /\/\o\/\/

Published Tuesday, February 13, 2007 2:24 PM by admin
Attachment(s): BabyNames.ps1

Comments

# Scripting Games 2007 Advanced PowerShell Scores list

To check the Scores in The Scripting Games 2007, I made the following script for the Advanced PowerShell

Monday, February 19, 2007 5:19 PM by The PowerShell Guy

# Baby name creator

I have gone through the information in your website, its really awesome. I got a website which is related to yours. Here I am providing the links regarding that. Hope it will be helpful for you.

<a

href=http://www.babynamecreator.net>baby name</a>

Friday, February 23, 2007 3:55 PM by Timothy Johnson

# re: Hey PowerShell, how popular is this Baby Name ?

@ Timothy

Thanks,

You by-passed my Spam filter on this one ;-)

Greetings /\/\o\/\/

Friday, February 23, 2007 6:23 PM by MoW

# Re: Array or Database or ??

I have an example here of creating your own object: [link] I was mostly inspired by this post: [link]

Thursday, August 02, 2007 8:53 PM by Latest Newsgroup Posts

# re: Hey PowerShell, how popular is this Baby Name ?

Hey /\/\o\/\/

I was wondering how you would combine two existing object properties into one (in Powershell)? I.E. $_.firstname + $_.lastname = $_.fullname

Do you need to you the add-member cmdlet, with [string]::Join?

Thursday, July 31, 2008 9:31 AM by n1ckml007

# re: Hey PowerShell, how popular is this Baby Name ?

you can use select and custom properties for this.

$user | select @{name='fullName';e={$_.first + $_.last}}

or you can use scriptproperty

Greeetings /\/\o\/\/

Friday, August 01, 2008 1:43 PM by MoW

# re: Hey PowerShell, how popular is this Baby Name ?

n00b questions, what does "e=" do in PoSH?

Also I realized that add-member wouldn't work unless I created a new object. As I'm working with a dataset table object. I figured out how to create the new column:

$full_name = new-object "System.Data.DataColumn" ("full_name", [string])

$ds.Tables[0].Columns.add($full_name)

how would I populate the column / rows with the value of the $_.lastname, $firstname properties (columns)?

Friday, August 01, 2008 3:24 PM by n1ckml007

# re: Hey PowerShell, how popular is this Baby Name ?

nevermind, I now understand your select-object command. E is an alias for expression.

I ended up making the change in the SQL query to have it rename a column using "SELECT" blah "AS" renamed blah

Monday, August 11, 2008 3:24 PM by n1ckml007

# Converting Website Data Into PowerShell Objects

Yesterday, I stumbled over an excellent blog post written in 2007 by MOW, a good friend and PowerShell

Thursday, November 27, 2008 8:15 AM by Dreaming in PowerShell

# Converting Website Data Into PowerShell Objects

Yesterday, I stumbled over an excellent blog post written in 2007 by MOW, a good friend and PowerShell

Thursday, November 27, 2008 8:22 AM by Dreaming in PowerShell
Anonymous comments are disabled