Counting Lines of Source Code in PowerShell
Oren Eini recently ran into some performance problems while using PowerShell to count the number of lines in a source tree:
I wanted to know how many lines of code NHibernate has, so I run the following PowerShell command…
(gci -Recurse | select-string . ).Count
The result:
(Graphic of PowerShell still working after about 5 minutes, using 50% of his CPU.)
Bummer.
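The slowdown in this command comes from Select-String preparing for a rich pipeline experience that this command never uses: for every matching line, it emits an object that records the file name, path, line number, and text of the match. With only a little more text, you could have run even more powerful reports: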
Line count per path:
gci . *.cs -Recurse | select-string . | Group Path
Min / Max / Averages:
gci . *.cs -Recurse | select-string . | Group Filename |
Measure-Object Count -Min -Max -Average
Comment ratio:
$items = gci . *.cs -rec
($items | select-string "//").Count / ($items | select-string .).Count
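To see the metadata those reports rely on (and that costs time when all you want is a count), you can inspect the objects Select-String emits. This is a minimal sketch, assuming a throwaway file named Sample.cs written to the temp directory; the file and its contents are hypothetical:

```powershell
# Write a small, hypothetical C# file to the temp directory so there is something to search.
$sample = Join-Path ([System.IO.Path]::GetTempPath()) 'Sample.cs'
Set-Content -LiteralPath $sample -Value @('class Sample', '{', '    // a comment', '}')

# Each line that matches '.' comes back as a MatchInfo object, not just a string.
$results = @(Select-String -LiteralPath $sample -Pattern '.')
$results[0].GetType().FullName

# Every object carries the properties the reports above group and measure on.
$results | Select-Object Filename, Path, LineNumber, Line

# Clean up the sample file.
Remove-Item -LiteralPath $sample
```

Building and passing along one of these objects per line is exactly the work a plain line count does not need.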
But if you don’t need that power, there are alternatives that perform better. Let’s look at some of them. As a baseline, we’ll use the command that Oren started with, limited to C# files:
[C:\temp]
PS:14 > $baseline = Measure-Command { (gci . *.cs -Recurse | Select-String .).Count }
… and a comparison to the LineCount.exe he pointed to:
PS:15> $lineCountExe = Measure-Command { C:\temp\linecount.exe *.cs /s }
PS:16 > $baseline.TotalMilliseconds / $lineCountExe.TotalMilliseconds
41.5567286307833
(The Select-String approach is about 41.5x slower than LineCount.exe.)
Since we don’t need all of the PowerShell metadata generated by Select-String, and we don’t need its regular expression matching, we can instead use the [System.IO.File]::ReadAllText() method from the .NET Framework and let Measure-Object count the lines:
PS:17 > $readAllText = Measure-Command { gci . *.cs -rec | % { [System.IO.File]::ReadAllText($_.FullName) } | Measure-Object -Line }
PS:18 > $readAllText.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.30927987204783
This is now only about 3.3x slower, and the command is only 87 characters! With a PowerShell one-liner, you’ve implemented an entire line-counting program.
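The same idea also covers the per-path report from earlier without creating an object for every line. This sketch swaps in [System.IO.File]::ReadAllLines(), which returns one string per line, and emits one small object per file instead. The function name Get-CSharpLineReport is hypothetical, not something from Oren’s post or from PowerShell itself:

```powershell
# Hypothetical helper: line count per C# file below $Path, largest first.
function Get-CSharpLineReport([string] $Path = '.')
{
    Get-ChildItem $Path *.cs -Recurse |
        # Skip any directories whose names happen to match *.cs
        Where-Object { -not $_.PSIsContainer } |
        ForEach-Object {
            # One .NET call and one output object per file, rather than one MatchInfo per line.
            # ReadAllLines counts blank lines; a trailing newline does not add an extra line.
            New-Object PSObject -Property @{
                Path  = $_.FullName
                Lines = [System.IO.File]::ReadAllLines($_.FullName).Length
            }
        } |
        Sort-Object Lines -Descending
}
```

For example, Get-CSharpLineReport C:\src | Select-Object -First 10 would list the ten longest files under a hypothetical C:\src tree.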
If you want to go further, you can write a linecount program yourself:
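## Get-LineCount.ps1
## Count the number of lines in all C# files in (and below)
## the current directory.
function CountLines($directory)
{
    $pattern = "*.cs"
    $directories = [System.IO.Directory]::GetDirectories($directory)
    $files = [System.IO.Directory]::GetFiles($directory, $pattern)
    ## Count the lines in each C# file in this directory. Splitting on "`n"
    ## also counts a final empty element when a file ends with a newline.
    $lineCount = 0
    foreach($file in $files)
    {
        $lineCount += [System.IO.File]::ReadAllText($file).Split("`n").Count
    }
    ## Recurse into each subdirectory and add its total
    foreach($subdirectory in $directories)
    {
        $lineCount += CountLines $subdirectory
    }
    $lineCount
}
## ProviderPath is the plain file-system path that .NET expects,
## even when the current location is on a custom PowerShell drive
CountLines (Get-Location).ProviderPath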
This version is about 2.7x slower, but it’s in an easy-to-read, easy-to-modify format that saves you from having to open up your IDE and compiler.
PS:19 > $customScript = Measure-Command { C:\temp\Get-LineCount.ps1 }
PS:20 > $customScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
2.73733204860216
And to nip an annoying argument in the bud:
## Get-LineCount.rb
## Count the number of lines in all C# files in (and below)
## the current directory
require 'find'
def filelines(file)
  count = 0
  while line = file.gets
    count += 1
  end
  count
end
def countFile(filename)
  file = File.open(filename)
  totalCount = filelines(file)
  file.close()
  totalCount
end
totalCount = 0
files = Dir['**/*.cs']
files.each { |filename| totalCount += countFile(filename) }
puts totalCount
Which gives:
PS:21 > $rubyScript = Measure-Command { C:\temp\Get-LineCount.rb }
PS:22 > $rubyScript.TotalMilliseconds / $lineCountExe.TotalMilliseconds
3.0709602651302
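These ratios compare speed, not totals: the approaches above don’t all count the same lines. This is a minimal sketch on a hypothetical three-line file body (with a blank middle line and a trailing newline) that reproduces each approach’s counting rule on a string instead of on files:

```powershell
# A hypothetical file body: a line, a blank line, another line, and a trailing newline.
$text = "using System;`n`nclass A {}`n"

# Get-LineCount.ps1 approach: splitting on "`n" leaves an empty element after the final newline.
$splitCount = $text.Split("`n").Count

# Line-by-line reading, as in the Ruby gets loop: blank lines count, the trailing newline does not.
$reader = New-Object System.IO.StringReader $text
$readCount = 0
while ($null -ne $reader.ReadLine()) { $readCount++ }
$reader.Dispose()

# Baseline approach: '.' only matches lines with at least one character, so blank lines are skipped.
$nonEmptyCount = @($text.Split("`n") | Where-Object { $_ -match '.' }).Count

# Prints: Split: 4  ReadLine: 3  Non-empty: 2
"Split: $splitCount  ReadLine: $readCount  Non-empty: $nonEmptyCount"
```

On a real source tree, expect the totals from each approach to differ by roughly the number of blank lines and newline-terminated files.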