BinShred - Parsing Arbitrary Binary Data in PowerShell
When working with raw binary data (especially in security forensics), you often need to write a parser for it, for example to extract file contents from the NTFS data structures on disk. Binary parsers already exist for many common data structures, and you can leverage those, but you’ll still sometimes need to write your own.
BinShred is a PowerShell module that lets you do this.
BinShred uses a custom parsing language called a BinShred Template (.bst). Unlike the code-heavy templates used by tools such as 010 Editor, this grammar (implemented in ANTLR) is designed to be as close as possible to the language people actually use when describing file formats informally.
You can install it from the PowerShell Gallery:
Install-Module -Name BinShred
For a full treatment of how to write these binary parsers, see the help topic included with the module. Here is a very simple example. Consider the following binary content:
PS C:\> Format-Hex words.bin
           Path: C:\words.bin
           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
00000000   4C 48 02 00 00 00 05 00 00 00 48 65 6C 6C 6F 05  LH........Hello.
00000010   00 00 00 57 6F 72 6C 64                          ...World
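To follow along, you can recreate this 24-byte file yourself. The sketch below is not part of BinShred. New-WordFileBytes is a hypothetical helper name, and the code assumes the layout described in the next paragraph: an ASCII magic value, followed by little-endian UINT32 counts and lengths. It relies on .NET's BinaryWriter, which always writes integers in little-endian order.

```powershell
# Hypothetical helper: rebuilds the words.bin sample shown in the hex dump.
# BinaryWriter writes integers little-endian, matching the dump (02 00 00 00 = 2).
function New-WordFileBytes {
    param(
        [string]   $Magic = 'LH',
        [string[]] $Words = @('Hello', 'World')
    )

    $stream = [System.IO.MemoryStream]::new()
    $writer = [System.IO.BinaryWriter]::new($stream)
    try {
        $writer.Write([System.Text.Encoding]::ASCII.GetBytes($Magic))  # 2-byte magic
        $writer.Write([uint32]($Words.Count))                          # UINT32 word count
        foreach ($word in $Words) {
            $encoded = [System.Text.Encoding]::UTF8.GetBytes($word)
            $writer.Write([uint32]($encoded.Length))                   # UINT32 word length
            $writer.Write($encoded)                                    # UTF-8 word bytes
        }
        $writer.Flush()
        # The leading comma stops PowerShell from unrolling the byte[] into the pipeline
        return , $stream.ToArray()
    }
    finally {
        $writer.Dispose()
    }
}

# Write the sample into the current directory
$bytes = New-WordFileBytes
[System.IO.File]::WriteAllBytes((Join-Path $PWD 'words.bin'), $bytes)
```

Running Format-Hex on the resulting file should show the same 24 bytes as the dump above. Exact column layout varies between PowerShell versions.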
From either documentation or investigation, we've determined that the file
format has two main portions: a header, followed by a list of words. The
header starts with a 2-byte ASCII magic signature, followed by a 4-byte
integer giving the number of words. After that, each word entry has a 4-byte
integer giving the word length, followed by the word itself (of that length)
in UTF-8. As the hex dump shows (02 00 00 00 is 2), the integers are stored
little-endian.
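To show what a template saves you from writing, here is a hand-written reader for the same layout. This is a minimal sketch, not how BinShred works internally. Read-WordFile is a hypothetical function name. The code assumes little-endian integers (as the dump shows) and well-formed input, and it does not check for truncated data. Its property names mirror the ones used in the BinShred template below.

```powershell
# Hypothetical hand-written parser for the words.bin layout (not BinShred's implementation).
function Read-WordFile {
    param([Parameter(Mandatory)][byte[]] $Bytes)

    $stream = [System.IO.MemoryStream]::new($Bytes)
    $reader = [System.IO.BinaryReader]::new($stream)   # BinaryReader reads little-endian
    try {
        # Header: 2-byte ASCII magic, then a UINT32 word count
        $magic     = [System.Text.Encoding]::ASCII.GetString($reader.ReadBytes(2))
        $wordCount = $reader.ReadUInt32()

        # Body: wordCount entries, each a UINT32 length followed by that many UTF-8 bytes.
        # @( ) guarantees an array, even when there are zero or one entries.
        $words = @(for ($i = 0; $i -lt $wordCount; $i++) {
            $wordLength = $reader.ReadUInt32()
            [pscustomobject]@{
                wordLength = $wordLength
                word       = [System.Text.Encoding]::UTF8.GetString($reader.ReadBytes([int]$wordLength))
            }
        })

        [pscustomobject]@{
            magic     = $magic
            wordCount = $wordCount
            words     = $words
        }
    }
    finally {
        $reader.Dispose()   # also disposes the underlying MemoryStream
    }
}

# The sample bytes from the hex dump
$sample = [byte[]](0x4C, 0x48, 0x02, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
                   0x48, 0x65, 0x6C, 0x6C, 0x6F, 0x05, 0x00, 0x00, 0x00,
                   0x57, 0x6F, 0x72, 0x6C, 0x64)
$parsed = Read-WordFile -Bytes $sample
$parsed.words[1].word   # World
# For the file on disk: Read-WordFile -Bytes ([System.IO.File]::ReadAllBytes((Resolve-Path .\words.bin).Path))
```

Each new field in the format means another ReadUInt32 or ReadBytes call in code like this. A .bst template describes the same layout declaratively instead.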
A BinShred Template (.bst) for this file looks like this:
header :
    magic (2 bytes as ASCII)
    wordCount (4 bytes as UINT32)
    words (wordCount items);

words :
    wordLength (4 bytes as UINT32)
    word (wordLength bytes as UTF8);
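A region is declared by its name followed by a colon. Within a region, you
declare each property by writing its name followed by the length and data
type of that property in parentheses. A property can also refer to another
region by name: words (wordCount items) parses the words region wordCount
times. A semicolon marks the end of a region.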
When you supply this template to the ConvertFrom-BinaryData cmdlet (invoked
below through its binshred alias), it returns objects that represent the data
structures contained in the binary file.
PS > binshred -Path .\words.bin -TemplatePath .\wordParser.bst

Name                           Value
----                           -----
magic                          LH
wordCount                      2
words                          (...)

PS > (binshred -Path .\words.bin -TemplatePath .\wordParser.bst).Words[0]

Name                           Value
----                           -----
wordLength                     5
word                           Hello
While BinShred can process fairly complicated binary formats, you will likely run into data structures that require much more advanced parsing logic. For these, be sure to check out Kaitai Struct (https://kaitai.io/), a very robust binary parsing engine. It does not generate PowerShell parsers directly, but it can compile file format parsers one at a time into C# source files, which you can then load into PowerShell and use.