Inspecting Bytes with Node.js Buffer Objects

astorm

This entry is part 1 of 4 in the series Text Encoding and Unicode. Later posts include Unicode vs. UTF-8, When Good Unicode Encoding Goes Bad, and PHP and Unicode.

I’ve started to really enjoy Node.js’s Buffer object for byte-level examination of files.

For example, if you create a text file with a bit of Unicode in it

# File: some-file.txt
Hyvä

and then write a small program that looks like this

# File: read-bytes.js

const fs = require('fs')
function main() {
    // returns a Buffer object -- bytes.toString() will
    // transform the buffer into a string
    const bytes = fs.readFileSync('/path/to/some-file.txt')
    for (const byte of bytes) {
        console.log(
            // byte will be a Number -- here we format that
            // number as binary (toString(2)) and then pad
            // out our zeros
            byte.toString(2).padStart(8, '0'),
            ' ',
            byte
        )
    }
}
main()

You’ll get a list of each byte, formatted first as binary and then as a base-10 number, printed out.

% node read-bytes.js
01001000   72
01111001   121
01110110   118
11000011   195
10100100   164
00001010   10
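
If you want to poke at this without creating a file, here is a minimal sketch that builds the same bytes in memory. The `formatBytes` helper is a name I made up for illustration, not part of Node.js. It also shows that the two bytes 195 and 164 from the output above decode, as UTF-8, to the single character ä.

```javascript
// Hypothetical helper (not a Node.js API): formats each byte of a
// Buffer as an 8-digit binary string followed by its base-10 value.
function formatBytes(buffer) {
    const lines = []
    for (const byte of buffer) {
        lines.push(byte.toString(2).padStart(8, '0') + '   ' + byte)
    }
    return lines
}

// Build the contents of some-file.txt in memory. \u00e4 is "ä",
// written as an escape so the script does not depend on how this
// source file itself is encoded.
const bytes = Buffer.from('Hyv\u00e4\n', 'utf8')
console.log(formatBytes(bytes).join('\n'))

// Decode only the two bytes that made up the non-ASCII character.
console.log(Buffer.from([195, 164]).toString('utf8'))
```

The first `console.log` should reproduce the six lines shown above, and the second should print ä on its own.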

This isn’t exactly new tech. The Unix command-line program hexdump can do similar things.

% hexdump -C some-file.txt
00000000  48 79 76 c3 a4 0a                                 |Hyv...|
00000006

But I never found its default formats (hexadecimal, first column is offsets, etc.) well optimized for how my brain thinks about byte streams.
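
Because the Buffer loop is just JavaScript, it is easy to bend the output into whatever shape suits you. As a rough sketch, the hypothetical `describeBytes` helper below (my own name, not a Node.js API) prints one byte per line with a decimal offset, the hex value hexdump would show, the binary form, and the base-10 number. The column layout is an arbitrary choice for illustration.

```javascript
// Hypothetical helper (not a Node.js API): one row per byte with
// decimal offset, hex, binary, and base-10 columns.
function describeBytes(buffer) {
    const rows = []
    // Buffer#entries() yields [index, byte] pairs
    for (const [offset, byte] of buffer.entries()) {
        rows.push([
            String(offset).padStart(4, '0'),
            byte.toString(16).padStart(2, '0'),
            byte.toString(2).padStart(8, '0'),
            String(byte).padStart(3, ' ')
        ].join('  '))
    }
    return rows
}

// The same six bytes hexdump reported: 48 79 76 c3 a4 0a
const sample = Buffer.from([0x48, 0x79, 0x76, 0xc3, 0xa4, 0x0a])
console.log(describeBytes(sample).join('\n'))
```

Swapping `sample` for the result of `fs.readFileSync` on a real file should work the same way, since both are Buffer objects.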

It’s also possible to do this sort of thing in other programming languages, but the mechanics are a bit weird. The C/C++ primitives (or at least the ’90s-era primitives I used) for this are too fiddly, and even something modern like Go or Rust makes you jump through hoops that might make sense for production code but are a burden if all I want to do is write a small program to see what a file’s actual bytes are.

Copyright © Alan Storm 1975 – 2021 All Rights Reserved

Originally Posted: 9th February 2021