Buffer Objects
For a long time, JavaScript lacked proper support for handling arrays of binary data. That changed with ECMAScript 2015, when typed arrays were standardized and made working with binary data much easier.
These types all descend from an abstract TypedArray class, each one specializing in a specific integer word size and signedness, with some providing floating-point support. Float64Array, for example, represents an array of 64-bit floating-point numbers, while Int16Array represents an array of 16-bit signed integers.
Of all the typed arrays, probably the most useful is Uint8Array. It makes it possible to handle binary data such as image files, networking protocols, and video streams in a way that is similar to how it's done in other programming languages.
In the Node.js world, there's another class, Buffer, that descends from Uint8Array. It adds some utility methods like concat(), allocUnsafe(), and compare(), as well as methods to read and write various numeric types, mimicking what's offered by the DataView class (e.g. readUInt16BE(), writeDoubleLE(), and swap32()).
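For instance, here's a quick sketch of a few of those methods in action (the values are just illustrative):
const a = Buffer.from([0x12, 0x34]);
console.log(a.readUInt16BE(0)); // 4660 (0x1234, big-endian)
console.log(a.readUInt16LE(0)); // 13330 (0x3412, little-endian)
const joined = Buffer.concat([Buffer.from('foo'), Buffer.from('bar')]);
console.log(joined.toString()); // foobar
console.log(Buffer.compare(Buffer.from('a'), Buffer.from('b'))); // -1 (first sorts before second)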
Text Encoding
The subject of encoding text characters using computers is vast and complex and I'm not going to delve deeply into it in this short article.
Conceptually, text is composed of characters: the smallest pieces of meaningful content - usually letters, but depending on the language there may be other, very different symbols. Then each character is assigned a code point: a number that identifies it in a one-to-one relationship.
Finally, each code point may be represented by a sequence of bits according to an encoding. An encoding is a table that maps each code point to its corresponding bit string.
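To make that concrete, here's a small sketch in Node.js (using the Buffer class covered in this article) showing a single character, its code point, and the bytes produced by two different encodings:
console.log('é'.codePointAt(0)); // 233, i.e. code point U+00E9
console.log(Buffer.from('é', 'utf8'));   // <Buffer c3 a9> - two bytes in UTF-8
console.log(Buffer.from('é', 'latin1')); // <Buffer e9> - one byte in ISO-8859-1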
It may look like a simple mapping, but many encodings have been created over the last few decades to accommodate different computer architectures and character sets.
The most widespread encoding is ASCII.
It includes the basic Latin letters (A-Z, upper and lower case), digits, and many punctuation symbols, as well as some control characters. ASCII is a 7-bit encoding, but computers store data in 8-bit blocks called bytes, so many 8-bit encodings were created, extending the 128 ASCII characters with another 128 language-specific characters.
A very common encoding is ISO-8859-1, also known by several other names like "latin1", "IBM819", "CP819", and even "WE8ISO8859P1". ISO-8859-1 adds some extra symbols to ASCII and many accented letters like Á, Ú, È, and Õ.
Those language-specific encodings worked reasonably well for some situations but failed miserably in multi-language contexts.
Lost information and data corruption were common, because programs would manipulate text using the wrong encoding for a file.
Even displaying a file's content was a mess, because text files don't record which encoding was used to produce them.
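Here's a small sketch of that kind of corruption, decoding UTF-8 bytes as if they were ISO-8859-1:
const bytes = Buffer.from('Á', 'utf8');
console.log(bytes); // <Buffer c3 81>
console.log(bytes.toString('latin1')); // 'Ã' followed by an invisible control character - not what was written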
To fix that, the Unicode standard was started in the 1980s. Its objective is to map every character currently used in every human language (and in some historic scripts) to a single code point.
The Unicode standard also defines a number of generic encodings that are able to encode every Unicode code point.
The most common Unicode encodings are UTF-8, UTF-16, UTF-32, and the older UCS-2. UCS-2 and UTF-32 use a fixed width for every code point (which prevents UCS-2 from representing all Unicode characters), while UTF-8 and UTF-16 use a variable number of bytes to encode each code point.
On the web, the most widely used Unicode encoding is UTF-8, because it's reasonably efficient for many uses. It uses 1 byte for ASCII characters and 2 bytes for most European and Middle Eastern scripts, but it's less efficient for Asian scripts, which need 3 bytes or more per character. For example, the French saying "Il vaut mieux prévenir que guérir" is equivalent to the following sequence of bytes when encoded using UTF-8 (the bytes are shown in hexadecimal; blank cells are spaces):
I  | l  |    | v  | a  | u  | t  |    | m  | i  | e  | u  | x  |    | p  | r  | é
49 | 6c | 20 | 76 | 61 | 75 | 74 | 20 | 6d | 69 | 65 | 75 | 78 | 20 | 70 | 72 | c3 a9

v  | e  | n  | i  | r  |    | q  | u  | e  |    | g  | u  | é     | r  | i  | r
76 | 65 | 6e | 69 | 72 | 20 | 71 | 75 | 65 | 20 | 67 | 75 | c3 a9 | 72 | 69 | 72
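You can check those byte counts with Buffer.byteLength() - a quick sketch, with the numbers matching the tables above:
const saying = 'Il vaut mieux prévenir que guérir';
console.log(Buffer.byteLength(saying, 'utf8'));    // 35 (33 characters, each é taking 2 bytes)
console.log(Buffer.byteLength(saying, 'utf16le')); // 66 (2 bytes per character for this text)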
Convert Buffer to String in Node.js
Having said all that, the content of a Buffer may be quickly converted to a String using the toString() method:
const b = Buffer.from([101, 120, 97, 109, 112, 108, 101]);
console.log(b.toString()); // example
Remember that discussion about text encodings? Well, the Buffer class uses UTF-8 by default when converting to and from strings, but you can also choose another one from a small set of supported encodings:
const b = Buffer.from([101, 120, 97, 109, 112, 108, 101]);
console.log(b.toString('latin1')); // example
Most of the time, UTF-8 is the best option for both reading and writing. But for completeness, here is the full list of encodings supported by Node.js (as of September 2021); the names are not case sensitive:
Encoding | Accepted aliases |
ascii | |
base64 | |
base64url | |
hex | |
latin1 | binary |
ucs2 | ucs-2 |
utf8 | utf-8 |
utf16le | utf-16le |
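For example, the hex and base64 encodings come in handy when printing or serializing binary data - a quick illustration:
const b = Buffer.from('example');
console.log(b.toString('hex'));    // 6578616d706c65
console.log(b.toString('base64')); // ZXhhbXBsZQ==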
Convert Node.js String to Buffer
It is also possible to convert data in the opposite direction. Starting from a string, you can create a new Buffer from it (if the encoding is not specified, Node.js assumes UTF-8):
const s = Buffer.from('example', 'utf8');
console.log(s); // <Buffer 65 78 61 6d 70 6c 65>
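The encoding argument matters as soon as the text contains non-ASCII characters. Here's a quick sketch using a word from the French saying above:
console.log(Buffer.from('guérir', 'utf8'));   // <Buffer 67 75 c3 a9 72 69 72>
console.log(Buffer.from('guérir', 'latin1')); // <Buffer 67 75 e9 72 69 72>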
If you need to write text to an existing Buffer object, you can use its write() method:
const b = Buffer.alloc(10);
console.log(b);
// <Buffer 00 00 00 00 00 00 00 00 00 00>
b.write('example', 'utf8');
console.log(b);
// <Buffer 65 78 61 6d 70 6c 65 00 00 00>
You can even set the starting position (offset):
const b = Buffer.alloc(10);
b.write('test', 4, 'utf8');
console.log(b);
// <Buffer 00 00 00 00 74 65 73 74 00 00>
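One detail worth knowing: write() returns the number of bytes actually written, and it truncates the string if it doesn't fit in the buffer. A quick sketch:
const small = Buffer.alloc(4);
const written = small.write('example');
console.log(written); // 4 - only the first 4 bytes fit
console.log(small);   // <Buffer 65 78 61 6d>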
Conclusion
In a nutshell, it's easy to convert a Buffer object to a string using the toString() method. You'll usually want the default UTF-8 encoding, but it's possible to indicate a different encoding if needed. To convert from a string to a Buffer object, use the static Buffer.from() method - again optionally passing the encoding.
If you're a Node.js developer interested in advancing your knowledge, add these posts to your reading list:
- Node.js Architecture (common problems and how to fix them)
- Queues in Node.js (all types explained)
- Containerizing Node.js Applications With Docker (a practical guide)
- Node.js Discord Bot (how to create one)