The Challenges of Asynchronous Grammar

astorm

I put it off for as long as I could, but “modern” javascript finally caught up with me. I’ve been working with a NodeJS codebase that uses asynchronous programming patterns, and I’ve had to relearn a lot of the basic techniques I use to understand what-the-heck an application/program is doing.

I realize these are debates and conversations the javascript communities have been having for nearly a decade, but I’m not writing for them. I’m writing for me, and perhaps dear reader, for you, in case you find yourself stumbling down these same corridors.

Functions as Variables

The first hurdle a programmer will run into with “async” is the idea that a language can treat functions like it does any other data. When you start learning to program, you have variables that you pass to functions. They’re two separate things. Unless you had a very specific sort of computer science education, the idea that a function itself could be a variable is non-obvious.
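If that still feels abstract, here's a minimal sketch of the idea. The names shout and applyTo are made up for illustration. The point is only that a function can be stored in a variable and handed to another function like any other value.

```javascript
// A function stored in a variable, just like a string or a number.
var shout = function (text) {
    return text.toUpperCase() + '!';
};

// `applyTo` is a hypothetical helper: it accepts a value and a
// function, then calls the function with that value.
var applyTo = function (value, fn) {
    return fn(value);
};

var shouted = applyTo('hello', shout);
console.log(shouted); // HELLO!
```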

Once you’ve grasped this concept, some bigger ideas start to fall into place. If you’ve done a significant amount of jQuery programming, you’ve probably seen or used code that looks like this.

var result = jQuery.map(['a','b','c'], function(item){
    console.log("Called callback function for individual item in array: " + item);
    return '#' + item;
});
console.log("And here's the result");
console.log(result);

The map function lets you transform each element of an array. Its first argument is the array you want to transform. The second argument is a function. This is the “functions as variables” idea we talked about earlier. When you pass a function as an argument, it’s often called a callback function.

The underlying code in map will call this function once for each value in the array. The return value of this function will become the new value of the array element. The jQuery.map method returns the new array.

The above code produces output that looks something like this.

Called callback function for individual item in array: a
Called callback function for individual item in array: b
Called callback function for individual item in array: c

And here's the result
Array(3) [ "#a", "#b", "#c" ]

i.e. our program calls the callback three times, and the map function returns an array where each element is the return value from one of those callback calls.
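To make the mechanics concrete, here's a rough sketch of what a map function might look like on the inside. simpleMap is a hypothetical stand-in, not jQuery's actual implementation (the real jQuery.map has extra behavior this sketch ignores). It only shows the "call the callback once per element, collect the return values" loop.

```javascript
// `simpleMap` is a hypothetical, simplified map. It is NOT jQuery's code.
function simpleMap(items, callback) {
    var mapped = [];
    for (var i = 0; i < items.length; i++) {
        // call the callback once per element and keep whatever it returns
        mapped.push(callback(items[i]));
    }
    return mapped;
}

var simpleResult = simpleMap(['a', 'b', 'c'], function (item) {
    return '#' + item;
});
console.log(simpleResult); // [ '#a', '#b', '#c' ]
```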

Async Callbacks

The second async hurdle is understanding the fundamental idea behind async — namely that your code isn’t always going to run in the order you write it in. Consider this imaginary three function program

doSomethingImportant();
doSomethingImportantButSlow();
returnOutputToUser();
//program exits

Three function calls — the first one does something, the second one does something but is slow to execute, and the final one returns output to the user. As written, the program will need to wait for the slow function before it can send its output.

In async programming, we defer the running of doSomethingImportantButSlow until after the output is returned to the user.

doSomethingImportant();
setTimeout(doSomethingImportantButSlow, 0);
returnOutputToUser();
//program exits

By passing the slow function to setTimeout, our main program is able to get to returnOutputToUser almost immediately. Behind the scenes javascript schedules doSomethingImportantButSlow to run via the event loop (which is outside the scope of this article).
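If you'd like to watch the reordering happen, here's a small runnable sketch. The three function bodies are placeholders I've made up. Each one just records its name in an array so we can inspect the order afterward.

```javascript
var order = [];

// Placeholder implementations: each one only records that it ran.
function doSomethingImportant() { order.push('important'); }
function doSomethingImportantButSlow() { order.push('slow'); }
function returnOutputToUser() { order.push('output'); }

doSomethingImportant();
setTimeout(doSomethingImportantButSlow, 0);
returnOutputToUser();

// The deferred function has not run yet at this point.
console.log(order); // [ 'important', 'output' ]

// Check again a little later, after the deferred function has had its turn.
setTimeout(function () {
    console.log(order); // [ 'important', 'output', 'slow' ]
}, 10);
```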

Asynchronous programming doesn’t reduce the execution time of your program. Instead it just defers slower code, and ideally defers it to a point in your program where users don’t experience any slowdown/blocking waiting for your code to finish. It’s a similar concept to using job queues to defer long running work in traditional MVC applications.

The trick with embracing asynchronous programming is you never know what’s going to be the slow part of your program, so some proponents recommend you make as much as possible asynchronous, and design your APIs accordingly. Whenever you use a callback in a function you need to be ready for the underlying library to schedule that callback via setTimeout (or the related setImmediate).
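Here's a sketch of what that can look like from the caller's side. forEachDeferred is a hypothetical library function I've invented for this example. It schedules every callback with setImmediate, so none of them have run by the time the call returns.

```javascript
// `forEachDeferred` is a hypothetical library function, for illustration only.
function forEachDeferred(items, callback) {
    items.forEach(function (item) {
        // the library, not the caller, decides when the callback runs
        setImmediate(function () {
            callback(item);
        });
    });
}

var seen = [];
forEachDeferred(['a', 'b'], function (item) {
    seen.push(item);
});

// Nothing has been collected yet: the callbacks are only scheduled.
console.log(seen.length); // 0

setImmediate(function () {
    console.log(seen); // [ 'a', 'b' ]
});
```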

Async Map

Let’s take a look at the above map call, but implemented in an asynchronous way. The NodeJS package async has a huge collection of tools for programming in an asynchronous manner, including a version of map.

First, we’ll install the async package. We’re going to use version 2.6.2 — the 3.0 version shipped less than a month ago and I’m still wrapping my head around some of the things it’s changed.

$ mkdir working-folder
$ cd working-folder
$ npm install async@2.6.2

and then we’ll write a version of our map program above using the map method

// File: hello-async.js
const async = require('async');
var result = async.map(['a','b','c'], function(item){
    console.log("Called callback function for individual item in array: " + item);
    return '#' + item;
});
console.log("And here's the return value of async.map");
console.log(result);

If we run the program, we’re in for a surprise

$ node hello-async.js
Called callback function for individual item in array: a
Called callback function for individual item in array: b
Called callback function for individual item in array: c
And here's the return value of async.map
undefined

We can see the output from all our console.log calls, but the result variable is undefined. That’s because async.map doesn’t return the new array.

So how do we get our returned array? It turns out the async.map method accepts a third argument, which is also a callback function. From the async docs

A callback which is called when all iteratee functions have finished, or an error occurs. Results is an Array of the transformed items from the coll. Invoked with (err, results).

So let’s try using this callback. We’ll change our program to add a third argument.

// File: hello-async.js
const async = require('async');
var result = async.map(
    ['a','b','c'],
    function(item){
        console.log("Called callback function for individual item in array: " + item);
        return '#' + item;
    },
    // our new function
    function(err, results) {
        console.log("An error?");
        console.log(err);
        console.log("Our Results?");
        console.log(results);
    }
);
console.log("And here's the return value of async.map");
console.log(result);

and look at the output …

$ node hello-async.js
Called callback function for individual item in array: a
Called callback function for individual item in array: b
Called callback function for individual item in array: c
And here's the return value of async.map
undefined

Huh. The second callback wasn’t called. We’re still stuck.

Continuation Passing

It turns out that, unlike jQuery’s map (or PHP’s array_map function, etc.), the first callback to async.map (the one that does the work, which the async docs call the iteratee) should not return a result. Returning from this function won’t cause an error, but it also won’t perform the mapping we want.

In addition to the array element, the underlying async library passes the iteratee function a second argument, which is another callback function. Instead of returning a value from our callback, we need to call this function with our value.

function(item, callback){
    console.log("Called callback function for individual item in array: " + item);
    // not like this
    // return '#' + item;

    // but like this
    callback(null, '#' + item);
},

The second argument to this function should be the value you want to map into your array. You can use the first argument to indicate that an error happened. Passing null signals that everything went fine and there were no errors.
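To see the error argument in action without pulling in a package, here's a sketch built on tinyMap, a hypothetical and heavily simplified stand-in that I wrote for this example. It is not the async package's implementation. It only follows the same (err, value) convention, and I'm assuming "stop at the first error" as the behavior.

```javascript
// `tinyMap` is a hypothetical, simplified stand-in for async.map.
// It is NOT the async package's code.
function tinyMap(items, iteratee, done) {
    var results = [];
    var remaining = items.length;
    var failed = false;

    if (remaining === 0) {
        return done(null, results);
    }

    items.forEach(function (item, index) {
        iteratee(item, function (err, value) {
            if (failed) { return; }   // an error was already reported
            if (err) {
                failed = true;
                return done(err);     // the first error ends the mapping
            }
            results[index] = value;
            remaining--;
            if (remaining === 0) {
                done(null, results);
            }
        });
    });
}

var outcome = {};
tinyMap(['a', '', 'c'], function (item, callback) {
    if (item === '') {
        // first argument set: something went wrong
        return callback(new Error('empty item'));
    }
    // first argument null: no error, second argument is the mapped value
    callback(null, '#' + item);
}, function (err, results) {
    outcome.err = err;
    outcome.results = results;
    console.log(err ? 'Failed: ' + err.message : results); // Failed: empty item
});
```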

This style of programming, using other functions to communicate a function’s result, is sometimes called continuation passing.

Let’s take a look at our full program using the correct syntax.

// File: hello-async.js
const async = require('async');
var result = async.map(
    ['a','b','c'],
    function(item, callback){
        console.log("Called callback function for individual item in array: " + item);
        // not like this
        // return '#' + item;

        // but like this
        callback(null, '#' + item);
    },
    function(err, results) {
        console.log("An error?");
        console.log(err);
        console.log("Our Results?");
        console.log(results);
    }
);
console.log("And here's the return value of async.map");
console.log(result);

If we try running this program, we’ll see the following output.

Called callback function for individual item in array: a
Called callback function for individual item in array: b
Called callback function for individual item in array: c
An error?
null
Our Results?
[ '#a', '#b', '#c' ]
And here's the return value of async.map
undefined

Hooray! Our result callback received the modified/mapped array.

While there’s a lot more to explore with even this simple program, we’ll close with one final example. Consider our program’s output if the work inside our iteratee callback is, itself, pushed into async space.

function(item, callback){
    setTimeout(function(){
        console.log("Called callback function for individual item in array: " + item);
        // not like this
        // return '#' + item;

        // but like this
        callback(null, '#' + item);
    }, 0);
}

With this callback in place, our program output would look like the following.

And here's the return value of async.map
undefined
Called callback function for individual item in array: a
Called callback function for individual item in array: b
Called callback function for individual item in array: c
An error?
null
Our Results?
[ '#a', '#b', '#c' ]

That is, we see the final line of logging (And here’s the return value of async.map) first, and it’s only once the main execution finishes that all our callback code runs.

While the patterns in the async package are a little more complicated than you might be used to, it’s situations like these that they’re designed to handle.

The Continuation Passing Grammar Problem

There’s a lot to critique in async programming. It makes your program’s execution harder to follow, complex logic can lead to many layers of nested callbacks, and things that seem guaranteed and deterministic might not be.

For me though, having spent the past month doing my first real “deep dive” into an asynchronous system, I think the biggest change-in-thinking that continuation style async programming requires is the switch from return values to callbacks.

In most programming languages — a function returns a value. You can rely on this — the syntaxes across languages are mostly the same

# Perl
sub someSubRoutine() {
    #...
    return $value;
}

# PHP
function someFunction() {
    //...
    return $value;
}

# C
int someFunction() {
    //...
    return value;
}

# python
def someFunction():
    #...
    return value

# ruby
def someFunction
    #...
    value   # ruby returns the last evaluated expression --
            # no need for a return.  So beautiful. Perl can
            # do this as well
end

# etc...

With continuation passing, at least as practiced in javascript, you’ve swapped a return statement for a function call

var someFunction = function(param1, param2, callback) {
    //...
    callback(null, value);
}

someOtherFunction(someFunction);

By itself, not a big deal. Except — the grammar/semantics of how to use your callback is subject to the whims of the folks implementing the library you’re using.

var someOtherFunction = function(f) {
    //...
    var callback = function(err, result) {
        if(err) {
            // error handling code
            return;
        }
        // value handling code
    }
    f(param1, param2, callback);
}

In the above example — it’s the someOtherFunction function that decides how the callback function works. The convention of an error being the first parameter and the value being the second parameter is subject to the whims of library and program authors instead of being something that’s built into the language.
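Here's a small sketch of how much that can vary. Both functions below are hypothetical. One uses a single error-first callback, the other takes separate success and failure callbacks. The same value comes back either way, but the caller has to know which convention applies before writing a single line.

```javascript
// Both functions are hypothetical. They only illustrate that the calling
// convention is chosen by whoever writes the function.

// Convention 1: a single callback, error first
function loadErrorFirst(callback) {
    callback(null, 'value');
}

// Convention 2: separate success and failure callbacks
function loadTwoCallbacks(onSuccess, onFailure) {
    onSuccess('value');
}

var received = [];

loadErrorFirst(function (err, result) {
    if (err) {
        // error handling code
        return;
    }
    received.push(result);
});

loadTwoCallbacks(
    function (result) { received.push(result); },
    function (err) { /* error handling code */ }
);

console.log(received); // [ 'value', 'value' ]
```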

This means when you’re exploring a new codebase you can never be certain of what a callback’s purpose is, which, fair enough, that’s programming. But we’re replacing one relatively simple thing that requires us to understand basic programming principles (returning a value) with one that requires us to have a full understanding of the application libraries we’re using before being able to line-read the code.

In full fairness, the NodeJS community has done a lot to promote the use of error-first callbacks and this can be an elegant pattern. However, it seems like a pattern that’s best left to framework/systems code vs. being a pattern intended for main application flow.

Copyright © Alan Storm 1975 – 2019 All Rights Reserved

Originally Posted: 17th June 2019