LeoNerd's programming thoughts: 2013/12

2013/12/24

Futures advent day 24

Day 24 - Futures compared to Callbacks

At first glance, futures might seem to offer much the same benefits for managing control flow as callbacks do. In comparison, however, they provide several advantages.

When performing a sequence of many operations using callbacks, the ever-deeper nesting of the callback functions leads to an ugly pyramid of indentation in the source code.

FIRST_CB( $arg1, sub {
  SECOND_CB( $arg2, sub {
    THIRD_CB( $arg3, sub {
      FINISHED()
    });
  });
});

Because futures are connected together using the return value of a function, not through a value passed into it, they can avoid this mess and remain at a fixed indentation level. This also allows, for example, a new stage to be added between existing stages without upsetting the indentation of the following code, making for neater diff output in revision control systems and giving less chance of a merge conflict when branching.

FIRST_F( $arg1 )->then(sub {
  SECOND_F( $arg2 )
})->then(sub {
  THIRD_F( $arg3 )
})->then(sub {
  FINISHED()
});
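As a rough runnable sketch of the same shape, here are three stand-in stages chained at a fixed indentation level. The stage functions first_f, second_f and third_f are hypothetical, invented purely for illustration; each simply returns an already-completed Future so the whole chain runs immediately.

```perl
use strict;
use warnings;
use Future;

my @log;

# Hypothetical stages (not from the original post); each returns an
# immediately-done Future carrying its result
sub first_f  { my ( $n ) = @_; push @log, "first";  return Future->done( $n + 1 ) }
sub second_f { my ( $n ) = @_; push @log, "second"; return Future->done( $n * 2 ) }
sub third_f  { my ( $n ) = @_; push @log, "third";  return Future->done( "result=$n" ) }

# Each ->then block receives the previous stage's result in @_ and
# returns the Future for the next stage
my $f = first_f( 1 )->then( sub {
  my ( $n ) = @_;
  second_f( $n );
})->then( sub {
  my ( $n ) = @_;
  third_f( $n );
});

my ( $result ) = $f->get;
print "$result\n";   # prints "result=4"
```

Adding a fourth stage here would mean appending one more ->then block, leaving the existing lines untouched.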

Moreover, many other shapes of control flow start to look much more like their synchronous counterparts, precisely because they are linked together using the return values out of the individual units and require no other values to be passed in.

Possibly the simplest example of concurrent control flow is a two-way merge, where two operations are started concurrently and we wait for the results of both before continuing. Using callbacks this would need to be solved by each callback storing its result in a variable they both lexically capture, and checking in each whether both results have been provided.

my $one_result; my $two_result;

ONE_CB( sub {
  $one_result = shift;
  if( defined $two_result ) {
    FINISHED($one_result, $two_result);
  }
});
TWO_CB( sub {
  $two_result = shift;
  if( defined $one_result ) {
    FINISHED($one_result, $two_result);
  }
});

Immediately two issues come to light here. First is the repeated FINISHED code - if that were itself a further chain of operations with callbacks, this would be impossible (or at least very tedious) to repeat twice, and of course gets much worse beyond two concurrent branches. Secondly, we are testing the results for definedness - maybe undef is a perfectly valid result from each function. In that case we'd have to track two further variables simply to record whether each operation has completed:

my $one_done; my $one_result;
my $two_done; my $two_result;

ONE_CB( sub {
  $one_result = shift;
  $one_done++;
  if( $two_done ) { ... }
});
...

This example of course only handles the success case. Imagine how much more complex the code would be if each function took two code references, one for success and one for failure, and additionally returned some kind of operation ID that could be used to cancel the operation in progress if it was no longer required. This would now need eight lexically captured variables, adding much more boilerplate control-flow noise to the code. Moreover, now that more variables are shared among the code blocks, there is a greater chance that strong reference cycles will remain after the operation has finished, failed, or been cancelled, retaining an object in memory long after it was required. It may end up looking something like the following (and keep in mind this is the simplest case of two concurrent operations and a single "afterwards"):

my $one_done; my $one_result; my $one_failed; my $one_id;
my $two_done; my $two_result; my $two_failed; my $two_id;

my $finished = sub {
  undef $one_id; undef $two_id;
  FINISHED($one_result, $two_result);
};

$one_id = ONE_CB(
  sub { $one_result = shift; $one_done++;
        $finished->() if $two_done; },
  sub { $one_failed++;
        TWO_CANCEL($two_id) if !$two_done; undef $two_id;
        FAILED() },
);
$two_id = TWO_CB(
  sub { $two_result = shift; $two_done++;
        $finished->() if $one_done; },
  sub { $two_failed++;
        ONE_CANCEL($one_id) if !$one_done; undef $one_id;
        FAILED() },
);

By comparison, the Future needs_all constructor neatly wraps up all this implicit behaviour, removing the control- and data-flow noise from the code, and much more concisely expressing its intent.

Future->needs_all(
  ONE_F(), TWO_F(),
)->then(sub {
  my ( $one_result, $two_result ) = @_;
  FINISHED($one_result, $two_result);
})->else(sub {
  FAILED();
})->get;
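To make that concrete, here is a small sketch using plain pending Futures in place of ONE_F and TWO_F. The variable names are made up for illustration. It completes the two operations in the "wrong" order with undef as one of the results, then shows what happens to a second pair when one of them fails.

```perl
use strict;
use warnings;
use Future;

# Success case: stand-ins for the Futures returned by ONE_F() and TWO_F()
my $one_f = Future->new;
my $two_f = Future->new;

my @finished_args;
my $all_f = Future->needs_all( $one_f, $two_f )->then( sub {
  @finished_args = @_;                 # ( $one_result, $two_result )
  return Future->done( "finished" );
});

$two_f->done( "two" );                 # second operation completes first
my $ready_too_early = $all_f->is_ready;
$one_f->done( undef );                 # undef is a perfectly valid result

# Failure case: when one component fails, the other is cancelled for us
my $three_f = Future->new;
my $four_f  = Future->new;
my $failed_f = Future->needs_all( $three_f, $four_f );

$three_f->fail( "three went wrong" );
my ( $message ) = $failed_f->failure;

print "all done: ",       ( $all_f->is_done        ? "yes" : "no" ), "\n";
print "four cancelled: ", ( $four_f->is_cancelled  ? "yes" : "no" ), "\n";
```

No done flags, no shared result variables and no operation IDs appear anywhere; the results arrive in argument order regardless of which operation happened to finish first.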

So, there we have it. In the past 24 posts we have seen how Futures can neatly express all the various kinds of control-flow logic we typically find in a Perl program, and also express the additional shapes of code we find useful when working with asynchronous and concurrent programming. This neatness ultimately comes from the fact that a Future object is a first-class value representing the operation itself, and with being first-class comes the ability to combine it with others to produce new first-class values to represent combinations of this operation with others.

Futures allow the control- and data-flow structure of a program to be inherently expressed together, describing the dependency relationships between individual operations. Both successful results and failures are automatically propagated up from the atomic units that create them, through the various layers of logic up towards the topmost level of the program. Actions in progress can be abandoned when no longer required, causing a graceful cancellation of the activity that had been pending up until that time.

Futures change state from pending to complete when they are provided with a result, meaning that when they become ready they already have the results stored in them. This makes for convenient control-flow that coincides with data-flow; ensuring that the result of an operation is passed to the next operation in the sequence at the time it is executed. This convenient pairing of control- and data-flow stands in contrast to the split nature of other kinds of concurrency control, such as callback functions or locks and mutexes, which generally only manage the flow of control and require other techniques like lexical variables shared between multiple closures to provide the data flow. Such sharing of mutable state between domains of concurrency is the source of many kinds of concurrency bug which cannot happen with Futures.

In summary, Futures provide a useful abstraction to build all kinds of program logic on top of, whether it is initially intended to be asynchronous or not. Middle-level library modules especially will benefit from using Futures to express intent and combine actions together, as they will then automatically be able to make use of asynchronous and concurrent abilities of the base layers they are built from, without having to expressly depend on those being present.


2013/12/23

Futures advent day 23

Day 23 - Additional Benefits of Futures

Beyond simply being able to replicate regular Perl control-flow styles, building program logic on top of Futures has many additional benefits.

The primary benefit is of course the ability to work asynchronously, allowing multiple operations to be started concurrently and then waited on together. We have seen this with the tree-forming needs_all, needs_any and wait_any constructor methods, and the fmap utility.

my $f = Future->needs_all(
  ONE(), TWO(), THREE(),
);

my ( $one, $two, $three ) = $f->get;

my $f = fmap1 {
  FUNC($_)
} foreach => [ @VALUES ], concurrent => 10;

my @results = $f->get;

Because Futures represent an operation in progress they are an ideal place to provide cancellation logic, allowing the consumer of the would-be result to abandon it and declare it no longer useful. This can be done explicitly by calling the cancel method.

sub PROCESS_REQUEST {
  my ( $req ) = @_;
  my $f = GET_RESULT( $req->PARAMS );

  $f->on_done(sub {
    $req->REPLY( @_ );
  });
  $req->ON_CLIENT_CLOSE(sub {
    $f->cancel;
  });

  return $f;
}
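A minimal sketch of the consumer's side of this, with a hypothetical get_result function standing in for GET_RESULT. It uses on_cancel so the producer can notice that nobody wants the answer any more.

```perl
use strict;
use warnings;
use Future;

my $backend_cancelled = 0;
my @replies;

# Hypothetical stand-in for GET_RESULT: returns a pending Future and
# notes when the consumer abandons it
sub get_result {
  my $f = Future->new;
  $f->on_cancel( sub { $backend_cancelled++ } );
  return $f;
}

my $f = get_result();
$f->on_done( sub { push @replies, @_ } );

# The client goes away before a result arrives
$f->cancel;

# A result turning up late is ignored rather than delivered
$f->done( "too late" );

print "cancelled: ", ( $f->is_cancelled ? "yes" : "no" ), "\n";   # prints "cancelled: yes"
```

In a real program the on_cancel block would be the place to close a socket or remove a timer; here it just counts.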

A failed Future provides an analog to a thrown exception, causing an entire chain or tree of operations to be abandoned and propagating back up towards the caller until a suitable error-handling block is found. In addition however, a failed Future can provide a full list of values as well as a single string. This allows error handlers to be much more fine-grained in their ability to distinguish different types of error.

my $f = GET("http://my-site-here.com/")
  ->else_with_f(sub {
    my ( $f, $failure, $op ) = @_;
    # may be           http, $response, $request
    if( defined $op and $op eq "http" and $_[3]->code == 500 ) {
      say "Server is unavailable";
      return Future->new->done( $HOLDING_PAGE );
    }
    return $f;
  });
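Here is a self-contained sketch of the same idea that runs without any HTTP library. The get_f function and the hash of details are assumptions made for the example; the only real convention relied on is that a failure may carry a message followed by a category name and further detail values.

```perl
use strict;
use warnings;
use Future;

my $HOLDING_PAGE = "<html><body>Back soon</body></html>";

# Hypothetical fetcher that always fails, attaching a category name and a
# details hash after the human-readable message
sub get_f {
  my ( $url ) = @_;
  return Future->fail(
    "500 Internal Server Error", http => { code => 500, url => $url },
  );
}

my $f = get_f( "http://my-site-here.com/" )->else( sub {
  my ( $message, $category, $details ) = @_;

  # Handle only the failures we recognise
  if( defined $category and $category eq "http" and $details->{code} == 500 ) {
    return Future->done( $HOLDING_PAGE );
  }

  # Anything else is passed along unchanged
  return Future->fail( @_ );
});

my ( $page ) = $f->get;
print "$page\n";
```

The handler inspects structured values rather than pattern-matching on the text of an error message, which is the fine-grained distinction the paragraph above describes.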

Middleware library functions can easily be built on top of the basic actions implemented by futures, providing more futures of their own. When writing and testing such libraries it becomes a simple matter to use these futures within the unit tests themselves as a way to mock out responses from lower levels of logic, in order to test the library code in isolation. For example, if we wish to unit-test a middleware function that uses an HTTP user agent to fetch a page, parse it, and return the page title, we can provide a tiny user agent wrapper that just returns a new future and does nothing else:

my $resp_f; my $url;
sub GET {
  ( $url ) = @_;
  return $resp_f = Future->new;
}
...

Our unit test can then drive the behaviour of that "user agent", as well as testing the function's results:

...
my $f = get_page_title( \&GET, "http://my-site-here.com/" );

isa_ok( $f, "Future" );
is( $url, "http://my-site-here.com/" );
ok( defined $resp_f, 'Response future created' );
ok( !$f->is_ready, '$f is not yet ready' );

$resp_f->done( HTTP::Response->new(
  200, "OK", [ Content_type => "text/html" ],
  "<html><head><title>My title</title></head><body /></html>",
));

ok( $f->is_ready, '$f is now ready after HTML response' );
is( scalar $f->get, "My title", '$f->get returns title' );

Because we have a future to represent both sides of the function (its caller and the inner GET function it uses to fetch the page content) we have been able to easily test the function in the middle. The unit test script itself at various times takes on the role either of the outside caller or the inner HTTP user agent, and is able to easily interleave the two to ensure a neat test.
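For completeness, one way such a get_page_title function might be written is sketched below. This is purely a guess at its shape, not the real implementation. To keep the sketch free of other dependencies it assumes the GET future yields the HTML as a plain string rather than an HTTP::Response object, and the "parse" failure category is likewise invented.

```perl
use strict;
use warnings;
use Future;

# Hypothetical middleware function: $get is a code reference that takes a
# URL and returns a Future yielding an HTML string
sub get_page_title {
  my ( $get, $url ) = @_;

  return $get->( $url )->then( sub {
    my ( $html ) = @_;
    my ( $title ) = $html =~ m{<title>(.*?)</title>}is
      or return Future->fail( "No title found", parse => $url );
    return Future->done( $title );
  });
}

# The mock "user agent": remembers the URL and hands back a pending Future
my ( $resp_f, $seen_url );
my $mock_get = sub {
  ( $seen_url ) = @_;
  return $resp_f = Future->new;
};

# Acting as the outside caller
my $title_f = get_page_title( $mock_get, "http://my-site-here.com/" );
my $pending_before = !$title_f->is_ready;

# Acting as the inner user agent
$resp_f->done( "<html><head><title>My title</title></head><body /></html>" );

my ( $title ) = $title_f->get;
print "$title\n";   # prints "My title"
```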

2013/12/24

Day 24 - Futures compared to Callbacks

2013/12/23

Day 23 - Additional Benefits of Futures

2013/12/22

Day 22 - Equivalence of Control Flow

2013/12/21

Day 21 - Implementing Timeouts

2013/12/20

Day 20 - Automatic Cancellation

2013/12/19

Day 19 - Cancellation

2013/12/18

Day 18 - Using fmap Concurrently

2013/12/17

Day 17 - Returning a List of Results

2013/12/16

Day 16 - Iterating Over a List

2013/12/15

Day 15 - Performing an Action Repeatedly

2013/12/14

Day 14 - Waiting for Alternatives

2013/12/13

Day 13 - Waiting For Multiple Futures

2013/12/12

Day 12 - Asynchronous Await

2013/12/11

Day 11 - Transforming Results or Failures

2013/12/10

Day 10 - Conditional Chaining

2013/12/09

Day 9 - The call and wrap methods

2013/12/08

Day 8 - Handling Successes and Failures

2013/12/07

Day 7 - Failures with Values

2013/12/06

Day 6 - Control Flow with Failures

2013/12/05

Day 5 - Causing a Future to fail

2013/12/04

Day 4 - Coping with failure

2013/12/03

Day 3 - Chaining Futures to perform a sequence of actions

2013/12/02

Day 2 - Doing something when a Future completes

2013/12/01

Day 1 - Futures can return values synchronously
