
More intelligent HTTP routing with NodeJS

July 22 2010

Earlier this week, I wrote an article for YDN covering some of the reasons why one might want to run a multi-core HTTP server in NodeJS and some strategies for intelligently allocating connections to different workers. While routing based on characteristics of the TCP connection is useful, the approach outlined in that post has a serious shortcoming - we cannot actually read any data off of the socket when making these decisions. Doing so before passing off the file descriptor would cause the worker process to miss critical request data, choking the HTTP parser.

The above limitation precludes interrogating properties of the HTTP request itself (e.g. headers, query parameters, etc) to make routing decisions. In practice, there are a wide variety of use-cases where this is important: routing by cookie, vhost, path, query parameters, etc. In addition to cache affinity, this can provide some rudimentary forms of access control (e.g. by running each vhost in a process with a different UID or chroot(2) jail) or even QoS (e.g. by running each vhost in a process with its nice(2) value controlled).
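
As a taste of what that looks like, here's a minimal sketch of a cookie-based routing rule that the router below could use in place of its vhost lookup; pickWorker, the session cookie name, and the hashing scheme are all invented for illustration, and workers is assumed to be an array of Worker instances:

var pickWorker = function(headers, workers) {
    // Pin requests carrying a session cookie to a stable worker; assumes
    // header names have already been lower-cased, as in router.js below
    var cookie = headers['cookie'] || '';
    var m = cookie.match(/session=(\w+)/);

    if (!m) {
        // No session cookie; fall back to the first worker
        return workers[0];
    }

    // Cheap, stable hash of the session id onto the worker list
    var h = 0;
    for (var i = 0; i < m[1].length; i++) {
        h = (h + m[1].charCodeAt(i)) % workers.length;
    }

    return workers[h];
};

The same idea applies to anything else the routing process has parsed: path prefixes, query parameters, and so on.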

Naively we could use NodeJS as a reverse HTTP proxy (and a pretty good one, at that), but the overhead of proxying every byte of every request is kind of a drag. As it turns out, we can use file descriptor passing to efficiently hand off each TCP connection to the appropriate worker once we've read enough of the request to make a routing decision. Thus, once the routing process delegates a connection to a worker, that worker owns it completely and the routing process has nothing more to do with it. No juggling connections, no proxying traffic, nothing. The trick is to do this in such a way that allows the routing process to parse as much of the request as it needs to while ensuring that all socket data remains available to the worker.
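
If you just want to see the bare handoff mechanism before HTTP parsing enters the picture, a stripped-down sketch looks something like this; the echo worker and its filename are invented for illustration, but it uses the same postMessage(data, fd) / msg.fd convention as the full code below:

// routing process: accept connections and immediately hand off the fd
var net = require('net');
var path = require('path');
var Worker = require('webworker/webworker').Worker;

var w = new Worker(path.join(__dirname, 'echo-worker.js'));

net.createServer(function(s) {
    // Stop reading so that the worker sees every byte the client sends
    s.pause();
    w.postMessage('', s.fd);
}).listen(8080);

// echo-worker.js: adopt the descriptor and own the connection outright
var net = require('net');

onmessage = function(msg) {
    var s = new net.Stream(msg.fd);
    s.resume();
    s.write('hello from the worker\r\n');
    s.end();
};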

Step by step, we can do the following. Note that this does not work with HTTP/1.1 keep-alive, which sends multiple requests over a single connection - once the descriptor has been handed off, later requests on that connection never pass through the routing process at all.

  1. Accept the TCP connection in the routing process
  2. Set up a data handler for the TCP connection that both retains a record of every byte received and uses a specially-constructed instance of the interruptible HTTP parser (part of NodeJS core) to parse as much of the request as we need
  3. Once we've seen enough of the request, make a routing decision; here we just use the vhost specified in the request
  4. Hand off the file descriptor and all data seen thus far to the worker
  5. In the worker, construct a net.Stream connection around the received FD and use it to emit a synthetic 'data' event to replay data already read off of the socket by the routing process

It's important to note that this does not rely on any modifications to the HTTP stack in the worker - just plain vanilla NodeJS. In order to do this, we have to account for the fact that parsing the HTTP request in the routing process is destructive - it pulls bytes off of the socket that are not available to the worker once it takes over the TCP connection. To make sure that the worker doesn't miss a single byte seen on the socket since its inception, we send over all data seen thus far and replay it in the worker using the synthetic 'data' event.

First, router.js:

var HTTPParser = process.binding('http_parser').HTTPParser;
var net = require('net');
var path = require('path');
var sys = require('sys');
var Worker = require('webworker/webworker').Worker;

var VHOSTS = ['foo.bar.com', 'baz.bizzle.com'];
var WORKERS = {};

VHOSTS.forEach(function(vh) {
    WORKERS[vh] = new Worker(path.join(__dirname, 'worker.js'));
});

net.createServer(function(s) {
    var hp = new HTTPParser('request');
    hp.data = {
        'headers' : {
        },
        'partial' : {
            'field' : '',
            'value' : ''
        }
    };

    // Keep a copy of every byte read in this process so that it can be
    // replayed in the worker
    var seenData = '';

    hp.onURL = function(buf, start, len) {
        var str = buf.toString('ascii', start, start + len);

        if (hp.data.url) {
            hp.data.url += str;
        } else {
            hp.data.url = str;
        }
    };

    // Accumulate header field/value pairs, lower-casing them so that the
    // host lookup below is case-insensitive
    hp.onHeaderField = function(buf, start, len) {
        if (hp.data.partial.value) {
            hp.data.headers[hp.data.partial.field] = hp.data.partial.value;
            hp.data.partial = {
                'field' : '',
                'value' : ''
            };
        }

        hp.data.partial.field += buf.toString(
            'ascii', start, start + len
        ).toLowerCase();
    };

    hp.onHeaderValue = function(buf, start, len) {
        hp.data.partial.value += buf.toString(
            'ascii', start, start + len
        ).toLowerCase();
    };

    hp.onHeadersComplete = function(info) {
        // Clean up partial state
        if (hp.data.partial.field.length > 0 &&
            hp.data.partial.value.length > 0) {
            hp.data.headers[hp.data.partial.field] = hp.data.partial.value;
        }

        delete hp.data.partial;

        hp.data.version = {
            'major' : info.versionMajor,
            'minor' : info.versionMinor
        };

        hp.data.method = info.method;
        hp.data.upgrade = info.upgrade;

        if ('host' in hp.data.headers &&
            hp.data.headers.host in WORKERS) {
            // Stop reading; anything else the client sends sits in the
            // kernel's receive buffer until the worker adopts the fd
            s.pause();

            // Hand the worker the fd along with every byte read so far
            WORKERS[hp.data.headers.host].postMessage(
                seenData, s.fd
            );
        } else {
            s.write(
                'HTTP/' + info.versionMajor + '.' + info.versionMinor + ' ' +
                '400 Host not found\r\n'
            );
            s.write('\r\n');
            s.end();
        }
    };

    s.ondata = function(buf, start, end) {
        // Record the raw bytes for later replay, then feed the parser
        seenData += buf.toString('ascii', start, end);

        var ret = hp.execute(buf, start, end - start);

        if (ret instanceof Error) {
            s.destroy(ret);
            return;
        }
    };
}).listen(8080);

... next, worker.js:

var Buffer = require('buffer').Buffer;
var http = require('http');
var net = require('net');
var sys = require('sys');

var srv = http.createServer(function(req, resp) {
    resp.writeHead(200, {'Content-Type' : 'text/plain'});
    resp.write('Hello, vhost world!\n');
    resp.end();
});

onmessage = function(msg) {
    // Reconstruct a stream around the file descriptor passed by the router
    var s = new net.Stream(msg.fd);
    s.type = srv.type;
    s.server = srv;
    s.resume();

    // Let the HTTP server wire up its parser for this connection, then
    // replay the bytes that the routing process already consumed
    srv.emit('connection', s);
    s.emit('data', msg.data);
    s.ondata(new Buffer(msg.data, 'ascii'), 0, msg.data.length);
};

Keep in mind that this code is a prototype only (please don't ship it - I've left out a lot of error handling for the sake of readability ;), but I thought it was interesting enough to share with a broader audience. This implementation takes advantage of the task management and message passing facilities of node-webworker. It should run out of the box on node-v0.1.100.
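
To try it out, save the two files alongside each other, start the router with node router.js and issue a request with a matching Host header, for example:

  curl -H 'Host: foo.bar.com' http://localhost:8080/

which should come back with 'Hello, vhost world!' from whichever worker owns that vhost; a Host header that isn't in VHOSTS should get the 400 response straight from the routing process.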

Anyway, the key to this is being able to replay the socket's data in the worker. You'll notice in the code above that we're calling net.Stream.pause() once we've received all necessary data in the routing process. This ensures that this process doesn't pull any more data off of the socket. If the kernel's TCP stack receives more data for this socket after we've paused the stream, it will sit in the TCP receive buffer waiting for someone to read it. Once the worker process ingests the passed file descriptor and inserts it into its event loop, this newly-arrived data will be read. In a nutshell, we use the TCP stack itself to buffer data for us. If we really wanted to be clever, we might be able to use recv(2) with MSG_PEEK to look at data arriving on the socket while leaving it for the worker, but I'm not sure how this would play with the event loop.

Finally, while I think this is an interesting technique, it's worth noting that a typical production NodeJS deployment would be behind an HTTP load balancer anyway, to front multiple physical hosts for availability if nothing else. Many load balancers can route requests based on a wide variety of characteristics like vhost, client IP, backend load, etc. However, if one doesn't want/need a dedicated load balancer, or needs very application-specific logic to make routing decisions, I think the above could be a useful tool.