
Highly responsive ajax applications without excess requests and bandwidth waste?

Ajax is not as powerful as one would imagine

If you have ever tried to develop a robust ajax application, you have probably reached a point where you want the client to be notified about a change on the server in “real time”. The simplest example – you need to create a chat application and want the client to instantly receive new messages upon their arrival on the server. And if you, like me, are not a fan of Flash ( which may be one of your major flaws ) and want to achieve this using only ajax techniques, you have found out that it is not that simple.

There is no real channel between the server and the client

In that case you have no real permanent connection between the server and the client and there is no real way of “triggering” events or sending data from the server to the client in any traditional way. Moreover, ajax does not support the client “listening” and waiting for the server’s call ( well, it is a client after all ). You googled for some time, asked some fellow developers, and almost all you could find was – “Just use javascript’s (notorious) setTimeout() or setInterval() function to repeatedly poll the server and check whether there is new data for the client”.

Like, for example ( using jQuery ):

function read(){
  jQuery.ajax({
    'url'     : '/chat/read/',
    'cache'   : false,
    'dataType': 'json',
    'success' : function(messages){
      // if there are new messages - write them on the wall
      if( messages.length ){
        // write messages to the chat wall
      }
    }
  });
}

// poll the server every 2 seconds
setInterval( read, 2000 );

( I will not show any server-side code examples, as they are simple enough – the method behind /chat/read/ would just read the new records from the database and return them. Moreover, this is not what we are trying to achieve – the example here is for descriptive purposes only. )

Is that what we call a highly responsive, flexible application, you’d ask? No. This technique has several drawbacks:

  • it sends an http request from the client to the server and gets a response ( most of the time an empty one ) every X seconds. X is usually far too small a number, and that can translate into way too much wasted bandwidth if we are talking about a high-traffic website, or if we need that polling done for multiple actions
  • it can eventually make the client hang, as most modern browsers ( except for Chrome – or at least that is what Chrome claims ) don’t have smart javascript garbage collection, and all those requests can become quite costly
  • it is not really highly responsive – it has a lag of up to 2 seconds, which may be enough to make the users nervous
  • this just doesn’t feel like the right way of doing things

Comet

Well, I need to be honest – the Comet technique ( also known as long-polling ) is nothing new, but I personally was not aware of its power till recently ( Eelco pointed it out to me ). You would be amazed how many open- and closed-source projects have no idea that it is out there and do not try to use it. And it is simple – you make a request to the server, which receives it and just… waits. It waits till there is meaningful data to be returned, or returns something less useful ( like null ) if the request is about to time out. On a standard apache configuration that would mean 5 minutes ( the default Timeout is 300 seconds ) – so you can make just one single request every 5 minutes instead of one every 2 seconds ( which a traditional ajax app would do ). That is 150+ times fewer requests, less bandwidth, fewer javascript function calls, and the application would feel much more responsive.
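Stripped of any framework, the core of the technique is just a loop on the server. Here is a minimal sketch in plain PHP ( fetch_new_messages() is a hypothetical stand-in for whatever actually checks your storage for new data ); the full CakePHP version follows below:

set_time_limit( 0 );  // don't let php's own execution limit kill us first
$start   = time();
$timeout = 300;       // apache's default Timeout - 300 seconds

while( !connection_aborted() ){
  $messages = fetch_new_messages(); // hypothetical helper
  if( !empty( $messages ) ){
    // meaningful data has arrived - return it and end the request
    echo json_encode( $messages );
    exit( 0 );
  }
  if( time() > $start + $timeout ){
    // about to time out - return something less useful instead
    echo json_encode( array() );
    exit( 0 );
  }
  sleep( 1 ); // give the CPU a break between checks
}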

There is, however, a small side note to mention…

PHP Sessions are not keen on long-polling

And, if you hadn’t read this post, you’d have had a hard time finding this out. It is just that in every normal application, you can be 97% sure you will be using sessions. And most of the time, you will rely on the built-in php session handler. So far, so good. But the small problem is that you cannot simultaneously run different scripts ( or more than one instance of a script ) which use one and the same user session ( that, again, was a discovery by Mr. Pagebaker ). The default handler locks the session file for the whole duration of the request, so PHP will just queue all subsequent requests which try to interact with that session until the first one has finished. Well, that is not much of an asynchronous technique either, then. Getting back to our chat application – if you want to long-poll the server and wait for new messages ( what we will call read() ) and simultaneously send new messages to the server ( write() ), the application will not write() until the read() request has finished ( which, in the worst case of no new data to be received, we have set to 5 minutes ) – that is absolutely not what we need.
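If you are stuck with the native handler, there is a workaround worth knowing: read what you need from the session, then release the lock with session_write_close() before entering the long-polling loop. A sketch ( assuming you don’t need to write to the session afterwards ):

session_start();

// grab what we need while we still hold the session lock
$lastMessage_id = isset( $_SESSION['lastMessage_id'] ) ? $_SESSION['lastMessage_id'] : 0;

// release the lock so parallel requests from the same user
// ( like our write() calls ) are no longer queued behind this one
session_write_close();

// ...long-polling loop goes here; note that any $_SESSION writes
// after this point will no longer be persisted

But the cleaner solution is not to use the file-based handler at all…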

Frameworks to the rescue

But, like all self-respecting developers, you have adopted a php framework which provides an alternative to the native php session handler – be it file, database or any other geeky approach. I prefer CakePHP, where it is simple – just open your /app/config/core.php and find

Configure::write('Session.save', 'php');

change that to

Configure::write('Session.save', 'database');

and uncomment the few lines below that line, filling in the respective data.
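In CakePHP 1.x those commented lines look roughly like this ( the exact keys and defaults may differ between versions, so check the comments in your own core.php ):

Configure::write('Session.database', 'default');
Configure::write('Session.table', 'cake_sessions');

CakePHP also ships an SQL file with the sessions table definition – look for sessions.sql under /app/config/sql/ and run it against your database.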

So, now let’s see what we have:

Our client-side script ( jQuery again ):

function read(){
  jQuery.ajax({
    'url'     : '/chat/read/',
    'cache'   : false,
    'dataType': 'json',
    'success' : function(messages){
      if( messages.length ){
        // write to chat wall
      }
      // reconnect immediately and wait for new data
      read();
    },
    'error'   : function(){
      // if the request fails, wait a bit before reconnecting
      // so we don't hammer a struggling server
      setTimeout( read, 5000 );
    }
  });
}

It is important to note that even though I have not shown the write() function, you should use jQuery’s Ajax Queue plugin for it, so that our write requests are sent in good order. That is not needed for read(), as no subsequent read() request will be fired before the previous one has ended.
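A write() could then look something like this – a sketch, assuming the plugin exposes jQuery.ajaxQueue() with the same options as jQuery.ajax() ( as the common versions of it do ), and a hypothetical /chat/write/ action on the server:

function write( text ){
  // queued, so messages reach the server in the order they were typed
  jQuery.ajaxQueue({
    'url'     : '/chat/write/',
    'type'    : 'post',
    'cache'   : false,
    'data'    : { 'message': text },
    'success' : function(){
      // clear the input box, update the UI, etc.
    }
  });
}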

And, our cakephp chat controller method:

public function read(){
  // make sure php's max_execution_time doesn't kill the script
  // before our own timeout does
  set_time_limit( 0 );

  $lastMessage_id = $this->Session->read( 'Conversation.lastMessage_id' );
  if( $lastMessage_id === null ){
    // first call for this session - remember a starting point and return
    $this->Session->write( 'Conversation.lastMessage_id', 0 );
    echo $this->Json->encode( array() );
    exit(0);
  }
  $now     = time();
  $timeout = 300;
  while( !connection_aborted() ){
    $messages = $this->Conversation->read( $lastMessage_id );
    if( !empty( $messages ) ){
      // remember the id of the newest message we have delivered
      $this->Session->write( 'Conversation.lastMessage_id', $messages[ count( $messages ) - 1 ]['Message']['id'] );
      echo $this->Json->encode( $messages );
      exit(0);
    }
    elseif( time() > $now + $timeout ){
      // about to hit the web server's timeout - send an empty response
      echo $this->Json->encode( array() );
      exit(0);
    }
    sleep(1);
  }
}

In short – we get the last message id from the user session, then use it to fetch newer records from the model. If there are none and the user hasn’t left our webpage, we look for new messages once again, and so on. We also keep track of the time we have spent so far in that iteration – we don’t want an ugly timeout, which is why we just send an empty response if we have reached the critical level ( $timeout = 300 seconds in this case ).

For simplicity, I do not use views in this example, but it is highly recommended that you avoid using the JSON component directly and use views instead ( who knows, you might need a different output format in the future ).

Notice the sleep(1) line – we don’t want our CPU to burn out after a few hours of work, so some time for rest is highly recommended. You may, of course, use smaller intervals ( usleep() allows sub-second ones ), but at your own risk.

That is it for now – feel free to comment and thanks for reading 🙂


2 Responses to “Highly responsive ajax applications without excess requests and bandwidth waste?”

  1. frank Says:

    This is an interesting subject. Are you still there? Didn’t see much followup. I was debating developing a mechanism that would synchronize a client and server in a dependency-caching way. In this approach, the client must notify the server that there has been a new action that requires a CRUD routine, and the server in turn would adopt and post back the collection of changes the client had compiled while disconnected.

    My question is, what do you do to address an application that must make requests to the server by the hundreds of thousands per minute? For example, an application with 10 thousand concurrent users, whose browsers are all “short polling” the server every second, will make 600,000 http requests per minute. What kind of overhead would this create when making mini round trips to the database, and is there an alternate approach so that collections or arrays can be used and manipulated on the server without having to batch their data in with SQL each second?

  2. georgi Says:

    @frank: if you have 10 000 concurrent users each second, then you must have quite a web server… In this situation you would need a database cluster ( a whole farm ), as your application is write-intensive. I don’t think you can cache anything, as each user will be served by a new instance of your script and you have no idea what the other users are doing – one would think you could “cache” the data coming from all users in files on the hard drive and then run a script every X seconds which writes those files to the db but, hey – the DB is a file on the hard drive, too – you would be adding an additional layer which ( in my opinion ) will at least not make the process more efficient.
    You can try using a database engine with transaction support ( most of them do have it ) and maybe commit the transaction every X seconds.
