LinkedIn for iPad: The Native/Web Messaging Bridge and WebSockets

May 9, 2012

This is the third article in a series of blog posts about LinkedIn's new iPad app. In the first post, we discussed how we built a snappy mobile experience using HTML5 local storage. In the second post, we covered the challenges we faced in the implementation of smooth infinite scrolling in HTML5. In this article, we are going to discuss the messaging layer between the native and web portions of the app and how we've used WebSockets to provide near-instant communication across the native bridge.

The Way It Was: URL Scheme Navigation

If you've developed any iOS apps that embed a UIWebView, you're likely familiar with the limitations of communicating with the UIWebView. Sure, it's easy to send a message or call a JavaScript function inside the UIWebView with stringByEvaluatingJavaScriptFromString:, but communication from the JavaScript to the native Objective-C is clunky and awkward.

The most common way of communicating from JavaScript to native in iOS is via the URL Scheme pattern. On the JavaScript side, you have something like:
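
A minimal sketch of the JavaScript side, assuming a hypothetical scheme name `myapp`, a hypothetical `message` host, and a hypothetical helper `buildSchemeUrl` (none of these names come from the app itself). The navigation is passed in as a function so the sketch can run outside a browser; inside a UIWebView it would typically assign the URL to `window.location.href`.

```javascript
// Hypothetical scheme name; the app's real scheme is not shown in this post.
const SCHEME = "myapp";

// Encode a message object as a URL that the native side can trap and parse.
function buildSchemeUrl(message) {
  const payload = encodeURIComponent(JSON.stringify(message));
  return SCHEME + "://message?payload=" + payload;
}

// Send a message by navigating to the custom URL.
// `navigate` is injected; in a web view it might be:
//   function (url) { window.location.href = url; }
function sendViaUrlScheme(message, navigate) {
  navigate(buildSchemeUrl(message));
}
```

Note that the whole JSON payload has to be URL-encoded, which is one of the costs this article returns to later.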

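And on the native side, you must trap that navigation in the UIWebView delegate's webView:shouldStartLoadWithRequest:navigationType: method, inspect the URL scheme, and return NO so that the web view does not actually try to load the URL.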

While this method served us well in the iPhone app, we began to run into issues in our messaging-heavy iPad app. Speed was definitely a concern. Delivering a message over the URL scheme bridge could consume over 100 ms. Aside from that, however, we faced two major issues:

  • Sending too many messages, too quickly, would cause unpredictable behavior.

    If the first URL navigation hadn't completed before the second one was sent, the first would be canceled before it reached native, and we would only receive the second message. In short, we were losing messages.

  • After sending a message, it would often seem like our UIWebView had died. It would no longer render new views and generally exhibited odd behavior.

    Once an HTML document attempts navigation in iOS, iOS assumes that the document is no longer required and disables certain functionality, such as innerHTML setting and timers. Thankfully, Alexandre Poirot had already run into this issue and posted a solution: performing the navigation in an iframe.
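
The iframe workaround can be sketched roughly as follows. This is an illustration rather than the app's actual code: `navigateInIframe` is a hypothetical name, and the `doc` parameter stands in for the page's `document` so the sketch can be exercised outside a browser.

```javascript
// Navigate a throwaway iframe instead of the main document, so the main
// document itself never attempts a navigation.
function navigateInIframe(doc, url) {
  const iframe = doc.createElement("iframe");
  iframe.style.display = "none";
  iframe.src = url;
  // Attaching the iframe is what triggers the request the native side traps.
  doc.body.appendChild(iframe);
  // The iframe has served its purpose once the request has been issued.
  doc.body.removeChild(iframe);
}
```

In a web view this would be called as `navigateInIframe(document, url)`.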

Enter WebSockets

We still needed a faster and more reliable messaging system. Since HTML5 offers WebSockets as a new feature, we decided to see if it would meet our needs. We expected that WebSockets would have the following advantages:

  • WebSockets can communicate asynchronously from JavaScript to native.
  • WebSockets don't have a payload limit.
  • WebSockets don't require us to encode our JSON strings as URL parameters.
  • WebSockets should be faster than URL scheme navigation.

Traditionally, WebSocket clients (our iPad app in this case) speak to a WebSocket server that is remote. Since our goal was communication between the native and web halves of the app, the server needed to be the app itself.

The first step was to look for an existing WebSocket server written in Objective-C that could easily be dropped into the app. Though many WebSocket clients existed, there hadn't been much work done yet in the area of WebSocket servers, so we resolved to write our own.

A Brief Overview of WebSockets

The WebSocket implementation in HTML5 is not as straightforward as one might hope. There are obviously two sides to the implementation, the client and the server. The client initiates contact with the server by creating a new WebSocket object in JavaScript, specifying a ws:// URL. The client will then send a handshake "challenge" to the server. Once the server completes the challenge and sends back the appropriate response, the connection between the two endpoints is established.
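
From the client's point of view, opening the connection can be sketched as below. The loopback address, the port `8080`, the path `/bridge`, and the name `openBridgeSocket` are all assumptions made for illustration. The `WebSocketImpl` parameter stands in for the browser's `WebSocket` constructor so the sketch does not depend on a running server.

```javascript
// Open a connection to the in-app server and wire up basic handlers.
function openBridgeSocket(WebSocketImpl, onMessage) {
  // Assumed address: the server runs inside the app, so it is reached locally.
  const socket = new WebSocketImpl("ws://127.0.0.1:8080/bridge");
  socket.onopen = function () {
    // Fired only after the server has answered the handshake challenge.
    socket.send(JSON.stringify({ type: "hello" }));
  };
  socket.onmessage = function (event) {
    onMessage(event.data);
  };
  return socket;
}
```

In a web view the call would look like `openBridgeSocket(WebSocket, handler)`.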

Once established, this connection does not behave like a normal TCP socket connection. Rather, communication between the two sides occurs via a series of frames. In the protocol version we implemented, a text frame starts with a 0x00 byte and ends with a 0xFF byte. Because of this framing, straightforward binary communication is impossible, making WebSocket really only useful for transporting text.
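
The framing described above can be sketched in a few lines. The function names `encodeFrame` and `decodeFrames` are hypothetical, and the sketch assumes UTF-8 text payloads (the 0xFF byte never occurs in valid UTF-8, which is what makes it usable as a terminator).

```javascript
// Wrap a text message in a frame: 0x00, the UTF-8 bytes, then 0xFF.
function encodeFrame(text) {
  const body = new TextEncoder().encode(text);
  const frame = new Uint8Array(body.length + 2);
  frame[0] = 0x00;
  frame.set(body, 1);
  frame[frame.length - 1] = 0xff;
  return frame;
}

// Decode every complete frame in a byte buffer.
// Returns the decoded messages plus the trailing bytes of any incomplete frame.
function decodeFrames(bytes) {
  const messages = [];
  let pos = 0;
  while (pos < bytes.length) {
    if (bytes[pos] !== 0x00) throw new Error("expected frame start byte 0x00");
    const end = bytes.indexOf(0xff, pos + 1);
    if (end === -1) break; // the rest of this frame has not arrived yet
    messages.push(new TextDecoder().decode(bytes.subarray(pos + 1, end)));
    pos = end + 1;
  }
  return { messages: messages, rest: bytes.subarray(pos) };
}
```

Returning the unconsumed remainder lets a server keep partial frames around until more bytes arrive on the socket.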

WebSockets were not standardized when we began work on the iPad app. In fact, WebSockets have undergone a number of changes to the handshake protocol, and different browsers have implemented different versions, making WebSockets ill-suited for cross-browser use. Though RFC 6455 has now standardized the handshake and other details, iOS uses the hybi-00 variant of the handshake protocol. Thus, our implementation does, too.

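The handshake itself is a rather bizarre process. When the client connects to the WebSocket server, it sends a series of HTTP-like headers, including two key headers (Sec-WebSocket-Key1 and Sec-WebSocket-Key2) that carry random strings, followed by eight random bytes in the body of the message.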

The server's job is then to process the two Sec-WebSocket-Key values. For each key, it concatenates the digits found in the random string into a single number and divides that number by the number of spaces in the string, which yields a 32-bit integer. The two integers are then concatenated as 4-byte big-endian values, followed by the eight random bytes from the body of the message. All of this is then MD5 summed, and the resulting 16 bytes are returned in the body of the response.

The client validates that all of the math was performed correctly and either allows or disallows the connection accordingly.
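
The challenge computation can be sketched as follows. This is an illustration in JavaScript for Node.js (it uses Node's `crypto` module and `Buffer`), not our Objective-C server code; the function names `keyToNumber` and `challengeResponse` are hypothetical.

```javascript
const crypto = require("crypto");

// Reduce one key header to its number: read the digits as one decimal
// number, then divide by the count of spaces in the header value.
function keyToNumber(key) {
  const digits = key.replace(/[^0-9]/g, "");
  const spaces = (key.match(/ /g) || []).length;
  if (digits === "" || spaces === 0) {
    throw new Error("malformed handshake key");
  }
  const value = Number(digits);
  if (value % spaces !== 0) {
    throw new Error("key number is not a multiple of its space count");
  }
  return value / spaces;
}

// Build the 16-byte response body: MD5 over the two numbers written as
// 32-bit big-endian integers, followed by the 8 bytes from the request body.
function challengeResponse(key1, key2, key3Bytes) {
  if (key3Bytes.length !== 8) {
    throw new Error("request body must be exactly 8 bytes");
  }
  const input = Buffer.alloc(16);
  input.writeUInt32BE(keyToNumber(key1), 0);
  input.writeUInt32BE(keyToNumber(key2), 4);
  Buffer.from(key3Bytes).copy(input, 8);
  return crypto.createHash("md5").update(input).digest();
}
```

For example, with the sample key from the hybi-00 draft, `keyToNumber("4 @1  46546xW%0l 1 5")` evaluates to 829309203, which is 4146546015 divided by the key's 5 spaces.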

WebSocket Issues

It wasn't long after we implemented and began using WebSockets in the early iPad app that we ran into our first problem. One of the developers couldn't get the app to work at all if WebSocket support was enabled. On every launch, the app would immediately crash. This was puzzling, indeed.

It turns out that if the device (or the host computer, in the case of the simulator) has a proxy defined, the WebSocket client will crash in the JavaScript as soon as it's created. Obviously, this was not good news, and the bug is in the iOS WebSocket implementation rather than in our code.

We also ran into other issues with WebSockets. For example, a sleeping device could sometimes put the open connection into an invalid state. If we attempted to use the connection before discovering its invalid state, that would also lead to a crash. Basically, anything that goes wrong with a WebSocket inside the JavaScript leads to a crash.

Because of this, and because of certain issues we encountered when communicating from the native side to the JavaScript side (for example, moving the UIWebView between view controllers while messages are pending on the WebSocket causes a crash), we only use WebSockets one-way, from client to server, for now.

But even after addressing all of the other issues, the proxy issue remained. We thus had to support two modes of operation. If the user had a proxy configured, we would detect that and disable WebSockets. That, however, put those users back on the previously problematic URL scheme implementation, with its lost-messages issue.

The Hybrid Approach

What we have now is a hybrid approach to messaging. If WebSockets can be used, we use them. If they cannot, we disable them and fall back to a variant of our previous URL scheme implementation.

We called the new URL scheme implementation webmsg://. Basically, it removes the payload from the URL. Instead, when we want to send messages to the native app, we put them into a FIFO queue (a JavaScript array from which we splice off the front). We then trigger a webmsg:// navigation on an iframe. Now, it doesn't matter if the first navigation gets canceled. When the second one makes it through, that's all we need. Whenever any webmsg:// navigation is received by the native side, it uses stringByEvaluatingJavaScriptFromString: to remove the first item from the queue and process it. It repeats this until there are no more items in the queue. Though this still has the speed issues inherent to the URL scheme approach, when multiple messages are queued, it's considerably faster than the previous approach.
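
The queueing idea can be sketched on the JavaScript side like this. The names `createWebMsgQueue`, `send`, and `dequeue`, and the `webmsg://poke` URL, are hypothetical; `navigate` stands in for the iframe navigation. The sketch uses `shift()` to take items off the front of the array.

```javascript
// A FIFO of pending messages plus a payload-free "poke" navigation.
function createWebMsgQueue(navigate) {
  const queue = [];
  return {
    // Called by web code: queue the message, then poke the native side.
    // The URL carries no payload, so a canceled poke loses nothing.
    send: function (message) {
      queue.push(JSON.stringify(message));
      navigate("webmsg://poke");
    },
    // Called by native code through stringByEvaluatingJavaScriptFromString:
    // repeatedly, until it returns an empty string.
    dequeue: function () {
      return queue.length > 0 ? queue.shift() : "";
    },
  };
}
```

Because every poke drains the whole queue, a single navigation that reaches the native side is enough to deliver all messages queued so far.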

However, webmsg:// is different from WebSocket. WebSocket is inherently asynchronous: there's no way to block on the sending of a message and its processing. Our implementation of webmsg://, by contrast, is synchronous. When we receive the navigation event, we immediately process the messages before returning control to JavaScript. Currently, we take advantage of this distinction to allow our app to select the communication method on the fly, particularly for our transitions, which benefit from synchronous processing.
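
One way to picture the on-the-fly selection is the sketch below. It is an assumption about shape, not our actual code: `createBridge`, the `sync` option, and the `isAvailable` and `send` methods on each transport are hypothetical names.

```javascript
// Pick a transport per message: webmsg:// when the caller needs synchronous
// processing or when WebSockets are unavailable, WebSocket otherwise.
function createBridge(socketTransport, webMsgTransport) {
  return {
    send: function (message, options) {
      const needsSync = Boolean(options && options.sync);
      const useSocket = !needsSync && socketTransport.isAvailable();
      const transport = useSocket ? socketTransport : webMsgTransport;
      transport.send(message);
      // Report which path was taken, which is handy when debugging.
      return useSocket ? "websocket" : "webmsg";
    },
  };
}
```

A transition might call `bridge.send(msg, { sync: true })`, while ordinary messages omit the option and go over the WebSocket whenever it is usable.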

Revisiting our Assumptions

We started our WebSocket investigation with four assumptions. So, did they all hold up?
  • WebSockets can communicate asynchronously from JavaScript to native. Check.
  • WebSockets don't have a payload limit

    They don't need to (bounded by memory, of course). However, we've defined a limit in our implementation that is sufficiently large for our purposes.

  • WebSockets don't require us to encode our JSON strings as URL parameters. Check.
  • WebSockets should be faster than URL scheme navigation

    They are definitely better when you consider that our original URL scheme method lost messages. However, in a full round-trip scenario, where we send a message to the native code and then wait for a callback message from the native code, was our WebSocket implementation faster? Here are the numbers from 100 such round-trip messages, measured on the new iPad:

                Original    webmsg://    WebSocket
    Fastest     41ms        51ms         44ms
    Slowest     130ms       154ms        68ms
    Average     54.65ms     61.46ms      48.09ms

    On average, WebSocket is faster, but only negligibly so. However, it is far more consistent than either of the URL scheme implementations, which had widely varied timings. That, coupled with the asynchronous behavior, makes WebSockets a win for many use cases.

Conclusion

Our use of WebSockets is still evolving, and we are working hard to resolve the transition issues and enable bi-directional communication so that we only need to use webmsg:// when proxies are enabled. We are also working toward open-sourcing our WebSocket server implementation so that others can build upon the work we've begun.

Acknowledgements

Special thanks go to Trunal Bhanse, Akhilesh Gupta, Aarthi Jayaram, Kiran Prasad, and Ganesh Srinivasan for their help in identifying and debugging WebSocket issues and for their help with this post.
