All the source code for this post is available on GitHub: vnc.js and tcp.js.
Last summer, as a LinkedIn intern, I participated in LinkedIn's first public Intern Hackday. Interns from all over Silicon Valley came down to LinkedIn's headquarters in Mountain View to hack all night, eat free food, pound energy drinks, and compete for awesome prizes. With MacBook Airs, iPads, and bragging rights on the line, my team decided to go big. Our goal:
In this post, I'll tell you how we combined Node.js, socket.io, HTML5 canvas, and a solid understanding of the RFB protocol to create vnc.js.
To get a VNC client working in the browser, we needed the following pieces:
- A way to connect to a remote server: the only way to do this within a browser is to use a proxy. We built the proxy using Node.js and established a persistent connection to it using socket.io.
- Implement the RFB protocol: with Node.js abstracting away the TCP connection, the next step was to use the RFB protocol to communicate with the remote server.
- Render the image in the browser: once the server was sending us data, we used the HTML5 canvas element to render it in the browser. This worked well, as we could conveniently transfer the 32-bit pixel data, which arrives in row-major order, directly to the canvas.
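To make the rendering step concrete, here is a minimal sketch of moving 32-bit pixels into a canvas. It is not the code from vnc.js. The function names `rawToRgba` and `drawRect` are hypothetical, and the byte order (blue, green, red, unused) is an assumption that only holds for a little-endian pixel format with red-shift 16, green-shift 8, and blue-shift 0; the real layout is whatever the server announces.

```javascript
// Convert server pixels into the RGBA layout that canvas ImageData expects.
// ASSUMPTION: 32 bits per pixel, little-endian, red-shift 16, green-shift 8,
// blue-shift 0, so each pixel arrives as the bytes [blue, green, red, unused].
function rawToRgba(pixels) {
  const rgba = new Uint8ClampedArray(pixels.length);
  for (let i = 0; i < pixels.length; i += 4) {
    rgba[i] = pixels[i + 2];     // red
    rgba[i + 1] = pixels[i + 1]; // green
    rgba[i + 2] = pixels[i];     // blue
    rgba[i + 3] = 255;           // fully opaque; the server's fourth byte is padding
  }
  return rgba;
}

// Browser-only drawing step (hypothetical helper, needs a 2D canvas context).
// `rect` is assumed to look like { x, y, width, height }.
function drawRect(ctx, rect, pixels) {
  const image = new ImageData(rawToRgba(pixels), rect.width, rect.height);
  ctx.putImageData(image, rect.x, rect.y);
}
```

Because the data is already row-major, the conversion is a single pass with no per-row bookkeeping.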
tcp.js: A TCP proxy written on top of Node.js and Socket.IO
The first step was establishing connectivity between the browser, the Node.js proxy, and a remote VNC host. Communication between the browser and Node.js is handled easily using socket.io. However, communication between Node.js and the VNC host is more complicated: it requires TCP.
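The shape of such a proxy can be sketched in a few lines. This is an illustration under stated assumptions, not the contents of tcp.js: the event names (`data`, `close`, `tcp-error`, `disconnect`) and the helper names are hypothetical, and the sketch assumes the browser link carries text, so binary TCP data is base64-encoded in one direction and decoded in the other. `client` can be any object with socket.io-style `on()` and `emit()` methods.

```javascript
const net = require('net');

// ASSUMPTION: the browser link carries text, so bytes travel as base64 strings.
function encodeForBrowser(buffer) {
  return buffer.toString('base64');
}

function decodeFromBrowser(text) {
  return Buffer.from(text, 'base64');
}

// Relay traffic between one browser connection and one TCP connection.
// Event names are hypothetical, chosen only for this sketch.
function bridge(client, host, port) {
  const tcp = net.connect(port, host);

  // Remote host -> browser
  tcp.on('data', (chunk) => client.emit('data', encodeForBrowser(chunk)));
  tcp.on('close', () => client.emit('close'));
  tcp.on('error', (err) => client.emit('tcp-error', err.message));

  // Browser -> remote host
  client.on('data', (text) => tcp.write(decodeFromBrowser(text)));
  client.on('disconnect', () => tcp.end());

  return tcp;
}
```

With this split, the Node.js side stays ignorant of RFB: it only shuttles bytes, and all protocol logic can live in the browser.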
Implementing the RFB protocol in the browser
Given our time constraints, we decided to implement a subset of the RFB protocol 3.3. Namely, we would only use encryption during the authentication phase (which uses the very simple DES block cipher) and we would only support Raw and CopyRect encodings.
We broke the RFB implementation into the following four steps:
Part 1: Authentication
We supported two different versions of authentication: None and VNC Authentication. There isn't much to say about the former, so I'll outline VNC Authentication. It turns out that the RealVNC RFB spec we were using was not entirely accurate. It lays out the following steps:
- The server sends a random 16 byte challenge.
- The client encrypts that challenge with the user-entered password and sends back the encrypted version.
Unfortunately, the spec left out several very important details:
- The key must be null padded to 8 bytes before encryption. This seems obvious after the fact, but it wasn't called out and had us scratching our heads.
- The encryption must be done in two phases: you must encrypt the low 8 bytes and then the high 8 bytes and append the results into a single 16 byte response.
- The bits of each key byte must be reversed (not inverted). Think bit-level endianness: the most significant bit becomes the least significant bit, and so on.
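The three details above can be sketched as follows. This is a reconstruction for illustration, not the vnc.js implementation. The names `reverseBits`, `vncKey`, and `vncResponse` are hypothetical, and `encryptBlock(key, block)` stands in for whatever single-block DES routine is available; it is passed in rather than implemented here. The sketch also truncates passwords longer than 8 characters, which is an assumption on my part.

```javascript
// Reverse the bit order within one byte: 0b00000001 becomes 0b10000000.
function reverseBits(byte) {
  let out = 0;
  for (let i = 0; i < 8; i++) {
    out = (out << 1) | ((byte >> i) & 1);
  }
  return out;
}

// Build the 8-byte DES key: null-pad (or truncate) the password to 8 bytes,
// then mirror the bits of every byte.
function vncKey(password) {
  const key = new Uint8Array(8); // zero-filled, which provides the null padding
  for (let i = 0; i < 8 && i < password.length; i++) {
    key[i] = reverseBits(password.charCodeAt(i) & 0xff);
  }
  return key;
}

// Encrypt the 16-byte challenge as two independent 8-byte blocks and
// concatenate the results. `encryptBlock` is a hypothetical DES primitive
// that takes (key, 8-byte block) and returns 8 encrypted bytes.
function vncResponse(challenge, password, encryptBlock) {
  const key = vncKey(password);
  const response = new Uint8Array(16);
  response.set(encryptBlock(key, challenge.subarray(0, 8)), 0);  // low 8 bytes
  response.set(encryptBlock(key, challenge.subarray(8, 16)), 8); // high 8 bytes
  return response;
}
```

Keeping the key preparation separate from the cipher makes the bit-reversal step easy to check on its own, which is where the head-scratching tends to happen.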
Part 2: ServerInit and Synchronization
The ServerInit phase is straightforward: the server sends a message with the machine name, the dimensions of the screen, and the pixel format. Once we've received the screen dimensions we create an HTML5 canvas with the same dimensions. Given the time restrictions, we didn't attempt anything fancy like scaling. The advantage of using a canvas that's the exact same size as the remote frame buffer is that mouse movements and clicks are translated 1 to 1.
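For reference, a ServerInit parser might look like the sketch below. The field offsets follow the RFB message layout as I understand it (two 16-bit dimensions, a 16-byte pixel format, then a length-prefixed name, all big-endian); the function name `parseServerInit` and the shape of the returned object are hypothetical, and the input is assumed to be a `Uint8Array` holding the complete message.

```javascript
// Parse a complete ServerInit message held in a Uint8Array.
// Multi-byte fields are big-endian, which is DataView's default.
function parseServerInit(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const nameLength = view.getUint32(20);
  let name = '';
  for (let i = 0; i < nameLength; i++) {
    name += String.fromCharCode(bytes[24 + i]);
  }
  return {
    width: view.getUint16(0),
    height: view.getUint16(2),
    pixelFormat: {
      bitsPerPixel: view.getUint8(4),
      depth: view.getUint8(5),
      bigEndian: view.getUint8(6) !== 0,
      trueColour: view.getUint8(7) !== 0,
      redMax: view.getUint16(8),
      greenMax: view.getUint16(10),
      blueMax: view.getUint16(12),
      redShift: view.getUint8(14),
      greenShift: view.getUint8(15),
      blueShift: view.getUint8(16),
      // bytes 17-19 are padding
    },
    name,
  };
}
```

In the browser, the returned `width` and `height` would then be assigned straight to the canvas element's `width` and `height` properties, giving the 1 to 1 coordinate mapping described above.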
Part 3: Continuous Frame Buffer updates
The update process works as follows: the client tells the server which encoding methods it supports (in RFB this is the SetEncodings message), then sends a FrameBufferUpdateRequest for a region of the screen, and the server sends back the content using one of the encodings the client advertised. The RFB protocol supports incremental updates, so we would request a full update at the start, followed by incremental updates at a periodic interval.
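The client-to-server messages involved are small and fixed-format, so they are easy to sketch. The builders below follow the RFB wire layout as I understand it (message type 2 for SetEncodings, type 3 for the update request, big-endian fields); the function and constant names are hypothetical and this is not the vnc.js code.

```javascript
const ENCODING_RAW = 0;
const ENCODING_COPYRECT = 1;

// SetEncodings: type (1 byte), padding (1), count (2), then one signed
// 32-bit value per encoding, listed in order of preference.
function setEncodings(encodings) {
  const msg = new Uint8Array(4 + 4 * encodings.length);
  const view = new DataView(msg.buffer);
  view.setUint8(0, 2);
  view.setUint16(2, encodings.length);
  encodings.forEach((encoding, i) => view.setInt32(4 + 4 * i, encoding));
  return msg;
}

// Update request: type (1 byte), incremental flag (1), then x, y, width,
// and height as 16-bit values.
function framebufferUpdateRequest(incremental, x, y, width, height) {
  const msg = new Uint8Array(10);
  const view = new DataView(msg.buffer);
  view.setUint8(0, 3);
  view.setUint8(1, incremental ? 1 : 0);
  view.setUint16(2, x);
  view.setUint16(4, y);
  view.setUint16(6, width);
  view.setUint16(8, height);
  return msg;
}
```

A session would typically send `setEncodings([ENCODING_COPYRECT, ENCODING_RAW])` once, then `framebufferUpdateRequest(false, 0, 0, width, height)` for the first full frame and the same call with `true` on each later tick.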
To keep it simple, the only encoding methods we implemented were Raw and CopyRect. Raw encoding is just the raw pixel data. CopyRect is the server taking advantage of the fact that it knows exactly what the client has in its buffer, so it is able to tell you to move a portion of your buffer to another part of the screen. CopyRect is significantly more efficient than Raw, especially for operations such as window dragging.
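To show why CopyRect is cheap, here is a sketch of applying one to a local copy of the frame buffer. It assumes the buffer is held as RGBA bytes in row-major order; the function name `applyCopyRect` and its parameter shapes are hypothetical. The server only sends the source coordinates, and the client does the rest from pixels it already has.

```javascript
// Copy a rectangle of pixels from (srcX, srcY) to (rect.x, rect.y) inside
// the same frame buffer. `fb` is a Uint8Array of RGBA bytes, row-major,
// `fbWidth` pixels wide. `rect` is { x, y, width, height }.
function applyCopyRect(fb, fbWidth, srcX, srcY, rect) {
  const rowBytes = rect.width * 4;

  // Snapshot the source first so overlapping source and destination
  // regions cannot corrupt each other.
  const saved = new Uint8Array(rowBytes * rect.height);
  for (let row = 0; row < rect.height; row++) {
    const from = ((srcY + row) * fbWidth + srcX) * 4;
    saved.set(fb.subarray(from, from + rowBytes), row * rowBytes);
  }

  // Write the snapshot to the destination, one row at a time.
  for (let row = 0; row < rect.height; row++) {
    const to = ((rect.y + row) * fbWidth + rect.x) * 4;
    fb.set(saved.subarray(row * rowBytes, (row + 1) * rowBytes), to);
  }
  return fb;
}
```

On a real canvas the same effect can be had with `ctx.putImageData(ctx.getImageData(srcX, srcY, w, h), x, y)`. Either way the message costs a handful of bytes on the wire, versus `width * height * 4` bytes for the same region in Raw.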
The FrameBufferUpdate messages, which are the messages that contain the screen pixel data, had one gotcha: after the initial implementation, we found that our screen updates were working only ~30% of the time. As it turns out, the RFB spec allows multiple updates (rectangles) to be sent in a single message. Once we understood exactly what the server was sending back, we were able to process the FrameBufferUpdate messages correctly, and at that point we had a working screen viewer!
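A parser that handles the multi-rectangle case could be sketched like this. It is an illustration, not the fix we shipped: the name `parseFramebufferUpdate`, the returned object shape, and the choice to return `null` when the buffer does not yet hold a full message are all assumptions made for this example. Only Raw (0) and CopyRect (1) are understood.

```javascript
// Parse one FrameBufferUpdate message from a Uint8Array.
// Returns { rects, bytesConsumed }, or null if more data is needed.
function parseFramebufferUpdate(bytes, bytesPerPixel) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (bytes.length < 4) return null;

  // Byte 0 is the message type, byte 1 is padding, bytes 2-3 hold the count.
  const count = view.getUint16(2);
  const rects = [];
  let offset = 4;

  for (let i = 0; i < count; i++) {
    if (offset + 12 > bytes.length) return null; // rectangle header incomplete
    const rect = {
      x: view.getUint16(offset),
      y: view.getUint16(offset + 2),
      width: view.getUint16(offset + 4),
      height: view.getUint16(offset + 6),
      encoding: view.getInt32(offset + 8),
    };
    offset += 12;

    let size;
    if (rect.encoding === 0) {
      size = rect.width * rect.height * bytesPerPixel; // Raw: every pixel
    } else if (rect.encoding === 1) {
      size = 4; // CopyRect: source x and y, 16 bits each
    } else {
      throw new Error('unsupported encoding ' + rect.encoding);
    }
    if (offset + size > bytes.length) return null; // payload incomplete

    rect.payload = bytes.subarray(offset, offset + size);
    offset += size;
    rects.push(rect);
  }
  return { rects, bytesConsumed: offset };
}
```

The key point is the loop: each rectangle's payload length depends on its own header, so the parser has to walk them in order rather than assume one rectangle per message.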
Part 4: Mouse and Keyboard Support
The Final Product
Grab the code on GitHub - vnc.js and tcp.js - and try it out for yourself! Of course, the usual hackday disclaimer applies: the project was thrown together in under 24 hours, may contain bugs, and has almost no documentation.
A huge thanks to my teammates, who sacrificed much sleep to make this hack possible: Avik Das, Fabio Angius, and Ferris Jumah.