TL;DR
This post is a brief presentation of netty-websocket-http1: an alternative Netty/Java implementation of RFC 6455, the WebSocket protocol.
Its advantage is a significant per-core throughput improvement (1.8–2x) for small frames compared to Netty’s out-of-the-box WebSocket codecs, and minimal heap allocations on the frame path. The library may also be combined with netty-websocket-http2.
Its purpose is to be the basis for a high-performance RPC transport of small binary messages (protocol buffers), mainly for cross-datacenter communication over the internet, over both http1 and http2.
A preliminary performance evaluation of Netty’s out-of-the-box codec showed only ~1M 120-byte messages per core over a non-TLS connection, which is very modest for such a simple wire format, judging by experience with other protocols.
Additionally, there are unnecessary per-frame allocations: Netty’s codec expects binary payloads wrapped as WebSocketFrame messages (which are likely not useful for the user’s application), and it allocates an array per frame for payload masking (the latter was recently improved).
use case & scope
- Intended for efficiently encoded, dense binary data: no support for extensions (compression), outbound text frames, or inbound UTF-8 validation.
- The library assumes small frames: many have a payload of <= 125 bytes, most are < 1500 bytes, and the maximum supported is 65k (65535 bytes).
- Just a codec: fragments, pings, and close frames are decoded & validated only. It is the responsibility of user code to handle frames according to the protocol (reassemble frame fragments, perform graceful close, respond to pings).
- Dedicated decoder for the case of exchanging tiny messages over a TLS connection: only non-masked frames with <= 125 bytes of payload, for minimal per-WebSocket state (memory) overhead.
- No per-frame heap allocations in the WebSocket FrameFactory / decoder.
- Single-threaded (transport IO event-loop) callbacks / frame factory API: in practice user code has its own message types to carry data, and external means (e.g. mpsc / spsc queues) may be used to properly publish messages on the event-loop thread.
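The last point above, publishing messages onto the event-loop thread through a queue, can be sketched with JDK types only. This is a minimal illustration and not the library's API: a single-thread executor stands in for the Netty event loop, ConcurrentLinkedQueue stands in for an mpsc queue, and the EventLoopPublisher class with its publish, drain, and shutdown methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: hands messages from any thread to one "event-loop" thread. */
final class EventLoopPublisher {
  // Stand-in for a Netty event loop: a single thread owns all frame writes.
  private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();
  // Stand-in for an mpsc queue: many producers, one consumer (the event loop).
  private final Queue<byte[]> outbound = new ConcurrentLinkedQueue<>();
  // True while a drain task is scheduled or running.
  private final AtomicBoolean draining = new AtomicBoolean();
  // Touched only by the event-loop thread; real code would create and write frames here.
  final List<byte[]> written = new ArrayList<>();

  /** Safe to call from any thread: enqueue, then schedule a drain if none is pending. */
  void publish(byte[] message) {
    outbound.add(message);
    if (draining.compareAndSet(false, true)) {
      eventLoop.execute(this::drain);
    }
  }

  /** Runs on the event-loop thread only. */
  private void drain() {
    do {
      byte[] message;
      while ((message = outbound.poll()) != null) {
        written.add(message);
      }
      draining.set(false);
      // Re-check: a producer may have enqueued after the last poll but before the flag reset.
    } while (!outbound.isEmpty() && draining.compareAndSet(false, true));
  }

  /** Stops the event loop after already scheduled drain tasks complete. */
  boolean shutdown() {
    eventLoop.shutdown();
    try {
      return eventLoop.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

The flag avoids scheduling one task per message: producers only submit a drain task when none is pending, so a burst of small messages is handled in a single pass on the event-loop thread.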
FrameFactory / Callbacks API
1. WebSocketFrameFactory creates outbound frames as plain byte buffers, which helps reduce pressure on the memory allocator and avoids either two tiny buffers (“header” plus payload) or redundant memory copies for each frame.
It is the library user's responsibility to mask the outbound frame once the payload is written: ByteBuf WebSocketFrameFactory.mask(ByteBuf)
public interface WebSocketFrameFactory {
  ByteBuf createBinaryFrame(ByteBufAllocator allocator, int binaryDataSize);
  // create*Frame are omitted for control frames, created in similar fashion
  ByteBuf mask(ByteBuf frame);
}
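For reference, the single-buffer layout and the masking step can be illustrated using only the RFC 6455 wire format and java.nio.ByteBuffer. This is a sketch under stated assumptions, not the library's implementation: the MaskedFrameSketch class and its method names are hypothetical, and the mask key is passed in explicitly to keep the example deterministic (RFC 6455 requires clients to choose a fresh, unpredictable key for each frame).

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of RFC 6455 framing; not the library's implementation. */
final class MaskedFrameSketch {
  static final int OPCODE_BINARY = 0x2;

  /**
   * Allocates one buffer holding header, mask key, and payload space of a final binary frame.
   * The returned buffer is positioned at the start of the payload region, ready to be filled.
   */
  static ByteBuffer createBinaryFrame(int payloadSize, int maskKey) {
    if (payloadSize < 0 || payloadSize > 65_535) {
      throw new IllegalArgumentException("payload size must be in [0, 65535]");
    }
    boolean extended = payloadSize > 125;
    int headerSize = (extended ? 4 : 2) + 4; // 2 or 4 header bytes plus the 4-byte mask key
    ByteBuffer frame = ByteBuffer.allocate(headerSize + payloadSize); // big-endian by default
    frame.put((byte) (0x80 | OPCODE_BINARY)); // FIN=1, RSV=0, opcode=binary
    if (extended) {
      frame.put((byte) (0x80 | 126)); // MASK=1, 126 signals that a 16-bit length follows
      frame.putShort((short) payloadSize);
    } else {
      frame.put((byte) (0x80 | payloadSize)); // MASK=1, 7-bit length
    }
    frame.putInt(maskKey);
    return frame;
  }

  /** XORs the payload in place with the mask key stored in the header, then rewinds. */
  static ByteBuffer mask(ByteBuffer frame) {
    int payloadStart = (frame.get(1) & 0x7F) == 126 ? 8 : 6;
    int keyStart = payloadStart - 4;
    for (int i = payloadStart; i < frame.limit(); i++) {
      // payload octet i is XORed with mask key octet (i mod 4)
      byte keyByte = frame.get(keyStart + ((i - payloadStart) & 3));
      frame.put(i, (byte) (frame.get(i) ^ keyByte));
    }
    frame.rewind();
    return frame;
  }
}
```

The caller writes the payload directly after the header and masks it in place, so each frame needs one allocation and no intermediate copy.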
2. WebSocketFrameListener receives inbound frames.
public interface WebSocketFrameListener {
  void onChannelRead(ChannelHandlerContext ctx, boolean finalFragment,
      int rsv, int opcode, ByteBuf payload);
  // netty handler callbacks are omitted for brevity
  // lifecycle
  default void onOpen(ChannelHandlerContext ctx) {}
  default void onClose(ChannelHandlerContext ctx) {}
}
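The finalFragment, rsv, and opcode parameters of onChannelRead correspond to fields of the RFC 6455 frame header. As an illustration only, here is a sketch of decoding the 2-byte header of a non-masked frame with at most 125 bytes of payload, using java.nio.ByteBuffer. The TinyFrameSketch class and its FrameSink callback are hypothetical stand-ins, not the library's decoder.

```java
import java.nio.ByteBuffer;

/** Hypothetical illustration of decoding tiny non-masked frames; not the library's decoder. */
final class TinyFrameSketch {

  /** Hypothetical stand-in for a frame listener callback. */
  interface FrameSink {
    void onFrame(boolean finalFragment, int rsv, int opcode, ByteBuffer payload);
  }

  /**
   * Decodes one frame if it is fully available and advances the buffer position past it.
   * Returns false, leaving the position unchanged, if more bytes are needed.
   */
  static boolean decode(ByteBuffer in, FrameSink sink) {
    if (in.remaining() < 2) {
      return false;
    }
    int start = in.position();
    int b0 = in.get(start) & 0xFF; // FIN (1 bit), RSV (3 bits), opcode (4 bits)
    int b1 = in.get(start + 1) & 0xFF; // MASK (1 bit), payload length (7 bits)
    if ((b1 & 0x80) != 0) {
      throw new IllegalStateException("masked frames are not handled by this sketch");
    }
    int length = b1 & 0x7F;
    if (length > 125) {
      throw new IllegalStateException("extended lengths are not handled by this sketch");
    }
    if (in.remaining() < 2 + length) {
      return false;
    }
    // View of the payload bytes without copying them.
    ByteBuffer payload = in.duplicate();
    payload.position(start + 2);
    payload.limit(start + 2 + length);
    in.position(start + 2 + length);
    sink.onFrame((b0 & 0x80) != 0, (b0 & 0x70) >> 4, b0 & 0x0F, payload.slice());
    return true;
  }
}
```

With no mask key and no extended length, the only state needed between reads is whether a header or payload is still incomplete, which is what keeps the per-connection overhead small for this case.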
3. WebSocketCallbacksHandler exchanges a WebSocketFrameListener for a WebSocketFrameFactory on successful WebSocket handshake.
public interface WebSocketCallbacksHandler {
  WebSocketFrameListener exchange(
      ChannelHandlerContext ctx, WebSocketFrameFactory webSocketFrameFactory);
}
4. Similar to Netty, this library has WebSocketClientProtocolHandler & WebSocketServerProtocolHandler for end users.
These handlers are responsible for the whole WebSocket http handshake process, up until the WebSocketCallbacksHandler exchange on successful handshake completion.
It is common for a WebSocketCallbacksHandler to also implement WebSocketFrameListener, so user code typically looks like this:
class FrameHandler implements WebSocketCallbacksHandler,
    WebSocketFrameListener {
  WebSocketFrameFactory webSocketFrameFactory;

  @Override
  public WebSocketFrameListener exchange(
      ChannelHandlerContext ctx,
      WebSocketFrameFactory webSocketFrameFactory) {
    // keep the factory for creating outbound frames later
    this.webSocketFrameFactory = webSocketFrameFactory;
    // this handler is also the listener for inbound frames
    return this;
  }

  @Override
  public void onChannelRead(ChannelHandlerContext ctx,
      boolean finalFragment, int rsv, int opcode, ByteBuf payload) {
    // read inbound frames, write outbound frames /w webSocketFrameFactory
  }
}
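Because the codec only decodes and validates frames, the body of onChannelRead typically dispatches on the opcode. A minimal sketch of that decision logic follows. The opcode values and the rule that control frames must not be fragmented come from RFC 6455; the FrameDispatchSketch class and its Action enum are hypothetical, independent of the library, and assume a binary-only application that rejects text frames.

```java
/** Hypothetical opcode dispatch for user code; opcode values are from RFC 6455. */
final class FrameDispatchSketch {
  static final int OPCODE_CONTINUATION = 0x0;
  static final int OPCODE_TEXT = 0x1;
  static final int OPCODE_BINARY = 0x2;
  static final int OPCODE_CLOSE = 0x8;
  static final int OPCODE_PING = 0x9;
  static final int OPCODE_PONG = 0xA;

  /** What user code should do with an inbound frame. */
  enum Action { DELIVER_MESSAGE, BUFFER_FRAGMENT, REPLY_PONG, REPLY_CLOSE, IGNORE, FAIL }

  /**
   * Stateless decision per frame. A real handler would also track whether a fragmented
   * message is in progress, so that a stray continuation frame can be rejected.
   */
  static Action actionFor(boolean finalFragment, int opcode) {
    boolean controlFrame = (opcode & 0x8) != 0;
    if (controlFrame && !finalFragment) {
      return Action.FAIL; // control frames must not be fragmented
    }
    switch (opcode) {
      case OPCODE_BINARY:
      case OPCODE_CONTINUATION:
        // the last fragment completes a message, earlier ones are accumulated
        return finalFragment ? Action.DELIVER_MESSAGE : Action.BUFFER_FRAGMENT;
      case OPCODE_PING:
        return Action.REPLY_PONG; // the pong echoes the ping payload
      case OPCODE_CLOSE:
        return Action.REPLY_CLOSE; // answer with a close frame, then close the connection
      case OPCODE_PONG:
        return Action.IGNORE;
      default:
        return Action.FAIL; // text frames (assumption of this sketch) and reserved opcodes
    }
  }
}
```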
The performance test module serves as a good API showcase for both client and server.
performance
Below is a per-core throughput comparison with Netty’s out-of-the-box WebSocket handlers: non-masked frames with 8, 64, 125, 1000 bytes of randomized payload over encrypted and non-encrypted connections.
- non-encrypted
- encrypted
websocket-over-http2
One drawback of websocket-over-http2 support with OOTB Netty codecs is that it is much slower than either http2 or WebSocket alone (there are 2 protocols to decode from the byte stream).
This library helps ease the problem, as it may be combined with jauntsdn/websocket-http2 using the http1 codec API for comparable benefit. With frames of 8, 125, 1000 bytes of randomized payload over an encrypted connection, the results are as follows: