Alternative RSocket-RPC: fast application services communication and transparent GRPC bridge

May 20, 2021
RSocket Mstreams java grpc

Exchanging millions of buffers per second with plain RSockets is of little use to business applications, which need structured data rather than raw bytes.

That’s why RSocket-RPC was created: a remote procedure call system on top of Protocol Buffers with language-agnostic service definitions. It relies on code generation for performance equivalent to hand-written code.

Protocol Buffers have tiny on-wire overhead and acceptable performance, and they are the native format of Grpc - important for interoperability and for existing tooling that can be used directly or easily adapted.
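As an illustration, a minimal Protocol Buffers service definition of the kind both Grpc and RSocket-RPC code generators consume could look as follows (service and message names are illustrative, not taken from the project):

```protobuf
syntax = "proto3";

package example;

// Illustrative service covering the three interaction styles
// benchmarked in this post: request-response, server stream, channel.
service MessageService {
  rpc Response (Request) returns (Reply) {}
  rpc ServerStream (Request) returns (stream Reply) {}
  rpc Channel (stream Request) returns (stream Reply) {}
}

message Request {
  string message = 1;
}

message Reply {
  string message = 1;
}
```

Because both stacks consume the same definitions and produce the same wire format for messages, a service can be exposed to Grpc and RSocket-RPC clients without data translation.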

Alternative RSocket-RPC

The original RSocket/RSocket-RPC is based on the problematic RSocket/RSocket-java and no longer seems to be supported (the generated RPC stubs do not compile). The approach based on code generation and a single efficient data format was dropped in favour of data abstraction and runtime tricks to translate between serialized and structured messages.

For the alternative RSocket-RPC, code generation was redone to align it with jauntsdn/RSocket-reactor.

Let’s enumerate outstanding differences:

GRPC transport

For internet clients, RSocket-RPC has limited utility because existing solutions (HTTP/Grpc) are mature and already good enough. This applies to both client libraries and API gateways - haproxy, nginx and recently even AWS API Gateway support Grpc. Some of them offer features and performance that would take dedicated teams years to match.

In a data center environment, RSocket/RSocket-RPC has several advantages over Http2/Grpc, the main one being considerable performance gains.

Fortunately, RSocket streams are able to accommodate Http2 streams, which may then be routed to RSocket-RPC origins over TCP.

This way RSocket-RPC services can be efficiently accessed by existing Grpc clients - the message format is the same, so there is no penalty for cross-format data translation.

Scope and features:

Like TCP and Unix sockets, GRPC transport can be used with any jauntsdn/RSocket-JVM based implementation, and was verified with RSocket-reactor and RSocket-rxjava.

It is lean dependency-wise: only io.netty:netty-codec-http2 and jauntsdn:rsocket-transport, with no Protobuf-related libraries. The compiled JAR is less than 60 KB in size.
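As a rough sketch, the dependency footprint might look like this in a Gradle build (the transport's artifact coordinates and the version placeholders are assumptions for illustration, not verified coordinates):

```groovy
dependencies {
    // the GRPC transport itself (coordinates assumed for illustration)
    implementation "com.jauntsdn.rsocket:rsocket-transport-grpc:${rsocketVersion}"
    // its only external dependency, per the text above
    implementation "io.netty:netty-codec-http2:${nettyVersion}"
}
```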

Performance

Performance was evaluated on a commodity box running linux 5.4.0 with EPOLL IO and a non-TLS TCP transport. The JVM is OpenJDK 11.0.11.

The purpose is to compare single-core throughput against plain RSocket with RSocket-reactor.

This test gives an estimate of per-request (request-response) and per-message (request-stream, request-channel) RPC overhead.

The message size is kept small (less than 10 bytes) so most of the CPU time is spent on RPC request encoding & message serialization rather than memory copies, which is a better gauge of RPC efficiency.
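As a minimal sketch (not the benchmark's actual code), per-second log lines like the "client received messages: N" entries below can be produced by bumping a counter on the receive path and having a scheduler read and reset it once a second; all names here are illustrative:

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of per-second throughput reporting: the receive path increments
// a contention-friendly counter, and a once-per-second task drains it.
public class ThroughputReporter {
    private final LongAdder received = new LongAdder();

    /** Called for every inbound message on the receive path. */
    public void onMessage() {
        received.increment();
    }

    /** Called once per second by a scheduler; returns and resets the count. */
    public long messagesPerSecond() {
        return received.sumThenReset();
    }

    public static void main(String[] args) {
        ThroughputReporter reporter = new ThroughputReporter();
        for (int i = 0; i < 3_000_000; i++) {
            reporter.onMessage();
        }
        System.out.println("client received messages: " + reporter.messagesPerSecond());
    }
}
```

LongAdder is preferable to AtomicLong here because the hot path only increments, and the aggregate sum is read just once per second.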

RSocket-RPC

Single connection / single core stream throughput test.

16:55:09.484 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.stream.client.Main client received messages: 3094767
16:55:10.484 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.stream.client.Main client received messages: 2997843
16:55:11.484 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.stream.client.Main client received messages: 2996896
16:55:12.484 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.stream.client.Main client received messages: 3004344
16:55:13.484 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.stream.client.Main client received messages: 3000798
16:55:14.484 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.stream.client.Main client received messages: 3002806

3M msg/sec vs 3.5M msg/sec demonstrated by plain RSocket, or 14% lower throughput.

This small overhead is caused by Protobuf serialization only, as there is a single continuous stream per connection (for N response messages there is 1 request).

Single connection / single core channel throughput test.

Server

16:56:20.876 rsocket-netty-io-transport-epoll-1-2 com.jauntsdn.rsocket.rpc.examples.channel.server.Main server received messages: 1726494
16:56:21.874 rsocket-netty-io-transport-epoll-1-2 com.jauntsdn.rsocket.rpc.examples.channel.server.Main server received messages: 1734271
16:56:22.877 rsocket-netty-io-transport-epoll-1-2 com.jauntsdn.rsocket.rpc.examples.channel.server.Main server received messages: 1734271
16:56:23.875 rsocket-netty-io-transport-epoll-1-2 com.jauntsdn.rsocket.rpc.examples.channel.server.Main server received messages: 1718717
16:56:24.875 rsocket-netty-io-transport-epoll-1-2 com.jauntsdn.rsocket.rpc.examples.channel.server.Main server received messages: 1710940
16:56:25.874 rsocket-netty-io-transport-epoll-1-2 com.jauntsdn.rsocket.rpc.examples.channel.server.Main server received messages: 1726494

Client

16:56:21.804 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.channel.client.Main client received messages: 1742048
16:56:22.804 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.channel.client.Main client received messages: 1726494
16:56:23.803 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.channel.client.Main client received messages: 1718717
16:56:24.804 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.channel.client.Main client received messages: 1718717
16:56:25.803 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.channel.client.Main client received messages: 1726494
16:56:26.803 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.channel.client.Main client received messages: 1734271

With request-channel, RSocket-RPC exchanges ~3.4M msg/sec while plain RSocket is at 4.8M msg/sec - 29% slower due to the additional serialization/deserialization of the request stream (for N response messages there are N request messages).

Single connection / single core continuous requests window throughput test.

16:58:16.609 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.response.client.Main client received messages: 1329611
16:58:17.607 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.response.client.Main client received messages: 1320154
16:58:18.608 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.response.client.Main client received messages: 1321358
16:58:19.609 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.response.client.Main client received messages: 1317221
16:58:20.607 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.response.client.Main client received messages: 1321927
16:58:21.607 rsocket-netty-io-transport-epoll-1-1 com.jauntsdn.rsocket.rpc.examples.response.client.Main client received messages: 1329611

RSocket-RPC demonstrates ~1.3M msg (requests)/sec, or 38% lower throughput compared to plain RSocket (2.1M requests/sec).

The substantial difference is due to the additional overhead of request metadata encoding for each outbound message under the request-response model (for N responses there are N requests).
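The overhead percentages quoted in these tests (14%, 29%, 38%) all follow from the same calculation, sketched below with the measured throughputs in millions of messages per second:

```java
// Throughput overhead of RSocket-RPC relative to a plain-RSocket baseline:
// overhead % = (1 - measured / baseline) * 100, rounded to a whole percent.
public class Overhead {
    public static long overheadPercent(double measured, double baseline) {
        return Math.round((1 - measured / baseline) * 100);
    }

    public static void main(String[] args) {
        System.out.println(overheadPercent(3.0, 3.5)); // stream: prints 14
        System.out.println(overheadPercent(3.4, 4.8)); // channel: prints 29
        System.out.println(overheadPercent(1.3, 2.1)); // request-response: prints 38
    }
}
```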

RSocket-GRPC

The test consists of an RSocket-RPC server with the RSocket-GRPC transport, and a Grpc client based on grpc-java 1.37.0 over Netty, under OpenJDK 11.0.11.

Services and messages are identical to the ones from the RSocket-RPC test.

Each result is compared with a grpc-java-only setup.

Single stream throughput test.

17:52:03.752 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.stream.Main Client received messages: 1428347
17:52:04.752 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.stream.Main Client received messages: 1454406
17:52:05.752 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.stream.Main Client received messages: 1568482
17:52:06.752 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.stream.Main Client received messages: 1480556
17:52:07.752 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.stream.Main Client received messages: 1386296
17:52:08.752 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.stream.Main Client received messages: 1492142

The server stream is 50% slower compared to RSocket-RPC over TCP: the reasons are the chattiness of Http2, the overhead of grpc-java on the client side, and the Netty Http2 codec on the server side.

On the other hand, the numbers correspond to the best case of the grpc-java-only setup, which shows distinctly more fluctuation, in the 0.8-1.5M msg/sec range.

Single bidirectional stream throughput test.

17:53:04.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 709548, received messages: 458496
17:53:05.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 790257, received messages: 291072
17:53:06.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 828460, received messages: 276224
17:53:07.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 831259, received messages: 275083
17:53:08.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 600732, received messages: 818881
17:53:09.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 664493, received messages: 656692
17:53:10.856 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.bidistream.Main ==> Bidi stream sent messages: 850823, received messages: 280517

The exchange is around 1.2M msg/sec, comparable to the grpc-java-only setup.

Continuous requests window throughput test.

10:52:47.579 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.response.Main Client received messages: 101663
10:52:48.579 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.response.Main Client received messages: 103388
10:52:49.579 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.response.Main Client received messages: 101563
10:52:50.579 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.response.Main Client received messages: 101165
10:52:51.579 pool-2-thread-1 com.jauntsdn.rsocket.showcase.grpc.response.Main Client received messages: 101255

The result is around 100K msg (requests)/sec, the same as the grpc-java-only setup.

The numbers do not look impressive compared to the order-of-magnitude faster RSocket-RPC counterpart (1.3M requests/sec), but this is what the current Http2/grpc-java/Netty stack is able to offer.
