How CryptoCompare Ensures Resilience of Data Services in Times of Turmoil

Quynh Tran-Thanh
Published in CCData
4 min read · Mar 16, 2020


On March 12, digital asset markets plunged: Bitcoin dropped 25% within half an hour. The number of trades CryptoCompare received from exchanges more than doubled, from around 800 trades per second to 1,800 trades per second. Over the whole day we received 120 million trades, compared with 75 million on an average day.

Many exchanges and data aggregators struggled to keep up with this amount of data, but for CryptoCompare it was business as usual.

This article explains how we have ensured the resilience and high performance of our data services to meet our service level agreements (SLAs).

Received trades per second more than doubled

Key components of resilience

Building and maintaining a resilient, high-performance data service takes several components, each of which contributes to a good architecture. We explain each of the following:

  1. Microservices architecture
  2. Message queue system
  3. Monitoring and alerts
  4. Great team

Microservices architecture

We built our data processing on many small services, each responsible for a single task. This makes problems easy to detect, because we can see immediately where something gets stuck, and quick to fix, because each service is usually small and easy to understand. Some examples of our microservices:

  • Exchange snapshot service: responsible for calculating 24-hour volume, last price, and open, high, low and close values
  • Exchange historical service: responsible for calculating minute and hour candles
  • Exchange streaming service: responsible for forwarding trades and snapshots from exchanges to a central broadcaster
  • Aggregate index service: responsible for aggregating exchange snapshots into an aggregate price
  • Total volume service: listens to all exchanges and aggregates volumes per asset
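
To make the division of labour concrete, here is a minimal sketch of the kind of calculation the exchange snapshot and exchange historical services perform. It assumes a hypothetical `Trade` record (Unix timestamp in seconds, price, quantity) and computes a trailing 24-hour open/high/low/close/volume snapshot plus one-minute candles. It is an illustration under those assumptions, not CryptoCompare's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Trade:
    # Hypothetical trade record: Unix timestamp in seconds, price, quantity.
    ts: float
    price: float
    qty: float


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float
    volume: float


def build_candle(trades: List[Trade]) -> Candle:
    """OHLCV for a non-empty list of trades sorted by timestamp."""
    prices = [t.price for t in trades]
    return Candle(
        open=trades[0].price,
        high=max(prices),
        low=min(prices),
        close=trades[-1].price,  # the close doubles as the last price
        volume=sum(t.qty for t in trades),
    )


def snapshot_24h(trades: List[Trade], now: float) -> Optional[Candle]:
    """Snapshot-style values over the trailing 24 hours, or None if no trades."""
    window = sorted(
        (t for t in trades if now - 86_400 < t.ts <= now), key=lambda tr: tr.ts
    )
    return build_candle(window) if window else None


def minute_candles(trades: List[Trade]) -> Dict[int, Candle]:
    """Historical-style one-minute candles keyed by the minute's start time."""
    buckets: Dict[int, List[Trade]] = {}
    for t in sorted(trades, key=lambda tr: tr.ts):
        minute_start = int(t.ts // 60) * 60
        buckets.setdefault(minute_start, []).append(t)
    return {start: build_candle(bucket) for start, bucket in buckets.items()}
```

Because each function only reads trades and returns values, each could run as its own small service fed by the same trade stream.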

The great thing about a microservices architecture is that when load increases, we can add more instances of a service that perform the same calculation, taking pressure off any single instance. This lets us scale easily.
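
One common way to run several copies of the same service without two of them processing the same data is to partition the stream by key. The sketch below assumes trades are routed by trading pair (a hypothetical choice; the article does not say how CryptoCompare splits load) and uses a stable CRC-32 hash, so each pair always lands on the same worker and its trades stay in order.

```python
import zlib
from collections import Counter
from typing import Iterable, List


def worker_for(pair: str, num_workers: int) -> int:
    """Map a trading pair to a worker index with a stable hash.

    zlib.crc32 is used instead of the built-in hash(), which is randomised
    per process for strings and so would not route consistently.
    """
    return zlib.crc32(pair.encode("utf-8")) % num_workers


def load_per_worker(pairs: Iterable[str], num_workers: int) -> List[int]:
    """Count how many messages each worker would receive."""
    counts = Counter(worker_for(p, num_workers) for p in pairs)
    return [counts.get(i, 0) for i in range(num_workers)]
```

With one worker, every message goes to the same instance; with four, the same stream is spread across all of them. Note that plain modulo routing reassigns most pairs whenever the worker count changes; consistent hashing or a broker's consumer-group feature are common ways to limit that reshuffling.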

In the last few days, CryptoCompare's API usage has risen to as much as 300 million calls a day. Even this is not our busiest period: in late 2017, during the bitcoin rally, we handled 180 million calls an hour.
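
Converting both figures to per-second averages puts them on the same scale. This is plain arithmetic on the numbers quoted above. Averages hide short bursts, so peak rates within each period are at least as high as these values.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
SECONDS_PER_HOUR = 60 * 60  # 3,600


def calls_per_second(calls: int, period_seconds: int) -> float:
    """Average request rate over a period."""
    return calls / period_seconds


recent = calls_per_second(300_000_000, SECONDS_PER_DAY)  # about 3,472 calls/s
peak_2017 = calls_per_second(180_000_000, SECONDS_PER_HOUR)  # 50,000 calls/s
print(f"{recent:,.0f} calls/s now vs {peak_2017:,.0f} calls/s in late 2017")
```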

API usage has doubled in recent months

Message queue system

We use a message queue system to listen for new data updates and broadcast them to the services interested in each dataset. Messages received by a data processing service are put into a queue until they are processed. This makes the architecture very flexible: we can add a new service quickly by subscribing it to whatever datasets it needs.
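
The subscribe-and-broadcast pattern can be sketched with a toy in-process broker. The `Broker` class, its method names and the dataset names are hypothetical; a production system would use a networked message broker, and the article does not name the one CryptoCompare uses.

```python
import queue
from collections import defaultdict
from typing import Any, DefaultDict, List


class Broker:
    """Toy publish/subscribe broker (hypothetical, for illustration only).

    Each subscriber gets its own FIFO queue, so a slow consumer builds up a
    backlog in its own queue without blocking the publisher or other services.
    """

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List["queue.Queue[Any]"]] = defaultdict(list)

    def subscribe(self, dataset: str) -> "queue.Queue[Any]":
        """Register a new consumer for a dataset and return its private queue."""
        q: "queue.Queue[Any]" = queue.Queue()
        self._subscribers[dataset].append(q)
        return q

    def publish(self, dataset: str, message: Any) -> int:
        """Fan the message out to every queue subscribed to the dataset."""
        subs = self._subscribers.get(dataset, [])
        for q in subs:
            q.put(message)
        return len(subs)  # number of services that received the message
```

In this design, adding a service is one more `subscribe()` call; nothing upstream has to change.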

For example, to create a new real-time metric from the price and order book data of a particular asset, a service simply subscribes to those datasets, much as our customers consume our data through our WebSocket API.
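
As an illustration of such a metric, the sketch below combines two subscribed datasets into one value: how far the last trade price sits from the order book mid price, in basis points. The metric itself, the dataset names and the payload fields (`price`, `bid`, `ask`) are all hypothetical examples, not an existing CryptoCompare metric or message format.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional, Tuple


@dataclass
class AssetState:
    last_price: Optional[float] = None
    best_bid: Optional[float] = None
    best_ask: Optional[float] = None


def trade_vs_mid_bps(messages: Iterable[Tuple[str, Dict[str, Any]]]) -> List[float]:
    """Hypothetical metric recomputed on every update.

    `messages` is a stream of (dataset, payload) pairs, as a service would
    receive them after subscribing to a "trades" and an "orderbook" dataset.
    """
    state = AssetState()
    values: List[float] = []
    for dataset, payload in messages:
        if dataset == "trades":
            state.last_price = payload["price"]
        elif dataset == "orderbook":
            state.best_bid = payload["bid"]
            state.best_ask = payload["ask"]
        else:
            continue  # ignore datasets this metric does not use
        if state.last_price is None or state.best_bid is None or state.best_ask is None:
            continue  # wait until both datasets have delivered data
        mid = (state.best_bid + state.best_ask) / 2
        values.append((state.last_price - mid) / mid * 10_000)
    return values
```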

In times of high load, queues can build up as messages wait to be processed. As mentioned above, on March 12 CryptoCompare received 120 million trades, with a peak load of 1,800 trades per second.

The chart below shows the queue building up to a maximum of 15 messages. Each message is processed within milliseconds, so a queue of at most 15 messages adds a delay of only a few milliseconds.

Even under high load, our latency stayed on the order of milliseconds.

Maximum queue length is 15 messages: latency stayed on the order of milliseconds
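
The link between queue length and delay can be checked with a small simulation. It assumes a single consumer processing messages in FIFO order with a fixed per-message time; the function and the service times used with it are illustrative, not measured CryptoCompare figures.

```python
from collections import deque
from typing import Deque, List, Tuple


def simulate_fifo(arrivals: List[float], service_time: float) -> Tuple[int, float]:
    """Single consumer, FIFO order, fixed processing time per message (seconds).

    Returns the maximum number of messages in the system (including the one
    being processed) and the longest time any message waited before its
    processing started.
    """
    pending: Deque[float] = deque()  # finish times of messages not yet done
    max_len, max_wait = 0, 0.0
    for t in sorted(arrivals):
        while pending and pending[0] <= t:
            pending.popleft()  # these messages finished before this arrival
        start = pending[-1] if pending else t  # wait for the backlog to clear
        pending.append(start + service_time)
        max_len = max(max_len, len(pending))
        max_wait = max(max_wait, start - t)
    return max_len, max_wait
```

A burst of 15 simultaneous messages with a 0.2 ms processing time, `simulate_fifo([0.0] * 15, 0.0002)`, leaves the last message waiting about 2.8 ms. With steady arrivals at 1,800 messages per second, a single consumer has about 1/1800 s, roughly 0.56 ms, per message: below that the queue stays at one message, above it the backlog keeps growing, which is the point where adding more service instances helps.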

Monitoring and alerts

Managing thousands of microservices requires extensive analytics and monitoring. We use Grafana and Icinga to track the health of each service. For example, we count the number of messages each service processes, and if that number drops below a critical value, an engineer investigates.
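
Icinga runs checks as plugins that follow the Nagios plugin convention: exit code 0 for OK, 1 for WARNING and 2 for CRITICAL, plus a line of status text. A throughput check of the kind described might look like the sketch below. The function name and thresholds are hypothetical, and how the message count is collected is left out.

```python
from typing import Tuple

# Exit codes defined by the Nagios plugin convention, which Icinga follows.
OK, WARNING, CRITICAL = 0, 1, 2


def check_throughput(processed: int, warn_below: int, crit_below: int) -> Tuple[int, str]:
    """Classify a service's processed-message count for the last interval.

    The thresholds are hypothetical; the article only says an engineer
    investigates when the count drops below a critical value.
    """
    if processed < crit_below:
        return CRITICAL, f"CRITICAL - {processed} messages processed"
    if processed < warn_below:
        return WARNING, f"WARNING - {processed} messages processed"
    return OK, f"OK - {processed} messages processed"
```

Wrapped in a script that prints the message and exits with the returned code, this could be scheduled by Icinga for each service.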

Great team

Of course, the most important part of a resilient service is a great engineering team that keeps all services up and running and carries out the necessary scaling at times of high traffic, like last week.

How did exchanges do?

Exchanges struggled to keep up with prices: at times, the price gap between Bitstamp and OKEx was as large as 500 USD.
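
Measuring such a gap is straightforward once both exchanges' prices are aligned on the same timestamps. The sketch below assumes each exchange's prices arrive as a hypothetical dictionary keyed by timestamp; any prices passed to it in an example are made up, not the March 12 data.

```python
from typing import Dict, Optional, Tuple


def max_price_gap(a: Dict[int, float], b: Dict[int, float]) -> Optional[Tuple[int, float]]:
    """Largest absolute price difference between two exchanges.

    Only timestamps present in both series are compared. Returns
    (timestamp, gap), or None when the series share no timestamps.
    """
    common = a.keys() & b.keys()
    if not common:
        return None
    ts = max(common, key=lambda t: abs(a[t] - b[t]))
    return ts, abs(a[ts] - b[ts])
```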

Bitstamp and OKEx price difference was as high as 500 USD

Conclusion

While many exchanges and data aggregators struggle to keep up with such surges in data, CryptoCompare’s architecture lets us respond quickly to market changes while meeting our data SLAs.

Learn more

If you want to test our data services, visit our API documentation, where you can try out the streaming connection using the newly launched streamer playground.

For data inquiries, contact data@cryptocompare.com.
