by Vict0r#7462

TL;DR: Vict0r’s independent research into endpoint provider options for node operators. While researching and testing a failover application, I began looking at the differences between RPC endpoint providers. I found that some providers offer load-balanced endpoints backed by geographically distributed nodes, which could interest Threshold node operators.

Statements and conclusions reached in this document represent the opinions and ideas of the author, and do not reflect the position of the Threshold DAO or any of its members. This document does not constitute endorsement of any service provider or service plan. No compensation was offered or requested for the opinions presented herein. As always, do your own research. YMMV.

What follows is not a discussion of whether you should run a node. It assumes you have already decided to run a node or are an active node operator.

Regardless of what hardware, cloud provider, or geographic region a PRE node runs on, chances are the node operator is not also running a Geth (Ethereum) node and a Polygon node. It’s not impossible, but it’s certainly neither easy nor cheap. Enter professional endpoint providers such as Alchemy and Infura. There are several to choose from, and this document aims to aggregate information to help you choose one.

What Prompted This Endeavor

On April 22, 2022, Infura suffered a major outage that affected MetaMask users (all 30M of them$^{1,2}$) and anyone else relying on Infura for web3 functionality. This sort of thing is by no means unique to Infura: a quick review of Alchemy’s status page tells a similar story$^3$.

This outage led me to worry about my stake being slashed, and I started thinking about ways to set up some kind of failover system. One fellow staker told me he keeps a separate configuration file pointing to an alternative RPC provider. However, loading and activating an alternate configuration file requires rebooting the node. To address this eventuality, I deployed an application called “dshackle”$^4$ for evaluation and testing. Another community member, who had used it in a different capacity, had told me about it.

                                 _     _     __ _     _                _    _
                                | |   | |   / /| |   | |              | |  | |
    ___ _ __ ___   ___ _ __ __ _| | __| |  / /_| |___| |__   __ _  ___| | _| | ___
   / _ \ '_ ` _ \ / _ \ '__/ _` | |/ _` | / / _` / __| '_ \ / _` |/ __| |/ / |/ _ \
  |  __/ | | | | |  __/ | | (_| | | (_| |/ / (_| \__ \ | | | (_| | (__|   <| |  __/
   \___|_| |_| |_|\___|_|  \__,_|_|\__,_/_/ \__,_|___/_| |_|\__,_|\___|_|\_\_|\___|
    Emerald Dshackle - Fault Tolerant Load Balancer for Blockchain API
    <https://github.com/emeraldpay/dshackle>

Dshackle is a load balancer that can proxy multiple endpoints, so it can be configured for failover: if one RPC endpoint doesn’t respond, the call is routed to another available provider. Despite help from a well-rounded community member, it took an absurd amount of time to configure and run. Running dshackle requires a second VPS, a domain name (an FQDN is required for SSL encryption), and considerable patience. This extra infrastructure isn’t free either; it adds approximately $10 per month. If you are interested in dshackle and want to give it a shot, I took notes that could save you days’ worth of reading and trial and error.
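The following minimal Python sketch makes the routing idea concrete. It shows client-side failover across JSON-RPC endpoints. It is not dshackle’s code or configuration, only an illustration of the behavior described above: each endpoint is tried in order, and the first successful answer wins. The endpoint URLs and the names `post_json` and `call_with_failover` are hypothetical placeholders. As a design choice, the sketch also fails over when a node answers with a JSON-RPC error (for example, a provider-side rate-limit error). As a result, a genuinely invalid request is tried against every endpoint before the call gives up.

```python
import json
import urllib.request
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical placeholder URLs: substitute your own provider endpoints.
ENDPOINTS = [
    "https://primary.example/v3/YOUR_KEY",
    "https://backup.example/YOUR_KEY",
]


def post_json(url: str, payload: dict, timeout: float) -> dict:
    """Send one JSON-RPC request over HTTP and return the decoded reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def call_with_failover(
    endpoints: Sequence[str],
    method: str,
    params: list,
    send: Callable[[str, dict, float], dict] = post_json,
    timeout: float = 5.0,
) -> Any:
    """Try each endpoint in order and return the first successful result."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    failures: List[Tuple[str, object]] = []
    for url in endpoints:
        try:
            reply = send(url, payload, timeout)
        except (OSError, ValueError) as exc:
            # Network errors, timeouts and HTTP errors are OSError subclasses;
            # a malformed JSON body raises ValueError (JSONDecodeError).
            failures.append((url, exc))
            continue
        if "error" in reply:
            # The node answered, but with a JSON-RPC error: try the next one.
            failures.append((url, reply["error"]))
            continue
        return reply["result"]
    raise RuntimeError(f"all endpoints failed: {failures}")
```

For example, `call_with_failover(ENDPOINTS, "eth_blockNumber", [])` would return the latest block number, as a hex string, from whichever endpoint answers first. A helper like this only benefits software that calls it directly. Software that accepts a single RPC URL in its configuration would still need a proxy such as dshackle in front of it.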

Other Options?

Dshackle may be overkill, and not every (private) node operator wants the extra expense and technical overhead that go hand in hand with it. But what is the risk of an endpoint going down and exposing your stake to potential slashing? Are there options besides running a separate VPS instance and spending a few nights figuring out how to make dshackle work? One alternative could be proxyd, a package from the Ethereum-Optimism GitHub repository. According to its official description, proxyd is a “Configurable RPC request router and proxy”. However, I was only able to get it to proxy one protocol, which doesn’t mean proxyd cannot do more.

The Reality

The reality is that my Infura RPC endpoint has been very reliable over the past 2 years. Further, slashing won’t be enabled until the launch of tBTC v2 (and you will need to run the tBTC v2 client). A node running PRE won’t be slashed for downtime, only for acting maliciously. This isn’t a fear-mongering paper, nor am I trying to sell you something or spread FUD. One member of the community told me they weren’t worried about their RPC endpoints going down, which led me to reassess my worries. The silver lining of overthinking a situation is that you may find alternatives that are less drastic and easier to implement than running a load balancer like dshackle.

Comparing Features and Options

The table below aggregates information to compare service providers and their features. Other providers exist but are omitted to keep this document a reasonable length. Each provider listed has a status monitoring page showing current service status and the past 90 days of history, except Ankr, which does not appear to have a status dashboard at the time of writing. Each provider appears to have only one status page, so it seems reasonable to infer that the monitoring covers free as well as paid plans. Over the past 90 days, Infura reports 99.99% uptime$^9$, QuickNode 100%$^{10}$, and Alchemy 99.97%$^{11}$.
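Converting those percentages into minutes puts them in perspective. Ninety days is 129,600 minutes, so 99.99% uptime corresponds to roughly 13 minutes of downtime and 99.97% to roughly 39 minutes. The small sketch below only does that arithmetic. The helper name `implied_downtime_minutes` is hypothetical. The inputs are the rounded figures from the status pages, so the results are approximate.

```python
MINUTES_PER_DAY = 24 * 60


def implied_downtime_minutes(uptime_pct: float, days: int = 90) -> float:
    """Minutes of downtime implied by an uptime percentage over `days` days."""
    return (100.0 - uptime_pct) / 100.0 * days * MINUTES_PER_DAY


# 90-day uptime figures as reported on each provider's status page.
reported = {"Infura": 99.99, "QuickNode": 100.0, "Alchemy": 99.97}
for provider, pct in reported.items():
    minutes = implied_downtime_minutes(pct)
    print(f"{provider}: {pct}% uptime is about {minutes:.0f} minutes down in 90 days")
```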

| Feature | Infura Free | Infura Developer | Infura Team | Ankr Free | Ankr Premium | QuickNode Discover | QuickNode Launch | QuickNode Build | QuickNode Scale | Alchemy Free | Alchemy Growth |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Signup required | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Email required | Yes | Yes | Yes | No | No | Yes | Yes | Yes | Yes | Yes | Yes |
| Web3 wallet needed | No | No | No | No | Yes | No | No | No | No | No | No |
| Ethereum | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Eth (testnets) | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Matic | Yes w/credit card | Yes w/credit card | Yes w/credit card | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Matic (testnet) | unknown | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| HTTP | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| WSS | Yes | Yes | Yes | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
| Free plan | Yes | n/a | n/a | Yes | n/a | Yes | n/a | n/a | n/a | Yes | n/a |
| Endpoints | 3 | unknown | unknown | unknown | unknown | 1 | 1 | 10 | 20 | 5 | 15 |
| Request limit | 100,000/day | 200,000/day | 1,000,000/day | 1M/day | Unlimited | 10M API Credits | 300K | 20M | 60M | 300M Compute Units (CU) | 400M CU |
| Daily request limit | Yes | Yes | Yes | Soft limit | No | Yes | 150K | No | No | No | No |
| Auto-scaling | | | | | | | | | | No | Yes |
| Overage fees | unknown | unknown | unknown | unknown | n/a | n/a | $0.10/10K | $0.07/10K | $0.05/10K | n/a | $1.20/1M CU |
| Rate limited | unknown | unknown | unknown | Yes | No | 25 requests/sec | 25 requests/sec | 100 requests/sec | 300 requests/sec | 330 CU/sec | 660 CU/sec |
| Geo distributed | unknown | unknown | unknown | US/EU only | Global distribution | Single region | Single region | Multiple regions | Global regions | unknown | unknown |
| Geo-steering | unknown | unknown | unknown | unknown | Yes | unknown | Yes | Yes | Yes | unknown | unknown |
| Support | Community support | Direct | Direct | Discord | Custom SLA | Community support | Community support | 24 hr response time | 8-12 hr response time | 24/7 Discord support | 24/7 dedicated Discord |
| Trial | n/a | No | No | n/a | No | n/a | 7 days | 7 days | 7 days | n/a | n/a |
| Pricing | Free | $50/month | $225/month | Free | Pay-as-you-go | Free | $9/month (legacy) | $49/month | $299/month | Free | $49/month |
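The table mixes per-day request quotas with per-second rate limits, and a quick conversion helps compare the two. A daily quota is an average budget: 100,000 requests/day works out to about 1.16 requests per second. A per-second limit caps bursts: 25 requests/sec sustained for a full day would be 2,160,000 requests. The sketch below only does that arithmetic with figures from the table. The helper names are hypothetical. It leaves out the API-credit and compute-unit allowances, since those appear to be weighted per method rather than counted per request. It also says nothing about how many requests a given node actually makes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def daily_quota_to_rps(requests_per_day: float) -> float:
    """Average request rate that would exactly use up a daily quota."""
    return requests_per_day / SECONDS_PER_DAY


def rps_to_daily(requests_per_second: float) -> float:
    """Requests per day if a per-second limit were sustained all day."""
    return requests_per_second * SECONDS_PER_DAY


# Daily quotas taken from the comparison table.
daily_quotas = {
    "Infura Free": 100_000,
    "Infura Developer": 200_000,
    "Infura Team": 1_000_000,
    "Ankr Free": 1_000_000,
}
for plan, quota in daily_quotas.items():
    print(f"{plan}: {daily_quota_to_rps(quota):.2f} requests/sec on average")

# QuickNode Discover and Launch list a 25 requests/sec rate limit.
print(f"25 requests/sec sustained: {rps_to_daily(25):,.0f} requests/day")
```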

While aggregating the data in the table above, I noticed that some providers’ plans specifically state that their infrastructure is geo-distributed. Those plans also use geo-steering, which routes each request to the geographically closest data center hosting the provider’s node cluster(s). None of these features is groundbreaking, but not every provider offers them at every service level. That matters here, because the purpose of this document is to find the best alternative to running your own load-balanced proxy.
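Providers implement geo-steering on their own infrastructure, so a client never sees it directly. As a rough client-side analogue, the hypothetical helper below times a cheap JSON-RPC call (`eth_chainId`) against each candidate endpoint. It then orders the reachable endpoints from fastest to slowest, using measured round-trip time as a stand-in for geographic proximity. This is a sketch under those assumptions, not how any listed provider does its routing. A single probe is also a noisy measurement.

```python
import json
import time
import urllib.request
from typing import Callable, Dict, List, Sequence


def json_rpc_probe(url: str, timeout: float = 3.0) -> None:
    """Send a cheap eth_chainId request; network/HTTP failures raise OSError."""
    payload = {"jsonrpc": "2.0", "id": 1, "method": "eth_chainId", "params": []}
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        resp.read()


def rank_by_latency(
    endpoints: Sequence[str],
    probe: Callable[[str], None] = json_rpc_probe,
    clock: Callable[[], float] = time.perf_counter,
) -> List[str]:
    """Return reachable endpoints ordered from fastest to slowest probe."""
    timings: Dict[str, float] = {}
    for url in endpoints:
        start = clock()
        try:
            probe(url)
        except OSError:
            continue  # unreachable or erroring endpoints are left out
        timings[url] = clock() - start
    return sorted(timings, key=lambda url: timings[url])
```

The returned list could serve as the priority order for a failover setup. It should be re-measured periodically, since latency changes over time.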

Can’t Beat Free, Right?