Run multiple nimbus-eth1 mainnet instances #193
10 instances each for mainnet / holesky / sepolia. The database takes 2-3 weeks to create, so we'll pre-seed the nodes with a pre-prepared database copy. Each instance needs about 300 GB of disk for the state. We should also think about setting it up in such a way that they have access to era1/era stores for historical block data (a single copy shared between the nodes).
From a conversation with Jacek we can start with a setup like this and then grow from there:
The priority is on deploying
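For illustration only, here is a rough sketch of the on-disk layout implied by the requirements quoted at the top of this issue: per-instance data directories of roughly 300 GB each, plus a single era1/era copy shared by all instances on the host. Only /docker/era1, /data/era and shared_mainnet_0 appear elsewhere in this thread; the remaining paths are placeholders.

# Illustrative layout for one mainnet host (paths beyond those quoted in this thread are placeholders).
/docker/era1                                                # shared era1 store (pre-merge history), one copy
/data/era                                                   # shared era store (post-merge history), one copy
/docker/nimbus-eth1-mainnet-master/data/shared_mainnet_0    # per-instance state DB, ~300 GB
/docker/nimbus-eth1-mainnet-master/data/shared_mainnet_1
# ...
/docker/nimbus-eth1-mainnet-master/data/shared_mainnet_9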
Here is its config template:
Looks like it needs some additional configuration regarding syncing (a prepared database or era files). Found other config options here:
We have those era files on the host:
Shall I point
FYI: Errors in
Metrics (
cc @mjfh can you take a look at this? See https://github.com/status-im/nimbus-eth2/blob/unstable/docs/logging.md for our logging levels. In particular, remote nodes doing strange things should never result in any logs above debug level. From the point of view of nimbus, it is "normal" for remote nodes to misbehave, and we should have logic in place that deals with the misbehavior rather than raising the issue to the user via logs. That is, these are expected conditions: there exist nodes that do strange things, so they are not errors, warnings, or even info.
Exporting era1 can be done like this:
where
Shortcut for era1 files suggested by Jacek:
Downloaded to:
Checksums match the file they provide.
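The actual download location and file list are elided above, so purely as a hedged sketch: the era1 archives can be fetched with any HTTP client and then verified against the provided checksum file, roughly like this (the URL and the checksum filename are placeholders, not the real source):

# Hypothetical fetch-and-verify; BASE_URL and the checksum filename are placeholders.
BASE_URL=https://example.org/era1
DEST=/docker/era1

mkdir -p "$DEST" && cd "$DEST"
wget -c "$BASE_URL/checksums.txt"
# ...download the .era1 files themselves into $DEST, then verify:
sha256sum -c checksums.txt   # every file should report OK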
Import from era files should be done like this, I guess:
At the current speed it should take ~1h to import.
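For reference, a minimal sketch of what the import invocation might look like, assuming the import entry point referenced later in this thread (nimbus_import.nim) and the era directory flags visible in the run command below; the exact subcommand and flags are assumptions and should be checked against nimbus --help:

# Hypothetical era/era1 import into a fresh data dir (subcommand and flags assumed, not verified here).
/docker/nimbus-eth1-mainnet-master/repo/build/nimbus import \
  --data-dir=/docker/nimbus-eth1-mainnet-master/data/shared_mainnet_0 \
  --era1-dir=/docker/era1 \
  --era-dir=/data/era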
RPC API doesn't show much:
Syntactically, this is a valid, if minimalistic, response: https://ethereum.org/en/developers/docs/apis/json-rpc/#eth_syncing

https://github.com/status-im/nimbus-eth1/blob/178d77ab310a79f3fa3a350d3546b607145a6aab/nimbus/core/chain/forked_chain.nim#L356-365 sets:

proc setHead(c: ForkedChainRef,
             headHash: Hash256,
             number: BlockNumber) =
  # TODO: db.setHead should not read from db anymore
  # all canonical chain marking
  # should be done from here.
  discard c.db.setHead(headHash)

  # update global syncHighest
  c.com.syncHighest = number

but https://github.com/status-im/nimbus-eth1/blob/master/nimbus/nimbus_import.nim never calls
However, it reports syncing because

server.rpc("eth_syncing") do() -> SyncingStatus:
  ## Returns SyncObject or false when not syncing.
  # TODO: make sure we are not syncing
  # when we reach the recent block
  let numPeers = node.peerPool.connectedNodes.len
  if numPeers > 0:
    var sync = SyncObject(
      startingBlock: w3Qty com.syncStart,
      currentBlock : w3Qty com.syncCurrent,
      highestBlock : w3Qty com.syncHighest
    )
    result = SyncingStatus(syncing: true, syncObject: sync)
  else:
    result = SyncingStatus(syncing: false)

which isn't really correct: having peers does not imply syncing. So the issue is basically that it's not syncing, but falsely showing that it is syncing. That it's not syncing when connected to an EL is itself a bug, but right now that's an expected, known issue being addressed. I'm not sure I've seen the falsely-showing-syncing issue reported before.
Progress
Stopped the era/era1 import (it would have taken ~10 days). After starting the node, these appear in the logs:
BTW, this is how we run it:

❯ /docker/nimbus-eth1-mainnet-master/repo/build/nimbus \
--network=mainnet \
--data-dir='/docker/nimbus-eth1-mainnet-master/data/shared_mainnet_0' \
--nat=extip:194.33.40.70 \
--log-level=DEBUG \
--listen-address=0.0.0.0 \
--tcp-port=30304 \
--udp-port=30304 \
--max-peers=160 \
--discovery=V4 \
--jwt-secret=/docker/nimbus-eth1-mainnet-master/data/jwt.hex \
--rpc=true \
--ws=false \
--graphql=false \
--http-address=127.0.0.1 \
--http-port=8546 \
--rpc-api=eth,debug \
--engine-api=false \
--engine-api-ws=false \
--metrics=true \
--metrics-address=0.0.0.0 \
--metrics-port=9401 \
--era1-dir=/docker/era1 \
--era-dir=/data/era
❯ ./rpc.sh eth_blockNumber
{
"jsonrpc": "2.0",
"id": 1,
"result": "0x1386e70"
}
❯ ./rpc.sh eth_syncing
{
"jsonrpc": "2.0",
"id": 1,
"result": false
}
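(The eth_blockNumber result 0x1386e70 is block 20,475,504 in decimal.) The rpc.sh helper itself isn't shown in this thread; a minimal sketch of what such a wrapper could look like, assuming curl and the HTTP RPC endpoint configured in the run command above (127.0.0.1:8546):

#!/usr/bin/env bash
# Hypothetical rpc.sh: ./rpc.sh <method> [params-json]
# Assumes the node's HTTP RPC listens on 127.0.0.1:8546, as configured above.
METHOD="$1"
PARAMS="${2:-[]}"
curl -s -X POST \
  -H 'Content-Type: application/json' \
  -d "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"${METHOD}\",\"params\":${PARAMS}}" \
  http://127.0.0.1:8546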
Turn these on/true
Done! BTW, more interesting messages:
Do you know which block was involved here (if you have a block hash or number, for example)? The issue is that only PoW blocks have uncles; PoS (post-merge) blocks don't. But, by that token, PoS blocks are supposed to have an ommersHash indicating this. So the question is: is it a PoW block which might have real uncles and there's a bug finding them, or is the bug in detecting that a PoS block should not have uncles?
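If one of the flagged blocks can be identified, a quick way to distinguish the two cases is to compare its sha3Uncles field against the well-known empty-ommers hash (keccak256 of an empty RLP list), which every post-merge block must carry. A sketch using an rpc.sh-style wrapper and jq; the block number here is just a placeholder:

# Check whether a block has real uncles or merely a wrong/non-empty ommers hash.
# 0x1dcc...9347 is keccak256(RLP([])), i.e. the ommersHash of any uncle-free block.
EMPTY_OMMERS=0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347
./rpc.sh eth_getBlockByNumber '["0x1386e70", false]' \
  | jq --arg empty "$EMPTY_OMMERS" \
       '.result | {number, sha3Uncles, uncleCount: (.uncles | length), isUncleFree: (.sha3Uncles == $empty)}'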
Hm, I'm not sure how to help, but you can check the logs yourself: There are quite a lot of such warnings: ~22k yesterday, ~9k today already.
BTW, logs are also on Kibana:
Changed the port for BNs to use the engine port; it should be working now. Also made a PR for review: #199
Non-issue/overzealous logging: status-im/nimbus-eth1#2639
After discussing the layout with Dustin and Jacek, we came to the conclusion that setups where multiple BNs share one EL are unsupported and should be avoided. Instead we will run two ELs per host, and half of the BNs will run without one.
Branches
In addition to that, we want EL diversity, so we'll start with a split similar to
This probably should be implemented using a flag in the layout file, the same way we enable validator clients with
Is this correct, @tersec?
@tersec, I need your review of the new mainnet layout. It has changed considerably:
I can speak to the
I've decided to drop it, as nobody has talked about this in ages and it made the setup more complex.
That's a good question: does the BN used for the public API endpoint need to have an EL? If so, we might have to make an exception.
It should, yes.
This is still an important test that we should keep, since many of our users don't have a public IP and we want to catch regressions.
OK, I will keep this test. Previously it was for unstable and testing nodes (on different hosts) with a Geth EL attached.
I think we can achieve it not by changing the already existing rules, but by adding extra rules that deny access to specific ports.
Merged the PR. The layout is mostly changed; what's left:
Initially, these don't have to have validators attached to them, but function as a fourth backing EL in addition to Nethermind, Erigon, and Geth.
To facilitate syncing, this can be provided by a combination of era file syncing and/or a pre-prepared database synced close to the current mainnet head.
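As a rough, hedged sketch of how that pre-seeding could be wired up for one instance, assuming a prepared database snapshot on the host and the shared era/era1 stores discussed earlier in the thread (the snapshot path is a placeholder):

# Hypothetical pre-seed of one instance from a prepared snapshot; the snapshot path is a placeholder.
SNAPSHOT=/docker/snapshots/nimbus-eth1-mainnet-db
DATA_DIR=/docker/nimbus-eth1-mainnet-master/data/shared_mainnet_0

rsync -a --delete "$SNAPSHOT"/ "$DATA_DIR"/

# Then start the node pointing at the shared era stores, as in the run command above.
/docker/nimbus-eth1-mainnet-master/repo/build/nimbus \
  --network=mainnet \
  --data-dir="$DATA_DIR" \
  --era1-dir=/docker/era1 \
  --era-dir=/data/era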