status api: Get engine type api #11

Open
wants to merge 680 commits into
base: master

Conversation

tonyxuqqi

What is changed and how it works?

Issue Number: Ref tikv#12842

What's Changed:

Add engine_type API in status server.
Returns "raft-kv" if it's v1 and "partitioned-raft-kv" if it's v2.

Related changes

  • PR to update pingcap/docs/pingcap/docs-cn:
  • Need to cherry-pick to the release branch

Check List

Tests

  • Unit test

Release note

Add a new API to the TiKV status server.
The API is `GET /engine_type`. It returns "raft-kv" or "partitioned-raft-kv", matching the storage engine in use. Note that it returns the actual engine type, which is not necessarily the one set in the configuration.
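A minimal sketch of the behavior described in the release note, not the code from this PR. `EngineType` and `handle_status_request` are hypothetical names chosen for illustration; only the path `/engine_type` and the two returned strings come from the description above. The handler is passed the engine that is actually running, on the assumption that this is how the "actual, not configured" guarantee would be met.

```rust
/// Hypothetical enum for the two storage engines named in the release note.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum EngineType {
    RaftKv,
    PartitionedRaftKv,
}

impl EngineType {
    /// The string the status API is described as returning.
    fn as_str(self) -> &'static str {
        match self {
            EngineType::RaftKv => "raft-kv",
            EngineType::PartitionedRaftKv => "partitioned-raft-kv",
        }
    }
}

/// Hypothetical handler returning (HTTP status code, body).
/// `actual` is the engine really in use, not the configured value.
fn handle_status_request(method: &str, path: &str, actual: EngineType) -> (u16, String) {
    match (method, path) {
        ("GET", "/engine_type") => (200, actual.as_str().to_string()),
        _ => (404, String::new()),
    }
}
```

In this sketch, calling `handle_status_request("GET", "/engine_type", EngineType::PartitionedRaftKv)` yields status 200 with the body `partitioned-raft-kv`.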

YuJuncen and others added 30 commits December 27, 2022 14:46
close tikv#13941, ref pingcap/tidb#39620

- If getting the initial snapshot fails, remove the subscription as soon as possible.
- Add a cache for getting the checkpoint. The cache is lease-based: the lease time is simply the tick interval of the coordinator.
- Enlarge the channel so that the main loop is not blocked when many regions are migrating.
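The lease-based cache mentioned above can be sketched as follows. This is an illustration of the general technique, not the code from this commit: `LeaseCache` and `get_or_fetch` are hypothetical names, and the caller passes in the current time so the behavior is easy to follow. In the commit's setting, the lease would be the coordinator's tick interval.

```rust
use std::time::{Duration, Instant};

/// Hypothetical single-value cache whose entry is valid for one lease period.
struct LeaseCache<T> {
    lease: Duration,
    entry: Option<(T, Instant)>,
}

impl<T: Clone> LeaseCache<T> {
    fn new(lease: Duration) -> Self {
        Self { lease, entry: None }
    }

    /// Returns the cached value while its lease is still valid at `now`;
    /// otherwise calls `fetch`, stores the result with a new lease, and returns it.
    fn get_or_fetch(&mut self, now: Instant, fetch: impl FnOnce() -> T) -> T {
        if let Some((value, fetched_at)) = &self.entry {
            if now.saturating_duration_since(*fetched_at) < self.lease {
                return value.clone();
            }
        }
        let value = fetch();
        self.entry = Some((value.clone(), now));
        value
    }
}
```

Within one lease period, repeated lookups reuse the stored value and skip the expensive fetch. Once the lease has elapsed, the next lookup fetches again.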

Signed-off-by: hillium <[email protected]>
Signed-off-by: hillium <[email protected]>
Signed-off-by: 山岚 <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

Signed-off-by: tabokie <[email protected]>
Signed-off-by: Xinye Tao <[email protected]>
ref tikv#12842

Implement periodical purge in v2.

Signed-off-by: tabokie <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

If operations such as snapshot or split are aborted by a restart, they need
to be either resumed or cleaned up. This PR checks for garbage after
restart and resumes committed operations.

Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12876

Signed-off-by: Wenbo Zhang <[email protected]>
Signed-off-by: Zwb <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

A few panic fixes:
1) update_approximate_raft_log_size may run into a divide-by-zero error
2) apply_delete may have a None write_batch
3) StoreMeta::set_region may run into a region corruption error if it's destroyed and re-created
4) TabletSnapManager's snapshot size calculation may throw an Other error

Signed-off-by: qi.xu <[email protected]>
Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: qi.xu <[email protected]>
Co-authored-by: Jay Lee <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
…13995)

ref tikv#12842

On every timeout, the peer checks the unapplied logs for pending conf
changes, which triggers heavy reads. So we wait until most logs are
applied before ticking.

It also fixes the following issues:
- PersistenceListener is not installed
- the implementation of persisted_apply_index is wrong
- tablet name parsing is wrong

Signed-off-by: Jay Lee <[email protected]>
close tikv#13997

Support using evict_entry_cache when restarting a node.

Signed-off-by: tabokie <[email protected]>
Signed-off-by: hongyunyan <[email protected]>
Signed-off-by: Xinye Tao <[email protected]>
Signed-off-by: Jay Lee <[email protected]>
Signed-off-by: Wenbo Zhang <[email protected]>
Signed-off-by: Zwb <[email protected]>

Co-authored-by: Xinye Tao <[email protected]>
Co-authored-by: Jay <[email protected]>
Co-authored-by: Zwb <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

The API is supposed to be used with `append`, but nothing in the code
indicates that. This PR merges `cut_logs` and `append` to reduce confusion
and mistakes.

Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

Publishing a tablet in the apply thread is unsafe. This PR moves the operation to
raftstore. It also fixes the issue that applying two splits at a time can
cause a panic, and makes sure the cache is cleared after the tablet is published.

Signed-off-by: Jay Lee <[email protected]>
ref tikv#12842

These two are helpers to utilize the static KV pairs in logger. In the
past, we used `logger.list()` to try to format the configured KV pairs,
but it did not work because values are omitted.

Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#13730

Introduce priority-based channel

Signed-off-by: Connor1996 <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

When the tablet contains dirty data right after a split, generating a snapshot may
be just a waste. On the other hand, a split usually happens on all peers, so delaying
the snapshot a bit actually makes all peers more likely to be initialized by the split. So
this PR rejects generating a snapshot when it detects that the tablet still has dirty data.

Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

1. Store heartbeat should include the snapshot size and the KV engine used size

Signed-off-by: bufferflies <[email protected]>

Co-authored-by: Xinye Tao <[email protected]>
ref tikv#12842

Make apply adaptive to reduce high tail latency.

Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

- add water metrics
- fix potential panic when destroying a peer
- fix incorrect store size

Signed-off-by: Jay Lee <[email protected]>
* util: Fix incorrect memory capacity

Signed-off-by: Wish <[email protected]>

* Fix lints

Signed-off-by: Wish <[email protected]>

* Check capacity with /proc/meminfo

Signed-off-by: Wish <[email protected]>

Signed-off-by: Wish <[email protected]>
* hotfix kvproto for global config

Signed-off-by: husharp <[email protected]>

* make format happy

Signed-off-by: husharp <[email protected]>

Signed-off-by: husharp <[email protected]>
Co-authored-by: Ti Chi Robot <[email protected]>
…ock request (tikv#14037)

close tikv#14038, close pingcap/tidb#40114

Fixes the problem that should_not_exist is ignored when a repeated acquire_pessimistic_lock request is received.

TiKV provides idempotency for these RPC requests, but for acquire_pessimistic_lock, it ignored the possibility that the client may expect a pessimistic_rollback between two acquire_pessimistic_lock requests on the same key. In this case the second request may come from another statement and carry `should_not_exist`, which wasn't set in the previously finished pessimistic lock request. If the first request successfully acquired the lock and the pessimistic_rollback failed, TiKV may return a successful response, making the client believe that the key did not exist before. In some rare cases, this risks causing data inconsistency.

Signed-off-by: MyonKeminta <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

This PR fixes several bugs and metrics:
- The waterfall timer is now reset in before_write. The goal is to resolve
    the confusion that stalled writes can pollute the whole waterfall metrics.
- Perf context is no longer associated with an engine instance. Perf
      context is thread-local and instance-independent under the hood.
- Fix flushed index advance failure due to suspicious flush.
- Support printing long uncommitted logs and fix incorrect commit time.

Signed-off-by: Jay Lee <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
ref tikv#12842

Move transaction related code to txn_ext.rs.

Fix the bug that snapshot doesn't set term and extra_op.

Signed-off-by: Jay Lee <[email protected]>
ref tikv#12842

1) add snapshot apply metrics
2) disable bloomfilter for raftkv-v2 for now until a proper ratio is found 
3) disable rocksdb write stall for raftkv-v2 until the tablet flow control is fully verified.

Signed-off-by: Qi Xu <[email protected]>

Co-authored-by: Qi Xu <[email protected]>
ref tikv#12876

fix witness raft log gc panic and refactor

Signed-off-by: Wenbo Zhang <[email protected]>

Co-authored-by: Xinye Tao <[email protected]>
ref tikv#12999

copr: support handling keyspace request

Signed-off-by: iosmanthus <[email protected]>
ref tikv#13730

Support priority-based scheduling for the scheduler worker pool.

Signed-off-by: Connor1996 <[email protected]>

Co-authored-by: Xinye Tao <[email protected]>
lijie and others added 30 commits April 20, 2023 12:06
ref tikv#12842

Add some configurations for RocksDB filter enhancements

Signed-off-by: tabokie <[email protected]>
…h importing data keys (tikv#14583)

ref tikv#12842, ref tikv#14095, ref tikv#14097

support renaming encrypted dir (inefficiently) and batch importing data keys

Signed-off-by: tabokie <[email protected]>
close tikv#14581

Store heartbeat will report the sending/receiving count to PD.

Signed-off-by: bufferflies <[email protected]>
ref tikv#14579

raftstore,server: add enable_v2_compatible_learner config

The new config is added to clean up the hard-coded TiFlash check.

Signed-off-by: Neil Shen <[email protected]>
…LUE_LEN` (tikv#14618)

close tikv#14619

Fix a bug in the `process_old_collation_kv` function.

Related to tikv#11931: `physical_table_id_column_cnt` was not processed in the `process_old_collation_kv` function.

Signed-off-by: Jason Mo <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
…ore-v2 (tikv#14584)

ref tikv#14579

Enable raftstore-v1 to apply tablet snapshots sent from raftstore-v2.

Signed-off-by: Spade A <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
…#14574)

ref tikv#14547

raft: peers shouldn't hibernate incorrectly when one node fails

Signed-off-by: qupeng <[email protected]>

Co-authored-by: Ti Chi Robot <[email protected]>
* raftstore-v2: prevent resolving store 0

Do not cache an invalid peer; otherwise it may send raft messages to
store 0 during region split.

Signed-off-by: Neil Shen <[email protected]>

* address comments

Signed-off-by: Neil Shen <[email protected]>

---------

Signed-off-by: Neil Shen <[email protected]>
* done

Signed-off-by: Spade A <[email protected]>

* add panic

Signed-off-by: Spade A <[email protected]>

---------

Signed-off-by: Spade A <[email protected]>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
…time (tikv#14530)

ref tikv#14321

Add the apply log duration metrics.

Signed-off-by: tonyxuqqi <[email protected]>
close tikv#14595

Make tiflash engine compatible with gc peer

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#14630

cdc: support filtering lossy DDL changes. We don't need to send those changes downstream.

Signed-off-by: hi-rustin <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
close tikv#14609

Update the Azure SDK to the latest version to support later developments.

Signed-off-by: LykxSassinator <[email protected]>
close tikv#14224

Fix fd leak caused by continuous profiling

Signed-off-by: tabokie <[email protected]>
… requests (tikv#14637)

close tikv#14636, ref pingcap/tidb#42937

Makes TiKV support checking whether the lock is primary when handling check_txn_status.

Signed-off-by: MyonKeminta <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: tonyxuqqi <[email protected]>
close tikv#14664

Fix stale read by correctly updating peers

Signed-off-by: Neil Shen <[email protected]>

Co-authored-by: tonyxuqqi <[email protected]>
close tikv#14658

Record the missing check_leader gRPC metrics.

Signed-off-by: you06 <[email protected]>
…size (tikv#14625)

ref tikv#12842

1) optimize the load based split config based on region size
2) polish a log message when it cannot find a target peer of the message.

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#14575

Support decoding simple write requests in v1

Signed-off-by: lidezhu <[email protected]>

Co-authored-by: Xinye Tao <[email protected]>
ref tikv#12842

support dynamically adjusting write buffer settings

Signed-off-by: tabokie <[email protected]>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ref tikv#14575

Make snapshot_meta accessible

Signed-off-by: CalvinNeo <[email protected]>