Skip to content

Latest commit

 

History

History
55 lines (41 loc) · 3.11 KB

README.md

File metadata and controls

55 lines (41 loc) · 3.11 KB

Zookeeper HBase Agent 4 Haproxy - How it works

Context

Haproxy has the availability to call an external agent to check if the backend is available or not like describe here https://www.haproxy.com/blog/using-haproxy-as-an-api-gateway-part-3-health-checks/

The corresponding haproxy would be

frontend thrift
    bind :25003
    mode tcp
    option tcplog
    acl thrift_failover_up srv_is_up(thrift_failover/thrift1)
    acl thrift_production_up srv_is_up(thrift_production/thrift1)
    use_backend thrift_failover if thrift_failover_up !thrift_production_up
    default_backend thrift_production

backend thrift_failover
    mode tcp
    balance roundrobin
    server thrift1 195.154.166.122:9090 check

backend thrift_production
    mode tcp
    balance roundrobin
    server thrift1 195.154.166.121:9090 check agent-check agent-inter 5s agent-port 9999

Infra example

The idea would be to have an agent checking hbase zookeeper configuration to know if the backend is available.

Init

Available params

Every available params can be found by running zk-hbase4haproxy --help.

Params Description example
help Print usage guide. zk-hbase4haproxy help
--zkQuorum Zookeeper Quorum Connection to connect to zk-hbase4haproxy run --zkQuorum="localhost:2181/hbase"
--hbaseRsGroup Hbase' RS Group to monitor zk-hbase4haproxy run --zkQuorum="localhost:2181/hbase" --hbaseRsGroup=default
--manualRecovery The agent should not go back to UP status after a DOWN even if region server are available (default is false). zk-hbase4haproxy run --zkQuorum="localhost:2181/hbase" --manualRecovery
--port Port to listen haproxy incoming check (default is 9999) zk-hbase4haproxy run --zkQuorum="localhost:2181" --port=9000

Option manualRecovery is used in case you dont want the agent to send back UP after a DOWN period. It can be used to be sure the table you are working on are available. You could have a thousand reasons for the table to not open after a disaster. You would need extra step such as:

  • Fix table $ sudo -u hbase hbase hbck -fix <Table_Name>
  • Balance table $ echo balancer | hbase shell
  • Table major compaction $ echo "major_compact <Table_Name>" | hbase shell
  • Restart hbase master
  • Sometimes it could be a hdfs problem

When solution is fixed, to go back to UP status, you need to restart the agent.