Add support for hostgroup_attributes for AWS Aurora auto-discovery #4279

Closed
wants to merge 18 commits into from
18 commits
ef3d6bd
Fix DEBUG 'conn_unregister' flow for GR monitoring with async_handlers
JavierJF May 8, 2023
af80944
Add support for Group Replication (GR) autodiscovery
JavierJF May 8, 2023
4ee5c8b
Improve recovery (OFFLINE_HARD) optimization for GR autodiscovered se…
JavierJF May 8, 2023
549a828
Improve error reporting for invalid monitoring GR resultsets
JavierJF May 9, 2023
e7a477a
Merge branch 'v2.x' of github.com:sysown/proxysql into v2.x-aurora_au…
JavierJF Jun 8, 2023
bf5d8cb
Add functions for breaking down 'commit' checksum generation
JavierJF Jun 28, 2023
9e3ab51
Add helper functions for server creation/destruction in hostgroup
JavierJF Jun 28, 2023
4524bcb
Fix AWS Aurora replicas not being SHUNNED due to replication lag
JavierJF Jun 29, 2023
72cfd24
Fix AWS Aurora new writer not honoring 'new_reader_weight'
JavierJF Jun 29, 2023
2d5359a
Fix invalid servers removal when present in multiple AWS Aurora clusters
JavierJF Jun 29, 2023
813355c
Honor hostgroup attributes for AWS Aurora auto-discovery
JavierJF Jun 29, 2023
3d06427
Improve simulator support for AWS Aurora
JavierJF Jun 30, 2023
efe83fe
Fix propagation of 'OFFLINE_HARD' servers due to 'read_only_action_v2'
JavierJF Jun 30, 2023
97463ca
Fix invalid use of SHUNNED servers for checksum computation in 'read_…
JavierJF Jun 30, 2023
cfa6d89
Merge branch 'v2.x' of github.com:sysown/proxysql into v2.x-aurora_au…
JavierJF Jun 30, 2023
3fb6ab9
Fix compilation on Centos 6 due to old GCC limitations with 'auto'
JavierJF Jul 5, 2023
d13dda0
Merge branch 'v2.x' of github.com:sysown/proxysql into v2.x-aurora_au…
JavierJF Jul 12, 2023
743a7f8
Remove outdated commented code from Aurora implementation
JavierJF Jul 19, 2023
15 changes: 13 additions & 2 deletions Makefile
@@ -74,10 +74,13 @@ default: build_src
.PHONY: debug
debug: build_src_debug

.PHONY: testaurora_random
testaurora_random: build_src_testaurora_random

.PHONY: testaurora
testaurora: build_src_testaurora
cd test/tap && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA" CC=${CC} CXX=${CXX} ${MAKE}
cd test/tap/tests && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA" CC=${CC} CXX=${CXX} ${MAKE} $(MAKECMDGOALS)
# cd test/tap && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA" CC=${CC} CXX=${CXX} ${MAKE}
# cd test/tap/tests && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA" CC=${CC} CXX=${CXX} ${MAKE} $(MAKECMDGOALS)

.PHONY: testgalera
testgalera: build_src_testgalera
@@ -128,10 +131,18 @@ build_lib_debug: build_deps_debug
build_src_testaurora: build_lib_testaurora
cd src && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA" CC=${CC} CXX=${CXX} ${MAKE}

.PHONY: build_src_testaurora_random
build_src_testaurora_random: build_lib_testaurora_random
cd src && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA -DTEST_AURORA_RANDOM" CC=${CC} CXX=${CXX} ${MAKE}

.PHONY: build_lib_testaurora
build_lib_testaurora: build_deps_debug
cd lib && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA" CC=${CC} CXX=${CXX} ${MAKE}

.PHONY: build_lib_testaurora_random
build_lib_testaurora_random: build_deps_debug
cd lib && OPTZ="${O0} -ggdb -DDEBUG -DTEST_AURORA -DTEST_AURORA_RANDOM" CC=${CC} CXX=${CXX} ${MAKE}

.PHONY: build_src_testgalera
build_src_testgalera: build_lib_testgalera
cd src && OPTZ="${O0} -ggdb -DDEBUG -DTEST_GALERA" CC=${CC} CXX=${CXX} ${MAKE}
121 changes: 121 additions & 0 deletions include/MySQL_HostGroups_Manager.h
@@ -432,6 +432,27 @@ enum REPLICATION_LAG_SERVER_T {
RLS__SIZE
};

/**
* @brief Contains the minimal info for server creation.
*/
struct srv_info_t {
/* @brief Server address */
string addr;
/* @brief Server port */
uint16_t port;
/* @brief Server type identifier, used for logging, e.g. 'Aurora AWS', 'GR', etc. */
string kind;
};

/**
* @brief Contains options to be specified during server creation.
*/
struct srv_opts_t {
int64_t weigth;
int64_t max_conns;
int32_t use_ssl;
};

class MySQL_HostGroups_Manager {
private:
SQLite3DB *admindb;
@@ -534,8 +555,26 @@ class MySQL_HostGroups_Manager {
MySQL_HostGroups_Manager* myHGM;
};

/**
* @brief Used by 'MySQL_Monitor::read_only' to hold a mapping between servers and hostgroups.
* @details The hostgroup mapping holds the MySrvC for each of the hostgroups in which the server is
* present, distinguishing between 'READER' and 'WRITER' hostgroups.
*/
std::unordered_map<std::string, std::unique_ptr<HostGroup_Server_Mapping>> hostgroup_server_mapping;
/**
* @brief Holds the previous computed checksum for 'mysql_servers'.
* @details Used to check whether the servers checksum has changed during 'commit'. If a change is detected,
* the member 'hostgroup_server_mapping' must be regenerated.
*
* This is only updated during 'read_only_action_v2', since the action itself modifies
* 'hostgroup_server_mapping' in case any action needs to be performed against the servers.
*/
uint64_t hgsm_mysql_servers_checksum = 0;
/**
* @brief Holds the previous checksum for the 'MYSQL_REPLICATION_HOSTGROUPS'.
* @details Used during 'commit' to determine whether the config has changed for 'MYSQL_REPLICATION_HOSTGROUPS'
* and 'hostgroup_server_mapping' should be rebuilt.
*/
uint64_t hgsm_mysql_replication_hostgroups_checksum = 0;


@@ -585,6 +624,17 @@ class MySQL_HostGroups_Manager {
SQLite3_result *incoming_replication_hostgroups;

void generate_mysql_group_replication_hostgroups_table();
/**
* @brief Regenerates the resultset used by 'MySQL_Monitor' containing the servers to be monitored.
* @details This function is required to be called after any action that results in the addition of a new
* server that 'MySQL_Monitor' should be aware of for 'group_replication', i.e. a server added to the
* hostgroups present in any entry of 'mysql_group_replication_hostgroups'. E.g:
* - Inside 'generate_mysql_group_replication_hostgroups_table'.
* - Autodiscovery.
*
* NOTE: This is a common pattern for all cluster monitoring.
*/
void generate_mysql_group_replication_hostgroups_monitor_resultset();
SQLite3_result *incoming_group_replication_hostgroups;

pthread_mutex_t Group_Replication_Info_mutex;
@@ -748,7 +798,10 @@ class MySQL_HostGroups_Manager {
void wrlock();
void wrunlock();
int servers_add(SQLite3_result *resultset);
std::string gen_global_mysql_servers_checksum();
bool commit(SQLite3_result* runtime_mysql_servers = nullptr, const std::string& checksum = "", const time_t epoch = 0);
void commit_generate_mysql_servers_table(SQLite3_result* runtime_mysql_servers = nullptr);
void commit_update_checksum_from_mysql_servers(SpookyHash& myhash, bool& init);
void commit_update_checksums_from_tables(SpookyHash& myhash, bool& init);
void CUCFT1(SpookyHash& myhash, bool& init, const string& TableName, const string& ColumnName, uint64_t& raw_checksum); // used by commit_update_checksums_from_tables()

@@ -787,6 +840,34 @@ class MySQL_HostGroups_Manager {
MyHGC * MyHGC_lookup(unsigned int);

void MyConn_add_to_pool(MySQL_Connection *);
/**
* @brief Creates a new server in the target hostgroup if it isn't already present.
* @details If the server is found already in the target hostgroup, no action is taken, unless its status
* is 'OFFLINE_HARD'. In case of finding it as 'OFFLINE_HARD':
* 1. Server hostgroup attributes are reset to known values, so they can be updated.
* 2. Server attributes are updated to either table definition values, or hostgroup 'servers_defaults'.
* 3. Server is brought back as 'ONLINE'.
* @param hid The hostgroup in which the server is to be created (or to bring it back as 'ONLINE').
* @param srv_info Basic server info to be used during creation.
* @param srv_opts Server creation options.
* @return 0 in case of success, -1 in case of failure.
*/
int create_new_server_in_hg(uint32_t hid, const srv_info_t& srv_info, const srv_opts_t& srv_opts);
/**
* @brief Completely removes server from the target hostgroup if found.
* @details Several actions are taken if server is found:
* - Set the server as 'OFFLINE_HARD'.
* - Drop all current FREE connections to the server.
* - Delete the server from the 'myhgm.mysql_servers' table.
*
* This latter step is not required if the caller is already going to perform a full deletion of the
* servers in the target hostgroup, which is a common operation during table regeneration.
* @param hid Target hostgroup id.
* @param addr Target server address.
* @param port Target server port.
* @return 0 in case of success, -1 in case of failure.
*/
int remove_server_in_hg(uint32_t hid, const string& addr, uint16_t port);

MySQL_Connection * get_MyConn_from_pool(unsigned int hid, MySQL_Session *sess, bool ff, char * gtid_uuid, uint64_t gtid_trxid, int max_lag_ms);
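
The decision flow documented for 'create_new_server_in_hg' can be sketched roughly as follows. This is only an illustration under stated assumptions: 'SketchServer', 'SketchOpts', 'SketchHostgroup', 'CreateResult' and 'create_server_sketch' are hypothetical stand-ins, not the real 'MySrvC'/'MyHGC' types, and the fallback to hostgroup 'servers_defaults' is omitted.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class SrvStatus { ONLINE, SHUNNED, OFFLINE_SOFT, OFFLINE_HARD };

// Hypothetical stand-in for a server entry; not the real MySrvC.
struct SketchServer {
    std::string addr;
    uint16_t port;
    int64_t weight;
    int64_t max_conns;
    int32_t use_ssl;
    SrvStatus status;
};

// Hypothetical stand-in for the creation options.
struct SketchOpts {
    int64_t weight;
    int64_t max_conns;
    int32_t use_ssl;
};

// Hypothetical stand-in for a hostgroup container.
using SketchHostgroup = std::vector<SketchServer>;

enum class CreateResult { UNCHANGED, REENABLED, CREATED };

CreateResult create_server_sketch(
    SketchHostgroup& hg, const std::string& addr, uint16_t port, const SketchOpts& opts
) {
    for (SketchServer& srv : hg) {
        if (srv.addr == addr && srv.port == port) {
            // Found and not 'OFFLINE_HARD': no action is taken.
            if (srv.status != SrvStatus::OFFLINE_HARD) {
                return CreateResult::UNCHANGED;
            }
            // Found as 'OFFLINE_HARD': refresh attributes and bring it back 'ONLINE'.
            srv.weight = opts.weight;
            srv.max_conns = opts.max_conns;
            srv.use_ssl = opts.use_ssl;
            srv.status = SrvStatus::ONLINE;
            return CreateResult::REENABLED;
        }
    }
    // Not found: create the server in the hostgroup.
    hg.push_back(SketchServer { addr, port, opts.weight, opts.max_conns, opts.use_ssl, SrvStatus::ONLINE });
    return CreateResult::CREATED;
}
```

Returning a three-valued result is a choice made for this sketch only; the declared function returns 0 or -1.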

@@ -812,6 +893,33 @@ class MySQL_HostGroups_Manager {
void update_group_replication_set_offline(char *_hostname, int _port, int _writer_hostgroup, char *error);
void update_group_replication_set_read_only(char *_hostname, int _port, int _writer_hostgroup, char *error);
void update_group_replication_set_writer(char *_hostname, int _port, int _writer_hostgroup);
/**
* @brief Tries to add a new server found during GR autodiscovery to the supplied hostgroup.
* @details For adding the new server, several actions are performed:
* 1. Lookup the target server in the corresponding MyHGC for the supplied hostgroup.
* 2. If server is found, and its status isn't 'OFFLINE_HARD' do nothing. Otherwise:
* - If server is found as 'OFFLINE_HARD', reset the internal values corresponding to
* 'servers_defaults' values to '-1', update the defaulted values to the ones in its 'MyHGC', lastly
* re-enable the server and log the action.
* - If server isn't found, create it in the corresponding reader hostgroup of the supplied writer
* hostgroup, setting all 'servers_defaults' params as '-1', log the action.
* - After any of the two previous actions, always regenerate servers data structures.
*
* NOTE: Server data structures regeneration requires:
* 1. Purging the 'mysql_servers_table' (Lazy removal of 'OFFLINE_HARD' servers.)
* 2. Regenerate the actual 'myhgm::mysql_servers' table from memory structures.
* 3. Update the 'mysql_servers' resultset used for monitoring. This resultset is used for general
* monitoring actions like 'ping', 'connect'.
* 4. Regenerate the specific resultset for 'Group Replication' monitoring. This resultset is the way to
* communicate back to the main monitoring thread that servers config has changed, and a new thread
* shall be created with the new servers config. This same principle is used for Aurora.
*
* @param _host Server address.
* @param _port Server port.
* @param _wr_hg Writer hostgroup of the cluster being monitored. Autodiscovered servers are always added
* to the reader hostgroup by default, later monitoring actions will re-position the server if required.
*/
void update_group_replication_add_autodiscovered(const std::string& _host, int _port, int _wr_hg);
void converge_group_replication_config(int _writer_hostgroup);
/**
* @brief Set the supplied server as SHUNNED, this function shall be called
@@ -850,6 +958,19 @@ class MySQL_HostGroups_Manager {
bool aws_aurora_replication_lag_action(int _whid, int _rhid, char *server_id, float current_replication_lag_ms, bool enable, bool is_writer, bool verbose=true);
void update_aws_aurora_set_writer(int _whid, int _rhid, char *server_id, bool verbose=true);
void update_aws_aurora_set_reader(int _whid, int _rhid, char *server_id);
/**
* @brief Updates the resultset and corresponding checksum used by Monitor for AWS Aurora.
* @details This is required to be called when:
* - The 'mysql_aws_aurora_hostgroups' table is regenerated (via 'commit').
* - When new servers are discovered, and created in already monitored Aurora clusters.
*
* The resultset holds the servers that are present in 'mysql_servers' table, and share hostgroups with
* the **active** clusters specified in 'mysql_aws_aurora_hostgroups'. See query
* 'SELECT_AWS_AURORA_SERVERS_FOR_MONITOR'.
* @param lock Whether both 'AWS_Aurora_Info_mutex' and 'MySQL_Monitor::aws_aurora_mutex' mutexes should
* be taken.
*/
void update_aws_aurora_hosts_monitor_resultset(bool lock=false);

SQLite3_result * get_stats_mysql_gtid_executed();
void generate_mysql_gtid_executed_tables();
24 changes: 24 additions & 0 deletions include/MySQL_Monitor.hpp
@@ -70,6 +70,14 @@ A single AWS_Aurora_monitor_node will have a AWS_Aurora_status_entry per check.

*/

#ifdef TEST_AURORA

#define TEST_AURORA_MONITOR_BASE_QUERY \
"SELECT SERVER_ID, SESSION_ID, LAST_UPDATE_TIMESTAMP, REPLICA_LAG_IN_MILLISECONDS, CPU"\
" FROM REPLICA_HOST_STATUS ORDER BY SERVER_ID "

#endif

class AWS_Aurora_replica_host_status_entry {
public:
char * server_id = NULL;
@@ -200,6 +208,17 @@ enum class MySQL_Monitor_State_Data_Task_Result {
TASK_RESULT_PENDING
};

/**
* @brief Holds the info from a GR server definition.
*/
struct gr_host_def_t {
string host;
int port;
int use_ssl;
bool writer_is_also_reader;
int max_transactions_behind;
int max_transactions_behind_count;
};

class MySQL_Monitor_State_Data {
public:
@@ -237,6 +256,11 @@ class MySQL_Monitor_State_Data {
* @details Currently only used by 'group_replication'.
*/
uint64_t init_time = 0;
/**
* @brief Used by GroupReplication to determine if servers reported by cluster 'members' are already monitored.
* @details This way we avoid unnecessary locking on 'MySQL_HostGroups_Manager' for server search.
*/
const std::vector<gr_host_def_t>* cur_monitored_gr_srvs = nullptr;

MySQL_Monitor_State_Data(MySQL_Monitor_State_Data_Task_Type task_type, char* h, int p, bool _use_ssl = 0, int g = 0);
~MySQL_Monitor_State_Data();
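
As a rough illustration of how 'cur_monitored_gr_srvs' could be consulted without locking 'MySQL_HostGroups_Manager', the sketch below checks a reported cluster member against the already monitored definitions. The struct mirrors the fields of 'gr_host_def_t' shown above; the helper 'is_already_monitored' is a hypothetical name, not part of the PR.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Mirrors the fields of 'gr_host_def_t' from the diff, redeclared so the sketch compiles on its own.
struct gr_host_def_t {
    std::string host;
    int port;
    int use_ssl;
    bool writer_is_also_reader;
    int max_transactions_behind;
    int max_transactions_behind_count;
};

// Hypothetical helper: returns true if 'host:port' is already part of the monitored set.
// A null pointer is treated as "nothing is monitored yet".
bool is_already_monitored(
    const std::vector<gr_host_def_t>* monitored, const std::string& host, int port
) {
    if (monitored == nullptr) {
        return false;
    }
    return std::any_of(
        monitored->begin(), monitored->end(),
        [&host, port](const gr_host_def_t& def) { return def.host == host && def.port == port; }
    );
}
```

A linear scan is assumed to be acceptable here because the number of members in a cluster is small.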
10 changes: 8 additions & 2 deletions include/SQLite3_Server.h
@@ -5,6 +5,7 @@
#include "proxysql.h"
#include "cpp.h"
#include <vector>
#include <string>

class SQLite3_Session {
public:
@@ -14,7 +15,7 @@ class SQLite3_Session {
};

#ifdef TEST_GROUPREP
using group_rep_status = std::tuple<bool, bool, uint32_t>;
using group_rep_status = std::tuple<bool, bool, uint32_t, std::string>;
#endif

class SQLite3_Server {
@@ -70,7 +71,12 @@ class SQLite3_Server {
unsigned int num_aurora_servers[3];
unsigned int max_num_aurora_servers;
pthread_mutex_t aurora_mutex;
void populate_aws_aurora_table(MySQL_Session *sess);
/**
* @brief Handles queries to table 'REPLICA_HOST_STATUS'.
* @details This function needs to be called with lock on mutex aurora_mutex already acquired.
* @param sess The session whose request is to be handled.
*/
void populate_aws_aurora_table(MySQL_Session *sess, uint32_t whg);
void init_aurora_ifaces_string(std::string& s);
#endif // TEST_AURORA
#ifdef TEST_GALERA
2 changes: 2 additions & 0 deletions include/proxysql_admin.h
@@ -544,6 +544,8 @@ class ProxySQL_Admin {

#ifdef TEST_AURORA
void enable_aurora_testing();
void enable_aurora_testing_populate_mysql_servers();
void enable_aurora_testing_populate_mysql_aurora_hostgroups();
#endif // TEST_AURORA

#ifdef TEST_GALERA
14 changes: 2 additions & 12 deletions include/proxysql_glovars.hpp
@@ -5,27 +5,17 @@
#define CLUSTER_SYNC_INTERFACES_MYSQL "('mysql-interfaces')"

#include <memory>
#include <string.h>
#include <prometheus/registry.h>

#include "configfile.hpp"
#include "proxy_defines.h"
#include "proxysql_utils.h"

namespace ez {
class ezOptionParser;
};

/**
* @brief Helper function used to replace spaces and zeros by '0' char in the supplied checksum buffer.
* @param checksum Input buffer containing the checksum.
*/
inline void replace_checksum_zeros(char* checksum) {
for (int i=2; i<18; i++) {
if (checksum[i]==' ' || checksum[i]==0) {
checksum[i]='0';
}
}
}

#ifndef ProxySQL_Checksum_Value_LENGTH
#define ProxySQL_Checksum_Value_LENGTH 20
#endif
43 changes: 43 additions & 0 deletions include/proxysql_utils.h
@@ -2,6 +2,7 @@
#define __PROXYSQL_UTILS_H

#include <cstdarg>
#include <functional>
#include <type_traits>
#include <memory>
#include <string>
@@ -11,6 +12,12 @@
#include <dirent.h>
#include <sys/resource.h>

#include "sqlite3db.h"

#ifndef ProxySQL_Checksum_Value_LENGTH
#define ProxySQL_Checksum_Value_LENGTH 20
#endif

#ifndef ETIME
// ETIME is not defined on FreeBSD
// ETIME is used internally to report API timer expired
@@ -206,8 +213,44 @@ uint64_t get_timestamp_us();
*/
std::string replace_str(const std::string& str, const std::string& match, const std::string& repl);

/**
* @brief Split a string into a vector of strings with the provided 'char' delimiter.
* @param s String to be split.
* @param delimiter Delimiter to be used.
* @return Vector with the string splits. Empty if none is found.
*/
std::vector<std::string> split_str(const std::string& s, char delimiter);
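
One way a function with the documented 'split_str' contract could be written is sketched below. The PR page only shows the declaration, so this body is an assumption; 'split_str_sketch' is a hypothetical name chosen to avoid implying it is the project's implementation.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: splits 's' on 'delimiter' using std::getline.
// An empty input yields an empty vector.
std::vector<std::string> split_str_sketch(const std::string& s, char delimiter) {
    std::vector<std::string> tokens;
    std::istringstream stream(s);
    std::string token;
    while (std::getline(stream, token, delimiter)) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Note that with this approach a trailing delimiter does not produce a final empty token; whether the real function behaves the same way is not stated in the diff.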

std::string generate_multi_rows_query(int rows, int params);

void close_all_non_term_fd(std::vector<int> excludeFDs);

/**
* @brief Helper function used to replace spaces and zeros by '0' char in the supplied checksum buffer.
* @param checksum Input buffer containing the checksum.
*/
inline void replace_checksum_zeros(char* checksum) {
for (int i=2; i<18; i++) {
if (checksum[i]==' ' || checksum[i]==0) {
checksum[i]='0';
}
}
}

/**
* @brief Generates a ProxySQL checksum as a string from the supplied integer hash.
* @param hash The integer hash to be formatted as a string.
* @return String representation of the supplied hash.
*/
std::string get_checksum_from_hash(uint64_t hash);
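
To show how 'replace_checksum_zeros' fits with a hash-to-string conversion, here is a minimal sketch. The loop is copied from the helper above; 'checksum_from_hash_sketch', its format string, and the high-word-first ordering are assumptions made for illustration, since only the declaration of 'get_checksum_from_hash' appears in the diff.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Same loop as 'replace_checksum_zeros' in the diff, copied so the sketch compiles on its own.
inline void replace_checksum_zeros(char* checksum) {
    for (int i = 2; i < 18; i++) {
        if (checksum[i] == ' ' || checksum[i] == 0) {
            checksum[i] = '0';
        }
    }
}

// Hypothetical formatter: "0x" followed by 16 hexadecimal digits.
std::string checksum_from_hash_sketch(uint64_t hash) {
    char buf[20] = { 0 }; // 'ProxySQL_Checksum_Value_LENGTH' is defined as 20 in the diff
    const unsigned int hi = static_cast<unsigned int>(hash >> 32);
    const unsigned int lo = static_cast<unsigned int>(hash & 0xFFFFFFFFu);
    // "%8X" pads each half with spaces; the helper then rewrites those spaces as '0'.
    std::snprintf(buf, sizeof(buf), "0x%8X%8X", hi, lo);
    replace_checksum_zeros(buf);
    return std::string(buf);
}
```

The helper only touches positions 2 through 17, which is exactly the 16-digit payload after the "0x" prefix.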

/**
* @brief Remove the rows from the resultset matching the supplied predicate.
* @param resultset The resultset whose rows are to be removed.
* @param pred Predicate that should return 'true' for the rows to be removed.
*/
void remove_sqlite3_resultset_rows(
std::unique_ptr<SQLite3_result>& resultset, const std::function<bool(SQLite3_row*)>& pred
);
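
The predicate-based removal described for 'remove_sqlite3_resultset_rows' resembles the erase-remove idiom. The sketch below applies that idiom to hypothetical stand-in types ('SketchRow', 'SketchResult'); the layout of the real 'SQLite3_result' and 'SQLite3_row' classes is not shown in this diff, so nothing here should be read as their actual interface.

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for 'SQLite3_row' and 'SQLite3_result'.
struct SketchRow {
    std::vector<std::string> fields;
};
struct SketchResult {
    std::vector<std::unique_ptr<SketchRow>> rows;
};

// Removes every row for which 'pred' returns true; a null resultset is left untouched.
void remove_rows_sketch(
    std::unique_ptr<SketchResult>& resultset, const std::function<bool(SketchRow*)>& pred
) {
    if (!resultset) {
        return;
    }
    auto& rows = resultset->rows;
    rows.erase(
        std::remove_if(
            rows.begin(), rows.end(),
            [&pred](const std::unique_ptr<SketchRow>& row) { return pred(row.get()); }
        ),
        rows.end()
    );
}
```

Because the rows are owned through 'std::unique_ptr' in this sketch, erasing them also frees them; the real resultset may manage row memory differently.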

#endif