Update tutorial and example config files #42

Open: wants to merge 7 commits into base: master

38 changes: 29 additions & 9 deletions benchmarks/warm_benchmark.cpp
@@ -41,21 +41,41 @@ int main(int argc, char **argv) {
settings.resource_manager_address, settings.resource_manager_port,
*settings.device
);
if (!instance.connect()) {
spdlog::error("Connection to resource manager failed!");
return 1;
}

auto leased_executor = instance.lease(settings.benchmark.numcores, settings.benchmark.memory, *settings.device);
if (!leased_executor.has_value()) {
spdlog::error("Couldn't acquire a lease!");
return 1;
bool skip_resource_manager = !opts.executors_database.empty();

std::optional<rfaas::executor> leased_executor;
if (!skip_resource_manager) {

if (!instance.connect()) {
spdlog::error("Connection to resource manager failed!");
return 1;
}

leased_executor = instance.lease(settings.benchmark.numcores, settings.benchmark.memory, *settings.device);
if (!leased_executor.has_value()) {
spdlog::error("Couldn't acquire a lease!");
return 1;
}

} else {

std::ifstream in_cfg(opts.executors_database);
rfaas::servers::deserialize(in_cfg);
in_cfg.close();

leased_executor = instance.lease(rfaas::servers::instance(), settings.benchmark.numcores, settings.benchmark.memory);
if (!leased_executor.has_value()) {
spdlog::error("Couldn't acquire a lease!");
return 1;
}

Comment on lines +48 to +72

🛠️ Refactor suggestion

Consider refactoring duplicated error handling in resource leasing

Both branches of the if statement handle leasing an executor and check for failure in a similar way. To improve maintainability and reduce code duplication, consider extracting common error handling into a separate function or restructuring the logic.
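A minimal sketch of the restructuring suggested above. `Executor`, `lease_via_resource_manager`, `lease_from_database`, and `acquire_executor` are hypothetical stand-ins, not the rFaaS API. Each branch only decides how to obtain the lease, and the `has_value()` check with its error message happens once after the branch selection:

```cpp
#include <cassert>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for rfaas::executor; only the fields this sketch needs.
struct Executor {
  std::string address;
  int port;
};

// Hypothetical: models instance.connect() followed by instance.lease(...).
std::optional<Executor> lease_via_resource_manager(bool connected) {
  if (!connected) {
    std::cerr << "Connection to resource manager failed!\n";
    return std::nullopt;
  }
  return Executor{"192.0.2.1", 10005};  // placeholder address from the documentation range
}

// Hypothetical: models leasing from a deserialized executors database.
std::optional<Executor> lease_from_database(const std::vector<Executor>& database) {
  if (database.empty()) return std::nullopt;
  return database.front();
}

// The branches only choose the leasing path; the failure check is shared.
std::optional<Executor> acquire_executor(bool skip_resource_manager, bool connected,
                                         const std::vector<Executor>& database) {
  std::optional<Executor> leased = skip_resource_manager
                                       ? lease_from_database(database)
                                       : lease_via_resource_manager(connected);
  if (!leased.has_value()) {
    std::cerr << "Couldn't acquire a lease!\n";
  }
  return leased;
}
```

With this shape, `main` in `warm_benchmark.cpp` would need a single `return 1` when the returned optional is empty.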

}

rfaas::executor executor = std::move(leased_executor.value());

if (!executor.allocate(opts.flib, opts.input_size,
settings.benchmark.hot_timeout, false)) {
settings.benchmark.hot_timeout, false, skip_resource_manager)) {
spdlog::error("Connection to executor and allocation failed!");
return 1;
}
4 changes: 2 additions & 2 deletions config/benchmark.json
@@ -1,16 +1,16 @@
{
"config": {
"rdma_device": "",
"rdma_device_port": 10005,
"rdma_device_port": 10006,

💡 Codebase verification

Unresolved References to Old RDMA Device Port (10005)

The old RDMA device port (10005) is still referenced in the following configuration files:

  • config/executors_database.json
  • config/executor_manager.json

Please update these references to the new port number (10006) to ensure consistency across the codebase.

🔗 Analysis chain

Please clarify the reason for changing the RDMA device port.

The RDMA device port has been changed from 10005 to 10006. While this change is straightforward, it's important to understand the reasoning behind it.

Could you please provide more context on why this change was necessary? Additionally, it would be helpful to document this change in a comment or in the project's documentation to ensure other developers are aware of the new port number.

To verify if this port change affects other parts of the system, you can run the following command:

This will help identify any other files that might need to be updated with the new port number.


Correction Needed: Update the search command for Python files.

The previous command failed because the python file type was not recognized. Please use the .py extension to search Python files instead.


Adjust the search command to include both configuration and Python files correctly.

The previous search did not find any references to 10005 because the file type filters were too restrictive. Please use the following command to search both configuration files and Python files for the old RDMA port number:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any references to the old port number (10005) in the codebase
rg --type-add 'config:*.{json,yaml,yml,toml}' --type config --type python '10005'

Length of output: 112


Script:

#!/bin/bash
# Search for any references to the old port number (10005) in the codebase
rg --type-add 'config:*.{json,yaml,yml,toml}' --type config -g '*.py' '10005'

Length of output: 226


Script:

#!/bin/bash
# Search for any references to the old port number (10005) in configuration and Python files
rg --type-add 'config:*.{json,yaml,yml,toml}' --type=config -g '*.py' '10005'
rg --type-add 'config:*.{json,yaml,yml,toml}' --type=config '10005'

Length of output: 406
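The port values touched by this PR can also be cross-checked mechanically. A minimal sketch that hard-codes the values from the updated `config/*.json` files instead of parsing them; `PortLayout` and `ports_consistent` are hypothetical names. The role assigned to each port is an assumption: in particular, that `rdma_device_port` in `benchmark.json` is the benchmarker's own local port rather than the port it connects to. Under that assumption, the executors database entry must match the port the executor manager listens on (10005 in the updated tutorial log), and the three local ports must differ if all processes share one host:

```cpp
#include <cassert>
#include <set>

// Values copied from the updated config files; the roles are assumptions.
struct PortLayout {
  int resource_manager_rdma;  // config/resource_manager.json: rdma_device_port
  int executor_manager_rdma;  // config/executor_manager.json: rdma_device_port
  int executors_db_entry;     // config/executors_database.json: executors[0].port
  int benchmark_rdma;         // config/benchmark.json: rdma_device_port
};

// The database entry must point where the executor manager listens, and no two
// local ports may collide when every process runs on the same machine.
bool ports_consistent(const PortLayout& p) {
  if (p.executors_db_entry != p.executor_manager_rdma) return false;
  std::set<int> local{p.resource_manager_rdma, p.executor_manager_rdma, p.benchmark_rdma};
  return local.size() == 3;
}
```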

"resource_manager_address": "",
"resource_manager_port": 0
},
"benchmark": {
"memory": 256,
"pin_threads": false,
"repetitions": 100,
"warmup_repetitions": 0,
"numcores": 1,
"hot_timeout": -1
}
}

7 changes: 4 additions & 3 deletions config/executor_manager.json
@@ -1,10 +1,12 @@
{
"config": {
"rdma_device": "",
"rdma_device_port": 10000,
"rdma_device_port": 10005,
"node_name": "exec-mgr-node",
"resource_manager_address": "",
"resource_manager_port": 0,
"resource_manager_secret": 0
"resource_manager_secret": 42,

⚠️ Potential issue

Enhance security for the resource manager secret.

The resource_manager_secret has been changed from 0 to 42. While this is an improvement over using 0, consider the following:

  1. Use a more secure, randomly generated value instead of a predictable number like 42.
  2. Implement a secure method to distribute and update this secret.
  3. Consider using environment variables or a secure secret management system instead of hardcoding the value in the configuration file.
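A minimal sketch of point 3, assuming a hypothetical environment variable (rFaaS does not read any such variable today; `secret_from_env` and `resolve_secret` are illustrative names). The value from the environment overrides the JSON value, and anything unset, empty, non-numeric, or out of `int` range falls back to the config file:

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <optional>

// Hypothetical helper: parses the secret from the environment variable `name`.
// Returns nullopt if the variable is unset, empty, not a full integer, or out of int range.
std::optional<int> secret_from_env(const char* name) {
  const char* raw = std::getenv(name);
  if (raw == nullptr || *raw == '\0') return std::nullopt;
  char* end = nullptr;
  errno = 0;
  long value = std::strtol(raw, &end, 10);
  if (errno != 0 || *end != '\0' || value < INT_MIN || value > INT_MAX) return std::nullopt;
  return static_cast<int>(value);
}

// Prefer the environment override; otherwise keep the value from the JSON config.
int resolve_secret(int config_value, const char* env_name) {
  return secret_from_env(env_name).value_or(config_value);
}
```

Because `resource_manager_secret` in `executor_manager.json` and `rdma-secret` in `resource_manager.json` are both 42 in this PR, both processes would need to read the same override so that their secrets keep matching.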

"rdma-sleep": true
},
"executor": {
"use_docker": false,
@@ -13,4 +15,3 @@
"pin_threads": false
}
}

16 changes: 9 additions & 7 deletions config/executors_database.json
@@ -1,9 +1,11 @@
{
"executors": [
{
"address": "",
"port": 10000,
"cores": 0
}
]
"executors": [
{
"node": "exec-mgr-node",
"address": "",
"port": 10005,
"cores": 1,
"memory": 512
}
]
}
9 changes: 6 additions & 3 deletions config/resource_manager.json
@@ -1,8 +1,11 @@
{
"config": {
"rdma_device": "",
"rdma_device_port": 0,
"http_network_address": "",
"http_network_port": 0
"rdma_device_port": 10000,
"http_network_address": "0.0.0.0",
"http_network_port": 5000,
"rdma-threads": 1,
"rdma-secret": 42,
"rdma-sleep": true
}
}
147 changes: 126 additions & 21 deletions docs/tutorial.md
@@ -9,10 +9,71 @@ global management of billing and it distributes data on active executor servers
Here, we skip the deployment of resource manager for simplicity.
On small deployments with just a few executor servers, we can bypass this step.

### RDMA Configuration

rFaaS currently supports InfiniBand and RoCE devices through the ibverbs library.
If you do not have access to such a device, you can still emulate it on a regular Ethernet network
with [SoftROCE](https://github.com/SoftRoCE). This kernel driver allows creating an emulated
RDMA device on top of a regular network. It will not achieve the same performance
as a hardware RDMA device, but it implements a similar set of functionality.

The installation of SoftROCE should be straightforward on most modern Linux distributions,
and it should not be necessary to manually compile and install kernel modules.

For example, on Ubuntu-based distributions, you need to install the `rdma-core` and `ibverbs-utils` packages.
Then, you can add a virtual RDMA device with the following command:

```
sudo rdma link add test type rxe netdev <netdev>
```

where `<netdev>` is the name of the Ethernet device used for emulation. You can check that the device
has been created with the following command:

```
ibv_devices

device node GUID
------ ----------------
test 067bcbfffeb6f9f6

```

To fully check that the configuration works, we can run a simple performance
test with the tools from the `perftest` package. In one shell, start the server:

```
ib_write_bw
```

In a second shell, start the client:

```
ib_write_bw <ip>
```

Replace `<ip>` with the IP address of the selected network device.
You should see output similar to the following:

```
---------------------------------------------------------------------------------------
#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps]
Conflicting CPU frequency values detected: 3902.477000 != 400.000000. CPU Frequency is not max.
65536 5000 419.30 361.10 0.005778
---------------------------------------------------------------------------------------
```
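The columns of this output are related: the average bandwidth is the message size multiplied by the message rate. Plugging in the numbers above (a `MsgRate` of 0.005778 Mpps is 5778 messages per second), the `MB/sec` columns correspond to mebibytes (2^20 bytes) per second; the small gap to 361.10 comes from the rounding of the reported message rate:

```math
\text{BW}_{\text{avg}} \approx \frac{65536\ \text{B} \times 5778\ \text{s}^{-1}}{2^{20}\ \text{B/MiB}} = \frac{5778}{16}\ \text{MiB/s} \approx 361.1\ \text{MiB/s}
```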

Another tool that can be used is `rping`, available in the `rdmacm-utils` package:

```
# server side
rping -s -a <ip-address> -v

# client side
rping -c -a <ip-address> -v
```

### Setup

We assume that rFaaS executor will be executed on one server, using the RDMA device `server_device`
and the network address `server_ip`.
We assume that rFaaS executor will be executed on one server, using the RDMA device `server_device` and the network address `server_ip`.
Then, we're going to deploy the benchmarker on another machine, using the RDMA device `benchmark_device` and the network address `benchmark_ip`.

These four variables will be used when modifying JSON config files.
@@ -47,12 +108,7 @@ We can use default values for maximal size of inline messaged and the receive bu
}
```

To defines these automatically, we can use a jq one-liner:

```
jq --arg device "$server_device" --arg address "$server_ip" '.devices[0].name = $device | .devices[0].ip_address = $address' <src-dir>/config/devices.json > server_devices.json
jq --arg device "$client_device" --arg address "$client_ip" '.devices[0].name = $device | .devices[0].ip_address = $address' <src-dir>/config/devices.json > client_devices.json
```
To generate these automatically, we can use the helper script: `tools/device_generator.sh > devices.json`.

### rFaaS function

@@ -96,10 +152,12 @@ and it needs to be extended with device.
{
"config": {
"rdma_device": "<rdma-device>",
"rdma_device_port": <device-port>,
"rdma_device_port": 10005,
"node_name": "exec-mgr-node",
"resource_manager_address": "",
"resource_manager_port": 0,
"resource_manager_secret": 0
"resource_manager_secret": 42,
"rdma-sleep": true
},
"executor": {
"use_docker": false,
@@ -129,8 +187,8 @@ After starting the manager, you should see the output similar to this:

```console
[13:12:31:629452] [P 425702] [T 425702] [info] Executing rFaaS executor manager!
[13:12:31:634632] [P 425702] [T 425702] [info] Listening on device rocep61s0, port 10006
[13:12:31:634674] [P 425702] [T 425702] [info] Begin listening at 192.168.0.21:10006 and processing events!
[13:12:31:634632] [P 425702] [T 425702] [info] Listening on device rocep61s0, port 10005
[13:12:31:634674] [P 425702] [T 425702] [info] Begin listening at 192.168.0.21:10005 and processing events!
```

### Benchmark Example
@@ -144,13 +202,15 @@ configuration from the previous step:

```json
{
"executors": [
{
"port": 10000,
"cores": 1,
"address": "<exec-mgr-address>"
}
]
"executors": [
{
"node": "exec-mgr-node",
"address": "<exec-mgr-address>",
"port": 10005,
"cores": 1,
"memory": 512
}
]
}
```

@@ -172,7 +232,7 @@ value describes the hot polling timeout in milliseconds.
{
"config": {
"rdma_device": "",
"rdma_device_port": 10005,
"rdma_device_port": 10006,
"resource_manager_address": "",
"resource_manager_port": 0
},
@@ -189,7 +249,7 @@ value describes the hot polling timeout in milliseconds.
We generate the configuration using the following command:

```
jq --arg device "$client_device" '.config.rdma_device = $device' ../repo2/config/benchmark.json > benchmark.json
jq --arg device "$client_device" '.config.rdma_device = $device' <src-dir>/config/benchmark.json > benchmark.json
```

To start a benchmark instance with the `name` functions from `examples/libfunctions.so`,
@@ -214,3 +274,48 @@ Data: 1

For details about this and other benchmarks, please take a look [at the documentation](benchmarks.md).

### Using Resource Manager

For large deployments, we deploy the resource manager to control leases.

An example configuration is available in `config/resource_manager.json`,
and it needs to be extended with the device selection.
By default, we assume that the resource manager uses port 10000 for RDMA connections
and port 5000 for the HTTP server.

```json
{
"config": {
"rdma_device": "",
"rdma_device_port": 10000,
"http_network_address": "0.0.0.0",
"http_network_port": 5000,
"rdma-threads": 1,
"rdma-secret": 42,
"rdma-sleep": true
}
}
```

We can use the following command to generate the configuration:

```
jq --arg device "$server_device" '.config.rdma_device = $device' <src-dir>/config/resource_manager.json > resource_manager.json
```

To start an instance of the resource manager, we use the following command:

```
bin/resource_manager -c resource_manager.json --device-database server_devices.json -i executors_database.json
```

Here, we populate the database of all executors by providing a JSON list generated previously: `-i executors_database.json`.
Alternatively, we can use the HTTP interface designed for integration with batch managers, and send a POST request that adds a new executor:

```
curl http://127.0.0.1:5000/add\?node\=exec-mgr-node -X POST -d '{"ip_address": "192.168.0.29", "port": 10005, "cores": 1, "memory": 512}'
```
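For integration with a batch manager, the same request can be assembled programmatically. A minimal sketch that only builds the URL and JSON body of the `curl` call above; `ExecutorEntry`, `add_executor_url`, and `add_executor_body` are hypothetical names, and actually sending the POST would need an HTTP client library, which is not shown:

```cpp
#include <cassert>
#include <string>

// Hypothetical description of one executor node, mirroring the fields of the curl example.
struct ExecutorEntry {
  std::string node;
  std::string ip_address;
  int port;
  int cores;
  int memory;
};

// Target of POST /add?node=<node>; assumes the node name needs no URL escaping.
std::string add_executor_url(const std::string& host, int http_port, const ExecutorEntry& e) {
  return "http://" + host + ":" + std::to_string(http_port) + "/add?node=" + e.node;
}

// Request body; assumes ip_address contains no characters that need JSON escaping.
std::string add_executor_body(const ExecutorEntry& e) {
  return "{\"ip_address\": \"" + e.ip_address + "\", \"port\": " + std::to_string(e.port) +
         ", \"cores\": " + std::to_string(e.cores) + ", \"memory\": " + std::to_string(e.memory) + "}";
}
```

For the values used above, the URL is `http://127.0.0.1:5000/add?node=exec-mgr-node` and the body is identical to the `-d` argument of the `curl` command.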

Then, we only need to modify `executor_manager.json` and `benchmark.json` to add the IP address and port of the resource manager.
For the executor manager, we remove the `--skip-resource-manager` flag. For the benchmarker, we remove the `--executors-database executors_database.json`
parameter, as all executor data will now be handled by the resource manager.
4 changes: 4 additions & 0 deletions rfaas/include/rfaas/allocation.hpp
@@ -39,6 +39,10 @@ namespace rfaas {
uint32_t func_buf_size;
int32_t listen_port;
char listen_address[16];

// Legacy support for skipping resource manager
int16_t cores = 0;
int32_t memory = 0;
};
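The new `cores` and `memory` fields use default member initializers, so code paths that never set them send zeros. A sketch with a hypothetical mirror, `AllocationRequestTail`, of only the fields visible in this hunk (the real struct has further members above `func_buf_size`): brace initialization that stops before the new fields leaves them at 0. This relies on C++14 or later, where default member initializers no longer prevent aggregate initialization:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the visible tail of the allocation request struct.
struct AllocationRequestTail {
  uint32_t func_buf_size;
  int32_t listen_port;
  char listen_address[16];

  // Legacy support for skipping resource manager
  int16_t cores = 0;
  int32_t memory = 0;
};
```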

struct LeaseStatus {
2 changes: 1 addition & 1 deletion rfaas/include/rfaas/client.hpp
@@ -72,7 +72,7 @@ namespace rfaas {

server_data instance = nodes_data.server(0);

return std::make_optional<rfaas::executor>(instance.address, instance.port, cores, memory, -1, _device);
return std::make_optional<rfaas::executor>(instance.address, instance.port, cores, memory, 0, _device);
}

private:
3 changes: 2 additions & 1 deletion rfaas/include/rfaas/executor.hpp
@@ -77,12 +77,13 @@ namespace rfaas {
~executor();

executor(executor&& obj);
executor& operator=(executor&& obj);

⚠️ Potential issue

Ensure consistent implementation of special member functions in executor.

You have added a move assignment operator executor& operator=(executor&& obj);. Since executor manages resources and defines a custom destructor and move constructor, it's important to follow the Rule of Five. Please ensure that you also explicitly declare or delete the copy constructor and copy assignment operator to prevent unintended copying and resource management issues.
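A minimal sketch of the move-only pattern the comment describes, using a hypothetical `Handle` class that owns an integer resource id (a stand-in for the state `executor` manages). The copy operations are deleted, and both move operations transfer ownership and leave the source empty so that its destructor releases nothing:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical resource owner; `released` counts how many ids have been released.
class Handle {
 public:
  static inline int released = 0;

  explicit Handle(int id) : _id(id) {}
  ~Handle() { reset(); }

  // Copying would release the same resource twice, so it is disabled.
  Handle(const Handle&) = delete;
  Handle& operator=(const Handle&) = delete;

  // Moving transfers ownership; the moved-from object no longer owns anything.
  Handle(Handle&& other) noexcept : _id(std::exchange(other._id, -1)) {}
  Handle& operator=(Handle&& other) noexcept {
    if (this != &other) {
      reset();  // release what this object currently owns
      _id = std::exchange(other._id, -1);
    }
    return *this;
  }

  int id() const { return _id; }

 private:
  void reset() {
    if (_id >= 0) ++released;
    _id = -1;
  }
  int _id;
};

static_assert(!std::is_copy_constructible_v<Handle>);
static_assert(!std::is_copy_assignable_v<Handle>);
static_assert(std::is_nothrow_move_constructible_v<Handle>);
```

Since `executor` already declares a move constructor, the language already defines its implicit copy constructor and copy assignment operator as deleted; writing `= delete` explicitly mainly documents that intent.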


bool connect(const std::string & ip, int port);

// Skipping managers is useful for benchmarking
bool allocate(std::string functions_path, int max_input_size, int hot_timeout,
bool skip_manager = false, rdmalib::Benchmarker<5> * benchmarker = nullptr);
bool skip_manager = false, bool skip_resource_manager = false, rdmalib::Benchmarker<5> * benchmarker = nullptr);
void deallocate();
Comment on lines +86 to 87

💡 Codebase verification

Incomplete Updates to executor::allocate Calls Identified

Several calls to executor.allocate are missing the new skip_resource_manager parameter. Please update the following locations to include the new parameter:

  • tests/basic_allocation_test.cpp (calls to executor.allocate())
  • benchmarks/parallel_invocations.cpp
  • benchmarks/cold_benchmark.cpp
  • benchmarks/cpp_interface.cpp

🔗 Analysis chain

Verify updates to all calls of allocate with the new parameter skip_resource_manager.

The allocate method's signature now includes a new parameter bool skip_resource_manager = false. Please ensure that all calls to this method in the codebase are updated to reflect this change. Additionally, consider reviewing the parameter names skip_manager and skip_resource_manager for potential clarity improvements to avoid confusion due to their similarity.

Run the following script to locate all calls to executor::allocate and verify they use the updated signature:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all calls to `executor::allocate` and display their usage.

rg --type cpp 'executor\.allocate\(' -A 2

Length of output: 1627
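Call sites that omit the new argument still compile, because it defaults to `false`. A subtler risk of inserting `skip_resource_manager` before the `benchmarker` pointer is that a call passing the benchmarker positionally also still compiles: the pointer converts implicitly to `bool`. A minimal sketch with a hypothetical free function, `allocate_like`, that has the same parameter order and defaults as the new `allocate`, plus a stand-in `Benchmarker` type:

```cpp
#include <cassert>
#include <string>

// Stand-in for rdmalib::Benchmarker<5>.
struct Benchmarker {};

// Records which parameter each argument ended up in.
struct AllocateArgs {
  bool skip_manager;
  bool skip_resource_manager;
  Benchmarker* benchmarker;
};

// Hypothetical function with the same parameter order and defaults as the new allocate().
AllocateArgs allocate_like(const std::string& functions_path, int max_input_size, int hot_timeout,
                           bool skip_manager = false, bool skip_resource_manager = false,
                           Benchmarker* benchmarker = nullptr) {
  (void)functions_path;
  (void)max_input_size;
  (void)hot_timeout;
  return {skip_manager, skip_resource_manager, benchmarker};
}
```

Passing `false` for `skip_resource_manager` explicitly before the pointer keeps the pre-PR behavior of such a call.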

rdmalib::Buffer<char> load_library(std::string path);
void poll_queue();
6 changes: 4 additions & 2 deletions rfaas/lib/connection.cpp
@@ -122,13 +122,15 @@ namespace rfaas {
int rcv_buf, int max_inline_data):
_address(address),
_port(port),
_active(_address, _port, rcv_buf),
_rcv_buf_size(rcv_buf),
_send_buffer(1),
_receive_buffer(rcv_buf),
_max_inline_data(max_inline_data)
{
_active.allocate();
if(!address.empty()) {
_active = rdmalib::RDMAActive(_address, _port, rcv_buf);
_active.allocate();
}
Comment on lines +130 to +133

💡 Codebase verification

Issues Found with _active Usage in resource_mgr_connection

The conditional initialization of _active based on a non-empty address introduces potential risks where _active may not be initialized. However, several methods within the resource_mgr_connection class directly use _active without verifying its initialization status. This can lead to undefined behavior or crashes if these methods are invoked when _active is not set.

Key Points to Address:

  • Methods such as connect(), disconnect(), is_connected(), and connection() use _active directly.
  • Ensure that these methods include appropriate checks to handle cases where _active is not initialized.
  • Review and update all usages of _active to include safeguards against uninitialized states.

🔗 Analysis chain

LGTM, but consider additional safeguards.

The conditional initialization of _active based on a non-empty address is a good improvement for flexible resource management. This change aligns well with the PR objective of updating the resource management mechanism.

However, please consider the following:

  1. Update the connect() method to handle cases where _active is not initialized.
  2. Add appropriate null checks or safeguards in other methods that use _active.
  3. Document the expected behavior of the object when the address is empty.

To ensure that all usages of _active are properly guarded, please run the following script:

This will help identify any potential issues with unguarded _active usage.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Find all usages of _active in the resource_mgr_connection class

# Test: Search for _active usages in resource_mgr_connection methods
ast-grep --lang cpp --pattern $'class resource_mgr_connection {
  $$$
  $_($_) {
    $$$
    _active$_
    $$$
  }
  $$$
}'

Length of output: 469


Script:

#!/bin/bash
# Description: Find all usages of _active in the resource_mgr_connection class with a corrected ast-grep pattern

ast-grep --lang cpp --pattern $'class resource_mgr_connection {
  $$$
  $$method_declaration {
    $$$
    _active$$$
    $$$
  }
  $$$
}'

Length of output: 484


Script:

#!/bin/bash
# Description: Find all usages of _active in the resource_mgr_connection class using ripgrep

# Search for '_active' within the resource_mgr_connection class
rg '_active' rfaas/lib/connection.cpp --context 2

Length of output: 3346
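One way to add the suggested safeguards is to make the uninitialized state explicit. A minimal sketch, assuming hypothetical `ActiveConnection` and `ResourceManagerConnection` types in place of `rdmalib::RDMAActive` and `resource_mgr_connection`: the endpoint is held in a `std::optional` that is only engaged when an address is given, and `connect()` reports failure instead of touching a missing endpoint. It also removes the need for the endpoint type to be default-constructible and move-assignable, which the new `_active = rdmalib::RDMAActive(...)` assignment relies on:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for rdmalib::RDMAActive.
struct ActiveConnection {
  std::string address;
  int port;
  bool allocated = false;
  ActiveConnection(std::string addr, int p) : address(std::move(addr)), port(p) {}
  void allocate() { allocated = true; }
  bool connect() { return allocated; }
};

// Hypothetical stand-in for resource_mgr_connection.
class ResourceManagerConnection {
 public:
  ResourceManagerConnection(const std::string& address, int port) {
    // Only build the RDMA endpoint when a resource manager address is configured.
    if (!address.empty()) {
      _active.emplace(address, port);
      _active->allocate();
    }
  }

  bool is_configured() const { return _active.has_value(); }

  // Every use of the endpoint checks for it instead of assuming it exists.
  bool connect() {
    if (!_active) return false;
    return _active->connect();
  }

 private:
  std::optional<ActiveConnection> _active;
};
```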

}

bool resource_mgr_connection::connect()