feat(functions): Add support for REST based remote functions #10911

Open · wants to merge 3 commits into main


Joe-Abraham (Contributor) commented Sep 2, 2024

Fixes - #11036

facebook-github-bot added the CLA Signed label Sep 2, 2024
Joe-Abraham changed the title from "Add support for REST based remote functions" to "[WIP] Add support for REST based remote functions"
netlify bot commented Sep 2, 2024

Deploy Preview for meta-velox canceled.

🔨 Latest commit 36f7abb
🔍 Latest deploy log https://app.netlify.com/sites/meta-velox/deploys/678dcdba923b010008ff11e8

Joe-Abraham force-pushed the udf branch 12 times, most recently from b88d136 to b85e0e6 on September 4, 2024 09:32
Yuhta requested review from pedroerp and mbasmanova September 4, 2024 18:28
Joe-Abraham force-pushed the udf branch 6 times, most recently from 0cd4510 to 74023dc on September 9, 2024 08:10
pedroerp (Contributor) commented Sep 9, 2024

Pretty cool! I see the PR is still a draft, but I can help review when it's ready. It would also be nice to add some documentation on how to use it, the config parameters, etc.

Joe-Abraham force-pushed the udf branch 3 times, most recently from abe87e1 to 6c1606e on September 13, 2024 05:06
Joe-Abraham force-pushed the udf branch 3 times, most recently from 05115f4 to 2ffec26 on September 20, 2024 05:18
Joe-Abraham marked this pull request as ready for review November 28, 2024 03:52
Joe-Abraham (Contributor Author)

@aditi-pandit Can you please review the changes?

/// (non-remote) function registered with the same name. The `overwrite` flag
/// controls whether to overwrite in these cases.
/// (non-remote) function registered with the same name. The `overwrite`
/// flagwrite controls whether to overwrite in these cases.
Collaborator:

Nit: wording. Maybe "write" is not needed here.

#include <folly/io/async/EventBase.h>
#include <sstream>
#include <string>
#include "velox/common/memory/ByteStream.h"
Collaborator:

Add empty line between the system and velox includes.

Contributor Author:

Added a new line.

#include "velox/vector/VectorStream.h"

#include "velox/functions/remote/client/RestClient.h"
Collaborator:

Move this include into its correct alphabetical position among the preceding velox includes.

Contributor Author:

Corrected the include list.

/// Network address of the servr to communicate with. Note that this can hold
/// a network location (ip/port pair) or a unix domain socket path (see
/// URL of the HTTP/REST server for remote function.
/// Or Network address of the servr to communicate with. Note that this can
Collaborator:

Nit: spelling, "servr" should be "server".

Contributor Author:

Corrected it

}
size_t writeCallback(char* ptr, size_t size, size_t nmemb, void* userdata) {
auto* outputBuf = static_cast<IOBufQueue*>(userdata);
size_t total_size = size * nmemb;
Collaborator:

Nit: use camelCase naming, i.e. totalSize.

Contributor Author:

Corrected it

return totalCopied;
}
size_t writeCallback(char* ptr, size_t size, size_t nmemb, void* userdata) {
auto* outputBuf = static_cast<IOBufQueue*>(userdata);
Collaborator:

Use camelCase: "userData".

Contributor Author:

Corrected it
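For reference, here is a minimal sketch of a libcurl-style write callback using the camelCase names requested above (`totalSize`, `userData`). It assumes the response body is collected into a `std::string` registered via `CURLOPT_WRITEDATA`, rather than the `folly::IOBufQueue` the PR uses, so it compiles without folly or libcurl. Per libcurl's contract, the callback returns the number of bytes it consumed; any value other than `size * nmemb` makes libcurl abort the transfer.

```cpp
#include <cstddef>
#include <string>

// Sketch of a libcurl-style write callback (CURLOPT_WRITEFUNCTION signature).
// Assumption: the response is accumulated in a std::string passed through
// CURLOPT_WRITEDATA; the PR itself appends to a folly::IOBufQueue.
//   ptr      - chunk of response bytes from libcurl (not NUL-terminated)
//   size     - size of one element (libcurl always passes 1)
//   nmemb    - number of elements in this chunk
//   userData - the pointer registered with CURLOPT_WRITEDATA
// Returns the number of bytes consumed; returning anything other than
// size * nmemb tells libcurl to abort the transfer.
std::size_t writeCallback(
    char* ptr, std::size_t size, std::size_t nmemb, void* userData) {
  auto* output = static_cast<std::string*>(userData);
  const std::size_t totalSize = size * nmemb;
  output->append(ptr, totalSize);
  return totalSize;
}
```

libcurl may call the callback several times per response, which is why it appends rather than overwrites.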

using namespace folly;
namespace facebook::velox::functions {
namespace {
size_t readCallback(char* dest, size_t size, size_t nmemb, void* userp) {
Collaborator:

Please write comments explaining the signature and the parameters.

Contributor Author:

Added the documentation
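A documented read callback might look roughly like the sketch below. `RequestBody` is a hypothetical stand-in for the `IOBufQueue` the PR reads the serialized request from. The signature follows libcurl's `CURLOPT_READFUNCTION`, where the callback fills at most `size * nmemb` bytes and returning 0 signals the end of the request body.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical request-body source for this sketch; the PR reads from a
// folly::IOBufQueue instead.
struct RequestBody {
  std::string data;
  std::size_t offset = 0;  // Bytes already handed to libcurl.
};

// Sketch of a libcurl-style read callback (CURLOPT_READFUNCTION signature),
// used to stream a request body (e.g. a serialized input page) to the server.
//   dest     - buffer owned by libcurl that should be filled
//   size     - size of one element
//   nmemb    - number of elements; dest holds at most size * nmemb bytes
//   userData - the pointer registered with CURLOPT_READDATA
// Returns the number of bytes copied into dest; 0 means "no more data".
std::size_t readCallback(
    char* dest, std::size_t size, std::size_t nmemb, void* userData) {
  auto* body = static_cast<RequestBody*>(userData);
  const std::size_t capacity = size * nmemb;
  const std::size_t remaining = body->data.size() - body->offset;
  const std::size_t toCopy = std::min(capacity, remaining);
  std::memcpy(dest, body->data.data() + body->offset, toCopy);
  body->offset += toCopy;
  return toCopy;
}
```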


class RestClient : public HttpClient {
public:
std::unique_ptr<folly::IOBuf> performCurlRequest(
Collaborator:

Please write the documentation for this API. What is it for? What do the parameters mean?

Contributor Author:

Added the documentation

memory::memoryManager()->addLeafPool()};
};

class listener : public std::enable_shared_from_this<listener> {
Collaborator:

Can you add some documentation about these classes and what they are for?

Contributor Author:

Added documentation
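One common reason for deriving a listener class from `std::enable_shared_from_this` is to keep the object alive while asynchronous callbacks are pending: each callback captures a `shared_ptr` to the object, so it cannot be destroyed underneath them. The sketch below illustrates that pattern with a hypothetical `EventLoop` and `Listener`; it is not the PR's implementation, whose body is not shown in this thread.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event loop: callbacks are queued, then run.
class EventLoop {
 public:
  void post(std::function<void()> fn) { tasks_.push(std::move(fn)); }

  void run() {
    while (!tasks_.empty()) {
      auto fn = std::move(tasks_.front());
      tasks_.pop();
      fn();  // fn (and anything it captured) is released after this call.
    }
  }

 private:
  std::queue<std::function<void()>> tasks_;
};

// Hypothetical listener that "accepts" connections asynchronously. Each
// pending callback captures shared_from_this(), so the listener outlives the
// code that created it until the last callback has run.
class Listener : public std::enable_shared_from_this<Listener> {
 public:
  Listener(EventLoop& loop, std::vector<std::string>& log)
      : loop_(loop), log_(log) {}

  void doAccept(int remaining) {
    if (remaining == 0) {
      return;
    }
    auto self = shared_from_this();  // Extends lifetime into the callback.
    loop_.post([self, remaining] { self->onAccept(remaining); });
  }

 private:
  void onAccept(int remaining) {
    log_.push_back("accepted connection " + std::to_string(remaining));
    doAccept(remaining - 1);  // Re-arm for the next connection.
  }

  EventLoop& loop_;
  std::vector<std::string>& log_;
};
```

Once the last queued callback finishes, the final `shared_ptr` is released and the listener is destroyed, without explicit ownership bookkeeping. Note that `shared_from_this()` only works if the object is already owned by a `shared_ptr` (e.g. created with `std::make_shared`).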

// called to use the functions mentioned in this map
};

TypePtr deserializeType(const std::string& input) {
Collaborator:

There is a lot of repetition between this code and https://github.com/facebookincubator/velox/blob/main/velox/functions/remote/server/RemoteFunctionService.cpp. Can you please refactor?

Contributor Author:

Introduced RemoteFunctionHelper.h and moved the duplicated code into it.

@@ -16,11 +16,23 @@ velox_add_library(velox_functions_remote_thrift_client ThriftClient.cpp)
velox_link_libraries(velox_functions_remote_thrift_client
PUBLIC remote_function_thrift FBThrift::thriftcpp2)

set(curl_SOURCE BUNDLED)
velox_resolve_dependency(curl)
Collaborator:

This doesn't actually build curl; we only use it to force cpr to use the version we want. So if you require curl, I would say use set_source to allow the system curl to be used (this might require changes to cpr.cmake, but iirc cpr can work with system curl as well).

Also move this into the root CMakeLists.txt within an if().


velox_add_library(velox_functions_remote_rest_client RestClient.cpp)
velox_link_libraries(velox_functions_remote_rest_client Folly::folly
${CURL_LIBRARIES})
Collaborator:

Suggested change:
-    ${CURL_LIBRARIES})
+    CURL::libcurl)

Always prefer targets over variables.

Joe-Abraham force-pushed the udf branch 2 times, most recently from d031069 to c0dd88d on January 7, 2025 05:58
CMakeLists.txt Outdated
Comment on lines 537 to 525
velox_set_source(curl)
velox_resolve_dependency(curl)
Collaborator:

Thanks. If you now try a build without testing, you will see that curl is not installed.
But it should be enough to add FetchContent_MakeAvailable(curl) to curl.cmake. This might interact weirdly with cpr trying its own curl build, but you'll just have to test that (I recommend running make clean first).

Contributor Author:

Thank you @assignUser for the review points and I have made the necessary changes.

I am a bit confused about where to add FetchContent_MakeAvailable(curl). Can you please suggest the change?

Collaborator:

@Joe-Abraham : @assignUser has suggested to add it to curl.cmake

Joe-Abraham marked this pull request as draft January 9, 2025 04:44
Joe-Abraham force-pushed the udf branch 3 times, most recently from a4a213f to 8da88f3 on January 13, 2025 04:43
Joe-Abraham marked this pull request as ready for review January 13, 2025 04:49
aditi-pandit (Collaborator) left a comment:

@Joe-Abraham : A quick round of comments. I have to look at the server/ files in more detail still.

#include <folly/init/Init.h>
#include <gmock/gmock.h>
#include <gtest/gtest.h>
#include <cstdio>
Collaborator:

This header should move to the top, as it is a C standard library header.

class RemoteFunction : public exec::VectorFunction {
public:
RemoteFunction(
const std::string& functionName,
const std::vector<exec::VectorFunctionArg>& inputArgs,
const RemoteVectorFunctionMetadata& metadata)
const RemoteVectorFunctionMetadata& metadata,
std::unique_ptr<HttpClient> httpClient = nullptr)
Collaborator:

Why do we prefer to pass an HttpClient here? For Thrift, a ThriftClient is created per RemoteFunction; it might be better to do the same for HttpClient. You have an eventBase_ to work with as well.
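For comparison, the constructor pattern in the diff (`httpClient ? std::move(httpClient) : getRestClient()`) is mainly useful as a dependency-injection seam for tests. The sketch below shows that idea with hypothetical types: `HttpClient::performRequest`, `makeDefaultHttpClient`, `RestFunctionSketch`, and `EchoHttpClient` are illustrative names, not the PR's API, and the URL is a placeholder.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface; the PR's HttpClient exposes performCurlRequest(),
// whose exact signature is not shown in this thread.
class HttpClient {
 public:
  virtual ~HttpClient() = default;
  // Sends `body` to `url` and returns the response payload.
  virtual std::string performRequest(
      const std::string& url, const std::string& body) = 0;
};

// Stand-in for the production client (would wrap libcurl in practice).
class DefaultHttpClient : public HttpClient {
 public:
  std::string performRequest(const std::string&, const std::string&) override {
    return "default";
  }
};

std::unique_ptr<HttpClient> makeDefaultHttpClient() {
  return std::make_unique<DefaultHttpClient>();
}

// Mirrors the diff: use the injected client if given, else the default one.
class RestFunctionSketch {
 public:
  explicit RestFunctionSketch(
      std::string url, std::unique_ptr<HttpClient> httpClient = nullptr)
      : url_(std::move(url)),
        client_(httpClient ? std::move(httpClient) : makeDefaultHttpClient()) {}

  std::string invoke(const std::string& payload) {
    return client_->performRequest(url_, payload);
  }

 private:
  std::string url_;
  std::unique_ptr<HttpClient> client_;
};

// Test double that echoes the request, so no server is needed.
class EchoHttpClient : public HttpClient {
 public:
  std::string performRequest(
      const std::string& url, const std::string& body) override {
    return url + "|" + body;
  }
};
```

If the test suite is the only consumer of this seam, creating the client per RemoteFunction (as the Thrift path does) keeps the production constructor simpler; the trade-off is that tests then need a real or mocked server.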

serde_(getSerde(serdeFormat_)) {
restClient_(httpClient ? std::move(httpClient) : getRestClient()),
metadata_(metadata) {
if (metadata.location.type() == typeid(SocketAddress)) {
Collaborator:

You can have a member variable of an enum RemoteType = {THRIFT, REST}, set it in the constructor, and use it in apply() instead of doing the type check each time.
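A sketch of resolving the transport once in the constructor. It models the location as a `std::variant` of a hypothetical `SocketAddressStub` (Thrift) and a URL string (REST), standing in for the `typeid` check on `metadata.location` in the diff; `RemoteType`, its enumerators, and `RemoteFunctionSketch` are illustrative names only.

```cpp
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins: the PR distinguishes a folly::SocketAddress
// (Thrift) from a URL (REST) by checking metadata.location's typeid.
struct SocketAddressStub {
  std::string host;
  int port = 0;
};
using Location = std::variant<SocketAddressStub, std::string>;  // string = URL

enum class RemoteType { kThrift, kRest };

class RemoteFunctionSketch {
 public:
  explicit RemoteFunctionSketch(Location location)
      : location_(std::move(location)),
        // Resolve the transport once; apply() just switches on the enum.
        remoteType_(
            std::holds_alternative<SocketAddressStub>(location_)
                ? RemoteType::kThrift
                : RemoteType::kRest) {}

  RemoteType remoteType() const { return remoteType_; }

  std::string apply() const {
    switch (remoteType_) {
      case RemoteType::kThrift:
        return "thrift";  // Would call the Thrift client here.
      case RemoteType::kRest:
        return "rest";  // Would call the HTTP client here.
    }
    return "unreachable";
  }

 private:
  Location location_;
  RemoteType remoteType_;  // Declared after location_, so initialized after it.
};
```

Because `remoteType_` is declared after `location_`, it can safely be computed from `location_` in the member initializer list.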

std::unique_ptr<RemoteFunctionClient> thriftClient_;
remote::PageFormat serdeFormat_;
Collaborator:

Why did you remove the serdeFormat_ and serde_ member variables? It was better to construct them once in the constructor and use them in the function apply code, instead of calling getSerde each time.
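The point about building the serde once can be sketched as follows. `Serde`, `PageFormat`, and `getSerde` here are simplified, hypothetical stand-ins for the Velox types (the global counter exists only to make the single construction observable). The pattern is simply to keep the format and the serializer as members initialized in the constructor.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-ins for the Velox serde types used by RemoteFunction.
struct Serde {
  std::string format;
};

enum class PageFormat { kPresto, kSpark };

// Counts factory calls so the sketch can show construction happens once.
int serdeFactoryCalls = 0;

std::unique_ptr<Serde> getSerde(PageFormat format) {
  ++serdeFactoryCalls;
  auto serde = std::make_unique<Serde>();
  serde->format = format == PageFormat::kPresto ? "presto" : "spark";
  return serde;
}

class CachedSerdeFunction {
 public:
  explicit CachedSerdeFunction(PageFormat format)
      : serdeFormat_(format), serde_(getSerde(serdeFormat_)) {}

  // Every batch reuses the serde built in the constructor.
  const std::string& apply() const { return serde_->format; }

 private:
  PageFormat serdeFormat_;  // Must be declared before serde_ (init order).
  std::unique_ptr<Serde> serde_;
};
```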

IOBufQueue inputBufQueue(IOBufQueue::cacheChainLength());
inputBufQueue.append(std::move(requestPayload));

CURL* curl = curl_easy_init();
Collaborator:

Does this initialization need to be done each time invokeFunction is called? Can these be member variables initialized in the constructor?

}
}

VELOX_INSTANTIATE_TEST_SUITE_P(
Collaborator:

It would be better to enhance the RemoteFunctionTest fixtures to test against both the REST and Thrift servers. You could use Rest/Thrift as a parameter to the TEST_SUITE.

// Always registers all Presto functions and make them available under a
// certain prefix/namespace.
LOG(INFO) << "Registering Presto functions";
functions::prestosql::registerAllScalarFunctions(FLAGS_function_prefix);

std::remove(FLAGS_uds_path.c_str());
Collaborator:

Don't understand this change.

@@ -18,6 +18,7 @@
#include <gflags/gflags.h>
#include <glog/logging.h>
#include <thrift/lib/cpp2/server/ThriftServer.h>
#include "velox/functions/prestosql/StringFunctions.h"
Collaborator:

Why is this needed?

Collaborator:

It might be good to separate the changes in this file into a separate PR, though the only change really needed here is the memory::initializeMemoryManager({}); line.

I'll send out a PR.

Comment on lines +524 to +525
set(cpr_SOURCE BUNDLED)
velox_resolve_dependency(cpr)
Collaborator:

Labels: CLA Signed
Projects: None yet
6 participants