Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::string_view support #275

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

sjanel
Copy link
Contributor

@sjanel sjanel commented Jan 15, 2023

If the json library supports std::string_view, allow jwt-cpp take builder parameters as std::string_view and algorithm constructors as well.

Also small optimizations in some methods of jwt.h to make sure only one object is returned so that RVO can be used.

PR is ready for review. Only the coverage is failing, I will check why.

@sjanel sjanel marked this pull request as draft January 15, 2023 10:36
@sjanel sjanel force-pushed the feature/string-view-support branch 4 times, most recently from cd66010 to a73ec5c Compare January 20, 2023 22:16
@sjanel sjanel marked this pull request as ready for review January 20, 2023 22:21
Copy link
Collaborator

@prince-chrismc prince-chrismc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I generally understand the motivation behind this contribution but I would generally suggest opening an issue to discuss ideas this large, tip for the future :)

There are a handful of changes which are a little unclear what the motivation so I left some comments during an initial pass.

Overall the code looks really good, my comments are for existing users who might want to update as well as some performance consideration :)

@@ -117,6 +119,14 @@ extern "C" {

namespace picojson {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a thirdparty library and I am not sure changing the source code here is going to help us move away from it since it's no longer maintained

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the modifications are necessary for the tests to compile. What do you suggest ?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to modify the json library for it to work with jwt-cpp then we can't do so much string_view. jwt-cpp aims to be as agnostic as possible to json/ssl libraries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's no longer maintained, why is it a problem anyway as it should be removed from this repo ?

ASSERT_EQ("MTIz", jwt::base::trim<jwt::alphabet::base64url>("MTIz"));
ASSERT_EQ("MTIzNA", jwt::base::trim<jwt::alphabet::base64url>("MTIzNA%3d%3d"));
}
namespace jwt {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind removing these namespaces it introduces a lot of change and it more difficult to see whats different

Copy link
Contributor Author

@sjanel sjanel Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I placed using namespace jwt; instead and opened another PR.

@@ -67,7 +67,8 @@ namespace jwt {
return val.get_double();
}

static bool parse(value_type& val, string_type str) {
template<class string_t>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this not be specialized based on what the library supports?

Copy link
Contributor Author

@sjanel sjanel Aug 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

json::parse (from nlohmann) accepts a templated type. IMHO it's safe to also take any string type.

Comment on lines +67 to +68
template<class string_t>
static bool parse(json& val, const string_t& str) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be more ideal to specialize these so users do not make mistakes

https://github.com/nlohmann/json/blob/6af826d0bdb55e4b69e3ad817576745335f243ca/README.md?plain=1#L1640

There could be one for both types?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Template function solves two problems:

  • Ambiguity issues, for instance if you provide a function for std::string_view and const std::string & and you call the function with a const char * you will have a compile error
  • It allows usage of custom string. For instance, fbstring is a good drop-in replacement for std::string.

Comment on lines 503 to 508
std::string res;
if (!certbio || !keybio) {
ec = error::rsa_error::create_mem_bio_failed;
return {};
return res;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked how this impacts compiler optimizations? Last I checked the compilers are better at implementing copy elision with r-values

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having only one returned variable enables use of RVO (in the modified code). I believe that new code may be slightly more efficient, although I did not check. The impacts should be minor though, it's more a matter of code style in the end. If it's an issue for you, I can revert these modifications.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to keep the current one because it makes it obvious that an empty result is returned.

include/jwt-cpp/jwt.h Show resolved Hide resolved
include/jwt-cpp/string_types.h Outdated Show resolved Hide resolved
namespace jwt {

namespace {
base::details::padding count_padding(string_view base, std::initializer_list<std::string> fills) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be in base details with the other implementation since users would need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally base details are only for internal usage, when do you think users would need them ?

};
} // namespace helper

inline uint32_t index(const std::array<char, 64>& alphabet, char symbol) {
auto itr = std::find_if(alphabet.cbegin(), alphabet.cend(), [symbol](char c) { return c == symbol; });
if (symbol >= 'A' && symbol <= 'Z') { return static_cast<uint32_t>(symbol - 'A'); }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you checked the performance hit for all these branches?

Copy link
Contributor Author

@sjanel sjanel Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is: https://quick-bench.com/q/8u7o4DL5yBu256Rzgu_KLi2Z4JA
image

There are a lot less branches as in the original code you check all elements one by one linearly. New code is smarter.

for (const auto& fill : fills) {
if (base.size() < fill.size()) continue;
template<class StrInputIt>
padding count_padding(string_view base, StrInputIt fillStart, StrInputIt fillEnd) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please change these to snake_case to match the rest of the code base?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes sure.

@sjanel
Copy link
Contributor Author

sjanel commented Aug 11, 2023

I generally understand the motivation behind this contribution but I would generally suggest opening an issue to discuss ideas this large, tip for the future :)

There are a handful of changes which are a little unclear what the motivation so I left some comments during an initial pass.

Overall the code looks really good, my comments are for existing users who might want to update as well as some performance consideration :)

Thanks for the comments, I was quite busy these days, but I will try to come back on my PR, I did not forget it ;)
I will try to split it as well. I think I can extract some stuff in base.h first.

@sjanel sjanel force-pushed the feature/string-view-support branch 6 times, most recently from 3570344 to 478a743 Compare August 18, 2023 12:50
@sjanel sjanel force-pushed the feature/string-view-support branch from 478a743 to 5b418af Compare August 18, 2023 13:16
@@ -30,19 +35,31 @@ namespace jwt {
* base64-encoded as per [Section 4 of RFC4648](https://datatracker.ietf.org/doc/html/rfc4648#section-4)
*/
struct base64 {

#define JWT_BASE_ALPHABET \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a fan of new macros.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced conditional compiling and I thought it was a good idea to factorize some code and avoid several definitions of the same alphabet

if (itr == alphabet.cend()) { throw std::runtime_error("Invalid input: not within alphabet"); }
template<class char_it>
inline uint32_t index(char_it alphabetBeg, char_it alphabetEnd, char symbol) {
if (symbol >= 'A' && symbol <= 'Z') { return static_cast<uint32_t>(symbol - 'A'); }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep in mind, that the new code is not a direct equivalent to the original though. It relies on the fact that the first 62 chars of the alphabet are "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789" which effectively demotes the alphabet type to a 2 char field. I am not sure if this is a smart thing to do, it doesn't feel right to provide a user hook to change the whole alphabet just to ignore 96% of it. In the best case this leads to inconvenience, in the worst to bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's at least equivalent considering the alphabets it receives. Do we expect the alphabets to change ? As this file is meant to be internal the risk is close to 0 to introduce a bug here (and it's unit tested anyway).

include/jwt-cpp/base.h Show resolved Hide resolved
include/jwt-cpp/base.h Show resolved Hide resolved
}

#else
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the point of checking for JWT_HAS_STRING_VIEW if the else branch uses string_view as well ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++ v >= 17, there are no functions, only static constexpr arrays which makes the code simpler. This will allow easier code update when jwt-cpp will require a C++17 compiler as minimum version.

include/jwt-cpp/jwt.h Show resolved Hide resolved
@@ -1107,9 +1112,9 @@ namespace jwt {
* \param name Name of the algorithm
* \param siglen The bit length of the signature
*/
ecdsa(const std::string& public_key, const std::string& private_key, const std::string& public_key_password,
const std::string& private_key_password, const EVP_MD* (*md)(), std::string name, size_t siglen)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider keeping the algorithm name as a std string to avoid allocating if a string is passed in. Otherwise we convert string to string_view to string instead of directly moving the first string.

@@ -1311,9 +1320,9 @@ namespace jwt {
/// Hash generator function
const EVP_MD* (*md)();
/// algorithm's name
const std::string alg_name;
std::string alg_name;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the const ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already answered about that above

@@ -1336,9 +1345,9 @@ namespace jwt {
* to decrypt private key pem.
* \param name Name of the algorithm
*/
eddsa(const std::string& public_key, const std::string& private_key, const std::string& public_key_password,
const std::string& private_key_password, std::string name)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above regarding move

@@ -117,6 +119,14 @@ extern "C" {

namespace picojson {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need to modify the json library for it to work with jwt-cpp then we can't do so much string_view. jwt-cpp aims to be as agnostic as possible to json/ssl libraries.

@sjanel sjanel force-pushed the feature/string-view-support branch from bcda1ae to 3e77901 Compare August 23, 2023 10:34
Copy link
Contributor Author

@sjanel sjanel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Reverted changes based on std::string res as suggested in code review comments
  • alg name as std::string passed by copy then std::move

namespace jwt {

namespace {
base::details::padding count_padding(string_view base, std::initializer_list<std::string> fills) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally base details are only for internal usage, when do you think users would need them ?

Comment on lines +67 to +68
template<class string_t>
static bool parse(json& val, const string_t& str) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Template function solves two problems:

  • Ambiguity issues, for instance if you provide a function for std::string_view and const std::string & and you call the function with a const char * you will have a compile error
  • It allows usage of custom string. For instance, fbstring is a good drop-in replacement for std::string.

include/jwt-cpp/jwt.h Show resolved Hide resolved
@@ -30,19 +35,31 @@ namespace jwt {
* base64-encoded as per [Section 4 of RFC4648](https://datatracker.ietf.org/doc/html/rfc4648#section-4)
*/
struct base64 {

#define JWT_BASE_ALPHABET \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I introduced conditional compiling and I thought it was a good idea to factorize some code and avoid several definitions of the same alphabet

if (itr == alphabet.cend()) { throw std::runtime_error("Invalid input: not within alphabet"); }
template<class char_it>
inline uint32_t index(char_it alphabetBeg, char_it alphabetEnd, char symbol) {
if (symbol >= 'A' && symbol <= 'Z') { return static_cast<uint32_t>(symbol - 'A'); }
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's at least equivalent considering the alphabets it receives. Do we expect the alphabets to change ? As this file is meant to be internal the risk is close to 0 to introduce a bug here (and it's unit tested anyway).

@@ -117,6 +119,14 @@ extern "C" {

namespace picojson {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it's no longer maintained, why is it a problem anyway as it should be removed from this repo ?

}

#else
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In C++ v >= 17, there are no functions, only static constexpr arrays which makes the code simpler. This will allow easier code update when jwt-cpp will require a C++17 compiler as minimum version.

ec.clear();

auto c_str = reinterpret_cast<const unsigned char*>(cert_der_str.c_str());
auto c_str = reinterpret_cast<const unsigned char*>(cert_der_str.data());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine for me.

include/jwt-cpp/jwt.h Show resolved Hide resolved
@@ -1311,9 +1320,9 @@ namespace jwt {
/// Hash generator function
const EVP_MD* (*md)();
/// algorithm's name
const std::string alg_name;
std::string alg_name;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already answered about that above

@sjanel sjanel force-pushed the feature/string-view-support branch from 3e77901 to 1fc6924 Compare August 23, 2023 10:47
@sjanel sjanel force-pushed the feature/string-view-support branch from 1fc6924 to 8262bb8 Compare August 23, 2023 10:55
@Thalhammer
Copy link
Owner

For some reason I could not reply to a bunch of your comments, so here we go.

It's at least equivalent considering the alphabets it receives. Do we expect the alphabets to change ? As this file is meant to be internal the risk is close to 0 to introduce a bug here (and it's unit tested anyway).

Likely not, but its exposed to the public and someone might use it with a different alphabet. Its a user visible header and the base64 stuff is not in a details namespace which makes it kinda public api (even if its not advertised as one). If we potentially break peoples code, we should at least break it at the compile stage (I.e. change the types inside the alphabets and remove the - no superficial - first 62 chars) instead of silently changing behavior in edge cases.

If it's no longer maintained, why is it a problem anyway as it should be removed from this repo ?

Mostly because it being there didn't hurt so far and removing it would likely break a lot of users codebase because its the default library. Also having one in there that does not rely on a third party json library makes it way easier to use (no extra dependencies).

@sjanel
Copy link
Contributor Author

sjanel commented Oct 27, 2023

Hello @Thalhammer @prince-chrismc

I answered your comments, not sure what is preventing the PR to be merged currently.

Also, it fixes following gcc 12 compiling error:

[build] include/jwt-cpp/base.h: In static member function ‘static const std::initializer_list<std::__cxx11::basic_string<char> >& jwt::alphabet::helper::base64url_percent_encoding::fill()’:
[build] include/jwt-cpp/base.h:86:100: error: ‘(((const std::__cxx11::basic_string<char>*)(& <temporary>)) != 0)’ is not a constant expression
[build]    86 |                                         static std::initializer_list<std::string> fill{"%3D", "%3d"};

and an attempt to fix it 'easily' is more complicated than expected.

Here with this PR and string_view it's much easier.

EDIT: sorry I did not catch your last global comment. I can revert the changes in the alphabet if needed. However I don't see how to avoid picojson code changes (see my below comment)

@sjanel
Copy link
Contributor Author

sjanel commented Oct 27, 2023

Mostly because it being there didn't hurt so far and removing it would likely break a lot of users codebase because its the default library. Also having one in there that does not rely on a third party json library makes it way easier to use (no extra dependencies).

That's the reason why I fixed it instead of removing it. They are not complicated changes, and backward compatible with ancient compilers. If we cannot remove it, and cannot change it, how can we move on ?

@prince-chrismc
Copy link
Collaborator

I am sincerely sorry, I've been meaning to follow up but I am currently between jobs and trying to move countries. It might be several more weeks before I have a chance to work from my computer so I'd really appreciate your patients.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants