-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Netron output (ONNX JSON) #3717
Conversation
… into onnx_json_output
… into onnx_json_output
… into onnx_json_output
… into onnx_json_output
… into onnx_json_output
… into onnx_json_output
Add the base64 dependency to the requirements.txt file. Need to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just fix your Tidy Errors
I found a bug with the implementation because of the bit math used that doesn't work with the alterations I made. |
… into onnx_json_output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit. Complicated arithmetic shifts.
(Approved).
std::string b64_encode(const std::vector<byte>& buf) | ||
{ | ||
std::size_t len = buf.size(); | ||
std::vector<byte> res_vec((len + 2) / 3 * 4, '='); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The result should be stored as std::string
, there is no reason to prefill it with =
, you just push_back
the characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
push_back
onto strings of very large lengths can be a performance issue. Ideally the space should be just reserved, rather than filled with =
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reserve
can be called if there is a perf issue. I dont think its that important here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't use std::string as is because the type being unsigned char
is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could probably use basic_string<unsigned char>
, but that's more code changes for what looks like a marginal benefit
std::size_t pad_cond = len % 3; | ||
const size_t last = len - pad_cond; | ||
|
||
for(size_t i = 0; i < last; i += 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use the iterators directly for the loop:
for(auto it = buf.begin(); it < buf.end(); it += 3)
{
std::size_t n = to_int(it[0]) << 16u | to_int(it[1]) << 8u | to_int(it[2]);
...
}
And add the to_int
function to bit_cast it and convert it to std::size_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it would be better to make a function to encode the triplet since you repeat the same below for the "padding":
template<class Input>
std::array<char, 4> encode(Input input)
{
std::size_t n = to_int(it[0]) << 16u | to_int(it[1]) << 8u | to_int(it[2]);
return {b64_chars.at(n >> 18u), b64_chars.at(n >> 12u & 0x3Fu), b64_chars.at(n >> 6u & 0x3Fu), b64_chars.at(n & 0x3Fu) };
}
Then the loop can do:
for(auto it = buf.begin(); it < buf.end(); it += 3)
{
copy(encode(it), std::back_inserter(result));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as iterator usage is concerned, lt
or gt
comparison is not recommended against buf.end()
. It should strictly be used by an equality comparison: ==
or !=
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as iterator usage is concerned, lt or gt comparison is not recommended against buf.end()
Not recommended, by who? These are random access iterators so they support comparison operators just like pointers. The <
operator should definitely be used here since we are skipping over by increments of 3, which means it could skip past the end, and it would become an infinite loop since we would never reach the end as we already past it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it could be UB if its past buf.end()
so we probably need to do buf.end() - remaining
to avoid that.
res_vec.at(j++) = b64_chars.at(pad_cond != 0 ? n >> 10u & 0x3Fu : n >> 2u); | ||
res_vec.at(j++) = b64_chars.at(pad_cond != 0 ? n >> 4u & 0x03Fu : n << 4u & 0x3Fu); | ||
res_vec.at(j++) = pad_cond != 0 ? b64_chars.at(n << 2u & 0x3Fu) : '='; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can reuse the encode
function so it doesnt repeat the code. You can also copy the chars into an array(this way pad_cond
is not getting sublty modified):
assert(pad_cond < 3);
std::array<char, 3> triple = {0};
// Get the remaining characters to encode
std::copy(buf.end() - pad_cond, buf.end(), triple.begin());
auto e = encode(triple);
// Add the encoded characters
std::copy(e.begin(), e.begin() + 1 + pad_cond, std::back_inserter(result));
// Pad string with `=`
result.append(3 - pad_cond, '=');
I'm not inclined the rewrite the base64 encode code for a 5th time so I'm going to leave it as is for now. We can improve on it later.. |
std::string expected{"AAAA"}; | ||
std::string actual{migraphx::b64_encode({input.begin(), input.end()})}; | ||
EXPECT(expected == actual); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also the variables should be inlined: EXPECT("AAAA" == migraphx::b64_encode({'\x00', '\x00', '\x00' })
.
TODO: Add driver documentation |
--netron
option to the MIGX driver that outputs a JSON file in the ONNX format that is readable by the Netron applicationNeed to test out a couple different ways of handling base64 encoding