Parallelize unary tanh on cpu, generalize ADD to allow more shapes #580
+107 −48
I'm working on a project that needs these operations.

`tanh` was parallelized on the CPU in the same manner as the other unary ops.
`ADD` is generalized to accept the weaker `ggml_can_repeat` constraint instead of `ggml_can_repeat_rows`. This was done by adding two extra branches to the function: one handles the most general case and is likely quite slow; the other is optimized for my project's needs (adding MxN and 1xP tensors) and uses `ggml_vec_add1_f32`.