
Unexpected behavior for GGML_MEAN #1005

Open · JohannesGaessler opened this issue Nov 2, 2024 · 8 comments

Labels: good first issue (Good for newcomers)

Comments

@JohannesGaessler (Collaborator)

The usual behavior for the "mean" operation in numerical frameworks is a reduction of the tensor to a single value. In GGML, however, this operation calculates the mean per row. This is, I think, unexpected behavior, and the naming is also inconsistent with GGML_SUM (per tensor) and GGML_SUM_ROWS (per row).
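
For illustration, a minimal sketch (assuming a recent ggml checkout and the public API from ggml.h; untested) that makes the per-row behavior visible:

```c
// Minimal sketch: ggml_mean() on a 4x2 tensor yields one mean per row
// (shape [1, 2]) rather than a single scalar.
#include <stdio.h>
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // 2 rows of 4 elements each, all set to 1.0f
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 4, 2);
    ggml_set_f32(a, 1.0f);

    struct ggml_tensor * m = ggml_mean(ctx, a);

    struct ggml_cgraph * gf = ggml_new_graph(ctx);
    ggml_build_forward_expand(gf, m);
    ggml_graph_compute_with_ctx(ctx, gf, /*n_threads =*/ 1);

    // prints "mean shape: [1, 2]" - a per-row mean, not a full reduction
    printf("mean shape: [%lld, %lld]\n", (long long) m->ne[0], (long long) m->ne[1]);

    ggml_free(ctx);
    return 0;
}
```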

@JohannesGaessler added the good first issue label on Nov 2, 2024
@aksh16 commented Nov 25, 2024

Hi, is this issue still up for grabs? If it is, may I take this up?

@slaren (Collaborator) commented Nov 25, 2024

Most operations already operate on a per-row basis. If you want a full reduction, in most cases you can achieve it by creating a 1D view (at worst you might have to make it contiguous first). It might make more sense to standardize on that rather than duplicating the ops between global and per-row versions.
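
For instance (a sketch using names from ggml.h; ggml_cont() is only strictly needed when the tensor is not contiguous):

```c
// Sketch: turn the per-row ggml_mean() into a full reduction by
// flattening the tensor into a single row first.
struct ggml_tensor * full_mean(struct ggml_context * ctx, struct ggml_tensor * a) {
    struct ggml_tensor * flat =
        ggml_reshape_1d(ctx, ggml_cont(ctx, a), ggml_nelements(a));
    return ggml_mean(ctx, flat); // one row -> one mean over all elements
}
```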

@JohannesGaessler (Collaborator, Author)

I am not aware of anyone working on this at the moment. If desirable, we can talk for a few minutes so that I can explain the general structure of the code.

> Most operations already operate on a per-row basis. If you want a full reduction, in most cases you can achieve it by creating a 1D view (at worst you might have to make it contiguous first). It might make more sense to standardize on that rather than duplicating the ops between global and per-row versions.

As of right now a per-tensor mean is not used for anything, so performance would be completely irrelevant anyway. I think ggml_scale(ggml_sum(a), 1.0f/ggml_nelements(a)) would be a simpler workaround than calculating a per-row mean, though. The minimal solution would, I think, be to just rename the op, since the confusing name is in my opinion the biggest issue.
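
Written out with the API's explicit context argument, that workaround might look like this (a sketch, assuming the current float-argument ggml_scale() signature):

```c
// Sketch of the suggested workaround: ggml_sum() reduces the whole
// tensor to a single F32 value; scaling by 1/n then gives the mean.
struct ggml_tensor * mean_all(struct ggml_context * ctx, struct ggml_tensor * a) {
    return ggml_scale(ctx, ggml_sum(ctx, a), 1.0f/ggml_nelements(a));
}
```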

@slaren (Collaborator) commented Nov 25, 2024

It might make more sense to remove ggml_sum entirely and keep only ggml_sum_rows (and rename it to ggml_sum), and change other operations that operate on all the elements to operate per-row. Most operations already operate on rows, so why should sum or mean be different? And again, if you want to operate on all the elements, you can in most cases make a 1D view.

@JohannesGaessler (Collaborator, Author)

I would also be fine with having only a single op for summation and defining reductive GGML ops to be per-row by default (but we should then document this, since the default in e.g. NumPy is per tensor). As far as I can tell, the ops currently doing per-tensor reductions are SUM, COUNT_EQUAL, and CROSS_ENTROPY_LOSS. SUM and COUNT_EQUAL can be removed/converted to per-row logic without issue, but I think CROSS_ENTROPY_LOSS cannot, because for that operation the shape of the input matters.

@aksh16 commented Nov 27, 2024

So, these could be the tasks for this issue?

  • rename ggml_sum_rows to ggml_sum
  • remove the current ggml_sum
  • update count_equal to per-row logic (see the sketch after this list)
  • update documentation to reflect these changes (the per-row ops)
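
To make the count_equal item concrete, here is a hypothetical plain-C reference for the per-row semantics (illustration only; the helper name and layout are assumptions, not the actual ggml implementation):

```c
// Hypothetical reference for per-row count_equal semantics: for two
// tensors with n_rows rows of row_len elements each, produce one count
// per row instead of a single per-tensor count.
#include <stdint.h>

static void count_equal_rows(const int32_t * a, const int32_t * b,
                             int64_t n_rows, int64_t row_len,
                             int64_t * out /* one count per row */) {
    for (int64_t r = 0; r < n_rows; r++) {
        int64_t count = 0;
        for (int64_t c = 0; c < row_len; c++) {
            count += a[r*row_len + c] == b[r*row_len + c];
        }
        out[r] = count;
    }
}
```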

@JohannesGaessler (Collaborator, Author)

That seems to cover it. There are C and CUDA implementations for COUNT_EQUAL that will need to be adjusted. If you are unfamiliar with CUDA I can help with that part.

@aksh16 commented Nov 27, 2024

I have played around a little bit with CUDA, but I might reach out if I feel stuck.
