Parallelize unary tanh on cpu, generalize ADD to allow more shapes #580
+107 −48
I'm working on a project that needs these operations.

`tanh` was parallelized on the CPU in the same manner as the other unary ops.
`ADD` is generalized to accept the weaker `ggml_can_repeat` constraint instead of `ggml_can_repeat_rows`. This was done by adding two extra branches to the function: one handles the most general case and is likely quite slow; the other is optimized for my project's needs (adding MxN and 1xP tensors) and uses `ggml_vec_add1_f32`.