Hadar/vecops #639

Merged
merged 47 commits into from
Nov 5, 2024
Changes from 45 commits
Commits
f651c59
initial edits
Sep 10, 2024
64d4414
vector_sum issue
Sep 13, 2024
04351fb
for Miki
Sep 16, 2024
f3086d4
debugged reduction ops
Sep 16, 2024
2ab4488
added offset/stride to reduce ops
Sep 16, 2024
89e998a
implemented strides ops
Sep 17, 2024
9aaf944
vec_ops batch added
ShanieWinitz Oct 9, 2024
1488732
vec_ops - added: config.batch, parallel transpose, tests
ShanieWinitz Oct 12, 2024
de1fcbf
vecops with batch - documentation
ShanieWinitz Oct 13, 2024
3a943a5
formating
ShanieWinitz Oct 13, 2024
a013f46
Merge branch 'main' into hadar/vecops
HadarIngonyama Oct 21, 2024
98ca917
vectorVectorOps passes
HadarIngonyama Oct 21, 2024
0c6bc9a
mont + scalars passing
HadarIngonyama Oct 22, 2024
32e262b
bitrev passes
HadarIngonyama Oct 23, 2024
e8e1799
slice passes
HadarIngonyama Oct 28, 2024
1d1f84e
slice passes
HadarIngonyama Oct 29, 2024
0c609bf
reduction passes
HadarIngonyama Oct 29, 2024
dca2e5b
fix scalar columns batch
HadarIngonyama Oct 30, 2024
0728a06
remove same scalar bool
HadarIngonyama Oct 30, 2024
2590df0
fix API
HadarIngonyama Oct 30, 2024
2fd1fac
fix API
HadarIngonyama Oct 30, 2024
1bd7c05
non zero passes
HadarIngonyama Oct 30, 2024
0016149
slice and poly_dev apis deprecated use new ones with warning
yshekel Oct 31, 2024
916618c
poly eval WIP
HadarIngonyama Oct 31, 2024
6176a79
Merge remote-tracking branch 'refs/remotes/origin/hadar/vecops' into …
HadarIngonyama Oct 31, 2024
f033bdb
poly eval passes
HadarIngonyama Oct 31, 2024
35d2e23
fix types +
HadarIngonyama Oct 31, 2024
ecc054d
tidy up
HadarIngonyama Oct 31, 2024
d9a0b5f
Merge remote-tracking branch 'origin/main' into hadar/vecops
HadarIngonyama Oct 31, 2024
9798073
formatting and spelling
HadarIngonyama Oct 31, 2024
32bd780
ntt test
HadarIngonyama Oct 31, 2024
5291608
debug eval bug
HadarIngonyama Nov 2, 2024
b7b26ec
eval bug solved
HadarIngonyama Nov 2, 2024
baf3eb2
removed vec-ops example - doesn't compile and very similar to other e…
yshekel Nov 3, 2024
2ed4369
updated poly-div test and poly-eval fix for column mode
yshekel Nov 3, 2024
b7d62c8
updated for poly div
yshekel Nov 4, 2024
b361b0f
vector div for extension field and test fix for missing ext field apis
yshekel Nov 4, 2024
fd208f4
remove wrong file
yshekel Nov 4, 2024
4de758f
revert api headers
yshekel Nov 4, 2024
c9788e9
minor cleanup
yshekel Nov 4, 2024
0562f85
Merge remote-tracking branch 'origin/main' into hadar/vecops
yshekel Nov 4, 2024
fdc7a5c
update go vec-ops config struct
yshekel Nov 4, 2024
198d196
fix C++ example
yshekel Nov 4, 2024
8f827d6
vec_ops rust binding and tests (#642)
emirsoyturk Nov 4, 2024
0c25f75
formatting rust
yshekel Nov 4, 2024
dd6833b
extension field vec ops
yshekel Nov 4, 2024
fbb9f55
release script build v3.1
yshekel Nov 4, 2024
10 changes: 8 additions & 2 deletions docs/docs/icicle/golang-bindings/vec-ops.md
Expand Up @@ -4,8 +4,8 @@

Icicle exposes a number of vector operations which a user can use:

* The VecOps API provides efficient vector operations such as addition, subtraction, and multiplication.
* MatrixTranspose API allows a user to perform a transpose on a vector representation of a matrix
* The VecOps API provides efficient vector operations such as addition, subtraction, and multiplication, supporting both single and batched operations.
* MatrixTranspose API allows a user to perform a transpose on a vector representation of a matrix, with support for batched transpositions.

Review comment (Contributor): What about the rest of the operations?


## VecOps API Documentation

Expand Down Expand Up @@ -121,6 +121,8 @@ type VecOpsConfig struct {
isBOnDevice bool
isResultOnDevice bool
IsAsync bool
batch_size int
columns_batch bool
Ext config_extension.ConfigExtensionHandler
}
```
Expand All @@ -132,6 +134,8 @@ type VecOpsConfig struct {
- **`isBOnDevice`**: Indicates if vector `b` is located on the device.
- **`isResultOnDevice`**: Specifies where the result vector should be stored (device or host memory).
- **`IsAsync`**: Controls whether the vector operation runs asynchronously.
- **`batch_size`**: Number of vectors (or operations) to process in a batch. Each vector operation will be performed independently on each batch element.
- **`columns_batch`**: `true` if the batched vectors are stored as columns in a 2D array (i.e., the vectors are strided in memory as columns of a matrix). If `false`, the batched vectors are stored contiguously in memory (e.g., as rows or in a flat array).
- **`Ext`**: Extended configuration for backend.

#### Default Configuration
Expand All @@ -148,6 +152,8 @@ This section describes the functionality of the `TransposeMatrix` function used

The function takes a matrix represented as a 1D slice and transposes it, storing the result in another 1D slice.

If `VecOpsConfig` specifies a `batch_size` greater than one, the transposition is performed on multiple matrices simultaneously, producing corresponding transposed matrices. The storage arrangement of the batched matrices is determined by the `columns_batch` field of `VecOpsConfig`.

### Function

```go
Expand Down
45 changes: 36 additions & 9 deletions docs/docs/icicle/primitives/vec_ops.md
Expand Up @@ -16,6 +16,8 @@ The `VecOpsConfig` struct is a configuration object used to specify parameters f
- **`is_b_on_device: bool`**: Indicates whether the second input vector (`b`) is already on the device. If `false`, the vector will be copied from the host to the device. This field is optional.
- **`is_result_on_device: bool`**: Indicates whether the result should be stored on the device. If `false`, the result will be transferred back to the host.
- **`is_async: bool`**: Specifies whether the vector operation should be performed asynchronously. When `true`, the operation will not block the CPU, allowing other operations to proceed concurrently. Asynchronous execution requires careful synchronization to ensure data integrity.
- **`batch_size: int`**: Number of vectors (or operations) to process in a batch. Each vector operation will be performed independently on each batch element.

Review comment (Contributor): The assumption is that all the vectors are concatenated into one vector.

- **`columns_batch: bool`**: True if the batched vectors are stored as columns in a 2D array (i.e., the vectors are strided in memory as columns of a matrix). If false, the batched vectors are stored contiguously in memory (e.g., as rows or in a flat array).
- **`ext: ConfigExtension*`**: Backend-specific extensions.
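
To make the two batch layouts concrete, here is a minimal sketch of how a flat index could be computed for element `elem` of batch vector `batch`. It assumes that `size` is the length of one vector, that with `columns_batch = false` the vectors are stored one after another, and that with `columns_batch = true` they are the columns of a row-major matrix with `size` rows and `batch_size` columns. The helper `batch_elem_index` is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a library function): flat index of element `elem`
// of batch vector `batch`, under the layout assumptions stated above.
inline uint64_t batch_elem_index(
  uint64_t batch, uint64_t elem, uint64_t size, uint64_t batch_size, bool columns_batch)
{
  assert(batch < batch_size && elem < size);
  // columns_batch == true : vectors are columns of a row-major (size x batch_size) matrix.
  // columns_batch == false: vectors are concatenated into one flat array.
  return columns_batch ? elem * batch_size + batch : batch * size + elem;
}
```

For example, with `size = 4` and `batch_size = 3`, element 2 of vector 1 would sit at index 6 in the contiguous layout and at index 7 in the column layout.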

#### Default Configuration
Expand All @@ -28,14 +30,17 @@ static VecOpsConfig default_vec_ops_config() {
false, // is_b_on_device
false, // is_result_on_device
false, // is_async
1, // batch_size
false, // columns_batch
nullptr // ext
};
return config;
}
```

### Element-wise Operations

These functions perform element-wise operations on two input vectors `a` and `b`, producing an output vector.
These functions perform element-wise operations on two input vectors `a` and `b`. If `VecOpsConfig` specifies a `batch_size` greater than one, the operations are performed on multiple pairs of vectors simultaneously, producing corresponding output vectors.
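
As an illustration of the batched semantics, the sketch below is a plain CPU reference, not the library implementation. It assumes that `size` is the length of a single vector and that both inputs and the output share the same layout, in which case a batched element-wise addition touches `size * batch_size` elements in total. The name `batched_add_reference` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name, not a library function).
// Assumes a, b and the output all use the same batch layout.
template <typename T>
std::vector<T> batched_add_reference(
  const std::vector<T>& a, const std::vector<T>& b, uint64_t size, uint64_t batch_size)
{
  const uint64_t total = size * batch_size;
  assert(a.size() == total && b.size() == total);
  std::vector<T> out(total);
  // Element-wise work is independent per element, so one flat loop covers all batches.
  for (uint64_t i = 0; i < total; ++i) out[i] = a[i] + b[i];
  return out;
}
```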

#### `vector_add`

Expand Down Expand Up @@ -90,9 +95,31 @@ template <typename T>
eIcicleError convert_montgomery(const T* input, uint64_t size, bool is_into, const VecOpsConfig& config, T* output);
```

### Reduction operations

These functions perform reduction operations on vectors. If VecOpsConfig specifies a batch_size greater than one, the operations are performed on multiple vectors simultaneously, producing corresponding output values. The storage arrangement of batched vectors is determined by the columns_batch field in the VecOpsConfig.

#### `vector_sum`

Computes the sum of all elements in each vector in a batch.

```cpp
template <typename T>
eIcicleError vector_sum(const T* vec_a, uint64_t size, const VecOpsConfig& config, T* output);
```
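
Unlike element-wise operations, a reduction has to know where each vector starts and how it is strided. The following sketch shows one plausible reading of `columns_batch` for a batched sum. It is a CPU reference under assumed layouts (contiguous vectors, or columns of a row-major `size x batch_size` matrix), and `batched_sum_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name): one sum per batch vector.
template <typename T>
std::vector<T> batched_sum_reference(
  const std::vector<T>& vec_a, uint64_t size, uint64_t batch_size, bool columns_batch)
{
  assert(vec_a.size() == size * batch_size);
  std::vector<T> out(batch_size, T{});
  for (uint64_t b = 0; b < batch_size; ++b) {
    for (uint64_t i = 0; i < size; ++i) {
      // Assumed layouts: column-strided when columns_batch is true, contiguous otherwise.
      const uint64_t idx = columns_batch ? i * batch_size + b : b * size + i;
      out[b] += vec_a[idx];
    }
  }
  return out;
}
```

With the data `{1, 2, 3, 4, 5, 6}`, `size = 3` and `batch_size = 2`, the contiguous layout sums `{1, 2, 3}` and `{4, 5, 6}`, while the column layout sums `{1, 3, 5}` and `{2, 4, 6}`.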

#### `vector_product`

Computes the product of all elements in each vector in a batch.

```cpp
template <typename T>
eIcicleError vector_product(const T* vec_a, uint64_t size, const VecOpsConfig& config, T* output);
```

### Scalar-Vector Operations

These functions apply a scalar operation to each element of a vector.
These functions apply a scalar operation to each element of a vector. If VecOpsConfig specifies a batch_size greater than one, the operations are performed on multiple vector-scalar pairs simultaneously, producing corresponding output vectors.
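
A possible reading of "vector-scalar pairs" is sketched below for multiplication. It assumes one scalar per batch vector (so `scalars` holds `batch_size` entries) and the same two layouts as elsewhere in this document. This is a CPU reference under those assumptions, and `batched_scalar_mul_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name). Assumption: scalars[b] is
// applied to every element of batch vector b.
template <typename T>
std::vector<T> batched_scalar_mul_reference(
  const std::vector<T>& scalars, const std::vector<T>& vec_b, uint64_t size, bool columns_batch)
{
  const uint64_t batch_size = scalars.size();
  assert(vec_b.size() == size * batch_size);
  std::vector<T> out(vec_b.size());
  for (uint64_t b = 0; b < batch_size; ++b) {
    for (uint64_t i = 0; i < size; ++i) {
      const uint64_t idx = columns_batch ? i * batch_size + b : b * size + i;
      out[idx] = scalars[b] * vec_b[idx];
    }
  }
  return out;
}
```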

#### `scalar_add_vec / scalar_sub_vec`

Expand Down Expand Up @@ -123,7 +150,7 @@ eIcicleError scalar_mul_vec(const T* scalar_a, const T* vec_b, uint64_t size, co

### Matrix Operations

These functions perform operations on matrices.
These functions perform operations on matrices. If VecOpsConfig specifies a batch_size greater than one, the operations are performed on multiple matrices simultaneously, producing corresponding output matrices.
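
The sketch below illustrates a batched transpose on the CPU. It assumes row-major matrices stored back to back (the `columns_batch = false` case); the column-interleaved case is not shown. `batched_transpose_reference` is a hypothetical name, not a library function.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name). Each of the batch_size
// matrices is nof_rows x nof_cols, row-major, stored one after another.
template <typename T>
std::vector<T> batched_transpose_reference(
  const std::vector<T>& mat_in, uint32_t nof_rows, uint32_t nof_cols, uint64_t batch_size)
{
  const uint64_t mat_size = static_cast<uint64_t>(nof_rows) * nof_cols;
  assert(mat_in.size() == mat_size * batch_size);
  std::vector<T> mat_out(mat_in.size());
  for (uint64_t m = 0; m < batch_size; ++m) {
    const uint64_t base = m * mat_size;
    for (uint32_t r = 0; r < nof_rows; ++r)
      for (uint32_t c = 0; c < nof_cols; ++c)
        // Element (r, c) of the input becomes element (c, r) of the output.
        mat_out[base + static_cast<uint64_t>(c) * nof_rows + r] = mat_in[base + static_cast<uint64_t>(r) * nof_cols + c];
  }
  return mat_out;
}
```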

#### `matrix_transpose`

Expand All @@ -138,7 +165,7 @@ eIcicleError matrix_transpose(const T* mat_in, uint32_t nof_rows, uint32_t nof_c

#### `bit_reverse`

Reorders the vector elements based on a bit-reversal pattern.
Reorders the vector elements based on a bit-reversal pattern. If VecOpsConfig specifies a batch_size greater than one, the operation is performed on multiple vectors simultaneously.
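
For readers unfamiliar with the pattern, here is a minimal CPU sketch of a bit-reversal permutation on a single vector. It assumes the vector length is a power of two; `bit_reverse_reference` is a hypothetical name and this is not the library implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name): element i moves to the index
// obtained by reversing the log2(n) low bits of i.
template <typename T>
std::vector<T> bit_reverse_reference(const std::vector<T>& vec_in)
{
  const uint64_t n = vec_in.size();
  assert(n != 0 && (n & (n - 1)) == 0); // power-of-two length assumed
  unsigned log_n = 0;
  while ((uint64_t{1} << log_n) < n) ++log_n;
  std::vector<T> vec_out(n);
  for (uint64_t i = 0; i < n; ++i) {
    uint64_t rev = 0;
    for (unsigned bit = 0; bit < log_n; ++bit)
      if ((i >> bit) & 1) rev |= uint64_t{1} << (log_n - 1 - bit);
    vec_out[rev] = vec_in[i];
  }
  return vec_out;
}
```

For a vector of length 8 holding its own indices, the result is `{0, 4, 2, 6, 1, 5, 3, 7}`.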

```cpp
template <typename T>
Expand All @@ -147,16 +174,16 @@ eIcicleError bit_reverse(const T* vec_in, uint64_t size, const VecOpsConfig& con

#### `slice`

Extracts a slice from a vector.
Extracts a slice from a vector. If VecOpsConfig specifies a batch_size greater than one, the operation is performed on multiple vectors simultaneously, producing corresponding output vectors.

```cpp
template <typename T>
eIcicleError slice(const T* vec_in, uint64_t offset, uint64_t stride, uint64_t size, const VecOpsConfig& config, T* vec_out);
eIcicleError slice(const T* vec_in, uint64_t offset, uint64_t stride, uint64_t size_in, uint64_t size_out, const VecOpsConfig& config, T* vec_out);
```
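
The updated signature separates the input length (`size_in`) from the number of extracted elements (`size_out`). A plausible reading, sketched below for a single vector, is `vec_out[i] = vec_in[offset + i * stride]` for `i < size_out`. This is an assumption about the semantics, and `slice_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name). size_in is taken from the
// input vector; the last accessed index must stay inside it.
template <typename T>
std::vector<T> slice_reference(
  const std::vector<T>& vec_in, uint64_t offset, uint64_t stride, uint64_t size_out)
{
  const uint64_t size_in = vec_in.size();
  if (size_out == 0) return {};
  assert(offset + (size_out - 1) * stride < size_in);
  std::vector<T> vec_out(size_out);
  for (uint64_t i = 0; i < size_out; ++i) vec_out[i] = vec_in[offset + i * stride];
  return vec_out;
}
```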

#### `highest_non_zero_idx`

Finds the highest non-zero index in a vector.
Finds the highest non-zero index in a vector. If VecOpsConfig specifies a batch_size greater than one, the operation is performed on multiple vectors simultaneously.
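
A batched version of this search can be sketched as follows. The sketch assumes contiguous batch vectors and, as an illustrative choice only, reports `-1` for a vector that is entirely zero; the library's behavior in that case is not stated here. `highest_non_zero_idx_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reference semantics only (hypothetical name): one index per batch vector,
// -1 for an all-zero vector (an assumption made for this sketch).
template <typename T>
std::vector<int64_t> highest_non_zero_idx_reference(
  const std::vector<T>& vec_in, uint64_t size, uint64_t batch_size)
{
  assert(vec_in.size() == size * batch_size);
  std::vector<int64_t> out(batch_size, -1);
  for (uint64_t b = 0; b < batch_size; ++b) {
    // Scan from the end so the first hit is the highest non-zero index.
    for (uint64_t i = size; i-- > 0;) {
      if (vec_in[b * size + i] != T{}) {
        out[b] = static_cast<int64_t>(i);
        break;
      }
    }
  }
  return out;
}
```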

```cpp
template <typename T>
Expand All @@ -165,7 +192,7 @@ eIcicleError highest_non_zero_idx(const T* vec_in, uint64_t size, const VecOpsCo

#### `polynomial_eval`

Evaluates a polynomial at given domain points.
Evaluates a polynomial at given domain points. If `VecOpsConfig` specifies a `batch_size` greater than one, the operation is performed on multiple polynomials (coefficient vectors) simultaneously.
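
As a point of reference, evaluating one polynomial at several points can be done with Horner's rule. The sketch below assumes coefficients are stored in ascending order of degree (`coeffs[0]` is the constant term) and uses ordinary integer arithmetic in place of field elements. `polynomial_eval_reference` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// Reference semantics only (hypothetical name): Horner evaluation of one
// polynomial at every domain point. Ascending coefficient order is assumed.
template <typename T>
std::vector<T> polynomial_eval_reference(const std::vector<T>& coeffs, const std::vector<T>& domain)
{
  std::vector<T> evals;
  evals.reserve(domain.size());
  for (const T& x : domain) {
    T acc = T{};
    // Walk from the highest-degree coefficient down to the constant term.
    for (auto it = coeffs.rbegin(); it != coeffs.rend(); ++it) acc = acc * x + *it;
    evals.push_back(acc);
  }
  return evals;
}
```

For `1 + 2x + 3x^2` on the points `{0, 1, 2}` this gives `{1, 6, 17}`.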

```cpp
template <typename T>
Expand All @@ -174,7 +201,7 @@ eIcicleError polynomial_eval(const T* coeffs, uint64_t coeffs_size, const T* dom

#### `polynomial_division`

Divides two polynomials.
Divides two polynomials. If `VecOpsConfig` specifies a `batch_size` greater than one, the operation is performed on multiple pairs of polynomials (coefficient vectors) simultaneously.
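
For intuition, schoolbook long division of one polynomial by another is sketched below. It uses `double` as a stand-in for field elements and assumes ascending coefficient order with a non-zero leading denominator coefficient. `DivResult` and `polynomial_division_reference` are hypothetical names, and the library may use a different algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type for this sketch: quotient q and remainder r.
struct DivResult {
  std::vector<double> q;
  std::vector<double> r;
};

// Reference semantics only: num = q * den + r with deg(r) < deg(den).
DivResult polynomial_division_reference(std::vector<double> num, const std::vector<double>& den)
{
  assert(!den.empty() && den.back() != 0.0);
  if (num.size() < den.size()) return DivResult{{}, num};
  std::vector<double> q(num.size() - den.size() + 1, 0.0);
  for (std::size_t k = q.size(); k-- > 0;) {
    // Cancel the current leading term of the running remainder.
    const double factor = num[k + den.size() - 1] / den.back();
    q[k] = factor;
    for (std::size_t j = 0; j < den.size(); ++j) num[k + j] -= factor * den[j];
  }
  num.resize(den.size() - 1); // what is left below deg(den) is the remainder
  return DivResult{q, num};
}
```

Dividing `3x^2 + 2x + 1` by `x + 1` yields the quotient `3x - 1` and the remainder `2`.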

```cpp
template <typename T>
Expand Down
5 changes: 5 additions & 0 deletions docs/docs/icicle/programmers_guide/general.md
Expand Up @@ -21,6 +21,7 @@ The configuration struct allows users to modify settings such as:

- Specifying whether inputs and outputs are on the host or device.
- Adjusting the data layout for specific optimizations.
- Setting batching parameters (batch_size and columns_batch) to perform operations on multiple data sets simultaneously.
- Passing custom options to the backend implementation through an extension mechanism, such as setting the number of CPU cores to use.

### Example (C++)
Expand All @@ -31,6 +32,8 @@ The configuration struct allows users to modify settings such as:
// Create config struct for vector add
VecOpsConfig config = default_vec_ops_config();
// optionally modify the config struct here
config.batch_size = 4; // Process 4 vector operations in a batch
config.columns_batch = true; // Batched vectors are stored as columns

// Call the API
eIcicleError err = vector_add(vec_a, vec_b, size, config, vec_res);
Expand All @@ -45,6 +48,8 @@ struct VecOpsConfig {
bool is_b_on_device; /**< True if `b` is on the device, false if it is not. Default value: false. OPTIONAL. */
bool is_result_on_device; /**< If true, the output is preserved on the device, otherwise on the host. Default value: false. */
bool is_async; /**< Whether to run the vector operations asynchronously. */
int batch_size; /**< Number of vector operations to process in a batch. Default value: 1. */
bool columns_batch; /**< True if batched vectors are stored as columns; false if stored contiguously. Default value: false. */
ConfigExtension* ext = nullptr; /**< Backend-specific extension. */
};
```
Expand Down
13 changes: 9 additions & 4 deletions docs/docs/icicle/rust-bindings/vec-ops.md
@@ -1,10 +1,10 @@
# Vector Operations API

Our vector operations API includes fundamental methods for addition, subtraction, and multiplication of vectors, with support for both host and device memory.
Our vector operations API includes fundamental methods for addition, subtraction, and multiplication of vectors, with support for both host and device memory, as well as batched operations.

## Vector Operations Configuration

The `VecOpsConfig` struct encapsulates the settings for vector operations, including device context and operation modes.
The `VecOpsConfig` struct encapsulates the settings for vector operations, including device context, operation modes, and batching parameters.

Review comment (Contributor): Remove the comma before the "and".


### `VecOpsConfig`

Expand All @@ -17,6 +17,8 @@ pub struct VecOpsConfig {
pub is_b_on_device: bool,
pub is_result_on_device: bool,
pub is_async: bool,
pub batch_size: usize,
pub columns_batch: bool,
pub ext: ConfigExtension,
}
```
Expand All @@ -28,6 +30,9 @@ pub struct VecOpsConfig {
- **`is_b_on_device: bool`**: Indicates whether the input `b` data has been preloaded in device memory. If `false`, inputs will be copied from host to device.
- **`is_result_on_device: bool`**: Indicates whether the output data should be stored in device memory. If `false`, outputs will be copied from device to host.
- **`is_async: bool`**: Specifies whether the vector operation should be performed asynchronously.
- **`batch_size: usize`**: Number of vector operations to process in a single batch. Each operation will be performed independently on each batch element.
- **`columns_batch: bool`**: `true` if the batched vectors are stored as columns in a 2D array (i.e., the vectors are strided in memory as columns of a matrix). If `false`, the batched vectors are stored contiguously in memory (e.g., as rows or in a flat array).
- **`ext: ConfigExtension`**: Extended configuration for the backend.

### Default Configuration
Expand All @@ -40,11 +45,11 @@ let cfg = VecOpsConfig::default();

## Vector Operations

Vector operations are implemented through the `VecOps` trait, providing methods for addition, subtraction, and multiplication of vectors.
Vector operations are implemented through the `VecOps` trait, providing methods for addition, subtraction, and multiplication of vectors. These methods support both single and batched operations based on the batch_size and columns_batch configurations.

### Methods

All operations are element-wise operations, and the results placed into the `result` param. These operations are not in place.
All operations are element-wise, and the results are placed into the `result` param. These operations are not in place, except for `accumulate`.

- **`add`**: Computes the element-wise sum of two vectors.
- **`accumulate`**: Adds input `b` to `a` in place.
Expand Down
17 changes: 7 additions & 10 deletions examples/c++/polynomial-multiplication/example.cpp
Expand Up @@ -69,21 +69,18 @@ int main(int argc, char** argv)
ICICLE_CHECK(bn254_ntt(polyB.get(), NTT_SIZE, NTTDir::kForward, &ntt_config, d_polyB));

// (4) multiply A,B
VecOpsConfig config{
nullptr,
true, // is_a_on_device
true, // is_b_on_device
true, // is_result_on_device
false, // is_async
nullptr // ext
};
ICICLE_CHECK(bn254_vector_mul(d_polyA, d_polyB, NTT_SIZE, &config, d_polyRes));
VecOpsConfig config = default_vec_ops_config();
config.is_a_on_device = true;
config.is_b_on_device = true;
config.is_result_on_device = true;

ICICLE_CHECK(vector_mul(d_polyA, d_polyB, NTT_SIZE, config, d_polyRes));

// (5) INTT (in place)
ntt_config.are_inputs_on_device = true;
ntt_config.are_outputs_on_device = true;
ntt_config.ordering = Ordering::kMN;
ICICLE_CHECK(bn254_ntt(d_polyRes, NTT_SIZE, NTTDir::kInverse, &ntt_config, d_polyRes));
ICICLE_CHECK(ntt(d_polyRes, NTT_SIZE, NTTDir::kInverse, ntt_config, d_polyRes));

if (print) { END_TIMER(poly_multiply, "polynomial multiplication took"); }

Expand Down