Make PDL be a C library with a Perl interface so it can be used from other C (or dynamic language) code #358

Open
2 of 10 tasks
mohawk2 opened this issue Dec 31, 2021 · 5 comments

Comments

@mohawk2
Member

mohawk2 commented Dec 31, 2021

Tasks:

  • segregate the Perl-using C code (mostly pdlcore.c) completely away from the "PDL API" code, separating PDL_CORE_LIST into PDL_PERL_LIST and PDL_API_LIST. For compatibility reasons the Perl-facing header will need to remain called "pdlcore.h", so the .c may as well keep the same name, and pdlperl.h should probably be merged back in; the C API parts would need to move to pdl.h
  • make macros that handle either allocating or using the default arrays such as def_dims
  • separate out a PDL_Value union type for the PDL_Anyval.value
  • use that to add a pdl.value entry for ndarrays whose data segment fits within the size of that union (which for small types could be several elements), obviously including scalar/single-value ndarrays, using the alloc/default macro
  • switch PDL to just using realloc rather than pdl_grow/pdl_makescratchhash
  • have a pdl_impl_vtable * pdl.impl pointer, and void * pdl.impl_data that would replace the current datasv, sv, hdrsv (probably with a struct for the Perl implementation to use), implementing a Perl version
  • segregate remaining croak (etc)-using code into that vtable (especially pp_indterm)
  • that vtable might also want memory-management entries (malloc, realloc, free) unless the "switch to realloc" item renders it moot
  • eliminate the whole pdl.hdrsv from the C code, make the Perl object routinely be a hashref with a PDL and a hdr element so any copying could be done at Perl level
  • detection of POSIX threads could be de-Perl-ised
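
A minimal C sketch of how the value-union, inline-storage and implementation-vtable tasks above might fit together. Every name here (`sketch_value`, `sketch_pdl`, `sketch_impl_vtable`, and the functions) is hypothetical and is not PDL's actual definition; the 16-byte inline size is an arbitrary assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a PDL_Value union: one slot per element type,
 * plus a byte view so several small elements can share the storage. */
typedef union {
    int8_t  b;
    int32_t l;
    int64_t q;
    float   f;
    double  d;
    unsigned char bytes[16];   /* assumed inline capacity */
} sketch_value;

struct sketch_pdl;

/* Hypothetical implementation vtable: the core calls these instead of
 * calling Perl (croak, SV handling) directly. */
typedef struct {
    void (*error)(struct sketch_pdl *p, const char *msg);
    void (*destroy)(struct sketch_pdl *p);
} sketch_impl_vtable;

typedef struct sketch_pdl {
    void *data;                      /* points at value or at heap memory */
    size_t nbytes;
    sketch_value value;              /* inline storage for small ndarrays */
    const sketch_impl_vtable *impl;  /* Perl, plain C, ... */
    void *impl_data;                 /* would replace datasv/sv/hdrsv */
} sketch_pdl;

/* Use the inline storage when the data fits, otherwise the heap. */
static int sketch_alloc_data(sketch_pdl *p, size_t nbytes) {
    assert(p != NULL);
    if (nbytes <= sizeof p->value) {
        p->data = &p->value;
    } else {
        p->data = malloc(nbytes);
        if (p->data == NULL) return -1;
    }
    p->nbytes = nbytes;
    return 0;
}

/* Only heap-backed data is handed to free(). */
static void sketch_free_data(sketch_pdl *p) {
    assert(p != NULL);
    if (p->data != (void *)&p->value) free(p->data);
    p->data = NULL;
    p->nbytes = 0;
}

/* Error reporting goes through the vtable, so the core never croaks. */
static void sketch_error(sketch_pdl *p, const char *msg) {
    if (p->impl != NULL && p->impl->error != NULL) p->impl->error(p, msg);
}

/* A plain-C implementation: remember the last message via impl_data. */
static void sketch_c_error(sketch_pdl *p, const char *msg) {
    *(const char **)p->impl_data = msg;
}

static const sketch_impl_vtable sketch_c_impl = { sketch_c_error, NULL };
```

A Perl implementation of the same vtable would presumably wrap croak in its `error` entry and keep its SVs in a struct behind `impl_data`, leaving the core free of Perl headers.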

This is somewhat connected to the #349 ideas on making a broadcastloop vtable, but only somewhat.

@zmughal
Member

zmughal commented Oct 18, 2022

PDL should also allow for custom allocators so that they can be used for situations where different allocators can be more efficient or integrate better with an existing external library.

For example, aligned memory (e.g., through C11's aligned_alloc, Windows-specific _aligned_malloc) can give a significant performance boost because it allows for using particular SSE/AVX instructions[*].

As discussed in IRC, I would like to be able to set this at runtime per-ndarray (internal C interface).

A related question is how to provide a high-level way to indicate that the output of operations between ndarrays (whether aligned-alloc-N-bytes ndarrays or regular-malloc ndarrays) should go into a PDL that uses a specific allocator. My goal is that an aligned-alloc-N-bytes ndarray would output to another aligned-alloc-N-bytes ndarray.

[*] Taking advantage of SSE/AVX for particular PDL ops is also something to look into and is likely a whole big project on its own.
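
The per-ndarray allocator idea in this comment could look roughly like the following C sketch. It is only an illustration under assumptions: `sketch_allocator`, `sketch_ndarray` and the functions are hypothetical names, not part of PDL, and the output-inherits-allocator rule in `sketch_ndarray_like` is one possible policy rather than an agreed design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(_WIN32)
#include <malloc.h>
#endif

/* Hypothetical allocator interface, chosen at runtime per ndarray. */
typedef struct {
    void *(*alloc)(size_t nbytes, void *ctx);
    void  (*release)(void *ptr, void *ctx);
    void  *ctx;                /* allocator state; here, the alignment */
} sketch_allocator;

static void *sketch_plain_alloc(size_t nbytes, void *ctx) {
    (void)ctx;
    return malloc(nbytes);
}

static void sketch_plain_release(void *ptr, void *ctx) {
    (void)ctx;
    free(ptr);
}

static void *sketch_aligned_alloc(size_t nbytes, void *ctx) {
    size_t align = *(const size_t *)ctx;
    assert(align != 0 && (align & (align - 1)) == 0);  /* power of two */
#if defined(_WIN32)
    return _aligned_malloc(nbytes, align);
#else
    /* C11 aligned_alloc expects a size that is a multiple of the alignment. */
    size_t rounded = (nbytes + align - 1) / align * align;
    return aligned_alloc(align, rounded);
#endif
}

static void sketch_aligned_release(void *ptr, void *ctx) {
    (void)ctx;
#if defined(_WIN32)
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

typedef struct {
    void *data;
    size_t nbytes;
    const sketch_allocator *allocator;  /* remembered so free matches alloc */
} sketch_ndarray;

static int sketch_ndarray_init(sketch_ndarray *nd, size_t nbytes,
                               const sketch_allocator *a) {
    nd->data = a->alloc(nbytes, a->ctx);
    if (nd->data == NULL) return -1;
    nd->nbytes = nbytes;
    nd->allocator = a;
    return 0;
}

/* Assumed policy: an output ndarray inherits the allocator of its input. */
static int sketch_ndarray_like(sketch_ndarray *out, const sketch_ndarray *in) {
    return sketch_ndarray_init(out, in->nbytes, in->allocator);
}

static void sketch_ndarray_free(sketch_ndarray *nd) {
    nd->allocator->release(nd->data, nd->allocator->ctx);
    nd->data = NULL;
}
```

Storing the allocator pointer in each ndarray is what lets a 32-byte-aligned input produce a 32-byte-aligned output, and it also keeps `_aligned_malloc` memory from being passed to plain `free` on Windows.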

@mohawk2
Member Author

mohawk2 commented Oct 18, 2022

Would a way to get close(?) to this with current PDL be to make a PDL subclass (adjusting your suggested name to PDL::Aligned) that allocates memory with a suitable alignment and sets PDL_DONTTOUCHDATA? An ndarray of the appropriate size could then be constructed using current code and passed as the output ndarray(s) of given operations.

An alternative approach might be to use MALLOCDBG in the PDL config so that all memory is allocated with some alignment, or to find some other means of globally setting the allocate/free functions.
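
As a rough illustration of the "globally setting allocate/free" option, here is a C sketch with process-wide hooks. It does not describe what MALLOCDBG actually does; all names are hypothetical, and the aligned example uses C11 `aligned_alloc`, so it assumes a non-Windows C library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef void *(*sketch_alloc_fn)(size_t nbytes);
typedef void  (*sketch_free_fn)(void *ptr);

/* Process-wide hooks, defaulting to the C library. */
static sketch_alloc_fn sketch_global_alloc = malloc;
static sketch_free_fn  sketch_global_free  = free;

/* Swap both hooks together so alloc and free always match. */
static void sketch_set_allocator(sketch_alloc_fn a, sketch_free_fn f) {
    assert(a != NULL && f != NULL);
    sketch_global_alloc = a;
    sketch_global_free  = f;
}

/* Example replacement: 64-byte aligned blocks, with a live-block counter. */
static size_t sketch_live_blocks = 0;

static void *sketch_aligned64_alloc(size_t nbytes) {
    size_t rounded = (nbytes + 63) / 64 * 64;  /* multiple of the alignment */
    void *p = aligned_alloc(64, rounded);
    if (p != NULL) sketch_live_blocks++;
    return p;
}

static void sketch_aligned64_free(void *p) {
    if (p != NULL) {
        sketch_live_blocks--;
        free(p);
    }
}
```

A global hook is simpler than a per-ndarray allocator, but the hooks must be set before any memory is allocated, since a block obtained from one allocator must not be released by another.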

@mohawk2
Member Author

mohawk2 commented Oct 18, 2022

SSE/AVX utilisation might be better captured on #349. Notes should include pointers (ha!) on how to do so from the C level.

@chrisarg

chrisarg commented Dec 7, 2024

There are other possibilities for the allocators e.g. mimalloc that may be of interest.

@chrisarg

chrisarg commented Dec 7, 2024

The goals of Task::MemManager are somewhat related, though it does the allocation in a fashion agnostic to the eventual use. One of its examples uses mkndarray to provide a view of the buffer through PDL (so perhaps it will not generate AVX code?).
