Performance with System.Numerics and Multithreading #12
Comments
Compatibility issues: System.Numerics.Vector3 leverages SIMD (Single Instruction, Multiple Data) instructions for optimized vector operations, which can be very fast for certain workloads. However, SIMD performance varies significantly with the data structures and operations involved. For example, Vector3 may actually be slower for small vectors or simple scalar operations. Furthermore, SIMD extensions are not supported on all hardware, so acceleration may be unavailable in certain environments. |
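One way to check the environment concern above is to ask the runtime whether hardware acceleration is available. The following is only a minimal sketch: `SimdCheck` and `ScalarDot` are hypothetical names introduced here for comparison, while `Vector.IsHardwareAccelerated` and `Vector3.Dot` are standard System.Numerics members.

```csharp
using System;
using System.Numerics;

public static class SimdCheck
{
    // Hypothetical scalar reference implementation, written out for comparison only.
    public static float ScalarDot(float ax, float ay, float az, float bx, float by, float bz)
        => ax * bx + ay * by + az * bz;

    // Reports whether the JIT can hardware-accelerate Vector<T> operations on this machine.
    public static bool IsAccelerated() => Vector.IsHardwareAccelerated;

    // Same dot product through System.Numerics; the JIT may or may not emit SIMD for it.
    public static float NumericsDot(Vector3 a, Vector3 b) => Vector3.Dot(a, b);
}
```

Logging `SimdCheck.IsAccelerated()` at startup and benchmarking both paths on the target hardware is probably more reliable than assuming either one is faster.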
I couldn't find any discussion of the downsides. I can imagine a very few cases where Vector3 would be slower, but overall I thought it would be worth it. |
Could you please provide your environment? |
I edited my comment in case it was misinterpreted. |
I've added a new branch with a version that uses SIMD. While it needs testing on various architectures, it has already increased performance by over 50% on my current laptop. I'll continue with more R&D, and once I'm confident it's safe, I'll plan to merge it into the main branch. |
Thanks, great news. |
Hello folks! Any updates on the SIMD support? We are using it on the server side for an unannounced MMO and results are good with this port. However, we would like to indeed leverage SIMD on this. On that subject - are Thanks! |
|
Thanks for the reply @ikpil We only use Is that enough to protect the Write operations but Read from any thread without "lock"? Again, thanks for the great work on this port! |
I haven't tested this, but it should give you the general idea:

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
// Also import the DotRecast namespaces that provide DtCrowd, DtCrowdAgent,
// DtCrowdAgentParams, and RcVec3f.

public class DtCrowdManager
{
    private readonly DtCrowd _crowd;
    private readonly ConcurrentQueue<Action> _requests;

    public DtCrowdManager(DtCrowd crowd)
    {
        _crowd = crowd;
        _requests = new ConcurrentQueue<Action>();
    }

    // Callers should only read from the returned DtCrowdAgent.
    // AddAsync is thread-safe: it only enqueues the request.
    public Task<DtCrowdAgent> AddAsync(RcVec3f pos, DtCrowdAgentParams option)
    {
        var tcs = new TaskCompletionSource<DtCrowdAgent>(TaskCreationOptions.RunContinuationsAsynchronously);
        _requests.Enqueue(() =>
        {
            // Runs later on the update thread, inside Update().
            var ag = _crowd.AddAgent(pos, option);
            tcs.SetResult(ag);
        });
        return tcs.Task;
    }

    // RemoveAsync is thread-safe: it only enqueues the request.
    public Task<bool> RemoveAsync(DtCrowdAgent ag)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        _requests.Enqueue(() =>
        {
            _crowd.RemoveAgent(ag);
            tcs.SetResult(true); // ...
        });
        return tcs.Task;
    }

    // Must be called from a single thread only.
    public void Update(float dt)
    {
        // Apply all pending add/remove requests before stepping the crowd.
        while (_requests.TryDequeue(out var action))
        {
            action.Invoke();
        }

        _crowd.Update(dt, null);
    }
}
|
We use DotRecast in a server which is based on Microsoft Orleans. Locks in that context are extremely harmful. So I guess in our case the |
Any update on this? I'm not very concerned about the performance difference between vector types, but having to convert all of my vectors anytime I interact with this library makes it a lot more painful than necessary. Most game- and graphics-related libraries are using System.Numerics at this point, and it can't be overstated how much more convenient things are when everything is consistent. |
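Until the types are unified, one way to reduce the conversion friction described above is to keep the conversions in a single pair of extension methods at the library boundary. This is only a sketch: `LibVec3` is a hypothetical stand-in for a library-specific vector type such as RcVec3f, and its field names are assumptions rather than the library's actual definition.

```csharp
using System.Numerics;

// Hypothetical stand-in for a library-specific vector type such as RcVec3f.
// The field names are assumptions made for this sketch.
public readonly struct LibVec3
{
    public readonly float X, Y, Z;

    public LibVec3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class VectorInterop
{
    // Library vector -> System.Numerics vector.
    public static Vector3 ToNumerics(this LibVec3 v) => new Vector3(v.X, v.Y, v.Z);

    // System.Numerics vector -> library vector.
    public static LibVec3 ToLib(this Vector3 v) => new LibVec3(v.X, v.Y, v.Z);
}
```

With helpers like these, call sites read as `position.ToLib()` or `result.ToNumerics()`, so a later switch to a single vector type would mostly mean deleting those calls.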
In conclusion, I plan to make the change around November 2024 when dotnet 6 support ends. The issue here is a crash occurring due to memory corruption during SIMD operations in a specific environment. Here's how I tested it:
So, I switched to using RcVec3f. Here are similar reported issues: |
The work to make the change is already completed, and I'm concerned that if we switch now, there might still be people using early versions of dotnet 6 who could experience crashes. 😓 |
Thank you for maintaining a project that is kept so well up to date; I have learned a great deal from it. On our server, entities (agents) are managed through space partitioning (a quadtree or sectors). The update tick is called for each partitioned space, and the agents in it are updated accordingly. In the DtCrowd code and the code you provided, all agents are processed through GetActiveAgents within DtCrowd. If agents are partitioned by space, what would be the best approach to handle this? Is it structurally feasible to override DtCrowd's Update or GetActiveAgents so that only the target agents are handled? Additionally, I found another post of yours on the Unity Forum. The first approach is to implement it directly.
It seems your approach would be to implement it directly, without DtCrowd. Is that right? You also mentioned partitioning for multiple threads. Could you explain both approaches in more detail? |
Could you provide more details on what it means for agents to be partitioned by space? |
To make myself clear, let me explain the situation in more detail (I am writing in Korean and translating).
A Field containing a single NavMesh is spatially partitioned into sectors at a fixed interval (10 m), and each entity belongs to a sector based on its position, moving between sectors as its position changes. The position of an entity is the same as the position of the CrowdAgent it holds a reference to. The server schedules the sectors' update functions so that sectors far enough apart for their entities not to influence each other are called at the same time from multi-threaded tasks. For example, with sectors numbered 1 to 100, 10 thread tasks call, within the same tick, the update functions of sectors that do not affect one another. Currently, DtCrowd runs its logic in a single update tick on one DtCrowd instance (which includes the list of agents). In a situation like mine, where an update tick is needed for each sector (agents outside each other's influence range in the same tick), it seems impossible to split the update of agents in DtCrowd, so I am asking for advice. I was wondering whether many of DtCrowd's features (steering, collision), which handle agent interactions naturally, can be used in a multi-threaded environment. If simple multi-threaded access to DtCrowd on a single DtNavMesh is difficult, I might need to make the following choices:
The ideas above are only what I have come up with so far.
|
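The sector scheduling described above (updating, in the same tick, only sectors that cannot influence each other) could be sketched with a simple four-phase coloring of the sector grid. Everything here is hypothetical: `Sector`, `SectorScheduler`, and `PhaseOf` are names invented for this sketch, it assumes an agent's influence radius is smaller than one sector width, and it does not by itself make DtCrowd thread-safe.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical sector type for this sketch; not part of DotRecast.
public sealed class Sector
{
    public int X { get; }
    public int Y { get; }
    public int UpdateCount { get; private set; }

    public Sector(int x, int y)
    {
        X = x;
        Y = y;
    }

    // Placeholder for per-sector agent logic (steering, collision, ...).
    public void Update(float dt) => UpdateCount++;
}

public static class SectorScheduler
{
    // Phase index in [0, 3], derived from the parity of the grid coordinates.
    // Two sectors with the same phase are never neighbors, not even diagonally.
    public static int PhaseOf(Sector s) => (s.X & 1) | ((s.Y & 1) << 1);

    public static void UpdateAll(IReadOnlyList<Sector> sectors, float dt)
    {
        // Phases run one after another; sectors inside a phase run in parallel.
        foreach (var phase in sectors.GroupBy(s => PhaseOf(s)).OrderBy(g => g.Key))
        {
            Parallel.ForEach(phase, s => s.Update(dt));
        }
    }
}
```

The trade-off is four sequential phases per tick instead of one fully parallel pass. Sectors in the same phase are always separated by at least one full sector, which is what keeps their agents out of each other's range under the stated assumption.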
The DtCrowd example shows how many features are interconnected; it does not consider thread-safety. There are many ways to develop multi-threaded crowd processing, but the methods I know do not deviate significantly from what @kaoraswoo mentioned. Overall, there are two main approaches: agent splitting and functionality splitting. In my experience, 1-1 and 1-4 are the easiest in terms of development convenience and productivity, and 2-1 requires further research. I hope this was helpful, even though it is incomplete. |
Another thing I had done which improves performance and readability was converting float collections to Vector3 collections. |
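As an illustration of the float-to-Vector3 idea above (not necessarily what the commenter did), a flat float buffer can be viewed as Vector3 values without copying by using `MemoryMarshal.Cast`. `VertexViews` and `AsVector3` are hypothetical names for this sketch, and the buffer is assumed to be laid out as consecutive x, y, z triples.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

public static class VertexViews
{
    // Reinterprets a flat [x0, y0, z0, x1, y1, z1, ...] buffer as Vector3 values.
    // No data is copied: writes through the span change the original array.
    public static Span<Vector3> AsVector3(float[] flat)
    {
        if (flat.Length % 3 != 0)
            throw new ArgumentException("Length must be a multiple of 3.", nameof(flat));

        return MemoryMarshal.Cast<float, Vector3>(flat.AsSpan());
    }
}
```

Calling `VertexViews.AsVector3(verts).ToArray()` produces an independent `Vector3[]` copy when a separate collection is preferred over a view.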
First of all, thanks for the port.
I refactored it to use Vector3 instead of RcVec3f, and DtCrowd performance was about 4x faster.
Parallelism also improved a lot.
Why is RcVec3f being used instead?