Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve data races found by ThreadSanitizer #121

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

eszpotanski
Copy link

This PR fixes following data races found by ThreadSanitizer when OpenSTA is run by OpenROAD-flow-scripts:

  • Accessing tags_ and tag_groups_ in sta::Search while they're being reallocated in another thread. It's not enough to delay the swap, as reading from an array is two operations on x86 (load the pointer, then load the array element). The array may get swapped out and freed in between those. Example warning:
    WARNING: ThreadSanitizer: data race (pid=2114735)
      Write of size 8 at 0x725c00000928 by thread T12 (mutexes: write M0):
        #0 sta::Search::findTagGroup(sta::TagGroupBldr*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2661 (openroad+0x10360e1)
        #1 sta::Search::setVertexArrivals(sta::Vertex*, sta::TagGroupBldr*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2683 (openroad+0x1040ee3)
        #2 sta::ArrivalVisitor::visit(sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:1172 (openroad+0x1044037)
        [...]
    
      Previous read of size 8 at 0x725c00000928 by thread T2:
        #0 sta::Search::tagGroup(sta::Vertex const*) const OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2790 (openroad+0x1046553)
        #1 sta::PathVisitor::visitEdge(sta::Pin const*, sta::Vertex*, sta::Edge*, sta::Pin const*, sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2041 (openroad+0x1046553)
        #2 sta::PathVisitor::visitFaninPaths(sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2006 (openroad+0x102bf95)
        [...]
    
    
  • Accessing and modifying bit fields from multiple threads. It's not possible to write individual bits to RAM, the best you can do is read a byte, do a bit operation on it, and write it back. Moreover, compilers can (and do) read up to 8 bytes, do bit operations on that, and write the whole 8 bytes back. If between reading and writing a field we're not even accessing (seemingly) changes, then the write-back overwrites that change. Example warning:
    WARNING: ThreadSanitizer: data race (pid=2081413)
      Read of size 3 at 0x728804e234f0 by thread T2:
        #0 sta::Vertex::level() const OpenROAD-flow-scripts/tools/OpenROAD/src/sta/include/sta/Graph.hh:286 (openroad+0xfd2b06)
        #1 sta::BfsFwdIterator::enqueueAdjacentVertices(sta::Vertex*, sta::SearchPred*, int) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Bfs.cc:367 (openroad+0xfd2b06)
        #2 sta::BfsIterator::enqueueAdjacentVertices(sta::Vertex*, sta::SearchPred*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Bfs.cc:127 (openroad+0xfd0482)
        #3 sta::ArrivalVisitor::visit(sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:1167 (openroad+0x1043fe0)
        [...]
    
      Previous write of size 6 at 0x728804e234f0 by thread T15:
        #0 sta::Vertex::setTagGroupIndex(int) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/graph/Graph.cc:1263 (openroad+0xebcd28)
        #1 sta::Search::setVertexArrivals(sta::Vertex*, sta::TagGroupBldr*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2727 (openroad+0x104100a)
        #2 sta::ArrivalVisitor::visit(sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:1172 (openroad+0x1044037)
        [...]
    
  • Indexing and inserting in an ObjectTable or ArrayTable in sta::Graph from multiple threads. Example warning:
    WARNING: ThreadSanitizer: data race (pid=2081413)
    Write of size 8 at 0x725800001310 by thread T14 (mutexes: write M0):
      #0 sta::ArrayTable<float>::pushBlock(sta::ArrayBlock<float>*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/include/sta/ArrayTable.hh:172 (openroad+0xec59e3)
      #1 sta::ArrayTable<float>::makeBlock(unsigned int) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/include/sta/ArrayTable.hh:150 (openroad+0xec145b)
      #2 sta::ArrayTable<float>::make(unsigned int, float*&, unsigned int&) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/include/sta/ArrayTable.hh:134 (openroad+0xec145b)
      [...]
    
    Previous read of size 8 at 0x725800001310 by thread T6:
      #0 sta::ArrayTable<float>::pointer(unsigned int) const OpenROAD-flow-scripts/tools/OpenROAD/src/sta/include/sta/ArrayTable.hh:201 (openroad+0xebaa51)
      #1 sta::Graph::arrivals(sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/graph/Graph.cc:589 (openroad+0xebaa51)
      #2 sta::PathVertex::arrival(sta::StaState const*) const OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/PathVertex.cc:232 (openroad+0x1002349)
      #3 sta::PathVisitor::visitFromPath(sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::PathVertex*, sta::Edge*, sta::TimingArc*, sta::Pin const*, sta::Vertex*, sta::RiseFall const*, sta::MinMax const*, sta::PathAnalysisPt const*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/Search.cc:2106 (openroad+0x10391cb)
      [...]
    
  • A race between PathGroup::savable which accesses PathEnds and PathGroup::prune() which deletes PathEnds. Example warning:
    WARNING: ThreadSanitizer: data race (pid=2898495)
      Write of size 4 at 0x72200032a4cc by thread T13 (mutexes: write M0):
        #0 sta::PathGroup::prune() OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/PathGroup.cc:155 (openroad+0xffa7b3)
        #1 sta::PathGroup::insert(sta::PathEnd*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/PathGroup.cc:128 (openroad+0xffaa97)
        #2 sta::MakePathEnds1::vertexEnd(sta::Vertex*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/PathGroup.cc:578 (openroad+0xffab62)
        [...]
    
      Previous read of size 4 at 0x72200032a4cc by thread T14:
        #0 sta::delayLessEqual(float const&, float const&, sta::StaState const*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/graph/DelayFloat.cc:120 (openroad+0xeb5dad)
        #1 sta::PathGroup::savable(sta::PathEnd*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/PathGroup.cc:104 (openroad+0xff19ea)
        #2 sta::MakePathEnds1::visitPathEnd(sta::PathEnd*, sta::PathGroup*) OpenROAD-flow-scripts/tools/OpenROAD/src/sta/search/PathGroup.cc:553 (openroad+0xff8764)
        [...]
    

These changes incur a significant slowdown (between 10% and 40%, depending on number of threads, used design or stage).

Note that these are only the races found by ThreadSanitizer, not necessarily all of them.

@CLAassistant
Copy link

CLAassistant commented Nov 7, 2024

CLA assistant check
All committers have signed the CLA.

graph/Graph.cc Outdated Show resolved Hide resolved
graph/Graph.cc Outdated
Comment on lines 1369 to 1382
unsigned char expected = bfs_in_queue_;
unsigned char desired;
do {
if ((value && IN_QUEUE(expected, index)) || (!value && !IN_QUEUE(expected, index))) {
return false;
}
desired = expected;
SET_IN_QUEUE(value ? desired : expected, index);
CLEAR_IN_QUEUE(value ? expected : desired, index);
} while (!bfs_in_queue_.compare_exchange_weak(expected, desired));
return true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this compare to just locking. I wonder if the additional memory locality and smaller memory footprint make up for the coarser lock

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Locking is slightly slower - a few benchmarks show slowdown between 0.5% and 2%

@eszpotanski eszpotanski force-pushed the eszpot/thread-safety branch 2 times, most recently from 30209e0 to 1a3f0b7 Compare November 9, 2024 08:48
@QuantamHD
Copy link
Contributor

@jjcherry56 Do these changes seem reasonable, or is there a better way to address these data races? The atomic bitset seems a bit complicated to me, but it might be worth the speed up.

Signed-off-by: Eryk Szpotanski <[email protected]>
Co-authored-by: Krzysztof Bieganski <[email protected]>
@eszpotanski
Copy link
Author

As b4be3c5 fixed part of the data races, I've updated this PR to contain fixes for remaining ones - caused by:

  • accessing and modifying bit fields,
  • indexing and inserting in an ObjectTable or ArrayTable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants