Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New ungrouping logic #2552

Conversation

openroad-robot
Copy link
Contributor

Get area estimates from Yosys for unsynthesized modules and use those to make ungrouping decisions, replacing the existing solution of doing synthesis in two passes (first fully hierarchical to get area numbers, then with selective flattening)

.gitmodules Outdated Show resolved Hide resolved
@oharboe
Copy link
Collaborator

oharboe commented Nov 6, 2024

Not to be a party pooper, but is this now moot?

With the latest fixes to macro placement, can we find a design with autoplaced macros that is better off with SYNTH_HIERARCHICAL=1 than 0?

If macro placement is equally good with SYNTH_HIERARCHICAL=0, what does turning it on help?

Running time of synthesis?

megaboom is a happy case for SYNTH_HIERARCHICAL=0, it reduces number of instances by 50%

@povik
Copy link
Contributor

povik commented Nov 6, 2024

@maliberty please comment

@oharboe
Copy link
Collaborator

oharboe commented Nov 6, 2024

@maliberty please comment

Just to be clear: I think this is a MUCH better solution than the two phase synthesis to implement a keep/flatten policy and I think we should merge it.

I am just wondering what the problem we have today with quality of results and running time is.

@povik
Copy link
Contributor

povik commented Nov 6, 2024

Sure, my understanding from Matt is SYNTH_HIERARCHY=1 is still a very important mode for good macro placement, in addition to being useful for applying constraints. Synthesis runtime is less of a concern.

@maliberty
Copy link
Member

It enables constraints on hierarchical ports once we have that enabled in sta. It also allows for GUI visualization. As for 50% saving, I think that is the result of using a poor size setting.

@oharboe
Copy link
Collaborator

oharboe commented Nov 6, 2024

It enables constraints on hierarchical ports once we have that enabled in sta.

Silly questions:

what is this used for and is there an example in ORFS?

Also, now that flattened register names are stable(it seems to me), why is this unattainable together with flattened synthesis?

It also allows for GUI visualization.

Agreed, looks nice, but I am curious how this helps quality of results?

As for 50% saving, I think that is the result of using a poor size setting.

For now, I think I have dispelled such hopes for megaboom.

I did a MAX_UNGROUP_SIZE sweep: maximum saving and WNS only materializes when flattening everything and using macro placement from hierarchical synthesis:

The-OpenROAD-Project/megaboom#191

I think doing a fresh sweep of parameters might as well await fixes in global placement to problems that @jeffng-or identified as well as the macro placement fixes.

Even so, my hopes are not high that tinkering with MAX_UNGROUP_SIZE will be meaningful(keep hierarchical synthesis and compete on quality of results).

Possibly a manual flattening/ungroup/keep policy could do the trick. A manual flattening policy together with a manual .sdc could flatten everything that is not needed by .sdc and it would be single pass(possibly after a first pass to get some statistics). Possibly, depending on usecase, trivial to articulate by the time one has written the .sdc and as stable as the
.sdc once one works on RTL as much as one does at this stage of development.

@maliberty
Copy link
Member

what is this used for and is there an example in ORFS?

There is no example today as we don't support it. It is common to write SDC in terms of RTL ports as they are stable across synthesis and other optimizations. Think false-paths as an example. It is a regular request as proprietary tools do support this so people have SDCs requiring such.

@maliberty
Copy link
Member

For now, I think I have dispelled such hopes for megaboom.

Its only one test case and there is a lot of work on macro placement still to do. Even so you show a small amount of ungrouping can have a small effect (<50%).
image

@oharboe
Copy link
Collaborator

oharboe commented Nov 6, 2024

what is this used for and is there an example in ORFS?

There is no example today as we don't support it. It is common to write SDC in terms of RTL ports as they are stable across synthesis and other optimizations. Think false-paths as an example. It is a regular request as proprietary tools do support this so people have SDCs requiring such.

ORFS mock-cpu has(more like is) a clock crossing async fifo that could use this. the .sdc file is missing some sophistication in the articulation of the false paths for the fifo clock crossing.

Though I think it could be articulated today using flip flops and flattening instead of ports, though at the cost of having a less stable harder to articulate .sdc file(uses post synthesis names. names would vary between flattening and not, though that applies to ports of instance too).

Right?

Anyway: a specific test case in ORFS as it becomes possible will be clarifying.

@maliberty
Copy link
Member

Not all constraints are easily formulated in terms of flops (one pin could lead to an arbitrary number of flops). Requiring people to rewrite there SDC wins few friends.

@oharboe
Copy link
Collaborator

oharboe commented Nov 7, 2024

Not all constraints are easily formulated in terms of flops (one pin could lead to an arbitrary number of flops). Requiring people to rewrite there SDC wins few friends.

I see. What do you think about allowing a manual policy?

If you are writing an .sdc file, you know what modules you care to keep?

@maliberty
Copy link
Member

I think a manual policy is fine. Ideally yosys would read the sdc and automatically keep necessary modules but that is too far off for now.

@oharboe
Copy link
Collaborator

oharboe commented Nov 7, 2024

I think a manual policy is fine. Ideally yosys would read the sdc and automatically keep necessary modules but that is too far off for now.

Sounds good. I think working through a few examples in ORFS will be illuminating on the use case and that the implementation should be trivial once the use cases are clear.

@oharboe
Copy link
Collaborator

oharboe commented Nov 7, 2024

Just to be clear: the best megaboom results require synthesis three times today.

The flow below is automated in megaboom upon changes to RTL or updates to ORFS/OpenROAD/yosys:

  • synthesis twice to get best macro placement, which is written out to a file via write_macro_placement
  • flattened synthesis after which the macro placement from above is used

It seems to me that this flow should give the best quality of results.

However, the WNS is just a hair(2%) better with macro placement based on hierarchical synthesis vs flattened synthesis. 2% is probably below the the "inconclusive" threshold.

Again: once global and macro placement fixes are in, I will do a new sweep in megaboom.

The placement density of megaboom at 0.25(due to macro and global placement problems) is also so low that I am loath to conclude much. We could be looking at pathological routing based on added wirelengths alone that is mudding the picture.

Also, megaboom is documented to require retiming. without which synthesis creates pathological logic depths, which now clearly appear in WNS: combinational multiplication with three pipeline stages which is rewritten to pipelined multiplier by commercial tools in the 28nm 1000ps tapeout.

@oharboe
Copy link
Collaborator

oharboe commented Nov 18, 2024

Just FYI, macro placement on megaboom with flattened synthesis gives much worse results than hierarchical synthesis and macro placement The-OpenROAD-Project/megaboom#206 (comment)

povik added 2 commits January 6, 2025 14:00
Signed-off-by: Martin Povišer <[email protected]>
@openroad-ci openroad-ci force-pushed the secure-flow-new_ungrouping branch from b53bf39 to 6900957 Compare January 6, 2025 16:53
povik added 2 commits January 7, 2025 14:56
asap7/aes-block

| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| synth__design__instance__area__stdcell        |  2310.54 |  2293.17 | Tighten  |
| placeopt__design__instance__count__stdcell    |    12203 |    11853 | Tighten  |
| cts__design__instance__count__setup_buffer    |     1061 |     1031 | Tighten  |
| cts__design__instance__count__hold_buffer     |     1061 |     1031 | Tighten  |
| detailedroute__route__wirelength              |    98764 |    88188 | Tighten  |
| finish__timing__setup__ws                     |  -164.76 |  -198.03 | Failing  |
| finish__design__instance__area                |     7462 |     7443 | Tighten  |
| finish__timing__drv__setup_violation_count    |      531 |      515 | Tighten  |

asap7/riscv32i

| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| detailedroute__route__wirelength              |   111130 |   144993 | Failing  |
| finish__timing__setup__ws                     |  -130.12 |  -113.89 | Tighten  |
| finish__timing__drv__setup_violation_count    |      764 |      585 | Tighten  |
| finish__timing__wns_percent_delay             |   -15.61 |   -14.39 | Tighten  |

gf180/ibex

| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| placeopt__design__instance__count__stdcell    |    18156 |    18085 | Tighten  |
| cts__design__instance__count__setup_buffer    |     1579 |     1573 | Tighten  |
| cts__design__instance__count__hold_buffer     |     1579 |     1573 | Tighten  |
| globalroute__antenna_diodes_count             |       18 |        3 | Tighten  |
| detailedroute__antenna_diodes_count           |        5 |        9 | Failing  |
| finish__design__instance__area                |   936597 |   919894 | Tighten  |
| finish__timing__drv__setup_violation_count    |      794 |      786 | Tighten  |

nangate45/bp_fe_top

| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| synth__design__instance__area__stdcell        | 244461.52 | 241575.35 | Tighten  |
| placeopt__design__instance__area              |   263557 |   262272 | Tighten  |
| placeopt__design__instance__count__stdcell    |    45648 |    40888 | Tighten  |
| cts__design__instance__count__setup_buffer    |     3969 |     3556 | Tighten  |
| cts__design__instance__count__hold_buffer     |     3969 |     3556 | Tighten  |
| detailedroute__route__wirelength              |  2733057 |  2254141 | Tighten  |
| finish__timing__setup__ws                     |    -0.16 |    -0.12 | Tighten  |
| finish__design__instance__area                |   265846 |   264098 | Tighten  |
| finish__timing__drv__setup_violation_count    |     1985 |     1778 | Tighten  |
| finish__timing__drv__hold_violation_count     |      678 |     1194 | Failing  |
| finish__timing__wns_percent_delay             |   -16.09 |   -13.02 | Tighten  |

nangate45/bp_multi_top

| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| synth__design__instance__area__stdcell        | 633565.05 | 586679.15 | Tighten  |
| placeopt__design__instance__area              |   662857 |   623022 | Tighten  |
| placeopt__design__instance__count__stdcell    |   187590 |   150268 | Tighten  |
| cts__design__instance__count__setup_buffer    |    16312 |    13067 | Tighten  |
| cts__design__instance__count__hold_buffer     |    16312 |    13067 | Tighten  |
| detailedroute__route__wirelength              |  5517943 |  4895820 | Tighten  |
| finish__timing__setup__ws                     |    -2.16 |    -3.89 | Failing  |
| finish__design__instance__area                |   670565 |   630222 | Tighten  |
| finish__timing__drv__setup_violation_count    |     8156 |     6533 | Tighten  |

sky130hd/ibex

| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| globalroute__antenna_diodes_count             |       46 |       78 | Failing  |
| detailedroute__route__wirelength              |   814720 |   811317 | Tighten  |
| finish__timing__setup__ws                     |    -1.88 |    -1.21 | Tighten  |
| finish__timing__drv__setup_violation_count    |      948 |      937 | Tighten  |

Signed-off-by: Martin Povišer <[email protected]>
@povik
Copy link
Contributor

povik commented Jan 7, 2025

Metrics move

nangate45/swerv_wrapper

failing on congestion

asap7/aes-block

Metric Old New Type
synth__design__instance__area__stdcell 2310.54 2293.17 Tighten
placeopt__design__instance__count__stdcell 12203 11853 Tighten
cts__design__instance__count__setup_buffer 1061 1031 Tighten
cts__design__instance__count__hold_buffer 1061 1031 Tighten
detailedroute__route__wirelength 98764 88188 Tighten
finish__timing__setup__ws -164.76 -198.03 Failing
finish__design__instance__area 7462 7443 Tighten
finish__timing__drv__setup_violation_count 531 515 Tighten

asap7/riscv32i

Metric Old New Type
detailedroute__route__wirelength 111130 144993 Failing
finish__timing__setup__ws -130.12 -113.89 Tighten
finish__timing__drv__setup_violation_count 764 585 Tighten
finish__timing__wns_percent_delay -15.61 -14.39 Tighten

gf180/ibex

Metric Old New Type
placeopt__design__instance__count__stdcell 18156 18085 Tighten
cts__design__instance__count__setup_buffer 1579 1573 Tighten
cts__design__instance__count__hold_buffer 1579 1573 Tighten
globalroute__antenna_diodes_count 18 3 Tighten
detailedroute__antenna_diodes_count 5 9 Failing
finish__design__instance__area 936597 919894 Tighten
finish__timing__drv__setup_violation_count 794 786 Tighten

nangate45/bp_fe_top

Metric Old New Type
synth__design__instance__area__stdcell 244461.52 241575.35 Tighten
placeopt__design__instance__area 263557 262272 Tighten
placeopt__design__instance__count__stdcell 45648 40888 Tighten
cts__design__instance__count__setup_buffer 3969 3556 Tighten
cts__design__instance__count__hold_buffer 3969 3556 Tighten
detailedroute__route__wirelength 2733057 2254141 Tighten
finish__timing__setup__ws -0.16 -0.12 Tighten
finish__design__instance__area 265846 264098 Tighten
finish__timing__drv__setup_violation_count 1985 1778 Tighten
finish__timing__drv__hold_violation_count 678 1194 Failing
finish__timing__wns_percent_delay -16.09 -13.02 Tighten

nangate45/bp_multi_top

Metric Old New Type
synth__design__instance__area__stdcell 633565.05 586679.15 Tighten
placeopt__design__instance__area 662857 623022 Tighten
placeopt__design__instance__count__stdcell 187590 150268 Tighten
cts__design__instance__count__setup_buffer 16312 13067 Tighten
cts__design__instance__count__hold_buffer 16312 13067 Tighten
detailedroute__route__wirelength 5517943 4895820 Tighten
finish__timing__setup__ws -2.16 -3.89 Failing
finish__design__instance__area 670565 630222 Tighten
finish__timing__drv__setup_violation_count 8156 6533 Tighten

sky130hd/ibex

Metric Old New Type
globalroute__antenna_diodes_count 46 78 Failing
detailedroute__route__wirelength 814720 811317 Tighten
finish__timing__setup__ws -1.88 -1.21 Tighten
finish__timing__drv__setup_violation_count 948 937 Tighten

@povik povik requested review from maliberty and oharboe January 7, 2025 14:06
Copy link
Collaborator

@oharboe oharboe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@maliberty
Copy link
Member

swerv_wrapper nangate45 needs addressing

# find a reference nand2 gate
set found_cell ""
set found_cell_area ""
foreach cell [tee -q -s result.string select -list-mod =*/a:lut=4'b0111 %m] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this magic about lut? Is this looking for a nand2 as the message below suggests? A comment would be helpful for non-yosys experts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let me add it

export MAX_UNGROUP_SIZE ?= 100
export MAX_UNGROUP_SIZE ?= 1000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this due to a difference in the estimated area? Why only in asap7?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asap7 is one of those which moved the most, MAX_UNGROUP_SIZE is now in number of nand2 gates, not in platform's area units

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like it would affect all platforms. Is a doc update needed as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually asap7 and nangate45 are the only platforms with a default, and for nangate45 one area unit is somewhat close to one nand2 gate so the value is kept

povik added 2 commits January 7, 2025 21:20
| Metric                                        | Old      | New      | Type     |
| ------                                        | ---      | ---      | ----     |
| synth__design__instance__area__stdcell        | 737362.47 | 725447.06 | Tighten  |
| placeopt__design__instance__area              |   783985 |   780290 | Tighten  |
| placeopt__design__instance__count__stdcell    |   123213 |   116450 | Tighten  |
| cts__design__instance__count__setup_buffer    |    10714 |    10126 | Tighten  |
| cts__design__instance__count__hold_buffer     |    10714 |    10126 | Tighten  |
| detailedroute__route__wirelength              |  5712360 |  5355926 | Tighten  |
| finish__timing__setup__ws                     |    -1.05 |    -0.67 | Tighten  |
| finish__design__instance__area                |   790704 |   787338 | Tighten  |
| finish__timing__drv__setup_violation_count    |     5357 |     5063 | Tighten  |
| finish__timing__drv__hold_violation_count     |     1089 |      599 | Tighten  |
| finish__timing__wns_percent_delay             |    -35.9 |   -32.44 | Tighten  |

Signed-off-by: Martin Povišer <[email protected]>
Signed-off-by: Martin Povišer <[email protected]>
@povik
Copy link
Contributor

povik commented Jan 7, 2025

swerv_wrapper nangate45 needs addressing

Resolved by disabling hierarchical synthesis

Metric Old New Type
synth__design__instance__area__stdcell 737362.47 725447.06 Tighten
placeopt__design__instance__area 783985 780290 Tighten
placeopt__design__instance__count__stdcell 123213 116450 Tighten
cts__design__instance__count__setup_buffer 10714 10126 Tighten
cts__design__instance__count__hold_buffer 10714 10126 Tighten
detailedroute__route__wirelength 5712360 5355926 Tighten
finish__timing__setup__ws -1.05 -0.67 Tighten
finish__design__instance__area 790704 787338 Tighten
finish__timing__drv__setup_violation_count 5357 5063 Tighten
finish__timing__drv__hold_violation_count 1089 599 Tighten
finish__timing__wns_percent_delay -35.9 -32.44 Tighten

@povik povik requested a review from maliberty January 7, 2025 20:40
@povik
Copy link
Contributor

povik commented Jan 7, 2025

Not sure if there's a good reason to continue having a per platform default for MAX_UNGROUP_SIZE instead of a global one. I'm happy to address that in a separate PR

@maliberty
Copy link
Member

Not sure if there's a good reason to continue having a per platform default for MAX_UNGROUP_SIZE instead of a global one. I'm happy to address that in a separate PR

I don't see a good reason to have it per platform.

@povik
Copy link
Contributor

povik commented Jan 10, 2025

Gentle ping. Should I do more here?

@maliberty maliberty merged commit faaca5d into The-OpenROAD-Project:master Jan 10, 2025
6 checks passed
@maliberty maliberty deleted the secure-flow-new_ungrouping branch January 10, 2025 20:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants