Changing Widths from Execution Instrumentation #114
Comments
@azidar I am a little worried about ports and nodes in sub-modules. My worries specifically:
|
One other thing I've thought about -- it'd be good if you could optimize a particular subcircuit and print out some kind of annotation list so that the circuit doesn't need to be reoptimized in a larger circuit. This is all food for thought and doesn't need to be implemented now. For demonstration purposes, running with a single module is good enough... But for long-term viability, it's important to consider sub-modules. |
You are correct with For splitting an instance, this may actually be relatively straightforward:
I'm not aware of anything that currently does this, but |
@chick I think this is instructive. I've updated the ranges branch of Chisel3DSPDependencies to all of the latest branches I'm using. If you get the chance, you should try to pull from FFTGen and run: sbt "test-only dsptools.intervals.tests.IAArithSpec" As an example:
Simple range inference seems to expect a 13-wide SInt, but then tests (exhaustive inputs, unless I screwed up) indicate only 11 bits are needed. Now I need to see what went wrong. :( But definitely, as a starting point, it'd be good to also report which nodes are inefficiently allocated. Also, stddev of 0 is... hmm...
|
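The comparison discussed above (inferred width versus the width actually needed in simulation) comes down to finding the smallest width that holds the observed minimum and maximum of a signal. A minimal sketch in plain Scala, assuming the instrumentation reports a min and a max per signal; `requiredUIntWidth` and `requiredSIntWidth` are hypothetical helper names and are not part of firrtl-interpreter or dsptools:

```scala
object RequiredWidth {
  /** Smallest UInt width that can hold every value in [0, max]. */
  def requiredUIntWidth(max: BigInt): Int = {
    require(max >= 0, "UInt values are non-negative")
    math.max(1, max.bitLength)
  }

  /** Smallest two's-complement (SInt) width that can hold every value in [min, max]. */
  def requiredSIntWidth(min: BigInt, max: BigInt): Int = {
    require(min <= max, "min must not exceed max")
    // BigInt.bitLength excludes the sign bit, so add 1 for it.
    math.max(min.bitLength, max.bitLength) + 1
  }
}
```

For example, a node only ever observed in the range [-1024, 1023] needs 11 bits by this measure, so a 13-bit inferred SInt on such a node could be reported as wider than simulation requires.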
@chick @azidar -- this is actually SUPER useful to sanity check firrtl range inference stuff, esp if you can add another column that just says UNDERCONSTRAINED or something when ti's width is larger than required from simulation. --> |
@azidar Here's my CHIRRTL. If you can make sense of why the constraint solver expects t1 to be 13 bits rather than 11, that'd be great! t1, t2 constraints should definitely match firrtl-interpreter results. t3, t4 shouldn't since there's some cancellation... I'll follow up with the range/width I expect from every intermediate node, given Interval analysis.
|
Working transform that adjusts the widths of registers, ports and wires through the annotation system. This is the second piece of the augmented tool chain that will ultimately take advantage of the firrtl interpreter's instrumentation output, adjusting widths according to the data gathered there. Part of Issue #114
Good news is my BitReducer caught it:
|
With this we now have most of the machinery for Issue #114
Current intervals-oct now has new tools
There's more to do. Not sure how to prioritize.
|
Sorry, was out today. :x So currently you
You're asking if
I think being able to switch between min/max and sigma + clip would be more interesting to me personally. If you have time tomorrow, we can Google Hangouts or something to talk more, but even with just this, I'm super excited! :D |
@chick I got rid of |
Sorry, I was out most of today too. I think IntelliJ auto generated that import for me at some point. |
Yeah no worries. I made like 2 comments on your commits? From looking at them, I understand that you have all of the pieces but haven't actually done the whole width adjusted firrtl + DspTester bit? Maybe that's the highest priority. Sigma clipper is cute and useful, but that can wait... Running into some other problems with interval assumptions, so thinking about what would be best to have... |
Yes, I will hack out something that will do the following:
Sound right? This will just be some big ugly hacked version of Dsptools Driver and iotesters Driver. |
Yeah, ok... well, hacked plumbing is better than no plumbing. I hand-calculated optimal bitwidths for my circuit and am double checking that ranges actually generated the right widths (not accounting for stuff like (b * a) - a having some cancellations -- which only interpreter can detect). |
See InstrumentingSpec:"run with with bits reduced" for example of use
New executeWithBitReduction acts like ordinary dsptools.Driver.execute
Adds in the following
- Run with interpreter bit instrumentation
- Analyzes report, creating annotations to reduce size
- Runs transform to reduce bits in low Firrtl
- Re-runs the tests re-instrumenting
Produces files
- <dut-name>.signal-bitsizes.csv
- <dut-name>.signal-bitsizes-2.csv
- <dut-name>.bit-reduced.fir
- Fixed changes to sizes in sub-module being lost
- Added warnings if annotation to change wire not used or used more than once
- Added a executeWithBitReduction to IAArithSpec, it works
- Fix spelling in createAnnotationIfAppropritate
@shunshou I just pushed a working example in |
I assume that's a firrtl bug? What does that mean? I'm having problems, for example, getting MatMul to work with DspReal. Everything looks OK, but the data isn't getting fed in. Probably something missing on my end (with my custom TestModule), but I have no idea what... |
It's my bug (or missing feature). I should have been more specific: I meant that if the bit reducer finds signals whose widths should change, but not by the same amount, and those are the same signal in different instances of the same module, the ChangeWidth transform will not work right. |
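The failure mode described above can at least be detected before the transform runs. The following is only a sketch under assumptions: `Request` and `conflicts` are hypothetical names invented for illustration, not the ChangeWidth transform's real data structures. It groups width requests by module and signal name and reports any signal that was given more than one distinct width through different instances:

```scala
object WidthConflicts {
  /** One requested width for a signal, as seen through a particular instance path (hypothetical type). */
  final case class Request(instancePath: String, module: String, signal: String, newWidth: Int)

  /** Signals (keyed by module and signal name) that were given more than one distinct width. */
  def conflicts(requests: Seq[Request]): Map[(String, String), Set[Int]] =
    requests
      .groupBy(r => (r.module, r.signal))
      .map { case (key, rs) => key -> rs.map(_.newWidth).toSet }
      .filter { case (_, widths) => widths.size > 1 }
}
```

A non-empty result means the same module would need two different shapes, so the instances either have to stay distinct (not deduplicated) or share the widest requested width.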
@chick do you think you can add a quick n sigma (or 3 sigma for now) thing to the resizer? Basically, I just realized, to be exhaustive for even a 4x4 matrix multiply, you'd need something like 8 ^ 16 (assuming 8 bit input) inputs, so I think the "best bet" is to sample some 1000 inputs or so and 3sigma the bitwidths... |
@shunshou Sure, I need to finish something up right now. Should have it in an hour or two. |
:) Still debugging DspReal bit so whenever is fine. |
@shunshou What is the prescribed behavior if you say reduce to 3σ and that number is |
So rather than taking min(mean + 3sigma, maxSeen), you'd want to start at the inputs and go to the outputs, replacing the wider bitwidth nodes i.e. say a = UInt<30> with narrower widths i.e. say a_3sigma = a.clip(interval_range_equivalent_to_8-bit) assuming a_3sigma is UInt<8>, which is 3 sigma out. Also fyi solved the DspReal problem. Chisel didn't seem to catch a binding error or the tester didn't catch that I was poking an output. Didn't delve deep enough to figure out. |
Hmm, can I do this all at one time, or do I have to identify the earliest nodes in the dependency graph and wrap their assignments with a clip of the determined size? (I think by low Firrtl there can only be one, like Highlander.) It seems like I might want to re-run the instrumentation/bit-reduction afterwards, since the clipping could change the σ of its dependents. This could take a while, if it's the right way. |
So if you don't start at the beginning, the width might actually propagate and get unnecessarily large in previous nodes before it's trimmed at a given node. Maybe the clipping behavior is better discussed in a meeting w/ Paul and co... Let's just call that a stretch goal... |
Dumb question: What happens if 3sigma is further out than the calculated range inference? |
Currently I take the min of the 3σ max and the maxSeen, and vice versa on the low end. |
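The rule described in the comment above (use mean ± n·σ, but never go wider than what was actually observed) can be sketched as follows. This is an illustration in plain Scala, not the BitReducer code; `reducedRange` and its parameter names are assumptions made for the example:

```scala
object SigmaBounds {
  /** Range to keep for one signal: mean ± multiplier·σ, never wider than what was seen. */
  def reducedRange(
      mean: Double,
      sigma: Double,
      multiplier: Double,
      minSeen: BigInt,
      maxSeen: BigInt
  ): (BigInt, BigInt) = {
    // Round outward so the kept range always covers the full n·σ interval.
    val sigmaLow  = BigDecimal(math.floor(mean - multiplier * sigma)).toBigInt
    val sigmaHigh = BigDecimal(math.ceil(mean + multiplier * sigma)).toBigInt
    // Take the more restrictive limit on each side.
    (sigmaLow.max(minSeen), sigmaHigh.min(maxSeen))
  }
}
```

With a mean of 0, σ of 10 and a multiplier of 3, a signal seen in [-100, 20] would keep the range [-30, 20]: the low side is limited by 3σ, the high side by the largest value seen.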
How many iterations are needed to converge? Also, in the end, I need an output Verilog file... After it thinks it's done, it should rerun tests with Verilator + generate Verilog. |
@chick check out sbt "test-only dsptools.toys.DCTMatMulSpec"
When using the reduction pass. (It's in the MatrixOps.scala file) Other thing: After bitwidth reduction, generate Verilog + run through Verilator. |
I'm struggling with getting the verilog to run after the firrtl's been bit reduced. |
Based on discussion with @azidar today.
|
@chick any luck? :o |
Just pushed a new version that seems to run OK with your DCT example. There seems to be some magic operating. When I added a couple of passes that Adam recommended, everything seemed to start working, even though there is one more pass that I am supposed to implement. Also, I had to disable the updating of top-level IOs because I could not get it to run with Verilator when I did that. It may yet be fixable, but I don't know the specific mechanism of the failure. BTW, it seems to not run if bins is set to zero. I'm going to look into that right now. |
Cool. Yeah, the TestModule IO's don't need to be updated. |
I have not figured out the broken problem but I have a new strategy suggested by Paul
rather than changing dog instead we inject additional code along the lines of
@shunshou @azidar Any thoughts? I think I will go ahead with this as an experiment, as I am a bit stuck at the moment figuring out what's going wrong with the SystolicMatMul. @grebe did I miss anything here? |
Why are you clipping? More specifically, what are the semantics of the bit-reducer transform when given an out-of-bound value? Is this treated as "user error" or should the circuit act gracefully by clipping? If it's the former, then you shouldn't clip (and that's what I thought we were doing). The solution I'll outline below only fixes where a trimmed wire is referenced, not where it is declared. I think both will work. However, both solutions need to correctly handle
To synchronize everyone, here is my solution. For example:
First, we trim x to 3 bits and y to 2 bits:
However, now the bits operator references non-existent bits. Thus, we need to do something more complicated, namely return x's msb since it's an SInt.
Finally, we need to trim the assignment to y:
Now we have a correct FIRRTL circuit, but it's unoptimized. Namely, we could simplify the last expression to:
However, this optimization IS NOT ALWAYS VALID. Thus, if we just jump from the first step to the last step, in some cases it will not be correct. Thus, you need to do each of these transformations in sequence, without shortcuts. The transformation steps are as follows:
|
I'd say do your solution first - if we see a positive synthesis result, then we can call it a day. Otherwise we will need to do the optimization ourselves. |
I put clip in there because I was also considering the use of n·σ bit reductions, where we know that we will be pushing in values that are bigger than the bits we have re-allocated. |
Clip will (at best) add a mux for every trimmed wire, which could theoretically cause a HUGE explosion in area costs. I'm also a lot less certain that a CAD tool can optimize around the mux. For now, I would trim with bits instead of clip. Then, if that works, we can try the n·σ bit reductions with a clip. |
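To make the distinction in the comment above concrete, here is a small software model in plain Scala of the two behaviours for a signed value squeezed into `width` bits. It models arithmetic only, not FIRRTL nodes, and the names `clip` and `truncate` are chosen for this illustration:

```scala
object ClipVsBits {
  /** Saturate `value` into the range of an SInt of `width` bits (the behaviour that costs a mux). */
  def clip(value: BigInt, width: Int): BigInt = {
    val max = (BigInt(1) << (width - 1)) - 1
    val min = -(BigInt(1) << (width - 1))
    value.max(min).min(max)
  }

  /** Keep only the low `width` bits and reinterpret them as an SInt (a plain bit select). */
  def truncate(value: BigInt, width: Int): BigInt = {
    val modulus = BigInt(1) << width
    val low = value.mod(modulus) // always in [0, 2^width)
    if (low.testBit(width - 1)) low - modulus else low
  }
}
```

For in-range values the two agree, which is why trimming with bits adds no logic when the tests never exceed the reduced range. They differ only on out-of-range values: with 4 bits, 9 clips to 7 but truncates to -7.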
What is the operation of clip in this case? When simply trimming widths so that tests pass (although some other inputs not tested might fail) you should not be injecting any more logic. If you’re doing this at the interval level and want to reassign ranges, try using the reassign interval version of clip I created (or it might’ve been wrap; same op but different constant args)
|
Not going to do clips at this time! |
@azidar
def decreaseTypeWidth(originalType: Type, delta: Int): Type = {
  originalType match {
    case SIntType(IntWidth(oldWidth)) => SIntType(IntWidth(oldWidth - delta))
    case UIntType(IntWidth(oldWidth)) => UIntType(IntWidth(oldWidth - delta))
    case other => other
  }
}

def signExtend(numberToDo: Int, firstArg: Expression, lastArg: Expression, tpe: Type): Expression = {
  if(numberToDo <= 1) {
    DoPrim(Cat, Seq(firstArg, lastArg), Seq(), tpe)
  }
  else {
    DoPrim(
      Cat,
      Seq(firstArg, signExtend(numberToDo - 1, firstArg, lastArg, tpe)),
      Seq(),
      decreaseTypeWidth(tpe, delta = numberToDo - 1)
    )
  }
}

def constructSmallerIntermediates(
  wire: Statement with IsDeclaration,
  tpe: Type,
  changeRequest: ChangeRequest
): Block = {
  val reduced = DefWire(wire.info, wire.name + "__reduced", changeTpe(tpe, changeRequest))
  val msb = DefWire(wire.info, wire.name + "__msb", UIntType(IntWidth(1)))
  val extendedSign = signExtend(
    (typeToWidth(tpe) - changeRequest.newWidth).toInt,
    WRef(msb),
    DoPrim(AsUInt, Seq(WRef(reduced)), Seq(), UIntType(IntWidth(changeRequest.newWidth))),
    tpe
  )
  Block(Seq(
    wire,
    reduced,
    msb,
    Connect(
      wire.info, WRef(msb),
      tpe match {
        case _: UIntType => UIntLiteral(BigInt(0), IntWidth(1))
        case _: SIntType => DoPrim(Head, Seq(WRef(reduced)), Seq(BigInt(1)), UIntType(IntWidth(1)))
      }
    ),
    Connect(
      wire.info, WRef(wire.name, tpe, WireKind),
      tpe match {
        case _: UIntType =>
          extendedSign
        case _: SIntType =>
          DoPrim(AsSInt, Seq(
            extendedSign
          ), Seq(), tpe)
      }
    )
  ))
}

def changeWidthsInStatement(statement: Statement): Statement = {
  val resultStatement = statement map changeWidthsInStatement map changeWidthsInExpression
  resultStatement match {
    case connect: Connect =>
      changeRequests.get(expand(connect.loc.serialize)) match {
        case Some(changeRequest) =>
          val newLoc = connect.loc match {
            case w: WRef => w.copy(name = w.name + "__reduced")
            case s => s
          }
          // logger.info(s"Changing:Connect ${register.name} new width ${changeRequest.newWidth}")
          Block(Seq(
            connect.copy(loc = newLoc)
          ))
        case _ => connect
      }
    case register: DefRegister =>
      changeRequests.get(expand(register.name)) match {
        case Some(changeRequest) =>
          constructSmallerIntermediates(register, register.tpe, changeRequest)
        // logger.info(s"Changing:DefReg ${register.name} new width ${changeRequest.newWidth}")
        case _ => register
      }
    case wire: DefWire =>
      changeRequests.get(expand(wire.name)) match {
        case Some(changeRequest) =>
          constructSmallerIntermediates(wire, wire.tpe, changeRequest)
        // logger.info(s"Changing:DefReg ${wire.name} new width ${changeRequest.newWidth}")
        case _ => wire
      }
and
    case connect: Connect =>
      changeRequests.get(expand(connect.loc.serialize)) match {
        case Some(changeRequest) =>
          val newLoc = connect.loc match {
            case w: WRef => w.copy(name = w.name + "__reduced")
            case s => s
          }
...
Am I missing the sign bit (an off-by-one error?), or can you see anything else that looks wrong? Is there a more idiomatic or elegant way of constructing the sign extension? |
Shouldn't signExtend do nothing if numberToDo == 0, not sign extend it? I'd also not worry about setting the types correctly, and just run InferTypes at the end. Also, in the following line, why are you adding 2?
|
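The off-by-one question above is easier to check on plain numbers than on FIRRTL nodes. The sketch below is a software model in Scala, not the transform itself: it treats a wire as an unsigned bit pattern and prepends copies of the msb, with zero copies leaving the pattern unchanged. The object and function names are hypothetical and chosen for illustration only:

```scala
object SignExtendModel {
  /** Concatenate `copies` copies of `msb` above the `width`-bit pattern `bits` (like nested cat). */
  def signExtend(copies: Int, msb: Boolean, bits: BigInt, width: Int): BigInt = {
    require(copies >= 0 && width > 0)
    if (copies == 0) bits // nothing to extend: return the input unchanged
    else {
      val ones = (BigInt(1) << copies) - 1 // `copies` one-bits
      val high = if (msb) ones << width else BigInt(0)
      high | bits
    }
  }

  /** Interpret a `width`-bit pattern as a two's-complement value. */
  def asSInt(bits: BigInt, width: Int): BigInt =
    if (bits.testBit(width - 1)) bits - (BigInt(1) << width) else bits
}
```

Going from a reduced width of 3 back to an original width of 8 needs exactly 8 - 3 = 5 copies of the msb: the 3-bit pattern 101 (that is, -3) becomes 11111101, which still reads as -3. As a possible alternative worth checking, the FIRRTL `pad` primitive zero-extends or sign-extends according to the operand type, which might replace the recursive cat.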
@shunshou @azidar I have not been able to construct a test yet, but I have convinced myself that the errors I get when I run Angie's tests are due to different instantiations of the same module getting different bit reductions for wires and registers within the different instances. I realized this might be a problem earlier, but I kind of forgot about it. |
@shunshou @azidar Adding NoDedup per chisel3 chiselTests/AnnotationNoDedup.scala I was able to run two of Angie's tests, including
test-only dsptools.toys.SystolicDCTMatMulSpec -- -z UNFILTERED
I'll try more soon. The changes for this are beginning to sprawl. Needs some hours of cleanup to tie everything up.
Sadly, I think a lot of my earlier code was mostly working but also getting bit by the lack of NoDedup.
We will need to run this new version through synthesis to see if this current bit-reduction technique is actually saving us anything.
Besides going through more examples, what's the next best thing to work on? |
Ok let me know when you have a stable version and I’ll push it through later tonight for me.
|
I pushed a version that the aforementioned tests run on. The DCT tests still fail but it's because of some verilog error.
That line of the verilog file is
That's probably all I am going to get done tonight, I'll get back on that horse in the morning |
@shunshou
My foot got caught in the stirrup attempting to dismount. Looks like the verilog error occurred when I reduced a signal down to 1 bit. I did a quick hack to not reduce anything to below 4 bits and the following tests ran
sbt 'test-only dsptools.toys.DCTMatMulSpec -- -z "RANDOM FILTERED"' > DCT.RF.fudge0
sbt 'test-only dsptools.toys.DCTMatMulSpec -- -z "UNFILTERED"' > DCT.UNF.fudge0
Heading off-line.
If you want to add any more tests that use submodules you have to look at test/scala/toys/SystolicMatMul.scala to see how I prevented de-duplication. |
Ok I will report back by the time you wake up hopefully.
|
@chick looks like now the non systolic tests fail?
|
Oops looks like I had the wrong setting... Sorry!
|
It monitors and bins signal values
Update node widths with Firrtl Transform Working transform that will adjust widths of registers, ports and wires through the annotation system. Second piece of the augmented tool chain that will ultimately take advantage of firrt interpreters instrumentation output. Adjusting widths according to data gathered thereby. Part of Issue #114
A little bit of cleanup
Working on bit reduction calculator
BitReducer does simple min/max to determine new bit size Can be made more advanced. Added writer to write bit changing annotations to a file With this we now have most of the machinery for Issue #114
remove tools.jar dependency
fix context vs. not inconsistencies in type classes. add support for conversion to interval.
add preliminary support for interval type class
forgot to fix some context ops in intervals
Minor changes
fix syntax errors for compilation
few minor syntax errors in intervaltype. add clip op to type classes.
change 1 << n to BigInt(1) << n
fixedprecisionchangerspec checks out
migrate dsp tools/interval tests from fft project
symlink local tests
dsptools.math was conflicting with scala.math
macros need to be enabled for chiselName. also, another scala.math fix
scala strassen winograd matrix multiplication
remove commented out typeclass stuff (to be revisited)
lots of prep work to get matrix ops working. still debugging...
minor update
save minor changes for debug
save temporary progress. too many wires...
pipelining works -- hardcoded
This creates a toolchain that unites the machinery for Issue #114 See InstrumentingSpec:"run with with bits reduced" for example of use New executeWithBitReduction acts like ordinary dsptools.Driver.execute Adds in the following - Run with interpreter bit instrumentation - Analyzes report, creating annotations to reduce size - Runs transform to reduce bits in low Firrtl - Re-runs the tests re-instrumenting Produces files - <dut-name>.signal-bitsizes.csv - <dut-name>.signal-bitsizes-2.csv - <dut-name>.bit-reduced.fir
debugged matrix lit not working -- breeze orders things differently
Some fixes for Issue #114 - Fixed changes to sizes in sub-module being lost - Added warnings if annotation to change wire not used or used more than once - Added a executeWithBitReduction to IAArithSpec, it works - Fix spelling in createAnnotationIfAppropritate
adam fixed firrtl bug -- interval 4x4 and 8x8 work now
minor changes. trying to debug dspreal
fixed dspreal bug with matrix ops. dspreal direction wasn't getting propagated properly to TestModule, and poking a non-input didn't result in compile-time failure (only observed in sims)
separate out matmul tests
changes for dct
Added bit reduction by standard deviation. Now you can specify a multiplier of the standard deviation. The bit reduction will apply to the min and the max being determined by the mean ± (multiplier * σ) if these numbers exceed the min and max seen, then the more restrictive limit will apply. More testing required
Undo accidental changes
added systolic array matmul. dct constraint bug...
updates for getting random inputs working; bit reduction errors out
In process refactoring of change widths.
filtering should work now. waiting on bit reduction
Added ToWorkingIr and InferWidths to ChangeWidthTransform Changed to not change width's of IO's, could not get verilator to run when I did that. Seems to fix Angie's problems
ChangeWidthTransform broke because DefInstance became WDefInstance because of ToWorkingIR
Fix Bit operation errors Add in a few more passes to try and fix up Bit prim ops whose args are out of syncs with bit reduced signals
A bit of style and dead code cleanup for previous commit
more benchmarks
bump numtests
change tolerance for 8-bit
changed bitwidths. interpreter stuff not saving.
start working on fir filter example. tested real/fixed conversions.
fir working; needs cleanup
working fir example
start working on clicking when limiting to n*sigma
Added some BitWidth convenience tools Added HTML BitWitdh report to BitReducer Changed default binning to 16 Added UNFILTERED to two test names to make it easier to run just that one.
minor code cleanup for fir demo
prep for interpreter tests
New strategy for reducing wires, does not actually reduce them but creates shadow reduced wire with name <old-name>__reduced and assigns to original wire with sign or zero extension <old-name>__msb joined by recursive cat operators. Added NODedupAnnotator to SystolicMatMul, this fixes blow ups due to a given wire getting two different bit reductions for different instances it appears in Right now I am not reducing IO's in submodules, doing so breaks IO naming because of the __reduced. Added UNFILTERED to two test names so I could select them easier in sbt Added some incomplete tests of things as I was tracking down bugs
Don't reduce anything to less than 4 bits. Hack fix for now for verilog complaining about something that was reduced to 1 bit
made some minor changes; looks like bit reduction still doesn't work. see help files in top-level dir
minor changes
use different multiply alg in matrixops
Auto pad mixed radix representation
This project aims to apply information found using the firrtl interpreter's run-time instrumentation of values passing through a node.
STILL TO DO: