<h1>tri2b and quad4me: clockless arbitrated bit-level serial protocols</h1>
<p><strong>tri2b</strong> and <strong>quad4me</strong> are two clockless, bit-level serial communications protocols that support arbitration between simultaneous data senders, without requiring any timing or CPU performance guarantees from their software implementations or hardware platforms.</p>
<h2>Contents <a name="contents"></a></h2>
<ul>
<li><a href="#no_warranty">No Warranty</a></li>
<li><a href="#description">Description</a>
<ul>
<li><a href="#features">Features</a></li>
<li><a href="#drawbacks">Drawbacks</a></li>
<li><a href="#hardware_requirements">Hardware requirements</a></li>
</ul>
</li>
<li><a href="#motivation_and_background">Motivation and Background</a>
<ul>
<li><a href="#why_not_bit_bang_i2c">Why not bit-bang I2C?</a></li>
<li><a href="#what_about_CAN_bus">What about CAN bus?</a></li>
<li><a href="#why_the_silly_names">Why the silly names?</a></li>
</ul>
</li>
<li><a href="#the_protocols">The Protocols</a>
<ul>
<li><a href="#two_level_state_machine">Two-level state machine</a>
<ul>
<li><a href="#states">States</a></li>
<li><a href="#phases">Phases</a></li>
</ul>
</li>
<li><a href="#tri2b_protocol">tri2b protocol</a></li>
<li><a href="#quad4me_protocol">quad4me protocol</a></li>
<li><a href="#edge_vs_level_based">Edge- vs level-based</a></li>
<li><a href="#arbitration_phase">Arbitration phase</a></li>
<li><a href="#almost_timing_free">(Almost) Timing-free</a>
<ul>
<li><a href="#one_line_held_low_at_idle">One line held low at idle</a></li>
<li><a href="#rise_time">Rise time</a></li>
</ul>
</li>
<li><a href="#the_failed_promise_of_hardware_swapover">The failed promise of hardware swapover</a></li>
<li><a href="#to_interrupt_or_not_to_interrupt">To interrupt or not to interrupt</a></li>
<li><a href="#tri2b_or_quad4me_which_one">tri2b or quad4me – which one?</a></li>
</ul>
</li>
<li><a href="#the_example_implementations_and_testbed">The Example Implementations and Testbed</a>
<ul>
<li><a href="#using_porting_the_code">Using/porting the code</a></li>
<li><a href="#the_code_itself">The code itself</a>
<ul>
<li><a href="#c_coders">C coders</a></li>
<li><a href="#c_plus_plus_coders">C++ coders</a></li>
<li><a href="#assembly_coders">Assembly coders</a></li>
<li><a href="#python_coders">Python coders</a></li>
<li><a href="#other_coders">Other coders</a></li>
</ul>
</li>
<li><a href="#the_example_build_system">The example build system</a></li>
<li><a href="#prerequisites_dependencies">Prerequisites/dependencies</a></li>
<li><a href="#repository_directories_and_files">Repository directories and files</a>
<ul>
<li><a href="#derived_class_methods">Base class methods implemented in derived classes</a></li>
</ul>
</li>
<li><a href="#the_testbed">The testbed</a></li>
<li><a href="#build_variants">Build variants</a>
<ul>
<li><a href="#protocol_variants">Protocol variants</a></li>
<li><a href="#testbed_variants">Testbed variants</a></li>
<li><a href="#debug_variants">Debug variants</a></li>
</ul>
</li>
<li><a href="#gdb_test_environment">GDB test environment</a></li>
<li><a href="#enhancements_improvements">Example implementation enhancements/improvements</a></li>
</ul>
</li>
<li><a href="#RFIIE">Requests For Improvements, Information, and Enhancements (RFIIE)</a>
<ul>
<li><a href="#RFIIE_generic_request">Generic Request</a></li>
<li><a href="#RFIIE_non_request"><strong>Non-</strong> Request</a>
<ul>
<li><a href="#RFIIE_code_formatting">Code Formatting</a></li>
</ul>
</li>
<li><a href="#specific_requests">Specific Requests</a></li>
</ul>
</li>
</ul>
<h2>No Warranty <a name="no_warranty"></a></h2>
<p>This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.</p>
<p>This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.</p>
<p>You should have received a copy of the <a href="LICENSE.txt">GNU General Public License</a> along with this program. If not, see <a href="https://www.gnu.org/licenses/gpl.html">https://www.gnu.org/licenses/gpl.html</a>.</p>
<h2>Description <a name="description"></a></h2>
<h4>Features <a name="features"></a></h4>
<p><strong>tri2b</strong> and <strong>quad4me</strong> support the following capabilities:</p>
<ul>
<li><p>multiple <em>nodes</em> (see <a href="#nodes_vs_masters">below</a>) on a single, multi-wire serial bus</p></li>
<li><p>all nodes receive all messages sent by any (all) other nodes</p></li>
<li><p>at bus idle, any number of nodes can simultaneously attempt to send a message</p></li>
<li><p>arbitration, controlled by each sending node’s priority, determines which of several simultaneously-sent messages acquires the bus (to the exclusion of all others)</p></li>
<li><p>optional: node priorities can be dynamic, changing over time via an algorithm (e.g. least-recently-used)</p></li>
<li><p>the protocols are clockless: there is no pre-determined bit-clock rate</p></li>
<li><p>throughput is determined by node CPU and I/O speed, limited by open-drain hardware line rise time (see <a href="#rise_time">Rise time</a>)</p></li>
<li><p>there are (almost) no requirements on CPU timing/speed – all nodes wait for all others to respond to protocol state transitions (see <a href="#almost_timing_free">(Almost) timing-free</a>)</p></li>
</ul>
<h4>Drawbacks <a name="drawbacks"></a></h4>
<p><strong>tri2b</strong> and <strong>quad4me</strong> have several significant drawbacks compared to other serial communications protocols:</p>
<ul>
<li><p>“bit-banged” in software; no hardware support (see <a href="#RFIIE_hardware_support">RFIIE: Hardware Support</a> and <a href="#the_failed_promise_of_hardware_swapover">hardware swapover</a>).</p></li>
<li><p>require more hardware lines (3 for tri2b, 4 for quad4me) than other protocols (e.g. I2C). See <a href="#RFIIE_fewer_lines">RFIIE: Fewer lines</a>.</p></li>
<li><p>one of the open-drain hardware lines must be held low when the bus is idle (no communications taking place). See <a href="#RFIIE_no_line_low_at_idle">RFIIE: No line low at idle</a>.</p></li>
<li><p>“feature” <a name="any_node_hangs"></a> of no timing requirements is also a drawback: any node can “hang” the bus/protocol indefinitely by not/slowly responding. Like I2C “clock stretching” but worse – the protocols are “stretched” by default.</p></li>
<li><p>tri2b requires reliable detection of rising edges on the hardware lines. See <a href="#tri2b_or_quad4me_which_one">tri2b or quad4me – which one?</a></p></li>
<li><p>did I mention the need for three (or four) lines instead of two? ;)</p></li>
</ul>
<h4>Hardware requirements <a name="hardware_requirements"></a></h4>
<ul>
<li><p>Three (tri2b) or four (quad4me) open-drain GPIO ports</p></li>
<li><p>GPIO ports must support writing their output (low, or high/high-Z) while simultaneously reading the input state of the attached line, independently of the output setting.</p></li>
<li><p>tri2b: <a name="edge_detection_requirement"></a> Two of the GPIO ports must support hardware detection of rising edges on their communication lines. See <a href="#edge_vs_level_based">Edge- vs level-based</a>, below.</p></li>
<li><p>Both tri2b and quad4me (if interrupt-driven, see <a href="#to_interrupt_or_not_to_interrupt">To interrupt or not to interrupt</a>, below): GPIO ports must support dynamically enabling/disabling interrupts on falling edges, independently of, and without conflicting with, the rising-edge detection requirement of tri2b.</p></li>
</ul>
<h2>Motivation and Background <a name="motivation_and_background"></a></h2>
<p>(Material in this section is not necessary to understand the protocols or their implementations – skip ahead to <a href="#the_protocols">The Protocols</a>. On the other hand, a quick skim wouldn’t hurt.)</p>
<p>What? You’re still here. OK, you asked for it …</p>
<p>When I began designing the hardware project for which tri2b and quad4me were eventually developed, I quickly realized it required the simultaneous sending, arbitration, dynamic prioritization, and all-nodes-receive-all-messages capabilities which they now provide.</p>
<p>Those needs were all driven by the system’s requirement for low latency <a name="latency"></a> above all else – even data throughput, although throughput obviously figures into the total start-to-finish latency.</p>
<p>Being somewhat new to hardware development, I knew little about I2C and SPI other than their existence. My thought at the time was, “My requirements must be fairly standard. This has to be a solved problem in the industry. I’ll just use whatever off-the-shelf technology fits best. That part of the project, at least, will be easy.”</p>
<p>How wrong I was.</p>
<p>I quickly rejected SPI due to its requirement for individual select lines, one for each node, in addition to its clock and data lines. But while reading about I2C, I immediately came across, and was encouraged by:</p>
<blockquote><p><em>NXP Semiconductors UM10204 I2C-bus specification and user manual Rev. 6, 4 April 2014</em> (<a href="#i2c_standard">1</a>)</p>
<p>2. I2C-bus features <em>(page 3 of UM10204)</em></p>
<ul>
<li>It is a true multi-master bus including collision detection and arbitration to prevent data corruption if two or more masters simultaneously initiate data transfer.</li>
</ul>
</blockquote>
<p>and:</p>
<blockquote><p>3.1.8 Arbitration <em>(page 11 of UM10204)</em></p>
<p>Arbitration, like synchronization, refers to a portion of the protocol required only if more than one master is used in the system. Slaves are not involved in the arbitration procedure. A master may start a transfer only if the bus is free. Two masters may generate a START condition within the minimum hold time (t<sub>HD;STA</sub>) of the START condition which results in a valid START condition on the bus. Arbitration is then required to determine which master will complete its transmission.</p>
<p>Arbitration proceeds bit by bit. During every bit, while SCL is HIGH, each master checks to see if the SDA level matches what it has sent. This process may take many bits. Two masters can actually complete an entire transaction without error, as long as the transmissions are identical. The first time a master tries to send a HIGH, but detects that the SDA level is LOW, the master knows that it has lost the arbitration and turns off its SDA output driver. The other master goes on to complete its transaction.</p>
<p>No information is lost during the arbitration process. A master that loses the arbitration can generate clock pulses until the end of the byte in which it loses the arbitration and must restart its transaction when the bus is free. If a master also incorporates a slave function and it loses arbitration during the addressing stage, it is possible that the winning master is trying to address it. The losing master must therefore switch over immediately to its slave mode.</p></blockquote>
<p><a name="i2c_standard"></a> <em>(1) It is my understanding that Philips developed I2C, and NXP acquired Philips, so I treat this document as somewhat of an official standard. Please correct me if I’m wrong.</em></p>
<p>Sounded great. Just what I needed.</p>
<p>Unfortunately …</p>
<p>After attempting to implement this on several MCUs for more weeks than I care to admit (let’s not say “months”, okay? no mention of “months”), I was never able to get “I2C multi-master” mode to work reliably. I won’t name the chips and their manufacturers except to say “NXP LPC812”, “NXP LPC824”, “STM32L031F4P6”, and “STM32F103xx”. See <a href="#RFIIE_low_end_arm_mcu_with_working_multi_master_i2c_peripheral_and_library">RFIIE: Low-end ARM MCU with working multi-master I2C peripheral and library</a>.</p>
<p>To be fair, there were some warning signs. For example, <em>ST RM0008 Reference manual, August 2017</em>, despite claiming on page 752, <em>26.2 I2C main features</em>:</p>
<blockquote><ul>
<li>Multimaster capability: the same interface can act as Master or Slave</li>
</ul>
</blockquote>
<p>has on page 766, <em>26.3.4 Error conditions, Arbitration lost (ARLO):</em></p>
<blockquote><ul>
<li>the I2C Interface goes automatically back to slave mode (the MSL bit is cleared). When the I2C loses the arbitration, it is not able to acknowledge its slave address in the same transfer, but it can acknowledge it after a repeated Start from the winning master.</li>
</ul>
</blockquote>
<p>I’m still willing to believe my failure to get this working was due to my inability to decipher the (miserable excuses that pass for) documentation and/or reverse-engineer the chips. But after weeks of effort (not “months”, nobody said anything about “months”) I broke down and asked two friends, who between them have almost 60 years of embedded programming experience and are among the smartest people I’ve ever met. Both said essentially the same thing, which was: “I’ve never heard of anyone using I2C multi-master mode. Given all the bugs that are in these kinds of chips, I’m not surprised that it doesn’t work.”</p>
<p>Thanks a lot. Could have saved me, uhh, “weeks”, of beating my head against the wall/chips. Again in the interest of full disclosure: when I started the project, I told one of these friends, “I’m not afraid of embedded development: I’ve had to code against some of the worst, buggy, un- and mis-documented software APIs ever written.” He replied, “The embedded world is worse!”</p>
<p>I didn’t believe him at the time.</p>
<p>I was wrong.</p>
<p><a name="required_i2c_features"></a>
Also admittedly, my requirements push I2C multi-master to its limits. The “all nodes receive all messages” and “arbitration is based on node priority” requirements mean that either:</p>
<ol>
<li><p>All nodes send to, and receive on, the I2C “general call” address, with my scheme’s priority in the next, post-address, data byte. This means the arbitration loss and switch-over to receiver (“slave”) mode has to take place not on the address byte but on a later one, which is probably even farther down the rabbit hole of I2C edge-case features. Or …</p></li>
<li><p>The chip/peripheral needs to support multiple receive addresses. This way the I2C address could be used as my scheme’s priority. Some chips implement this, but in limited ways that make them unsuited for my purposes. For example, STM32F103xx supports two addresses – not enough. NXP’s LPC824 supports four addresses, again not enough, but one of them can be generalized with a mask of “don’t care” bits. (Other chips have this and/or a min-max range of addresses.) That would work, but at least the LPC824 (and others I’ve looked at) only indicate <em>that</em> one of the masked/range of addresses matched and data is being received – not <em>which</em> address, so again they are useless for my needs.</p></li>
</ol>
<p>Finally … why not use I2C the way it was intended to be used? One master, multiple slaves, master polls slaves (setting the “read” bit in the address byte) and slaves respond by sending data using the “slave send to master” protocol. One word: <a href="#latency">Latency</a> (see above). I have lots of nodes and can’t afford the latency of round-robin polling each in sequence just to get data from one.</p>
<h4>Why not bit-bang I2C? <a name="why_not_bit_bang_i2c"></a></h4>
<p>So … if I2C multi-master is great (design) but fails (implementation), and I’m going to have to write software bit-banging code anyway, why not bit-bang the well-known, well-tested I2C protocol (gaining compatibility with any hardware where it does work) instead of “rolling” my own?</p>
<p>Good question. Here’s the hopefully good answer …</p>
<p>Again looking at <em>NXP Semiconductors UM10204 I2C-bus specification and user manual Rev. 6, 4 April 2014</em>, this time <em>page 11, Section 3.1.7 Clock synchronization, Fig 7. Clock synchronization during the arbitration procedure</em>.</p>
<p>Without endlessly quoting from the document (read it – it’s good stuff), clock synchronization is the first part of the arbitration process. When one or more “masters” lower the SCL line to initiate a transaction, all other masters which wish to compete in arbitration detect this and lower their outputs to the SCL line to match.</p>
<p>When each master’s SCL LOW time period expires it raises its SCL output. Due to the “wired-AND” nature of the open-drain line, the line doesn’t go high until all have raised their outputs. When each master sees the line go high, it begins timing its SCL HIGH period, and, again, when this expires it lowers its output again.</p>
<p>In this way, the LOW period lasts from the first master to go low until the last to go high, and the HIGH period from the last to go high until the first to go low. This achieves the clock synchronization, and logical arbitration based on address bit values is layered on top of this lower-level clock timing.</p>
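<p>The same synchronization rule can be written as a tiny Python sketch. It is illustrative only: the function name <code>synchronized_scl</code> is hypothetical, and it assumes every master pulls SCL low at the same instant. Because the line is wired-AND, the bus LOW period is set by the master with the longest LOW period, and the bus HIGH period by the master with the shortest HIGH period.</p>
<pre><code class="language-python">def synchronized_scl(periods):
    """Return (low, high) durations of one synchronized SCL clock cycle.

    periods: list of (low_time, high_time) tuples, one per master.
    Hypothetical helper; assumes all masters pull SCL low at the same instant.
    """
    # Wired-AND: SCL stays low until the LAST master releases it, so the
    # bus LOW period equals the longest individual LOW period.
    bus_low = max(low for low, _ in periods)
    # Every master starts timing its HIGH period when SCL actually rises;
    # the FIRST one to finish pulls SCL low again for the whole bus.
    bus_high = min(high for _, high in periods)
    return bus_low, bus_high


# Two masters with different clock settings (arbitrary time units):
print(synchronized_scl([(4.7, 4.0), (6.0, 3.0)]))  # prints (6.0, 3.0)
</code></pre>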
<p>This works well in hardware, both by virtue of logic speeds (much faster than software) and because the MCU’s I2C peripheral subsystem is doing nothing except watching for and executing the protocol.</p>
<p>But consider trying to emulate it in bit-banging software. One or more nodes may be “busy” when the SCL line first goes low, either because they are executing some other code in a polling loop or because of interrupt latency (raw latency, another interrupt executing at higher priority, etc.). That/those node(s) might respond to the SCL falling edge late, <em>after</em> other, non-slow masters allow the SCL line to go high. The slow nodes could then lower their SCL outputs on the next (or later) clock cycle without knowing they’re late, and put their data bits on the I2C SDA line at the wrong time.</p>
<p>This is the “timing/performance requirement” that tri2b and quad4me were designed to eliminate. Nodes (“masters”) can respond to protocol changes on the hardware lines as quickly or slowly as they choose, and the protocols remain in synchronization. See <a href="#the_protocols">The Protocols</a>.</p>
<h4>What about CAN bus? <a name="what_about_CAN_bus"></a></h4>
<p>Basically another good question. This section is an attempt to answer it proactively.</p>
<p>CAN bus – in principle – would solve all my problems. In practice there are reasons why it doesn’t. In roughly more-important-to-less order:</p>
<ol>
<li><p>CAN bus is very poorly supported, especially on low-end, low-pin-count chips. The last time I checked, DigiKey listed only one TSSOP-20 chip with CAN. I’ve heard this may be due to restrictive licensing issues (hooray for open source, hint, hint).</p></li>
<li><p>CAN bus is electrical overkill for my application. I need to place approximately 10 to 20 nodes on a 12-to-24 inch long bus. See <a href="#RFIIE_gpio_drive_capability">RFIIE: GPIO drive capability</a>. CAN bus, with its twisted pair lines and balanced drivers is designed for tens of nodes over tens of meters distance.</p></li>
<li><p>CAN bus (nominally) requires external balanced-line driver chips. I’m aware of the driver-less diode “hack” but don’t know how well it works.</p></li>
<li><p>CAN bus' software protocol is overkill for my needs (error correction, “mailboxes”, etc).</p></li>
<li><p>CAN bus comes in “variants”, including STM’s “bxCAN”. Given my experiences with I2C multi-master implementation failures, this fact does not fill me with confidence regarding the inter-operability of the implementations.</p></li>
</ol>
<p>But again, I’m open to suggestions. See <a href="#RFIIE_low_end_arm_mcu_with_working_inter_operable_can_bus_and_library">RFIIE: Low-End ARM MCU with working inter-operable CANbus (and library)</a>.</p>
<p>Also any other existing protocols/algorithms. See <a href="#RFIIE_alternative_existing_protocol_algorithm">RFIIE: Alternative, existing protocol/algorithm</a>.</p>
<p>Finally, if this was Stack Overflow, the <em>second</em> answer to everything I’ve written in this section would be, “Why doan u use CANbus, u mow-rhan? Ain’t I smart 2 be the 1st wun to post dis?” (The first would be a response to the <a href="#features">“Features”</a> section, above – the canonical “Why would you want to do that?”) C'mon, GitHub. Get with the program!</p>
<h4>Why the silly names? <a name="why_the_silly_names"></a></h4>
<p>The original name for the three-line protocol was “tri2c”. Stupid pun off some other serial protocol I’d heard of. But, you know … lawyers and all that. So I came up with “tri2b”, as in it “tries to be” a workable protocol.</p>
<p>Due to some development snafus that I don’t care to describe (eventually traced to a certain GCC-ARM compiler optimizing out calls to inline functions despite those functions accessing declared-volatile registers) (days, not weeks – nobody said anything about “weeks”) I was for a time convinced that the edge based approach (see <a href="#edge_vs_level_based">Edge- vs level-based</a>) of tri2b was flawed by design. In desperation I switched development to a new four-line, level-based protocol and named it “quad4me” because it was the fallback solution “for me”.</p>
<p>I have long thought that when it comes to software libraries, the “cuter” the name, the lower the quality. Your mileage may vary.</p>
<p>Or not.</p>
<h2>The Protocols <a name="the_protocols"></a></h2>
<p>Both protocols, tri2b and quad4me, are software state machines sequenced by multiple open-drain hardware lines which, in combination, comprise a serial communication bus.</p>
<p>Multiple <em>nodes</em> (<a href="#nodes_vs_masters">1</a>) – running a protocol state machine on an MCU chip – connect to the bus. Each node connects to each of the bus lines via open-drain GPIO ports.</p>
<p><em>Messages</em>, consisting of a fixed number of <em>arbitration</em> bits, a fixed number of <em>metadata</em> bits, and a variable number of <em>data</em> bits, are sent from any node to all other nodes simultaneously. If two or more nodes attempt to send at the same time, the arbitration bits determine which one gains control of the bus.</p>
<p>The bus has one data line and two or more “handshake” lines, at least one of which is held low by all of the nodes at each state machine state. State machine transitions occur when <strong>all</strong> of the nodes raise their outputs to the line high (high impedance). This is the “wired-AND” logic of open-drain lines.</p>
<p>That’s it. The whole thing. Implementation is left as an exercise for the reader. Should only take a few hours.</p>
<p>What? You’re still here? Alright … I’ll provide some more details …</p>
<p><a name="nodes_vs_masters"></a> <em>(1) “nodes”==“masters” in I2C/SPI nomenclature, but given that tri2b and quad4me have no “slaves” – all nodes are equal/hierarchy-less – and because that whole “master/slave” thing is so pre-13th Amendment – I use the term “node” instead.</em></p>
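<p>The wired-AND rule that drives every State transition can be captured in a minimal sketch. This is a hypothetical Python model (the class name <code>OpenDrainLine</code> is mine, not part of the example implementations): each node either pulls a line low or releases it, and the pull-up resistor makes the line read high only once every node has released it.</p>
<pre><code class="language-python">class OpenDrainLine:
    """Hypothetical model of one open-drain bus line shared by several nodes."""

    def __init__(self, num_nodes, initial=1):
        # Per-node output: 0 = pulling the line low, 1 = released (high-Z).
        self.outputs = [initial] * num_nodes

    def write(self, node, level):
        self.outputs[node] = level

    def read(self):
        # Wired-AND: a single node pulling low holds the whole line low.
        return int(all(self.outputs))


# A handshake line held low by three nodes goes high only when the last one releases it.
ltch = OpenDrainLine(3, initial=0)
ltch.write(0, 1)
ltch.write(1, 1)
print(ltch.read())  # prints 0 (node 2 is still holding the line low)
ltch.write(2, 1)
print(ltch.read())  # prints 1 (all nodes released: the transition can occur)
</code></pre>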
<p><a name="two_level_state_machine"></a></p>
<h3>Two-level state machine</h3>
<p>(And I thought I could get away without documenting all this. Oh, well.)</p>
<p>The state machines actually consist of two levels:</p>
<ol>
<li><p><a name="states"></a> A lower-level set of states, per-line-transition, called simply “States”. The states are READ, WRIT (“write”), and in the case of quad4me, NEXT. Each bit of a message requires a transition through each of these states in sequence.</p></li>
<li><p><a name="phases"></a> A higher-level set of states called “Phases”. Multiple bits, each communicated via the READ/WRIT/(NEXT) States, make up the phases: IDLE, ARBT (arbitration), META (metadata), and DATA (data). The IDLE phase has zero bits, ARBT and META a fixed number, and DATA a variable number (possibly zero) specified by the META bits (see <a href="#meta2bits">meta2bits()</a>, below).</p></li>
</ol>
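<p>One way to picture the two levels is the following Python sketch. The Phase and State names come from the list above; the per-bit State order is read off the tri2b and quad4me timing diagrams in the next two sections. The <code>message_steps()</code> helper is hypothetical, and the bit counts in the usage lines are arbitrary placeholders, not the values used by the example implementations.</p>
<pre><code class="language-python">from enum import Enum


class Phase(Enum):
    IDLE = 0  # zero bits; bus quiescent
    ARBT = 1  # fixed number of arbitration (priority) bits
    META = 2  # fixed number of metadata bits
    DATA = 3  # variable number of data bits (possibly zero)


class State(Enum):
    WRIT = 0
    READ = 1
    NEXT = 2  # quad4me only


# Per-bit State sequences, as read off the timing diagrams.
TRI2B_BIT_STATES = (State.WRIT, State.READ)
QUAD4ME_BIT_STATES = (State.WRIT, State.READ, State.NEXT)


def message_steps(arbt_bits, meta_bits, data_bits, bit_states):
    """List every (Phase, State) step of one message (hypothetical helper)."""
    steps = []
    for phase, nbits in ((Phase.ARBT, arbt_bits),
                         (Phase.META, meta_bits),
                         (Phase.DATA, data_bits)):
        for _ in range(nbits):
            steps.extend((phase, state) for state in bit_states)
    return steps


# Placeholder sizes: 4 arbitration, 4 metadata, 8 data bits.
print(len(message_steps(4, 4, 8, TRI2B_BIT_STATES)))    # prints 32
print(len(message_steps(4, 4, 8, QUAD4ME_BIT_STATES)))  # prints 48
</code></pre>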
<h3>tri2b protocol <a name="tri2b_protocol"></a></h3>
<p>tri2b requires one DATA line, and two handshake lines which I’ve labeled more-or-less arbitrarily ALRT (“alert”) and LTCH (“latch”).</p>
<p>The electrical/logical high and low states of the lines, which in turn drive the <a href="#states">State</a> transitions, look like this:</p>
<p><a name="tri2b_timing_diagram"></a></p>
<pre><code> DATA :X::?::X::?::X::?::X::?::X::?::X::?::X::?::X::?:::::
W W W W W W W W W
ALRT ~~\___/~\___/~\___/~\___/~\___/~\___/~\___/~\___/~~~
R R R R R R R R
LTCH ___/~\___/~\___/~\___/~\___/~\___/~\___/~\___/~\____
Legend:
: data line, either high or low
X data line, data bit written: high->high, high->low, low->high, or low->low
? data line, data bit read
~ handshake line high
_ handshake line low
/ handshake line, low->high
\ handshake line, high->low
R READ State
W WRIT State
</code></pre>
<p>A message starts when one or more nodes which have data to send place the first of their arbitration (priority) bits on their DATA ports, then lower their ALRT ports, and finally raise their LTCH ports. When LTCH goes high (wired-AND), all (sending) nodes have placed their data and the DATA line is ready to be read.</p>
<p>When all other nodes detect a falling edge on their ALRT ports, they likewise set their DATA output. If they have data to send and have detected the falling edge (or have been interrupt-triggered by it, see <a href="#to_interrupt_or_not_to_interrupt">To interrupt or not to interrupt</a>, below) before they have initiated the message sequence themselves, they place their first arbitration bit. If they don’t have data to send, they raise their DATA output high (see <a href="#non_competing_nodes">“non-competing nodes”</a>, below). They then lower their ALRT output, and raise their LTCH output.</p>
<p><strong>This is the “clockless” / “no timing requirements” basis of the protocols</strong>. No state transition into READ state can take place until <strong>all</strong> nodes have raised their LTCH ports due to the open-drain, wired-AND nature of the lines.</p>
<p>Every node then reads the input bit on its DATA port, lowers its LTCH port, and finally raises its ALRT port. The ALRT line going high is a signal that all nodes have read the data line, and all can transition to the next WRIT State.</p>
<p>This per-bit sequence continues through the <a href="#arbitration_phase">arbitration</a> phase, and similarly through the metadata and data phases. The only difference is that during the metadata and data phases only one node (the arbitration winner) is placing data bits on the data line – all others leave their DATA ports high so as not to interfere. (See <a href="#the_failed_promise_of_hardware_swapover">The failed promise of hardware swapover</a>, below.)</p>
<p>Note that the exact order of the above sequences of events – the raises and lowers of the handshake lines – is absolutely critical. See <a href="#almost_timing_free">(Almost) Timing-free</a>, below.</p>
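<p>To check that this ordering really is independent of node speed, the following Python sketch simulates one tri2b message on a bus of several nodes that take turns in random order (standing in for arbitrary CPU speed and interrupt latency). It is a model under stated assumptions, not the example implementation: a single pre-chosen sender replaces arbitration, hardware edge detection is modelled as a latched per-node flag, and the message start is simplified by having every node begin in WRIT State instead of waiting for the ALRT falling edge. The name <code>simulate_tri2b</code> is hypothetical.</p>
<pre><code class="language-python">import random


def simulate_tri2b(message, num_nodes, sender=0, seed=None):
    """Simulate one tri2b message; return (bits read by each node, final ALRT/LTCH levels)."""
    rng = random.Random(seed)
    # Open-drain outputs per line and node: 1 = released (high-Z), 0 = pulled low.
    # Idle bus: ALRT high, LTCH held low by every node.
    outputs = {"DATA": [1] * num_nodes,
               "ALRT": [1] * num_nodes,
               "LTCH": [0] * num_nodes}
    level = {line: int(all(outs)) for line, outs in outputs.items()}
    # Latched rising-edge flags, one per node, as edge-detect hardware would keep them.
    rose = {"ALRT": [False] * num_nodes, "LTCH": [False] * num_nodes}
    state = ["WRIT"] * num_nodes
    received = [[] for _ in range(num_nodes)]

    def write(line, node, value):
        outputs[line][node] = value
        new_level = int(all(outputs[line]))  # wired-AND
        if line in rose and new_level == 1 and level[line] == 0:
            for n in range(num_nodes):       # every node's detector latches the edge
                rose[line][n] = True
        level[line] = new_level

    def step(node):
        bit = len(received[node])
        if bit == len(message):
            return                           # this node has finished the message
        if state[node] == "WRIT":
            # Bit 0 starts at once (simplified start); later bits wait for ALRT to rise.
            if bit and not rose["ALRT"][node]:
                return
            rose["ALRT"][node] = False
            write("DATA", node, message[bit] if node == sender else 1)
            write("ALRT", node, 0)           # lower ALRT first ...
            write("LTCH", node, 1)           # ... then raise LTCH
            state[node] = "READ"
        elif rose["LTCH"][node]:             # READ: wait for LTCH rising edge
            rose["LTCH"][node] = False
            received[node].append(level["DATA"])
            write("LTCH", node, 0)           # lower LTCH first ...
            write("ALRT", node, 1)           # ... then raise ALRT
            state[node] = "WRIT"

    while any(len(r) != len(message) for r in received):
        step(rng.randrange(num_nodes))       # nodes run in arbitrary order
    return received, {"ALRT": level["ALRT"], "LTCH": level["LTCH"]}


msg = [1, 0, 1, 1, 0, 0, 1, 0]
bits, idle = simulate_tri2b(msg, num_nodes=5, sender=3, seed=42)
print(all(b == msg for b in bits), idle)  # prints True {'ALRT': 1, 'LTCH': 0}
</code></pre>
<p>Whatever the seed, every node should read back exactly the bits that were sent, and the bus should end in its idle state (ALRT high, LTCH low).</p>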
<h3>quad4me protocol <a name="quad4me_protocol"></a></h3>
<pre><code> DATA :X::?:::X:?:::X:?:::X:?:::X:?:::X:?:::X:?:::X:?:::::
N N N N N N N N
ALRT ~~\__/~~\__/~~\__/~~\__/~~\__/~~\__/~~\__/~~\__/~~~~
R R R R R R R R
LTCH ___/~~\__/~~\__/~~\__/~~\__/~~\__/~~\__/~~\__/~~\___
W W W W W W W W W
CYCL ~~~~\__/~~~\_/~~~\_/~~~\_/~~~\_/~~~\_/~~~\_/~~~\_/~~
Legend:
: data line, either high or low
X data line, data bit written: high->high, high->low, low->high, low->low
? data line, data bit read
~ handshake line high
_ handshake line low
/ handshake line, low->high
\ handshake line, high->low
R READ State
W WRIT State
N NEXT State
</code></pre>
<p>The quad4me protocol is very similar to tri2b with the exception of an additional NEXT State after READ and before WRIT. This extra State is required due to the following …</p>
<h3>Edge- vs level-based <a name="edge_vs_level_based"></a></h3>
<p>Because State transitions in tri2b are triggered by edges (rising, except for the initial message start falling edge), the handshake lines can be lowered at any time after the rising edge takes place. This freedom does not exist in a level-based protocol such as quad4me.</p>
<p>Consider the LTCH line rising edges in the tri2b <a href="#tri2b_timing_diagram">timing diagram</a>, above. If this were a level-based protocol, the LTCH line would need to stay high until the ALRT line went high, signaling that all nodes have read the DATA line and all can transition to the WRIT State.</p>
<p>The LTCH line needs to stay high because otherwise one or more “fast” nodes could read the data line and lower their LTCH ports (the first one to do so would pull the whole line low, due to open-drain wired-AND electrical physics), and one or more “slow” nodes that hadn’t yet seen LTCH high would miss their READ States.</p>
<p>So … why not wait until after ALRT goes high to lower LTCH? That would leave the door open to a different race condition: One or more fast nodes could see the ALRT line high, place their next data bit, lower their ALRT and raise their LTCH ports before one or more of the slow nodes had lowered their LTCH ports. This would cause the LTCH line to go back high signaling that the new data bit was ready to be read when in fact the slow nodes had not yet put their next bits on the DATA line.</p>
<p>This is the reason for the four-line (three handshake plus data) requirement of the level-based quad4me protocol (and, conversely, why the edge-based tri2b needs only two handshake lines). It’s also why quad4me <a name="theoretically_faster"></a> (theoretically – see <a href="#rise_time">Rise Time</a>, below) takes 1.5 times as long per bit as tri2b: it requires three States per bit instead of two.</p>
<p>I would be extremely interested in a protocol design that avoids this conundrum. See <a href="#RFIIE_three_or_fewer_line_level_based_protocol">RFIIE: A three (or fewer) line level-based protocol</a>, below.</p>
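<p>The 1.5× figure above is State-count arithmetic under the idealized assumption that every State takes the same time, which, as discussed under <a href="#rise_time">Rise time</a>, real hardware does not guarantee. As a sketch, with a hypothetical time per State and message length:</p>
<pre><code>#include <cassert>

// Idealized model: every State takes the same time. Real per-State times
// depend on rise time, CPU speed, and the slowest node on the bus.
constexpr unsigned TRI2B_STATES_PER_BIT   = 2;  // WRIT, READ
constexpr unsigned QUAD4ME_STATES_PER_BIT = 3;  // WRIT, READ, NEXT

// Time for a message of `bits` bits, given a (hypothetical) time per State.
constexpr double message_time_us(unsigned states_per_bit,
                                 unsigned bits,
                                 double   state_time_us)
{
    return double(states_per_bit) * bits * state_time_us;
}

static_assert(double(QUAD4ME_STATES_PER_BIT) / TRI2B_STATES_PER_BIT == 1.5,
              "quad4me needs 1.5x the States per bit of tri2b");
</code></pre>
<p>With an assumed 2 µs per State, a 64-bit message would take 256 µs with tri2b and 384 µs with quad4me; both numbers are purely illustrative.</p>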
<h3>Arbitration phase <a name="arbitration_phase"></a></h3>
<p>In both tri2b and quad4me, arbitration is handled by the well-known “first zero bit wins” algorithm, which leverages the wired-AND logic of open-drain communication lines.</p>
<p><a name="non_competing_nodes"></a>
During the arbitration phase, nodes that have data to send place their arbitration bits, in MSB-to-LSB order, on the DATA line according to the bitwise State protocols described above. Nodes without data to send always place “1” bits and thus do not compete for arbitration. All nodes read the current arbitration bit on their DATA ports at the appropriate State time.</p>
<p>Due to the wired-AND logic, if <strong>any</strong> node places a “0” bit (lowers the data line), the line will go low regardless of whether, and how many, other nodes are placing “1” bits (logic high, i.e. high-impedance on an open-drain port) at that instant.</p>
<p>Each node compares the current DATA line value (zero or one) to its own current arbitration bit, and if its bit is “1” and the line is “0”, drops out of arbitration (loses). For all subsequent bits it places a “1” on the line, as all non-competing nodes do from the MSB onward.</p>
<p>In this way the node with the lowest arbitration number (lowest number == highest priority) – the one which has never had a “1” overridden by a “0” – wins. (Non-competing nodes will always lose by this logic.)</p>
<p>For example (“pend”: still competing; “lose”: drops out at this bit; “lost”: dropped out at an earlier bit):</p>
<pre><code>           node #5               node #2               node #9               node #3
Bit Line   Arbt=5 Write Result   Arbt=2 Write Result   Arbt=9 Write Result   Arbt=3 Write Result
 3   0     0101     0   pend     0010     0   pend     1001     1   lose     0011     0   pend
 2   0     0101     1   lose     0010     0   pend     1001     1   lost     0011     0   pend
 1   1     0101     1   lost     0010     1   pend     1001     1   lost     0011     1   pend
 0   0     0101     1   lost     0010     0   win      1001     1   lost     0011     1   lose
</code></pre>
<p>Note that this is <strong>not</strong> the same as the logical AND of the arbitration values:</p>
<pre><code>0b0101 & 0b0010 & 0b1001 & 0b0011 = 0b0000
</code></pre>
<p>In which case a non-existent node 0b0000 would “win”.</p>
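<p>The rule is easy to check in software. The sketch below is a minimal simulation, not the repository’s implementation (<code>arbitrate()</code>, <code>MAX_NODES</code>, and the parameters are hypothetical). For each bit, MSB first, every still-competing node writes its arbitration bit, every other node writes “1”, the line takes the AND of all the writes, and a competing node that wrote “1” but reads “0” drops out. It assumes arbitration numbers are unique.</p>
<pre><code>#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t MAX_NODES = 16;  // arbitrary limit for this sketch

// ids[n]       arbitration number of node n (unique; lower == higher priority)
// competing[n] true if node n has a message to send
// Returns the index of the winning node, or -1 if no node is competing.
int arbitrate(const uint32_t ids[], const bool competing[],
              std::size_t num_nodes, unsigned num_bits)
{
    assert(num_nodes <= MAX_NODES);
    bool pending[MAX_NODES];
    for (std::size_t n = 0; n < num_nodes; ++n)
        pending[n] = competing[n];      // non-competing nodes never pend

    // MSB first, as on the bus.
    for (int bit = static_cast<int>(num_bits) - 1; bit >= 0; --bit) {
        // Wired-AND: the line is low if any node writes "0".
        uint32_t line = 1;
        for (std::size_t n = 0; n < num_nodes; ++n) {
            const uint32_t written = pending[n] ? (ids[n] >> bit) & 1u : 1u;
            line &= written;
        }
        // A competing node that wrote "1" but reads "0" loses.
        if (line == 0) {
            for (std::size_t n = 0; n < num_nodes; ++n)
                if (pending[n] && ((ids[n] >> bit) & 1u))
                    pending[n] = false;
        }
    }

    for (std::size_t n = 0; n < num_nodes; ++n)
        if (pending[n])
            return static_cast<int>(n);
    return -1;
}
</code></pre>
<p>With <code>ids</code> = {5, 2, 9, 3} and all four nodes competing over 4 bits, <code>arbitrate()</code> returns index 1 (node #2), matching the table above; if node #2 has nothing to send, it returns index 3 (node #3, 0b0011).</p>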
<p>For an optional enhancement to the arbitration process, see <a href="#dynamic_rank">DYNAMIC_RANK</a>, below.</p>
<h3>(Almost) Timing-free <a name="almost_timing_free"></a></h3>
<h4>One line held low at idle <a name="one_line_held_low_at_idle"></a></h4>
<p>As shown in the protocol <a href="#tri2b_timing_diagram">timing diagram</a> above, in both tri2b and quad4me all handshake lines must be initialized to known conditions at idle (before a message starts). The ALRT line (and, for quad4me, CYCL) must be high, and the LTCH line low.</p>
<p>This presents two problems, one electrical and one logical. The electrical one is that the low LTCH line consumes power via its pullup resistor (see <a href="#RFIIE_active_pullup">RFIIE: Active pullup</a>, below) whenever the bus is idle, which is likely to be most of the time in a practical application of the protocols.</p>
<p>The logical problem is: How can the line be initialized to its required known-low condition? How can any/all nodes detect that <strong>all</strong> other nodes have lowered their LTCH ports?</p>
<p>The “high” lines are easy – when they are high, by the definition of open-drain line electrical physics, all nodes' ports must be high (high-impedance). (Actually it’s not quite that trivial – an uninitialized node might coincidentally be outputting high without truly being ready to start the protocol.)</p>
<p>But the required low is much harder to confirm, if not impossible. Any single node outputting a low will cause the line to be low. In fact, the protocols rely on this: any one (or more) nodes lowering the ALRT line signals a message start. LTCH has to be low at idle so that when it does go high, it’s certain that all the nodes have raised their LTCH ports.</p>
<p><a href="#the_example_implementations_and_testbed">The Example Implementations and Testbed</a> skirts this problem by simply delaying a known amount of time at system initialization and taking on faith that all nodes have initialized during that period. This is implemented by:</p>
<ol>
<li>Waiting for all communication lines to go high.</li>
<li>Waiting a fixed period of time while checking if LTCH has gone low.</li>
<li>Lowering LTCH. The first node to do so will break all others out of their wait loops even if their time periods have not expired.</li>
<li>Waiting another fixed period of time.</li>
</ol>
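<p>The four steps might be sketched as follows. This is an illustration under assumptions, not the repository’s code. The <code>BusHooks</code> function pointers are hypothetical stand-ins for whatever GPIO reads/writes and microsecond timer a real port provides, and the two wait periods are parameters because their actual values are implementation choices.</p>
<pre><code>#include <cassert>
#include <cstdint>

// Hypothetical hardware-access hooks; a real port would read/write GPIO
// registers and a hardware timer instead.
struct BusHooks {
    bool     (*all_lines_high)();  // true if every communication line reads high
    bool     (*ltch_is_low)();     // true if the LTCH line reads low
    void     (*lower_ltch)();      // drive this node's LTCH port low
    uint32_t (*micros)();          // free-running microsecond counter
};

// Returns true if another node lowered LTCH before this node's wait expired.
bool init_idle_state(const BusHooks& hw, uint32_t wait_us, uint32_t settle_us)
{
    // 1. Wait for all communication lines to go high.
    while (!hw.all_lines_high()) {}

    // 2. Wait a fixed period, breaking out early if LTCH goes low.
    //    Unsigned subtraction keeps this correct across timer wraparound.
    bool other_node_first = false;
    const uint32_t start = hw.micros();
    while (hw.micros() - start < wait_us) {
        if (hw.ltch_is_low()) {
            other_node_first = true;
            break;
        }
    }

    // 3. Lower LTCH. The first node to do so releases all the others
    //    from their step-2 loops.
    hw.lower_ltch();

    // 4. Wait another fixed period so slower nodes can catch up.
    const uint32_t settle_start = hw.micros();
    while (hw.micros() - settle_start < settle_us) {}

    return other_node_first;
}
</code></pre>
<p>Like the scheme it illustrates, this sketch takes the timing on faith: a node that reaches step 1 only after another node has already lowered LTCH in step 3 would never see all lines high.</p>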
<p>I would be extremely interested in a timing-less solution to this problem. See <a href="#RFIIE_no_line_low_at_idle">RFIIE: No line low at idle</a>, below. At one point I had a very complex scheme in which each node enumerated itself on the bus, one at a time, in arbitrary order, until all had done so. In the end the scheme was unreliable because it used (a variation of) the protocol to do the enumeration and a node coming onto the bus at the wrong time, driving the lines incorrectly, could break it.</p>
<p>This problem brings to mind the old question: “Quick, grasshopper. What is the sound of one hand clapping?”</p>
<h4>Rise time <a name="rise_time"></a></h4>
<p>In the glittering pristine crystal palace of logic built by George Boole, the above protocols achieve their goal of timing-free execution. In the dirty real world of noisy electrical voltage levels, not so much.</p>
<p>Consider the timing diagrams <a href="#tri2b_timing_diagram">above</a>. All of them explicitly depend on one State being completely finished before a rising edge (tri2b) or logic high (quad4me) on a line signals a transition to the next State. For example, nodes output data bits on their DATA ports first, and <strong>then</strong> raise their LTCH ports. When the wired-AND LTCH line goes high it’s known for certain that the DATA line is ready.</p>
<p>Maybe yes, maybe no.</p>
<p>Due to the rise time of signals (determined by the line’s resistance-capacitance time constant), the transitions, particularly from low to high, are not instantaneous. In fact, there is a trade-off between the power wasted by the open-drain pull-up resistors and the speed of the rise time. See <a href="#RFIIE_active_pullup">RFIIE: Active pullup</a>, below.</p>
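<p>The RC trade-off can be made concrete. Assuming a line that is released from 0 V and pulled up only by a resistor R against a total line capacitance C, its voltage follows the standard charging curve <code>V(t) = Vdd * (1 - exp(-t / (R*C)))</code>, so the time to reach a fraction k of Vdd is <code>t = -R*C * ln(1 - k)</code>. The sketch below evaluates that formula. The component values in the usage note are assumptions for illustration, not measurements from this project, and the threshold a given GPIO input actually switches at is specified in its MCU’s datasheet.</p>
<pre><code>#include <cassert>
#include <cmath>

// Time (in seconds) for an RC-pulled-up open-drain line, released from 0 V,
// to reach `fraction` of the supply voltage (0 < fraction < 1):
//     V(t) = Vdd * (1 - exp(-t / (R*C)))  =>  t = -R*C * ln(1 - fraction)
double rc_rise_time_s(double pullup_ohms, double line_farads, double fraction)
{
    return -pullup_ohms * line_farads * std::log(1.0 - fraction);
}
</code></pre>
<p>With an assumed 10 kΩ pullup and 100 pF of line capacitance (R*C = 1 µs), <code>rc_rise_time_s(10e3, 100e-12, 0.7)</code> is about 1.2 µs to reach 70% of Vdd, and reaching 95% takes about three time constants (3 µs). Halving R halves these times, but doubles the current through the pullup whenever any node holds the line low.</p>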
<p>If all the lines have identical rise times, and if all the GPIO ports read the same value at the same line voltage level and/or register edges at the same point on the rising waveform, setting the DATA line before LTCH should be sufficient. If not – as is likely – workarounds are needed. See <a href="#data_wait_us">DATA_WAIT_US</a> in <a href="#build_variants">Build variants</a>, below. This is especially problematic because in the META and DATA phases a node can raise its DATA port output and wait only until it reads that the line has gone high. But in the ARBT phase another node’s bit may be low, in which case the line will never go high during that State. For this reason, a fixed timeout has to be implemented.</p>
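<p>The asymmetry between phases can be expressed roughly as below. This is an assumption-laden sketch, not the repository’s implementation: <code>Phase</code>, <code>wait_for_data_high()</code>, and the hook parameters are hypothetical, and <code>data_wait_us</code> merely plays a role similar to the DATA_WAIT_US build variant.</p>
<pre><code>#include <cassert>
#include <cstdint>

enum class Phase { ARBT, META, DATA };  // hypothetical names

// Called after this node has released (written "1" to) its DATA port and
// before it raises its handshake port. Returns the DATA level it settled on.
bool wait_for_data_high(Phase      phase,
                        bool     (*read_data)(),  // current DATA line level
                        uint32_t (*micros)(),     // free-running us counter
                        uint32_t   data_wait_us)
{
    const uint32_t start = micros();

    if (phase == Phase::ARBT) {
        // Another competing node may legitimately be holding DATA low, so
        // the line might never rise: always wait the full fixed period.
        while (micros() - start < data_wait_us) {}
        return read_data();
    }

    // META/DATA: only the arbitration winner drives DATA, so a released
    // line should rise; stop waiting as soon as it does. The fixed period
    // is kept only as a safety bound against a stuck line.
    while (!read_data()) {
        if (micros() - start >= data_wait_us)
            return false;
    }
    return true;
}
</code></pre>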
<p>Also: The tri2b protocol raises and lowers the handshake lines without the intermediate States that provide delays in quad4me. See <a href="#tri2b_timing_diagram">tri2b waveforms</a>, above. This may be problematic for real-world (as opposed to theoretical) edge detection. The example implementation provides <a href="#min_high_us">MIN_HIGH_US</a> to compensate if necessary.</p>
<h4>The failed promise of hardware swapover <a name="the_failed_promise_of_hardware_swapover"></a></h4>
<p>The tri2b and quad4me protocols were developed due to a (my) failure to get hardware-implemented I2C multi-master arbitration working.</p>
<p>But … after tri2b/quad4me arbitration takes place, only a single node is sending, and all others receive. Why not, at that time, switch to a much faster (than bit-banging) hardware-implemented standard protocol like I2C/SPI/USART for the metadata and/or data phases? Moreover, if SPI or USART, the hardware GPIO ports could be switched from open-drain to push-pull for much faster rise times and bitrates, and then switched back for the next tri2b/quad4me arbitration.</p>
<p>I tried implementing this idea. A lot. (Uhh, “weeks”). Unfortunately nothing worked, due to hardware protocol limitations and/or MCU peripheral implementation problems.</p>
<p>All of the hardware protocols require some kind of flow control, either on their own or via separate, out-of-band signaling, because more than one meta/data byte can be required for each message. USART’s RTS/CTS/etc flow control looked promising (both tri2b and quad4me have an extra communication line beyond data and clock that could be used) but unfortunately, despite the fact that everything else in USART is configurable (clock and data polarity, etc) no MCU peripheral supports RTS/CTS being low==true as opposed to the standard high. And low==true is needed to implement the wired-AND, sender sends only when all receivers are ready, logic.</p>
<p>SPI initially looked even better, both for its faster data rates and (in most implementations) ability to send/receive up to 16 bits at a time. But it needs external flow control for more than 16 bits, and at least on the NXP chips requires a hardware line/port to enable slave reception of data. (Why? With all the configurable bits in the peripheral, why not have one that sets “always enabled”?). The 3- and 4-wire tri2b/quad4me interface doesn’t have a line to spare – two are required for flow control handshaking, plus SPI clock and data, so even 4-wire quad4me falls short. (<strong>Five</strong> lines?? Three or four is bad enough!) (And in my specific system I don’t have a spare pin to tie permanently high.)</p>
<p>Finally, I2C. Ironic, given that if I2C multi-master worked, tri2b and quad4me wouldn’t be necessary. I2C “swapover” problems include the fact that even though the “clock stretching” logic is low==true and explicitly supports any of the multiple slaves controlling the flow, the ACK bit on the data line is high==true and thus the converse.</p>
<p>And finally #2: Above and beyond all these theoretical/design problems, in the real world I found that the MCUs I tried could not reliably switch their ports back and forth at per-message speeds between standard GPIO functionality (for tri2b/quad4me arbitration) and “special purpose” peripherals (I2C/SPI/USART). Pulse glitches on the lines? I don’t know – I have neither a ’scope nor a logic analyzer. But I would welcome any ideas on these subjects; see <a href="#rfiie_hardware_swapover">RFIIE: Hardware swapover</a>, below.</p>
<h4>To interrupt or not to interrupt <a name="to_interrupt_or_not_to_interrupt"></a></h4>
<p>… that is the question.</p>
<p>I hate polled code. I believe all software should be interrupt-driven (or, similarly, signal-driven when running under an operating system), preferably with short interrupt service handlers that enqueue instructions into a FIFO which the code’s main loop subsequently pops off and executes.</p>
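<p>For reference, here is a minimal sketch of the kind of FIFO described above: a single-producer/single-consumer ring buffer that one ISR pushes into and the main loop pops from. It is a generic illustration, not code from this repository. <code>IsrFifo</code> is a hypothetical name, and the design assumes exactly one writer (the ISR), exactly one reader (the main loop), and a power-of-two capacity.</p>
<pre><code>#include <atomic>
#include <cassert>
#include <cstddef>

// Single-producer (ISR) / single-consumer (main loop) ring buffer.
template <typename T, std::size_t SIZE>
class IsrFifo {
    static_assert(SIZE != 0 && (SIZE & (SIZE - 1)) == 0,
                  "SIZE must be a power of two");
  public:
    // Called only from the ISR. Returns false (dropping item) if full.
    bool push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == SIZE)
            return false;                             // full
        buf_[head & (SIZE - 1)] = item;
        head_.store(head + 1, std::memory_order_release);  // publish item
        return true;
    }

    // Called only from the main loop. Returns false if empty.
    bool pop(T& item) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail)
            return false;                             // empty
        item = buf_[tail & (SIZE - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return true;
    }

  private:
    T buf_[SIZE];
    std::atomic<std::size_t> head_{0};  // next slot to write (ISR only)
    std::atomic<std::size_t> tail_{0};  // next slot to read (main loop only)
};
</code></pre>
<p>The indices increase without bound and are masked only when indexing the buffer, so <code>head - tail</code> is the fill level even after the counters wrap around; that is why the capacity must be a power of two.</p>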
<p>So why aren’t tri2b and quad4me coded according to these design principles in this repository’s <a href="#the_example_implementations_and_testbed">Example Implementations and Testbed</a>?</p>
<p>Basically for performance reasons. Low-end ARM cores are documented to have 12 or more clock cycles of interrupt latency (plus a similar number for return from ISR?). Bit-banging is slow already – these cycles add up. The example implementation attempts to get the number of clock cycles down to the 200 range, although this is largely driven by <a href="#rise_time">rise time</a> issues.</p>
<p>My initial implementations of tri2b/quad4me triggered their ISR multiple times per bit (at each <a href="#states">State</a> transition, above). I quickly found that they ran in an “interrupt storm”, never getting back to the main loop until a full message was complete.</p>
<p>Even using one interrupt per message can slow throughput down compared to a polled approach. (Less so if the protocol’s <a href="#protocol_method">protocol() method</a> is inlined into the application’s main loop to avoid function call and return overhead.)</p>
<p>But <a href="#latency">latency</a> was the primary design requirement for tri2b/quad4me. Due to the <a href="#any_node_hangs">“any slow node can hang the protocol”</a> nature of the design, an interrupt-driven approach is necessary to ensure all nodes run the protocol as expediently as possible. The only exception would be a main loop that executes only very brief snippets of code per iteration before re-polling <code>protocol()</code>.</p>
<p>The <code>protocol()</code> method in the example implementation can be compiled to execute either <a href="#triquad_bit_by_bit_vs_triquad_whole_message">bit-by-bit or whole-message</a>. The former, in which it returns when and if a new bit is not immediately on the bus (see <a href="#rise_time">rise time</a>), is probably only useful if there is a large – factor of 100 or more – disparity between different hardware nodes' processing power. Otherwise even the fastest nodes would likely have too few cycles available to perform any useful work, and the call/return overhead (unless inlined) would predominate.</p>
<p>There are some subtleties involved if the example implementation is compiled with <a href="#triquad_polling_vs_triquad_interrupts">TRIQUAD_POLLING</a>. The interrupt handler will trigger on incoming messages and, once triggered, will also send any pending messages that the client application has registered but that have not yet been sent due to arbitration losses. This is, of course, in addition to receiving any arbitration-winning messages.</p>
<p>But the client application also needs to send newly registered pending messages, either immediately or at some other time of its choosing. Calling the <code>protocol()</code> method directly leaves open the possibility of a falling-edge interrupt arriving before the interrupt has been disabled, and <code>protocol()</code> being entered again recursively.</p>
<p>A number of such race conditions are possible, even if the call to <code>protocol()</code> is bracketed by disabling and re-enabling the interrupt. All can be avoided by having the client application invoke <code>protocol()</code> by triggering the interrupt via the NVIC ISPR (set pending interrupt) register instead. This is how the example implementation is coded.</p>
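<p>In CMSIS terms this is <code>NVIC_SetPendingIRQ()</code>, defined in the core headers included in this repository (<code>core_cm0plus.h</code>, <code>core_cm3.h</code>). The sketch below spells out the underlying register write so the logic can be run off-target. <code>pend_protocol_irq()</code> is a hypothetical helper; on real hardware <code>ispr</code> would be <code>NVIC->ISPR</code> and <code>irqn</code> the device IRQ number of whichever GPIO interrupt the port uses.</p>
<pre><code>#include <cassert>
#include <cstdint>

// Marks interrupt `irqn` pending, so the NVIC runs its handler once the
// interrupt is enabled and its priority permits.
// ISPR is write-1-to-set: zero bits are ignored by the hardware, so a
// plain store (no read-modify-write) is sufficient and race-free.
inline void pend_protocol_irq(volatile uint32_t* ispr, uint32_t irqn)
{
    ispr[irqn >> 5] = 1u << (irqn & 0x1Fu);  // 32 IRQs per ISPR word
}
</code></pre>
<p>Because <code>protocol()</code> then always runs in handler context, the GPIO edge interrupt it serves cannot re-enter it: the NVIC does not let an exception preempt itself, so a new edge simply leaves the interrupt pending until the handler returns.</p>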
<p>And add one more entry to the <a href="#everyone_will_hate">“everyone will hate”</a> list, below: Long interrupt service handlers. Compiling with TRIQUAD_POLLING will build what is probably one of the longest-running ISRs ever written. I’m on the fence on this one. It’s acceptable under the alternative design philosophy of running the entire application in the ISR(s), and the implementation here is close to that; it’s really more of a hybrid approach, in which some non-interrupt-driven code remains in the main loop. Opinions, anyone?</p>
<h3>tri2b or quad4me – which one? <a name="tri2b_or_quad4me_which_one"></a></h3>
<p>The question of which protocol to use comes down to their <a href="#hardware_requirements">Hardware requirements</a>. The most significant factor is that tri2b uses one fewer communication line than quad4me (3 vs 4). It is also theoretically faster (see <a href="#theoretically_faster">above</a>).</p>
<p>If your hardware supports the <a href="#edge_detection_requirement">edge detection requirement</a> of tri2b, it is probably the better choice. But … the edge detection must be failure-proof; see <a href="#min_high_us">MIN_HIGH_US</a>, below. In the presence of real-world noise, quad4me is likely more reliable.</p>
<p>Additionally, the current example implementation requires edge detection if compiled to be interrupt-driven (see <a href="#triquad_polling_vs_triquad_interrupts">TRIQUAD_POLLING vs TRIQUAD_INTERRUPTS</a>, below), although this may not be strictly necessary (see <a href="#level_based_interrupts">level-based interrupts</a>, below). If edge detection is required for interrupts, tri2b’s requirements will (likely) be already met.</p>
<p><a name="the_example_implementations_and_testbed"></a></p>
<h2>The Example Implementations and Testbed </h2>
<p>This repository contains example implementations of tri2b and quad4me, a functional test program for them, porting layers to two different MCUs, and a primitive build environment, all written in C++.</p>
<h3>Using/porting the code <a name="using_porting_the_code"></a></h3>
<p>This repository contains a large amount of code. See <a href="#repository_directories_and_files">Repository directories and files</a>, below. Fortunately, the codebase can be looked at as a hierarchy of layers, with only a few of them required for integration into a real application.</p>
<p>The actual protocol implementation code is in the <a href="#base_implementations"><code>tri2b</code> and <code>quad4me</code></a> directories.</p>
<p>Code to execute the protocols on particular MCUs is in the <a href="#derived_class_methods"><code>ports</code></a> directory tree. Unless by coincidence you are using one of the chips ported here, similar code will need to be written.</p>
<p>The <code>ports</code> code is configured (peripheral registers, GPIO registers, hardware MCU pins, etc) via <a href="#config_files"><code>*_config.hxx</code></a> files in the <code>lpc824</code> and <code>stm32f103</code> example directories. This separation between ported code and configuration files is an implementation choice.</p>
<p>The ported code has dependencies on a large body of utility code (some of which in turn wraps MCU vendor-supplied header files) which are highly idiosyncratic. Feel free to include them in an application (modulo the GPL license – see <a href="#no_warranty">No Warranty</a>, above) or replace them with supporting code of your own preference.</p>
<p><a href="#the_testbed">The <code>randomtest</code></a> directory contains functional test code for the protocols. It can be used as an example for actual application code.</p>
<p>Finally, the repository contains an again highly idiosyncratic build system – linker scripts, MCU startup code, Makefiles – that may be used, modified, or replaced.</p>
<h3>The code itself <a name="the_code_itself"></a></h3>
<p>Everyone will hate the C++ code in this repository. “Everyone” includes: <a name="everyone_will_hate"></a></p>
<ul>
<li><a href="#c_coders">C coders</a></li>
<li><a href="#c_plus_plus_coders">C++ coders</a></li>
<li><a href="#assembly_coders">Assembly coders</a></li>
<li><a href="#python_coders">Python coders</a></li>
<li><a href="#other_coders">Other coders</a></li>
</ul>
<h5>C coders <a name="c_coders"></a></h5>
<p>C coders will hate the code because … C++</p>
<h5>C++ coders <a name="c_plus_plus_coders"></a></h5>
<p>C++ purists will hate the code because …</p>
<p>Where should I start?</p>
<ol>
<li>“‘init’ methods? INIT METHODS??? That’s not RAII!!!” <a name="cpp_haters_1"></a></li>
<li>“Why don’t you use the singleton pattern?” <a name="cpp_haters_2"></a></li>
<li>“Why isn’t the code templated on word size?” <a name="cpp_haters_3"></a></li>
<li>“Why the crazy duck-typing? Why don’t you use virtual inheritance?” <a name="cpp_haters_4"></a></li>
<li>“Static polymorphism? Why don’t you use the Curiously Recurring Template Pattern?” <a name="cpp_haters_5"></a></li>
<li>“Multi-line macros to define methods? Why aren’t you using templates?” <a name="cpp_haters_6"></a></li>
<li>“Why aren’t you using any design patterns?” <a name="cpp_haters_7"></a></li>
<li>“Why aren’t you using the STL? Why aren’t you using the ‘auto’ keyword? Why aren’t you using range-based ‘for’ loops?” <a name="cpp_haters_8"></a></li>
</ol>
<p>OK, that’s enough. I could go on indefinitely.</p>
<p>There are responses to all of these, some of which follow. Not that I expect any of them to lower the level of righteous indignation the code will raise.</p>
<p><a href="#cpp_haters_1">1</a>) Many reasons:</p>
<p>The required ARM pre-main() startup/init code and linker map configuration are complex enough without adding static construction to the mix.</p>
<p>Many of the post-<code>main()</code> <code>init()</code> methods have to be called after application-specific MCU initialization has taken place (peripheral initialization, clock speed, etc.). This initialization does <strong>not</strong> belong in the generic ARM startup <code>init()</code>, so pre-<code>main()</code> construction wouldn’t work. In addition, there’s what I consider to be a bug in GCC-ARM static construction – see <a href="#rfiie_arm_gcc_static_construction_with_pointer_member_variables">RFIIE: GCC-ARM static construction with pointer member variables</a>.</p>
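<p>The style being defended looks roughly like the sketch below (a hypothetical class, not the repository’s code). Because the constructor is <code>constexpr</code>, the global object is constant-initialized at compile time, so there is no constructor call for the pre-<code>main()</code> startup code to run; the real setup happens in <code>init()</code>, called from <code>main()</code> once clocks and peripherals are configured.</p>
<pre><code>#include <cassert>
#include <cstdint>

class Bus {   // hypothetical
  public:
    // Constant-initialized: no run-time constructor call before main().
    constexpr Bus() : clock_hz_(0), ready_(false) {}

    // Called from main() after application-specific MCU setup, when the
    // values it depends on (e.g. the system clock rate) are known.
    void init(uint32_t clock_hz) {
        clock_hz_ = clock_hz;
        ready_    = true;
    }

    bool     ready()    const { return ready_; }
    uint32_t clock_hz() const { return clock_hz_; }

  private:
    uint32_t clock_hz_;
    bool     ready_;
};

Bus bus;   // global object with no static-construction code
</code></pre>
<p>An application’s <code>main()</code> would configure clocks and peripherals first and only then call <code>bus.init()</code> with the resulting values.</p>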
<p><a href="#cpp_haters_2">2</a>) No thanks. My understanding is there are always edge cases in the singleton pattern. Plus, more importantly: This is a small embedded application, all statically-allocated memory, no <code>new</code> or <code>malloc()</code>.</p>
<p><a href="#cpp_haters_3">3</a>) Large swaths of the code are implicitly written for a 32-bit CPU. Specifically, a 32-bit <em>ARM</em> CPU. Porting to a different word-size machine would be a complete rewrite. Typedef'ing of the variables would be the least of one’s worries.</p>
<p><a href="#cpp_haters_4">4</a>) Embedded application. No code space for vtables nor execution time to indirect through them.</p>
<p><a href="#cpp_haters_5">5</a>) The CRTP is not really about static polymorphism. Placing the architecture-specific implementations of the base class methods in the derived class file, when there will only ever be one such compiled into the app binary, is a clean, simple solution.</p>
<p><a href="#cpp_haters_6">6</a>) Template by function name? Is this possible?</p>
<p>Note that the macro names are “namespaced” by <code>TRI2B</code>/<code>QUAD4ME</code> prefixes so as not to conflict with other <code>#defines</code>. They are also <code>#undef</code>’d immediately after use so as to not “escape” into the code which includes them.</p>
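<p>A minimal illustration of that convention (a hypothetical macro and class, not taken from the repository): the macro carries a <code>TRI2B_</code> prefix, generates three member functions per use, and is <code>#undef</code>’d as soon as the class body is done with it.</p>
<pre><code>#include <cassert>
#include <cstdint>

class PortModel {   // hypothetical
  public:
    // Generates NAME_hi(), NAME_lo(), and NAME() for one bit of state_.
#define TRI2B_LINE_METHODS(NAME, BIT)                                   \
    void NAME##_hi()       { state_ |=  (1u << (BIT)); }                \
    void NAME##_lo()       { state_ &= ~(1u << (BIT)); }                \
    bool NAME()      const { return ((state_ >> (BIT)) & 1u) != 0; }

    TRI2B_LINE_METHODS(data, 0)
    TRI2B_LINE_METHODS(ltch, 1)
    TRI2B_LINE_METHODS(alrt, 2)

#undef TRI2B_LINE_METHODS   // keep the macro from "escaping"

  private:
    uint32_t state_ = 0;
};
</code></pre>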
<p>BTW, it’s my observation that the more “religious” a C++ programmer is about not using the preprocessor, the more likely they are to use large macros themselves when they find them necessary to work out a particularly obfuscated piece of generic programming. Your mileage may vary.</p>
<p><a href="#cpp_haters_7">7</a>) Because I’m trying to write clear, efficient, maintainable code. A different objective than obtaining a good grade from a 21st century CompSci professor. ;)</p>
<p><a href="#cpp_haters_8">8</a>) I love the STL. No dynamic memory allocation in this embedded application, so it’s a non-starter here.</p>
<p>I love/hate the “auto” keyword. Love it for simplifying declaration of iterators to complex classes. Hate it for making code difficult to maintain when trying to find the type of said iterators. Don’t need it, nor range-based loops, in a codebase where all of the “for” statements iterate via POD variables.</p>
<p>Like I said: Let the hating begin. Feel free to rewrite the code to your up-to-date C++ liking (but see <a href="#no_warranty">No Warranty</a>). Also see <a href="#RFIIE_non_request"><strong>Non-</strong> Request</a>, below.</p>
<p>Bjarne Stroustrup himself has said:</p>
<blockquote><p><em>“Within C++, there is a much smaller and cleaner language struggling to get out.”</em> <a href="http://www.stroustrup.com/bs_faq.html#really-say-that">http://www.stroustrup.com/bs_faq.html#really-say-that</a></p></blockquote>
<p>I’m no Bjarne Stroustrup, and maybe I’m misinterpreting what he wrote, but for many years my line has been:</p>
<blockquote><p>My favorite programming language in the C family is one of my own design. Fortunately, any decent C++ compiler will compile it without modification. I call my language “C+=0.5”</p></blockquote>
<h5>Assembly coders <a name="assembly_coders"></a></h5>
<p>Hey, assembly coders! I consider myself one of you – in spirit if not in practice. Assembly (on a machine far too ancient to mention) was my third computer language after Basic and Fortran (did I mention “ancient”?) and my first great love.</p>
<p>But I’m a backslider. I share the current, common belief that “you can’t beat the compiler” (except in very limited contexts).</p>
<p><a name="no_default_switch_case"></a>
One of those contexts, at least with the GCC-ARM compiler, is a C/C++ <code>switch/case</code> statement controlled by an <code>enum</code> variable, and which has no <code>default</code> case. When compiled as a jump table (as opposed to chained “if-else” branches), GCC always adds a range check on the switch variable before indexing into the jump table.</p>
<p>To illustrate, the following C++ code:</p>
<pre><code>class SwitchCase {
  public:
    enum class CASES {  // need 5 or more cases to force jump table
        C0 = 0,         // implementation (with -O1 optimization)
        C1,
        C2,
        C3,
        C4,
    };
    void switch_case(const CASES c);
};
void SwitchCase::switch_case(
    const SwitchCase::CASES c)
{
    switch (c) {  // need at least 5 cases to compile jump table with -O1
        case CASES::C0:
            asm("nop");
            break;
        case CASES::C1:
            asm("nop");
            break;
        case CASES::C2:
            asm("nop");
            break;
        case CASES::C3:
            asm("nop");
            break;
        case CASES::C4:
            asm("nop");
            break;
    }
}
</code></pre>
<p>compiles and links into:</p>
<pre><code> void SwitchCase::switch_case(
const SwitchCase::CASES c)
{
switch (c) { // need at least 5 cases to compile jump table with -O1
100000c0: 2904 cmp r1, #4
100000c2: d804 bhi.n 100000ce <SwitchCase::switch_case(SwitchCase::CASES)+0xe>
100000c4: 0089 lsls r1, r1, #2
100000c6: 4b06 ldr r3, [pc, #24] ; (100000e0 <SwitchCase::switch_case(SwitchCase::CASES)+0x20>)
100000c8: 585b ldr r3, [r3, r1]
100000ca: 469f mov pc, r3
/usr/local/example/switch_case.cxx:28
case CASES::C0:
asm("nop");
100000cc: 46c0 nop ; (mov r8, r8)
/usr/local/example/switch_case.cxx:47
case CASES::C4:
asm("nop");
break;
}
}
100000ce: 4770 bx lr
/usr/local/example/switch_case.cxx:32
asm("nop");
100000d0: 46c0 nop ; (mov r8, r8)
/usr/local/example/switch_case.cxx:33
break;
100000d2: e7fc b.n 100000ce <SwitchCase::switch_case(SwitchCase::CASES)+0xe>
/usr/local/example/switch_case.cxx:36
asm("nop");
100000d4: 46c0 nop ; (mov r8, r8)
/usr/local/example/switch_case.cxx:37
break;
100000d6: e7fa b.n 100000ce <SwitchCase::switch_case(SwitchCase::CASES)+0xe>
/usr/local/example/switch_case.cxx:40
asm("nop");
100000d8: 46c0 nop ; (mov r8, r8)
/usr/local/example/switch_case.cxx:41
break;
100000da: e7f8 b.n 100000ce <SwitchCase::switch_case(SwitchCase::CASES)+0xe>
/usr/local/example/switch_case.cxx:44
asm("nop");
100000dc: 46c0 nop ; (mov r8, r8)
/usr/local/example/switch_case.cxx:47
}
100000de: e7f6 b.n 100000ce <SwitchCase::switch_case(SwitchCase::CASES)+0xe>
100000e0: 1000018c .word 0x1000018c
</code></pre>
<p>with a jump table at 0x1000018c containing:</p>
<pre><code> switch_case.elf: file format elf32-littlearm
Contents of section .rodata:
1000018c cc000010 d0000010 d4000010 d8000010 ................
1000019c dc000010 ....
Disassembly of section .rodata:
1000018c <.rodata>:
1000018c: 100000cc .word 0x100000cc
10000190: 100000d0 .word 0x100000d0
10000194: 100000d4 .word 0x100000d4
10000198: 100000d8 .word 0x100000d8
1000019c: 100000dc .word 0x100000dc
</code></pre>
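<p>As for the C and C++ standards: both specify that if no case label matches and there is no <code>default</code> label, none of the statements in the switch body are executed. And because a scoped enumeration such as <code>CASES</code> has a fixed underlying type (<code>int</code> unless otherwise specified), every value of that type, not just the named enumerators, is a valid <code>CASES</code> value. So the compiler must handle out-of-range values, and the range check is how it does so.</p>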
<p>But knowing there will be no type-punning (casting an arbitrary value not in the enum class as the switch variable), I’d code this by hand without the range check (<code>cmp r1, #4</code> and <code>bhi.n 100000ce</code>) before the jump table lookup. Regardless of the standards, GCC has so many extensions/options that I wish there was one to force this. If there is, I haven’t been able to find it (see <a href="#RFIIE_gcc_arm_option_for_switch/case_jump_table_optimization">RFIIE: GCC-ARM option for switch/case jump table optimization</a>).</p>
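<p>As a possible lead for that RFIIE, offered only as something to try: a <code>default:</code> case containing GCC’s <code>__builtin_unreachable()</code> tells the optimizer that no other value reaches the switch, which, depending on the GCC version and optimization level, may let it omit the range check before the table lookup. Whether GCC-ARM actually does so should be confirmed in the disassembly, as above. Note that an out-of-range value then becomes undefined behavior rather than a no-op. The function below is a hypothetical, testable variant of the example (returning values instead of executing <code>nop</code>s); to inspect the effect on the jump table, add the same <code>default:</code> line to the <code>nop</code> version above and disassemble it.</p>
<pre><code>#include <cassert>

enum class CASES { C0 = 0, C1, C2, C3, C4 };

// GCC/Clang-specific: __builtin_unreachable() in the default case promises
// the optimizer that `c` is always one of the five enumerators.
int switch_case_unchecked(const CASES c)
{
    switch (c) {
        case CASES::C0: return 10;
        case CASES::C1: return 11;
        case CASES::C2: return 12;
        case CASES::C3: return 13;
        case CASES::C4: return 14;
        default:        __builtin_unreachable();
    }
}
</code></pre>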
<h5>Python coders <a name="python_coders"></a></h5>
<p>I love Python. I sprinkle it on my cornflakes at breakfast. I use Python whenever I can, i.e. in non-performance-critical applications. I even believe that most performance-critical applications, at least in non-embedded environments, should be written in Python with C/C++ modules implementing the performance-critical sections.</p>
<p>I’m aware of the existence of MicroPython but haven’t tried it. My strong suspicion is that it would not “play” for the minimal hardware environments I’ve targeted tri2b and quad4me at. Maybe the size efficiencies gained by implementing the protocols in byte code would offset the size of the interpreter, and maybe the interpreted code execution is almost as fast as native code.</p>
<p>Maybe. As the Missourians say, “Show me!”. See <a href="#micropython_implementation">RFIIE: MicroPython implementation</a></p>
<h5>Other coders <a name="other_coders"></a></h5>
<p>Java, C#, Ruby, Swift, Visual Basic, the Arduino ecosystem, etc, etc, …</p>
<p>I have zero knowledge of, and even less interest in, these languages. Feel free to port tri2b/quad4me to any of them (modulo the GPL license – see <a href="#no_warranty">No Warranty</a>).</p>
<h3>The example build system <a name="the_example_build_system"></a></h3>
<p>This repository contains code plus a (my) primitive build environment for compiling, linking, and testing it. I sincerely doubt that anyone will find the build environment useful – if you have a potential use for the implementations you will doubtless have your own such environment. This one is included mainly as a “sanity check” that the complete codebase has been included and is buildable.</p>
<p>The build system is primitive, idiosyncratic, and completely undocumented. I include it here without any intention of supporting it. But see <a href="#rfiie_build_system_enhancements">RFIIE: Build system enhancements</a> in the event that generosity drives you to offer any suggestions for its improvement.</p>
<h3>Prerequisites/dependencies <a name="prerequisites_dependencies"></a></h3>
<p>The code in this repository is meant to be built in a Linux environment. Ports to other platforms may be possible.</p>
<p>GNU configure is a wonderful system which I enjoy using. Unfortunately I do not have the time to learn how to construct “configure.in” files from scratch.</p>
<p>Fortunately, the repository’s dependencies are few. They are:</p>
<p>1) The GNU Arm Embedded Toolchain. This repository’s code has been tested with the “gcc-arm-none-eabi-8-2018-q4-major” release, <a href="https://launchpad.net/gcc-arm-embedded/+announcements#j15181">https://launchpad.net/gcc-arm-embedded/+announcements#j15181</a>. Ports to other compilers may be possible.</p>
<p>2) GNU make</p>
<p>Included in the repository are the files <code>core_cm0plus.h</code>, <code>core_cmInstr.h</code>, <code>core_cmFunc.h</code>, and <code>system_LPC8xx.h</code> from ARM Limited, <code>stm32f103xb.h</code> from STMicroelectronics, and <code>LPC8xx.h</code> from NXP Semiconductors. My reading of their respective copyright notices is that their inclusion here is allowed. I do not understand the text “modified by ARM 02.09.2019” in <code>LPC8xx.h</code> which, as of this writing, is still in the future. That’s some pro-active bug fixing there. ;)</p>
<p>I will not combine the code into a Debian package, RedHat RPM, or (shudder) Windows Installer executable. It’s a good thing GitHub is an independent, open-source- and Linux-centric platform. ;) Note that it also supports downloading the source tree as a ZIP archive.</p>
<h3>Repository directories and files <a name="repository_directories_and_files"></a></h3>
<pre><code> +-LICENSE.txt
+-README.html
+-README.md
+-build/
| +-Makefile.base
| +-Makefile.triquad
| +-Makefile.triquad_nxp
| +-Makefile.triquad_stm
| +-lpc824/
| | +-quad4me/
| | | +-Makefile
| | | +-mcu_config.hxx
| | | +-quad4me_config.hxx
| | | +-randomtest.gdb
| | +-tri2b/
| | +-Makefile
| | +-mcu_config.hxx
| | +-randomtest.gdb
| | +-tri2b_config.hxx
| +-stm32f103/
| +-quad4me/
| | +-Makefile
| | +-mcu_config.hxx
| | +-quad4me_config.hxx
| | +-randomtest.gdb
| +-tri2b/
| +-Makefile
| +-mcu_config.hxx
| +-randomtest.gdb
| +-tri2b_config.hxx
+-buildtest.sh
+-include/
| +-arm/
| | +-cmsis_gcc.h
| | +-core_cm0plus.h
| | +-core_cm3.h
| | +-core_cmFunc.h
| | +-core_cmInstr.h
| +-nxp/
| | +-LPC82x.h
| | +-LPC8xx.h
| | +-system_LPC8xx.h
| +-stm/
| | +-stm32f103xb.h
| | +-system_stm32f1xx.h
| +-util/
| +-bitops.hxx
| +-lpc8xx.hxx
| +-rank_id.hxx
| +-stm32f10_12357_xx.hxx
| +-sys_tick.hxx
| +-xorshift_random.hxx
+-init/
| +-lpc8xx_ram_init.c
| +-stm32f103_ram_init.c
+-ld/
| +-lpc824_ram.ld
| +-stm32f103_ram.ld
+-ports/
| +-lpc824/
| | +-triquad/
| | | +-quad4me.cxx
| | | +-quad4me.hxx
| | | +-quad4me.inl
| | | +-randomtest_port.inl
| | | +-tri2b.cxx
| | | +-tri2b.hxx
| | | +-tri2b.inl
| | +-util/
| | +-mcu.cxx
| | +-mcu.hxx
| | +-mrt.cxx
| | +-mrt.hxx
| | +-sct.cxx
| | +-sct.hxx
| +-stm32f103/
| +-triquad/
| | +-quad4me.cxx
| | +-quad4me.hxx
| | +-quad4me.inl
| | +-randomtest_port.inl
| | +-tri2b.cxx
| | +-tri2b.hxx
| | +-tri2b.inl
| +-util/
| +-mcu.cxx
| +-mcu.hxx
| +-tim.cxx
| +-tim.hxx
| +-tim_m_s.cxx
| +-tim_m_s.hxx
+-quad4me/
| +-quad4me_base.cxx
| +-quad4me_base.hxx
+-randomtest/
| +-randomtest.cxx
+-tri2b/
| +-tri2b_base.cxx
| +-tri2b_base.hxx
+-triquad/
+-tri_quad.cxx
+-tri_quad.hxx
</code></pre>
<p>In addition to this <code>README.md</code>, see comments/documentation in the files themselves.</p>
<h4>randomtest.cxx</h4>
<p>See <a href="#the_testbed">The testbed</a>, below.</p>
<p>Can be compiled to use either the tri2b or the quad4me protocol.</p>
<h4>tri_quad.[ch]xx</h4>
<p>TriQuad base/interface class for Tri2bBase and Quad4meBase protocol implementations, <a href="#base_implementations">directly below</a>. Data members and methods common to both protocols.</p>
<p><a name="base_implementations"></a></p>
<h4>tri2b_base.[ch]xx, quad4me_base.[ch]xx</h4>
<p>The protocol implementations.</p>
<p>Contain classes Tri2bBase and Quad4meBase, respectively</p>
<p><a name="protocol_method"></a>
The client application calls the protocol object’s <code>protocol()</code> method.</p>
<p>The <code>protocol()</code> method returns <code>true</code> when a message has finished and <code>false</code> otherwise; it can only return <code>false</code> if configured with <code>TRIQUAD_BIT_BY_BIT</code> (see <a href="#build_variants">Build variants</a>, below). The client app calls the protocol object’s <code>role()</code> method to determine whether a message was received (arbitration loss) or its pending message was sent (arbitration win).</p>
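<p>A minimal sketch of the client-side loop described above, assuming a <code>TRIQUAD_BIT_BY_BIT</code>-style build in which <code>protocol()</code> may return <code>false</code> several times before a message finishes. <code>FakeProtocol</code>, the <code>Role</code> enum and its values, and <code>run_one_message()</code> are hypothetical host-side stand-ins, not the repository’s API; the real <code>role()</code> return type may differ.</p>

```cpp
#include <cassert>

// Hypothetical role values; the real role() return type may differ.
enum class Role { RECEIVER, SENDER };

// Hypothetical stand-in for a Tri2b/Quad4me protocol object.
// protocol() returns false for the first (steps - 1) calls, as a
// TRIQUAD_BIT_BY_BIT build would while a message is in progress,
// and true once the message has finished.
class FakeProtocol {
public:
    FakeProtocol(int steps, Role outcome) : steps_(steps), outcome_(outcome) {}
    bool protocol() { return --steps_ <= 0; }
    Role role() const { return outcome_; }
private:
    int  steps_;
    Role outcome_;
};

// Call protocol() until the message finishes, counting the calls,
// then ask role() whether this node won arbitration (SENDER: its
// pending message was sent) or lost it (RECEIVER: a message arrived).
template <typename Protocol>
Role run_one_message(Protocol& proto, int& calls) {
    calls = 0;
    bool done = false;
    while (!done) {
        done = proto.protocol();
        ++calls;
    }
    return proto.role();
}
```

<p>In a build without <code>TRIQUAD_BIT_BY_BIT</code>, <code>protocol()</code> always returns <code>true</code>, so the loop body would run exactly once.</p>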
<p><a name="meta2bits"></a>
The client needs to implement the <code>meta2bits()</code> method, which returns the app-specific number of data bits calculated from the value of the preceding metadata bits. The example implementation uses a simple one-to-one mapping, <code>meta2data(int meta){return meta;}</code>. A real application might map a small number of message types (fewer metadata bits) to a set of data lengths, <code>meta2data(int meta){return message_lengths[meta];}</code>.</p>
<p>See <a href="#build_variants">Build variants</a>, below.</p>
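<p>The two metadata-to-length mappings described above, sketched as free functions for a host build. The 2-bit metadata width, the <code>message_lengths</code> values, and the function names are made-up examples; in the repository the mapping is a method supplied by the client.</p>

```cpp
#include <cassert>

// One-to-one mapping: the metadata value *is* the number of data bits.
constexpr int meta2bits_identity(int meta) { return meta; }

// Table lookup: a few metadata bits select one of a small set of
// message types. Hypothetical example: 2 metadata bits -> 4 lengths.
constexpr int message_lengths[4] = {0, 8, 16, 32};

constexpr int meta2bits_table(int meta) {
    return message_lengths[meta & 0x3];  // mask keeps the index in range
}
```

<p>With the table form, 2 metadata bits are enough to announce a 32-bit payload, whereas the one-to-one form would need at least 6 metadata bits to express the value 32.</p>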
<h4>tri2b.{hxx,inl,cxx} and quad4me.{hxx,inl,cxx} <a name="ported_code"></a></h4>
<p>Ported code</p>
<p><a name="derived_class_methods"></a>
Duck-typed, static inheritance/polymorphic concrete classes, derived from <code>Tri2bBase</code> and <code>Quad4meBase</code></p>
<p>Implement architecture-specific versions of methods declared in <code>Tri2bBase</code> and <code>Quad4meBase</code>, including:</p>
<ul>
<li><code>void reset_delay_start()</code></li>
<li><code>bool reset_delay_wait()</code></li>
<li><code>void enable_interrupt()</code></li>
<li><code>bool alrt()</code></li>
<li><code>bool ltch()</code></li>
<li><code>bool cycl()</code> <em>(quad4me only)</em></li>
<li><code>bool data()</code></li>
<li><code>void set_alrt()</code></li>
<li><code>void set_ltch()</code></li>
<li><code>void set_cycl()</code> <em>(quad4me only)</em></li>
<li><code>void set_data()</code></li>
<li><code>void clr_alrt()</code></li>
<li><code>void clr_ltch()</code></li>
<li><code>void clr_cycl()</code> <em>(quad4me only)</em></li>
<li><code>void clr_data()</code></li>
<li><code>void disble_alrt_fall()</code> <em>(only if</em> <code>TRIQUAD_INTERRUPTS</code> <em>enabled, see <a href="#triquad_polling_vs_triquad_interrupts">TRIQUAD_POLLING vs TRIQUAD_INTERRUPTS</a>, below)</em></li>
<li><code>void enable_alrt_fall()</code> <em>(only if</em> <code>TRIQUAD_INTERRUPTS</code> <em>enabled, see <a href="#triquad_polling_vs_triquad_interrupts">TRIQUAD_POLLING vs TRIQUAD_INTERRUPTS</a>, below)</em></li>
</ul>
<p>Also declare and define other architecture-specific methods <strong>not</strong> in Tri2bBase and Quad4meBase</p>
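<p>A minimal sketch of static (compile-time) polymorphism, assuming it resembles the curiously recurring template pattern (CRTP): the base is a template over the derived class and reaches the port’s pin accessors through a <code>static_cast</code>, so no virtual functions are involved. <code>ProtocolBase</code>, <code>FakePort</code>, <code>send_bit()</code>, and <code>sample()</code> are hypothetical; only the <code>alrt()</code>/<code>set_alrt()</code>/<code>clr_alrt()</code> and <code>data()</code>/<code>set_data()</code>/<code>clr_data()</code> names come from the list above.</p>

```cpp
#include <cassert>

// Base class template: protocol logic written once, pin access
// delegated to the derived (ported) class at compile time.
template <typename Derived>
class ProtocolBase {
public:
    // Hypothetical helper: drive the data line, then raise alert.
    void send_bit(bool bit) {
        if (bit) self().set_data();
        else     self().clr_data();
        self().set_alrt();
    }
    bool sample() { return self().data(); }

private:
    Derived& self() { return static_cast<Derived&>(*this); }
};

// Hypothetical port: the "GPIO pins" are plain bools so the sketch
// runs on a host. A real port would read/write MCU registers instead.
class FakePort : public ProtocolBase<FakePort> {
public:
    bool alrt() const { return alrt_; }
    bool data() const { return data_; }
    void set_alrt() { alrt_ = true;  }
    void clr_alrt() { alrt_ = false; }
    void set_data() { data_ = true;  }
    void clr_data() { data_ = false; }
private:
    bool alrt_ = false;
    bool data_ = false;
};
```

<p>Because every call is resolved at compile time, the compiler is free to inline the pin accessors, which matters on small MCUs.</p>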
<p><a name="config_files"></a></p>
<h4>tri2b_config.hxx and quad4me_config.hxx</h4>
<p>Porting information</p>
<p>Describe hardware environment (GPIO ports, timers, etc.)</p>
<h4>mcu_config.hxx</h4>
<p>CPU and peripheral configuration (CPU speed, peripheral initialization)</p>
<h4>rank_id.hxx</h4>
<p>Least-recently-used reshuffling of arbitration priority-to-node ID mapping</p>
<p>See file for algorithm description.</p>
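<p>The actual algorithm is described in <code>rank_id.hxx</code>. The following is only a generic sketch of least-recently-used reshuffling, assuming rank 0 is the highest arbitration priority and that a node drops to the lowest rank after winning arbitration, so nodes that have not sent recently move up. <code>RankTable</code> and its methods are hypothetical.</p>

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical LRU rank table for N nodes: ranks_[0] holds the ID of
// the node with the highest arbitration priority.
template <std::size_t N>
class RankTable {
public:
    RankTable() {
        for (std::size_t i = 0; i < N; ++i) ranks_[i] = static_cast<int>(i);
    }

    int id_at(std::size_t rank) const { return ranks_[rank]; }

    // After `id` wins arbitration, move it to the lowest priority and
    // shift every node that was ranked below it up by one.
    void winner(int id) {
        std::size_t pos = 0;
        while (pos < N && ranks_[pos] != id) ++pos;
        assert(pos < N);  // id must be present in the table
        for (std::size_t i = pos; i + 1 < N; ++i) ranks_[i] = ranks_[i + 1];
        ranks_[N - 1] = id;
    }

private:
    std::array<int, N> ranks_{};
};
```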
<h4>bitops.hxx</h4>
<p>Convenience routines</p>
<p>Wrappers, replacements, and/or supplements to CPU bit instructions</p>
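<p>As an example of a “replacement” for a CPU bit instruction: the Cortex-M0+ (ARMv6-M) core in the LPC824 has no <code>CLZ</code> instruction, while the Cortex-M3 in the STM32F103 does. A sketch of a portable count-leading-zeros follows; <code>clz32()</code> is hypothetical and not necessarily how <code>bitops.hxx</code> does it. On GCC, <code>__builtin_clz()</code> is the usual route, but its result is undefined for a zero argument, which the sketch handles explicitly.</p>

```cpp
#include <cassert>
#include <cstdint>

// Count leading zero bits of a 32-bit value without a CLZ instruction.
// Binary search: if the top half (then quarter, ...) is empty, count
// those bits and shift the value left. Returns 32 for an input of 0.
inline int clz32(std::uint32_t x) {
    if (x == 0) return 32;
    int n = 0;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n +=  8; x <<=  8; }
    if ((x & 0xF0000000u) == 0) { n +=  4; x <<=  4; }
    if ((x & 0xC0000000u) == 0) { n +=  2; x <<=  2; }
    if ((x & 0x80000000u) == 0) { n +=  1; }
    return n;
}
```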
<h4>xorshift_random.hxx</h4>
<p>Pseudo-random number generator</p>
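<p>To illustrate how small this family of generators is, a sketch of Marsaglia’s classic 32-bit xorshift with the (13, 17, 5) shift triple. Whether <code>xorshift_random.hxx</code> uses this exact variant, word size, or shift triple is an assumption, and the <code>XorShift32</code> class name is hypothetical. The state must be seeded with a non-zero value, since zero maps to zero forever.</p>

```cpp
#include <cassert>
#include <cstdint>

// Marsaglia xorshift32 with the (13, 17, 5) shift triple.
// Cycles through all 2^32 - 1 non-zero states; 0 is a fixed point.
class XorShift32 {
public:
    explicit XorShift32(std::uint32_t seed) : state_(seed) {
        assert(seed != 0);  // a zero state would only ever produce 0
    }
    std::uint32_t next() {
        std::uint32_t x = state_;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        state_ = x;
        return x;
    }
private:
    std::uint32_t state_;
};
```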
<h4>lpc8xx.hxx and stm32f10_12357_xx.hxx</h4>
<p>Wrappers and convenience/usability extensions to NXP- and STM-provided include files</p>
<p>NXP’s <code>LPC8xx.h</code> defines peripheral memory map locations and register layouts, but no bitfields within the registers.</p>
<p>STM’s <code>stm32f103xb.h</code> does define bitfields, but some of them very poorly. For example, <code>GPIO_CRL_MODE1_0</code> and <code>GPIO_CRL_MODE1_1</code> are simply bits which client code must combine appropriately. The <code>stm32f10_12357_xx.hxx</code> wrapper declares meaningful symbolic constants such as <code>gpio::crl::CONF_OUTPUT_PUSH_PULL</code>, <code>gpio::crl::CONF_OUTPUT_OPEN_DRAIN</code>, etc.</p>
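<p>For context, each STM32F103 pin occupies a 4-bit field in <code>GPIOx_CRL</code> (pins 0 to 7) or <code>GPIOx_CRH</code> (pins 8 to 15): two <code>MODE</code> bits (00 = input; 01, 10, 11 = output at 10, 2, 50 MHz) below two <code>CNF</code> bits. A sketch of the kind of symbolic constants such a wrapper can provide follows; the <code>sketch</code> namespace, the constant names, the choice of 50 MHz, and the <code>set_pin_config()</code> helper are hypothetical, not the actual contents of <code>stm32f10_12357_xx.hxx</code>.</p>

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {  // hypothetical names, not the repository's

// Per-pin 4-bit field: CNF[1:0] in bits 3:2, MODE[1:0] in bits 1:0.
constexpr std::uint32_t MODE_INPUT        = 0x0;
constexpr std::uint32_t MODE_OUTPUT_50MHZ = 0x3;

constexpr std::uint32_t conf(std::uint32_t cnf, std::uint32_t mode) {
    return (cnf << 2) | mode;
}

constexpr std::uint32_t CONF_INPUT_FLOATING    = conf(0x1, MODE_INPUT);
constexpr std::uint32_t CONF_OUTPUT_PUSH_PULL  = conf(0x0, MODE_OUTPUT_50MHZ);
constexpr std::uint32_t CONF_OUTPUT_OPEN_DRAIN = conf(0x1, MODE_OUTPUT_50MHZ);

// Return `crl` with pin (0..7) set to `config`, other pins unchanged.
constexpr std::uint32_t set_pin_config(std::uint32_t crl, unsigned pin,
                                       std::uint32_t config) {
    return (crl & ~(0xFu << (4 * pin))) | (config << (4 * pin));
}

}  // namespace sketch
```

<p>The register’s reset value, <code>0x44444444</code>, corresponds to all eight pins configured as floating inputs, so <code>set_pin_config(0x44444444, 1, CONF_OUTPUT_PUSH_PULL)</code> yields <code>0x44444434</code>.</p>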
<h4>mcu.[ch]xx</h4>
<p>Architecture-specific MCU CPU and peripheral initialization, driven by <code>mcu_config.hxx</code> files</p>
<h4>mrt.[ch]xx, sct.[ch]xx, tim.[ch]xx, tim_m_s.[ch]xx</h4>
<p>Wrapper classes for hardware peripherals (timers)</p>
<p><code>TimMS</code> class implements two 16-bit timers, chained in series</p>
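<p>Reading a 32-bit count from two chained 16-bit counters has a classic race: the low (master) counter can overflow between reads of the two halves. A common fix is to read high, low, then high again, and re-read the low half if the high half changed. This sketch shows that generic technique with a hypothetical <code>FakeChainedTimers</code> simulation; it is not taken from <code>tim_m_s.cxx</code>, and the assumption that the master supplies the low half is mine.</p>

```cpp
#include <cassert>
#include <cstdint>

// Return a consistent 32-bit count from a 16-bit master (low half)
// whose overflows clock a 16-bit slave (high half).
template <typename Timers>
std::uint32_t read_chained(Timers& t) {
    std::uint16_t hi  = t.slave();
    std::uint16_t lo  = t.master();
    std::uint16_t hi2 = t.slave();
    if (hi2 != hi) {      // master wrapped between the two slave reads,
        lo = t.master();  // so re-read the low half to pair with hi2
        hi = hi2;
    }
    return (static_cast<std::uint32_t>(hi) << 16) | lo;
}

// Hypothetical host-side simulation: every master() read advances the
// underlying 32-bit tick count by one, so a read can straddle a wrap.
struct FakeChainedTimers {
    std::uint32_t ticks;
    std::uint16_t master() { return static_cast<std::uint16_t>(ticks++); }
    std::uint16_t slave() const { return static_cast<std::uint16_t>(ticks >> 16); }
};
```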
<h4>lpc824_ram.{init,ld} and stm32f103_ram.{init,ld}</h4>
<p>Simple/primitive MCU (pre-main) init scripts and linker memory map configuration files. See <a href="#rfiie_build_system_enhancements">RFIIE: Build system enhancements</a>.</p>
<h4>Makefile.base</h4>
<p>Highly idiosyncratic compilation recipes</p>
<h4>Makefile.triquad</h4>
<p>Settings. See <a href="#build_variants">Build variants</a>, below. Overridable by command-line arguments.</p>
<h4>Makefile.triquad_nxp and Makefile.triquad_stm</h4>
<p>Default architecture-specific settings. Overridable by command-line arguments.</p>
<h4>randomtest.gdb</h4>