The program v1model-special-ops.p4
demonstrates the use of resubmit,
recirculate, clone, and multicast replication operations in the BMv2
simple_switch's implementation of P4_16's v1model architecture. It
does not do anything fancy with these features, but at least it shows
how to distinguish whether a packet being processed in the ingress
control block is the result of a resubmit or recirculate operation,
vs. a new packet received from an ingress port. Similarly whether a
packet being processed in the egress control block is the result of a
clone operation.
See README-p414.md for a P4_14 program that exercises similar packet operations.
This program also demonstrates "debug tables". When you use the
--log-console
or --log-file
command line options to the
simple_switch
command, then whenever any tables are applied, the
log output shows:
- the values of all fields in the key of those tables, with the name of each field next to its value,
- whether there was a match found or not, and
- if there was a match found, the name of the action and the values of its action parameters (the names of the action parameters are not shown, but the action parameter values are given in the same order they appear in your P4 source code).
"Debug tables" are simply my name for a table that has NoAction
as
its only action, so there is never any reason to add entries to it,
they can never modify the packet or its metadata, and its only reason
for being in the program is to show in the log the values of the
fields in the key when the table is applied. They are effectively
"debug print" commands.
Note that such debug tables are likely to be useless when compiling to
a hardware target. Worse than useless, they might cause the program
to be larger or more complex in ways that it will not "fit" into the
target when the debug table(s) are present, even though the program
does fit when they are left out. If you find them useful for
developing P4 programs, consider surrounding them with C preprocessor
#ifdef
directives so that they can easily be included or left out
with a one line change (or perhaps a compiler command line option).
I tested this program with a few table entries and test packets
described below, and the resulting log output from simple_switch
for
several test packets are also described there.
I used these versions of the p4lang/behavioral-model and p4lang/p4c repositories in my testing:
- p4lang/behavioral-model - git commit 13370aaf9329fcb369a3ea3989722eb5f61c07f3 dated Aug 16 2018
- p4lang/p4c - git commit c534c585f8faba3e10af5776d5538c8a4374b8a6 dated Aug 31 2018
The program demonstrates passing a list of fields to the
recirculate()
, resubmit()
, and clone3()
primitive operations,
which causes the values of those fields to be preserved with the
resubmitted/recirculated/cloned packet.
WARNING: See the section "Caveat emptor" below for issues that may
cause programs you write using recirculate
, resubmit
, or clone3
to not preserve the metadata fields you specify.
See README-using-bmv2.md for some things that are common across different P4 programs executed using bmv2.
To compile the P4_16 version of the code (which is the only version):
p4c --target bmv2 --arch v1model v1model-special-ops.p4
^^^^^^^^^^^^^^^^^^^^^^ source code
Running that command will create these files:
v1model-special-ops.p4i - the output of running only the preprocessor on
the P4 source program.
v1model-special-ops.json - the JSON file format expected by BMv2
behavioral model `simple_switch`.
Only the file with the .json
suffix is needed to run your P4 program
using the simple_switch
command. You can ignore the file with
suffix .p4i
unless you suspect that the preprocessor is doing
something unexpected with your program.
The file README-p414.md gives instructions and details about a P4_14 program that is not the same in behavior to this P4_16 program, but does exercise similar operations on packets.
To run the behavioral model with 8 ports numbered 0 through 7:
sudo simple_switch --log-console -i 0@veth2 -i 1@veth4 -i 2@veth6 -i 3@veth8 -i 4@veth10 -i 5@veth12 -i 6@veth14 -i 7@veth16 v1model-special-ops.json
To get the log to go to a file instead of the console:
sudo simple_switch --log-file ss-log --log-flush -i 0@veth2 -i 1@veth4 -i 2@veth6 -i 3@veth8 -i 4@veth10 -i 5@veth12 -i 6@veth14 -i 7@veth16 v1model-special-ops.json
CHECK THIS: If you see "Add port operation failed" messages in the output of the simple_switch command, it means that one or more of the virtual Ethernet interfaces veth2, veth4, etc. have not been created on your system. Search for "veth" in the file README-using-bmv2.md for a command to create them.
To run CLI for controlling and examining simple_switch's table contents:
simple_switch_CLI
General syntax for table_add commands at simple_switch_CLI prompt:
RuntimeCmd: help table_add
Add entry to a match table: table_add <table name> <action name> <match fields> => <action parameters> [priority]
Summary of all table entries to create:
table_add ipv4_da_lpm do_resubmit 10.1.0.101/32 => 10.1.0.1
table_add ipv4_da_lpm set_l2ptr 10.1.0.201/32 => 0xcafe
table_add ipv4_da_lpm do_clone_i2e 10.3.0.55/32 => 0xd00d
table_add ipv4_da_lpm set_l2ptr 10.47.1.1/32 => 0xbeef
table_add ipv4_da_lpm set_mcast_grp 225.1.2.3/32 => 1113
table_add mac_da set_bd_dmac_intf 0xe50b => 9 02:13:57:0b:e5:ff 2
table_add mac_da set_bd_dmac_intf 0xcafe => 14 02:13:57:fe:ca:ff 3
table_add mac_da set_bd_dmac_intf 0xec1c => 9 02:13:57:1c:ec:ff 2
table_add mac_da set_bd_dmac_intf 0xd00d => 9 02:13:57:0d:d0:ff 1
table_add mac_da set_bd_dmac_intf 0xbeef => 26 02:13:57:ef:be:ff 0
table_add send_frame rewrite_mac 9 => 00:11:22:33:44:55
table_add send_frame do_recirculate 14 => 10.1.0.2
table_add send_frame do_clone_e2e 26 => 00:11:22:33:55:44
table_add send_frame rewrite_mac 10 => 00:11:22:33:0a:55
table_add send_frame rewrite_mac 11 => 00:11:22:33:0b:55
table_add send_frame rewrite_mac 12 => 00:11:22:33:0c:55
table_add get_multicast_copy_out_bd set_out_bd 1113 400 => 10
table_add get_multicast_copy_out_bd set_out_bd 1113 401 => 11
table_add get_multicast_copy_out_bd set_out_bd 1113 402 => 12
mirroring_add 5 4
mirroring_add 11 5
mc_mgrp_create 1113
mc_node_create 400 0
mc_node_create 401 1
mc_node_create 402 2
mc_node_associate 1113 0
mc_node_associate 1113 1
mc_node_associate 1113 2
Note: The control plane operation "mirroring_add 5 1" causes a packet that is cloned to clone session id 5 (aka "mirror session id") to be sent to output port 1.
Details of which input packets should match which sequence of table
entries are given below. All of this can be seen in the log file
output produced using the --log-console
option to simple_switch,
which I have saved in a separate file for each input packet.
A note on the log files:
Most lines have a time, and several other common parts to it, like this example output line:
[09:25:02.955] [bmv2] [D] [thread 4822] [1.0] [cxt 0] Cloning packet at ingress
Note the part [1.0]
. It means roughly "for original input packet
#1, this line contains info about copy #0 made of that packet". For
packets that are not cloned or multicast replicated, this label should
remain the same for all messages about the packet.
For cloned packets, after the clone operation occurs, you will see a mingling of some log lines with "[1.0]" and others with "[1.1]". The "[1.0]" lines are for the original packet, not the clone. The "[1.1]" lines are for the cloned packet. simple_switch apparently processes these two packets either via two parallel threads, or in some other similar fashion, that causes the log lines for these two packets to be intermingled with each other. Try to focus on only one set of lines if you want to follow the sequential thread of execution for one of those two copies.
A similar thing occurs when packets are multicast replicated. The number before the dot is the number of the input packet, and the number after the dot is the "copy number" for packets creatd from that original input packet.
Resubmitted packet, log in file: resub-pkt-log.txt
:
Summary: ingress, resubmit, ingress, egress, out
Distinctive log message in log file to look for: "Resubmitting packet"
resub_pkt=Ether() / IP(dst='10.1.0.101') / TCP(sport=5793, dport=80)
packet in: ipv4.dstAddr = 10.1.0.101
ingress
table_add ipv4_da_lpm do_resubmit 10.1.0.101/32 => 10.1.0.1
resubmit, packet same as before except instance_type
ingress
assign ipv4.srcAddr a vlaue of 10.252.129.2, l2ptr=RESUBMITTED_PKT_L2PTR=0xe50b
table_add mac_da set_bd_dmac_intf 0xe50b => 9 02:13:57:0b:e5:ff 2
egress
table_add send_frame rewrite_mac 9 => 00:11:22:33:44:55
packet out port 2
Recirculated packet, log in file: recirc-pkt-log.txt
:
Summary: ingress, egress, recirculate, ingress, egress, out
Distinctive log message in log file to look for: "Recirculating packet"
recirc_pkt=Ether() / IP(dst='10.1.0.201') / TCP(sport=5793, dport=80)
packet in: ipv4.dstAddr = 10.1.0.201
table_add ipv4_da_lpm set_l2ptr 10.1.0.201/32 => 0xcafe
table_add mac_da set_bd_dmac_intf 0xcafe => 14 02:13:57:fe:ca:ff 3
egress
table_add send_frame do_recirculate 14 => 10.1.0.2
assign ipv4.dstAddr=10.1.0.2
recirculate
ingress
assign ipv4.srcAddr a value of 10.199.86.99, l2ptr=RECIRCULATED_PKT_L2PTR=0xec1c
table_add mac_da set_bd_dmac_intf 0xec1c => 9 02:13:57:1c:ec:ff 2
egress
table_add send_frame rewrite_mac 9 => 00:11:22:33:44:55
packet out port 2
Packet cloned from ingress to egress, log in file: clone-i2e-pkt-log.txt
:
Summary: ingress, clone
original packet -> egress, out
cloned packet -> egress, out
Distinctive log message in log file to look for: "Cloning packet at ingress"
i2e_clone_pkt=Ether() / IP(dst='10.3.0.55') / TCP(sport=5793, dport=80)
packet in: ipv4.dstAddr = 10.3.0.55
ingress
table_add ipv4_da_lpm do_clone_i2e 10.3.0.55/32 => 0xd00d
clone3 to I2E_CLONE_SESSION_ID=5, configured to go out port 4 with this table entry:
mirroring_add 5 4
table_add mac_da set_bd_dmac_intf 0xd00d => 9 02:13:57:0d:d0:ff 1
original packet:
egress
table_add send_frame rewrite_mac 9 => 00:11:22:33:44:55
packet out port 1
cloned packet:
egress
execute code for clone_i2e case that adds switch_to_cpu header
packet out port 4
Packet cloned from egress to egress, log in file: clone-e2e-pkt-log.txt
:
Summary: ingress, egress, clone
original packet -> out
cloned packet -> egress, out
Distinctive log message in log file to look for: "Cloning packet at egress"
e2e_clone_pkt=Ether() / IP(dst='10.47.1.1') / TCP(sport=5793, dport=80)
packet in: ipv4.dstAddr = 10.47.1.1
ingress
table_add ipv4_da_lpm set_l2ptr 10.47.1.1/32 => 0xbeef
table_add mac_da set_bd_dmac_intf 0xbeef => 26 02:13:57:ef:be:ff 0
original packet:
egress
table_add send_frame do_clone_e2e 26 => 00:11:22:33:55:44
do clone3 action to E2E_CLONE_SESSION_ID=11, configured to go
out port 5 with this table entry:
mirroring_add 11 5
packet out port 0
cloned packet:
egress
execute code for clone_e2e case that adds switch_to_cpu header
packet out port 5
Packet using multicast replication, log in file: mcast-pkt-log.txt
:
Summary: ingress, multicast
Each of the 3 copies independently then does: egress, out
Distinctive log message in log file to look for: "Multicast requested for packet"
mcast_pkt=Ether() / IP(dst='225.1.2.3') / TCP(sport=5793, dport=80)
packet in: ipv4.dstAddr = 225.1.2.3
ingress
table_add ipv4_da_lpm set_mcast_grp 225.1.2.3/32 => 1113
at end of ingress, 3 copies made because of this configuration of mcast_grp 1113:
mc_mgrp_create 1113
mc_node_create 400 0
mc_node_create 401 1
mc_node_create 402 2
# Note: The 0, 1, and 2 below should be the "handles" created when
# the mc_node_create commands above were performed. If they were
# the only such commands performed, and they were done in that
# order, they should have been assigned handles 0, 1, and 2.
mc_node_associate 1113 0
mc_node_associate 1113 1
mc_node_associate 1113 2
copy with egress_port=0, egress_rid=400:
egress
table_add get_multicast_copy_out_bd set_out_bd 1113 400 => 10
table_add send_frame rewrite_mac 10 => 00:11:22:33:0a:55
packet out port 0
copy with egress_port=1, egress_rid=401:
egress
table_add get_multicast_copy_out_bd set_out_bd 1113 401 => 11
table_add send_frame rewrite_mac 11 => 00:11:22:33:0b:55
packet out port 1
copy with egress_port=2, egress_rid=402:
egress
table_add get_multicast_copy_out_bd set_out_bd 1113 402 => 12
table_add send_frame rewrite_mac 12 => 00:11:22:33:0c:55
packet out port 2
Any process that you want to have permission to send and receive
packets on Ethernet interfaces (such as the veth virtual interfaces)
must run as the super-user root, hence the use of sudo
:
sudo scapy
resub_pkt=Ether() / IP(dst='10.1.0.101') / TCP(sport=5793, dport=80)
recirc_pkt=Ether() / IP(dst='10.1.0.201') / TCP(sport=5793, dport=80)
i2e_clone_pkt=Ether() / IP(dst='10.3.0.55') / TCP(sport=5793, dport=80)
e2e_clone_pkt=Ether() / IP(dst='10.47.1.1') / TCP(sport=5793, dport=80)
mcast_pkt=Ether() / IP(dst='225.1.2.3') / TCP(sport=5793, dport=80)
# Send packet at layer2, specifying interface
sendp(resub_pkt, iface="veth6")
sendp(recirc_pkt, iface="veth6")
sendp(i2e_clone_pkt, iface="veth6")
sendp(e2e_clone_pkt, iface="veth6")
sendp(mcast_pkt, iface="veth6")
These notes may be specific to not only p4c and behavioral-model open source repository implementations, but perhaps even to the specific versions below that I have tested some of this with.
- p4lang/behavioral-model - git commit 13370aaf9329fcb369a3ea3989722eb5f61c07f3 dated Aug 16 2018
- p4lang/p4c - git commit c534c585f8faba3e10af5776d5538c8a4374b8a6 dated Aug 31 2018
This list of standard_metadata fields comes from that version of p4c, in the file: p4c/p4include/v1model.p4
Some of these "intrinsic" or "standard" metadata fields are different in P4_16 plus the Portable Switch Architecture (PSA). See the PSA specification for the metadata that PSA has and how its values are used.
Unless stated otherwise, assume that these values will be 0 for
recirculated or resubmitted packets at the beginning of ingress
processing, or for cloned packets at the beginning of egress
processing, unless you include the field explicitly in the list of
metadata fields whose values should be preserved, as a parameter to
the resubmit
, recirculate
, or clone3
primitive operation.
At the end of ingress processing, there are multiple of these fields
that are used to determine what happens to the packet next. You can
read the details in the method ingress_thread
of the
p4lang/behavioral-model source file
targets/simple_switch/simple_switch.cpp
,
but the pseudocode given in this simple_switch
documentation
gives a good description of the behavior, as well as similar
pseudocode for what happens to the packet next at the end of egress
processing.
The qid
field is in queueing_metadata
defined on that page:
qid
- This is in the simple_switch documentation aboutqueueing_metadata
, but is not inv1model.p4
. TBD: Should it be added to v1model.p4?
The fields below are not mentioned in the behavioral-model
simple_switch
documentation,
and there is a comment in the
v1model.p4
include file explaining that they are deprecated. Both were added to
the p4lang/p4c repository as part of the initial commit in Apr 2016,
so perhaps they are historical vestiges of some older ideas of how
v1model should work.
recirculate_port
- There is no mention of this field anywhere in the behavioral-model source code.drop
- This field appears to be unused in simple_switch. It is not mentioned in the source filesimple_switch.cpp
.
(Caveat emptor is Latin for, roughly: "Let the buyer beware!")
If you do all of these things at the same time:
- You use the open source tools
p4c
andbehavioral-model
as of January 2019 or earlier, and probably for a while longer until these issues are addressed (see below for where to find out up to date info). - You write a program in the P4_16 variant of P4, with the v1model architecture, or you write a program in the P4_14 variant of P4.
- You use any of the
recirculate
,resubmit
, or clone operations (calledclone3
in P4_16, orclone_ingress_pkt_to_egress
orclone_egress_pkt_to_egress
in P4_14), with a non-empty list of metadata fields whose values you want to preserve with the recirculated, resubmitted, or cloned packet.
Then you may find that the metadata field values are not preserved. The current implementation for preserving such metadata field values is very fragile, and effectively it sometimes works by accident, not by design.
When it does not work, the symptom is that one or more of the metadata
field values you expect to be preserved instead have their default
initial value when the post-recirculated/resubmitted/cloned packet is
next processed. The default initial value for such metadata fields is
most often 0 in simple_switch
.
Aside: It is possible to investigate the BMv2 JSON file created by the
p4c
compiler manually (it is just a specially formatted text file),
in particular the value of the "field_lists"
key, to see whether the
compiler has generated them in a way that will cause this undesired
behavior. One sure sign of a problem is that one or more of the
fields have constant values in the field list in that file, rather
than a named field. Another sign of a problem is if the compiler has
generated a temporary variable, copied the value of your field into
it, and used that temporary variable in the field list. All of these
problems can be checked for using the program
bmv2-json-check.py
.
The check is perhaps not 100% reliable for the last problem mentioned,
but seems to work.
Even when the metadata field values are preserved, they will be the values that those fields have when the current ingress or egress control block is finished executing. If any later code in that control block modifies the values of the metadata fields after the recirculate/resubmit/clone call, the modified value will be preserved. That is the behavior specified in the P4_14 language specification for these operations. For P4_16, it violates language semantics for calls to extern functions and methods, which should only have access to the values of parameters at the time the call occurs.
One or both of the issues linked below will likely be updated, or link to further issues or code changes, when these problems are resolved:
The first implementation of these operations was for the P4_14 variant of the language. In P4_14, the recirculate, resubmit, and clone operations take a "field list" as a parameter. A field list is a named list of metadata field names.
The behavior in simple_switch
is that when you invoke a recirculate,
resubmit, or clone operation, it does not immediately make a copy of
the value of those fields. Instead it remembers a numeric id that the
compiler generates to identify the named field list (see the
description of recirculate_flag
resubmit_flag
and clone_spec
standard metadata fields
here
if you want more details).
When the packet completes the current ingress or egress processing
control block, then at that time simple_switch
copies the values
of the metadata fields into some location associated with that packet.
When the resulting recirculated or resubmitted packet begins its next
ingress processing pass, or the resulting cloned packet begins its
next egress processing pass, those field values are restored from that
location and become the initial values of those metadata fields.
There are two aspects of this that cannot be represented in the P4_16 language: delayed effects, and references by name.
First, the time during packet processing when the values of the packet's metadata fields are copied and saved, can be much later in the sequential flow of program execution than the place where the recirculate, resubmit, or clone operation occurs. It is a delayed effect. The P4_16 language specification does not specify any mechanisms by which one can write a statement A in a program, and then after other statements B, C, D, etc. are executed, the effect of the statement A occurs. Certainly an optimizing P4_16 compiler can reorder the execution, but the effect must be the same as if the program were executed sequentially. A P4_14 recirculate operation can cause values to be saved for metadata fields that are assigned later in the P4_14 program.
Second, a field list gives names of fields, and those are used both for where to copy their current values from, and where to copy those saved values into, later. This is similar to "call by name" in other programming languages, and P4_16 provides no such mechanism. P4_16 defines copy in and copy out semantics for all existing methods of calling a control, function, or extern function or method. Even extern methods and functions are not allowed to keep references or pointers, only values copied from the caller.
It is definitely possible in a P4_16 program to use a technique
similar to the ssimple_switch
implementation, e.g. record a numeric
id in a well-known metadata field that identifies a list of fields one
wants to preserve for the packet, and then at some later part of the
program use the value of that metadata field to make copies of the
desired fields. Such a technique seems likely to become part of the
solution of these issues for P4_16 programs, perhaps written
explicitly by the P4 programmer, rather than auto-generated by the
compiler as occurs now for P4_14 programs.