Skip to content

Commit

Permalink
Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/k…
Browse files Browse the repository at this point in the history
…ernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Remove socket skb caches

   - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and
     avoid memory accounting overhead on each message sent

   - Introduce managed neighbor entries - added by control plane and
     resolved by the kernel for use in acceleration paths (BPF / XDP
     right now, HW offload users will benefit as well)

   - Make neighbor eviction on link down controllable by userspace to
     work around WiFi networks with bad roaming implementations

   - vrf: Rework interaction with netfilter/conntrack

   - fq_codel: implement L4S style ce_threshold_ect1 marking

   - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap()

  BPF:

   - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging
     as implemented in LLVM14

   - Introduce bpf_get_branch_snapshot() to capture Last Branch Records

   - Implement variadic trace_printk helper

   - Add a new Bloomfilter map type

   - Track <8-byte scalar spill and refill

   - Access hw timestamp through BPF's __sk_buff

   - Disallow unprivileged BPF by default

   - Document BPF licensing

  Netfilter:

   - Introduce egress hook for looking at raw outgoing packets

   - Allow matching on and modifying inner headers / payload data

   - Add NFT_META_IFTYPE to match on the interface type either from
     ingress or egress

  Protocols:

   - Multi-Path TCP:
      - increase default max additional subflows to 2
      - rework forward memory allocation
      - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS

   - MCTP flow support allowing lower layer drivers to configure msg
     muxing as needed

   - Automatic Multicast Tunneling (AMT) driver based on RFC7450

   - HSR support the redbox supervision frames (IEC-62439-3:2018)

   - Support for the ip6ip6 encapsulation of IOAM

   - Netlink interface for CAN-FD's Transmitter Delay Compensation

   - Support SMC-Rv2 eliminating the current same-subnet restriction, by
     exploiting the UDP encapsulation feature of RoCE adapters

   - TLS: add SM4 GCM/CCM crypto support

   - Bluetooth: initial support for link quality and audio/codec offload

  Driver APIs:

   - Add a batched interface for RX buffer allocation in AF_XDP buffer
     pool

   - ethtool: Add ability to control transceiver modules' power mode

   - phy: Introduce supported interfaces bitmap to express MAC
     capabilities and simplify PHY code

   - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks

  New drivers:

   - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89)

   - Ethernet driver for ASIX AX88796C SPI device (x88796c)

  Drivers:

   - Broadcom PHYs
      - support 72165, 7712 16nm PHYs
      - support IDDQ-SR for additional power savings

   - PHY support for QCA8081, QCA9561 PHYs

   - NXP DPAA2: support for IRQ coalescing

   - NXP Ethernet (enetc): support for software TCP segmentation

   - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of
     Gigabit-capable IP found on RZ/G2L SoC

   - Intel 100G Ethernet
      - support for eswitch offload of TC/OvS flow API, including
        offload of GRE, VxLAN, Geneve tunneling
      - support application device queues - ability to assign Rx and Tx
        queues to application threads
      - PTP and PPS (pulse-per-second) extensions

   - Broadcom Ethernet (bnxt)
      - devlink health reporting and device reload extensions

   - Mellanox Ethernet (mlx5)
      - offload macvlan interfaces
      - support HW offload of TC rules involving OVS internal ports
      - support HW-GRO and header/data split
      - support application device queues

   - Marvell OcteonTx2:
      - add XDP support for PF
      - add PTP support for VF

   - Qualcomm Ethernet switch (qca8k): support for QCA8328

   - Realtek Ethernet DSA switch (rtl8366rb)
      - support bridge offload
      - support STP, fast aging, disabling address learning
      - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch

   - Mellanox Ethernet/IB switch (mlxsw)
      - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping)
      - offload root TBF qdisc as port shaper
      - support multiple routing interface MAC address prefixes
      - support for IP-in-IP with IPv6 underlay

   - MediaTek WiFi (mt76)
      - mt7921 - ASPM, 6GHz, SDIO and testmode support
      - mt7915 - LED and TWT support

   - Qualcomm WiFi (ath11k)
      - include channel rx and tx time in survey dump statistics
      - support for 80P80 and 160 MHz bandwidths
      - support channel 2 in 6 GHz band
      - spectral scan support for QCN9074
      - support for rx decapsulation offload (data frames in 802.3
        format)

   - Qualcomm phone SoC WiFi (wcn36xx)
      - enable Idle Mode Power Save (IMPS) to reduce power consumption
        during idle

   - Bluetooth driver support for MediaTek MT7922 and MT7921

   - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and
     Realtek 8822C/8852A

   - Microsoft vNIC driver (mana)
      - support hibernation and kexec

   - Google vNIC driver (gve)
      - support for jumbo frames
      - implement Rx page reuse

  Refactor:

   - Make all writes to netdev->dev_addr go thru helpers, so that we can
     add this address to the address rbtree and handle the updates

   - Various TCP cleanups and optimizations including improvements to
     CPU cache use

   - Simplify the gnet_stats, Qdisc stats' handling and remove
     qdisc->running sequence counter

   - Driver changes and API updates to address devlink locking
     deficiencies"

* tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits)
  Revert "net: avoid double accounting for pure zerocopy skbs"
  selftests: net: add arp_ndisc_evict_nocarrier
  net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter
  net: arp: introduce arp_evict_nocarrier sysctl parameter
  libbpf: Deprecate AF_XDP support
  kbuild: Unify options for BTF generation for vmlinux and modules
  selftests/bpf: Add a testcase for 64-bit bounds propagation issue.
  bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit.
  bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off.
  net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c
  net: avoid double accounting for pure zerocopy skbs
  tcp: rename sk_wmem_free_skb
  netdevsim: fix uninit value in nsim_drv_configure_vfs()
  selftests/bpf: Fix also no-alu32 strobemeta selftest
  bpf: Add missing map_delete_elem method to bloom filter map
  selftests/bpf: Add bloom map success test for userspace calls
  bpf: Add alignment padding for "map_extra" + consolidate holes
  bpf: Bloom filter map naming fixups
  selftests/bpf: Add test cases for struct_ops prog
  bpf: Add dummy BPF STRUCT_OPS for test purpose
  ...
  • Loading branch information
torvalds committed Nov 2, 2021
2 parents bfc484f + 84882cf commit fc02cb2
Show file tree
Hide file tree
Showing 2,296 changed files with 215,137 additions and 50,034 deletions.
174 changes: 174 additions & 0 deletions Documentation/ABI/testing/sysfs-timecard
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
What: /sys/class/timecard/
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This directory contains files and directories
providing a standardized interface to the ancillary
features of the OpenCompute timecard.

What: /sys/class/timecard/ocpN/
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This directory contains the attributes of the Nth timecard
registered.

What: /sys/class/timecard/ocpN/available_clock_sources
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RO) The list of available time sources that the PHC
uses for clock adjustments.

==== =================================================
NONE no adjustments
PPS adjustments come from the PPS1 selector (default)
TOD adjustments from the GNSS/TOD module
IRIG adjustments from external IRIG-B signal
DCF adjustments from external DCF signal
==== =================================================

What: /sys/class/timecard/ocpN/available_sma_inputs
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RO) Set of available destinations (sinks) for a SMA
input signal.

===== ================================================
10Mhz signal is used as the 10Mhz reference clock
PPS1 signal is sent to the PPS1 selector
PPS2 signal is sent to the PPS2 selector
TS1 signal is sent to timestamper 1
TS2 signal is sent to timestamper 2
IRIG signal is sent to the IRIG-B module
DCF signal is sent to the DCF module
===== ================================================

What: /sys/class/timecard/ocpN/available_sma_outputs
Date: May 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RO) Set of available sources for a SMA output signal.

===== ================================================
10Mhz output is from the 10Mhz reference clock
PHC output PPS is from the PHC clock
MAC output PPS is from the Miniature Atomic Clock
GNSS output PPS is from the GNSS module
GNSS2 output PPS is from the second GNSS module
IRIG output is from the PHC, in IRIG-B format
DCF output is from the PHC, in DCF format
===== ================================================

What: /sys/class/timecard/ocpN/clock_source
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RW) Contains the current synchronization source used by
the PHC. May be changed by writing one of the listed
values from the available_clock_sources attribute set.

What: /sys/class/timecard/ocpN/gnss_sync
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RO) Indicates whether a valid GNSS signal is received,
or when the signal was lost.

What: /sys/class/timecard/ocpN/i2c
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This optional attribute links to the associated i2c device.

What: /sys/class/timecard/ocpN/irig_b_mode
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RW) An integer from 0-7 indicating the timecode format
of the IRIG-B output signal: B00<n>

What: /sys/class/timecard/ocpN/pps
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This optional attribute links to the associated PPS device.

What: /sys/class/timecard/ocpN/ptp
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This attribute links to the associated PTP device.

What: /sys/class/timecard/ocpN/serialnum
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RO) Provides the serial number of the timecard.

What: /sys/class/timecard/ocpN/sma1
What: /sys/class/timecard/ocpN/sma2
What: /sys/class/timecard/ocpN/sma3
What: /sys/class/timecard/ocpN/sma4
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RW) These attributes specify the direction of the signal
on the associated SMA connectors, and also the signal sink
or source.

The display format of the attribute is a space separated
list of signals, prefixed by the input/output direction.

The signal direction may be changed (if supported) by
prefixing the signal list with either "in:" or "out:".
If neither prefix is present, then the direction is unchanged.

The output signal may be changed by writing one of the listed
values from the available_sma_outputs attribute set.

The input destinations may be changed by writing multiple
values from the available_sma_inputs attribute set,
separated by spaces. If there are duplicated input
destinations between connectors, the lowest numbered SMA
connector is given priority.

Note that not all input combinations may make sense.

The 10Mhz reference clock input is currently only valid
on SMA1 and may not be combined with other destination sinks.

What: /sys/class/timecard/ocpN/ts_window_adjust
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RW) When retrieving the PHC with the PTP SYS_OFFSET_EXTENDED
ioctl, a system timestamp is made before and after the PHC
time is retrieved. The midpoint between the two system
timestamps is usually taken to be the SYS time associated
with the PHC time. This estimate may be wrong, as it depends
on PCI latencies, and when the PHC time was latched

The attribute value reduces the end timestamp by the given
number of nanoseconds, so the computed midpoint matches the
retrieved PHC time.

The initial value is set based on measured PCI latency and
the estimated point where the FPGA latches the PHC time. This
value may be changed by writing an unsigned integer.

What: /sys/class/timecard/ocpN/ttyGNSS
What: /sys/class/timecard/ocpN/ttyGNSS2
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: These optional attributes link to the TTY serial ports
associated with the GNSS devices.

What: /sys/class/timecard/ocpN/ttyMAC
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This optional attribute links to the TTY serial port
associated with the Miniature Atomic Clock.

What: /sys/class/timecard/ocpN/ttyNMEA
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: This optional attribute links to the TTY serial port
which outputs the PHC time in NMEA ZDA format.

What: /sys/class/timecard/ocpN/utc_tai_offset
Date: September 2021
Contact: Jonathan Lemon <[email protected]>
Description: (RW) The DCF and IRIG output signals are in UTC, while the
TimeCard operates on TAI. This attribute allows setting the
offset in seconds, which is added to the TAI timebase for
these formats.

The offset may be changed by writing an unsigned integer.
92 changes: 92 additions & 0 deletions Documentation/bpf/bpf_licensing.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
=============
BPF licensing
=============

Background
==========

* Classic BPF was BSD licensed

"BPF" was originally introduced as BSD Packet Filter in
http://www.tcpdump.org/papers/bpf-usenix93.pdf. The corresponding instruction
set and its implementation came from BSD with BSD license. That original
instruction set is now known as "classic BPF".

However an instruction set is a specification for machine-language interaction,
similar to a programming language. It is not a code. Therefore, the
application of a BSD license may be misleading in a certain context, as the
instruction set may enjoy no copyright protection.

* eBPF (extended BPF) instruction set continues to be BSD

In 2014, the classic BPF instruction set was significantly extended. We
typically refer to this instruction set as eBPF to disambiguate it from cBPF.
The eBPF instruction set is still BSD licensed.

Implementations of eBPF
=======================

Using the eBPF instruction set requires implementing code in both kernel space
and user space.

In Linux Kernel
---------------

The reference implementations of the eBPF interpreter and various just-in-time
compilers are part of Linux and are GPLv2 licensed. The implementation of
eBPF helper functions is also GPLv2 licensed. Interpreters, JITs, helpers,
and verifiers are called eBPF runtime.

In User Space
-------------

There are also implementations of eBPF runtime (interpreter, JITs, helper
functions) under
Apache2 (https://github.com/iovisor/ubpf),
MIT (https://github.com/qmonnet/rbpf), and
BSD (https://github.com/DPDK/dpdk/blob/main/lib/librte_bpf).

In HW
-----

The HW can choose to execute eBPF instruction natively and provide eBPF runtime
in HW or via the use of implementing firmware with a proprietary license.

In other operating systems
--------------------------

Other kernels or user space implementations of eBPF instruction set and runtime
can have proprietary licenses.

Using BPF programs in the Linux kernel
======================================

Linux Kernel (while being GPLv2) allows linking of proprietary kernel modules
under these rules:
Documentation/process/license-rules.rst

When a kernel module is loaded, the linux kernel checks which functions it
intends to use. If any function is marked as "GPL only," the corresponding
module or program has to have GPL compatible license.

Loading BPF program into the Linux kernel is similar to loading a kernel
module. BPF is loaded at run time and not statically linked to the Linux
kernel. BPF program loading follows the same license checking rules as kernel
modules. BPF programs can be proprietary if they don't use "GPL only" BPF
helper functions.

Further, some BPF program types - Linux Security Modules (LSM) and TCP
Congestion Control (struct_ops), as of Aug 2021 - are required to be GPL
compatible even if they don't use "GPL only" helper functions directly. The
registration step of LSM and TCP congestion control modules of the Linux
kernel is done through EXPORT_SYMBOL_GPL kernel functions. In that sense LSM
and struct_ops BPF programs are implicitly calling "GPL only" functions.
The same restriction applies to BPF programs that call kernel functions
directly via unstable interface also known as "kfunc".

Packaging BPF programs with user space applications
====================================================

Generally, proprietary-licensed applications and GPL licensed BPF programs
written for the Linux kernel in the same package can co-exist because they are
separate executable processes. This applies to both cBPF and eBPF programs.
29 changes: 28 additions & 1 deletion Documentation/bpf/btf.rst
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ sequentially and type id is assigned to each recognized type starting from id
#define BTF_KIND_VAR 14 /* Variable */
#define BTF_KIND_DATASEC 15 /* Section */
#define BTF_KIND_FLOAT 16 /* Floating point */
#define BTF_KIND_DECL_TAG 17 /* Decl Tag */

Note that the type section encodes debug info, not just pure types.
``BTF_KIND_FUNC`` is not a type, and it represents a defined subprogram.
Expand All @@ -106,7 +107,7 @@ Each type contains the following common data::
* "size" tells the size of the type it is describing.
*
* "type" is used by PTR, TYPEDEF, VOLATILE, CONST, RESTRICT,
* FUNC and FUNC_PROTO.
* FUNC, FUNC_PROTO and DECL_TAG.
* "type" is a type_id referring to another type.
*/
union {
Expand Down Expand Up @@ -465,6 +466,32 @@ map definition.

No additional type data follow ``btf_type``.

2.2.17 BTF_KIND_DECL_TAG
~~~~~~~~~~~~~~~~~~~~~~~~

``struct btf_type`` encoding requirement:
* ``name_off``: offset to a non-empty string
* ``info.kind_flag``: 0
* ``info.kind``: BTF_KIND_DECL_TAG
* ``info.vlen``: 0
* ``type``: ``struct``, ``union``, ``func``, ``var`` or ``typedef``

``btf_type`` is followed by ``struct btf_decl_tag``.::

struct btf_decl_tag {
__u32 component_idx;
};

The ``name_off`` encodes btf_decl_tag attribute string.
The ``type`` should be ``struct``, ``union``, ``func``, ``var`` or ``typedef``.
For ``var`` or ``typedef`` type, ``btf_decl_tag.component_idx`` must be ``-1``.
For the other three types, if the btf_decl_tag attribute is
applied to the ``struct``, ``union`` or ``func`` itself,
``btf_decl_tag.component_idx`` must be ``-1``. Otherwise,
the attribute is applied to a ``struct``/``union`` member or
a ``func`` argument, and ``btf_decl_tag.component_idx`` should be a
valid index (starting from 0) pointing to a member or an argument.

3. BTF Kernel API
*****************

Expand Down
9 changes: 9 additions & 0 deletions Documentation/bpf/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -82,6 +82,15 @@ Testing and debugging BPF
s390


Licensing
=========

.. toctree::
:maxdepth: 1

bpf_licensing


Other
=====

Expand Down
40 changes: 40 additions & 0 deletions Documentation/bpf/libbpf/libbpf_naming_convention.rst
Original file line number Diff line number Diff line change
Expand Up @@ -150,6 +150,46 @@ mirror of the mainline's version of libbpf for a stand-alone build.
However, all changes to libbpf's code base must be upstreamed through
the mainline kernel tree.


API documentation convention
============================

The libbpf API is documented via comments above definitions in
header files. These comments can be rendered by doxygen and sphinx
for well organized html output. This section describes the
convention in which these comments should be formated.

Here is an example from btf.h:

.. code-block:: c
/**
* @brief **btf__new()** creates a new instance of a BTF object from the raw
* bytes of an ELF's BTF section
* @param data raw bytes
* @param size number of bytes passed in `data`
* @return new BTF object instance which has to be eventually freed with
* **btf__free()**
*
* On error, error-code-encoded-as-pointer is returned, not a NULL. To extract
* error code from such a pointer `libbpf_get_error()` should be used. If
* `libbpf_set_strict_mode(LIBBPF_STRICT_CLEAN_PTRS)` is enabled, NULL is
* returned on error instead. In both cases thread-local `errno` variable is
* always set to error code as well.
*/
The comment must start with a block comment of the form '/\*\*'.

The documentation always starts with a @brief directive. This line is a short
description about this API. It starts with the name of the API, denoted in bold
like so: **api_name**. Please include an open and close parenthesis if this is a
function. Follow with the short description of the API. A longer form description
can be added below the last directive, at the bottom of the comment.

Parameters are denoted with the @param directive, there should be one for each
parameter. If this is a function with a non-void return, use the @return directive
to document it.

License
-------------------

Expand Down
Loading

0 comments on commit fc02cb2

Please sign in to comment.