Distributed Switch Architecture
===============================

Introduction
============

This document describes the Distributed Switch Architecture (DSA) subsystem
design principles, limitations, interactions with other subsystems, and how to
develop drivers for this subsystem as well as a TODO for developers interested
in joining the effort.

Design principles
=================

The Distributed Switch Architecture is a subsystem which was primarily designed
to support Marvell Ethernet switches (MV88E6xxx, a.k.a. the Linkstreet product
line) using Linux, but has since evolved to support other vendors as well.

The original philosophy behind this design was to be able to use unmodified
Linux tools such as bridge, iproute2, ifconfig to work transparently whether
they configured/queried a switch port network device or a regular network
device.

An Ethernet switch typically comprises multiple front-panel ports, and one
or more CPU or management ports. The DSA subsystem currently relies on the
presence of a management port connected to an Ethernet controller capable of
receiving Ethernet frames from the switch. This is a very common setup for all
kinds of Ethernet switches found in small office and home office products:
routers, gateways, or even top-of-rack switches. This host Ethernet controller
will be later referred to as "master" and "cpu" in DSA terminology and code.

The D in DSA stands for Distributed, because the subsystem has been designed
with the ability to configure and manage cascaded switches on top of each other
using upstream and downstream Ethernet links between switches. These specific
ports are referred to as "dsa" ports in DSA terminology and code. A collection
of multiple switches connected to each other is called a "switch tree".

For each front-panel port, DSA will create specialized network devices which are
used as controlling and data-flowing endpoints for use by the Linux networking
stack. These specialized network interfaces are referred to as "slave" network
interfaces in DSA terminology and code.

The ideal case for using DSA is when an Ethernet switch supports a "switch tag",
a hardware feature which makes the switch insert a specific tag into each
Ethernet frame it receives from or sends to specific ports, to help the
management interface figure out:

- what port is this frame coming from
- what was the reason why this frame got forwarded
- how to send CPU originated traffic to specific ports

The subsystem does support switches not capable of inserting/stripping tags, but
the features might be slightly limited in that case (traffic separation relies
on Port-based VLAN IDs).

Note that DSA does not currently create network interfaces for the "cpu" and
"dsa" ports because:

- the "cpu" port is the Ethernet switch facing side of the management
  controller, and as such, would create a duplication of feature, since you
  would get two interfaces for the same conduit: master netdev, and "cpu" netdev

- the "dsa" port(s) are just conduits between two or more switches, and as such
  cannot really be used as proper network interfaces either, only the
  downstream, or the top-most upstream interface makes sense with that model

Switch tagging protocols
------------------------

DSA currently supports 4 different tagging protocols, and a tag-less mode as
well. The different protocols are implemented in:

net/dsa/tag_trailer.c: Marvell's 4-byte trailer tag mode (legacy)
net/dsa/tag_dsa.c: Marvell's original DSA tag
net/dsa/tag_edsa.c: Marvell's enhanced DSA tag
net/dsa/tag_brcm.c: Broadcom's 4-byte tag

The exact format of the tag protocol is vendor specific, but in general, they
all contain something which:

- identifies which port the Ethernet frame came from/should be sent to
- provides a reason why this frame was forwarded to the management interface

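As an illustration of the two properties listed above, here is a minimal,
self-contained sketch of encoding and decoding a switch tag. The 4-byte layout
(reason code in byte 0, port number in the low 5 bits of byte 3) and every
ex_* name are assumptions made up for this example; they do not describe any
vendor's actual tag format or any function in net/dsa/.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag layout, NOT a real vendor format:
 * byte 0 carries a reason code, byte 3 carries the port in its low 5 bits. */
#define EX_TAG_LEN   4
#define EX_PORT_MASK 0x1f

struct ex_tag_info {
	uint8_t port;    /* port the frame came from */
	uint8_t reason;  /* why the switch forwarded it to the CPU */
};

/* CPU -> switch: build a tag that steers the frame to @port. */
static void ex_tag_encode(uint8_t tag[EX_TAG_LEN], uint8_t port)
{
	tag[0] = 0;
	tag[1] = 0;
	tag[2] = 0;
	tag[3] = port & EX_PORT_MASK;
}

/* switch -> CPU: recover the source port and the forwarding reason.
 * Returns 0 on success, -1 if the buffer is too short to hold a tag. */
static int ex_tag_decode(const uint8_t *tag, size_t len,
			 struct ex_tag_info *info)
{
	if (len < EX_TAG_LEN)
		return -1;
	info->reason = tag[0];
	info->port = tag[3] & EX_PORT_MASK;
	return 0;
}
```

A real tagger additionally has to insert the tag into, or strip it from, the
frame itself (before the EtherType or as a trailer, depending on the protocol).
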
Master network devices
----------------------

Master network devices are regular, unmodified Linux network device drivers for
the CPU/management Ethernet interface. Such a driver might occasionally need to
know whether DSA is enabled (e.g.: to enable/disable specific offload features),
but the DSA subsystem has been proven to work with industry standard drivers:
e1000e, mv643xx_eth etc. without having to introduce modifications to these
drivers. Such network devices are also often referred to as conduit network
devices since they act as a pipe between the host processor and the hardware
Ethernet switch.

Networking stack hooks
----------------------

When a master netdev is used with DSA, a small hook is placed in the
networking stack in order to have the DSA subsystem process the Ethernet
switch specific tagging protocol. DSA accomplishes this by registering a
specific (and fake) Ethernet type (later becoming skb->protocol) with the
networking stack; this is also known as a ptype or packet_type. A typical
Ethernet frame receive sequence looks like this:

Master network device (e.g.: e1000e):

Receive interrupt fires:
- receive function is invoked
- basic packet processing is done: getting length, status etc.
- packet is prepared to be processed by the Ethernet layer by calling
  eth_type_trans

net/ethernet/eth.c:

eth_type_trans(skb, dev)
	if (dev->dsa_ptr != NULL)
		-> skb->protocol = ETH_P_XDSA

drivers/net/ethernet/*:

netif_receive_skb(skb)
	-> iterate over registered packet_type
		-> invoke handler for ETH_P_XDSA, calls dsa_switch_rcv()

net/dsa/dsa.c:
	-> dsa_switch_rcv()
		-> invoke switch tag specific protocol handler in
		   net/dsa/tag_*.c

net/dsa/tag_*.c:
	-> inspect and strip switch tag protocol to determine originating port
	-> locate per-port network device
	-> invoke eth_type_trans() with the DSA slave network device
	-> invoke netif_receive_skb()

Past this point, the DSA slave network devices get delivered regular Ethernet
frames that can be processed by the networking stack.

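The receive sequence above can be modelled in a few lines of standalone C. This
is only a sketch of the control flow: the ex_* structures and functions are
hypothetical stand-ins for struct net_device, struct sk_buff and the real
handlers, and the src_port field stands in for the result of parsing the
switch tag.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_XDSA 0x00F8  /* fake EtherType used to multiplex DSA frames */
#define ETH_P_IP   0x0800

struct ex_netdev {
	const char *name;
	void *dsa_ptr;            /* non-NULL when the device is a DSA master */
};

struct ex_skb {
	struct ex_netdev *dev;
	uint16_t protocol;
	uint16_t wire_ethertype;  /* EtherType found in the frame itself */
	uint8_t src_port;         /* stands in for the parsed switch tag */
};

/* Model of the hook: frames received on a DSA master are labelled
 * ETH_P_XDSA instead of their on-wire EtherType. */
static uint16_t ex_eth_type_trans(struct ex_skb *skb, struct ex_netdev *dev)
{
	skb->dev = dev;
	if (dev->dsa_ptr != NULL)
		return ETH_P_XDSA;
	return skb->wire_ethertype;
}

/* Model of the tag handler: locate the per-port slave device, then
 * classify the frame again on behalf of that device. */
static int ex_dsa_rcv(struct ex_skb *skb, struct ex_netdev *slaves,
		      size_t nslaves)
{
	if (skb->protocol != ETH_P_XDSA || skb->src_port >= nslaves)
		return -1;
	skb->protocol = ex_eth_type_trans(skb, &slaves[skb->src_port]);
	return 0;
}
```

Because the slave devices have no dsa_ptr, the second classification yields the
frame's real EtherType, which is why the stack then sees a regular frame.
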
Slave network devices
---------------------

Slave network devices created by DSA are stacked on top of their master network
device. Each of these network interfaces is responsible for being a
controlling and data-flowing end-point for one front-panel port of the switch.
These interfaces are specialized in order to:

- insert/remove the switch tag protocol (if it exists) when sending traffic
  to/from specific switch ports
- query the switch for ethtool operations: statistics, link state,
  Wake-on-LAN, register dumps...
- external/internal PHY management: link, auto-negotiation etc.

These slave network devices have custom net_device_ops and ethtool_ops function
pointers which allow DSA to introduce a level of layering between the networking
stack/ethtool, and the switch driver implementation.

Upon frame transmission from these slave network devices, DSA will look up which
switch tagging protocol is currently registered with these network devices, and
invoke a specific transmit routine which takes care of adding the relevant
switch tag in the Ethernet frames.

These frames are then queued for transmission using the master network device
ndo_start_xmit() function. Since they contain the appropriate switch tag, the
Ethernet switch will be able to process these incoming frames from the
management interface and deliver them to the physical switch port.

Graphical representation
------------------------

Summarized, this is basically what DSA looks like from a network device
perspective:


            |---------------------------
            | CPU network device (eth0)|
            ----------------------------
            | <tag added by switch     |
            |                          |
            |                          |
            |        tag added by CPU> |
    |--------------------------------------------|
    | Switch driver                              |
    |--------------------------------------------|
        ||             ||             ||
    |-------|      |-------|      |-------|
    | sw0p0 |      | sw0p1 |      | sw0p2 |
    |-------|      |-------|      |-------|

Slave MDIO bus
--------------

In order to be able to read from/write to a PHY built into the switch, DSA
creates a slave MDIO bus which allows a specific switch driver to divert and
intercept MDIO reads/writes towards specific PHY addresses. In most
MDIO-connected switches, these functions would utilize direct or indirect PHY
addressing mode to return standard MII registers from the switch builtin PHYs,
allowing the PHY library to return link status, link partner pages,
auto-negotiation results etc.

For Ethernet switches which have both external and internal MDIO busses, the
slave MII bus can be utilized to mux/demux MDIO reads and writes towards either
internal or external MDIO devices this switch might be connected to: internal
PHYs, external PHYs, or even external switches.

Data structures
---------------

DSA data structures are defined in include/net/dsa.h as well as
net/dsa/dsa_priv.h.

dsa_chip_data: platform data configuration for a given switch device, this
structure describes a switch device's parent device, its address, as well as
various properties of its ports: names/labels, and finally a routing table
indication (when cascading switches)

dsa_platform_data: platform device configuration data which can reference a
collection of dsa_chip_data structures if multiple switches are cascaded, the
master network device this switch tree is attached to needs to be referenced

dsa_switch_tree: structure assigned to the master network device under
"dsa_ptr", this structure references a dsa_platform_data structure as well as
the tagging protocol supported by the switch tree, and which receive/transmit
function hooks should be invoked, information about the directly attached switch
is also provided: CPU port. Finally, a collection of dsa_switch are referenced
to address individual switches in the tree.

dsa_switch: structure describing a switch device in the tree, referencing a
dsa_switch_tree as a backpointer, slave network devices, master network device,
and a reference to the backing dsa_switch_driver

dsa_switch_driver: structure referencing function pointers, see below for a full
description.

Design limitations
==================

DSA is a platform device driver
-------------------------------

DSA is implemented as a DSA platform device driver, which is convenient because
it will register the entire DSA switch tree attached to a master network device
in one shot, facilitating the device creation and simplifying the device driver
model a bit. This comes however with a number of limitations:

- building DSA and its switch drivers as modules is currently not working
- the device driver parenting does not necessarily reflect the original
  bus/device the switch can be created from
- supporting non-MDIO and non-MMIO (platform) switches is not possible

Limits on the number of devices and ports
-----------------------------------------

DSA currently limits the maximum number of switches within a tree to 4
(DSA_MAX_SWITCHES), and the number of ports per switch to 12 (DSA_MAX_PORTS).
These limits could be extended to support larger configurations should the need
arise.

Lack of CPU/DSA network devices
-------------------------------

DSA does not currently create slave network devices for the CPU or DSA ports, as
described before. This might be an issue in the following cases:

- inability to fetch switch CPU port statistics counters using ethtool, which
  can make it harder to debug MDIO switches connected using xMII interfaces

- inability to configure the CPU port link parameters based on the Ethernet
  controller capabilities attached to it: http://patchwork.ozlabs.org/patch/509806/

- inability to configure specific VLAN IDs / trunking VLANs between switches
  when using a cascaded setup

Common pitfalls using DSA setups
--------------------------------

Once a master network device is configured to use DSA (dev->dsa_ptr becomes
non-NULL), and the switch behind it expects a tagging protocol, this network
interface can only be used as a conduit interface. Sending packets directly
through this interface (e.g.: opening a socket using this interface) will not
go through the switch tagging protocol transmit function, so the Ethernet
switch on the other end, which expects a tag, will typically drop the frame.

Slave network devices check that the master network device is UP before allowing
you to administratively bring UP these slave network devices. A common
configuration mistake is forgetting to bring UP the master network device first.

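The second pitfall amounts to a simple ordering check when a slave device is
opened. The following standalone sketch models it; struct ex_netdev,
EX_IFF_UP and ex_slave_open() are hypothetical names for this example, and the
choice of -ENETDOWN as the error code is an assumption rather than something
this document specifies.

```c
#include <errno.h>
#include <stddef.h>

#define EX_IFF_UP 0x1  /* illustrative "administratively up" flag */

struct ex_netdev {
	unsigned int flags;
	struct ex_netdev *master;  /* NULL for the master itself */
};

/* Refuse to bring a slave up while its master is still down. */
static int ex_slave_open(struct ex_netdev *slave)
{
	if (!(slave->master->flags & EX_IFF_UP))
		return -ENETDOWN;  /* assumed error code */
	slave->flags |= EX_IFF_UP;
	return 0;
}
```

In practice this means bringing up the master interface (eth0 in the diagram
above) before any of the sw0pN interfaces.
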
Interactions with other subsystems
==================================

DSA currently leverages the following subsystems:

- MDIO/PHY library: drivers/net/phy/phy.c, mdio_bus.c
- Switchdev: net/switchdev/*
- Device Tree for various of_* functions
- HWMON: drivers/hwmon/*

MDIO/PHY library
----------------

Slave network devices exposed by DSA may or may not be interfacing with PHY
devices (struct phy_device as defined in include/linux/phy.h), but the DSA
subsystem deals with all possible combinations:

- internal PHY devices, built into the Ethernet switch hardware
- external PHY devices, connected via an internal or external MDIO bus
- internal PHY devices, connected via an internal MDIO bus
- special, non-autonegotiated or non MDIO-managed PHY devices: SFPs, MoCA; a.k.a.
  fixed PHYs

The PHY configuration is done by the dsa_slave_phy_setup() function and the
logic basically looks like this:

- if Device Tree is used, the PHY device is looked up using the standard
  "phy-handle" property, if found, this PHY device is created and registered
  using of_phy_connect()

- if Device Tree is used, and the PHY device is "fixed", that is, conforms to
  the definition of a non-MDIO managed PHY as defined in
  Documentation/devicetree/bindings/net/fixed-link.txt, the PHY is registered
  and connected transparently using the special fixed MDIO bus driver

- finally, if the PHY is built into the switch, as is very common with
  standalone switch packages, the PHY is probed using the slave MII bus created
  by DSA


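The three cases above form a fallback chain, which the following standalone
sketch expresses as a decision function. It simply follows the order of the
list; the order of the checks inside the real dsa_slave_phy_setup() may
differ, and all ex_* names are hypothetical.

```c
#include <stdbool.h>

enum ex_phy_source {
	EX_PHY_FROM_HANDLE,    /* Device Tree "phy-handle" property */
	EX_PHY_FIXED_LINK,     /* Device Tree fixed link, no MDIO management */
	EX_PHY_SLAVE_MII_BUS,  /* built-in PHY probed on the DSA slave MII bus */
};

/* Hypothetical summary of what is known about one switch port. */
struct ex_port_desc {
	bool uses_device_tree;
	bool has_phy_handle;
	bool has_fixed_link;
};

/* Pick where the PHY for this port comes from, in list order. */
static enum ex_phy_source ex_pick_phy_source(const struct ex_port_desc *d)
{
	if (d->uses_device_tree && d->has_phy_handle)
		return EX_PHY_FROM_HANDLE;
	if (d->uses_device_tree && d->has_fixed_link)
		return EX_PHY_FIXED_LINK;
	return EX_PHY_SLAVE_MII_BUS;
}
```
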
SWITCHDEV
---------

DSA directly utilizes SWITCHDEV when interfacing with the bridge layer, and
more specifically with its VLAN filtering portion when configuring VLANs on top
of per-port slave network devices. Since DSA primarily deals with
MDIO-connected switches, although not exclusively, SWITCHDEV's
prepare/abort/commit phases are often simplified into a prepare phase which
checks whether the operation is supported by the DSA switch driver, and a commit
phase which applies the changes.

As of today, the only SWITCHDEV objects supported by DSA are the FDB and VLAN
objects.

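The simplified prepare/commit split described above can be sketched as follows.
This is a standalone model, not SWITCHDEV code: struct ex_switch and the
ex_vlan_* functions are hypothetical, and the VLAN ID range check is an extra
validation added for the example.

```c
#include <errno.h>
#include <stdbool.h>

#define EX_MAX_VID 4094

/* Hypothetical switch state: capability flag plus VLAN membership. */
struct ex_switch {
	bool vlan_capable;
	bool member[EX_MAX_VID + 1];
};

/* Prepare phase: only validate, never touch hardware state. */
static int ex_vlan_prepare(const struct ex_switch *sw, int vid)
{
	if (!sw->vlan_capable)
		return -EOPNOTSUPP;
	if (vid < 1 || vid > EX_MAX_VID)
		return -EINVAL;
	return 0;
}

/* Commit phase: apply the change validated by the prepare phase. */
static void ex_vlan_commit(struct ex_switch *sw, int vid)
{
	sw->member[vid] = true;
}

/* Run both phases in order; nothing is applied if prepare fails. */
static int ex_vlan_add(struct ex_switch *sw, int vid)
{
	int err = ex_vlan_prepare(sw, vid);

	if (err)
		return err;
	ex_vlan_commit(sw, vid);
	return 0;
}
```

Keeping all checks in the prepare phase means the commit phase has no failure
path, so there is nothing to roll back.
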
Device Tree
-----------

DSA features a standardized binding which is documented in
Documentation/devicetree/bindings/net/dsa/dsa.txt. PHY/MDIO library helper
functions such as of_get_phy_mode(), of_phy_connect() are also used to query
per-port PHY specific details: interface connection, MDIO bus location etc.

HWMON
-----

Some switch drivers feature internal temperature sensors which are exposed as
regular HWMON devices in /sys/class/hwmon/.

Driver development
==================

DSA switch drivers need to implement a dsa_switch_driver structure which will
contain the various members described below.

register_switch_driver() registers this dsa_switch_driver in its internal list
of drivers to probe for. unregister_switch_driver() does the exact opposite.

Unless requested differently by setting the priv_size member accordingly, DSA
does not allocate any driver private context space.

Switch configuration
--------------------

- tag_protocol: this is to indicate what kind of tagging protocol is supported,
  should be a valid value from the dsa_tag_protocol enum

- probe: probe routine which will be invoked by the DSA platform device upon
  registration to test for the presence/absence of a switch device. For MDIO
  devices, it is recommended to issue a read towards internal registers using
  the switch pseudo-PHY and return whether this is a supported device. For other
  buses, return a non-NULL string

- setup: setup function for the switch, this function is responsible for setting
  up the dsa_switch_driver private structure with all it needs: register maps,
  interrupts, mutexes, locks etc. This function is also expected to properly
  configure the switch to separate all network interfaces from each other, that
  is, they should be isolated by the switch hardware itself, typically by
  creating a Port-based VLAN ID for each port and allowing only the CPU port and
  the specific port to be in the forwarding vector. Ports that are unused by the
  platform should be disabled. Past this function, the switch is expected to be
  fully configured and ready to serve any kind of request. It is recommended
  to issue a software reset of the switch during this setup function in order to
  avoid relying on what a previous software agent such as a bootloader/firmware
  may have previously configured.

- set_addr: Some switches require the programming of the management interface's
  Ethernet MAC address, switch drivers can also disable ageing of MAC addresses
  on the management interface and "hardcode"/"force" this MAC address for the
  CPU/management interface as an optimization

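The port isolation expected from the setup function can be pictured as one
forwarding vector per port. The standalone sketch below computes such vectors
as bitmasks; ex_isolate_ports() is a hypothetical helper, and how a vector is
written to hardware is entirely switch specific.

```c
#include <stdint.h>

/* Compute one forwarding bitmask per port (bit N set = may send to port N):
 * - an enabled user port may only forward to the CPU port
 * - the CPU port may forward to every enabled user port
 * - a port unused by the platform forwards nowhere */
static void ex_isolate_ports(uint16_t enabled_mask, int cpu_port, int nports,
			     uint16_t fwd[])
{
	for (int p = 0; p < nports; p++) {
		if (p == cpu_port)
			fwd[p] = enabled_mask & ~(1u << cpu_port);
		else if (enabled_mask & (1u << p))
			fwd[p] = 1u << cpu_port;
		else
			fwd[p] = 0;
	}
}
```

For example, with user ports 0, 1 and 3 enabled and the CPU on port 5, ports 0,
1 and 3 each get the vector 0x20, while ports 2 and 4 get 0.
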
PHY devices and link management
-------------------------------

- get_phy_flags: Some switches are interfaced to various kinds of Ethernet PHYs,
  if the PHY library PHY driver needs to know about information it cannot obtain
  on its own (e.g.: coming from switch memory mapped registers), this function
  should return a 32-bit bitmask of "flags", that is private between the switch
  driver and the Ethernet PHY driver in drivers/net/phy/*.

- phy_read: Function invoked by the DSA slave MDIO bus when attempting to read
  the switch port MDIO registers. If unavailable, return 0xffff for each read.
  For builtin switch Ethernet PHYs, this function should allow reading the link
  status, auto-negotiation results, link partner pages etc.

- phy_write: Function invoked by the DSA slave MDIO bus when attempting to write
  to the switch port MDIO registers. If unavailable, return a negative error
  code.

- adjust_link: Function invoked by the PHY library when a slave network device
  is attached to a PHY device. This function is responsible for appropriately
  configuring the switch port link parameters: speed, duplex, pause based on
  what the phy_device is providing.

- fixed_link_update: Function invoked by the PHY library, and specifically by
  the fixed PHY driver asking the switch driver for link parameters that could
  not be auto-negotiated, or obtained by reading the PHY registers through MDIO.
  This is particularly useful for specific kinds of hardware such as QSGMII,
  MoCA or other kinds of non-MDIO managed PHYs where out of band link
  information is obtained

Ethtool operations
------------------

- get_strings: ethtool function used to query the driver's strings, will
  typically return statistics strings, private flags strings etc.

- get_ethtool_stats: ethtool function used to query per-port statistics and
  return their values. DSA overlays slave network devices general statistics:
  RX/TX counters from the network device, with switch driver specific statistics
  per port

- get_sset_count: ethtool function used to query the number of statistics items

- get_wol: ethtool function used to obtain Wake-on-LAN settings per-port, this
  function may, for certain implementations also query the master network device
  Wake-on-LAN settings if this interface needs to participate in Wake-on-LAN

- set_wol: ethtool function used to configure Wake-on-LAN settings per-port,
  direct counterpart to get_wol with similar restrictions

- set_eee: ethtool function which is used to configure a switch port EEE (Green
  Ethernet) settings, can optionally invoke the PHY library to enable EEE at the
  PHY level if relevant. This function should enable EEE at the switch port MAC
  controller and data-processing logic

- get_eee: ethtool function which is used to query a switch port EEE settings,
  this function should return the EEE state of the switch port MAC controller
  and data-processing logic as well as query the PHY for its currently configured
  EEE settings

- get_eeprom_len: ethtool function returning for a given switch the EEPROM
  length/size in bytes

- get_eeprom: ethtool function returning for a given switch the EEPROM contents

- set_eeprom: ethtool function writing specified data to a given switch EEPROM

- get_regs_len: ethtool function returning the register length for a given
  switch

- get_regs: ethtool function returning the Ethernet switch internal register
  contents. This function might require user-land code in ethtool to
  pretty-print register values and registers

Power management
----------------

- suspend: function invoked by the DSA platform device when the system goes to
  suspend, should quiesce all Ethernet switch activities, but keep ports
  participating in Wake-on-LAN active as well as additional wake-up logic if
  supported

- resume: function invoked by the DSA platform device when the system resumes,
  should resume all Ethernet switch activities and re-configure the switch to be
  in a fully active state

- port_enable: function invoked by the DSA slave network device ndo_open
  function when a port is administratively brought up, this function should be
  fully enabling a given switch port. DSA takes care of marking the port with
  BR_STATE_BLOCKING if the port is a bridge member, or BR_STATE_FORWARDING if it
  was not, and propagating these changes down to the hardware

- port_disable: function invoked by the DSA slave network device ndo_close
  function when a port is administratively brought down, this function should be
  fully disabling a given switch port. DSA takes care of marking the port with
  BR_STATE_DISABLED and propagating changes to the hardware if this port is
  disabled while being a bridge member

Hardware monitoring
-------------------

These callbacks are only available if CONFIG_NET_DSA_HWMON is enabled:

- get_temp: this function queries the given switch for its temperature

- get_temp_limit: this function returns the switch current maximum temperature
  limit

- set_temp_limit: this function configures the maximum temperature limit allowed

- get_temp_alarm: this function returns the critical temperature threshold
  returning an alarm notification

See Documentation/hwmon/sysfs-interface for details.

Bridge layer
------------

- port_bridge_join: bridge layer function invoked when a given switch port is
  added to a bridge, this function should do what is necessary at the switch
  level to permit the joining port to be added to the relevant logical
  domain for it to ingress/egress traffic with other members of the bridge.

- port_bridge_leave: bridge layer function invoked when a given switch port is
  removed from a bridge, this function should do what is necessary at the
  switch level to prevent the leaving port from exchanging ingress/egress
  traffic with the remaining bridge members. When the port leaves the bridge, it
  should be aged out at the switch hardware for the switch to (re) learn MAC
  addresses behind this port.

- port_stp_state_set: bridge layer function invoked when a given switch port STP
  state is computed by the bridge layer and should be propagated to switch
  hardware to forward/block/learn traffic. The switch driver is responsible for
  computing an STP state change based on current and asked parameters and
  performing the relevant ageing based on the intersection results

Bridge VLAN filtering
---------------------

- port_vlan_filtering: bridge layer function invoked when the bridge gets
  configured for turning on or off VLAN filtering. If nothing specific needs to
  be done at the hardware level, this callback does not need to be implemented.
  When VLAN filtering is turned on, the hardware must be programmed to reject
  802.1Q frames which have VLAN IDs outside of the programmed allowed
  VLAN ID map/rules. If there is no PVID programmed into the switch port,
  untagged frames must be rejected as well. When turned off, the switch must
  accept any 802.1Q frames irrespective of their VLAN ID, and untagged frames are
  allowed.

- port_vlan_prepare: bridge layer function invoked when the bridge prepares the
  configuration of a VLAN on the given port. If the operation is not supported
  by the hardware, this function should return -EOPNOTSUPP to inform the bridge
  code to fall back to a software implementation. No hardware setup must be done
  in this function. See port_vlan_add for this and further details.

- port_vlan_add: bridge layer function invoked when a VLAN is configured
  (tagged or untagged) for the given switch port

- port_vlan_del: bridge layer function invoked when a VLAN is removed from the
  given switch port

- port_vlan_dump: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each VLAN the given port is a member
  of. A switchdev object is used to carry the VID and bridge flags.

- port_fdb_prepare: bridge layer function invoked when the bridge prepares the
  installation of a Forwarding Database entry. If the operation is not
  supported, this function should return -EOPNOTSUPP to inform the bridge code
  to fall back to a software implementation. No hardware setup must be done in
  this function. See port_fdb_add for this and further details.

- port_fdb_add: bridge layer function invoked when the bridge wants to install a
  Forwarding Database entry, the switch hardware should be programmed with the
  specified address in the specified VLAN ID in the forwarding database
  associated with this VLAN ID

Note: VLAN ID 0 corresponds to the port private database, which, in the context
of DSA, would be its port-based VLAN, used by the associated bridge device.

- port_fdb_del: bridge layer function invoked when the bridge wants to remove a
  Forwarding Database entry, the switch hardware should be programmed to delete
  the specified MAC address from the specified VLAN ID if it was mapped into
  this port forwarding database

- port_fdb_dump: bridge layer function invoked with a switchdev callback
  function that the driver has to call for each MAC address known to be behind
  the given port. A switchdev object is used to carry the VID and FDB info.

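The acceptance rules given for port_vlan_filtering can be summarized as a small
ingress predicate. The sketch below is a standalone software model of those
rules only; struct ex_port_vlan and ex_ingress_accept() are hypothetical, and
real switches implement the equivalent check in hardware.

```c
#include <stdbool.h>

/* Hypothetical per-port VLAN state. */
struct ex_port_vlan {
	bool filtering;      /* VLAN filtering turned on for the bridge */
	bool has_pvid;       /* a PVID is programmed into the port */
	bool allowed[4096];  /* programmed allowed VLAN ID map */
};

/* Decide whether an ingress frame is accepted.
 * A negative @vid denotes an untagged frame. */
static bool ex_ingress_accept(const struct ex_port_vlan *p, int vid)
{
	if (!p->filtering)
		return true;         /* filtering off: accept everything */
	if (vid < 0)
		return p->has_pvid;  /* untagged frames need a PVID */
	return vid < 4096 && p->allowed[vid];
}
```
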
TODO
====

The platform device problem
---------------------------
DSA is currently implemented as a platform device driver which is far from ideal
as was discussed in this thread:

http://permalink.gmane.org/gmane.linux.network/329848

This basically prevents the device driver model from being properly used and
applied, and prevents support for non-MDIO, non-MMIO Ethernet connected
switches.

Another problem with the platform device driver approach is that it prevents the
use of a modular switch drivers build due to a circular dependency, illustrated
here:

http://comments.gmane.org/gmane.linux.network/345803

Attempts at reworking this have been made here:

https://lwn.net/Articles/643149/

Making SWITCHDEV and DSA converge towards a unified codebase
------------------------------------------------------------

SWITCHDEV properly takes care of abstracting the networking stack with offload
capable hardware, but does not enforce a strict switch device driver model. On
the other hand, DSA enforces a fairly strict device driver model, and deals with
most of the switch specifics. At some point we should envision a merger between
these two subsystems and get the best of both worlds.

Other low-hanging fruit
-----------------------

- making the number of ports fully dynamic and not dependent on DSA_MAX_PORTS
- allowing more than one CPU/management interface:
  http://comments.gmane.org/gmane.linux.network/365657
- porting more drivers from other vendors:
  http://comments.gmane.org/gmane.linux.network/365510