Network Working Group P. Mohapatra Internet-Draft Sproute Networks Intended status: Standards Track R. Fernando Expires: 20 March 2025 Cisco Systems R. Das, Ed. Juniper Networks, Inc. S. Mohanty, Ed. Zscaler M. Mishra Cisco Systems R.J. Szarecki Google LLC 16 September 2024 BGP Link Bandwidth Extended Community draft-ietf-idr-link-bandwidth-09 Abstract This document describes an application of BGP extended communities that allows a router to perform unequal cost load balancing. Requirements Language The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119]. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 20 March 2025. Mohapatra, et al. Expires 20 March 2025 [Page 1] Internet-Draft BGP Link Bandwidth Extended Community September 2024 Copyright Notice Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 2. Link Bandwidth Extended Community . . . . . . . . . . . . . . 3 3. Protocol Procedures . . . . . . . . . . . . . . . . . . . . . 3 3.1. Sender (Originating Link Bandwidth Community) . . . . . . 4 3.2. Receiver (Receiving link bandwidth community) . . . . . . 4 3.3. Re-advertisement Procedures . . . . . . . . . . . . . . . 4 3.3.1. Re-advertisement with Next hop Self . . . . . . . . . 4 3.3.2. Re-advertisement with Next Hop Unchanged . . . . . . 4 3.4. Link bandwidth community Arithmetic and BGP multipath . . 5 4. Error Handling . . . . . . . . . . . . . . . . . . . . . . . 5 5. Document History . . . . . . . . . . . . . . . . . . . . . . 5 6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 6 7. Security Considerations . . . . . . . . . . . . . . . . . . . 6 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 6 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 6 10. Normative References . . . . . . . . . . . . . . . . . . . . 6 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 7 1. Introduction Load balancing is a critical aspect of network design, enabling efficient utilization of available bandwidth and improving overall network performance. Traditional equal-cost multi-path (ECMP) routing does not account for the varying capacities of different paths. This document suggests that the external link bandwidth be carried in the network using one of two new extended communities [RFC4360] - the transitive and non-transitive link bandwidth extended community. The Link Bandwidth Extended Community provides a mechanism for routers to advertise the bandwidth of their downstream path(s), facilitating maximum utilisation of network resources. Mohapatra, et al. Expires 20 March 2025 [Page 2] Internet-Draft BGP Link Bandwidth Extended Community September 2024 2. Link Bandwidth Extended Community The Link Bandwidth Extended Communities are defined as a BGP extended community that carries the bandwidth information of a router, represented by BGP Protocol Next Hop, connecting to remote network. This community can be used to inform other routers about the available bandwidth on trough a given route. The Link bandwidth extended communities can be either transitive or non-transitive. Therefore the value of the high-order octet of the extended Type Field can be 0x00 or 0x40 respectively. The value of the low-order octet of the extended type field for this communities is 0x04. The value of the Global Administrator subfield in the Value Field SHOULD represent the Autonomous System of the router that attaches the Link Bandwidth Community, but in can be set to any 2-byte value. If four octet AS numbering scheme is used [RFC6793], AS_TRANS should be used in the Global Administrator subfield. The bandwidth of the link is expressed as 4 octets in [IEEE.754-2019] floating point format, units being bytes (not bits!) per second. It is carried in the Local Administrator subfield of the Value Field. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type=0x00/0x40 | SubType= 0x04 | AS Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Link Bandwidth Value | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Type: 1-octet field MUST be set to 0x00 or 0x40 to indicate transitive/non-transitive. SubType: 1-octet field MUST be set to 0x04 to indicate 'Link-Bandwidth'. Global Administrator sub-field: 2-octet represent the Autonomous System. Local Administrator sub-field: Bandwidth value (bytes per sec) encoded as 4 octets in IEEE floating point format. Figure 1: Link Bandwidth Extended Community 3. Protocol Procedures Mohapatra, et al. Expires 20 March 2025 [Page 3] Internet-Draft BGP Link Bandwidth Extended Community September 2024 3.1. Sender (Originating Link Bandwidth Community) An originator of the link bandwidth community SHOULD be able to originate either a transitive or a non-transitive link bandwidth extended community. Implementation SHOULD provide configuration to set the transitivity type of the link bandwidth community, as well as Global Administrator filed value and bandwidth value in (Local Administrator filed), trough local policy. No more than one link bandwidth extended community SHALL be attached to a route. An originator can attach link bandwidth community to BGP path in egress processing to adj-RIB-out only or in ingress processing in which case link bandwidth community is present in Local-RIB. Note: Implementation MAY provide configuration option (knob) to allow sending non-transitive link bandwidth extended community on external BGP sessions. 3.2. Receiver (Receiving link bandwidth community) A BGP receiver MUST be able to process link bandwidth community of both transitive or non-transitive type. The receiver MUST NOT flap or treat the route as malformed based on the transitivity of the link bandwidth community and/or BGP session type (internal vs. external). Note: Implementation MAY provide configuration option (knob) to accept non-transitive link bandwidth extended community from external BGP sessions. 3.3. Re-advertisement Procedures 3.3.1. Re-advertisement with Next hop Self When a BGP speaker re-advertises a route with the Link Bandwidth Extended Community and sets the next hop to itself, it SHOULD follow the same procedures as outlined in Section 3.1. In the absence of any import or export policies that alter the Link Bandwidth Extended Community, any received Link Bandwidth extended community on the route will be re-advertised unchanged, in accordance with standard BGP procedures. 3.3.2. Re-advertisement with Next Hop Unchanged A BGP speaker that receives a route with link bandwidth community, re-advertises or reflects the same without changing its next hop SHOULD NOT change the link bandwidth extended community in any way. Mohapatra, et al. Expires 20 March 2025 [Page 4] Internet-Draft BGP Link Bandwidth Extended Community September 2024 3.4. Link bandwidth community Arithmetic and BGP multipath In a BGP multipath ECMP environment, the value of the link bandwidth community that is sent or re-advertised may be calculated based on the link bandwidth communities of the routes contributing to multipath in the Local Routing Information Base (Local-RIB). This topic is beyond the scope of this document. 4. Error Handling If a receiver receives a route with more than one Link Bandwidth Extended Community, it SHOULD: Prefer the lowest value of the attached link bandwidth community (Irrespective of the transitivity) Prefer the transitive Link Bandwidth Extended Community when choosing between transitive and non-transitive types that have the same value Implementations MAY provide configuration options (knobs) to change the above preference. 5. Document History The BGP Link Bandwidth Extended Community has evolved over several versions of the IETF draft. In the earlier versions up to draft- ietf-idr-link-bandwidth-08, only the non-transitive version of the link bandwidth extended community was supported. However, starting from draft-ietf-idr-link-bandwidth-09, both transitive and non- transitive versions of the link bandwidth extended community are supported. An old sender/receiver is a BGP speakers either use procedures upto draft (https://datatracker.ietf.org/doc/html/draft-ietf-idr-link- bandwidth-08) or any undocumented behavior for link bandwidth extended community. A new sender/receiver is a BGP speaker that implements procedures specified in this document. Receiving speaker needs to be upgraded to support the procedures defined in this document to provide full interop with both transitive and non-transitive versions of LBW. In order to keep changes to procedures simple, it is not a goal to provide interop between old Receiver and new Sender. Mohapatra, et al. Expires 20 March 2025 [Page 5] Internet-Draft BGP Link Bandwidth Extended Community September 2024 6. IANA Considerations This document defines a specific application of the two-octet AS specific extended community. IANA is requested to assign a sub- type value of 0x04 for the link bandwidth extended community. Name Value ---- ----- non-transitive Link Bandwidth Ext. Community 0x4004 Name Value ---- ----- transitive Link Bandwidth Ext. Community 0x0004 7. Security Considerations There are no additional security risks introduced by this design. 8. Contributors Kaliraj Vairavakkalai Juniper Networks, Inc. 1133 Innovation Way, Sunnyvale, CA 94089 United States of America Email: kaliraj@juniper.net Natrajan Venkataraman Juniper Networks, Inc. 1133 Innovation Way, Sunnyvale, CA 94089 United States of America Email: natv@juniper.net 9. Acknowledgments The authors would like to thank Yakov Rekhter, Srihari Sangli and Dan Tappan for proposing unequal cost load balancing as one possible application of the extended community attribute. The authors would like to thank Bruno Decraene, Robert Raszuk, Joel Halpern, Aleksi Suhonen, Randy Bush, Jeff Haas and John Scudder for their comments and contributions. 10. Normative References Mohapatra, et al. Expires 20 March 2025 [Page 6] Internet-Draft BGP Link Bandwidth Extended Community September 2024 [IEEE.754-2019] IEEE, "IEEE Standard for Floating-Point Arithmetic", 22 July 2019, . [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC4360] Sangli, S., Tappan, D., and Y. Rekhter, "BGP Extended Communities Attribute", RFC 4360, DOI 10.17487/RFC4360, February 2006, . [RFC6793] Vohra, Q. and E. Chen, "BGP Support for Four-Octet Autonomous System (AS) Number Space", RFC 6793, DOI 10.17487/RFC6793, December 2012, . Authors' Addresses Pradosh Mohapatra Sproute Networks Email: pradosh@sproute.com Rex Fernando Cisco Systems 170 W. Tasman Drive San Jose, CA 95134 United States of America Email: rex@cisco.com Reshma Das (editor) Juniper Networks, Inc. 1133 Innovation Way, Sunnyvale, CA 94089 United States of America Email: dreshma@juniper.net Satya Mohanty (editor) Zscaler 120 Holger Way, San Jose, CA 95134 United States of America Email: smohanty@zscaler.com Mohapatra, et al. Expires 20 March 2025 [Page 7] Internet-Draft BGP Link Bandwidth Extended Community September 2024 Mankamana Mishra Cisco Systems 821 alder drive, Milpitas, CA 95035 United States of America Email: mankamis@cisco.com Rafal Jan Szarecki Google LLC 1160 N Mathilda Ave,, Sunnyvale, CA 94089 United States of America Email: rszarecki@gmail.com Mohapatra, et al. Expires 20 March 2025 [Page 8]