The prefix hash table in BGP used the same hash function as the rtable.
When a batch of routes are exported during feed/flush to the BGP, they
all have similar hash values, so they are all crowded in a few slots in
the BGP prefix table (which is much smaller - around the size of the
batch - and uses higher bits from hash values), making it much slower due
to excessive collisions. Use a different hash function to avoid this.
Also, increase the batch size to fill 4k BGP packets and increase minimum
BGP bucket and prefix hash sizes to avoid back and forth resizing during
flushes.
This leads to order of magnitude faster flushes (on my test data).
When shutting down a Babel instance we send a wildcard retraction to make
sure all peers can quickly switch to other route origins. Add another small
optimisation borrowed from babeld: sending a Hello message (along with the
retraction) with a very low interval.
This will cause neighbours to modify their expiry timers for the node's
state to quickly time it out, thus conserving resources in the network.
Add BFD protocol option 'strict bind' to use separate listening socket
for each BFD interface bound to its address instead of using shared
listening sockets.
A recent change in Babel causes ifaces to disappear after
reconfiguration. The patch fixes that.
Thanks to Johannes Kimmel for an insightful bugreport.
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The Validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.
The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.
The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and
net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x
speedup for IPv6 of these calls.
TODO:
- Rebuild the trie during rt_prune_table()
- Better way to avoid trie_add_prefix() in net_get() for existing tables
- Make it configurable (?)
One of previous commits added error logging of invalid routes. This
also inadvertently caused error logging of route loops, which should
be ignored silently. Fix that.
Most error messages in attribute processing are in rx/decode step and
these use L_REMOTE log class. But there are few that are in tx/export
step and these should use L_ERR log class.
Use tx-specific macro (REJECT()) in tx/export code and rename field
err_withdraw to err_reject in struct bgp_export_state to ensure that
appropriate error reporting macros are called in proper contexts.
RFC 6810 and RFC 8210 specify that the "Max Length" value MUST NOT be
less than the Prefix Length element (underflow). On the other side,
overflow of the Max Length element also is possible, it being an 8-bit
unsigned integer allows for values larger than 32 or 128. This also
implicitly ensures there is no overflow of "Length" value.
When a PDU is received where the Max Length field is corrputed, the RTR
client (BIRD) should immediately terminate the session, flush all data
learned from that cache, and log an error for the operator.
Minor changes done by commiter.
Some cleanups and bugfixes to the previous patch, including:
- Fix rate limiting in index mismatch check
- Fix missing BABEL_AUTH_INDEX_LEN in auth_tx_overhead computation
- Fix missing auth_tx_overhead recalculation during reconfiguration
- Fix pseudoheader construction in babel_auth_sign() (sport vs fport)
- Fix typecasts for ptrdiffs in log messages
- Make auth log messages similar to corresponding RIP/OSPF ones
- Change auth log messages for events that happen during regular
operation to debug messages
- Switch meaning of babel_auth_check*() functions for consistency
with corresponding RIP/OSPF ones
- Remove requirement for min/max key length, only those required by
given MAC code are enforced
This implements support for MAC authentication in the Babel protocol, as
specified by RFC 8967. The implementation seeks to follow the RFC as close
as possible, with the only deliberate deviation being the addition of
support for all the HMAC algorithms already supported by Bird, as well as
the Blake2b variant of the Blake algorithm.
For description of applicability, assumptions and security properties,
see RFC 8967 sections 1.1 and 1.2.
In preparation for adding authentication checks, refactor the TLV
walking code so it can be reused for a separate pass of the packet
for authentication checks.
Routes from downed protocols stay in rtable (until next rtable prune
cycle ends) and may be even exported to another protocol. In BGP case,
source BGP protocol is examined, although dynamic parts (including
neighbor entries) are already freed. That may lead to crash under some
race conditions. Ensure that freed neighbor entry is not accessed to
avoid this issue.
When an interface disappears, all the neighbors are freed as well. Seqno
requests were anyway not decoupled from them, leading to strange
segfaults. This fix adds a proper seqno request list inside neighbors to
make sure that no pointer to neighbor is kept after free.
The babel protocol code checks whether iface supports multicast, and
whether it has a link-local address assigned. However, it doesn not give
any feedback if any of those checks fail, it just silently ignores the
interface. Fix this by explicitly logging when multicast check fails.
Based on patch from Toke Høiland-Jørgensen, thanks!
Ifaces with host address (/32) were forced to be stubby, but now they
can be used as PtP or PtMP. For these ifaces we need to:
- Do not force stub mode
- Accept packets from any IP as local
- Accept any configured neighbor as local
- Detect ifaces properly as unnumbered
- Use ONLINK flag for nexthops
As specified in RFC 2328 8.1: "On physical point-to-point networks,
the IP destination is always set to the address AllSPFRouters."
Note that this likely break setups with multiple neighbors on a network
configured as PtP, which worked before. These should be configured as
PtMP.
Thanks to Senthil Kumar Nagappan for the original patch and to Joakim
Tjernlund for suggestions.
The flag makes sense just in external representation. It is reset during
BGP export, but keeping it internally broke MRT dumps for short attributes
that used it anyways.
Thanks to Simon Marsh for the bugreport and the patch.
BGP statistics code was preliminary and i wanted to replace it by
separate 'show X stats' command. The patch hides the preliminary
output in 'show protocols all' so it is not part of the released
version.
In OSPFv3, only Hello and DBDes packets contain flags specifying whether
RFC 7166 authentication trailer is used. Other packets are processed
based on stored authentication state in neighbor structure. Update this
state with each received Hello to handle authentication change from
reconfigurations.
Thanks to Joakim Tjernlund and Kenth Eriksson for the bugreport.
This is an implementation of draft-walton-bgp-hostname-capability-02.
It is implemented since quite some time for FRR and in datacenter, this
gives a nice output to avoid using IP addresses.
It is disabled by default. The hostname is retrieved from uname(2) and
can be overriden with "hostname" option. The domain name is never set
nor displayed.
Minor changes by committer.
Flag signalling that MP-BGP mode should be used got reset after first
batch of routes, so remaining routes were processed without that, leading
to missing MP_REACH_NLRI attribute.
Thanks to Piotr Wydrych for the bugreport.
Add fake MP_REACH_NLRI attribute with BGP next hop when encoding MRT
table dumps for IPv6 routes. That is necessary to encode next hop as
NEXT_HOP attribute is not used for MP-BGP.
Thanks to Santiago Aggio for the bugreport.
Direct BFD sessions needs to be dispatched not only by IP addresses, but
also by interfaces, in order to avoid collisions between neighbors with
the same IPv6 link-local addresses.
Extend BFD session hash_ip key by interface index to handle that. Use 0
for multihop sessions.
Thanks to Sebastian Hahn for the original patch.
It was mixed-up if hostname is IPv6 address, and reporting separate
values (like port) on separate lines fits better into key-value style
of 'show protocols all' output. Also, the patch simplifies transport
identification formatting (although it is unused now).
Thanks to Alarig Le Lay for the suggestion.
The option is not implemented since transition to 2.0 and no plan to add it.
Also remove some deprecated RTS_* valus from documentation.
Thanks to Sébastien Parisot for notification.
The patch add support for per-channel debug flags, currently just
'states', 'routes', and 'filters'. Flag 'states' is used for channel
state changes, remaining two for routes passed through the channel.
The per-protocol debug flags 'routes'/'filters' still enable reporting
of routes for all channels, to keep existing behavior.
The patch causes minor changes in some log messages.
When config structures are copied due to template application,
we need to reset list node structure before calling add_tail().
Thanks to Mikael Magnusson for patches.
The babel protocol code was initialising objects returned from the slab
allocator by assigning to each of the struct members individually, but
wasn't touching the NODE member while doing so. This leads to warnings on
debug builds since commit:
baac700906 ("List expensive check.")
To fix this, introduce an sl_allocz() variant of the slab allocator which
will zero out the memory before returning it, and switch all the babel call
sites to use this version. The overhead for doing this should be negligible
for small objects, and in the case of babel, the largest object being
allocated was being zeroed anyway, so we can drop the memset in
babel_read_tlv().
Add support for proper handling of multiple routes with the same network
to the static protocol. Routes are distinguished by internal index, which
is assigned automatically (sequentially for routes within each network).
Having different route preference or igp_metric attribute is optional.
When a new link-LSA is originated, we need to notify intra-area-prefix-LSA
handling, like when a new link-LSA is received. Otherwise a new network
prefix added to a DR is not propagated immediately.
Thanks to Bala Sajja for the bugreport.
Merge multiple BFD option blocks in BGP configs instead of using the last
one. That is necessary for proper handling of templates when BFD options
are used both in a BGP template and in a BGP protocol derived from that
template.