Commit graph

3808 commits

Author SHA1 Message Date
Ondrej Zajicek (work)
71c9484b00 NEWS and version update 2022-02-09 03:47:49 +01:00
Ondrej Zajicek (work)
2fc8b4c4ba Alloc: Use posix_memalign() instead of aligned_alloc()
For compatibility with older systems use posix_memalign(). We can
switch to aligned_alloc() when we commit to C11 for multithreading.
2022-02-08 22:42:00 +01:00
Ondrej Zajicek (work)
ef614f2984 Netlink: Minor cleanup 2022-02-08 22:21:08 +01:00
Ondrej Zajicek (work)
edc1a24017 Lib: Update alignment of slabs
Alignment of slabs should be at least sizeof(ptr) to avoid unaligned
pointers in slab structures. Fixme: Use proper way to choose alignment
for internal allocators.
2022-02-07 04:39:49 +01:00
Ondrej Zajicek (work)
53a2540687 Merge branch 'oz-trie-table' 2022-02-06 23:42:10 +01:00
Ondrej Zajicek (work)
24600c642a Trie: Fix trie format
After switching to 16-way tries, trie format ignored unaligned / internal
prefixes and only reported the primary prefix of a trie node.

Fix trie format by showing internal prefixes based on the 'local' bitmask
of a node. Also do basic (intra-node) reconstruction of prefix patterns
by finding common subtrees in 'local' bitmask.

In future, we could improve that by doing inter-node reconstruction, so
prefixes entered as one pattern for a subtree (e.g. 192.168.0.0/18+)
would be reported as such, like with aligned prefixes.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
5a89edc6fd Nest: Implement locking of prefix tries during walks
The prune loop may may rebuild the prefix trie and therefore invalidate
walk state for asynchronous walks (used in 'show route in' cmd). Fix it
by adding locking that keeps the old trie in memory until current walks
are done.

In future this could be improved by rebuilding trie walk states (by
lookup for last found prefix) after the prefix trie rebuild.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
de6318f70a Nest: Implement prefix trie pruning
When rtable is pruned and network fib nodes are removed, we also need to
prune prefix trie. Unfortunately, rebuilding prefix trie takes long time
(got about 400 ms for 1M networks), so must not be atomic, we have to
rebuild a new trie while current one is still active. That may require
some considerable amount of temporary memory, so we do that only if
we expect significant trie size reduction.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
ba5aec94cd Trie: Add prefix counter
Add counter of prefixes stored in trie. Works only for 'restricted' tries
composed of explicit prefixes (pxlen == l == h), like ones used in rtables.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
d0f9a77f64 Doc: Describe routing table options 2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
1f2eb2aca8 BGP: Implement flowspec validation procedure
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The Validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.

The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.

The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
1ae42e5223 Nest: Add routing table configuration blocks
Allow to specify sorted flag, trie fla, and min/max settle time.

Also do not enable trie by default, it must be explicitly enabled.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
fde1cff012 Nest: Add convenience functions to check rtable net type 2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
61375bd0b3 Nest: Avoid unnecessary net_format() in 'show route' command
When output of 'show route' command was generated, the net_format() was
called for each network prematurely, even if the result was not needed.

Fix the code to call net_format() only when needed. This makes queries
that process many networks but show only few (e.g. 'show route where ..',
or 'show route count') much faster (like 5x - 10x faster).
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
9ac16df3d7 Nest: Add trie iteration code to 'show route'
Add trie iteration code to rt_show_cont() CLI hook and use it to
accelerate 'show route in <addr>' commands using interval queries.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
ea97b89051 Nest: Implement 'show route in <addr>' command
Implement 'show route in <addr>' command, which shows all routes in
networks that are subnets of given network. Currently limited to IP
network types.
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
836a87b8ac Nest: Attach prefix trie to rtable for faster LPM and interval queries
Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and
net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x
speedup for IPv6 of these calls.

TODO:
 - Rebuild the trie during rt_prune_table()
 - Better way to avoid trie_add_prefix() in net_get() for existing tables
 - Make it configurable (?)
2022-02-06 23:27:13 +01:00
Ondrej Zajicek (work)
4c6ee53f31 BGP: Make routing loops silent
One of previous commits added error logging of invalid routes. This
also inadvertently caused error logging of route loops, which should
be ignored silently. Fix that.
2022-01-28 18:13:18 +01:00
Ondrej Zajicek (work)
963b2c7ce2 BGP: Use proper class in attribute error messages
Most error messages in attribute processing are in rx/decode step and
these use L_REMOTE log class. But there are few that are in tx/export
step and these should use L_ERR log class.

Use tx-specific macro (REJECT()) in tx/export code and rename field
err_withdraw to err_reject in struct bgp_export_state to ensure that
appropriate error reporting macros are called in proper contexts.
2022-01-28 05:35:22 +01:00
Ondrej Zajicek (work)
75d01ecc2d BGP: Improve 'invalid next hop' error reporting
Distinguish multiple causes of 'invalid next hop' message and report
the relevant next hop address.

Thanks to Simon Ruderich for the original patch.
2022-01-28 05:03:03 +01:00
Ondrej Zajicek (work)
9dbb7eb6eb BGP: Log route updates that were changed to withdraws
Typical BGP error handling is treat-as-withdraw, where an invalid route
is replaced with a withdraw. Log route network when it happens.
2022-01-24 03:44:21 +01:00
Matous Holinka
a9646efd40 .gitlab-ci.yml: minor changes inside the .yml file.
+ ubuntu:21.10 added into the pipeline,
- ubuntu:20.10 removed from the pipeline,

+ misc/docker/ubuntu-21.10-amd64/Dockerfile added,
- misc/docker/ubuntu-20.10-amd64/Dockerfile removed.
2022-01-17 05:17:50 +01:00
Ondrej Zajicek (work)
81ee6cda2e Netlink: Add option to specify netlink socket receive buffer size
Add option 'netlink rx buffer' to specify netlink socket receive buffer
size. Uses SO_RCVBUFFORCE, so it can override rmem_max limit.

Thanks to Trisha Biswas and Michal for the original patches.
2022-01-17 05:11:29 +01:00
Ondrej Zajicek (work)
bbc33f6ec3 Netlink: Add another workaround for older kernel headers
Unfortunately, SOL_NETLINK is both recently added and arch-dependent,
so we cannot just define it.
2022-01-15 22:39:40 +01:00
Ondrej Zajicek (work)
8988264a64 Netlink: Add workaround for older kernel headers 2022-01-14 23:15:05 +01:00
Ondrej Zajicek (work)
e818f16448 Netlink: Enable strict checking for KRT dumps
Add strict checking for netlink KRT dumps to avoid PMTU cache records
from FNHE table dump along with KRT.

Linux Kernel added FNHE table dump to the netlink API in patch:

8d3b68cd37.1561131177.git.sbrivio@redhat.com/

Therefore, since Linux 5.3 these route cache entries are dumped together
with regular routes during periodic KRT scans, which in some cases may be
huge amount of useless data. This can be avoided by using strict checking
for netlink dumps:

https://lore.kernel.org/netdev/20181008031644.15989-1-dsahern@kernel.org/

The patch mitigates the risk of receiving unknown and potentially large
number of FNHE records that would block BIRD I/O in each sync. There is a
known issue caused by the GRE tunnels on Linux that seems to be creating
one FNHE record for each destination IP address that is routed through
the tunnel, even when the PMTU equals to GRE interface MTU.

Thanks to Tomas Hlavacek for the original patch.
2022-01-14 21:53:40 +01:00
Ondrej Zajicek (work)
d0dd1d20cd Netlink: Explicitly skip received cloned routes
Kernel uses cloned routes to keep route cache entries, but reports them
together with regular routes. They were skipped implicitly as they
do not have rtm_protocol filled. Add explicit check for cloned flag
and skip such routes explicitly.

Also, improve debug logs of skipped routes.
2022-01-14 19:07:57 +01:00
Ondrej Zajicek (work)
60e9def9ef BGP: Add option 'free bind'
The BGP 'free bind' option applies the IP_FREEBIND/IPV6_FREEBIND
socket option for the BGP listening socket.

Thanks to Alexander Zubkov for the idea.
2022-01-09 02:44:32 +01:00
Alexander Zubkov
87a02489f3 IO: Support nonlocal bind in socket interface
Add option to socket interface for nonlocal binding, i.e. binding to an
IP address that is not present on interfaces. This behaviour is enabled
when SKF_FREEBIND socket flag is set. For Linux systems, it is
implemented by IP_FREEBIND socket flag.

Minor changes done by commiter.
2022-01-08 19:02:31 +01:00
Ondrej Zajicek (work)
bcb25084d3 Test: Activate some remaining build tests 2022-01-05 20:07:27 +01:00
Ondrej Zajicek (work)
f5c8fb5fba Netlink: Do not ignore dead routes from BIRD
Currently, BIRD ignores dead routes to consider them absent. But it also
ignores its own routes and thus it can not correctly manage such routes
in some cases. This patch makes an exception for routes with proto bird
when ignoring dead routes, so they can be properly updated or removed.

Thanks to Alexander Zubkov for the original patch.
2022-01-05 19:25:42 +01:00
Ondrej Zajicek (work)
77d032c71f Netlink: Improve multipath parsing errors
Function nl_parse_multipath() should handle errors internally.
2022-01-05 18:46:41 +01:00
Ondrej Zajicek (work)
29dda184e5 Conf: Fix parsing full-length IPv6 addresses
Lexer expression for bytestring was too loose, accepting also
full-length IPv6 addresses. It should be restricted such that
colon is used between every byte or never.

Fix the regex and also add some test cases for it.

Thanks to Alexander Zubkov for the bugreport
2022-01-05 16:38:49 +01:00
Matous
75aceadaf7 gitlab-ci.yml: failing gitlab runner fixed.
'registry.labs.nic.cz' -> 'registry.nic.cz' changed
2022-01-05 04:13:39 +01:00
Alexander Zubkov
77042292ff Doc: Document min/max operators for lists 2021-12-28 04:09:36 +01:00
Alexander Zubkov
0e1fd7ea6a Filter: Add operators to find minimum and maximum element of sets
Add operators .min and .max to find minumum or maximum element in sets
of types: clist, eclist, lclist. Example usage:

bgp_community.min
bgp_ext_community.max
filter(bgp_large_community, [(as1, as2, *)]).min

Signed-off-by: Alexander Zubkov <green@qrator.net>
2021-12-28 04:07:09 +01:00
Alexander Zubkov
e15e465720 Doc: Document community components access operators 2021-12-28 04:07:09 +01:00
Alexander Zubkov
a2a268da4f Filter: Add operators to pick community components
Add operators that can be used to pick components from
pair (standard community) or lc (large community) types.
For example:

(10, 20).asn --> 10
(10, 20).data --> 20

(10, 20, 30).asn --> 10
(10, 20, 30).data1 --> 20
(10, 20, 30).data2 --> 30

Signed-off-by: Alexander Zubkov <green@qrator.net>
2021-12-28 04:07:00 +01:00
Ondrej Zajicek (work)
a39cd2cc0b BSD: Assume onlink flag on ifaces with only host addresses
The BSD kernel does not support the onlink flag and BIRD does not use
direct routes for next hop validation, instead depends on interface
address ranges. We would like to handle PtMP cases with only host
addresses configured, like:

  ifconfig wg0 192.168.0.10/32
  route add 192.168.0.4 -iface wg0
  route add 192.168.0.8 -iface wg0

To accept BIRD routes with onlink next-hop, like:

  route 192.168.42.0/24 via 192.168.0.4%wg0 onlink

BIRD would dismiss the route when receiving from the kernel, as the
next-hop 192.168.0.4 is not part of any interface subnet and onlink
flag is not kept by the BSD kernel.

The commit fixes this by assuming that for routes received from the
kernel, any next-hop is onlink on ifaces with only host addresses.

Thanks to Stefan Haller for the original patch.
2021-12-27 21:00:04 +01:00
Job Snijders
b9f38727a7 RPKI: Add contextual out-of-bound checks in RTR Prefix PDU handler
RFC 6810 and RFC 8210 specify that the "Max Length" value MUST NOT be
less than the Prefix Length element (underflow). On the other side,
overflow of the Max Length element also is possible, it being an 8-bit
unsigned integer allows for values larger than 32 or 128. This also
implicitly ensures there is no overflow of "Length" value.

When a PDU is received where the Max Length field is corrputed, the RTR
client (BIRD) should immediately terminate the session, flush all data
learned from that cache, and log an error for the operator.

Minor changes done by commiter.
2021-12-18 16:35:28 +01:00
Simon Ruderich
00410fd6c1 Doc: bgp: remove "advertise ipv4"
The option was removed in d15b0b0a ("BGP redesign", 2016-12-07)
but the documentation wasn't updated.
2021-12-18 03:17:48 +01:00
Ondrej Zajicek (work)
b21104c97e Nest: Do not ignore secondary flag changes in ifa updates
Compare all IA_* flags that are set by sysdep iface code.

The old code ignores IA_SECONDARY flag when comparing whether iface
address updates from kernel changed anything. This is usually not an
issue as kernel removes all secondary addresses due to removal of the
primary one, but it breaks when sysctl 'promote_secondaries' is enabled
and kernel promotes secondary addresses to primary ones.

Thanks to 'Alexander' for the bugreport.
2021-12-18 01:09:52 +01:00
Ondrej Zajicek (work)
78ddfd2600 Trie: Clarify handling of less-common net types
For convenience, Trie functions generally accept as input values not only
NET_IPx types of nets, but also NET_VPNx and NET_ROAx types. But returned
values are always NET_IPx types.
2021-12-02 03:35:29 +01:00
Maria Matejka
f772afc525 Memory statistics split into Effective and Overhead
This feature is intended mostly for checking that BIRD's allocation
strategies don't consume much memory space. There are some cases where
withdrawing routes in a specific order lead to memory fragmentation and
this output should give the user at least a notion of how much memory is
actually used for data storage and how much memory is "just allocated"
or used for overhead.

Also raising the "system allocator overhead estimation" from 8 to 16
bytes; it is probably even more. I've found 16 as a local minimum in
best scenarios among reachable machines. I couldn't find any reasonable
method to estimate this value when BIRD starts up.

This commit also fixes the inaccurate computation of memory overhead for
slabs where the "system allocater overhead estimation" was improperly
added to the size of mmap-ed memory.
2021-11-27 22:54:15 +01:00
Ondrej Zajicek (work)
14fc24f3a5 Trie: Implement longest-prefix-match queries and walks
The prefix trie now supports longest-prefix-match query by function
trie_match_longest_ipX() and it can be extended to iteration over all
covering prefixes for a given prefix (from longest to shortest) using
TRIE_WALK_TO_ROOT_IPx() macro.
2021-11-26 03:26:36 +01:00
Maria Matejka
644e9ca94e Directly mapped pages are kept for future use if temporarily not needed 2021-11-24 19:42:52 +00:00
Ondrej Zajicek (work)
062e69bf52 Trie: Implement trie walking code
Trie walking allows enumeration of prefixes in a trie in the usual
lexicographic order. Optionally, trie enumeration can be restricted
to a chosen subnet (and its descendants).
2021-11-19 18:04:32 +01:00
Ondrej Zajicek (work)
71c18d9f53 Trie: Simplify network matching code
Introduce ipX_prefix_equal() and use it to simplify network matching code.
2021-11-13 21:11:18 +01:00
Ondrej Zajicek (work)
9f24fef5e9 Conf: Fix crash during shutdown
BIRD implements shutdown by reconfiguring to fake empty configuration.
Such fake config structure is created from the last running config and
shares some data, including symbol table. This allows access to (removed)
routing tables and causes crash when 'show route' command is used during
shutdown.

Clean up symbol table, table list and links to default tables, so removed
routing tables cannot be accessed during shutdown.
2021-10-20 01:51:28 +02:00
Ondrej Zajicek (work)
067f69a56d Filter: Add prefix trie benchmarks
Add trie tests intended as benchmarks that use external datasets
instead of generated prefixes. As datasets are not included, they
are commented out by default.
2021-09-25 16:06:43 +02:00