Commit graph

3866 commits

Author SHA1 Message Date
Ondrej Zajicek (work)
bbc33f6ec3 Netlink: Add another workaround for older kernel headers
Unfortunately, SOL_NETLINK is both recently added and arch-dependent,
so we cannot just define it.
2022-01-15 22:39:40 +01:00
Ondrej Zajicek (work)
8988264a64 Netlink: Add workaround for older kernel headers 2022-01-14 23:15:05 +01:00
Ondrej Zajicek (work)
e818f16448 Netlink: Enable strict checking for KRT dumps
Add strict checking for netlink KRT dumps to avoid PMTU cache records
from FNHE table dump along with KRT.

Linux Kernel added FNHE table dump to the netlink API in patch:

8d3b68cd37.1561131177.git.sbrivio@redhat.com/

Therefore, since Linux 5.3 these route cache entries are dumped together
with regular routes during periodic KRT scans, which in some cases may be
huge amount of useless data. This can be avoided by using strict checking
for netlink dumps:

https://lore.kernel.org/netdev/20181008031644.15989-1-dsahern@kernel.org/

The patch mitigates the risk of receiving unknown and potentially large
number of FNHE records that would block BIRD I/O in each sync. There is a
known issue caused by the GRE tunnels on Linux that seems to be creating
one FNHE record for each destination IP address that is routed through
the tunnel, even when the PMTU equals to GRE interface MTU.

Thanks to Tomas Hlavacek for the original patch.
2022-01-14 21:53:40 +01:00
Ondrej Zajicek (work)
d0dd1d20cd Netlink: Explicitly skip received cloned routes
Kernel uses cloned routes to keep route cache entries, but reports them
together with regular routes. They were skipped implicitly as they
do not have rtm_protocol filled. Add explicit check for cloned flag
and skip such routes explicitly.

Also, improve debug logs of skipped routes.
2022-01-14 19:07:57 +01:00
Ondrej Zajicek (work)
60e9def9ef BGP: Add option 'free bind'
The BGP 'free bind' option applies the IP_FREEBIND/IPV6_FREEBIND
socket option for the BGP listening socket.

Thanks to Alexander Zubkov for the idea.
2022-01-09 02:44:32 +01:00
Alexander Zubkov
87a02489f3 IO: Support nonlocal bind in socket interface
Add option to socket interface for nonlocal binding, i.e. binding to an
IP address that is not present on interfaces. This behaviour is enabled
when SKF_FREEBIND socket flag is set. For Linux systems, it is
implemented by IP_FREEBIND socket flag.

Minor changes done by commiter.
2022-01-08 19:02:31 +01:00
Ondrej Zajicek (work)
bcb25084d3 Test: Activate some remaining build tests 2022-01-05 20:07:27 +01:00
Ondrej Zajicek (work)
f5c8fb5fba Netlink: Do not ignore dead routes from BIRD
Currently, BIRD ignores dead routes to consider them absent. But it also
ignores its own routes and thus it can not correctly manage such routes
in some cases. This patch makes an exception for routes with proto bird
when ignoring dead routes, so they can be properly updated or removed.

Thanks to Alexander Zubkov for the original patch.
2022-01-05 19:25:42 +01:00
Ondrej Zajicek (work)
77d032c71f Netlink: Improve multipath parsing errors
Function nl_parse_multipath() should handle errors internally.
2022-01-05 18:46:41 +01:00
Ondrej Zajicek (work)
29dda184e5 Conf: Fix parsing full-length IPv6 addresses
Lexer expression for bytestring was too loose, accepting also
full-length IPv6 addresses. It should be restricted such that
colon is used between every byte or never.

Fix the regex and also add some test cases for it.

Thanks to Alexander Zubkov for the bugreport
2022-01-05 16:38:49 +01:00
Matous
75aceadaf7 gitlab-ci.yml: failing gitlab runner fixed.
'registry.labs.nic.cz' -> 'registry.nic.cz' changed
2022-01-05 04:13:39 +01:00
Alexander Zubkov
77042292ff Doc: Document min/max operators for lists 2021-12-28 04:09:36 +01:00
Alexander Zubkov
0e1fd7ea6a Filter: Add operators to find minimum and maximum element of sets
Add operators .min and .max to find minumum or maximum element in sets
of types: clist, eclist, lclist. Example usage:

bgp_community.min
bgp_ext_community.max
filter(bgp_large_community, [(as1, as2, *)]).min

Signed-off-by: Alexander Zubkov <green@qrator.net>
2021-12-28 04:07:09 +01:00
Alexander Zubkov
e15e465720 Doc: Document community components access operators 2021-12-28 04:07:09 +01:00
Alexander Zubkov
a2a268da4f Filter: Add operators to pick community components
Add operators that can be used to pick components from
pair (standard community) or lc (large community) types.
For example:

(10, 20).asn --> 10
(10, 20).data --> 20

(10, 20, 30).asn --> 10
(10, 20, 30).data1 --> 20
(10, 20, 30).data2 --> 30

Signed-off-by: Alexander Zubkov <green@qrator.net>
2021-12-28 04:07:00 +01:00
Ondrej Zajicek (work)
a39cd2cc0b BSD: Assume onlink flag on ifaces with only host addresses
The BSD kernel does not support the onlink flag and BIRD does not use
direct routes for next hop validation, instead depends on interface
address ranges. We would like to handle PtMP cases with only host
addresses configured, like:

  ifconfig wg0 192.168.0.10/32
  route add 192.168.0.4 -iface wg0
  route add 192.168.0.8 -iface wg0

To accept BIRD routes with onlink next-hop, like:

  route 192.168.42.0/24 via 192.168.0.4%wg0 onlink

BIRD would dismiss the route when receiving from the kernel, as the
next-hop 192.168.0.4 is not part of any interface subnet and onlink
flag is not kept by the BSD kernel.

The commit fixes this by assuming that for routes received from the
kernel, any next-hop is onlink on ifaces with only host addresses.

Thanks to Stefan Haller for the original patch.
2021-12-27 21:00:04 +01:00
Job Snijders
b9f38727a7 RPKI: Add contextual out-of-bound checks in RTR Prefix PDU handler
RFC 6810 and RFC 8210 specify that the "Max Length" value MUST NOT be
less than the Prefix Length element (underflow). On the other side,
overflow of the Max Length element also is possible, it being an 8-bit
unsigned integer allows for values larger than 32 or 128. This also
implicitly ensures there is no overflow of "Length" value.

When a PDU is received where the Max Length field is corrputed, the RTR
client (BIRD) should immediately terminate the session, flush all data
learned from that cache, and log an error for the operator.

Minor changes done by commiter.
2021-12-18 16:35:28 +01:00
Simon Ruderich
00410fd6c1 Doc: bgp: remove "advertise ipv4"
The option was removed in d15b0b0a ("BGP redesign", 2016-12-07)
but the documentation wasn't updated.
2021-12-18 03:17:48 +01:00
Ondrej Zajicek (work)
b21104c97e Nest: Do not ignore secondary flag changes in ifa updates
Compare all IA_* flags that are set by sysdep iface code.

The old code ignores IA_SECONDARY flag when comparing whether iface
address updates from kernel changed anything. This is usually not an
issue as kernel removes all secondary addresses due to removal of the
primary one, but it breaks when sysctl 'promote_secondaries' is enabled
and kernel promotes secondary addresses to primary ones.

Thanks to 'Alexander' for the bugreport.
2021-12-18 01:09:52 +01:00
Ondrej Zajicek (work)
78ddfd2600 Trie: Clarify handling of less-common net types
For convenience, Trie functions generally accept as input values not only
NET_IPx types of nets, but also NET_VPNx and NET_ROAx types. But returned
values are always NET_IPx types.
2021-12-02 03:35:29 +01:00
Maria Matejka
f772afc525 Memory statistics split into Effective and Overhead
This feature is intended mostly for checking that BIRD's allocation
strategies don't consume much memory space. There are some cases where
withdrawing routes in a specific order lead to memory fragmentation and
this output should give the user at least a notion of how much memory is
actually used for data storage and how much memory is "just allocated"
or used for overhead.

Also raising the "system allocator overhead estimation" from 8 to 16
bytes; it is probably even more. I've found 16 as a local minimum in
best scenarios among reachable machines. I couldn't find any reasonable
method to estimate this value when BIRD starts up.

This commit also fixes the inaccurate computation of memory overhead for
slabs where the "system allocater overhead estimation" was improperly
added to the size of mmap-ed memory.
2021-11-27 22:54:15 +01:00
Ondrej Zajicek (work)
14fc24f3a5 Trie: Implement longest-prefix-match queries and walks
The prefix trie now supports longest-prefix-match query by function
trie_match_longest_ipX() and it can be extended to iteration over all
covering prefixes for a given prefix (from longest to shortest) using
TRIE_WALK_TO_ROOT_IPx() macro.
2021-11-26 03:26:36 +01:00
Maria Matejka
644e9ca94e Directly mapped pages are kept for future use if temporarily not needed 2021-11-24 19:42:52 +00:00
Ondrej Zajicek (work)
062e69bf52 Trie: Implement trie walking code
Trie walking allows enumeration of prefixes in a trie in the usual
lexicographic order. Optionally, trie enumeration can be restricted
to a chosen subnet (and its descendants).
2021-11-19 18:04:32 +01:00
Ondrej Zajicek (work)
71c18d9f53 Trie: Simplify network matching code
Introduce ipX_prefix_equal() and use it to simplify network matching code.
2021-11-13 21:11:18 +01:00
Maria Matejka
60880b539b Extended route trace: logging Path Identifiers 2021-11-09 17:42:36 +01:00
Ondrej Zajicek (work)
9f24fef5e9 Conf: Fix crash during shutdown
BIRD implements shutdown by reconfiguring to fake empty configuration.
Such fake config structure is created from the last running config and
shares some data, including symbol table. This allows access to (removed)
routing tables and causes crash when 'show route' command is used during
shutdown.

Clean up symbol table, table list and links to default tables, so removed
routing tables cannot be accessed during shutdown.
2021-10-20 01:51:28 +02:00
Maria Matejka
0b295d695a Dropping the unused rte_same hook 2021-10-13 19:09:05 +02:00
Maria Matejka
89ff49f8f0 Dropping rte-local dumper entries 2021-10-13 19:09:05 +02:00
Maria Matejka
e42eedb912 Kernel: Convert the rte-local attributes to extended attributes and flags to pflags 2021-10-13 19:09:04 +02:00
Maria Matejka
5cff1d5f02 Route: moved rte_src pointer from rta to rte
It is an auxiliary key in the routing table, not a route attribute.
2021-10-13 19:09:04 +02:00
Maria Matejka
d5a32563df Preexport: No route modification, no linpool needed 2021-10-13 19:09:04 +02:00
Maria Matejka
541881bedf RIP fixup + dropping the tmp_attrs mechanism as obsolete 2021-10-13 19:09:04 +02:00
Maria Matejka
3660f19dd5 Dropping the RTS_DUMMY temporary route storage.
Kernel route sync is done by other ways now and this code is not used
currently.
2021-10-13 19:09:04 +02:00
Maria Matejka
eb937358c0 Preference moved to RTA and set explicitly in protocols 2021-10-13 19:09:04 +02:00
Maria Matejka
cee0cd148c Export table: Delay freeing of old stored route.
This is needed to provide the protocols the full old route after filters
when export table is enabled.
2021-10-13 19:09:04 +02:00
Maria Matejka
ddd89ba12d BGP: Moved the suppressed and stale flags to pflags 2021-10-13 19:09:04 +02:00
Maria Matejka
c507fb41bb Babel: Convert the rte-local attributes to extended attributes 2021-10-13 19:09:04 +02:00
Maria Matejka
8216ec3027 There may be a symbol with NULL protocol when reconfiguring 2021-10-13 19:09:04 +02:00
Maria Matejka
5f0cb61d82 OSPF: Convert the rte-local attributes to extended attributes 2021-10-13 19:09:04 +02:00
Maria Matejka
8ac20511e1 Show route may be accidentally called on shutdown also when not all default tables are present 2021-10-13 19:09:04 +02:00
Maria Matejka
a0e4c66404 RIP: convert the rte-local attributes to extended attributes 2021-10-13 19:09:04 +02:00
Maria Matejka
6e13df70fd Extended route attributes may include also pointers 2021-10-13 19:09:04 +02:00
Maria Matejka
d471d5fc7c IGP metric getter refactoring to protocol callback
Direct protocol hooks for IGP metric inside nest/rt-table.c make the
protocol API unnecessarily complex. Instead, we use a proper callback.
2021-10-13 19:09:04 +02:00
Maria Matejka
a54f75f454 fixup! Multipage allocation 2021-10-13 19:08:35 +02:00
Maria Matejka
6cd3771378 Multipage allocation
We can also quite simply allocate bigger blocks. Anyway, we need these
blocks to be aligned to their size which needs one mmap() two times
bigger and then two munmap()s returning the unaligned parts.

The user can specify -B <N> on startup when <N> is the exponent of 2,
setting the block size to 2^N. On most systems, N is 12, anyway if you
know that your configuration is going to eat gigabytes of RAM, you are
almost forced to raise your block size as you may easily get into memory
fragmentation issues or you have to raise your maximum mapping count,
e.g. "sysctl vm.max_map_count=(number)".
2021-10-13 19:01:22 +02:00
Maria Matejka
3a31c3aad6 CLI socket accept() may also fail and should produce some message, not a coredump. 2021-10-13 19:00:36 +02:00
Maria Matejka
d322ee3d54 OSPF: explicitly stop the periodic tick on shutdown to avoid recalculation races 2021-10-13 19:00:36 +02:00
Maria Matejka
e5a8eec6d7 Linpools may use pages instead of xmalloc 2021-10-13 19:00:36 +02:00
Maria Matejka
bea582cbb5 fixup! Bound allocated pages to resource pools with page caches to avoid unnecessary syscalls 2021-10-13 18:59:45 +02:00