Commit graph

555 commits

Author SHA1 Message Date
Maria Matejka 57308fb277 Page allocator: Fixed minor bugs and added commentary 2022-11-03 12:38:57 +01:00
Maria Matejka 9d03c3f56c Memory pages are not munmapped, instead we just madvise()
Memory unmapping causes slow address space fragmentation, leading in
extreme cases to failing to allocate pages at all. Removing this problem
by keeping all the pages allocated to us, yet calling madvise() to let
kernel dispose of them.

This adds a little complexity and overhead as we have to keep the
pointers to the free pages, therefore to hold e.g. 1 GB of 4K pages with
8B pointers, we have to store 2 MB of data.
2022-11-02 12:56:54 +01:00
Alexander Zubkov 0f2be469f8 KRT: Fix setting default preference
Changes in commit eb937358 broke setting of channel preference for alien
routes learned during scan. The preference was set only for async routes.
Move common attribute processing part of functions krt_learn_async() and
krt_learn_async() to a separate function to have only one place for such
changes.
2022-09-27 11:33:41 +02:00
Maria Matejka dc28c6ed1c Simplified the protocol hookup code in Makefiles 2022-08-18 22:07:30 +02:00
Ondrej Zajicek 082905a833 Merge branch 'master' into backport 2022-07-27 00:47:24 +02:00
Ondrej Zajicek 534d0a4b44 KRT: Scan routing tables separetely on linux to avoid congestion
Remove compile-time sysdep option CONFIG_ALL_TABLES_AT_ONCE, replace it
with runtime ability to run either separate table scans or shared scan.

On Linux, use separate table scans by default when the netlink socket
option NETLINK_GET_STRICT_CHK is available, but retreat to shared scan
when it fails.

Running separate table scans has advantages where some routing tables are
managed independently, e.g. when multiple routing daemons are running on
the same machine, as kernel routing table modification performance is
significantly reduced when the table is modified while it is being
scanned.

Thanks Daniel Gröber for the original patch and Toke Høiland-Jørgensen
for suggestions.
2022-07-24 02:15:20 +02:00
Maria Matejka 2e5bfeb73a Merge remote-tracking branch 'origin/master' into backport 2022-07-11 11:08:10 +02:00
Maria Matejka d429bc5c84 Merge commit 'beb5f78a' into backport 2022-07-11 10:41:17 +02:00
Maria Matejka 7e9cede1fd Merge version 2.0.10 into backport 2022-07-10 14:19:24 +02:00
Ondrej Zajicek (work) 946cedfcfe Filter: Implement soft scopes
Soft scopes are anonymous scopes that most likely do not contain any
symbol, so allocating regular scope is postponed when it is really
needed.
2022-06-27 21:13:31 +02:00
Maria Matejka beb5f78ada Preexport callback now takes the channel instead of protocol as argument
Passing protocol to preexport was in fact a historical relic from the
old times when channels weren't a thing. Refactoring that to match
current extensibility needs.
2022-06-27 19:04:24 +02:00
Ondrej Zajicek f39e9aa203 IO: Improve resolution of latency debugging messages 2022-06-04 17:54:08 +02:00
Maria Matejka 9eec503b25 Fixed a munmap abort bug
When BIRD was munmapping too many pages, it sometimes aborted, saying
that munmap failed with "Not enough memory" as the address space was
getting more and more fragmented.

There is a workaround in place, simply keeping that page for future use,
yet it has never been compiled in because I somehow forgot to include
errno.h. And because I also thought that somebody may have ENOMEM not
defined (why?!), there was a check which quietly omitted that
workaround.

Anyway, ENOMEM is POSIX. It's an utter nonsense to check for its
existence. If it doesn't exist, something is broken.
2022-04-13 11:36:54 +02:00
Maria Matejka 4a23ede2b0 Protocols have their own explicit init routines 2022-04-06 18:14:08 +02:00
Maria Matejka 4e60b3ee72 Fixed a static assert in page allocator 2022-03-09 13:28:03 +01:00
Maria Matejka 19e727a248 Merge commit '60880b539b8886f76961125d89a265c6e1112b7a' into haugesund 2022-03-09 11:29:56 +01:00
Maria Matejka 24773af9e0 Merge commit 'e42eedb9' into haugesund 2022-03-09 11:02:55 +01:00
Maria Matejka 83d9920f90 Merge commit '5cff1d5f' into haugesund
Conflicts:
      proto/bgp/attrs.c
      proto/pipe/pipe.c
2022-03-09 10:56:06 +01:00
Maria Matejka ff47cd80dd Merge commit 'd5a32563' into haugesund 2022-03-09 10:50:38 +01:00
Maria Matejka 9e60a1fbc3 Fixed resource initialization in unit tests 2022-03-09 10:30:42 +01:00
Maria Matejka eeec9ddbf2 Merge commit '0c59f7ff' into haugesund 2022-03-09 09:13:55 +01:00
Maria Matejka 0c59f7ff01 Revert "Bound allocated pages to resource pools with page caches to avoid unnecessary syscalls"
This reverts commit 7f0e598208.
2022-03-09 09:13:31 +01:00
Maria Matejka 1c7df2c240 Revert "Multipage allocation"
This reverts commit 6cd3771378.
2022-03-09 09:13:20 +01:00
Maria Matejka c78247f9b9 Single-threaded version of sark-branch memory page management 2022-03-09 09:10:44 +01:00
Maria Matejka 48bf1322aa Introducing an universal temporary linpool flushed after every task 2022-03-02 12:13:49 +01:00
Maria Matejka d071aca7aa Merge commit '2c13759136951ef0e70a3e3c2b2d3c9a387f7ed9' into haugesund 2022-03-02 10:01:44 +01:00
Ondrej Zajicek (work) 2fc8b4c4ba Alloc: Use posix_memalign() instead of aligned_alloc()
For compatibility with older systems use posix_memalign(). We can
switch to aligned_alloc() when we commit to C11 for multithreading.
2022-02-08 22:42:00 +01:00
Alexander Zubkov 87a02489f3 IO: Support nonlocal bind in socket interface
Add option to socket interface for nonlocal binding, i.e. binding to an
IP address that is not present on interfaces. This behaviour is enabled
when SKF_FREEBIND socket flag is set. For Linux systems, it is
implemented by IP_FREEBIND socket flag.

Minor changes done by commiter.
2022-01-08 19:02:31 +01:00
Maria Matejka 644e9ca94e Directly mapped pages are kept for future use if temporarily not needed 2021-11-24 19:42:52 +00:00
Maria Matejka e42eedb912 Kernel: Convert the rte-local attributes to extended attributes and flags to pflags 2021-10-13 19:09:04 +02:00
Maria Matejka 5cff1d5f02 Route: moved rte_src pointer from rta to rte
It is an auxiliary key in the routing table, not a route attribute.
2021-10-13 19:09:04 +02:00
Maria Matejka d5a32563df Preexport: No route modification, no linpool needed 2021-10-13 19:09:04 +02:00
Maria Matejka 541881bedf RIP fixup + dropping the tmp_attrs mechanism as obsolete 2021-10-13 19:09:04 +02:00
Maria Matejka eb937358c0 Preference moved to RTA and set explicitly in protocols 2021-10-13 19:09:04 +02:00
Maria Matejka 6cd3771378 Multipage allocation
We can also quite simply allocate bigger blocks. Anyway, we need these
blocks to be aligned to their size which needs one mmap() two times
bigger and then two munmap()s returning the unaligned parts.

The user can specify -B <N> on startup when <N> is the exponent of 2,
setting the block size to 2^N. On most systems, N is 12, anyway if you
know that your configuration is going to eat gigabytes of RAM, you are
almost forced to raise your block size as you may easily get into memory
fragmentation issues or you have to raise your maximum mapping count,
e.g. "sysctl vm.max_map_count=(number)".
2021-10-13 19:01:22 +02:00
Maria Matejka 3a31c3aad6 CLI socket accept() may also fail and should produce some message, not a coredump. 2021-10-13 19:00:36 +02:00
Maria Matejka 7f0e598208 Bound allocated pages to resource pools with page caches to avoid unnecessary syscalls 2021-09-10 18:13:50 +02:00
Maria Matejka 227e2d5541 Debug output uses local buffer to avoid clashes between threads. 2021-09-10 17:37:46 +02:00
Ondrej Zajicek (work) 47d92d8f9d Nest: Clean up main channel handling
Remove assumption that main channel is the only channel.
2021-09-10 17:32:05 +02:00
Ondrej Zajicek (work) f761be6b30 Nest: Clean up main channel handling
Remove assumption that main channel is the only channel.
2021-06-17 16:56:51 +02:00
Ondrej Zajicek (work) e5724f71d2 sysdep: Add wrapper to get random bytes - update
Simplify the code and fix an issue with getentropy() return value.
2021-06-06 16:26:06 +02:00
Toke Høiland-Jørgensen c48ebde5ce sysdep: Add wrapper to get random bytes
Add a wrapper function in sysdep to get random bytes, and required checks
in configure.ac to select how to do it. The configure script tries, in
order, getrandom(), getentropy() and reading from /dev/urandom.
2021-06-06 16:26:06 +02:00
Ondrej Zajicek (work) a2277975d7 Unix: Expand accepted ranges of iproute2 constants
We support 32bit table and realm/flow ids, we should also accept them as
constants.

Thanks to Patrick Hemmer for the bugreport.
2021-04-07 16:14:20 +02:00
Maria Matejka ff397df7ed Routing table is now a resource allocated from its own pool
This also fixes memory leaks from import/export tables being never
cleaned up and freed.
2021-03-30 21:56:08 +02:00
Maria Matejka 886dd92eee Slab: head now uses bitmask for used/free nodes info instead of lists
From now, there are no auxiliary pointers stored in the free slab nodes.
This led to strange debugging problems if use-after-free happened in
slab-allocated structures, especially if the structure's first member is
a next pointer.

This also reduces the memory needed by 1 pointer per allocated object.
OTOH, we now rely on pages being aligned to their size's multiple, which
is quite common anyway.
2021-03-25 16:47:48 +01:00
Ondrej Zajicek (work) 7be3af7fa6 Rate-limit scheduling of work-events
In general, events are code handling some some condition, which is
scheduled when such condition happened and executed independently from
I/O loop. Work-events are a subgroup of events that are scheduled
repeatedly until some (often significant) work is done (e.g. feeding
routes to protocol). All scheduled events are executed during each
I/O loop iteration.

Separate work-events from regular events to a separate queue and
rate limit their execution to a fixed number per I/O loop iteration.
That should prevent excess latency when many work-events are
scheduled at one time (e.g. simultaneous reload of many BGP sessions).
2021-03-12 15:35:56 +01:00
Vincent Bernat 714238716e BGP: Add support for BGP hostname capability
This is an implementation of draft-walton-bgp-hostname-capability-02.
It is implemented since quite some time for FRR and in datacenter, this
gives a nice output to avoid using IP addresses.

It is disabled by default. The hostname is retrieved from uname(2) and
can be overriden with "hostname" option. The domain name is never set
nor displayed.

Minor changes by committer.
2021-02-10 16:53:57 +01:00
Ondrej Zajicek (work) 2a8cc7259e Kernel: Do not check templates
So one can define kernel protocol template without channels.
For other protocols, it is either irrelevant or already done.

Thanks to Clemens Schrimpe for the bugreport.
2021-01-07 01:56:00 +01:00
Ondrej Zajicek (work) 62d57b9bdf Log: Fix locking during log reconfiguration
The log subsystem should be locked earlier, as default_log_list() may
internally manipulate with the current_log_list (if it is also a default
log list).
2020-11-25 15:15:13 +01:00
Ondrej Zajicek (work) 0ef082c51e Log: Reinitialize the static logging structures
The static logging structures are reused, we need to reinitialize them
otherwise add_tail() would fail in debug build. Reinitializing these
structures should be fine as the list they belong to is being
reinitialized on entry to the very same function.

Thanks to Andreas Rammhold and Mikael Magnusson for patches.
2020-11-25 15:04:34 +01:00