system:annoyances

/var filesystem space

latest (in approximately chronological order):

# date -Iseconds && df /var && du -x /var | sort -bnr | head -n 12
2021-04-24T06:11:53+00:00
Filesystem            1K-blocks    Used Available Use% Mounted on
/dev/mapper/balug-var   6657552 6128292    252300  97% /var
6116556 /var
2102544 /var/lib
1233792 /var/log
1175268 /var/spool
1122516 /var/mail
1089844 /var/spool/exim4
732660  /var/lib/fail2ban
645068  /var/spool/exim4/input
616584  /var/lib/clamav
541756  /var/log/apache2
475888  /var/log/exim4
369448  /var/spool/exim4/msglog
# 

2021-03-20 or thereabout - had already increased the VM (virtual) drive from 16 GiB to 20 GiB, in significant part to handle more space on /var (and anticipated space on /usr) … but not necessarily exclusively for that: disk is cheap, don't be stingy ... oops! ;-)
Analysis / divide and conquer … where is most of the newly consumed space going? From slightly earlier we have,
from around 2021-04-23 (2021-04-24T04:40:57Z):

# du -x /var | sort -bnr | head -n 12
5866812 /var
2102592 /var/lib
1249060 /var/log
1017624 /var/spool
1015100 /var/mail
932200  /var/spool/exim4
731780  /var/lib/fail2ban
616584  /var/lib/clamav
531456  /var/log/apache2
505952  /var/log/exim4
491980  /var/spool/exim4/input
364892  /var/spool/exim4/msglog
# 

and analysing …:

differences (latest minus earlier, in 1K blocks; negative means it shrank), in approximate order: generally first the largest of the most specific paths, then the smaller fairly specific ones, then the less specific ones by size
+153088 /var/spool/exim4/input
+157644 /var/spool/exim4
+157644 /var/spool
+107416 /var/mail
-30064 /var/log/exim4
-15268 /var/log
+10300 /var/log/apache2
+4556 /var/spool/exim4/msglog
+880 /var/lib/fail2ban
-48 /var/lib
+249744 /var
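
The comparison above was done by hand; something like the following could automate it. This is only a rough sketch: du_delta is a made-up helper name, it expects two saved "du -x" listings (size, then path), and it assumes paths contain no whitespace, which holds for the directories shown here.

```shell
#!/bin/sh
# du_delta OLD NEW
# Print "growth path" for every path present in both saved du listings,
# largest growth first. Sizes are whatever unit du printed (1K blocks here).
# Hypothetical helper, not something from the original notes.
du_delta() {
    awk '
        NR == FNR { old[$2] = $1; next }          # first listing: remember size by path
        ($2 in old) { print $1 - old[$2], $2 }    # second listing: new size minus old size
    ' "$1" "$2" | sort -k1,1nr
}

# Example use (the file names are placeholders):
#   du -x /var > /root/du.old      # earlier snapshot
#   du -x /var > /root/du.new      # later snapshot
#   du_delta /root/du.old /root/du.new | head -n 12
```

Negative numbers in the output mean the path shrank between the two snapshots (log rotation, for example).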

So, most likely getting chewed up by spam and processing thereof - no significant surprise on that.
Peeking closer, looks like crud spam is somehow making it into the queue … then just getting stuck there - and presumably eventually failing.
For the moment, temporarily stopping exim4.
Maybe has something to do with recent bits adding mailman3 and related … or perhaps has nothing to do with that and might just be coincidental on the (approximate) timing. The times do rather correlate … but correlation isn't necessarily causation.
In any case, would appear at least some important bits of the exim4 configuration aren't correct or are no longer correct/functional.
This may also have to do with use of eximconfig and it being quite out-of-date and unsupported (may have been a good idea at some point in the past, but has currently outlived most any direct usefulness, and may even now be more problematic than useful). That might also be exacerbated by some of the "anti-spam" services it uses, … some of which may have even been taken over by spammers by this point in time.

So … slightly earlier, set up a tmp.balug.org VM to work on configuration of mailman3, etc.: Debian 10 with exim4 and mailman, etc. So … let's slightly repurpose that VM. Earlier it was intended as a testbed to get all the exim4/mailman3/mailman stuff worked out, with a clean exim4 config and without the obsolete, unsupported eximconfig, get that working, and then - with suitable changes for IPs, domains, lists, etc. - migrate that over to the balug VM. Well, let's just amend the purpose to prioritize a clean, functioning exim4 config that plays nice with mailman (mailman2), and vice versa - and worry about the mailman3 bits later (lower priority presently).

And for now, will still leave exim4 stopped on the balug VM - probably more harm than good to run it presently, and having it off for several hours or so, not a huge deal. 72+ hours, however, would be a big deal, so, … hopefully maybe have this "all better" in … oh, 12 hours, or maybe way less? Shall see.

tmp.balug.org VM … repurposing …

  • ripping out the mailman3 stuff for now (much of it wasn't yet fully configured anyway - mostly just complicates things at the moment)
  • … dang, 1 GiB of (virtual) RAM, and … fork failed due to out of memory … ugh, don't really want to give the VM more RAM … but it has zero swap, so …
  • don't have LVM on this tmp.balug.org VM … oh well, whatever, swap done as file - "good enough" for now.
  • installed mailman
  • purged mailman3 packages & related, including autoremove
  • exim4 only listening on localhost - good enough for now
  • hostname isn't set properly … corrected … & rebooted
  • DNS: stripping the IPv6 (routable & related) bits out for now (haven't got all the IPv6 routing set up - don't need it presently for these tests) … removed the AAAA records, but left the rest alone (presumably will reintroduce those AAAA records in future, PTR will also help track what it's intended that IP will get used for)
  • removed Internet routable IPv6 address(es)
  • renamed the VM to tmp.balug.org for better consistency (and likewise its storage file to tmp.balug.org.sda)
  • purged these packages (redundant - have 'em on the balug VM and more accessible there): exim4-doc-html exim4-doc-info
  • exim4 has a very basic local-only config, let's see if we can fairly easily reconfigure that (at least a bit) for Internet …
    # DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
  • That should now be enough for basic (mostly) functional Internet email sending/receiving … but we may generally fail much on outbound due to lack of established reputation on the IPv4 and/or missing "reverse" DNS for IPv4
    mx-tmp.balug.org. 300 IN A 96.86.170.228
    tmp.balug.org. 300 IN A 96.86.170.228
    tmp.balug.org. 300 IN MX 0 mx-tmp.balug.org.
    tmp.balug.org. 300 IN TXT "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f"
    228.170.86.96.in-addr.arpa. 3600 IN PTR 96-86-170-228-static.hfc.comcastbusiness.net.
    Actually, do have a "reverse" IPv4, so that might be "good enough".
    Also, don't have mailman configured yet, so that won't work … yet.
  • basic test from Internet to @tmp.balug.org worked, lots more to test …
  • postmaster@tmp.balug.org to Internet worked … can the incoming defend against some of the more egregious spam attempts? …
  • rejects relay attempt
  • also rejects relay attempt when spoofing VM host - even when "reverse" DNS also spoofs same localhost name, so that's good, and looks like it may already be somewhat better than on that balug VM (looks like spam was making it past that and getting snagged/rejected later in the process - but at huge cost to filesystem space and logs and such). So, now "just" get mailman (mailman2) working "well enough", and should have a base model template that can be applied to balug to improve relative to current state … though quite a bit more anti-spam should also be added to reject a bunch of other crud too.
  • So, mailman (mailman2) - need that to be able to handle using pipes in aliases … that's configured on the balug VM, but not tmp.balug.org VM … it's somewhere in the config … but not trivial to find and nail down exactly where, so … resuming working on that … will try to more directly compare the two configs between the two hosts, … and see where that may be buried - there's a lot 'o config stuff for exim4.

Relevant difference should be somewhere in …

$ diff <(ssh -anx -l root -o BatchMode=yes tmp.balug.org. 'cat /var/lib/exim4/config.autogenerated') <(ssh -4anx -l mpaoli -o BatchMode=yes balug-sf-lug-v2.balug.org. 'sudo cat /var/lib/exim4/config.autogenerated') | wc -l
124
$ 

And … crud, doesn't appear to be in there. But, … appears, in whole or (large) part, … balug VM may mostly be running what's based on a much older configuration - so that may be complicating things more / in addition to the old eximconfig stuff:

34c37
< MAIN_PACKAGE_VERSION=4.92-8+deb10u5
---
> MAIN_PACKAGE_VERSION=4.84.2-2+deb8u3

So, that may also explain why I'm not easily finding the pipe configuration on the balug VM - as its configuration may be too different/older for me to be able to find the relevant bit that would match (notably configuration variables) from the tmp.balug.org VM.
So … let's go back to just more directly trying to enable it on tmp.balug.org VM, without consulting the config on the balug VM to try to determine how to do that.
Looks like /usr/share/doc/mailman/README.Exim4.Debian.gz gives pretty good instructions/outline to configure mailman (mailman2) to work with exim4, and how to get the alias bits and such working. Let's try implementing that. Already created one list named mailman … but might be simpler to rip that out, and recreate it after the other config bits are in place.
balug VM already has in /etc/mailman/mm_cfg.py:

MTA = 'Postfix'
POSTFIX_ALIAS_CMD = '/bin/true'
POSTFIX_MAP_CMD = 'chgrp Debian-exim'
POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org', 'temp.balug.org' ]
POSTFIX_MAP_CMD = 'chmod o+r'

So, for tmp.balug.org VM we do:

MTA = 'Postfix'
POSTFIX_ALIAS_CMD = '/bin/true'
POSTFIX_MAP_CMD = 'chmod o+r'
POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'tmp.balug.org' ]

And let's also fix up on balug VM (POSTFIX_MAP_CMD probably shouldn't be in there twice - probably doesn't hurt as the last probably overrides … but just once would be more clear for the humans … also, temp.balug.org is long since obsolete and should be removed)

MTA = 'Postfix'
POSTFIX_ALIAS_CMD = '/bin/true'
POSTFIX_MAP_CMD = 'chmod o+r'
POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org' ]

The documented instructions/example uses exim4 split configuration, … so, we change that to split configuration … # DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
So, … pipe alias stuff still not quite working. And "of course", we already see evidence of miscreants and their bots:

# hostname; tail -n 1 rejectlog
tmp.balug.org
2021-04-24 18:04:25 rejected EHLO from [122.228.19.80]: syntactically invalid argument(s): []
# 

pipe alias bits …

#  tail -n 1 mainlog
2021-04-24 19:53:02 1laOKs-0002Km-9R == |/var/lib/mailman/mail/mailman owner mailman <mailman-owner@tmp.balug.org> R=system_aliases defer (-30): pipe_transport unset in system_aliases router
# 

added:

# cat /etc/exim4/conf.d/main/000_localmacros
SYSTEM_ALIASES_PIPE_TRANSPORT = address_pipe
# 

That's closer, but still not quite there:

The following text was generated during the delivery attempt:

------ pipe to |/var/lib/mailman/mail/mailman owner mailman
       generated by mailman-owner@tmp.balug.org ------

Group mismatch error.  Mailman expected the mail
wrapper script to be executed as group "daemon", but
the system's mail server executed the mail script as
group "Debian-exim".  Try tweaking the mail server to run the
script as group "daemon", or re-run configure, 
providing the command line option `--with-mail-gid=Debian-exim'.

… and finally working with:

# cat /etc/exim4/conf.d/main/000_localmacros
SYSTEM_ALIASES_PIPE_TRANSPORT = address_pipe
SYSTEM_ALIASES_GROUP = daemon
# 

So … that's pretty good. Should add a wee bit more before using that as template for the balug VM:

  • put TLS cert in place for STARTTLS
  • reconfigure exim4 to limit what source IPs it uses (listening on all is fine, but sending to Internet should only use IPs properly set up with SPF & "reverse" DNS)

Configured the tmp.balug.org VM with proper cert (already had suitable matching wildcard cert) and working STARTTLS using that cert,
also configured the tmp.balug.org VM to use only the specified source IP addresses.
So, that mostly should be a good "template" for the balug VM … with, 'of course' suitable specific config changes to be made for the balug VM.
Also, before restart of exim4 on the balug VM, should clear out the massive pile of crud that's in the queue … but without clobbering any legitimate stuff that may still be in there. So, let's move on to cleaning that up first, before making the other configuration changes to the balug VM.
So, analyzing what's in queue, we have:

  19909 <>
  16089 <error@balug.org>
     92 <balug-admin-bounces@lists.balug.org>
     72 <balug-talk-bounces@lists.balug.org>
     64 <balug-test-bounces@lists.balug.org>
     59 <balug-announce-bounces@lists.balug.org>

Those are envelope FROM addresses, by count, where the empty <> FROM are bounce messages.
So … the top two can be dropped as, at best, unimportant (and probably mostly crud … spam and backscatter thereof),
the rest probably deserve some (semi-)manual closer inspection - many may be deferred bits on envelope TO addresses with issues, and hence hanging around for redelivery attempt(s).
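
The commands used to get those per-sender counts aren't recorded here, so the following is only a sketch of one way to produce them. It assumes the usual "exim4 -bp" queue listing, where each message starts with a line of age, size, message ID, and envelope sender, followed by indented recipient lines. The function name count_senders is made up for illustration.

```shell
#!/bin/sh
# count_senders: read an "exim4 -bp"-style queue listing on stdin and print
# the number of queued messages per envelope sender, most frequent first.
# Assumption: message header lines have at least 4 fields, the 3rd being a
# message ID of the form xxxxxx-xxxxxx-xx; recipient lines have fewer fields.
count_senders() {
    awk 'NF >= 4 && $3 ~ /^[0-9A-Za-z]+-[0-9A-Za-z]+-[0-9A-Za-z]+$/ { print $4 }' |
        sort | uniq -c | sort -k1,1nr
}

# Example use (as root, on the host with the queue):
#   exim4 -bp | count_senders | head
```

Anything then slated for removal is best reviewed by message ID first, since a bare sender count says nothing about whether a given message is legitimate.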
Bounce messages in queue, we've got (again by count):

  14697 <> *** frozen *** error@balug.org
   5212 <> *** frozen *** error@balug.org D balug@balug.org

So those can all go (simplified/consolidated queue listing down to 1 line per item in queue).
Checked further on the bounce (envelope FROM <>) messages - improbable there's anything legitimate/important there … removed them (19909 messages) from queue.
Likewise checked on the queued mail with envelope FROM <error@balug.org> - improbable there's anything legitimate/important there … removed them (16089 messages) from queue.
The few hundred or so messages remaining in queue … (mostly) legitimate? Let's have a look … and checked a fair randomized sample, they look to be mostly to entirely legitimate.
So, to do, remains approximately, notably for balug VM:

  • (done) "replace" / merge in applicable configuration bits (most notably for exim4)
  • (done) before restarting exim4, let's temporarily extend the queue timeout - since we've now had exim4 down for some fair bit (around 24 hours or more).
  • (done) oh … should resize the queue directories for efficiency/space (on most *nix filesystem types, directories grow, but never shrink).
  • (done) restart exim4
  • (done) send out relevant follow-up list postings to BALUG-Admin and BALUG-Talk

There's then still further anti-spam stuff, etc. to do, but that should be "better enough" to reenable exim4 service.
So … resizing of those directories …

# du -sx /var/spool/exim4/input
9036    /var/spool/exim4/input
# df -h /var
Filesystem             Size  Used Avail Use% Mounted on
/dev/mapper/balug-var  6.4G  4.9G  1.3G  80% /var
# 

That's way the heck better, but still, directories to resize …:

/var/spool/exim4/input
 92 0   96 6   96 C  108 I  116 O  140 U   80 a  120 g   96 m  132 s  108 y
108 1   92 7  100 D  100 J   84 P   84 V  100 b   84 h   80 n  104 t  116 z
100 2   84 8   92 E   84 K  108 Q  120 W  124 c  140 i   88 o  104 u
136 3  104 9  132 F  156 L  116 R   88 X  108 d  116 j  100 p  108 v
120 4   96 A   80 G  116 M  124 S   80 Y   96 e   88 k   84 q  124 w
 88 5  128 B  100 H   88 N   96 T  104 Z  108 f  108 l  128 r  120 x
# cd /var/spool/exim4/
# mktemp -d /var/spool/exim4/input.tmp.XXXXXXXXXX
/var/spool/exim4/input.tmp.r6J0lcwTTJ
# (cd input && umask 077 && find . -xdev -depth -print0 | pax -rw -0dl -p e /var/spool/exim4/input.tmp.r6J0lcwTTJ/)
# mv input input.BAK && mv input.tmp.r6J0lcwTTJ input
# (cd input && pwd -P && ls -sd *)
/var/spool/exim4/input
4 0  4 4  4 8  4 C  4 G  4 K  4 O  4 S  4 W  4 a  4 e  4 i  4 m  4 q  4 u  4 y
4 1  4 5  4 9  4 D  4 H  4 L  4 P  4 T  4 X  4 b  4 f  4 j  4 n  4 r  4 v  4 z
4 2  4 6  4 A  4 E  4 I  4 M  4 Q  4 U  4 Y  4 c  4 g  4 k  4 o  4 s  4 w
4 3  4 7  4 B  4 F  4 J  4 N  4 R  4 V  4 Z  4 d  4 h  4 l  4 p  4 t  4 x
# find input.BAK -type f -links +1 -exec rm \{\} \;
# rmdir input.BAK/*/ input.BAK
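
To spot which directories have grown like that in the first place, a sketch along these lines may help. oversize_dirs is a hypothetical helper; the k suffix on -size is a GNU/BSD find extension rather than POSIX, and the 4 KiB default assumes a filesystem where an empty directory takes one 4 KiB block, as in the listing above.

```shell
#!/bin/sh
# oversize_dirs DIR [KIB]
# List directories under DIR (same filesystem only) whose own size, as
# reported by stat, exceeds KIB kibibytes (default 4), in "ls -sd" format.
# Hypothetical helper, not something from the original notes.
oversize_dirs() {
    find "$1" -xdev -type d -size +"${2:-4}"k -exec ls -sd {} +
}

# Example use:
#   oversize_dirs /var/spool/exim4
```

An empty result suggests there's nothing worth recreating.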

"replace" / merge in applicable configuration bits (most notably for exim4) …:

// Let's set aside the old ...:
# hostname && pwd && ls -d exim*
balug-sf-lug-v2.balug.org
/etc
exim4  exim4.original.pax.xz
# mv exim4 exim4.2021-04-25
# mv /var/lib/exim4/config.autogenerated /var/lib/exim4/config.autogenerated.2021-04-25
# cp -p /etc/mailman/mm_cfg.py /etc/mailman/mm_cfg.py.2021-04-25
# 
// and, we bring over archive of relevant config bits from the tmp.balug.org VM,
// and we'll first extract these with a .new suffix on the relevant top-level directory/file, and we'll leave it at .new until suitably adjusted to move in place.

$ ssh -anx -l root -o BatchMode=yes tmp.balug.org. 'cd / && umask 022 && tar -cf - etc/mailman/mm_cfg.py etc/exim4 var/lib/exim4/config.autogenerated etc/letsencrypt/live/exim4 | gzip -9' | ssh -4ax -l mpaoli -o BatchMode=yes balug-sf-lug-v2.balug.org. 'umask 077 && cat >$(mktemp /var/tmp/tmp.exim.XXXXXXXXXX.tar.gz)'

# hostname && chown 0:0 /var/tmp/tmp.exim.xBND4qYntI.tar.gz
balug-sf-lug-v2.balug.org
# ls -ld /etc/letsencrypt/live/exim4
ls: cannot access '/etc/letsencrypt/live/exim4': No such file or directory
# </var/tmp/tmp.exim.xBND4qYntI.tar.gz gzip -d | (cd / && tar -xpf - etc/letsencrypt)
# ls -ld /etc/letsencrypt/live/exim4
lrwxrwxrwx 1 root root 15 Apr 24 20:28 /etc/letsencrypt/live/exim4 -> lists.balug.org
# ls -lLd /etc/letsencrypt/live/exim4/*
-r--r--r--  1 root root        1919 Mar 31 09:11 /etc/letsencrypt/live/exim4/cert.pem
-r--r--r-- 12 root root        1586 Jan  3 23:11 /etc/letsencrypt/live/exim4/chain.pem
-r--r--r--  1 root root        3505 Mar 31 09:11 /etc/letsencrypt/live/exim4/fullchain.pem
-r--r-----  1 root Debian-exim 1708 Mar 31 09:11 /etc/letsencrypt/live/exim4/privkey.pem
# 
// and that is the appropriate cert for email/exim4 for the balug VM, so no changes needed on that bit
# mkdir /var/.new
# cd /var/.new
# </var/tmp/tmp.exim.xBND4qYntI.tar.gz gzip -d | tar -xpf - var/lib/exim4/config.autogenerated
# ls -ld /var/lib/exim4/config.autogenerated* var/lib/exim4/config.autogenerated
-rw-r--r-- 1 root Debian-exim 24252 Jun 12  2017 /var/lib/exim4/config.autogenerated.2021-04-25
-rw-r--r-- 1 root Debian-exim 27552 Apr 25 03:52 var/lib/exim4/config.autogenerated
# mv var/lib/exim4/config.autogenerated /var/lib/exim4/config.autogenerated.new
# cd
# find /var/.new -depth -type d -exec rmdir \{\} \;
# mkdir /etc/.new
# cd /etc/.new
# </var/tmp/tmp.exim.xBND4qYntI.tar.gz gzip -d | tar -xpf - etc/exim4
# ls -d /etc/exim4.new
ls: cannot access '/etc/exim4.new': No such file or directory
# ls -ld etc/exim4
drwxr-xr-x 3 root root 1024 Apr 24 19:03 etc/exim4
# mv etc/exim4 /etc/exim4.new
# cd
# find /etc/.new -depth -type d -exec rmdir \{\} \;
# cp -p /etc/mailman/mm_cfg.py /etc/mailman/mm_cfg.py.new
# </var/tmp/tmp.exim.xBND4qYntI.tar.gz gzip -d | tar -O -xf - etc/mailman/mm_cfg.py > /etc/mailman/mm_cfg.py.new
# pwd
/root
# vi /etc/mailman/mm_cfg.py.new
// ...
# mv /etc/mailman/mm_cfg.py.new /etc/mailman/mm_cfg.py
# diff /etc/mailman/mm_cfg.py.2021-04-25 /etc/mailman/mm_cfg.py | sed -e 's/SECRET = '\''[^'\'']*'\''/SECRET = '\''[REDACTED]'\''/'
73,74c73
< add_virtualhost('temp.balug.org', 'temp.balug.org')
< #add_virtualhost('lists.balug.org', 'lists.balug.org')
---
> # add_virtualhost('temp.balug.org', 'temp.balug.org')
85,86c84
< # Unset send_reminders on newly created lists
< #DEFAULT_SEND_REMINDERS = 0
---
> # set send_reminders on newly created lists
88a87,101
> # If the following is set to a non-empty string, this string in combination
> # with the time, list name and the IP address of the requestor is used to
> # create a hidden hash as part of the subscribe form on the listinfo page.
> # This hash is checked upon form submission and the subscribe fails if it
> # doesn't match.  I.e. the form posted must be first retrieved from the
> # listinfo CGI by the same IP that posts it.  The subscribe also fails if
> # the time the form was retrieved is more than the above FORM_LIFETIME or less
> # than the below SUBSCRIBE_FORM_MIN_TIME before submission.
> # Important: If you have any static subscribe forms on your web site, setting
> # this option will break them.  With this option set, subscribe forms must be
> # dynamically generated to include the hidden data.  See the code block
> # beginning with "if mm_cfg.SUBSCRIBE_FORM_SECRET:" in Mailman/Cgi/listinfo.py
> # for the details of the hidden data.
> SUBSCRIBE_FORM_SECRET = '[REDACTED]'
>
98,101d110
< MTA = 'Postfix'
< POSTFIX_ALIAS_CMD = '/bin/true'
< POSTFIX_MAP_CMD = 'chmod o+r'
< POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org' ]
122a132,141
> # ***** START bits per /usr/share/doc/mailman/README.Exim4.Debian.gz *****
> # And yes, the "Postfix" there is on purpose, it should not be replaced
> # by "exim4". It causes mailman to (among others) create a list of
> # mailman lists, including what virtual domain they should be in. That
> # is the information that is used here; the rest is ignored.
> MTA = 'Postfix'
> POSTFIX_ALIAS_CMD = '/bin/true'
> POSTFIX_MAP_CMD = 'chmod o+r'
> POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org' ]
> # ***** END bits per /usr/share/doc/mailman/README.Exim4.Debian.gz *****
# systemctl stop mailman.service
# systemctl start mailman.service
# cd /etc/exim4.new
// # vi ...
# mv /etc/exim4.new /etc/exim4
# DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
// apparently the only bit that changed:
# pwd -P && diff ../exim4.BAK/update-exim4.conf.conf update-exim4.conf.conf
/etc/exim4
20c20
< dc_other_hostnames=''
---
> dc_other_hostnames='balug.org; lists.balug.org'
# 
// temporarily increase max queue time from 4 days to 7 days:
# awk '{if($1~/^[^#]/||$1~/^#\*/||$0~/^# temp/)print;}' conf.d/retry/30_exim4-config
#*                      *           F,2h,15m; G,16h,1h,1.5; F,4d,6h
# temporarily up to 7 days:
*                      *           F,2h,15m; G,16h,1h,1.5; F,7d,6h
# 
// Theoretically should be good to go now ... let's start exim4 for a little bit, ... then stop it and look at logs, to see if things seem to be going okay.
# systemctl enable exim4.service
# systemctl start exim4.service && { sleep 180; systemctl stop exim4.service; }
# 
// checking over logs ... rejectlog looks good (lots of legitimate rejects in 3 minutes, no false positives)
// mainlog mostly looks good and as expected - the only particular bits that didn't seem as expected:
Berkeley DB error: BDB0058 page 19818: illegal page type or format
Berkeley DB error: BDB0060 PANIC: fatal region error detected; run recovery
Berkeley DB error: BDB0061 PANIC: Invalid argument
Berkeley DB error: BDB1581 File handles still open at environment close
Berkeley DB error: BDB1582 Open file handle: /var/spool/exim4/db/retry
// used db_recover
# systemctl start exim4.service
// still getting Berkeley DB error diagnostics
// stopped exim4, did a dump & (re)load of DB (with db_dump & db_load), restarted exim4 ... seems to be running okay now without those Berkeley DB errors

Analyzed the mail queue again. Found one more abuser with a bunch 'o queued mail.
That particular abuser had 332 queued mail messages - all of which were subscription requests that had been processed - but not confirmed, for the same email address and all from the same IPv4 address. All the queued emails were confirmation emails - emails to that email address to get confirmation of the subscription request. The email domain appears legitimate, but the IP address dubious at best (no reverse DNS, etc.)
Anyway, removed those 332 queued email messages … that then dropped the queue to only 20 remaining queued messages - all of which appear legitimate.
Analyzed logs further, notably for web and email traffic/attempts. Looks like most all that problematic email was from bad web bots repeatedly and voluminously subscribing (well, attempting to subscribe) that, and one other email address, to BALUG's various lists - causing confirmation emails to be queued. Looks like two such emails got delivered, but all (or almost all?) of the others got deferred by the receiving MTAs (there were only 2 email addresses). So, perhaps bad bot trying to do DoS/DDoS against those two target emails? Could potentially block the IP address but … whack-a-mole - would likely just pop up on another IP.
Checked the mail queue again - after subtracting out target addresses that have already been successfully delivered to, there remain at the moment only 6 unique email addresses presently showing any delivery issues.

More anti-spam to do … SPF … looks like config files can have that enabled …

conf.d/acl/30_exim4-config_check_rcpt
  # This is quite costly in terms of DNS lookups (~6 lookups per mail).  Do not
  # enable if that's an issue.  Also note that if you enable this, you must
  # install "spf-tools-perl" which provides the spfquery command.
  # Missing spf-tools-perl will trigger the "Unexpected error in
  # SPF check" warning.
  .ifdef CHECK_RCPT_SPF
  deny
    message = [SPF] $sender_host_address is not allowed to send mail from \
              ${if def:sender_address_domain {$sender_address_domain}{$sender_helo_name}}.  \
              Please see \
              http://www.openspf.org/Why?scope=${if def:sender_address_domain \

$ dpkg -l spf-tools-perl | grep '^ii '
ii  spf-tools-perl 2.9.0-4      all          SPF tools (spfquery, spfd) based on the Mail::SPF Perl module
$ nc -z www.openspf.org. 80
nc: unable to connect to address www.openspf.org., service 80
$ nc -z www.openspf.org. 443
nc: unable to connect to address www.openspf.org., service 443
$ 

So, is spf-tools-perl still applicable, or is it just the diagnostic that's out-of-date, referring to a service that's no longer (at least presently) reachable?

$ dpkg -L spf-tools-perl | sort | grep -e bin/ -e '/man/.*spf'
/usr/bin/spfquery.mail-spf-perl
/usr/sbin/spfd.mail-spf-perl
/usr/share/man/man1/spfquery.mail-spf-perl.1p.gz
/usr/share/man/man8/spfd.mail-spf-perl.8p.gz
$ man spfquery
...
$ spfquery --scope mfrom --identity balug.org --ip-address $(dig +short balug.org. A)
pass
balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
Received-SPF: pass (balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=96.86.170.229
$ echo $?
0
$ spfquery --scope mfrom --identity balug.org --ip-address 8.8.8.8; echo $?
neutral
balug.org: Default neutral result due to no mechanism matches
balug.org: Default neutral result due to no mechanism matches
Received-SPF: neutral (balug.org: Default neutral result due to no mechanism matches) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=8.8.8.8
3
$ 
neutral? - are we missing something that ought to say that should fail???
Anyway, looks like spfquery probably works fine, but the web site may be no longer available (DDoS from spammers, or ???).

$ spfquery --scope mfrom --identity lists.balug.org --ip-address $(dig +short balug.org. A)
pass
lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
Received-SPF: pass (lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=96.86.170.229
$ spfquery --scope mfrom --identity lists.balug.org --ip-address 8.8.8.8
neutral
lists.balug.org: Default neutral result due to no mechanism matches
lists.balug.org: Default neutral result due to no mechanism matches
Received-SPF: neutral (lists.balug.org: Default neutral result due to no mechanism matches) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=8.8.8.8
$ 

Again with the neutral.  Those ought be hard fail.
... Ah ...:
balug.org. IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
We're missing the -all at the end.
Should check all our SPF records, and fix as appropriate.
Should probably also add spf version 2, but first things first ...
So ... we have ...:
balug.org.              600     IN      SPF     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
balug.org.              600     IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
tmp.balug.org.          300     IN      TXT     "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f"
lists.balug.org.        600     IN      SPF     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
lists.balug.org.        600     IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"

berkeleylug.com.        172800  IN      SPF     "v=spf1 -all"
berkeleylug.com.        172800  IN      TXT     "v=spf1 -all"
sf-lug.com.             172800  IN      SPF     "v=spf1 -all"
sf-lug.com.             172800  IN      TXT     "v=spf1 -all"
sf-lug.net.             172800  IN      SPF     "v=spf1 -all"
sf-lug.net.             172800  IN      TXT     "v=spf1 -all"
sflug.com.              172800  IN      SPF     "v=spf1 -all"
sflug.com.              172800  IN      TXT     "v=spf1 -all"
sflug.net.              172800  IN      SPF     "v=spf1 -all"
sflug.net.              172800  IN      TXT     "v=spf1 -all"
sflug.org.              86400   IN      SPF     "v=spf1 -all"
sflug.org.              86400   IN      TXT     "v=spf1 -all"
We should:
remove the RRs of type SPF (superseded/obsoleted, per RFC 7208, which says to publish SPF records as TXT only)
add trailing " -all" for those that don't have it
Our active sending TTLs look rather short, should probably nudge 'em up to ... 3600 or so? ... at least after they're tested out okay.
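
Checking for the missing -all by eye gets tedious with more than a handful of zones, so here is a rough sketch of a filter for it. spf_missing_hard_fail is a made-up name; it reads zone-style lines (as in the listing above, or from dig +noall +answer) and only recognizes a literal trailing " -all", so records that legitimately end some other way (redirect=, ~all) will be flagged too and need a human look.

```shell
#!/bin/sh
# spf_missing_hard_fail: read resource records on stdin, one per line, and
# print the owner name of every "v=spf1" record that does not end in " -all".
# Hypothetical helper; single-string TXT/SPF records only.
spf_missing_hard_fail() {
    awk '/"v=spf1[ "]/ && !/ -all"[ \t]*$/ { print $1 }' | sort -u
}

# Example use:
#   for d in balug.org. lists.balug.org. tmp.balug.org.; do
#       dig +noall +answer "$d" TXT
#   done | spf_missing_hard_fail
```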
And after updating, we have:
balug.org.              3600    IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
lists.balug.org.        3600    IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
tmp.balug.org.          3600    IN      TXT     "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f -all"
berkeleylug.com.        172800  IN      TXT     "v=spf1 -all"
sf-lug.com.             172800  IN      TXT     "v=spf1 -all"
sf-lug.net.             172800  IN      TXT     "v=spf1 -all"
sflug.com.              172800  IN      TXT     "v=spf1 -all"
sflug.net.              172800  IN      TXT     "v=spf1 -all"
sflug.org.              86400   IN      TXT     "v=spf1 -all"
So ... that now looks better.
And let's do a little retest on our earlier:
$ spfquery --scope mfrom --identity balug.org --ip-address $(dig +short balug.org. A); echo "$?"
pass
balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
Received-SPF: pass (balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=96.86.170.229
0
$ spfquery --scope mfrom --identity lists.balug.org --ip-address $(dig +short balug.org. A); echo "$?"
pass
lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
Received-SPF: pass (lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=96.86.170.229
0
$ spfquery --scope mfrom --identity balug.org --ip-address 8.8.8.8; echo "$?"
fail
Please see http://www.openspf.org/Why?s=mfrom;id=balug.org;ip=8.8.8.8;r=balug-sf-lug-v2.balug.org
balug.org: Sender is not authorized by default to use 'balug.org' in 'mfrom' identity (mechanism '-all' matched)
Received-SPF: fail (balug.org: Sender is not authorized by default to use 'balug.org' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=8.8.8.8
1
$ spfquery --scope mfrom --identity lists.balug.org --ip-address 8.8.8.8; echo "$?"
fail
Please see http://www.openspf.org/Why?s=mfrom;id=lists.balug.org;ip=8.8.8.8;r=balug-sf-lug-v2.balug.org
lists.balug.org: Sender is not authorized by default to use 'lists.balug.org' in 'mfrom' identity (mechanism '-all' matched)
Received-SPF: fail (lists.balug.org: Sender is not authorized by default to use 'lists.balug.org' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=8.8.8.8
1
$
So, that looks much better now.
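To make that retest repeatable, something like the following could work. It is only a sketch: spf_expect and the SPFQUERY override are hypothetical names (not anything shipped with spfquery), and it relies only on the exit statuses seen above (0 for pass, 1 for fail).

```shell
# Sketch: assert that an identity/IP pair gives the expected spfquery exit
# status (per the runs above: 0 = pass, 1 = fail).
# SPFQUERY is a hypothetical override hook so a stub can stand in offline.
SPFQUERY=${SPFQUERY:-spfquery}

# usage: spf_expect EXPECTED_STATUS IDENTITY IP
spf_expect() {
    expected=$1 identity=$2 ip=$3
    actual=0
    # Capture the exit status without tripping "set -e" style shells.
    "$SPFQUERY" --scope mfrom --identity "$identity" --ip-address "$ip" \
        >/dev/null 2>&1 || actual=$?
    if [ "$actual" -eq "$expected" ]; then
        echo "ok   $identity $ip (status $actual)"
    else
        echo "FAIL $identity $ip (status $actual, expected $expected)"
        return 1
    fi
}

# Example (needs spfquery and DNS, so left commented out):
# for d in balug.org lists.balug.org; do
#     spf_expect 0 "$d" 96.86.170.229
#     spf_expect 1 "$d" 8.8.8.8
# done
```

The authorized address should give status 0 and anything else (8.8.8.8 here) status 1, matching the manual runs.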
WordPress also sends mail:
From www-data@balug.org Tue Apr 27 02:12:48 2021
From: WordPress <wordpress@berkeleylug.com>
So, @berkeleylug.com needs to be set up to send, and at least minimally receive, email (e.g. postmaster ...)
So, ... SPF first, as that has the longer TTL presently ...
from:
berkeleylug.com.        172800  IN      TXT     "v=spf1 -all"
to:
berkeleylug.com.        3600    IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"

And, added a bit more for digitalwitness.org. and sf-lug.org. (the latter of which thus far still uses @linuxmafia.com for mail), now have:
balug.org.              3600    IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
lists.balug.org.        3600    IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
tmp.balug.org.          3600    IN      TXT     "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f -all"
berkeleylug.com.        3600    IN      TXT     "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
digitalwitness.org.     86400   IN      TXT     "v=spf1 -all"
sf-lug.com.             172800  IN      TXT     "v=spf1 -all"
sf-lug.net.             172800  IN      TXT     "v=spf1 -all"
sf-lug.org.             86400   IN      TXT     "v=spf1 -all"
sflug.com.              172800  IN      TXT     "v=spf1 -all"
sflug.net.              172800  IN      TXT     "v=spf1 -all"
sflug.org.              86400   IN      TXT     "v=spf1 -all"
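For a quick overview of which domains authorize any senders versus which publish a bare "v=spf1 -all" (i.e. send no mail at all), a small filter over records in the format above may help. A minimal sketch; spf_classify is a hypothetical helper name, and it assumes one unsplit TXT string per line:

```shell
# Read 'name TTL IN TXT "v=spf1 ..."' lines on stdin and classify each name.
spf_classify() {
    awk '$3 == "IN" && $4 == "TXT" && $5 ~ /^"v=spf1/ {
        # Rebuild the quoted record text from field 5 onward.
        rec = $5
        for (i = 6; i <= NF; i++) rec = rec " " $i
        if (rec == "\"v=spf1 -all\"")
            print $1, "sends-no-mail"
        else
            print $1, "authorizes-senders"
    }'
}

# Example (needs DNS, so left commented out):
# dig +noall +answer balug.org. TXT sflug.org. TXT | spf_classify
```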
SPF version 2 could be good/better ... but later, not a top priority.
So, let's look into enabling SPF checking upon receipt of incoming ...
I also noticed what looks like something about a daemon - which may be preferable for large volumes/streams of incoming ...
let's look at the documentation a bit more ...
$ man spfd.mail-spf-perl
$ systemctl list-unit-files | fgrep spf
$ 
So, nothin' in systemd unit files nor exim4 config that supports the spf daemon, so doing that would mean a fair bit more manual configuring.
For now let's presume spfquery (non-daemonized) is quite "good enough" - we can change later if we need to.
So ... let's configure that ...
added ...:
# tail -n 1 conf.d/main/000_localmacros
CHECK_RCPT_SPF = true
# systemctl restart exim4.service
# 
That should be enough for that to now be operational - that should stop well over 50% of the incoming spam (attempts).  Should see results in logs quite soon (if not already).
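One sanity check worth having: confirm the macro actually landed in the configuration file exim reads, not just in conf.d. A minimal sketch; macro_defined is a hypothetical helper, and the path in the example is an assumption (where Debian's update-exim4.conf normally writes its merged output):

```shell
# usage: macro_defined NAME FILE
# Succeeds if FILE has an uncommented "NAME = ..." macro definition.
macro_defined() {
    grep -Eq "^[[:space:]]*$1[[:space:]]*=" "$2"
}

# Example (path is an assumption, so left commented out):
# macro_defined CHECK_RCPT_SPF /var/lib/exim4/config.autogenerated \
#     && echo "macro present" || echo "macro missing"
```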
Not seeing an SPF failure in the logs ... quite yet.
Let's test something that should fail ...
Drats - test made it through, even though the config should'a rejected it.
Oh, let's also add berkeleylug.com to the email domains, so that should work.
# DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
# systemctl start exim4.service
Let's try sending to postmaster@berkeleylug.com
and yes, that got delivered fine.
So ... why is SPF check not working?
# systemctl stop exim4.service
# ls -d /usr/*bin/*exim*conf*
/usr/sbin/update-exim4.conf  /usr/sbin/update-exim4.conf.template
# update-exim4.conf
# systemctl start exim4.service
SPF check still not working.
WordPress email ... something to circle back on later.
For now, for header it uses:
From: WordPress <wordpress@berkeleylug.com>
Looks like the only bit of that that's easy to change is the domain.  Looks like it uses PHP mail.  There are plugins to change that, but that then adds more complications.  As for the envelope, since it's running under Apache, between that and exim, that ends up as:
MAIL FROM:<www-data@balug.org>
Again, not simple to change that.  More to circle back on for later.
For now, dropped in aliases for www-data and wordpress, so at least mail sent to those addresses won't bounce at those domains.  So, that should help deliverability (and, on the receiving side, probably bring some more spam for postmaster, as I presently aliased those to postmaster ... "good enough" for now).
Looks like the SPF checks are now working.
I also found an older spfd process running and killed that off - maybe that made the difference?
So, yes, now seeing SPF fail/rejects in the log, e.g.:
# fgrep -ai spf rejectlog
2021-04-28 02:29:33 H=(sweja-se.mail.protection.outlook.com) [183.199.220.44] F=<oefydgodea@ottawa.ca> rejected RCPT <rsvp@balug.org>: SPF check failed.
2021-04-28 03:50:56 H=(smail1.vub.sk) [222.77.253.120] F=<jhylunrrhc@swebolt.se> rejected RCPT <rsvp@balug.org>: SPF check failed.
# dig +noall +answer +nottl ottawa.ca. TXT ottawa.ca. SPF swebolt.se. TXT swebolt.se. SPF | fgrep \"v=spf
ottawa.ca.              IN      TXT     "v=spf1 include:spf.protection.outlook.com include:_spf.esolutionsgroup.ca include:emsd1.com -all"
swebolt.se.             IN      TXT     "v=spf1 mx ip4:167.99.44.246 include:spf.protection.outlook.com a:smtp05.dgcsystems.net -all"
# spfquery --scope mfrom --id oefydgodea@ottawa.ca --ip 183.199.220.44; echo "$?"
fail
Please see http://www.openspf.org/Why?s=mfrom;id=oefydgodea%40ottawa.ca;ip=183.199.220.44;r=balug-sf-lug-v2.balug.org
ottawa.ca: Sender is not authorized by default to use 'oefydgodea@ottawa.ca' in 'mfrom' identity (mechanism '-all' matched)
Received-SPF: fail (ottawa.ca: Sender is not authorized by default to use 'oefydgodea@ottawa.ca' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from="oefydgodea@ottawa.ca"; client-ip=183.199.220.44
1
# spfquery --scope mfrom --id jhylunrrhc@swebolt.se --ip 222.77.253.120; echo "$?"
fail
Please see http://www.openspf.org/Why?s=mfrom;id=jhylunrrhc%40swebolt.se;ip=222.77.253.120;r=balug-sf-lug-v2.balug.org
swebolt.se: Sender is not authorized by default to use 'jhylunrrhc@swebolt.se' in 'mfrom' identity (mechanism '-all' matched)
Received-SPF: fail (swebolt.se: Sender is not authorized by default to use 'jhylunrrhc@swebolt.se' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from="jhylunrrhc@swebolt.se"; client-ip=222.77.253.120
1
# 
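Rather than copying sender and IP by hand for each cross-check, they can be pulled straight out of the rejectlog. A minimal sketch; spf_rejects is a hypothetical helper, and it assumes the line layout shown above (one bracketed IP followed by F=<sender>):

```shell
# Print "SENDER IP" for each rejectlog line on stdin that records an
# SPF rejection.
spf_rejects() {
    grep -a 'SPF check failed' |
        sed -n 's/.*\[\([^]]*\)] F=<\([^>]*\)>.*/\2 \1/p'
}

# Example (needs spfquery and DNS, so left commented out):
# spf_rejects < rejectlog | while read -r sender ip; do
#     spfquery --scope mfrom --id "$sender" --ip "$ip"
# done
```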
Wrote a handy little program to summarize the exim rejectlog failures from the most recent few such log files:
# Rejectlog_report
6313 Unrouteable address
1013 relay not permitted
8 SPF check failed
7 SMTP protocol synchronization error (input sent without waiting for greeting)
7 maximum allowed line length
3 unqualified address not permitted
1 SMTP protocol synchronization error (next input sent too soon: pipelining was not advertised)
1 missing or malformed local part
1 syntactically invalid
# 
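For the general idea, here is a rough sketch of that kind of summary. This is not the actual Rejectlog_report program; reject_summary is a hypothetical stand-in, and it only handles the "rejected RCPT <address>: reason" form of rejectlog lines, not the other formats counted above:

```shell
# Count rejection reasons from "rejected RCPT <addr>: REASON" lines on stdin
# and print "COUNT REASON", most frequent first.
reject_summary() {
    sed -n '/ rejected RCPT </{ s/.* rejected RCPT <[^>]*>: //; s/\.$//; p; }' |
        awk '{ count[$0]++ } END { for (r in count) print count[r], r }' |
        sort -k1,1nr
}

# Example (log path and rotation naming vary, so left commented out):
# cat rejectlog rejectlog.1 | reject_summary
```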
Looks like at least the top couple of items would be good candidates for adding configurations for fail2ban.
Some others beyond that may also be worth doing - but not as high a priority.
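Before writing any fail2ban rules, it may be worth seeing how concentrated those top two rejection types are by source IP. A minimal sketch; top_offenders is a hypothetical helper, and it assumes the last bracketed value on each line is the client IP (which may not hold if extra log selectors are enabled):

```shell
# Print "COUNT IP" for clients rejected for unrouteable addresses or relay
# attempts, reading rejectlog lines on stdin, busiest first.
top_offenders() {
    grep -aE 'rejected RCPT <[^>]*>: (Unrouteable address|relay not permitted)' |
        sed -n 's/.*\[\([^]]*\)].*/\1/p' |
        sort | uniq -c | sort -k1,1nr |
        awk '{ print $1, $2 }'
}

# Example (left commented out):
# top_offenders < rejectlog | head -n 20
```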
// reverted the temporary increase of max queue time from 4 days to 7 days:
# awk '{if($1~/^[^#]/||$1~/^#\*/||$0~/^# temp/)print;}' conf.d/retry/30_exim4-config
*                      *           F,2h,15m; G,16h,1h,1.5; F,4d,6h
# systemctl reload exim4.service
# 
system/annoyances.txt · Last modified: 2021-05-06T06:05:20+0000 by michael_paoli