=== /var filesystem space ===

  * 2021-04-21 or starting thereabouts, background:\\ [[https://lists.balug.org/pipermail/balug-talk/2021-April/000219.html|[BALUG-Talk] Possible solution of blankness or editing outline-only on BALUG DokuWiki?]]\\ [[https://lists.balug.org/pipermail/balug-talk/2021-April/000220.html|[BALUG-Talk] Possible solution of blankness or editing outline-only on BALUG DokuWiki?]]\\ [[https://lists.balug.org/pipermail/balug-talk/2021-April/000221.html|[BALUG-Talk] Possible solution of blankness or editing outline-only on BALUG DokuWiki?]]\\ [[https://lists.balug.org/pipermail/balug-talk/2021-April/000222.html|[BALUG-Talk] Possible solution of blankness or editing outline-only on BALUG DokuWiki?]]\\ [[https://lists.balug.org/pipermail/balug-talk/2021-April/000223.html|[BALUG-Talk] Possible solution of blankness or editing outline-only on BALUG DokuWiki?]]\\ [[https://lists.balug.org/pipermail/balug-talk/2021-April/000224.html|[BALUG-Talk] -->BALUG-Admin Re: Possible solution of blankness or editing outline-only on BALUG DokuWiki?]]
  * probable cause(s): email spam messages, and logs/queues thereof; logs & related from various activity of miscreants (and their bots), e.g. bad web crawlers, authentication failure attempts, countermeasures & database tracking thereof (fail2ban), etc.
latest (in approximately chronological order):

  # date -Iseconds && df /var && du -x /var | sort -bnr | head -n 12
  2021-04-24T06:11:53+00:00
  Filesystem            1K-blocks    Used Available Use% Mounted on
  /dev/mapper/balug-var   6657552 6128292    252300  97% /var
  6116556 /var
  2102544 /var/lib
  1233792 /var/log
  1175268 /var/spool
  1122516 /var/mail
  1089844 /var/spool/exim4
  732660 /var/lib/fail2ban
  645068 /var/spool/exim4/input
  616584 /var/lib/clamav
  541756 /var/log/apache2
  475888 /var/log/exim4
  369448 /var/spool/exim4/msglog
  #

2021-03-20 or thereabout - had already increased the VM (virtual) drive from 16 GiB to 20 GiB, in significant part to handle more space on /var (and anticipated space on /usr) ... but not necessarily exclusively for that: [[http://linuxmafia.com/pipermail/sf-lug/2021q1/015201.html|disk is cheap, don't be stingy ... oops! ;-)]]\\
Analysis / divide and conquer ... where's most of the space most recently getting newly sucked up? From slightly earlier we have\\
from around [[https://lists.balug.org/pipermail/balug-talk/2021-April/000223.html|2021-04-23 2021-04-24T04:40:57Z]]:

  # du -x /var | sort -bnr | head -n 12
  5866812 /var
  2102592 /var/lib
  1249060 /var/log
  1017624 /var/spool
  1015100 /var/mail
  932200 /var/spool/exim4
  731780 /var/lib/fail2ban
  616584 /var/lib/clamav
  531456 /var/log/apache2
  505952 /var/log/exim4
  491980 /var/spool/exim4/input
  364892 /var/spool/exim4/msglog
  #

and analysing ...:\\
differences in approximate order, generally first by largest size of most specific, then lesser sizes of rather specific, then not so specific by size (shown as magnitudes; /var/log/exim4, /var/log, and /var/lib actually shrank slightly between the two samples)

  153088 /var/spool/exim4/input
  157644 /var/spool/exim4
  157644 /var/spool
  107416 /var/mail
  30064 /var/log/exim4
  15268 /var/log
  10300 /var/log/apache2
  4556 /var/spool/exim4/msglog
  880 /var/lib/fail2ban
  48 /var/lib
  249744 /var

So, most likely getting chewed up by spam and processing thereof - no significant surprise on that.\\
Peeking closer, looks like crud spam is somehow making it into queue ...
then just getting stuck there - and presumably eventually failing.\\
For the moment, temporarily stopping exim4.\\
Maybe has something to do with recent bits adding mailman3 and related ... or perhaps has nothing to do with that and might just be coincidental on the (approximate) timing. The times do rather correlate ... but correlation isn't necessarily causation.\\
In any case, would appear at least some important bits of the exim4 configuration aren't correct or are no longer correct/functional.\\
This may also have to do with use of eximconfig and it being quite out-of-date and unsupported (may have been a good idea at some point in the past, but has currently outlived most any direct usefulness, and may even now be more problematic than useful). That might also be exacerbated by some of the "anti-spam" services it uses, ... some of which may have even been taken over by spammers by this point in time.

So ... slightly earlier, set up a tmp.balug.org VM, to work on configuration of mailman3, etc.: Debian 10 with exim4 and mailman, etc. So, ... let's slightly repurpose that VM. Earlier was intended as a testbed to get all the exim4/mailman3/mailman stuff worked out, and with clean exim4 config, and without the obsoleted unsupported eximconfig, and get that working, and then - with suitable changes for IPs, domains, lists, etc., migrate that over to the balug VM. Well, let's just amend the purpose to prioritize clean functioning exim4 config that plays nice with mailman (mailman2), and vice versa - and worry about the mailman3 bits later (lower priority presently). And for now, will still leave exim4 stopped on the balug VM - probably more harm than good to run it presently, and having it off for several hours or so, not a huge deal. 72+ hours, however, would be a big deal, so, ... hopefully maybe have this "all better" in ... oh, 12 hours, or maybe way less? Shall see.

tmp.balug.org VM ... repurposing ...
  * ripping out the mailman3 stuff for now (much of it wasn't yet fully configured anyway - mostly just complicates things at the moment)
  * ... dang, 1 GiB of (virtual) RAM, and ... fork failed due to out of memory ... ugh, don't really want to give the VM more RAM ... but it has zero swap, so ...
  * don't have LVM on this tmp.balug.org VM ... oh well, whatever, swap done as file - "good enough" for now.
  * installed mailman
  * purged mailman3 packages & related, including autoremove
  * exim4 only listening on localhost - good enough for now
  * hostname isn't set properly ... corrected ... & rebooted
  * DNS: stripping the IPv6 (routable & related) bits out for now (haven't got all the IPv6 routing set up - don't need it presently for these tests) ... removed the AAAA records, but left the rest alone (presumably will reintroduce those AAAA records in future, PTR will also help track what it's intended that IP will get used for)
  * removed Internet routable IPv6 address(es)
  * renamed the VM to tmp.balug.org for better consistency (and likewise its storage file to tmp.balug.org.sda)
  * purged these packages (redundant - have 'em on the balug VM and more accessible there): exim4-doc-html exim4-doc-info
  * exim4 has a very basic local-only config, let's see if we can fairly easily reconfigure that (at least a bit) for Internet ...\\ # DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
  * That should now be enough for basic (mostly) functional Internet email sending/receiving ... but we may generally fail much on outbound due to lack of established reputation on the IPv4 and/or missing "reverse" DNS for IPv4\\ mx-tmp.balug.org. 300 IN A 96.86.170.228\\ tmp.balug.org. 300 IN A 96.86.170.228\\ tmp.balug.org. 300 IN MX 0 mx-tmp.balug.org.\\ tmp.balug.org. 300 IN TXT "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f"\\ 228.170.86.96.in-addr.arpa. 
3600 IN PTR 96-86-170-228-static.hfc.comcastbusiness.net.\\ Actually, do have a "reverse" IPv4, so that might be "good enough".\\ Also, don't have mailman configured yet, so that won't work ... yet.
  * basic test from Internet to @tmp.balug.org worked, lots more to test ...
  * postmaster@tmp.balug.org to Internet worked ... can the incoming defend against some of the more egregious spam attempts? ...
  * rejects relay attempt
  * also rejects relay attempt when spoofing VM host - even when "reverse" DNS also spoofs same localhost name, so that's good, and looks like it may already be somewhat better than on that balug VM (looks like spam was making it past that and getting snagged/rejected later in the process - but at huge cost to filesystem space and logs and such). So, now "just" get mailman (mailman2) working "well enough", and should have a base model template that can be applied to balug to improve relative to current state ... though quite a bit more anti-spam should also be added to reject a bunch of other crud too.
  * So, mailman (mailman2) - need that to be able to handle using pipes in aliases ... that's configured on the balug VM, but not tmp.balug.org VM ... it's //somewhere// in the config ... but not trivial to find and nail down exactly where, so ... resuming working on that ... will try to more directly compare the two configs between the two hosts, ... and see where that may be buried - there's a lot 'o config stuff for exim4.

Relevant difference should be somewhere in ...

  $ diff <(ssh -anx -l root -o BatchMode=yes tmp.balug.org. 'cat /var/lib/exim4/config.autogenerated') <(ssh -4anx -l mpaoli -o BatchMode=yes balug-sf-lug-v2.balug.org. 'sudo cat /var/lib/exim4/config.autogenerated') | wc -l
  124
  $

And ... crud, doesn't appear to be in there. But, ... appears, in whole or (large) part, ...
balug VM may mostly be running what's based on a //much// older configuration - so that may be complicating things more / in addition to the old eximconfig stuff:

  34c37
  < MAIN_PACKAGE_VERSION=4.92-8+deb10u5
  ---
  > MAIN_PACKAGE_VERSION=4.84.2-2+deb8u3

So, that may also explain why I'm not easily finding the pipe configuration on the balug VM - as its configuration may be too different/older for me to be able to find the relevant bit that would match (notably configuration variables) from the tmp.balug.org VM.\\
So ... let's go back to just more directly trying to enable it on tmp.balug.org VM, without consulting the config on the balug VM to try to determine how to do that.\\
Looks like /usr/share/doc/mailman/README.Exim4.Debian.gz gives pretty good instructions/outline to configure mailman (mailman2) to work with exim4, and how to get the alias bits and such working. Let's try implementing that. Already created one list named mailman ... but might be simpler to rip that out, and recreate it after the other config bits are in place.\\
balug VM already has in /etc/mailman/mm_cfg.py:

  MTA = 'Postfix'
  POSTFIX_ALIAS_CMD = '/bin/true'
  POSTFIX_MAP_CMD = 'chgrp Debian-exim'
  POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org', 'temp.balug.org' ]
  POSTFIX_MAP_CMD = 'chmod o+r'

So, for tmp.balug.org VM we do:

  MTA = 'Postfix'
  POSTFIX_ALIAS_CMD = '/bin/true'
  POSTFIX_MAP_CMD = 'chmod o+r'
  POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'tmp.balug.org' ]

And let's also fix up on balug VM (POSTFIX_MAP_CMD probably shouldn't be in there twice - probably doesn't hurt as the last probably overrides ... but just once would be more clear for the humans ... also, temp.balug.org is long since obsolete and should be removed)

  MTA = 'Postfix'
  POSTFIX_ALIAS_CMD = '/bin/true'
  POSTFIX_MAP_CMD = 'chmod o+r'
  POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org' ]

The documented instructions/example uses exim4 split configuration, ... so, we change that to split configuration ...
# DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config\\
So, ... pipe alias stuff still not quite working. And "of course", we already see evidence of miscreants and their bots:

  # hostname; tail -n 1 rejectlog
  tmp.balug.org
  2021-04-24 18:04:25 rejected EHLO from [122.228.19.80]: syntactically invalid argument(s): []
  #

pipe alias bits ...

  # tail -n 1 mainlog
  2021-04-24 19:53:02 1laOKs-0002Km-9R == |/var/lib/mailman/mail/mailman owner mailman R=system_aliases defer (-30): pipe_transport unset in system_aliases router
  #

added:

  # cat /etc/exim4/conf.d/main/000_localmacros
  SYSTEM_ALIASES_PIPE_TRANSPORT = address_pipe
  #

That's closer, but still not quite there:

  The following text was generated during the delivery attempt:
  ------ pipe to |/var/lib/mailman/mail/mailman owner mailman
         generated by mailman-owner@tmp.balug.org ------
  Group mismatch error. Mailman expected the mail
  wrapper script to be executed as group "daemon", but
  the system's mail server executed the mail script as
  group "Debian-exim". Try tweaking the mail server to run the
  script as group "daemon", or re-run configure,
  providing the command line option `--with-mail-gid=Debian-exim'.

... and finally working with:

  # cat /etc/exim4/conf.d/main/000_localmacros
  SYSTEM_ALIASES_PIPE_TRANSPORT = address_pipe
  SYSTEM_ALIASES_GROUP = daemon
  #

So ... that's pretty good. Should add a wee bit more before using that as template for the balug VM:
  * put TLS cert in place for STARTTLS
  * reconfigure exim4 to limit what source IPs it uses (listening on all is fine, but sending to Internet should only use IPs properly set up with SPF & "reverse" DNS)

Configured the tmp.balug.org VM with proper cert (already had suitable matching wildcard cert) and working STARTTLS using that cert,\\
also configured the tmp.balug.org VM to use only the specified source IP addresses.\\
So, that mostly should be a good "template" for the balug VM ...
with, 'of course' suitable specific config changes to be made for the balug VM.\\
Also, before restart of exim4 on the balug VM, should clear out the massive pile of crud that's in the queue ... but without clobbering any legitimate stuff that may still be in there. So, let's move on to cleaning that up first, before making the other configuration changes to the balug VM.\\
So, analyzing what's in queue, we have:

  19909 <>
  16089
  92
  72
  64
  59

Those are envelope FROM addresses, by count, where the empty <> FROM are bounce messages.\\
So, ... the top two can be dropped as, at best, unimportant (and probably mostly crud ... spam and backscatter thereof),\\
the rest probably deserve some (semi-)manual closer inspection - many may be deferred bits on envelope TO addresses with issues, and hence hanging around for redelivery attempt(s).\\
Bounce messages in queue, we've got (again by count):

  14697 <> *** frozen *** error@balug.org
  5212 <> *** frozen *** error@balug.org D balug@balug.org

So those can all go (simplified/consolidated queue listing down to 1 line per item in queue).\\
Checked further on the bounce (envelope FROM <>) messages - improbable there's anything legitimate/important there ... removed them (19909 messages) from queue.\\
Likewise checked on the queued mail with envelope FROM %%%% - improbable there's anything legitimate/important there ... removed them (16089 messages) from queue.\\
The few hundred or so messages remaining in queue ... (mostly) legitimate? Let's have a look ... and checked a fair randomized sample, they look to be mostly to entirely legitimate.\\
So, to do, remains approximately, notably for balug VM:
  * (done) "replace" / merge in applicable configuration bits (most notably for exim4)
  * (done) before restarting exim4, let's //temporarily// extend the queue timeout - since we've now had exim4 down for some fair bit (around 24 hours or more).
  * (done) oh ... 
should resize the queue directories for efficiency/space (on most *nix filesystem types, directories grow, but never shrink).
  * (done) restart exim4
  * (done) send out relevant follow-up list postings to BALUG-Admin and BALUG-Talk

There's then still further anti-spam stuff, etc. to do, but that should be "better enough" to reenable exim4 service.\\
So ... resizing of those directories ...

  # du -sx /var/spool/exim4/input
  9036 /var/spool/exim4/input
  # df -h /var
  Filesystem             Size  Used Avail Use% Mounted on
  /dev/mapper/balug-var  6.4G  4.9G  1.3G  80% /var
  #

That's way the heck better, but still, directories to resize ...:

  /var/spool/exim4/input
   92 0   96 6   96 C  108 I  116 O  140 U   80 a  120 g   96 m  132 s  108 y
  108 1   92 7  100 D  100 J   84 P   84 V  100 b   84 h   80 n  104 t  116 z
  100 2   84 8   92 E   84 K  108 Q  120 W  124 c  140 i   88 o  104 u
  136 3  104 9  132 F  156 L  116 R   88 X  108 d  116 j  100 p  108 v
  120 4   96 A   80 G  116 M  124 S   80 Y   96 e   88 k   84 q  124 w
   88 5  128 B  100 H   88 N   96 T  104 Z  108 f  108 l  128 r  120 x
  # cd /var/spool/exim4/
  # mktemp -d /var/spool/exim4/input.tmp.XXXXXXXXXX
  /var/spool/exim4/input.tmp.r6J0lcwTTJ
  # (cd input && umask 077 && find . 
  -xdev -depth -print0 | pax -rw -0dl -p e /var/spool/exim4/input.tmp.r6J0lcwTTJ/)
  # mv input input.BAK && mv input.tmp.r6J0lcwTTJ input
  # (cd input && pwd -P && ls -sd *)
  /var/spool/exim4/input
  4 0  4 4  4 8  4 C  4 G  4 K  4 O  4 S  4 W  4 a  4 e  4 i  4 m  4 q  4 u  4 y
  4 1  4 5  4 9  4 D  4 H  4 L  4 P  4 T  4 X  4 b  4 f  4 j  4 n  4 r  4 v  4 z
  4 2  4 6  4 A  4 E  4 I  4 M  4 Q  4 U  4 Y  4 c  4 g  4 k  4 o  4 s  4 w
  4 3  4 7  4 B  4 F  4 J  4 N  4 R  4 V  4 Z  4 d  4 h  4 l  4 p  4 t  4 x
  # find input.BAK -type f -links +1 -exec rm \{\} \;
  # rmdir input.BAK/*/ input.BAK

"replace" / merge in applicable configuration bits (most notably for exim4) ...:\\

  // Let's set aside the old ...:
  # hostname && pwd && ls -d exim*
  balug-sf-lug-v2.balug.org
  /etc
  exim4 exim4.original.pax.xz
  # mv exim4 exim4.2021-04-25
  # mv /var/lib/exim4/config.autogenerated /var/lib/exim4/config.autogenerated.2021-04-25
  # cp -p /etc/mailman/mm_cfg.py /etc/mailman/mm_cfg.py.2021-04-25
  #
  // and, we bring over archive of relevant config bits from the tmp.balug.org VM,
  // and we'll first extract these with a .new suffix at the top-level relevant directory/file, and we'll leave it at .new until suitably adjusted to move in place.
  $ ssh -anx -l root -o BatchMode=yes tmp.balug.org. 'cd / && umask 022 && tar -cf - etc/mailman/mm_cfg.py etc/exim4 var/lib/exim4/config.autogenerated etc/letsencrypt/live/exim4 | gzip -9' | ssh -4ax -l mpaoli -o BatchMode=yes balug-sf-lug-v2.balug.org. 
  'umask 077 && cat >$(mktemp /var/tmp/tmp.exim.XXXXXXXXXX.tar.gz)'
  # hostname && chown 0:0 /var/tmp/tmp.exim.xBND4qYntI.tar.gz
  balug-sf-lug-v2.balug.org
  # ls -ld /etc/letsencrypt/live/exim4
  ls: cannot access '/etc/letsencrypt/live/exim4': No such file or directory
  # lists.balug.org
  # ls -lLd /etc/letsencrypt/live/exim4/*
  -r--r--r-- 1 root root 1919 Mar 31 09:11 /etc/letsencrypt/live/exim4/cert.pem
  -r--r--r-- 12 root root 1586 Jan 3 23:11 /etc/letsencrypt/live/exim4/chain.pem
  -r--r--r-- 1 root root 3505 Mar 31 09:11 /etc/letsencrypt/live/exim4/fullchain.pem
  -r--r----- 1 root Debian-exim 1708 Mar 31 09:11 /etc/letsencrypt/live/exim4/privkey.pem
  #
  // and that is the appropriate cert for email/exim4 for the balug VM, so no changes needed on that bit
  # mkdir /var/.new
  # cd /var/.new
  # /etc/mailman/mm_cfg.py.new
  # pwd
  /root
  # vi /etc/mailman/mm_cfg.py.new
  // ...
  # mv /etc/mailman/mm_cfg.py.new /etc/mailman/mm_cfg.py
  # diff /etc/mailman/mm_cfg.py.2021-04-25 /etc/mailman/mm_cfg.py | sed -e 's/SECRET = '\''[^'\'']*'\''/SECRET = '\''[REDACTED]'\''/'
  73,74c73
  < add_virtualhost('temp.balug.org', 'temp.balug.org')
  < #add_virtualhost('lists.balug.org', 'lists.balug.org')
  ---
  > # add_virtualhost('temp.balug.org', 'temp.balug.org')
  85,86c84
  < # Unset send_reminders on newly created lists
  < #DEFAULT_SEND_REMINDERS = 0
  ---
  > # set send_reminders on newly created lists
  88a87,101
  > # If the following is set to a non-empty string, this string in combination
  > # with the time, list name and the IP address of the requestor is used to
  > # create a hidden hash as part of the subscribe form on the listinfo page.
  > # This hash is checked upon form submission and the subscribe fails if it
  > # doesn't match. I.e. the form posted must be first retrieved from the
  > # listinfo CGI by the same IP that posts it. The subscribe also fails if
  > # the time the form was retrieved is more than the above FORM_LIFETIME or less
  > # than the below SUBSCRIBE_FORM_MIN_TIME before submission.
  > # Important: If you have any static subscribe forms on your web site, setting
  > # this option will break them. With this option set, subscribe forms must be
  > # dynamically generated to include the hidden data. See the code block
  > # beginning with "if mm_cfg.SUBSCRIBE_FORM_SECRET:" in Mailman/Cgi/listinfo.py
  > # for the details of the hidden data.
  > SUBSCRIBE_FORM_SECRET = '[REDACTED]'
  >
  98,101d110
  < MTA = 'Postfix'
  < POSTFIX_ALIAS_CMD = '/bin/true'
  < POSTFIX_MAP_CMD = 'chmod o+r'
  < POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org' ]
  122a132,141
  > # ***** START bits per /usr/share/doc/mailman/README.Exim4.Debian.gz *****
  > # And yes, the "Postfix" there is on purpose, it should not be replaced
  > # by "exim4". It causes mailman to (among others) create a list of
  > # mailman lists, including what virtual domain they should be in. That
  > # is the information that is used here; the rest is ignored.
  > MTA = 'Postfix'
  > POSTFIX_ALIAS_CMD = '/bin/true'
  > POSTFIX_MAP_CMD = 'chmod o+r'
  > POSTFIX_STYLE_VIRTUAL_DOMAINS = [ 'lists.balug.org' ]
  > # ***** END bits per /usr/share/doc/mailman/README.Exim4.Debian.gz *****
  # systemctl stop mailman.service
  # systemctl start mailman.service
  # cd /etc/exim4.new
  // # vi ...
  # mv /etc/exim4.new /etc/exim4
  # DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
  // apparently the only bit that changed:
  # pwd -P && diff ../exim4.BAK/update-exim4.conf.conf update-exim4.conf.conf
  /etc/exim4
  20c20
  < dc_other_hostnames=''
  ---
  > dc_other_hostnames='balug.org; lists.balug.org'
  #
  // temporarily increase max queue time from 4 days to 7 days:
  # awk '{if($1~/^[^#]/||$1~/^#\*/||$0~/^# temp/)print;}' conf.d/retry/30_exim4-config
  #* * F,2h,15m; G,16h,1h,1.5; F,4d,6h
  # temporarily up to 7 days:
  * * F,2h,15m; G,16h,1h,1.5; F,7d,6h
  #
  // Theoretically should be good to go now ... let's start exim4 for a little bit, ... then stop it and look at logs, to see if things seem to be going okay.
  # systemctl enable exim4.service
  # systemctl start exim4.service && { sleep 180; systemctl stop exim4.service; }
  #
  // checking over logs ... rejectlog looks good (lots of legitimate rejects in 3 minutes, no false positives)
  // mainlog mostly looks good and as expected - only particular bits that didn't seem as expected:
  Berkeley DB error: BDB0058 page 19818: illegal page type or format
  Berkeley DB error: BDB0060 PANIC: fatal region error detected; run recovery
  Berkeley DB error: BDB0061 PANIC: Invalid argument
  Berkeley DB error: BDB1581 File handles still open at environment close
  Berkeley DB error: BDB1582 Open file handle: /var/spool/exim4/db/retry
  // used db_recover
  # systemctl start exim4.service
  // still getting Berkeley DB error diagnostics
  // stopped exim4, did a dump & (re)load of DB (with db_dump & db_load), restarted exim4 ... seems to be running okay now without those Berkeley DB errors

Analyzed the mail queue again. Found one more abuser with a bunch 'o queued mail.\\
That particular abuser had 332 queued mail messages - all of which were subscription requests that had been processed - but not confirmed, for the same email address and all from the same IPv4 address. All the queued emails were confirmation emails - emails to that email address to get confirmation of the subscription request. The email domain appears legitimate, but the IP address dubious at best (no reverse DNS, etc.)\\
Anyway, removed those 332 queued email messages ... that then dropped the queue to only 20 remaining queued messages - all of which appear legitimate.\\
Analyzed logs further, notably for web and email traffic/attempts. Looks like most all that problematic email was from bad web bots repeatedly and voluminously subscribing (well, attempting to subscribe) that, and one other email address, to BALUG's various lists - causing confirmation emails to be queued. Looks like two such emails got delivered, but all (or almost all?)
of the others got deferred by the receiving MTAs (there were only 2 email addresses). So, perhaps bad bot trying to do DoS/DDoS against those two target emails? Could potentially block the IP address but ... whack-a-mole - would likely just pop up on another IP.\\
Checked the mail queue again - after subtracting out target addresses that have already been successfully delivered to, there remain at the moment only 6 unique email addresses presently showing any delivery issues.

More anti-spam to do ... SPF ... looks like config files can have that enabled ...\\

  conf.d/acl/30_exim4-config_check_rcpt
  # This is quite costly in terms of DNS lookups (~6 lookups per mail). Do not
  # enable if that's an issue. Also note that if you enable this, you must
  # install "spf-tools-perl" which provides the spfquery command.
  # Missing spf-tools-perl will trigger the "Unexpected error in
  # SPF check" warning.
  .ifdef CHECK_RCPT_SPF
  deny
    message = [SPF] $sender_host_address is not allowed to send mail from \
    ${if def:sender_address_domain {$sender_address_domain}{$sender_helo_name}}. \
    Please see \
    http://www.openspf.org/Why?scope=${if def:sender_address_domain \
  $ dpkg -l spf-tools-perl | grep '^ii '
  ii spf-tools-perl 2.9.0-4 all SPF tools (spfquery, spfd) based on the Mail::SPF Perl module
  $ nc -z www.openspf.org. 80
  nc: unable to connect to address www.openspf.org., service 80
  $ nc -z www.openspf.org. 443
  nc: unable to connect to address www.openspf.org., service 443
  $

So, is spf-tools-perl still applicable, or is it just the diagnostic that's out-of-date referring to a service that's no longer (at least presently) reachable?

  $ dpkg -L spf-tools-perl | sort | grep -e bin/ -e '/man/.*spf'
  /usr/bin/spfquery.mail-spf-perl
  /usr/sbin/spfd.mail-spf-perl
  /usr/share/man/man1/spfquery.mail-spf-perl.1p.gz
  /usr/share/man/man8/spfd.mail-spf-perl.8p.gz
  $ man spfquery
  ...
  $ spfquery --scope mfrom --identity balug.org --ip-address $(dig +short balug.org. 
  A)
  pass
  balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  Received-SPF: pass (balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=96.86.170.229
  $ echo $?
  0
  $ spfquery --scope mfrom --identity balug.org --ip-address 8.8.8.8; echo $?
  neutral
  balug.org: Default neutral result due to no mechanism matches
  balug.org: Default neutral result due to no mechanism matches
  Received-SPF: neutral (balug.org: Default neutral result due to no mechanism matches) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=8.8.8.8
  3
  $

neutral ? - are we missing something that ought say that should fail??? Anyway, looks like spfquery probably works fine, but the web site may be no longer available (DDoS from spammers, or ???).

  $ spfquery --scope mfrom --identity lists.balug.org --ip-address $(dig +short balug.org. 
  A)
  pass
  lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  Received-SPF: pass (lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=96.86.170.229
  $ spfquery --scope mfrom --identity lists.balug.org --ip-address 8.8.8.8
  neutral
  lists.balug.org: Default neutral result due to no mechanism matches
  lists.balug.org: Default neutral result due to no mechanism matches
  Received-SPF: neutral (lists.balug.org: Default neutral result due to no mechanism matches) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=8.8.8.8
  $

Again with the neutral. Those ought be hard fail. ... Ah ...:

  balug.org. IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"

We're missing the -all at the end. Should check all our SPF records, and fix as appropriate. Should probably also add spf version 2, but first things first ... So ... we have ...:

  balug.org. 600 IN SPF "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
  balug.org. 600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
  tmp.balug.org. 300 IN TXT "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f"
  lists.balug.org. 600 IN SPF "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
  lists.balug.org. 600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2"
  berkeleylug.com. 172800 IN SPF "v=spf1 -all"
  berkeleylug.com. 172800 IN TXT "v=spf1 -all"
  sf-lug.com. 172800 IN SPF "v=spf1 -all"
  sf-lug.com. 172800 IN TXT "v=spf1 -all"
  sf-lug.net. 172800 IN SPF "v=spf1 -all"
  sf-lug.net. 172800 IN TXT "v=spf1 -all"
  sflug.com. 172800 IN SPF "v=spf1 -all"
  sflug.com. 172800 IN TXT "v=spf1 -all"
  sflug.net. 
  172800 IN SPF "v=spf1 -all"
  sflug.net. 172800 IN TXT "v=spf1 -all"
  sflug.org. 86400 IN SPF "v=spf1 -all"
  sflug.org. 86400 IN TXT "v=spf1 -all"

We should:
  * remove the RRs of type SPF (superseded/obsoleted, per RFC 7208)
  * add trailing " -all" for those that don't have it

Our active sending TTLs look rather short, should probably nudge 'em up to ... 3600 or so? ... at least after they're tested out okay. And after updating, we have:

  balug.org. 3600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
  lists.balug.org. 3600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
  tmp.balug.org. 3600 IN TXT "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f -all"
  berkeleylug.com. 172800 IN TXT "v=spf1 -all"
  sf-lug.com. 172800 IN TXT "v=spf1 -all"
  sf-lug.net. 172800 IN TXT "v=spf1 -all"
  sflug.com. 172800 IN TXT "v=spf1 -all"
  sflug.net. 172800 IN TXT "v=spf1 -all"
  sflug.org. 86400 IN TXT "v=spf1 -all"

So ... that now looks better. And let's do a little retest on our earlier:

  $ spfquery --scope mfrom --identity balug.org --ip-address $(dig +short balug.org. A); echo "$?"
  pass
  balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  Received-SPF: pass (balug.org: 96.86.170.229 is authorized to use 'balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=96.86.170.229
  0
  $ spfquery --scope mfrom --identity lists.balug.org --ip-address $(dig +short balug.org. A); echo "$?"
  pass
  lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)
  Received-SPF: pass (lists.balug.org: 96.86.170.229 is authorized to use 'lists.balug.org' in 'mfrom' identity (mechanism 'ip4:96.86.170.229' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=96.86.170.229
  0
  $ spfquery --scope mfrom --identity balug.org --ip-address 8.8.8.8; echo "$?"
  fail
  Please see http://www.openspf.org/Why?s=mfrom;id=balug.org;ip=8.8.8.8;r=balug-sf-lug-v2.balug.org
  balug.org: Sender is not authorized by default to use 'balug.org' in 'mfrom' identity (mechanism '-all' matched)
  Received-SPF: fail (balug.org: Sender is not authorized by default to use 'balug.org' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=balug.org; client-ip=8.8.8.8
  1
  $ spfquery --scope mfrom --identity lists.balug.org --ip-address 8.8.8.8; echo "$?"
  fail
  Please see http://www.openspf.org/Why?s=mfrom;id=lists.balug.org;ip=8.8.8.8;r=balug-sf-lug-v2.balug.org
  lists.balug.org: Sender is not authorized by default to use 'lists.balug.org' in 'mfrom' identity (mechanism '-all' matched)
  Received-SPF: fail (lists.balug.org: Sender is not authorized by default to use 'lists.balug.org' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from=lists.balug.org; client-ip=8.8.8.8
  1
  $

So, that looks much better now.

wordpress also sends mail:

  From www-data@balug.org Tue Apr 27 02:12:48 2021
  From: WordPress

So, @berkeleylug.com needs to be set up to send - and at least minimally receive, email (e.g. postmaster ...)

So, ... SPF first, as that has the longer TTL presently ...

  from: berkeleylug.com. 172800 IN TXT "v=spf1 -all"
  to: berkeleylug.com. 
  3600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"

And, added bit more for digitalwitness.org. and sf-lug.org. (latter of which thus far still uses @linuxmafia.com for mail), now have:

  balug.org. 3600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
  lists.balug.org. 3600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
  tmp.balug.org. 3600 IN TXT "v=spf1 ip4:96.86.170.228 ip6:2001:470:1f05:19e::f -all"
  berkeleylug.com. 3600 IN TXT "v=spf1 ip4:96.86.170.229 ip6:2001:470:1f05:19e::2 -all"
  digitalwitness.org. 86400 IN TXT "v=spf1 -all"
  sf-lug.com. 172800 IN TXT "v=spf1 -all"
  sf-lug.net. 172800 IN TXT "v=spf1 -all"
  sf-lug.org. 86400 IN TXT "v=spf1 -all"
  sflug.com. 172800 IN TXT "v=spf1 -all"
  sflug.net. 172800 IN TXT "v=spf1 -all"
  sflug.org. 86400 IN TXT "v=spf1 -all"

SPF version 2 could be good/better ... but later, not a top priority.

So, let's look into enabling SPF checking upon receipt of incoming ... I also noticed what looks like something about a daemon - which may be preferable for large volumes/streams of incoming ... let's look at documentation bit more ...

  $ man spfd.mail-spf-perl
  $ systemctl list-unit-files | fgrep spf
  $

So, nothin' in systemd unit files nor exim4 config that supports the spf daemon, so doing that would mean fair bit more manual configuring. Let's presume spfquery (non-daemonized) is quite "good enough" for now - we can change later if we need to. So ... let's configure that ... added ...:

  # tail -n 1 conf.d/main/000_localmacros
  CHECK_RCPT_SPF = true
  # systemctl restart exim4.service
  #

That should be enough for that to now be operational - that should stop >> 50% of the incoming spam (attempts). Should see results in logs quite soon (if not already). Not seeing an SPF failure in the logs ... quite yet. Let's test something that should fail ... Drats - test made it through, even though the config should'a rejected it.
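The SPF record audit done above (does every published v=spf1 TXT record end in -all?) could be scripted rather than eyeballed. A minimal sketch only, assuming zone-file style lines with a TTL column as in the listings above; the function name spf_missing_hard_fail is made up for illustration:

```shell
#!/bin/sh
# Hypothetical sketch: print the owner name of each v=spf1 TXT record
# that does not end with a hard-fail "-all".
# Assumed input on stdin, one record per line:
#   name TTL IN TXT "v=spf1 ..."
spf_missing_hard_fail() {
    awk '
        $3 == "IN" && $4 == "TXT" && $5 ~ /^"v=spf1/ {
            last = $NF              # last whitespace-separated token
            gsub(/"/, "", last)     # strip the quote character(s)
            if (last != "-all") print $1
        }
    '
}

# Example use (dig prints the TTL column unless +nottl is given):
#   dig +noall +answer balug.org. TXT | spf_missing_hard_fail
```

Records ending in ~all, ?all or a redirect= modifier would be flagged too, which may or may not be what's wanted.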
Oh, let's also add berkeleylug.com to the email domains, so that should work.

  # DEBIAN_PRIORITY=medium dpkg-reconfigure exim4-config
  # systemctl start exim4.service

Let's try sending to postmaster@berkeleylug.com and yes, that got delivered fine. So ... why is the SPF check not working?

  # systemctl stop exim4.service
  # ls -d /usr/*bin/*exim*conf*
  /usr/sbin/update-exim4.conf
  /usr/sbin/update-exim4.conf.template
  # update-exim4.conf
  # systemctl start exim4.service

SPF check still not working.

WordPress email ... something to circle back on later. For now, for header it uses:

  From: WordPress

Looks like the only bit of that that's easy to change is the domain. Looks like it uses PHP mail. There are plugins to change that, but that's then more complications. As for envelope, since it's using Apache, between that and exim, that ends up as:

  MAIL FROM:

Again, not simple to change that. More to circle back on for later. For now, dropped in aliases for www-data and wordpress, so mail sent to those addresses, for now at least, won't bounce at those domains. So, that should help deliverability (and, on the receiving side, probably some more spam for postmaster as I presently aliased those to postmaster ... "good enough" for now).

Looks like the SPF checks are now working. I also found an older spfd process running and killed that off - maybe that made the difference? So, yes, and seeing SPF fail/rejects in the log e.g.:

  # fgrep -ai spf rejectlog
  2021-04-28 02:29:33 H=(sweja-se.mail.protection.outlook.com) [183.199.220.44] F= rejected RCPT : SPF check failed.
  2021-04-28 03:50:56 H=(smail1.vub.sk) [222.77.253.120] F= rejected RCPT : SPF check failed.
  # dig +noall +answer +nottl ottawa.ca. TXT ottawa.ca. SPF swebolt.se. TXT swebolt.se. SPF | fgrep \"v=spf
  ottawa.ca. IN TXT "v=spf1 include:spf.protection.outlook.com include:_spf.esolutionsgroup.ca include:emsd1.com -all"
  swebolt.se. IN TXT "v=spf1 mx ip4:167.99.44.246 include:spf.protection.outlook.com a:smtp05.dgcsystems.net -all"
  # spfquery --scope mfrom --id oefydgodea@ottawa.ca --ip 183.199.220.44; echo "$?"
  fail
  Please see http://www.openspf.org/Why?s=mfrom;id=oefydgodea%40ottawa.ca;ip=183.199.220.44;r=balug-sf-lug-v2.balug.org
  ottawa.ca: Sender is not authorized by default to use 'oefydgodea@ottawa.ca' in 'mfrom' identity (mechanism '-all' matched)
  Received-SPF: fail (ottawa.ca: Sender is not authorized by default to use 'oefydgodea@ottawa.ca' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from="oefydgodea@ottawa.ca"; client-ip=183.199.220.44
  1
  # spfquery --scope mfrom --id jhylunrrhc@swebolt.se --ip 222.77.253.120; echo "$?"
  fail
  Please see http://www.openspf.org/Why?s=mfrom;id=jhylunrrhc%40swebolt.se;ip=222.77.253.120;r=balug-sf-lug-v2.balug.org
  swebolt.se: Sender is not authorized by default to use 'jhylunrrhc@swebolt.se' in 'mfrom' identity (mechanism '-all' matched)
  Received-SPF: fail (swebolt.se: Sender is not authorized by default to use 'jhylunrrhc@swebolt.se' in 'mfrom' identity (mechanism '-all' matched)) receiver=balug-sf-lug-v2.balug.org; identity=mailfrom; envelope-from="jhylunrrhc@swebolt.se"; client-ip=222.77.253.120
  1
  #

Wrote a handy little program to summarize the exim rejectlog failures from the most recent few such log files:

  # Rejectlog_report
  6313 Unrouteable address
  1013 relay not permitted
  8 SPF check failed
  7 SMTP protocol synchronization error (input sent without waiting for greeting)
  7 maximum allowed line length
  3 unqualified address not permitted
  1 SMTP protocol synchronization error (next input sent too soon: pipelining was not advertised)
  1 missing or malformed local part
  1 syntactically invalid
  #

Looks like at least the top couple of items would be good candidates for adding configurations for fail2ban. Some others beyond that may also be worth doing - but not as high a priority.
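The actual Rejectlog_report program isn't shown here, so the following is only a guess at the general idea: count rejectlog lines by which known reason string they contain, then list the counts in descending order. The reason strings are taken from the report output above; the names REASONS, summarize and format_report are made up for this sketch, and the sample lines are the two SPF rejects from the log excerpt above.

```python
from collections import Counter

# Reason substrings as they appear in the report output above.
REASONS = [
    "Unrouteable address",
    "relay not permitted",
    "SPF check failed",
    "SMTP protocol synchronization error (input sent without waiting for greeting)",
    "SMTP protocol synchronization error (next input sent too soon: pipelining was not advertised)",
    "maximum allowed line length",
    "unqualified address not permitted",
    "missing or malformed local part",
    "syntactically invalid",
]


def summarize(lines, reasons=REASONS):
    """Count lines by the first known reason substring each one contains."""
    counts = Counter()
    for line in lines:
        for reason in reasons:
            if reason in line:
                counts[reason] += 1
                break  # count each log line at most once
    return counts


def format_report(counts):
    """One 'count reason' line per reason, largest count first."""
    return "\n".join(f"{n:7d} {reason}" for reason, n in counts.most_common())


sample = [
    "2021-04-28 02:29:33 H=(sweja-se.mail.protection.outlook.com) "
    "[183.199.220.44] F= rejected RCPT : SPF check failed.",
    "2021-04-28 03:50:56 H=(smail1.vub.sk) [222.77.253.120] "
    "F= rejected RCPT : SPF check failed.",
]
print(format_report(summarize(sample)))
```

To run it over real logs one would pass in the lines of the rejectlog file(s) instead of sample; lines matching none of the listed reasons are simply ignored by this sketch.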
// reverted the temporary increase of max queue time (from 4 days to 7 days), so it's back to 4 days:

  # awk '{if($1~/^[^#]/||$1~/^#\*/||$0~/^# temp/)print;}' conf.d/retry/30_exim4-config
  * * F,2h,15m; G,16h,1h,1.5; F,4d,6h
  # systemctl reload exim4.service
  #
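For a feel of what that retry rule means in practice, here's a rough sketch that expands the retry parameters into nominal retry times: every 15 minutes for the first 2 hours, then intervals starting at 1 hour and growing by a factor of 1.5 until 16 hours, then every 6 hours until the 4 day cutoff. It's only an approximation (actual attempts depend on queue runner timing and on how exim handles the boundaries between the sub-rules), and the helper names parse_time and retry_offsets are made up for this sketch. It takes just the parameter part of the rule, without the leading "* *" address and error fields.

```python
from datetime import timedelta

# Seconds per time-unit suffix used in the retry parameters.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(text):
    """Convert e.g. '15m' to seconds (single-unit values only)."""
    return int(text[:-1]) * UNITS[text[-1]]


def retry_offsets(rule):
    """Yield nominal retry times, in seconds after the first failure.

    Approximation: F = fixed interval, G = geometrically growing interval,
    each applied until its cutoff time since the first failure.
    """
    elapsed = 0.0
    for part in rule.split(";"):
        fields = [f.strip() for f in part.split(",")]
        letter = fields[0]
        cutoff = parse_time(fields[1])
        interval = float(parse_time(fields[2]))
        factor = float(fields[3]) if letter == "G" else 1.0
        while elapsed + interval <= cutoff:
            elapsed += interval
            yield elapsed
            interval *= factor


offsets = list(retry_offsets("F,2h,15m; G,16h,1h,1.5; F,4d,6h"))
for seconds in offsets[:12]:
    print(timedelta(seconds=seconds))
print("retries before the 4 day cutoff:", len(offsets))
```

With the temporary 7 day setting, only the last sub-rule's cutoff would differ (F,7d,6h), so undeliverable spam bounces would sit in the queue, and on /var, that much longer.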