Diagnosing DNS-sourced SPF problems

Last Updated: Monday, 29 August 2022

So, your outgoing messages are failing SPF, or your own SPF verifier is failing messages for no good reason, and you want to figure out why? It might be the DNS of the sending domain.

The example domain here is myatb.com, which once had DNS troubles. Subdomains like blah.myatb.com will not be covered here, but the process is similar (if a bit more complicated).

If DNS lookups fail, time out, or otherwise get weird, your software might map the SPF result to "fail" (this behavior may be correct if a "ptr" mechanism is being used; more on that below). A good verifier knows better, but it's up to the implementer to make this distinction (and up to you to use the information your verification code provides).
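To make that distinction concrete, here is a minimal sketch of how a verifier could treat the outcome of fetching the SPF record itself. The names (DnsOutcome, result_for_record_lookup) are made up for illustration, and the mapping follows my reading of RFC 7208 section 4.4; your SPF library may expose this differently.

```python
from enum import Enum


class DnsOutcome(Enum):
    """Hypothetical summary of what came back from a TXT lookup."""
    ANSWER = "answer"        # RCODE 0, a response was received
    NXDOMAIN = "nxdomain"    # RCODE 3, the name does not exist
    SERVFAIL = "servfail"    # RCODE 2 or any other error RCODE
    TIMEOUT = "timeout"      # no response at all


def result_for_record_lookup(outcome: DnsOutcome) -> str:
    """Map the outcome of fetching a domain's SPF record to an SPF result.

    Returns "evaluate" when a response came back and normal record
    evaluation should continue.
    """
    if outcome is DnsOutcome.ANSWER:
        return "evaluate"
    if outcome is DnsOutcome.NXDOMAIN:
        return "none"
    # Timeouts and server failures are transient DNS problems: report
    # "temperror", never "fail", so the mail can be deferred and retried.
    return "temperror"
```

If your logs show "fail" for a message whose lookup timed out, this is the distinction your software is not making.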

Before suspecting DNS problems, confirm that the domain has a valid SPF record.

You can retrieve the SPF record with:

$ host -t txt myatb.com
myatb.com descriptive text "v=spf1 a mx ip4:207.173.6.116 mx:mail.myatb.com mx:mail.martechservices.com  ~all"

But it's probably easier to use the first form at http://www.kitterman.com/spf/validate.html

Fill in the domain only and hit "Get SPF record"; it shouldn't give any errors. (If there is no SPF record, you can skip down to the DNS section.)

Now use the second form at that link. In the fields, fill in "myatb.com" and the contents of the quoted string above:

myatb.com
v=spf1 a mx ip4:207.173.6.116 mx:mail.myatb.com mx:mail.martechservices.com  ~all

The output includes:

The result of the test (this should be the default result of your record) was, ambiguous . The explanation returned was, SPF Ambiguity Warning: No MX records found for mx mechanism: mail.myatb.com

...their SPF record is strange: the two spaces before "~all" aren't a problem, but they should probably remove mx:mail.myatb.com, and replace mx:mail.martechservices.com with mx:martechservices.com. If martechservices.com actually had a different IP, their record as-is could cause mail from "blah@myatb.com" sent from a martechservices.com IP to fail SPF, since its IP is not properly included. However, it turns out that martechservices.com has the same IP as myatb.com, so the whole mx:martechservices.com entry is redundant as well. Additionally, the IP specified is identical to their MX, and in fact their A record as well. Their record should probably just be:

v=spf1 a ~all

...or better, because it will save a lookup:

v=spf1 ip4:207.173.6.116 ~all

This is likely a case of someone who didn't know what they were doing filling in a lot of stuff to be sure, or adding it all in just in case things change in the future. In the latter case, this would suffice:

v=spf1 a mx ip4:207.173.6.116 ~all

...although it wouldn't be as efficient for resolvers on the other end because it's saying the same thing three times (a==mx==207.173.6.116).
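To compare the candidate records above, a rough sketch like the following counts the terms in a record that make the verifier go back to DNS (SPF caps these at 10 per check). The function name is made up for illustration, and this is a simplification: it only counts the terms in the record itself, not the extra address lookups an "mx" mechanism causes or anything inside an included record.

```python
# Mechanisms that make the verifier issue DNS queries.
LOOKUP_MECHANISMS = {"a", "mx", "ptr", "include", "exists"}


def count_lookup_terms(record: str) -> int:
    """Count the DNS-querying terms in a single SPF record string."""
    count = 0
    for term in record.split()[1:]:        # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")         # drop the qualifier, if any
        if term.lower().startswith("redirect="):
            count += 1                     # the redirect modifier also queries DNS
            continue
        # Strip any ":domain" argument and "/cidr" suffix to get the bare name.
        name = term.split(":", 1)[0].split("/", 1)[0].lower()
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count
```

By this count, the record as published has four such terms, "v=spf1 a ~all" has one, and "v=spf1 ip4:207.173.6.116 ~all" has none.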

Trying the third form at kitterman.com, we can put in actual values from the email that failed, or we can make up our own. We will make up our own for this example:

IP address: 207.173.6.116 (You can get this from "dig myatb.com +short" or "host myatb.com")
SPF Record: (paste in the real record you extracted already for the form)
Mail from address: test@myatb.com
HELO/EHLO Address: myatb.com

Submit the form and see what happens. In this example, it would pass, despite their redundant and misconfigured SPF record.

Most mail servers also require "FCRDNS" (forward-confirmed reverse DNS), meaning a validating PTR record for the IP: take the IP of the sending mail server (from the Received header of the message) and do a reverse lookup. It will return one or more host names. One of those host names must resolve back to the IP. You may as well let the form above take care of that checking for you.
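If you would rather run that check yourself, here is a minimal sketch using only Python's standard library. The function names are made up for illustration, it goes through the system resolver, and it only handles IPv4 (socket.gethostbyname_ex does not do IPv6). The lookups are passed in as parameters so you can substitute your own.

```python
import socket
from typing import Callable, List


def system_reverse(ip: str) -> List[str]:
    """PTR lookup via the system resolver; returns [] on any failure."""
    try:
        name, aliases, _ = socket.gethostbyaddr(ip)
    except OSError:
        return []
    return [name] + aliases


def system_forward(hostname: str) -> List[str]:
    """A lookup via the system resolver; returns [] on any failure."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def fcrdns_names(
    ip: str,
    reverse: Callable[[str], List[str]] = system_reverse,
    forward: Callable[[str], List[str]] = system_forward,
) -> List[str]:
    """Return the PTR host names for `ip` that resolve back to `ip`."""
    return [name for name in reverse(ip) if ip in forward(name)]
```

An empty list from fcrdns_names("207.173.6.116") would mean the IP has no forward-confirmed name, at least as seen from your resolver.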

If there is a "ptr" mechanism in the SPF record (they are almost always advised against) then the PTR records discovered by the DNS lookup must additionally match or end with the domain in question in order for the "ptr" mechanism to match. Note: a DNS error when looking up the PTR records will make the "ptr" mechanism not match. A DNS error when looking up the A records for any domains discovered in the PTR lookup simply means those domains will not be able to be used as possible matches.
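The "match or end with" rule can be sketched as below. This assumes you already have the list of host names from the PTR lookup that resolved back to the sending IP; the function name is made up for illustration.

```python
from typing import Iterable


def ptr_mechanism_matches(validated_names: Iterable[str], domain: str) -> bool:
    """True if any validated host name equals `domain` or is a subdomain of it."""
    target = domain.lower().rstrip(".")
    for name in validated_names:
        name = name.lower().rstrip(".")
        # Require a dot boundary so "notmyatb.com" does not match "myatb.com".
        if name == target or name.endswith("." + target):
            return True
    return False
```

Note that a failed PTR lookup leaves you with an empty list, so the mechanism simply does not match.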

If SPF all checks out on the form, but not in your system, there may still be problems with DNS causing failures or intermittent failures in one place (your system) but not another (the SPF checking form website).

If you want to see if the domain lookups are failing on your system, you can check whatever logs you keep for timeouts. You might also scan /var/log/messages on your nameserver machine(s) for this kind of output (obviously, your logs may look different):

Aug 16 04:16:16 app002 named[16870]: lame server resolving 'thesweettree.com' (in 'thesweettree.com'?): 76.74.153.57#53
Aug 16 04:16:16 app002 named[16870]: lame server resolving 'thesweettree.com' (in 'thesweettree.com'?): 216.14.112.235#53

In these logs, the IP at the end is the nameserver that "named" queried and got a lame (non-authoritative) answer from for that zone; the log doesn't tell you that server's host name.

Note that depending on your logging and what your MX is handling, there could be lots of these messages in /var/log/messages, and you may see lots of timeouts in whatever logs you keep. This is usually normal (spammer domains, mostly.)
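To find out whether the domain you care about is among all that noise, a small sketch like this tallies "lame server" lines per domain. It assumes your log lines look like the sample above; the function name is made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the named "lame server" format shown in the sample log lines.
LAME_RE = re.compile(r"named\[\d+\]: lame server resolving '([^']+)'")


def count_lame_domains(lines: Iterable[str]) -> Counter:
    """Count "lame server" log lines per domain being resolved."""
    counts = Counter()
    for line in lines:
        match = LAME_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts
```

You could pass it an open file, e.g. count_lame_domains(open("/var/log/messages")), and look at .most_common(10) or at the count for the one sending domain you are investigating.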

Let's check out the DNS records:

$ dig myatb.com +trace
; <<>> DiG 9.5.1-P2 <<>> myatb.com +trace
;; global options: printcmd
. 13344 IN NS c.root-servers.net.
. 13344 IN NS e.root-servers.net.
. 13344 IN NS a.root-servers.net.
. 13344 IN NS f.root-servers.net.
. 13344 IN NS l.root-servers.net.
. 13344 IN NS m.root-servers.net.
. 13344 IN NS h.root-servers.net.
. 13344 IN NS d.root-servers.net.
. 13344 IN NS g.root-servers.net.
. 13344 IN NS j.root-servers.net.
. 13344 IN NS i.root-servers.net.
. 13344 IN NS k.root-servers.net.
. 13344 IN NS b.root-servers.net.
;; Received 228 bytes from 192.168.1.1#53(192.168.1.1) in 59 ms
com. 172800 IN NS K.GTLD-SERVERS.NET.
com. 172800 IN NS J.GTLD-SERVERS.NET.
com. 172800 IN NS A.GTLD-SERVERS.NET.
com. 172800 IN NS E.GTLD-SERVERS.NET.
com. 172800 IN NS L.GTLD-SERVERS.NET.
com. 172800 IN NS G.GTLD-SERVERS.NET.
com. 172800 IN NS D.GTLD-SERVERS.NET.
com. 172800 IN NS F.GTLD-SERVERS.NET.
com. 172800 IN NS I.GTLD-SERVERS.NET.
com. 172800 IN NS C.GTLD-SERVERS.NET.
com. 172800 IN NS M.GTLD-SERVERS.NET.
com. 172800 IN NS H.GTLD-SERVERS.NET.
com. 172800 IN NS B.GTLD-SERVERS.NET.
;; Received 487 bytes from 192.5.5.241#53(f.root-servers.net) in 73 ms
myatb.com. 172800 IN NS ns.martechservices.com.
myatb.com. 172800 IN NS ns2.martechservices.com.
;; Received 110 bytes from 192.12.94.30#53(E.GTLD-SERVERS.NET) in 239 ms
myatb.com. 3600 IN A 207.173.6.116
;; Received 43 bytes from 207.173.6.116#53(ns2.martechservices.com) in 89 ms

Note the two ns*.martechservices.com lines: these are the nameservers that the "com" nameservers list for the myatb.com zone. As it happens with this particular domain, ns.martechservices.com isn't responding, and if you ran the dig command and the resolver happened to choose "ns" instead of "ns2" (the nameserver in parentheses on the last line is the one that responded with the IP), the end of the above output would list a failure instead of the IP. Anyway, let's check each nameserver individually to see if it responds (there may be a lot in the list; in this case there are only two. Watch out for unexpected but common ns1.easydns.org vs. ns2.easydns.net discrepancies, etc.)

$ dig @ns2.martechservices.com myatb.com +short
207.173.6.116
$ dig @ns.martechservices.com myatb.com +short
; <<>> DiG 9.5.1-P2 <<>> @ns.martechservices.com myatb.com +short
; (1 server found)
;; global options: printcmd
;; connection timed out; no servers could be reached

Aha, ns.martechservices.com is not responding. This will cause DNS failures (and possibly inappropriate SPF failures as a result) when checking SPF for this domain.
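When a zone has more than a couple of nameservers, checking them by hand gets tedious. Here is a minimal sketch that loops over them, assuming "dig" is installed and on your PATH. The function names are made up for illustration, and the 3-second timeout and single try are arbitrary choices. Note that an empty answer can also mean the server responded but had no A record, so treat the result as a list of servers to look at more closely.

```python
import subprocess
from typing import Callable, Iterable, List


def dig_short(nameserver: str, domain: str) -> List[str]:
    """Ask one nameserver directly; return its answer lines ([] if none)."""
    proc = subprocess.run(
        ["dig", f"@{nameserver}", domain, "+short", "+time=3", "+tries=1"],
        capture_output=True,
        text=True,
    )
    # dig prints problems (such as timeouts) as lines starting with ";".
    return [ln for ln in proc.stdout.splitlines() if ln and not ln.startswith(";")]


def nameservers_without_answer(
    nameservers: Iterable[str],
    domain: str,
    query: Callable[[str, str], List[str]] = dig_short,
) -> List[str]:
    """Return the nameservers that gave no answer for `domain`."""
    return [ns for ns in nameservers if not query(ns, domain)]
```

For the example domain you would call nameservers_without_answer(["ns.martechservices.com", "ns2.martechservices.com"], "myatb.com").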

Let's also see if there are any SOA issues ("Start of Authority", a type of DNS record). Valid SOA is not a requirement of SPF, but without it the domain will likely have problems in general.

$ host -C myatb.com
host: couldn't get address for 'webserver': not found

...looks like a problem. Normally, something like this comes back:

$ host -C caseyconnor.org
Nameserver ns1.server264.com:
caseyconnor.org has SOA record ns1.server264.com. hostmaster.caseyconnor.org. 1250735855 16384 2048 1048576 2560
Nameserver ns2.server264.com:
caseyconnor.org has SOA record ns1.server264.com. hostmaster.caseyconnor.org. 1250735855 16384 2048 1048576 2560

This is likely the culprit:

$ dig -t SOA myatb.com +short
webserver. hostmaster. 11 900 600 86400 3600

Compare to:

$ dig -t SOA caseyconnor.org +short
ns1.server264.com. hostmaster.caseyconnor.org. 1250735855 16384 2048 1048576 2560

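...I don't know enough about DNS to really call that out as "wrong", but I'm pretty sure it is: the first two SOA fields are supposed to be the name of the primary nameserver and the zone contact's mailbox, and here they are the bare single labels "webserver." and "hostmaster." (note that 'webserver' is the same name "host -C" couldn't get an address for).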
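To spot this kind of thing mechanically, here is a small sketch that parses the one-line output of "dig -t SOA ... +short" and flags name fields that consist of a single label. The names (Soa, parse_soa, soa_warnings) are made up for illustration, and the single-label check is just a heuristic of mine, not an official validity test.

```python
from typing import List, NamedTuple


class Soa(NamedTuple):
    mname: str    # primary nameserver for the zone
    rname: str    # contact mailbox, with "@" written as "."
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa(short_line: str) -> Soa:
    """Parse one line of `dig -t SOA <domain> +short` output."""
    mname, rname, *numbers = short_line.split()
    return Soa(mname, rname, *(int(n) for n in numbers))


def soa_warnings(soa: Soa) -> List[str]:
    """Flag name fields that are a single label rather than a full name."""
    warnings = []
    for field in ("mname", "rname"):
        value = getattr(soa, field)
        if len(value.rstrip(".").split(".")) < 2:
            warnings.append(f"{field} {value!r} is a single label")
    return warnings
```

Run against the two outputs above, the myatb.com SOA produces two warnings and the caseyconnor.org SOA produces none.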