In the past, I have mentioned several times that I don’t receive spam. That’s not completely true, but it’s close: I normally get about one spam message a month. I have achieved this by using approaches like Yahoo! AddressGuard, and by translating the same scheme to GMail when I moved there. My e-mail address is publicly accessible and shown on this blog. If you look at the source of the page where it appears, you’ll notice I wrote it so you can copy and paste the address into your e-mail client while keeping spammers at bay.
When one of my addresses is compromised, I can change it right away, but I prefer not to change addresses too often, to avoid annoying people who want to contact me from time to time. For this reason, when I receive a spam message at one of my accounts, the first thing I do is report it. If I received 100 spam messages a day I couldn’t do this, but since I receive one a month, I don’t mind spending 5 minutes reporting it. I only change the address if the spam doesn’t stop and seems to be increasing.
Reporting a message is quite straightforward, but I haven’t automated it. I could, but I haven’t bothered yet. Basically, I view the source of the message and look at the "Received" headers. I find the first one, in chronological order, that appears to correspond to a valid public SMTP server people should trust. Then I run "whois" on that IP address to find the ISP or organization owning that network block, and report the message to the abuse address they provide in the "whois" reply. If they don’t provide an abuse address, I usually send it to the technical contact listed in the "whois" reply, and also to the "abuse@" address of the company’s main domain, just in case it actually exists and someone reads it.
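The header-reading step could be scripted roughly like this. It is a minimal sketch assuming Python’s standard email package; the function name and the regular expression are illustrative, not part of any existing tool. It only lists the relay chain, oldest hop first, with the bracketed IPv4 address of each hop. Picking the first trustworthy hop is still a manual judgment (headers added before that point can be forged), and the "whois" lookup is left out.

```python
import email
import re
from typing import Optional

# First bracketed IPv4 address in a Received header, as in
# "from relay.example.org (relay.example.org [198.51.100.7]) by ..."
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def received_chain(raw_message: str) -> list[tuple[str, Optional[str]]]:
    """List (header, ip) pairs for every Received header, oldest hop first.

    Each relay prepends its own Received header, so they appear newest-first
    in the message source; reversing them gives chronological order.
    """
    msg = email.message_from_string(raw_message)
    chain = []
    for header in reversed(msg.get_all("Received", [])):
        text = " ".join(str(header).split())  # unfold continuation lines
        match = BRACKETED_IPV4.search(text)
        chain.append((text, match.group(1) if match else None))
    return chain
```

Printing the result of received_chain() for a saved message shows each hop next to its address; you would then run "whois" on the address you decide to trust.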
In my e-mail client I have a template for reporting spam. I create a new message from the template, fill in the "To" field with the addresses just mentioned, and paste the full source of the spam message at the end. The message itself is a very brief note to whoever might read it, saying I received a spam message that apparently came from their network block. As I said, this takes 5 minutes and could be automated.
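As a sketch of how the template step might be automated, assuming Python’s standard email package: the note text, the Subject line and the function name below are hypothetical placeholders, not the actual template. The report is plain text with the raw spam source pasted after the note, mirroring the manual procedure; sending it is left out.

```python
from email.message import EmailMessage

# Hypothetical note text, kept to short lines so the body stays 7-bit.
NOTE = (
    "Hello,\n"
    "\n"
    "I received the spam message below, apparently sent from\n"
    "your network block. Its full source follows.\n"
    "\n"
    "----- Original message source -----\n"
)


def build_manual_report(raw_spam: str, sender: str,
                        recipients: list[str]) -> EmailMessage:
    """Create a plain-text report with the raw spam source pasted at the end."""
    report = EmailMessage()
    report["From"] = sender
    # Abuse and technical contacts taken from the whois reply.
    report["To"] = ", ".join(recipients)
    report["Subject"] = "Spam received from your network"
    report.set_content(NOTE + raw_spam)
    return report
```

The returned message can be printed with as_string() for review, or handed to smtplib’s send_message() once the recipients look right.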
Sometimes the spam message comes from a Yahoo! account, sent through their servers, and I follow the same procedure, emailing email@example.com. That was the case with the latest spam message I received, two days ago. I reported it as I always do and received a reply from Yahoo! with the following contents:
To report abuse manually (or to get help with security or abuse related
issues), please go to Yahoo! Abuse: http://abuse.yahoo.com
- Yahoo! Customer Care
Note: Please do not reply to this email as replies will not be answered.
A quick Google search revealed a few people upset by this. Apparently, Yahoo! has been applying this policy since the beginning of December, and the RFC they mention in that first paragraph is from August. People are upset for several reasons. The RFC is so recent that there are almost no tools to handle or create reports in that format yet, so by requiring it, Yahoo! is cutting people out of the loop. The second option is to go to their website and report the spam message there. This means two things: you have to treat Yahoo! in a special way when reporting spam, and you have to put up with their web form. The form is annoying because the landing page has no direct way to report spam. As of today, you have to click on "I want to report spam" (this opens a new window or tab), then paste the full e-mail headers into one box and the message contents into another. Fantastic. So you can’t simply upload the message or paste its full contents into a form. No, no. You have to carefully select the message headers first, copy them, paste them into the form, then copy the message body, paste it into the form, and finally pass a captcha.
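The one tedious part that can be scripted is separating headers from body. A minimal sketch, assuming the raw source has been saved as text; the function name is hypothetical, and it relies only on the rule that the header section ends at the first empty line. You still have to paste both pieces into the form and solve the captcha yourself.

```python
def split_for_web_form(raw_message: str) -> tuple[str, str]:
    """Return (headers, body) of a raw RFC 5322 message.

    The header section ends at the first empty line; everything after
    it is the body. CRLF line endings are normalized to LF first.
    """
    text = raw_message.replace("\r\n", "\n")
    headers, _, body = text.partition("\n\n")
    return headers, body
```

A message with no empty line is treated as headers only, with an empty body.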
I was also a bit upset by Yahoo!’s change, so I read through RFC 5965. It looked simple if you only fill in the required parts, and it ends with a simple example report, so I searched for a tool that would convert an e-mail message into a report in that format. I didn’t find one immediately. Knowing that Python has a very comprehensive and easy-to-use package for handling e-mail messages, I investigated a little and decided to spend the rest of the evening trying to write such a tool. The result is on GitHub as the spamreport repository, but don’t try to use it yet. I have some bad news. Python’s e-mail library is amazingly simple, and in the end, including all the code to check program options and such, the program is exactly 100 lines long. It’s short and straightforward and should work perfectly. However, it doesn’t.
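For readers who want to experiment, here is a minimal sketch of the structure RFC 5965 describes, built with Python’s email.mime classes. It is not the code in the spamreport repository, and like that tool it has not been shown to produce reports Yahoo! accepts. The function name, the note text and the "spamreport-sketch/0.1" User-Agent value are hypothetical. The report is a multipart/report with report-type feedback-report and three parts: a human-readable text/plain note, a message/feedback-report part carrying the three required fields (Feedback-Type, User-Agent, Version) plus an optional Source-IP, and the original message as message/rfc822.

```python
import email
from email.mime.base import MIMEBase
from email.mime.message import MIMEMessage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import formatdate, make_msgid


def build_arf_report(raw_spam, sender, recipient, source_ip=None,
                     user_agent="spamreport-sketch/0.1"):
    """Wrap a raw spam message in a minimal RFC 5965 (ARF) abuse report."""
    spam = email.message_from_string(raw_spam)

    # Top level: multipart/report; report-type=feedback-report
    report = MIMEMultipart("report", report_type="feedback-report")
    report["From"] = sender
    report["To"] = recipient
    report["Subject"] = "Abuse report: " + spam.get("Subject", "")
    report["Date"] = formatdate(localtime=True)
    report["Message-ID"] = make_msgid(domain=sender.rpartition("@")[2])

    # Part 1: human-readable explanation.
    note = "This is an abuse report for a spam message I received"
    if source_ip:
        note += " from IP address " + source_ip
    report.attach(MIMEText(note + ".\n", "plain", "us-ascii"))

    # Part 2: machine-readable fields; the first three are required.
    fields = ["Feedback-Type: abuse", "User-Agent: " + user_agent,
              "Version: 1"]
    if source_ip:
        fields.append("Source-IP: " + source_ip)
    feedback = MIMEBase("message", "feedback-report")
    feedback.set_payload("\n".join(fields) + "\n")
    report.attach(feedback)

    # Part 3: the complete original message.
    report.attach(MIMEMessage(spam))
    return report
```

Calling build_arf_report(raw_source, "me@example.com", "abuse@example.net", source_ip="192.0.2.1").as_string() gives the report source, which can be compared side by side with the example in the RFC.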
I have tried submitting abuse reports to Yahoo! in that format several times, making minor changes and tweaking my program here and there between attempts, and every report has been rejected. Yahoo!’s reply does not explain why the report was rejected, which, by the way, goes a bit against the RFC itself. From Section 4:
When an agent that accepts and handles ARF messages receives a message that purports (by MIME type) to be an ARF message but syntactically deviates from this specification, that agent SHOULD ignore or reject the message. Where rejection is performed, the rejection notice (either via an SMTP reply or generation of a DSN) SHOULD identify the specific cause for the rejection.
Since they are sending back a rejection notice, they SHOULD explain the reason, but they don’t, and that’s what makes this so frustrating. At first, I thought GMail was mangling the reports, so I sent one to one of my accounts at another e-mail provider, and it arrived unmangled on the other end. GMail is not altering the reports. Just so you get an idea, here’s a screenshot from a test case. I took the simple example report given in the RFC and tried to create a similar report with my tool, using the same spam message and the same notification text, just to see what the differences were.
As you can see, apparently the only differences are:
The header order for "To", "From", "Date" and "Subject" differs (this should be irrelevant).
The value "feedback-report" is quoted in my output because Python writes it that way. This should also be irrelevant.
The MIME boundary markers differ (irrelevant, since they are generated randomly for each message).
The charset name "us-ascii" is in lowercase in my output. Python writes it in lowercase even if I pass it in uppercase, and this should be irrelevant too.
The User-Agent string changes (obviously).
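The quoting and lowercase points can be checked with a small sketch using Python’s standard email package; the message text and variable names are made up for illustration. It builds the headers the way Python does, then parses both that output and a hand-written RFC-style variant with an unquoted report-type and an uppercase charset. get_param() removes the quotes and get_content_charset() lowercases the charset, so Python’s parser reads the same values from both. This shows what Python’s parser does; it says nothing about how Yahoo!’s parser behaves.

```python
import email
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Python's own output: quoted report-type, charset rewritten to lowercase.
generated = MIMEMultipart("report", report_type="feedback-report")
generated.attach(MIMEText("hello\n", "plain", "US-ASCII"))
python_output = generated.as_string()

# Hand-written variant in the RFC's style: unquoted value, uppercase charset.
handwritten = (
    'Content-Type: multipart/report; report-type=feedback-report; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain; charset=US-ASCII\n"
    "\n"
    "hello\n"
    "--b--\n"
)

results = []
for source in (python_output, handwritten):
    msg = email.message_from_string(source)
    results.append((msg.get_param("report-type"),
                    msg.get_payload(0).get_content_charset()))

print(results)
# [('feedback-report', 'us-ascii'), ('feedback-report', 'us-ascii')]
```

Since RFC 2045 allows a parameter value to be either a token or a quoted string, and charset names are case-insensitive, a conforming parser should treat both forms the same way.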
Yet Yahoo! keeps rejecting the reports. I’m puzzled at the moment and won’t tag the release as 1.0.0 until the reports are accepted or proven to be correct, but I don’t know what else to check. I suspect there’s a minor flaw I haven’t detected. If you spot it, please let me know. The code is online.