Oct 16

Verify huge email lists for free with Ruby

As the "tech guy" in my circle of friends, I get approached from time to time with questions of all sorts from my less technically inclined pals. "Can you build me an app?", "What's Bitcoin?", and "Do you have a bootleg copy of Photoshop?" are the most common. Of course, like any self-respecting "tech guy", I generally ignore these and continue on with my life.

Recently one of these friends approached me with something I found interesting. Over the years he had acquired a rather large list of email addresses relating to his business, but had never marketed to them. He wanted to start promoting to these people, but was hesitant about using the list of roughly 400,000 addresses because so much time had passed since they were collected.

If you're not familiar: many email delivery systems (SendGrid, CampaignMonitor, MailChimp) will blacklist you or disable your account if you try marketing to a huge list and a large percentage of those addresses are invalid. He mentioned looking into services that could verify the list for him, but all of them seemed prohibitively expensive.

Here's a selection of the better services I found for email address verification…

With a list of this magnitude, he was looking at thousands of dollars for performing a pretty simple task.

I already had the core of this code from another app I wrote some time ago, so I decided to help in exchange for a nice bottle of sake and some cash on the side. Hey, what are friends for?

The solution involved creating two new pieces of software, written in Ruby, that I'm releasing as open source today. Perhaps you'll find them interesting and useful.

Email List Cleaner

The first of these tools I'm calling Email List Cleaner (surprise).

It does exactly what you'd expect: you feed it a list of email addresses in CSV format, and it performs the following actions…

  1. Loads all addresses into a Redis Set, which removes duplicates.
  2. Uses the email_verifier gem, which does a number of things, but ultimately connects to each address's SMTP server and verifies the address using SMTP commands.
  3. Dumps CSV files of "good", "bad", and "todo" (if any remain) addresses.
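To make that flow concrete, here's a rough sketch of the three steps. It is not the actual code from Email List Cleaner: it swaps Redis for Ruby's built-in Set so it runs anywhere, the helper names are made up for illustration, and `verifier` is a hypothetical callable standing in for the SMTP check. Lowercasing addresses before deduplicating is also my assumption.

```ruby
require "csv"
require "set"

# Step 1: read addresses from CSV text into a Set, which drops duplicates.
# (A stand-in for the Redis Set; the downcase/strip normalization is an assumption.)
def load_addresses(csv_text)
  addresses = Set.new
  CSV.parse(csv_text) do |row|
    email = row.first.to_s.strip.downcase
    addresses << email unless email.empty?
  end
  addresses
end

# Step 2: sort addresses into good / bad / todo buckets.
# `verifier` is a hypothetical callable: it returns true or false,
# and raises when a check could not be completed (timeout, refused connection).
def classify(addresses, verifier)
  buckets = { good: [], bad: [], todo: [] }
  addresses.each do |email|
    begin
      buckets[verifier.call(email) ? :good : :bad] << email
    rescue StandardError
      buckets[:todo] << email # leave it for a later run
    end
  end
  buckets
end

# Step 3: turn one bucket into single-column CSV text.
def to_csv(list)
  CSV.generate { |csv| list.each { |email| csv << [email] } }
end
```

In the real tool the verifier would wrap the email_verifier gem, and the "todo" bucket is whatever is still waiting to be checked when a run stops.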

Because the script ultimately needs to connect to SMTP (email) servers, there are issues you can run into, namely getting banned for connecting too fast or performing too many validations…
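One general way to avoid hammering a single mail server is to pace your checks per domain. The sketch below is only an illustration of that idea, not how Email List Cleaner handles it; the `DomainThrottle` class, its method names, and the interval are all hypothetical.

```ruby
# Enforces a minimum gap between checks against the same email domain.
# The clock and sleeper are injectable so the class can be exercised without waiting.
class DomainThrottle
  def initialize(min_interval,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 sleeper: ->(seconds) { sleep(seconds) })
    @min_interval = min_interval
    @clock = clock
    @sleeper = sleeper
    @last_seen = {} # domain => time of the most recent check
  end

  # Blocks until it is safe to contact the address's domain, then returns the domain.
  def wait_for(email)
    domain = email.split("@").last.downcase
    now = @clock.call
    last = @last_seen[domain]
    if last
      remaining = @min_interval - (now - last)
      if remaining > 0
        @sleeper.call(remaining)
        now += remaining
      end
    end
    @last_seen[domain] = now
    domain
  end
end
```

Calling `DomainThrottle.new(2.0).wait_for(email)` before each verification would leave at least two seconds between hits on any one domain. Pacing like this only helps with connecting too fast; it does nothing about the total number of validations coming from one IP.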

Cloud Proxy Generator

Due to the issue explained above, I created Cloud Proxy Generator, which generates a number of SOCKS5 proxies that you can then feed to Email List Cleaner to spread your connections across many IP addresses. This has a few benefits…

  1. You have a smaller chance of getting your IP addresses banned by the servers you're verifying against, namely Hotmail / MSN / Outlook.
  2. Verifying email addresses goes MUCH FASTER when you can spread out the work over 25+ computers.
  3. If you do get blacklisted from an SMTP server (the program will tell you), all you need to do is "spin down" the proxies, and spin some more up with new IP addresses.
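To illustrate the idea of spreading work over proxies, here's a small sketch that hands out addresses round-robin and moves work off a proxy that got blacklisted. This is an assumption about how such a split could work, not the tools' actual scheduling; the function names and proxy strings are placeholders.

```ruby
# Assign each address to a proxy in round-robin order.
# Returns a hash of proxy => list of addresses.
def assign_to_proxies(addresses, proxies)
  raise ArgumentError, "need at least one proxy" if proxies.empty?
  assignments = Hash.new { |hash, proxy| hash[proxy] = [] }
  addresses.each_with_index do |email, index|
    assignments[proxies[index % proxies.size]] << email
  end
  assignments
end

# Remove a blacklisted proxy and hand its addresses to the remaining ones.
# (The article's approach is to spin up fresh proxies instead; this is a variant.)
def retire_proxy(assignments, banned_proxy)
  orphaned = assignments.delete(banned_proxy) || []
  survivors = assignments.keys
  raise ArgumentError, "no proxies left" if survivors.empty? && !orphaned.empty?
  orphaned.each_with_index do |email, index|
    assignments[survivors[index % survivors.size]] << email
  end
  assignments
end

# Placeholder proxy addresses, purely for illustration.
proxies = ["127.0.0.1:1080", "127.0.0.1:1081"]
work = assign_to_proxies(%w[a@x.com b@x.com c@x.com d@x.com e@x.com], proxies)
```

With 25 or more proxies each one sees only a small slice of the list, which is where both the speedup and the lower ban risk come from.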

Cloud Proxy Generator in action

There are substantial README docs for each on GitHub, and both tools come with some premade scripts to get you rolling. Hope you enjoy, use, and share.

If you have any questions on the use of either tool, please let me know.

Written by Seth Banks

Seth spends most of his days leading the design team at Green Bits and improving Cashboard. Occasionally he finds time to write about music, design, startups, and technology.