Once upon a time, I had to quickly resolve thousands of DNS names. My first solution was to call gethostbyname once for each host. This turned out to be extremely slow: I could only do 200 hosts a minute. I talked with someone and he suggested doing it asynchronously. I looked around and found adns, an asynchronous DNS library. Since I was writing the code in Python, I looked around some more and found Python bindings for adns. I tried adns and, wow, I could do 20000 hosts a minute!
In this post I want to share the slow code and the fast asynchronous code. The slow code is only useful if you need to resolve just a few domains. The asynchronous code is much more useful. I made it a Python module so that you can reuse it. It's called "async_dns.py" and an example of how to use it is included at the bottom of the post.
Here is the slow code that uses gethostbyname. The only reusable part of this code is the "resolve_slow" function, which takes a list of hosts to resolve, resolves them one by one, and returns a dictionary containing { host: ip } pairs.
To measure how fast it is, I made it resolve the hosts "www.domain0.com", "www.domain1.com", ..., "www.domain999.com" and print out how long the whole process took.
#!/usr/bin/python

import socket
from time import time

def resolve_slow(hosts):
    """
    Given a list of hosts, resolves them and returns a dictionary
    containing {'host': 'ip'}.
    If resolution for a host failed, 'ip' is None.
    """
    resolved_hosts = {}
    for host in hosts:
        try:
            host_info = socket.gethostbyname(host)
            resolved_hosts[host] = host_info
        except socket.gaierror, err:
            resolved_hosts[host] = None
    return resolved_hosts

if __name__ == "__main__":
    host_format = "www.domain%d.com"
    number_of_hosts = 1000
    hosts = [host_format % i for i in range(number_of_hosts)]

    start = time()
    resolved_hosts = resolve_slow(hosts)
    end = time()

    print "It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts)
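A rough back-of-the-envelope sketch shows why the one-at-a-time approach hurts. It uses only the two rates quoted at the top of the post (200 and 20000 hosts a minute). The helper names are made up for illustration, and the calculation assumes the rate stays constant, which real DNS lookups will not guarantee.

```python
def seconds_per_host(hosts_per_minute):
    """Average wall-clock time one host costs at the given rate."""
    return 60.0 / hosts_per_minute


def minutes_needed(number_of_hosts, hosts_per_minute):
    """Estimated total time, assuming the rate stays constant."""
    return number_of_hosts / float(hosts_per_minute)


if __name__ == "__main__":
    # Rates taken from the post: 200/min sequential, 20000/min with adns.
    for label, rate in (("sequential", 200), ("asynchronous", 20000)):
        print("%-12s %.3f s per host, %6.1f minutes for 20000 hosts"
              % (label, seconds_per_host(rate), minutes_needed(20000, rate)))
```

At 200 hosts a minute every lookup costs about 0.3 seconds on average, and because the loop waits for each answer before sending the next query, those waits simply add up.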
And here is the fast code that uses adns. I created a class "AsyncResolver" that can be reused if you import it from this code. Just like "resolve_slow" from the previous code example, it takes a list of hosts to resolve and returns a dictionary of { host: ip } pairs.
If you run this code, it will print out how long it took to resolve 20000 hosts.
#!/usr/bin/python
#

import adns
from time import time

class AsyncResolver(object):
    def __init__(self, hosts, intensity=100):
        """
        hosts: a list of hosts to resolve
        intensity: how many hosts to resolve at once
        """
        self.hosts = hosts
        self.intensity = intensity
        self.adns = adns.init()

    def resolve(self):
        """ Resolves hosts and returns a dictionary of { 'host': 'ip' }. """
        resolved_hosts = {}
        active_queries = {}
        host_queue = self.hosts[:]

        def collect_results():
            for query in self.adns.completed():
                answer = query.check()
                host = active_queries[query]
                del active_queries[query]
                if answer[0] == 0:
                    ip = answer[3][0]
                    resolved_hosts[host] = ip
                elif answer[0] == 101: # CNAME
                    query = self.adns.submit(answer[1], adns.rr.A)
                    active_queries[query] = host
                else:
                    resolved_hosts[host] = None

        def finished_resolving():
            return len(resolved_hosts) == len(self.hosts)

        while not finished_resolving():
            while host_queue and len(active_queries) < self.intensity:
                host = host_queue.pop()
                query = self.adns.submit(host, adns.rr.A)
                active_queries[query] = host
            collect_results()

        return resolved_hosts

if __name__ == "__main__":
    host_format = "www.host%d.com"
    number_of_hosts = 20000
    hosts = [host_format % i for i in range(number_of_hosts)]

    ar = AsyncResolver(hosts, intensity=500)
    start = time()
    resolved_hosts = ar.resolve()
    end = time()

    print "It took %.2f seconds to resolve %d hosts." % (end-start, number_of_hosts)
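The heart of the resolve loop is a bounded window: keep at most "intensity" queries in flight, top the window up from the queue, then collect whatever has finished. The sketch below isolates that pattern so it can be run without adns or a network. FakeResolver is a hypothetical stand-in, not part of adns; it pretends every query completes after a fixed number of polls, and resolve_windowed is likewise an illustrative name.

```python
from collections import deque


class FakeResolver(object):
    """Hypothetical stand-in for adns: a query completes after N polls."""

    def __init__(self, polls_to_complete=2):
        self.polls_to_complete = polls_to_complete
        self.pending = {}  # query id -> polls remaining
        self.next_id = 0

    def submit(self, host):
        query = self.next_id
        self.next_id += 1
        self.pending[query] = self.polls_to_complete
        return query

    def completed(self):
        """Age every pending query by one poll and return the finished ones."""
        done = []
        for query in list(self.pending):
            self.pending[query] -= 1
            if self.pending[query] <= 0:
                del self.pending[query]
                done.append(query)
        return done


def resolve_windowed(hosts, resolver, intensity):
    """Run the bounded-window loop; also report the peak number in flight."""
    results = {}
    active = {}  # query -> host, same role as active_queries above
    queue = deque(hosts)
    peak = 0
    while queue or active:
        # Top the window up, never exceeding `intensity` in-flight queries.
        while queue and len(active) < intensity:
            host = queue.popleft()
            active[resolver.submit(host)] = host
        peak = max(peak, len(active))
        # Collect whatever finished since the last poll.
        for query in resolver.completed():
            results[active.pop(query)] = "resolved"
    return results, peak


if __name__ == "__main__":
    hosts = ["www.host%d.com" % i for i in range(10)]
    results, peak = resolve_windowed(hosts, FakeResolver(), intensity=3)
    print("%d hosts done, at most %d in flight" % (len(results), peak))
```

One deliberate difference from the class above: this sketch loops until both the queue and the in-flight set are empty, rather than comparing the number of results with the number of input hosts, so a host that appears twice in the input cannot keep the loop from terminating.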
I wrote it in a manner that makes it reusable in other programs. Here is an example of how to reuse this code:
from async_dns import AsyncResolver

ar = AsyncResolver(["www.google.com", "www.reddit.com", "www.nonexistz.net"])
resolved = ar.resolve()

for host, ip in resolved.items():
    if ip is None:
        print "%s could not be resolved." % host
    else:
        print "%s resolved to %s" % (host, ip)
Output:
www.nonexistz.net could not be resolved.
www.reddit.com resolved to 159.148.86.207
www.google.com resolved to 74.125.39.99
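Since both resolvers return a plain { host: ip } dictionary with None marking a failure, post-processing the result needs no special API. As a small sketch, the hypothetical helper below (split_results is not part of async_dns.py) separates the hosts that resolved from those that did not. The sample dictionary just repeats the output shown above.

```python
def split_results(resolved):
    """Split a {host: ip-or-None} dictionary into successes and failures."""
    ok = dict((host, ip) for host, ip in resolved.items() if ip is not None)
    failed = sorted(host for host, ip in resolved.items() if ip is None)
    return ok, failed


if __name__ == "__main__":
    # Sample data copied from the output above, not a live lookup.
    resolved = {
        "www.google.com": "74.125.39.99",
        "www.reddit.com": "159.148.86.207",
        "www.nonexistz.net": None,
    }
    ok, failed = split_results(resolved)
    print("%d resolved, %d failed: %s" % (len(ok), len(failed), ", ".join(failed)))
```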
Download async_dns.py
Download link: catonmat.net/ftp/async_dns.py
See you next time!