As promised in my previous post on the xgoogle library, I have added a module for getting results from Google Sets.
Google Sets lets you automatically create groups of related items from a few example items. For example, if you feed it "red, green, blue", it will predict other colors such as "yellow, black, white, brown", and so on.
One of the most fascinating applications of this module is predicting domain names. Most sysadmins follow a consistent naming policy for their systems. For example, a sysadmin at a university might name the machines "psychology.university.edu", "art.university.edu", "geography.university.edu", and so on. If we feed "psychology, art, geography" to Google Sets, it would come up with more names, such as "history", "mathematics" and "biology". We can then run DNS lookups to check whether machines with those names actually exist. This makes it a pretty powerful reconnaissance technique.
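The DNS step can be sketched with Python's standard socket module. This is not part of xgoogle: candidate_hostnames() and resolve_existing() are hypothetical helpers, and university.edu is the same made-up domain used above. The sketch simply treats any name that socket.gethostbyname() can resolve as an existing machine.

```python
import socket


def candidate_hostnames(words, domain):
    """Build candidate host names such as 'history.university.edu'.

    Hypothetical helper: lowercases each predicted word and removes
    spaces, so 'Computer Science' becomes 'computerscience'.
    """
    names = []
    for word in words:
        label = word.strip().lower().replace(" ", "")
        if label:  # skip empty predictions
            names.append("%s.%s" % (label, domain))
    return names


def resolve_existing(hostnames, resolver=socket.gethostbyname):
    """Return {hostname: ip_address} for each name that resolves.

    Hypothetical helper; `resolver` defaults to the standard library's
    socket.gethostbyname and can be swapped out (e.g. for testing).
    """
    found = {}
    for name in hostnames:
        try:
            found[name] = resolver(name)
        except (socket.error, UnicodeError):
            # Lookup failed (e.g. no such name): treat the host as absent.
            pass
    return found
```

For instance, you could pass the items returned for ['psychology', 'art', 'geography'] to candidate_hostnames() together with 'university.edu', then hand the resulting names to resolve_existing(). Keep in mind that a failed lookup does not prove a machine is absent, and a wildcard DNS record can make every candidate name resolve.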
There are many other interesting applications. Black-hat SEOs may use it to stuff their pages with related keywords and thus rank for more search terms. Linguists can use it for natural language processing problems, such as finding words that belong to the same category. It could also serve as the basis for word-guessing games.
My personal goal in writing this library, though, was to use it in my tool for perfecting and correcting English, which I will release in an upcoming post in this series. I described this idea in more detail in the introductory post on the xgoogle library; please see that post for more info.
The new module is called "googlesets". To use it, import the "GoogleSets" class and create an instance, passing the list of example items to the constructor. Then call the "get_results()" method to get the list of predicted items. It returns a list of Unicode strings, so be sure to encode them properly when printing or writing them out.
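Because get_results() returns Unicode strings, a safe habit is to choose the output encoding explicitly rather than rely on whatever the terminal or pipe defaults to. The sketch below is not part of xgoogle: save_results() is a hypothetical helper that writes one predicted item per line to a UTF-8 text file using io.open, which behaves the same way under Python 2 and Python 3.

```python
import io


def save_results(items, path, encoding="utf-8"):
    """Write Unicode items to `path`, one per line, in an explicit encoding.

    Hypothetical helper. newline="\n" disables platform newline
    translation so the file contents are identical on every OS.
    """
    with io.open(path, "w", encoding=encoding, newline="\n") as f:
        for item in items:
            f.write(item + u"\n")
```

You would call it as save_results(gs.get_results(), "results.txt") after building a GoogleSets object as described above.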
Here is an example that uses the new module to find items related to the programming languages "python" and "perl":
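```python
from xgoogle.googlesets import GoogleSets

gs = GoogleSets(['python', 'perl'])  # example items to expand
items = gs.get_results()             # list of Unicode strings
for item in items:
    print item.encode('utf8')        # Python 2: encode before printing
```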
The output matches what Google Sets itself returns for the same query.
See the readme.txt file in the xgoogle archive for more examples.
Download xgoogle Library
- Download link: catonmat.net/ftp/xgoogle.zip
Have fun, and let me know what projects you build with my library. I'll showcase them in an upcoming blog post.