I found a Google Talk on a topic that's related to the motto of my blog - "good coders code, great reuse".
In this talk, Professor Tao Xie speaks about his research on using public code repositories together with code search engines to find common API usage patterns and anti-patterns.
His research software uses the following four code search engines.
- Google Code Search with billions of lines of code.
- Krugle with 2.5 billion LOC.
- Koders with 760 million LOC.
- Codase with 250 million LOC.
He suggests reading Raphael Volz's analysis for more information about these search engines.
Tao has developed three tools, which use the aforementioned search engines:
- PARSEWeb for finding API usage patterns,
- XWeb for finding forgotten exception handlers, and
- NEGWeb for finding misuses of API calls.
See the code mining project website for more information.
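To give a rough feel for what "finding usage patterns" and "finding misuses" can mean, here is a toy sketch of my own. It is not how PARSEWeb or NEGWeb work internally. It assumes each code sample has already been reduced to an ordered list of API call names; the sample data, the function names and the 0.7 support threshold are all made up for illustration.

```python
from collections import Counter

# Hypothetical data: each sample is the ordered list of API calls
# seen in one code snippet returned by a code search engine.
samples = [
    ["fopen", "fread", "fclose"],
    ["fopen", "fgets", "fclose"],
    ["fopen", "fwrite", "fclose"],
    ["fopen", "fread"],  # this one never closes the file
]

def frequent_followers(samples, api, min_support=0.7):
    """Return calls that appear after `api` in at least `min_support` of the samples using it."""
    counts = Counter()
    total = 0
    for calls in samples:
        if api not in calls:
            continue
        total += 1
        # Count each later call once per sample, regardless of repetitions.
        counts.update(set(calls[calls.index(api) + 1:]))
    return sorted(name for name, n in counts.items() if n / total >= min_support)

def suspicious_samples(samples, api, expected):
    """Return indexes of samples that call `api` but never call `expected` afterwards."""
    return [
        i for i, calls in enumerate(samples)
        if api in calls and expected not in calls[calls.index(api) + 1:]
    ]

print(frequent_followers(samples, "fopen"))            # common pattern candidates
print(suspicious_samples(samples, "fopen", "fclose"))  # possible misuses
```

With only four samples the "pattern" is not very trustworthy, which is the same point the lecture makes about needing many data points.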
The lecture is delivered in a very academic style and is hard to follow. Make sure you are really interested in the topic before watching it.
Some excerpts from the lecture:
- [04:26] A problem with data mining on source code is that a single code base might not have enough data points (usages of an API) to discover common patterns.
- [04:58] It is crucial to have a lot of data points to get good results out of data mining.
- [08:37] Google Code Search indexes publicly hosted SVN and CVS repositories.
- [09:20] Example of searching for usages of the C standard library's fopen on Google Code Search (query: "lang:C file:.c$ fopen\s*\(").
- [11:08] Example of the same search on Krugle.
- [16:40] Code search engines return partial code samples. Various heuristics are used for type inference.
- [22:05] Example of integrating Tao's PARSEWeb into Eclipse.
- [28:15] Interesting idea of constructing and issuing multiple queries to find more code samples.
- [36:20] A study showed that properly deallocating resources after an exception resulted in a 17% performance increase.
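The regular expression part of the fopen query from [09:20] is easy to try on code you already have locally. Below is a minimal sketch that applies the same pattern, fopen\s*\(, to a C source string. The function name and the sample source are my own, and the lang: and file: filters are search engine operators, so they are not reproduced here.

```python
import re

# Same pattern as in the query: "fopen", optional whitespace, opening parenthesis.
FOPEN_CALL = re.compile(r"fopen\s*\(")

def find_fopen_calls(source):
    """Return (line_number, stripped_line) pairs for lines that contain a match."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if FOPEN_CALL.search(line)
    ]

# Hypothetical C file contents used only for this demonstration.
c_source = '''\
#include <stdio.h>
int main(void) {
    FILE *f = fopen ("data.txt", "r");
    /* fopen without a call is not matched */
    if (f) fclose(f);
    return 0;
}
'''

for number, line in find_fopen_calls(c_source):
    print(number, line)
```

Note that the pattern is purely textual: it would also match names that merely end in fopen, such as my_fopen(, and calls inside comments or strings.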
I'd like to hear some comments on websites that you use for finding code examples!