Your smartphone OS contains more than 10 million lines of code. A million lines of code takes 18,000 pages to print, the equivalent of 14 copies of Tolstoy’s War and Peace put together!
Though the number of lines of code is not a direct measure of a developer's quality, it indicates how much code has been generated over the years. There is always a simpler, shorter version of a piece of code and also a longer, more exhaustive one.
What if there were a tool that used machine learning algorithms to pick out the most suitable code and recommend it with just a click? There is one now.
Aroma is a code-to-code search and recommendation tool that enables developers to get insights from large codebases.
“With Aroma, engineers can easily find common coding patterns without the need to manually go through dozens of code snippets, saving time and energy in their day-to-day development workflow,” say the team behind Aroma at Facebook AI.
Aroma first indexes the code corpus as a sparse matrix: each piece of code is parsed, a set of features is extracted from its parse tree, and the resulting sparse feature vectors become the rows of the index matrix used for search.
The query snippet is broken into features in the same way, and the dot product of its sparse vector with the indexed feature vectors scores how much each candidate overlaps with the query. That score serves as the threshold for which candidates are considered for recommendation.
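The indexing and search step can be sketched in a few lines. This is a minimal illustration, not Aroma's implementation: it assumes Python's built-in `ast` module as the parser, and the feature set (node types, parent-child pairs, called names) and the names `extract_features`, `build_index`, `search` and `CORPUS` are hypothetical choices made for this example. Each snippet becomes a set of features, which is a sparse binary vector, so the dot product reduces to counting shared features.

```python
import ast

def extract_features(source: str) -> frozenset:
    """Return a sparse binary feature set for one snippet.

    Assumption for this sketch: features are node-type names,
    (parent, child) node-type pairs and called names.
    """
    features = set()
    for node in ast.walk(ast.parse(source)):
        parent = type(node).__name__
        features.add(parent)
        for child in ast.iter_child_nodes(node):
            features.add(f"{parent}>{type(child).__name__}")
        # Keep called function or method names so API usage is searchable.
        if isinstance(node, ast.Call):
            name = getattr(node.func, "attr", None) or getattr(node.func, "id", None)
            if name:
                features.add(f"call:{name}")
    return frozenset(features)

def build_index(corpus: dict) -> dict:
    """Map each snippet name to its feature set (one sparse row per snippet)."""
    return {name: extract_features(src) for name, src in corpus.items()}

def search(index: dict, query: str, top_k: int = 2) -> list:
    """Score every indexed snippet against the query and return the top_k."""
    q = extract_features(query)
    # For binary sparse vectors the dot product is the number of shared features.
    scored = [(len(q & feats), name) for name, feats in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_k]

# Hypothetical three-snippet corpus and a partial query.
CORPUS = {
    "read_lines": "def read_lines(path):\n    with open(path) as f:\n        return f.readlines()\n",
    "read_json": "def read_json(path):\n    with open(path) as f:\n        return json.load(f)\n",
    "add": "def add(a, b):\n    return a + b\n",
}
QUERY = "with open(path) as f:\n    data = f.read()\n"

results = search(build_index(CORPUS), QUERY)
```

Here the two file-reading functions share far more structure with the query than `add` does, so they are the ones retrieved. A real system would store the vectors in a sparse matrix so that a single matrix-vector product scores the whole corpus at once.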
All candidates that resemble the original snippet are then clustered.
Since the feature vectors abstract away the components of the code, the dot product alone is only a coarse measure, so pruning is used to rank the candidates by similarity.
Each cluster of potential candidates is then skimmed through iteratively for extra statements that are useful as recommendations. The code that remains after pruning is what is eventually recommended.
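The pruning and iterative skimming can be illustrated with a deliberately simplified sketch. It assumes that a statement is just a stripped line of text and that a cluster has already been formed; Aroma itself works on parse trees, so the helpers `statements`, `prune_score` and `recommend` below are hypothetical stand-ins rather than its actual algorithm.

```python
def statements(snippet: str) -> list:
    """Split a snippet into non-empty, whitespace-stripped lines."""
    return [line.strip() for line in snippet.splitlines() if line.strip()]

def prune_score(candidate: str, query: str) -> float:
    """Prune the candidate down to the query's statements and return
    the fraction of distinct query statements that survive."""
    query_stmts = set(statements(query))
    if not query_stmts:
        return 0.0
    kept = {s for s in statements(candidate) if s in query_stmts}
    return len(kept) / len(query_stmts)

def recommend(cluster: list, query: str) -> list:
    """Iteratively intersect the cluster, then return the common
    statements that the query does not already contain."""
    common = statements(cluster[0])
    for candidate in cluster[1:]:
        other = set(statements(candidate))
        # Drop anything this candidate does not share.
        common = [s for s in common if s in other]
    query_stmts = set(statements(query))
    return [s for s in common if s not in query_stmts]

# Hypothetical query and a cluster of three similar candidates.
EXAMPLE_QUERY = "f = open(path)\ndata = f.read()"
EXAMPLE_CLUSTER = [
    "f = open(path)\ndata = f.read()\nf.close()\nprint(data)",
    "f = open(path)\ndata = f.read()\nf.close()\nreturn data",
    "log(path)\nf = open(path)\ndata = f.read()\nf.close()",
]

extra = recommend(EXAMPLE_CLUSTER, EXAMPLE_QUERY)
```

In this toy cluster every candidate closes the file after reading it, so `f.close()` is the one statement that survives the intersection and is not already in the query. Statements that appear in only one candidate, such as `print(data)`, are pruned away.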
Aroma was put to the test against the top-voted answers to the 500 most popular questions asked on Stack Overflow. The Java code in each snippet is extracted, excluding comments. Aroma's recommendations are then checked for similarity to the lines used in randomly picked top-voted answers.
The recommendations also go beyond matching the query: they show API methods called in the query code and suggest related statements that commonly appear alongside it. The results show that Aroma takes 1.6 seconds to produce a recommendation.
Aroma eclipses its counterparts with the following advantages:
- Aroma performs search on syntax trees. It can find instances that are syntactically similar to the query code and highlight the matching code.
- Aroma automatically clusters together similar search results to generate code recommendations.
- Aroma creates real-time recommendations for very large codebases and does not require pattern mining ahead of time.
Aroma speeds up code discovery by sifting through large numbers of lines of code for recurring coding patterns. Instead of worrying about missing a trivial piece of syntax or defining a class for some task-specific functionality, developers can now work at a higher level with this semi-automatic tool.