HotComments: How to Make Program Comments More Useful?

Program comments have long been a common practice for improving inter-programmer communication and code readability, because they explicitly state programmers' intentions and assumptions. Unfortunately, comments are not used to their full potential: most are written in natural language, which makes them very difficult to analyze automatically. Furthermore, unlike source code, comments cannot be tested. As a result, incorrect or obsolete comments can mislead programmers and introduce new bugs later.

This position paper takes the initiative to investigate how comments can be used beyond their current role. Specifically, we study the feasibility and benefits of automatically analyzing comments to detect software bugs and bad comments. We conduct this feasibility and benefit analysis from three aspects, using Linux as a demonstration case. First, we study the characteristics of comments and find that a significant percentage of them concern “hot topics” such as synchronization and memory allocation, which indicates that comment analysis can first focus on hot topics instead of trying to “understand” arbitrary comments. Second, we conduct a preliminary analysis that uses heuristics (i.e., keyword searches), assisted by natural language processing techniques, to extract information from lock-related comments and then check it against the source code for inconsistencies. Our preliminary method has found 12 new bugs in the latest version of Linux, 2 of which have already been confirmed by the Linux kernel developers. Third, we examine several open source bug databases and find that bad or inconsistent comments have introduced bugs, which indicates the importance of maintaining comments and detecting inconsistent ones.
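To make the second aspect concrete, the following is a minimal sketch of a keyword-based check for lock-related comments, written in Python for brevity. It is an illustration under stated assumptions, not the method used in this paper: the keyword list, the comment phrasings it recognizes ("caller must hold X", "called with X held"), the helper names, and the textual lock-acquisition check are all hypothetical, and a real analysis would need natural language processing and proper program analysis rather than regular expressions.

```python
import re

# Hypothetical keyword list; the paper does not specify its exact heuristics.
LOCK_KEYWORDS = ("lock", "mutex", "semaphore")

# Matches C block comments and line comments.
COMMENT_RE = re.compile(r"/\*.*?\*/|//[^\n]*", re.DOTALL)

# Hypothetical phrasings for a "lock must be held" assumption.
HELD_RE = re.compile(r"(?:called with|caller must hold)\s+(\w+)", re.IGNORECASE)


def lock_related_comments(source):
    """Return the comments in C source text that mention a lock keyword."""
    comments = COMMENT_RE.findall(source)
    return [c for c in comments if any(k in c.lower() for k in LOCK_KEYWORDS)]


def required_locks(comment):
    """Extract lock names that a comment says the caller must hold."""
    return HELD_RE.findall(comment)


def caller_holds_lock(caller_body, callee, lock):
    """Crude textual check: is `lock` acquired and not yet released
    before the first call to `callee` in `caller_body`?"""
    call_pos = caller_body.find(callee + "(")
    if call_pos < 0:
        return True  # callee is never called, so nothing to violate
    before = caller_body[:call_pos]
    name = re.escape(lock)
    acquires = len(re.findall(r"\b(?:spin_lock|mutex_lock)\(&?" + name + r"\)", before))
    releases = len(re.findall(r"\b(?:spin_unlock|mutex_unlock)\(&?" + name + r"\)", before))
    return acquires > releases


# Toy input: a callee whose comment states a locking assumption.
callee_source = "/* Caller must hold dev_lock. */\nvoid reset_hw(void) { }\n// unrelated note\n"
good_caller = "spin_lock(&dev_lock); reset_hw(); spin_unlock(&dev_lock);"
bad_caller = "reset_hw();"

for comment in lock_related_comments(callee_source):
    for lock in required_locks(comment):
        for body in (good_caller, bad_caller):
            if not caller_holds_lock(body, "reset_hw", lock):
                print("possible inconsistency: reset_hw called without", lock)
```

A mismatch reported by such a check does not say which side is wrong: either the code has a bug or the comment is outdated, which is why both outcomes are of interest here.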