Five More Lessons About Documentation: A follow-up to Mark Birch’s “Developer Documentation”

Last week’s DevBizOps blog entry (“Developer Documentation: Developers don’t like writing docs, what’s the alternative?”) asked how programmers can get their answers from documentation, and whether there are alternatives. As ever, Mark’s post addresses questions every developer has to ask when trying to understand software. Open source or proprietary, documentation is necessary for using, extending, or changing software. The costs of searching for answers are well known to developers and project managers: decades of empirical research in software comprehension and reverse engineering show that the costs of understanding software are significant. What, then, can be done?

We offer five more lessons in addition to Mark’s post, presented as “formulas”:

Self-Documented Design = Self-Explanatory Design: Self-documented code and self-documented design
Document Value = Accuracy ⁄ Technical-Debt: Technical debt diminishes the value of any fixed document
Reverse-Engineered Analysis > Static Documentation: Knowledge extracted from code is much more accurate than any old paper
AI Analysis >> Reverse-Engineering: AI can improve over traditional reverse engineering
AI_{n+1} > AI_n: AI progresses and “the sky is the limit”

1. Self-Documented Design = Self-Explanatory Design

Mark Birch urges developers to write self-documenting code, and I couldn’t agree more: The ultimate ‘documentation’ is the source code itself — it literally defines how the program works. But naming conventions for classes and methods/functions are only the start: Modular, well-designed code is navigated much faster.

A clear and simple architecture is, in and of itself, clear and simple documentation. Developers should strive to make their entire ontology explicit, for example by building the object-oriented class hierarchy so that it mirrors the requirements [Booch OOA&D]. The choice of paradigm matters too: in certain areas, a whopping 90% of the code developed in Java could be eliminated using functional programming, eliminating with it 95% of the documentation needed. So choose your programming paradigm wisely and use it to simplify your design, because simple design is self-explanatory.
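To make the point about paradigm and self-explanatory design concrete, here is a minimal sketch. It is written in Python purely for illustration (the post itself talks about Java), and the `Invoice` type and both function names are hypothetical. It makes no claim about the percentages above; it only shows how a declarative formulation can read as its own description.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Invoice:
    """Hypothetical domain type, introduced only for this illustration."""
    customer: str
    amount: float
    paid: bool


def outstanding_total_imperative(invoices: Iterable[Invoice]) -> float:
    # Loop-and-accumulate style: the reader has to trace the mutable state.
    total = 0.0
    for invoice in invoices:
        if not invoice.paid:
            total = total + invoice.amount
    return total


def outstanding_total(invoices: Iterable[Invoice]) -> float:
    # Declarative style: the expression reads as its own description,
    # "the sum of the amounts of the unpaid invoices".
    return sum(invoice.amount for invoice in invoices if not invoice.paid)


invoices = [
    Invoice("Acme", 120.0, paid=False),
    Invoice("Globex", 80.0, paid=True),
    Invoice("Initech", 30.0, paid=False),
]

# Both formulations compute the same value; only the second needs no comment.
assert outstanding_total(invoices) == outstanding_total_imperative(invoices) == 150.0
```

Both functions are equivalent. The difference is how much a newcomer has to read, and how much prose would be needed to explain each.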

2. Document Value = Accuracy ⁄ Technical-Debt

It is interesting to learn that 83% of developers use the official documentation. We should also ask: does it help? Does software documentation answer developers’ questions, does it help only a little, or does it make no difference and become a burden?

The return on the effort to read and understand documentation diminishes with technical debt, a subject which Mark presented well in a previous post [Birch 2020]. Since developers’ time is rarely spent on refactoring and streamlining the design, the accumulation of technical debt eventually renders the documentation irrelevant, however much effort has been spent on it. Developers could spend considerable time on hundreds of pages (or more) only to realize that the paper they are reading is:

Too abstract: The code is described correctly but only in very general terms
Eroded: The original code is described correctly but changes made since then conflict with the documentation (i.e. violate its architectural decisions) [Garlan Architectural Mismatch]
Obsolete: The original code is described correctly but major changes have made the documentation irrelevant
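One narrow symptom of eroded or obsolete documentation can be checked mechanically: the document refers to functions that the code no longer defines. The sketch below is a simplified illustration in Python, not a tool mentioned in this post. The convention that documents write function references as `name()` in backticks, and the sample code and text, are assumptions made for the example.

```python
import ast
import re


def defined_names(source: str) -> set[str]:
    """Collect the names of all functions and classes defined in `source`."""
    tree = ast.parse(source)
    return {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))
    }


def stale_references(doc_text: str, source: str) -> set[str]:
    """Return names the document mentions as `name()` that the code does not define."""
    # Assumed convention: function references appear in backticks with "()".
    mentioned = set(re.findall(r"`(\w+)\(\)`", doc_text))
    return mentioned - defined_names(source)


# Invented sample: the document still describes a function that was renamed.
code = '''
def load_config(path):
    return {}

def connect(config):
    return None
'''
doc = "Call `load_config()` first, then `open_session()` to connect."

print(stale_references(doc, code))  # {'open_session'}
```

A check like this catches only the crudest form of drift. It says nothing about documentation that is too abstract, or that names the right functions but describes the wrong behaviour.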

What, indeed, is the alternative?

3. Reverse-Engineered Analysis > Static Documentation

Mark’s post mentions Quod.ai, which uses machine learning methods to mine code for answers to specific questions. This kind of code analysis has merit: Quod.ai’s tool focuses on interactive replies to specific queries. Rather than generating documents, it takes a specific query and analyses the code in search of the answer. Effective static and dynamic analysis tools could eliminate the need for most trivial documentation:

Static analysis: Create a “picture” of the program’s modules and their dependencies, e.g. by learning class diagrams or Codecharts [Eden: Wiley]
Dynamic analysis: Find out the various possible behaviours of the program, e.g. by learning formulas in temporal logic [the model checking approach]
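As a small illustration of the static-analysis item, the following sketch recovers a class hierarchy directly from source text using Python’s standard `ast` module. It is a toy under stated assumptions, not one of the tools cited in this post: it handles a single Python source string, looks only at direct base classes, and the sample classes are invented.

```python
import ast


def class_hierarchy(source: str) -> dict[str, list[str]]:
    """Map each class defined in `source` to the names of its direct base classes."""
    hierarchy: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef):
            # ast.unparse (Python 3.9+) turns each base expression back into text.
            hierarchy[node.name] = [ast.unparse(base) for base in node.bases]
    return hierarchy


# Invented sample program, used only to exercise the function.
SAMPLE = '''
class Shape: ...
class Circle(Shape): ...
class Square(Shape): ...
class RoundedSquare(Square, Circle): ...
'''

for cls, bases in class_hierarchy(SAMPLE).items():
    print(cls, "->", bases or ["object"])
```

The resulting mapping is the raw material for a class diagram. Because it is computed from the code each time, it cannot go out of date the way a hand-drawn diagram can.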

This approach has been taken by reverse engineering tools since the 1980s with mixed results [Kazman Reengineering]. Design recovery tools can only go so far by generating UML diagrams [Gueheneuc Design Recovery]. Our own team developed a tool for visual navigation in programs and for generating roadmaps at varying degrees of abstraction [Gasparis Design Navigator]. Later, our team also developed a round-trip engineering tool [Eden Round-Trip]. However, the low-hanging fruit of round-trip engineering is yet to be picked.

4. AI Analysis >> Reverse-Engineering

The recent successes of data science raise the obvious question: Can AI help understand programs? There are good reasons to believe that AI can do a great deal to help. If nothing else, it can help simply by integrating the information already made available by existing techniques, namely (1) static and (2) dynamic analysis, and by drawing on further sources:

Analyze natural language: use NLP to find requirements in written documentation
Analyze drawn diagrams (e.g. UML): use image segmentation and analysis (e.g. translators) to find design decisions in diagrams
Analyze secondary sources: extract information from the version history, such as the specific individual who made each change and their comments
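The last item, mining secondary sources, starts with plain extraction before any AI is involved. As a hedged sketch, the Python snippet below groups commit comments by author from text in the shape produced by `git log --pretty=format:"%h|%an|%s"`. The sample log is invented, and the choice of `|` as a separator is an assumption made for the example; a real tool would run git itself and then pass the result to whatever analysis comes next.

```python
from collections import defaultdict

# Sample text in the shape produced by: git log --pretty=format:"%h|%an|%s"
# (%h = abbreviated hash, %an = author name, %s = subject line).
# The commits themselves are invented for illustration.
SAMPLE_LOG = """\
a1b2c3d|Dana|Fix race condition in cache eviction
e4f5a6b|Lee|Add retry logic to uploader
c7d8e9f|Dana|Document cache eviction policy
"""


def changes_by_author(log_text: str) -> dict[str, list[str]]:
    """Group commit subject lines by the author who made the change."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for line in log_text.splitlines():
        if not line.strip():
            continue
        # Split on the first two separators only, so a "|" inside the subject survives.
        _commit, author, subject = line.split("|", 2)
        grouped[author].append(subject)
    return dict(grouped)


for author, subjects in changes_by_author(SAMPLE_LOG).items():
    print(f"{author}: {len(subjects)} change(s)")
    for subject in subjects:
        print("  -", subject)
```

Even this mechanical grouping answers a question that static documentation rarely does: who last touched a given part of the system, and what they said about it.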

This short list is only the beginning. As for the more distant future (2-10 years):

5. AI_{n+1} > AI_n (“the sky is the limit”)

Looking forward in time we ask: exactly how much will AI be able to help developers? First, keep in mind that AI faces diminishing limits on speed and space, so it could parse and organize very large programs nearly instantaneously. Second, and more importantly, the intelligence of its answers is limited only by the intelligence of the tool. How “intelligent” can we expect AI analysis tools to be?

As much as one can use history to forecast technological impact [e.g. Gartner’s Hype Cycle 2020], the history of AI does show a steady improvement in the intelligence demonstrated by machines. From the General Problem Solver (1957) and ELIZA (1966) to Deep Blue defeating the world chess champion (1997), Watson winning at Jeopardy! (2011), and AlphaGo Zero beating AlphaGo 100 games to 0 after AlphaGo had beaten Go champions (2017), it is evident that AI has become significantly more “intelligent”. Self-driving cars, personal assistants and chatbots, and AI “shown to be as effective as humans” in the diagnosis of a broad range of medical conditions [BMJ 2018] were largely imaginary only a decade or two ago. As of 2020, AI has replaced humans in filtering CVs and at passport control, and in an increasing number of application areas it outperforms humans. Progress in AI has even led leading computer scientists to consider the possibility of machine superintelligence [Eden: Springer].

It is fairly safe to conclude that intelligent use of AI could lead to the long-overdue paradigm shift in software analysis. So yes, leading AI researchers and computer scientists alike believe that AI is the future!

References

BMJ 2018	Loh, Erwin. ‘Medicine and the Rise of the Robots: A Qualitative Review of Recent Advances of Artificial Intelligence in Health’. BMJ Leader 2, no. 2 (June 2018): 59–63. https://doi.org/10.1136/leader-2018-000071.
Birch 2020	Mark Birch, “Circuit Breaker: How to balance the work to do with the work of improvement”, April 2020, http://devbizops.substack.com/p/circuit-breaker-bb872cc3a47#
Booch OOA&D	Booch, Grady. Object Oriented Design with Applications. Redwood City, CA: Benjamin/Cummings Pub. Co., 1991.
Eden: Wiley	Eden, Amnon H, with contributions from Jon Nicholson. Codecharts: Roadmaps and Blueprints for Object-Oriented Programs. Hoboken, N.J.: Wiley-Blackwell, 2011. https://onlinelibrary.wiley.com/doi/book/10.1002/9780470891032.
Eden: Springer	Eden, Amnon H., James H. Moor, Johnny H. Søraker, and Eric Steinhart, eds. Singularity Hypotheses: A Scientific and Philosophical Assessment. The Frontiers Collection. Springer, 2013. http://www.springer.com/engineering/computational+intelligence+and+complexity/book/978-3-642-32559-5.
Eden Round-Trip	Eden, A.H., E. Gasparis, J. Nicholson, and R. Kazman. ‘Round-Trip Engineering with the Two-Tier Programming Toolkit’. Software Quality Journal 26, no. 2 (1 June 2018): 249–71. https://doi.org/10.1007/s11219-017-9363-9.
Garlan Architectural Mismatch	Garlan, David, Robert Allen, and John Ockerbloom. ‘Architectural Mismatch or Why It’s Hard to Build Systems out of Existing Parts’. In Proceedings of the 17th International Conference on Software Engineering, 179–85. Seattle, Washington, United States: ACM, 1995. https://doi.org/10.1145/225014.225031.
Gasparis Design Navigator	Gasparis, Epameinondas, Amnon H. Eden, Jonathan Nicholson, and Rick Kazman. ‘The Design Navigator: Charting Java Programs’. In Tool Demonstrations, Proc. of 30th IEEE Int’l Conf. on Software Engineering—ICSE 2008. Leipzig, Germany: IEEE Computer Society Press, 2008.
Gueheneuc Design Recovery	Gueheneuc, Y.-G., K. Mens, and R. Wuyts. ‘A Comparative Framework for Design Recovery Tools’, 10 pp. – 134, 2006. https://doi.org/10.1109/CSMR.2006.1.
Kazman Reengineering	Kazman, Rick, Steven G. Woods, and S. Jeromy Carrière. ‘Requirements for Integrating Software Architecture and Reengineering Models: CORUM II’. In Proceedings of the Working Conference on Reverse Engineering (WCRE’98), 154. IEEE Computer Society, 1998. http://portal.acm.org/citation.cfm?id=837030.
