A Review of Peer Reviews

Software development fads come and go.  Whether technology or process, little seems to remain constant.  Peer reviews, however, are one of the exceptions.  By “peer review” we mean design reviews and code reviews performed by fellow developers.  But do these reviews really add value?  After all, peer reviews are part of the engineering process, and developers often note that process is a mixed bag for productivity.  Almost every team has a voice that decries peer reviews as counterproductive.

The truth is that the value of peer reviews depends on two independent factors.  First, there is a correct and an incorrect way to perform a peer review.  The studies that repeatedly show a significant positive impact from peer reviews assume a specific model of how they are conducted.  Second, peer reviews require two elements of successful team dynamics: trust among team members and freedom from fear of conflict.

Let’s look at both of these more carefully.

Peer Reviews Done Right

What many organizations call “peer reviews” are actually not.  Usually the term is misapplied to what is actually a “walkthrough.”  The distinction pertains to both purpose and format, and has significant ramifications for quality and schedules.  Whenever possible, teams should conduct peer reviews, not walkthroughs.  Let’s look closer at the differences.

In a walkthrough, the purpose is to familiarize the rest of the team with the design or code.  The format assumes the author presents the content as others listen and ask questions.  For everyone but the author, this is usually the first exposure to the materials.  There can be any number of participants, including management.

In a peer review, in contrast to a walkthrough, the purpose is to locate defects. 

  • “All participants prepare by reading the design or code and looking for errors…The emphasis is on error detection” (McConnell, p493)
  • “Code review is systematic examination (often as peer review) of computer source code intended to find and fix mistakes” (Wikipedia, “Code Review”)
  • “An inspection in software engineering, refers to peer review of any work product by trained individuals who look for defects using a well defined process” (Wikipedia, “Software Inspection”)

Other benefits, such as knowledge transfer and training, are welcome but secondary.  The format assumes that someone other than the author conducts the review.  Furthermore, the materials being reviewed are sent out in advance of the discussion (three days ahead).  Most of the reviewing work is done before the participants gather.  The optimal number of participants is three, including the author.  Management should not attend.

Again, it must be emphasized: to achieve the advertised benefits of peer reviews, they must actually be peer reviews, not walkthroughs.  Don’t confuse the two!

Team Considerations

If there is an objection to peer reviews, it usually centers on fear of judgment or personal attacks.  When this objection surfaces, it should be a red flag to managers of broader team problems. 

For a team to be successful, its members must trust that the others will not attack them personally.  If that trust exists, then fear of conflict subsides.  This does not mean that conflict never happens but, rather, that no one takes debate personally.

This is not some kind of rarefied dynamic that only applies to successful peer reviews.  It is a dynamic that teams need with or without peer reviews.  Thus, if teammates complain that peer reviews engender hostility and must be avoided, then there is certainly a meta-issue that should be addressed.

Peer Reviews: The Benefits (ROI)

The following bullet points drive home why peer reviews are not an empty procedure. Soak yourself in them.

  • Identified as one of nine principal industry best practices by the Airlie Software Council.  Examples of corroborating results:
    • Hewlett-Packard’s inspection program measured a return on investment of 10 to 1, saving an estimated $21.4 million per year. Design inspections reduced time to market by 1.8 months on one project
    • Inspections contributed to a ten-fold improvement in quality and a 14 percent increase in productivity at AT&T Bell Laboratories
    • Inspecting 2.5 million lines of real-time code at Bell Northern Research prevented an average of 33 hours of maintenance effort per defect discovered
    • IBM reported that each hour of inspection saved 20 hours of testing and 82 hours of rework effort had the defects found by inspection remained in the released product
    • At Imperial Chemical Industries, the cost of maintaining a portfolio of about 400 programs that had been inspected was one-tenth the cost per line of code of maintaining a similar set of 400 uninspected programs
  • “Typical organizations use test-heavy defect-removal and achieve only about 85% defect removal efficiency.  Leading organizations use a wider variety of techniques and achieve defect-removal efficiencies of 95%.” McConnell p470.
  • “Jones (“Software Defect-Removal Efficiency” 1996) points out that a combination of unit testing, functional testing, and system testing often results in a cumulative defect detection of less than 60 percent, which is usually inadequate for production software.” (p471).  Code inspection and test runs each detect about the same number of defects.  However, the kinds of defects they find differ enough that combining both techniques detects almost double the defects (Myers 1978b [McConnell p 470]).
  • Testimony of Jeff Atwood
  • FINDING: “A study at the Software Engineering Laboratory found that code reading detected about 80 percent more faults per hour than testing (Basili and Selby 1987).  Another organization found that it cost six times as much to detect design defects by using testing as by using inspections (Ackerman, Buchwald, and Lewski 1989).  A later study at IBM found that only 3.5 staff hours were needed to find each error when using code inspections, whereas 15-25 hours were needed to find each error through testing (Kaplan 1995).” [McConnell, p470]
  • FIXING: Microsoft’s applications division has found that it takes three hours to find and fix a defect by using code inspections, a one-step technique, and 12 hours to find and fix a defect by using testing, which is a two-step technique (Moore 1992).  Collofello and Woodfield (1989) reported on a 700,000-line program built by over 400 developers.  They found that code reviews were several times as cost-effective as testing: a 1.38 return on investment vs. 0.17. [McConnell, p471]
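
To put the effort figures above side by side, here is a minimal Python sketch of the arithmetic.  The hours and ROI values are the ones quoted from McConnell in the FINDING and FIXING bullets; the function names and the 100-defect workload are hypothetical, chosen only for illustration.  Note that the Microsoft figures cover finding and fixing a defect while the IBM figures cover finding only, so the ratios are best read separately rather than combined.

```python
"""Back-of-the-envelope comparison of inspection vs. testing effort.

The hours-per-defect and ROI figures are those quoted in the article.
The defect count passed to hours_saved() is a hypothetical input, not a
number from any of the cited studies.
"""

# Hours to find and fix one defect (Microsoft applications division, Moore 1992).
MS_INSPECTION_HOURS = 3.0
MS_TESTING_HOURS = 12.0

# Staff hours to find one error at IBM (Kaplan 1995); testing is quoted as a range.
IBM_INSPECTION_HOURS = 3.5
IBM_TESTING_HOURS_RANGE = (15.0, 25.0)

# Return on investment reported by Collofello and Woodfield (1989).
REVIEW_ROI = 1.38
TESTING_ROI = 0.17


def cost_ratio(testing_hours: float, inspection_hours: float) -> float:
    """How many times more effort testing takes per defect than inspection."""
    return testing_hours / inspection_hours


def hours_saved(defects: int, testing_hours: float, inspection_hours: float) -> float:
    """Effort saved if `defects` defects are caught by inspection instead of testing."""
    return defects * (testing_hours - inspection_hours)


if __name__ == "__main__":
    ms_ratio = cost_ratio(MS_TESTING_HOURS, MS_INSPECTION_HOURS)
    print(f"Microsoft: testing costs {ms_ratio:.1f}x inspection per defect")

    low, high = (cost_ratio(h, IBM_INSPECTION_HOURS) for h in IBM_TESTING_HOURS_RANGE)
    print(f"IBM: testing costs {low:.1f}x to {high:.1f}x inspection per error found")

    print(f"ROI of reviews relative to testing: {REVIEW_ROI / TESTING_ROI:.1f}x")

    # 100 defects is an assumed workload, used only to make the savings concrete.
    saved = hours_saved(100, MS_TESTING_HOURS, MS_INSPECTION_HOURS)
    print(f"Hours saved on 100 defects (Microsoft figures): {saved:.0f}")
```

Substituting your own team's measured hours per defect for the constants gives a rough estimate of what a review program might save.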

Final Notes

Two more points should be made about successful peer reviews.  First, a peer review that does not identify defects is suspect.  It usually means either that the reviewers had insufficient review time or that they lack the right skill set to review the materials.  Second, peer reviews should not wait until the materials are complete.  Ideally, the design or coding work should be approximately 50% complete when it is reviewed.  This leaves time for redirection, if that is necessary.

If members of your team are reluctant to get started, even with all the compelling rationale, I suggest you make your own materials the first to be reviewed.  There’s nothing like leading by example.

  • http://www.facebook.com/clamle Cory Lamle

    Excellent article Brent! The peer-review aspect is really a colossal point that so many developers just don’t do or get well. For me, the most desirable outcomes of a good peer review are well-written documents or, hopefully, a clean API (if they were not even considered first); heck, even nice, clean, detailed comments get generated to help further explain the rationale behind the design.

    Most developers just write code for their own consumption and don’t even think of the various consumers of it (usually their peers). Too many assumptions are made that anyone can just read the code and understand the process. However, in larger systems this is not always the case, and following a rabbit trail down a stack that is at minimum 100 levels deep just isn’t any fun. In an ideal world peer reviews should be the norm, but either laziness or busyness tends to mask out these much-needed practices.

  • Quester

    Liked the article Brent. There are also ancillary benefits that are probably not as quantifiable as code quality metrics. In a positive culture that embraces quality, there can be positive effects on both team cohesion and the professional development of the developers, the latter through the sharing of experience.