
Silverthread predictive analytics and benchmarking

Writer: Dan Sturtevant

Updated: Mar 5

Modeling the impact of technical health on software economics


Technical health and its impact on software economics

The health of a software codebase – an asset that may already represent hundreds or thousands of person-years of economic investment – can have a dramatic impact on future productivity, quality, and other economic outcomes.  Unfortunately, it is difficult to understand how healthy a codebase is, to assess the business impact its problems will have, and to use this information to formulate managerial courses of action.  Silverthread’s approach supports decision-makers by helping you:

  • Assess ‘technical health’ along code quality and design quality dimensions

  • Quantify the impact that health challenges can have on productivity, quality, and other economic outcomes using predictive analytics

  • Benchmark technical health metrics and predicted economic outcome metrics against other systems to understand room for improvement, waste relative to the best systems, and the potential ROI of improvement efforts.

 

The goal of this paper is to explain Silverthread’s methods at a high level.  These methods were developed during 15 years of R&D at MIT and Harvard Business School and 5 years of commercial practice.  Precise scientific details can be found in our publications and international patents.



Technical health measures

Our tools analyze a codebase and capture code quality and design quality metrics across it.  For each source code file, we capture many metrics, including language, size in LOC/SLOC, McCabe’s Cyclomatic Complexity (a measure of code quality within that file), and Core-Periphery metrics (measures of the file’s modularity and design health).  In addition, our tools capture information about the overall size of the system and the size of architectural problems (‘Cores’) that indicate a breakdown of modularity across files.
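To make the per-file code quality metrics concrete, here is a minimal sketch in Python – an illustration of the general idea, not Silverthread’s implementation.  It counts physical and source lines and approximates McCabe’s Cyclomatic Complexity as 1 plus the number of decision points found in a Python file’s AST.  (Core-Periphery design metrics require the cross-file dependency graph and are beyond a short sketch.)

```python
import ast

# Rough sketch (not Silverthread's tooling): per-file size metrics and an
# approximation of McCabe's Cyclomatic Complexity for a Python source file.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def file_metrics(path):
    with open(path, encoding="utf-8") as f:
        source = f.read()
    lines = source.splitlines()
    loc = len(lines)                                   # physical lines
    sloc = sum(1 for ln in lines                       # non-blank, non-comment
               if ln.strip() and not ln.strip().startswith("#"))
    # McCabe: 1 + number of decision points in the file (approximate).
    tree = ast.parse(source)
    complexity = 1 + sum(isinstance(node, BRANCH_NODES)
                         for node in ast.walk(tree))
    return {"loc": loc, "sloc": sloc, "cyclomatic": complexity}

print(file_metrics("example.py"))   # "example.py" is a placeholder path
```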

 

Statistically measuring productivity and quality impact

We have run several projects that statistically measure the relationship between technical measures and business outcomes in commercial environments.  Working with partners, we mined information from version control systems (e.g. Git), issue tracking systems (e.g. Jira), and HR databases in order to run statistical regressions that test significance and quantify the impact of complexity and modularity on outcomes such as:

  • Developer productivity when doing feature work

  • Developer productivity when fixing bugs

  • Defect density – volume of code that must be fixed per 1000 LOC of feature work

  • Released defects – fraction of defects that escape the testing process and may have an impact in the field

These statistical models have been run on several codebases to calibrate them in different organizational settings.
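As an illustration of this kind of analysis, the sketch below fits an ordinary least squares regression relating per-file complexity and modularity metrics to defect density.  The data file and column names are hypothetical stand-ins for metrics mined from code analysis, Git, and Jira; this shows the shape of the regression, not our calibrated models.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-file table joining technical metrics to mined outcomes;
# the file name and all column names are illustrative assumptions.
df = pd.read_csv("file_metrics_and_outcomes.csv")

# Predictors: code quality (cyclomatic complexity) and design quality
# (a core-size/modularity score); response: defect density per KLOC.
X = sm.add_constant(df[["cyclomatic_complexity", "core_size"]])
y = df["defects_per_kloc"]

model = sm.OLS(y, X).fit()
print(model.summary())   # coefficients and p-values: significance and effect size
```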

 

Predicting productivity and quality impact for your system

Our tools capture technical health metrics from your code, put them through statistical models (calibrated from other organizations’ data), and make predictions about the economics of working in your codebase.  Predicted quality and productivity estimates are generated for each source code file in your system.  Aggregate system-wide quality and productivity estimates are generated by averaging across the system, weighting the impact of each file by the volume of code within it.  (With additional work, change-tracking information can be used to refine these models by weighting each file’s impact based on the amount of recent work inside it.)
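The aggregation step is straightforward to illustrate.  The sketch below (with made-up file names and prediction values) rolls per-file estimates up to a system-wide figure, weighting each file by its volume of code:

```python
# Minimal sketch of the LOC-weighted aggregation step described above.
# File names and predicted values are illustrative, not real output.
files = [
    {"path": "core/sched.c", "loc": 12000, "predicted_defects_per_kloc": 9.1},
    {"path": "util/str.c",   "loc": 800,   "predicted_defects_per_kloc": 2.3},
    {"path": "net/proto.c",  "loc": 5400,  "predicted_defects_per_kloc": 6.7},
]

total_loc = sum(f["loc"] for f in files)
system_estimate = sum(f["predicted_defects_per_kloc"] * f["loc"]
                      for f in files) / total_loc
print(f"LOC-weighted system defect density: {system_estimate:.2f} per KLOC")
```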

 

Our tools then use predicted productivity and quality estimates to compute higher-level software economic metrics of importance to the business (an illustrative calculation follows the list), including:

  • Days to develop a defect-free feature (1000 LOC)

  • Bug/feature labor hour ratio

  • Required number or volume of bug fixes in codebase per year

  • Bugs with market-risk impact – the percentage of bugs shipped in the product or reported by customers

  • Cost to produce features

  • $ wasted per $1M spent on future development due to productivity loss & extra bugs (vs an ‘optimal’ codebase)
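For illustration only, the sketch below derives two of these metrics from assumed per-KLOC predictions.  The input numbers and the exact formulas are hypothetical; they show the shape of the calculation, not Silverthread’s actual model.

```python
# Illustrative only: derive two of the economic metrics above from assumed
# per-KLOC productivity and quality predictions.  All numbers are made up.
feature_days_per_kloc = 25.0   # predicted days to write 1000 LOC of features
defects_per_kloc      = 8.0    # predicted defects injected per 1000 LOC
days_per_bug_fix      = 1.5    # predicted days to fix one defect

# Days to develop a defect-free 1000 LOC feature: writing it, plus fixing
# the defects injected along the way (assumed formula).
days_defect_free_feature = (feature_days_per_kloc
                            + defects_per_kloc * days_per_bug_fix)

# Bug/feature labor ratio: fix effort relative to feature effort.
bug_feature_ratio = (defects_per_kloc * days_per_bug_fix) / feature_days_per_kloc

print(f"Days per defect-free 1000 LOC feature: {days_defect_free_feature:.1f}")
print(f"Bug/feature labor ratio: {bug_feature_ratio:.2f}")
```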

 

Benchmarking process

Silverthread has collected several thousand codebases for benchmarking purposes, spanning many languages, and this collection is constantly growing.  For each benchmarked system, our tools have captured technical health measures and used the predictive techniques described above to produce productivity, quality, and software economic predictions.  With this information, Silverthread tools can report percentile information for comparison purposes.  To compute a percentile, your system is compared against all systems in this data set and against a subset of comparable systems.  By default, comparable systems are those written in the same (or a structurally similar) language and between half and double the number of lines of code (LOC) of your codebase.  If too few codebases exist within that size range, the window is expanded until at least 30 comparable points are obtained.  In new versions of our reporting (to be released in 2019 Q3), direct comparison to individual codebases will be replaced with Gaussian kernel density estimation techniques.
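The comparable-set selection and percentile computation can be sketched as follows.  This is an illustration of the stated rules (same language, 0.5x–2x size window, widened until at least 30 points are found), not Silverthread’s code; the data structures are assumptions.

```python
# Illustrative sketch of the benchmarking comparison.  `benchmarks` holds
# (language, loc, metric) tuples for previously analyzed systems.

def comparables(benchmarks, language, loc, min_points=30):
    """Same-language systems within a size window (0.5x-2x LOC to start),
    widened until at least `min_points` comparison points are found."""
    same_lang = [(size, m) for (lang, size, m) in benchmarks
                 if lang == language]
    low, high = 0.5, 2.0
    while True:
        subset = [m for (size, m) in same_lang
                  if low * loc <= size <= high * loc]
        if len(subset) >= min_points or len(subset) == len(same_lang):
            return subset
        low, high = low * 0.8, high * 1.25   # widen the size window and retry

def percentile(value, subset):
    """Share of comparable systems with a metric at or below this value."""
    return 100.0 * sum(v <= value for v in subset) / len(subset)

# Toy usage with made-up data points:
benchmarks = [("C++", 200_000, 5.2), ("C++", 350_000, 7.9), ("C++", 120_000, 3.1)]
peers = comparables(benchmarks, "C++", 250_000, min_points=2)
print(percentile(5.0, peers))   # percentile for, e.g., a defect-density metric
```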

 

 

False positives & false negatives

Many factors impact the quality, productivity, and software economics experienced on a project, including:

  • Technical health challenges (code & design quality)

  • Development process & project management maturity

  • Quality of leadership and team composition

 

Data collected from many projects and prior studies indicate that technical health problems are highly predictive of problems on a software development project that builds on that system going forward.  In fact, Silverthread has never encountered a ‘false positive’ – i.e. in over 100 cases, we have never encountered a client whose code quality or design quality was impaired and yet whose development project seemed to be going very well.  In most cases, predicted problem areas aligned with engineers’ expectations.

 

The opposite is not always true, however.  Silverthread tools may find good technical health and, as a result, predict good software economics.  Nevertheless, a project may still be going poorly from a productivity or quality standpoint due to issues unrelated to the technical health metrics we capture.

 

Benchmark dataset (as of Jan 2025)

  Language                   Number of systems
  C/C++, Fortran, & Ada      4000
  Java & C#                  1300
  Python                     1000
  JavaScript                 900

 

Contact Us

Silverthread’s mission is to advance the state of software measurement practice by quantifying complexity and design quality. Our measurement know-how can establish a more trustworthy foundation for improving software economics.

 

 


