
Finding the Root Cause of a Billion-Dollar Failure – and Preventing the Next One

Dan Sturtevant


The Problem


SysPrime was a critical platform leveraged by 43 subsystems in a program vital to national security. Despite its importance, the platform had become so bogged down with technical debt that it was effectively unworkable. Developers could not adapt or refactor the system due to its immense complexity and entanglement.


Congress initially allocated $350 million for the program, but after more than $1 billion had been spent without delivering a functional system, scrutiny increased. In response, the government began work on SysNext, intending it to replace SysPrime. However, concerns emerged that SysNext might inherit many of the same problems, because the development process and contractor accountability mechanisms remained unchanged.


The Request


Silverthread was called in to perform an independent technical health evaluation of both SysPrime and SysNext using CodeMRI Discovery. Within one day, we generated diagnostics for both platforms and prepared for a contentious meeting involving government stakeholders and contractors.


What We Found


SysPrime’s technical health confirmed the worst suspicions. By any objective measure, it was the most architecturally challenged Java codebase we had encountered in 15 years of MIT research and commercial application.

  • A Critical Core of Spaghetti Architecture: SysPrime contained a single, massive core of tangled dependencies, encompassing over 6,000 files and 2 million lines of code. Developers were trapped in this web, unable to make changes without introducing new defects or destabilizing the system.

  • Code Quality Issues: Over 2,600 files (likewise totaling 2 million lines of code) failed to meet basic NIST code quality standards.

  • Economic Impact: Cost models predicted that $810,000 of every $1 million spent on SysPrime would be wasted, as technical debt overwhelmed development efforts.
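To put the economic figure in perspective, the arithmetic can be sketched in a few lines. This is only an illustration of what the stated ratio implies, not Silverthread's cost model; the function name and the "cost multiplier" framing are assumptions introduced here.

```python
# Illustrative arithmetic only: the names below are hypothetical and the
# underlying cost model is not described in this article.

def productive_spend(wasted, spent):
    """Return the portion of `spent` that is not wasted."""
    return spent - wasted

SPENT = 1_000_000    # reference amount from the article
WASTED = 810_000     # predicted waste per $1 million spent on SysPrime

productive = productive_spend(WASTED, SPENT)   # dollars doing useful work
productive_share = productive / SPENT          # fraction of spend that is useful
cost_multiplier = SPENT / productive           # dollars spent per productive dollar

print(f"Productive share: {productive_share:.0%}")
print(f"Dollars spent per productive dollar: {cost_multiplier:.2f}")
```

Read this way, only $190,000 of each $1 million would go toward useful work, so every dollar of real progress would cost a little over five dollars of budget.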


SysNext, though less mature, was already showing signs of trouble:

  • Multiple Small Cores: SysNext’s Java portion contained six critical cores (problematic tangled clusters) and ten emerging cores, collectively involving 4,000 files and 800,000 lines of code. The largest core had 1,500 source files.

  • Code Quality Issues Persisted: Over 2,000 files failed to meet NIST standards.

  • Contractor Variability: Analysis revealed stark differences in quality between subsystems delivered by different contractors, enabling stakeholders to pinpoint sources of risk.
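The "cores" in the findings above are described only as tangled clusters of files. One common way to make that idea concrete, assumed here for illustration, is to treat a core as a cyclic group: a set of files that can all reach one another through dependencies (a strongly connected component of the dependency graph). The sketch below uses Kosaraju's algorithm on a small made-up graph; the file names, the `find_cores` helper, and the `min_size` threshold are hypothetical and do not represent how CodeMRI Discovery works internally.

```python
from collections import defaultdict

def find_cores(dependencies, min_size=2):
    """Return cyclic groups with at least `min_size` files, largest first.

    `dependencies` maps each file to the list of files it depends on.
    """
    # Collect every file that appears as a source or a target.
    nodes = set(dependencies)
    for targets in dependencies.values():
        nodes.update(targets)

    # Build the reversed graph needed for the second pass.
    reverse = defaultdict(list)
    for src, targets in dependencies.items():
        for dst in targets:
            reverse[dst].append(src)

    # Pass 1: iterative depth-first search, recording finish order.
    visited, order = set(), []
    for start in sorted(nodes):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(dependencies.get(start, ())))]
        while stack:
            node, neighbors = stack[-1]
            for nxt in neighbors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(dependencies.get(nxt, ()))))
                    break
            else:
                # All neighbors explored: this node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: walk the reversed graph in decreasing finish order.
    assigned, components = set(), []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = [], [start]
        while todo:
            node = todo.pop()
            component.append(node)
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        components.append(sorted(component))

    cores = [c for c in components if len(c) >= min_size]
    return sorted(cores, key=len, reverse=True)

# Hypothetical dependency graph: two tangled clusters and one clean file.
example = {
    "A.java": ["B.java"],
    "B.java": ["C.java"],
    "C.java": ["A.java", "D.java"],
    "D.java": ["E.java"],
    "E.java": ["D.java"],
    "F.java": ["A.java"],
}
cores = find_cores(example)
print(cores)
# [['A.java', 'B.java', 'C.java'], ['D.java', 'E.java']]
```

In this toy graph, `F.java` depends on the largest cluster but nothing depends on it in return, so it stays outside every core. A change inside a cluster, by contrast, can ripple to every other file in that cluster, which is why large cores make a system hard to modify safely.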


The Meeting


The meeting brought together government stakeholders and the contractors responsible for SysPrime and SysNext. The contractors began with handwaving explanations and deflections, but CodeMRI Discovery allowed us to cut through the spin:

  • We drilled into SysPrime’s architecture, identifying its critical core and quantifying the impact of its complexity on agility, defect density, and development costs.

  • For SysNext, we pinpointed the emerging cores and identified contractors responsible for subpar subsystems. This gave the government the ability to hold specific parties accountable.


Our findings weren’t just technical—they were economic. By translating architectural issues into financial terms, we showed the staggering inefficiencies these problems caused and how they undermined agility, developer productivity, and timelines.


The Outcome


Within a month, the Air Force made sweeping programmatic changes:

  • Leadership Shift: Responsibility for the overall program was transferred to a different group within the government.

  • Contractor Accountability: Contractors responsible for problematic subsystems in SysNext were removed or sidelined. Those delivering quality work were elevated to leadership roles.

  • Development Overhaul: New Agile processes and contracting methods were introduced to prevent future failures.


Key Takeaways


  1. SysPrime Was Preventable: If CodeMRI Discovery had been used before selecting SysPrime, the Air Force could have avoided a billion-dollar failure.

  2. SysNext Was Salvageable: The analysis provided a roadmap to address issues before SysNext followed SysPrime’s trajectory.

  3. Objectivity Changed Everything: By providing objective, data-driven insights, CodeMRI Discovery eliminated knowledge asymmetry, enabling the government to make defensible decisions that could withstand scrutiny and even legal challenges.


A Brighter Future


Two years later, the story took a turn for the better. One of the key stakeholders from the original project returned with news: the contractor responsible for the troubled system had taken the criticisms to heart. Armed with the insights from CodeMRI Discovery, the contractor had spent two years refactoring some parts of the system and rewriting others. The resulting system far exceeded expectations, both technically and economically.

The new system was assessed using CodeMRI Discovery, and the transformation was undeniable:

  • A Leaner, Cleaner Codebase: The codebase now contained 500,000 lines of code, compared to the original 2 million.

  • Eliminating Cores: The new system had no critical cores larger than 30 files, making it highly modular and adaptable.

  • Improved Code Quality: Fewer than 2% of files failed to meet NIST code quality standards, compared to 2,600 files previously.

  • Benchmark Excellence: On economic and technical health benchmarks, the system ranked in the top 5% compared to industry standards.


The organization had done the hard work and created a system that was agile, modular, and evolvable—one that served the mission and the government far better than its predecessor. For the first time, they had a system that could adapt to changing needs without introducing technical debt or bottlenecks.


“You can’t manage what you don’t measure.” This well-known principle, often attributed to W. Edwards Deming, became the cornerstone of their success. By using CodeMRI Discovery to provide objective quality metrics, the government gained the visibility needed to make tough decisions, hold contractors accountable, and drive meaningful change.


Managing Technical Health for the Long Term


The success didn’t stop there. The new system was continuously monitored using Silverthread’s Comprehensive Architecture and Refactoring Engine (CARE), which helped control and prevent the reintroduction of technical debt. Regular updates and analyses ensured that the system remained healthy, modular, and ready to evolve with the mission’s demands.


To reinforce confidence in the system, Silverthread awarded its Architectural Seal of Approval (SASA), certifying that the system met rigorous architectural and quality standards. Additional tools from Silverthread, such as ROI modeling, scheduling analytics, and componentization support, further ensured that the system stayed on track.


Broader Impacts


The success of this effort inspired the organization to broaden its scope, applying these techniques to other systems in its portfolio. By linking CodeMRI Discovery with internal version control and issue tracking systems, they unlocked deeper insights into their software processes, enabling smarter decisions across their enterprise.


This brighter future was made possible because Silverthread and the government partnered to shine a light on hidden problems, set objective metrics, and work collaboratively toward solutions. The result was a system that taxpayers deserved—one that was efficient, reliable, and capable of serving its mission for years to come.


This case demonstrates that contractors and the government can be partners in change. With tools like CodeMRI Discovery, clear visibility, and a commitment to objective quality metrics, even the most troubled systems can be transformed into benchmarks of excellence.


Closing Thought


If the government had used CodeMRI Discovery before selecting SysPrime, they could have saved a billion dollars and years of effort. The lesson is clear: if you have a critical infrastructure decision to make, run CodeMRI Discovery now—before it’s too late.


This case demonstrates the transformative power of objective analysis. When billions of dollars and critical national security capabilities are on the line, relying on guesswork or vendor assurances is too risky. CodeMRI Discovery gave the Air Force the clarity it needed to avoid another catastrophic failure and restore program health.


If you’re facing a critical infrastructure decision, don’t wait for failure—run CodeMRI Discovery now and gain the insights you need to succeed.
