A Government Agency’s Challenge
Software doesn’t just sit still. As Alan MacCormack puts it, “Code rarely dies.” Features are added, bugs are fixed, patches are applied, and systems are extended to integrate with new technologies. Over time, codebases grow larger, more complex, and harder to manage.
Recently, I worked with a large government agency struggling to sustain and evolve its software systems. The agency oversees approximately 200 semi-independent applications written in Java, C#, Visual Basic, and JavaScript/TypeScript, totaling 17 million lines of code. Using CodeMRI Discovery and working through Silverthread Consulting, we analyzed their codebase and the size of their development team.
The team supporting this vast codebase consisted of 36 developers, resulting in a ratio of approximately 472,222 lines of code per developer. The CIO described the team as feeling like they were “drowning.” Developers were focused almost entirely on sustainment—keeping up with security patches, making customer-requested modifications, and addressing bug fixes. Innovation was nearly impossible, and the CIO’s vision of delighting customers with new features seemed increasingly out of reach.
To put this in perspective, I compared their situation with that of a company I had worked with previously: a scientific computing company focused on active innovation. Over a four-year period, their platform codebase grew from 5.5 million to 7.3 million lines of code. During that time, their development workforce doubled from 35 to 70 people, cutting the ratio of LOC per developer from roughly 160,000 to roughly 100,000, a reduction of about one-third. Developers in this group created 7,000–10,000 lines of new or modified code annually (after filtering out automated changes and code movement). Their productivity and focus on innovation allowed them to remain competitive in their market.
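The ratios above are simple division, and recomputing them from the quoted figures is a useful sanity check. The sketch below is illustrative only: the function name `loc_per_developer` is my own, and the inputs are the unrounded figures quoted in this article.

```python
def loc_per_developer(total_loc: int, developers: int) -> float:
    """Return the average lines of code each developer is responsible for."""
    if developers <= 0:
        raise ValueError("developers must be positive")
    return total_loc / developers


# Figures quoted in the text above.
agency = loc_per_developer(17_000_000, 36)
sci_start = loc_per_developer(5_500_000, 35)
sci_end = loc_per_developer(7_300_000, 70)

# Relative drop in the scientific computing company's ratio over four years.
reduction = 1 - sci_end / sci_start

print(f"Agency: {agency:,.0f} LOC per developer")
print(f"Scientific computing company, start: {sci_start:,.0f} LOC per developer")
print(f"Scientific computing company, end: {sci_end:,.0f} LOC per developer")
print(f"Reduction in ratio: {reduction:.0%}")
```

Because the codebase grew while the headcount doubled, the ratio falls by less than half.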
To dig deeper, I also looked up industry benchmarks for lines of code per developer in different situations, accounting for factors like sustainment versus innovation and well-structured versus complex codebases.
Industry Recommendations for Lines of Code (LOC) Per Developer
The number of developers needed to maintain or develop a codebase depends on several factors, including whether the codebase is in sustainment or active innovation, as well as its complexity. Industry guidelines provide useful benchmarks to estimate the appropriate LOC per developer in different scenarios:
Situation | Well-Structured Code (LOC/Developer) | Complex Code (LOC/Developer) |
Sustainment | 500,000–1,000,000 | 100,000–500,000 |
Innovation/Development | 200,000–500,000 | 50,000–200,000 |
Sustainment Mode Considerations
Well-structured code: In sustainment, teams focus on bug fixes, security patches, and compatibility updates. Modular, well-documented systems require fewer developers because changes are isolated and predictable. A single developer can typically handle 500,000 to 1,000,000 LOC in these environments.
Complex code: Legacy systems with poor modularity or high cyclicality place a much higher burden on developers. In these cases, a single developer can typically manage only 100,000 to 500,000 LOC, as they spend much of their time navigating dependencies and addressing unexpected issues.
Innovation or Active Development Considerations
Well-structured code: When actively developing new features or overhauling architecture, developers need to make frequent changes. Modular systems enable focused work with fewer side effects, allowing a single developer to handle 200,000 to 500,000 LOC.
Complex code: In non-modular systems with high coupling, developers must coordinate across teams and anticipate how changes will propagate. This drastically reduces efficiency, with developers able to manage only 50,000 to 200,000 LOC in such scenarios.
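The benchmark table can be turned around to estimate how many developers a codebase of a given size calls for. The following is a minimal sketch, assuming the ranges in the table are applied as plain division; the names `BENCHMARKS` and `developers_needed` are hypothetical, and the 17 million LOC and 36 developers are the agency figures quoted earlier.

```python
import math

# Benchmark ranges (LOC per developer) copied from the table above.
BENCHMARKS = {
    ("sustainment", "well-structured"): (500_000, 1_000_000),
    ("sustainment", "complex"): (100_000, 500_000),
    ("innovation", "well-structured"): (200_000, 500_000),
    ("innovation", "complex"): (50_000, 200_000),
}


def developers_needed(total_loc: int, mode: str, structure: str) -> tuple[int, int]:
    """Return the (fewest, most) developers implied by the benchmark range."""
    low, high = BENCHMARKS[(mode, structure)]
    # A higher LOC-per-developer figure means fewer developers, so the top
    # of the range gives the minimum headcount and the bottom the maximum.
    return math.ceil(total_loc / high), math.ceil(total_loc / low)


total_loc, team_size = 17_000_000, 36  # agency figures quoted earlier
for mode, structure in BENCHMARKS:
    fewest, most = developers_needed(total_loc, mode, structure)
    print(f"{mode:12} {structure:16} {fewest:>4} to {most:>4} developers "
          f"(actual team: {team_size})")
```

For a complex codebase in sustainment, this yields 34 to 170 developers for 17 million LOC, so a team of 36 sits near the bare minimum.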
Understanding the Intellectual Burden of Code
Before diving into what it means for a developer to be responsible for a codebase, let’s visualize what 100,000 lines of code (LOC) or 500,000 LOC looks like in physical terms. Here’s a breakdown:
Lines of Code | Pages | Stack Height (inches) | Stack Height (feet) | Equivalent Novels |
100,000 | 2,000 | 8.6 | 0.72 | 6.67 |
500,000 | 10,000 | 43.0 | 3.58 | 33.33 |
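The table rows follow from three conversion factors, which can be applied to any codebase size. In this sketch the constants are inferred from the table itself (50 lines per page, 0.0043 inches per page) or stated in the text (about 300 pages per novel); the function name `physical_scale` is my own.

```python
LINES_PER_PAGE = 50       # implied by the table: 100,000 LOC -> 2,000 pages
INCHES_PER_PAGE = 0.0043  # implied by the table: 2,000 pages -> 8.6 inches
PAGES_PER_NOVEL = 300     # assumption stated in the text


def physical_scale(loc: int) -> dict[str, float]:
    """Convert a line count into pages, stack height, and novel equivalents."""
    pages = loc / LINES_PER_PAGE
    inches = pages * INCHES_PER_PAGE
    return {
        "pages": pages,
        "stack_inches": inches,
        "stack_feet": inches / 12,
        "novels": pages / PAGES_PER_NOVEL,
    }


for loc in (100_000, 500_000, 472_222):
    scale = physical_scale(loc)
    print(f"{loc:>9,} LOC: {scale['pages']:,.0f} pages, "
          f"{scale['stack_inches']:.1f} in ({scale['stack_feet']:.2f} ft), "
          f"{scale['novels']:.2f} novels")
```

At the agency's 472,222 LOC per developer, each developer's share works out to roughly 9,400 pages, or about 31 novels.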
What does this mean for a developer?
Physical Scale
100,000 LOC fits onto 2,000 pages, which creates a stack almost 9 inches tall—about the height of a standard textbook.
500,000 LOC expands to 10,000 pages, forming a stack nearly 3.6 feet tall—reaching chest height for many people.
Equivalent Novels
100,000 LOC is equivalent to roughly 7 novels (assuming each novel is ~300 pages long) and would take about 2–3 weeks to read.
500,000 LOC is like reading 33 novels and could take 3 months or more, depending on the complexity.
But here’s the catch: software isn’t like reading John Grisham. If the code contains control logic, algorithms, or domain-specific knowledge, the reader must also comprehend and apply specialized knowledge, including concepts that might have taken decades to learn or that require deep academic understanding. And while these stacks might be equivalent in size to 7 or 33 novels, software code isn’t read linearly. Unlike a book, where you turn pages in order, a codebase is more like an incredibly complex “choose your own adventure” book.
Code: The ultimate knowledge work
A codebase isn’t just a collection of instructions for computers—it’s the embodiment of collective knowledge. As Stuart Feldman once said, “Writing code is like writing poetry: every word, each placement counts. Except software is harder, because digital poems can have millions of lines which are all somehow interconnected.”
A codebase represents the cumulative energy of many people over long periods of time, laying down their understanding to solve problems too large for any one person to handle. It’s the legacy of countless developers—standing on the shoulders of giants to create systems that outlast their jobs, and sometimes even their lives.
The word legacy often gets a bad rap in software, but in reality, it’s something to be proud of. A well-designed utility or library can become a sedimentary layer of the technological landscape, rarely needing updates. It’s the gift that keeps on giving, because software has no marginal cost once written.
But that’s the theory. In reality, legacy code must still be understood, maintained, upgraded, depended upon, and fixed. It’s not just the foundation for future work—it’s an active burden that requires intellectual energy. Developers must juggle two priorities: maintaining and upgrading the legacy systems and building new functionality on top of them. Without first addressing legacy maintenance, organizations can’t keep up with technological change or innovate effectively.
This is why the government agency’s developers felt overwhelmed. They couldn’t modernize their systems or introduce new features because they were too busy keeping the existing systems afloat. They were stuck maintaining outdated versions of Java, unsupported operating systems, and even mainframes. The first priority had to be sustaining and upgrading the legacy systems—only then could they innovate.
Measuring the Burden of 17 Million LOC
The government agency’s ratio of 472,222 LOC per developer might seem reasonable in a well-structured system focused on sustainment, but their codebase had significant issues that increased its complexity:
Duplication and Slight Modification: Over 40% of the codebase was removable duplication caused by decades of copying and pasting functionality across applications. Slight modifications to these duplicated segments introduced inconsistencies, increasing maintenance challenges and making the code harder to sustain or improve.
Code Quality Challenges: While the codebase did not have large monolithic cores, it exhibited significant cyclomatic complexity in critical areas. High cyclomatic complexity, a measure of the number of independent paths through the code, made it difficult to test, maintain, and predict the impact of changes.
Complicated SQL and Databases: The codebase relied on intricate SQL queries interacting with extraordinarily complex databases. These databases had evolved over decades and contained deeply nested tables, extensive joins, and non-standard conventions, creating significant barriers for developers trying to implement even minor changes.
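To make the cyclomatic complexity measure mentioned above concrete, here is a simplified sketch that counts decision points in Python source using the standard `ast` module. It is an approximation for illustration, not the method CodeMRI uses, and the agency's code is in other languages; the function name and the sample function are hypothetical.

```python
import ast

# Node types that each add one decision point in this simplified count.
_BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)


def approx_cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity as 1 + the number of decision points."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one short-circuit path per extra operand.
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def classify(ratio, complex_code):
    if complex_code and ratio > 500_000:
        return "overloaded"
    elif ratio > 1_000_000:
        return "overloaded"
    for _ in range(3):
        pass
    return "ok"
'''

print(approx_cyclomatic_complexity(SAMPLE))
```

Each additional branch is another path that tests must cover, which is why high values in critical areas make changes harder to verify.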
Considering these factors, the codebase should be classified as complex, not well-structured. According to the industry benchmarks above, sustaining a complex codebase calls for a ratio of 100,000–500,000 LOC per developer. At 472,222 LOC per developer, the agency sat at the very top of that range, leaving the team barely able to sustain the system, let alone innovate or modernize it.
These challenges underscored the urgent need to reduce the size and complexity of the codebase to align it with the capabilities of the existing workforce.
A Path Forward: Shrinking the Codebase to Enable Innovation
The CIO and I turned our attention to a pressing question: How could the team stop drowning and finally have the capacity to innovate? Using CodeMRI, we identified key challenges within the codebase that offered opportunities for dramatic improvement.
Duplication: A Hidden Burden
While the codebase spanned 17 million lines of code, CodeMRI revealed that over 40% of it was removable duplication. Over the past 30 years, developers had fallen into a habit of copying and pasting code with slight modifications. This practice, while expedient at the time, had compounded into a significant maintainability, defect, and security challenge.
By properly creating and using shared libraries, we determined that the 17 million LOC could be reduced to around 10 million LOC. This change wouldn’t just shrink the codebase—it would also make it far easier to maintain, test, and secure.
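As a rough illustration of how copy-and-paste duplication can be spotted, the sketch below slides a fixed-size window over whitespace-normalized lines and reports any block that appears more than once. This is a deliberately naive, exact-match approach and not how CodeMRI works; the function name, the `window` parameter, and the sample files are all hypothetical.

```python
from collections import defaultdict


def find_duplicate_blocks(files: dict[str, str], window: int = 5) -> dict:
    """Map each repeated block of `window` normalized lines to its locations."""
    seen = defaultdict(list)
    for name, text in files.items():
        # Strip whitespace and drop blank lines so that formatting
        # differences alone do not hide a copied block.
        lines = [(number, line.strip())
                 for number, line in enumerate(text.splitlines(), start=1)
                 if line.strip()]
        for start in range(len(lines) - window + 1):
            block = tuple(content for _, content in lines[start:start + window])
            seen[block].append((name, lines[start][0]))
    # Keep only blocks that occur in more than one place.
    return {block: places for block, places in seen.items() if len(places) > 1}


files = {
    "a.py": "x = 1\ny = 2\nz = x + y\nprint(z)\n",
    "b.py": "\nx = 1\n  y = 2\nz = x + y\nprint('other')\n",
}
for block, places in find_duplicate_blocks(files, window=3).items():
    print(places, block)
```

Exact matching misses the slightly modified copies described above; catching those requires normalizing identifiers and literals or comparing tokens. On the arithmetic side, removing 40% of 17 million lines leaves about 10.2 million, consistent with the figure of around 10 million.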
The Role of Language Efficiency
Another area we examined was the use of older languages like Visual Basic (VB). Older languages often require more lines of code to achieve the same functionality because newer languages operate at higher levels of abstraction. Rewriting some of the VB code in modern languages like Python or Ruby could further reduce the codebase.
Language | Relative LOC Factor | Adjusted LOC (10M Baseline) |
Visual Basic | 1.0 | 10,000,000 |
C++ | 0.8 | 8,000,000 |
Java | 0.7 | 7,000,000 |
C# | 0.7 | 7,000,000 |
JavaScript | 0.5 | 5,000,000 |
TypeScript | 0.45 | 4,500,000 |
Python | 0.4 | 4,000,000 |
Ruby | 0.35 | 3,500,000 |
If the VB code were replaced with a higher-level language like Python or Ruby, the 10 million LOC could shrink to as little as 6 million LOC, further easing the team’s workload.
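The table's factors can be combined with the share of code that is actually Visual Basic to estimate the effect of a rewrite. The article does not state that share, so the sketch below treats it as a hypothetical input and also solves for the share that a given target would imply; the function names are my own.

```python
# Relative LOC factors copied from the table above (Visual Basic = 1.0).
FACTORS = {
    "Visual Basic": 1.0, "C++": 0.8, "Java": 0.7, "C#": 0.7,
    "JavaScript": 0.5, "TypeScript": 0.45, "Python": 0.4, "Ruby": 0.35,
}


def loc_after_rewrite(total_loc: float, vb_share: float, target: str) -> float:
    """Estimate total LOC after rewriting the Visual Basic portion in `target`."""
    vb_loc = total_loc * vb_share
    return (total_loc - vb_loc) + vb_loc * FACTORS[target]


def vb_share_needed(total_loc: float, goal_loc: float, target: str) -> float:
    """Solve for the Visual Basic share whose rewrite would reach `goal_loc`."""
    return (1 - goal_loc / total_loc) / (1 - FACTORS[target])


# Hypothetical scenario: half of the 10 million deduplicated LOC is VB.
print(f"{loc_after_rewrite(10_000_000, 0.5, 'Python'):,.0f} LOC")

# Share of VB implied by shrinking 10 million LOC to 6 million.
for target in ("Python", "Ruby"):
    share = vb_share_needed(10_000_000, 6_000_000, target)
    print(f"{target}: {share:.0%} of the code would need to be VB")
```

Under these factors, reaching 6 million LOC would require roughly 60–67% of the deduplicated code to be Visual Basic, so the actual share determines how much of that reduction is attainable.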
Leveraging AWS for Low-Code/No-Code Solutions
Beyond reducing duplication and transitioning to modern languages, we explored how the agency’s use of AWS could help streamline development. AWS offers managed services and low-code/no-code tools that reduce the amount of custom code a team has to write and operate, such as:
AWS Lambda for serverless functions,
Amazon Honeycode for building apps without writing code (a service AWS has since discontinued), and
API Gateway for creating scalable APIs.
By automating common tasks and reducing the need for custom code, these tools could handle some of the agency’s sustainment and integration needs, freeing developers to focus on higher-value work. Management believes that these interventions, in combination, can realistically bring the codebase down into the 3–5 million LOC range.
Early Results: Modernization in Action
The modernization effort at the government agency is already showing promising results. The team has focused on creating common libraries and rewriting and consolidating six of the smaller applications. These early projects have achieved remarkable outcomes:
Significant codebase reduction:
By consolidating and refactoring these applications, the team has reduced them to just 10–20% of their original size. For example, one application that previously required 200,000 lines of code was rewritten in just 30,000 lines (15% of the original) by leveraging modular design and eliminating duplication.
Transition to modern languages:
These rewritten applications are now in modern languages like Python and TypeScript. This transition allows the agency to hire new developers straight out of school, who are already trained in these technologies. It also makes the code easier to maintain and evolve, as the team can leverage modern tooling and frameworks.
Improved ability to keep up:
With the move to modern platforms, the team can now keep up with the latest technologies and security standards. Upgrading systems no longer feels like a monumental challenge; it’s becoming a routine part of their workflow.
Enhanced security:
The rewritten applications benefit from better-designed architectures, automated testing, and modern security features. By removing outdated dependencies and following best practices, the agency has significantly reduced its risk of vulnerabilities.
A Brighter Path Forward
Through a combination of CodeMRI Discovery analytics and Silverthread Consulting, the agency now has a clear modernization roadmap.
The workforce isn’t going to grow, but the codebase can shrink dramatically, making it manageable for the existing team. These changes will enable the organization to move beyond sustainment work and focus on innovation, empowering the CIO to achieve the vision of delighting customers with new features.
Silverthread helped the agency build this roadmap, articulate its value to executives, and secure support for the plan. By aligning their codebase with industry best practices and benchmarks, the organization is poised to transform its operations and unlock its potential for innovation.