Integrating Legacy Code with Agile Processes
One of the biggest challenges software developers encounter is inheriting a large amount of legacy code and having to integrate it with a newer, evolving code base. For example, converting an iOS app written in Objective-C to Swift would require rewriting entire screens of the app, as well as its supporting classes and core structure. This is not an insurmountable problem, but it introduces some unique and time-consuming difficulties. If the team does not thoroughly analyze the risks and benefits of the possible solutions each time it hits an integration issue, it can easily fall into development traps that require constant rewrites and waste time. This analysis offers an approach to addressing these issues and some guidelines for incorporating legacy code into the development of a new code base.
Risky Solution: Start over
One potential solution is to abandon the legacy code and implement everything from scratch. The team would gather new requirements and, from the start, be able to implement a new architecture and new design patterns while taking on small units of work to iterate over the developing code base. This allows the team to emphasize quality immediately, which can lead to quickly evolving design patterns and better technical solutions, since the legacy solutions can be completely ignored. Unfortunately, this also increases the amount of time before a new feature can be shipped. In the case of the iOS app, a new version likely wouldn't be available on the App Store until the new app either matches the current app's feature set or offers fewer features with a vastly improved user experience, and either could take a long time. In essence, the app would only be updated on the App Store once the new app is not a downgrade from the current one. Additionally, bugs found in the current production app would still need to be resolved, which means allocating some development resources to maintaining the legacy app, albeit far fewer than an incremental rewrite would need. Starting over increases team agility as well as code quality, but it can be extremely expensive if the code base is large.
Ideal Solution: Isolate the old code
This is one of the best solutions for dealing with legacy code, but it is the least likely to be possible. For some systems, the legacy code can be completely isolated so that it does not need to be significantly altered. Essentially, the legacy code is treated as a library or package dependency. Bug fixes and minor features are still added to it when needed, but major rewrites aren't done, and large features are implemented as part of a new library or dependency. This results in high-quality new software and lets the team focus on the quality of the code introduced with each new feature instead of on refactoring existing code, which allows the team to be more agile. It is also a very inexpensive solution if the isolated code does not have to be maintained. However, from my perspective as a mobile application developer, this isn't always feasible. In the Objective-C to Swift example, it would not be possible to ignore the Objective-C portion of the code base and only implement new things in Swift. Due to the nature of mobile apps, a major or even a minor feature might require a significant rewrite of a screen written in Objective-C.
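Where isolation is feasible, the idea of treating legacy code as a dependency can be sketched roughly as follows. This is only an illustration under stated assumptions: `LegacyAccountStore` is a hypothetical stand-in for an Objective-C class that would normally reach Swift through a bridging header, and `Account`, `AccountProviding`, `LegacyAccountAdapter`, and `balanceSummary` are invented names, not part of any real app or framework.

```swift
// Hypothetical stand-in for a legacy Objective-C class. In a real project it
// would be exposed through a bridging header and left largely untouched.
final class LegacyAccountStore {
    // Legacy-style API: untyped dictionary, nil when nothing is found.
    func fetchAccountInfo(_ identifier: String) -> [String: Any]? {
        guard identifier == "42" else { return nil }
        return ["name": "Ada", "balance_cents": 1250]
    }
}

// The typed model that new Swift code works with (assumed for illustration).
struct Account: Equatable {
    let name: String
    let balanceCents: Int
}

// The only surface that new features are allowed to depend on.
protocol AccountProviding {
    func account(withID id: String) -> Account?
}

// Adapter: translates the legacy API into the new protocol, so the legacy
// class is consumed like a library rather than rewritten.
struct LegacyAccountAdapter: AccountProviding {
    private let store: LegacyAccountStore

    init(store: LegacyAccountStore = LegacyAccountStore()) {
        self.store = store
    }

    func account(withID id: String) -> Account? {
        guard let raw = store.fetchAccountInfo(id),
              let name = raw["name"] as? String,
              let cents = raw["balance_cents"] as? Int else { return nil }
        return Account(name: name, balanceCents: cents)
    }
}

// A new feature written purely against the protocol, never the legacy class.
func balanceSummary(for id: String, using provider: AccountProviding) -> String {
    guard let account = provider.account(withID: id) else {
        return "Account not found"
    }
    return "\(account.name): \(account.balanceCents) cents"
}

print(balanceSummary(for: "42", using: LegacyAccountAdapter()))
```

Because new code only sees `AccountProviding`, the adapter could later be swapped for a pure Swift implementation without touching the features built on top of it. As noted above, this only works when a feature does not need to reach into the legacy screens themselves.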
Bulk Refactoring isn't very agile
It is important to understand that, from the perspective of an Agile development process, introducing large amounts of refactoring work can be seen as inherently un-agile. Ideally, a system or code base is iterated over continuously, so that incremental improvements are made while incremental value is added. By either isolating the legacy code or starting from scratch, the team would be able to develop its code base iteratively and follow other agile principles, such as working with flexible requirements and focusing on technical excellence and good design. It's not that these principles can't be followed when the legacy code base has to be integrated; it just becomes much more difficult.
Compromise Solution: Rewrite with new features
So, if the code needs to be completely rewritten because of the nature of the system itself, why not take on those refactoring changes in bulk along with the work required by new features? The drawback is that each improvement to the code base has to wait until the next feature comes along. As time goes on, design patterns evolve and can change drastically, so every new feature that requires a significant rewrite becomes a snapshot of the design patterns at that moment. Granted, this only becomes a real problem if the focus shifts to continuously adding new features just to touch, and so rewrite, every screen. If new features are added while recently added features are improved at the same time, it becomes easier to introduce small amounts of refactoring work on the newer screens that have fallen out of date with the current patterns.
Compromise Solution: No new features
The other way to approach this problem is to freeze new features for as long as the code base needs to be refactored. This allows design patterns to be developed and iterated over constantly, and in a much shorter time frame. It gives the development team time to develop its patterns in the new code base according to its requirements and to find solutions that apply across multiple areas, rather than trying to fit in one recently conceived but narrow solution for each new problem. The obvious detriment of this option is that new features would not be developed until the code base is sufficiently refactored.
How do we integrate our legacy code base?
The trade-off becomes: do we want to rewrite everything as fast as possible and then bring everything up to the same standard incrementally, or do we want to maintain a high standard for everything that is rewritten now and take longer to get the oldest legacy code up to the new standard? The answer lies in the reason the bulk refactor started in the first place. If the refactor was initiated because a new tool or new language version is needed, then it is imperative that the most heavily affected areas of code are refactored as soon as possible to accommodate the new tool or language upgrade. If the reason is that the code quality is extremely poor, maybe even to the point where unit test coverage is suffering, then the goal should be to write new code without integrating new features, so that reasonable code quality can be the priority without hinging on a new feature. Both options also run into trouble with particularly large applications. Even when the rewrite is done alongside feature work, a large application can take so long to rewrite that by the time everything is up to the new standard, the disparity between the oldest rewritten code and the latest design patterns might necessitate another full rewrite. At that point, incremental changes might be too small to meaningfully improve the architecture or design patterns.
The Dilemma
The unfortunate reality is that the Ideal and Risky solutions are very rarely applicable. Usually, the cost of rebuilding the app is too high, and isolating legacy code just isn't an option in some situations. In these cases, it is imperative that the team chooses one of the two compromise solutions: either continuously iterate over the currently evolving, 'new' code base, or rewrite the old code in its entirety as fast as possible. In my opinion, the decision has to be one of these two choices and not a compromise between them. A compromise between them creates a worst-of-both-worlds situation, where large-scale refactors depend on new features, or seemingly on a whim about what is most 'critical', while newer code can't be iterated over in small parts to refine design patterns.
The end result is yet another divergent code base, where quality is still split between oldest, newer, and newest code, and consistent patterns are not easily determined from the code base. Take the earlier iOS example. Suppose the initial goal was to convert from Objective-C to Swift because of the increasing support for Swift and the declining support for Objective-C libraries, and a new library then requires Swift 3, which deprecated or removed functionality from Swift 1 and Swift 2. The code base is now stuck in a middle ground. Old solutions to problems that were never fully fixed in the old, unsupported language have to be deciphered in the current one, and new libraries remain inaccessible because the code base's current language version doesn't support them until the next significant rewrite happens.
Integrating legacy code will impact the code quality of the application for as long as the legacy code has to be maintained, and it adds restrictions on how agile the team can be. Every hour spent updating legacy code is an hour spent in an area of the code that already has concrete requirements and is difficult to iterate over. On top of that, a change to legacy code can never be a small unit of work if the goal is good quality, because all the code around even a minor change should be brought up to standard. The amount of time that has to be invested in a large amount of legacy code can be so significant that the overall quality of the 'modern' code base will be subpar for longer than if the legacy code had been ignored or discarded.
Conclusion
There are two key principles in the Twelve Principles of Agile that are related to this idea as a whole. They are:
- `Continuous attention to technical excellence and good design enhances agility.`
- `Simplicity--the art of maximizing the amount of work not done--is essential.`
Keeping these principles in mind, the most reasonable course for an agile team that needs to integrate a legacy code base is to make an explicit decision about how to approach its situation. In order to `maintain a standard of technical excellence` and `maximize the amount of work not done`, the team must fully commit to either keeping all new code up to date while integrating new features and incrementally improving parts of the old code base, OR refactoring the entire code base without taking on new features. If a half-step between these solutions is used, such as undertaking large-scale refactors while integrating new features, both the timeline for refactoring the current code base and the quality of the refactored code will suffer.