Application Modernization Assessment and Oversight Methodology
© Copyright 2015, Don Estes
The greatest mistake that a modernization project can make is to proceed as if modernization were new development. Few sites have specific modernization experience readily available on staff, so it is understandable that people will proceed based on what they know.
Our application modernization consulting services are designed to augment your staff experience and assist in framing your decisions on transition planning. We can also provide ongoing oversight regarding execution. We work with you to chart a path through the rapids, and continue to ensure that you shoot through successfully.
Proceeding as if it were new development can be costly for sizable, mission-critical applications: unforeseen problems cause delays and cost increases as well as missed opportunities to enhance the return on investment. We provide clear, business-based parameters for deciding on the best path from where you are now to your chosen destination. Many times, vendor products and vendor services can ease that transition, but in other cases a site is perfectly capable of doing the job internally.
This essay on assessment methodology focuses on how we assess the relevant decision parameters for the transition, but it is also intended to provide a detailed overview of where modernization differs from both maintenance programming and new system development programming. We prepare our findings and recommendations, and present them to our clients. We are also prepared to continue in an oversight role and work as an adjunct to your project staff, or to take a direct role in the technical management of the project. In this role, we focus on ensuring that the planned procedures are in fact implemented as intended.
There is usually not a big decision regarding using in-house versus retaining outside vendor services. Many if not most sites would prefer to do the job themselves, time and resources permitting, though perhaps with some judicious assistance or tooling where appropriate. Others would prefer to outsource the learning curve and the project risk. Still others have tried to outsource the project, failed, and are bringing it in-house out of necessity. Every situation has distinct and often unique characteristics.
When we discuss costs of each alternative transition strategy, we utilize the preferred approach, whether it be in-house, outsourced, or mixed. On request, we can cost multiple approaches to the same transition strategy, given fully burdened cost information and productivity information on the potentially available resources. Although the methodology that follows assumes for discussion purposes that all alternatives will be evaluated, if any path can be eliminated a priori (e.g., there are no candidate COTS packages), then that step in the methodology is bypassed.
The biggest question in this regard is the first question – whether your project is most appropriate for a strategic modernization, a tactical modernization, or perhaps a hybrid of the two. By focusing on one direction early, we avoid spending time and effort on alternatives that are not appropriate for your circumstances.
This essay assumes that we are discussing a single application as the scope of the project. However, if we are asked to review a portfolio of applications, we might perform a preliminary assessment across the portfolio before conducting the full assessment on the individual applications chosen to be addressed first.
Finally, please note that, as we discuss both business rules and requirements, these are definitely not interchangeable terms. As we discuss in greater detail in our Business Rule Extraction essay, business rules do not change during a modernization (they are said to be “invariant”) but requirements that are not impacted by business rules are free to change in order to add business value, especially in business process optimization. When we refer to business rules, we mean specifically the invariant rules that govern update processing, while requirements can vary within limits imposed by practicalities.
Strategic Vs. Tactical Modernization
A strategic modernization addresses organization-wide processes to provide benefits to the whole business, while a tactical modernization maintains its focus on costs and benefits primarily inside IT. This is primarily a business question, not a technical question: what results does management expect to derive from the investment?
There are four primary questions for the preliminary assessment:
- What benefits do you seek to bring to the wider organization?
- Are you prepared for the budget and time frame that will be required for a strategic modernization?
- What is your organizational appetite for risk? What is the potential cost of errors and omissions in the business rules?
- How do you see achieving these objectives?
The answers to these initial questions will guide the balance of the assessment.
Strategic Modernization Issues
If the preliminary assessment points toward a strategic modernization, then we need to examine the major elements of a strategic focus that will impact the course of the project.
- Process Optimization Strategy
- Data Migration, Cleansing and Synchronization
- Data Modeling
- Business Rule Extraction
- Program and Project Management
- Commercial Off The Shelf (COTS) Applicability
We will address these issues in more detail in Step 7 of the Assessment Methodology. Note that a COTS solution could be strategic or tactical, depending on the package.
Tactical Modernization Issues
If the preliminary assessment points toward a tactical modernization, there are multiple practical technical alternatives to reach those goals:
- Replace with a manually rewritten application based on the legacy design (which is quite distinct from a strategic re-design and rewrite).
- Replace with a commercial off the shelf (COTS) package.
- Re-architect the legacy design into a modern implementation framework using a tool based approach.
- Re-host your existing code base, typically from a mainframe to a distributed client-server environment with or without code renovation.
- Renovate your existing code base with a technology refresh (language, database, user interface, etc.) with or without re-hosting.
- Retain the existing application as is, but extend it by SOA-enabling its services.
Choosing the best technical strategy (or combination of strategies) is non-trivial, and involves understanding the business case, schedule goals, resources available, residual value in the legacy assets, technical details of the application implementations and deployments, and appetite for assuming IT risk. Then, if some or all of the project will be outsourced, or if a COTS package is under consideration, we add vendor risk to the list.
We argue that choosing the best strategy for your legacy modernization initiative is primarily a business question, not a technical question, except insofar as technology directly impacts the business. Just like any other investment decision, legacy modernization issues can be categorized under assets, liabilities, cost, risk and business value.
Furthermore, there is no single path from where you are to where you want to go, if indeed you have identified your destination platform. Instead, there are often multiple alternative paths, and no single alternative may constitute the obvious choice. Each path will have advantages and disadvantages, and each must be weighed appropriately in the context of your unique organization. There is no one-size-fits-all answer.
The most typical destination platforms are:
- Java in a J2EE environment
- C# in a .NET environment
Both will usually include a relational database management system (RDBMS) as the primary data store, but in some cases an object or NoSQL database may make more sense.
However, these are hardly the only destinations. Open source alternatives to proprietary environments are becoming robust enough to constitute significant competition for proprietary vendors, though the majority of sites stick with proprietary products for mission critical applications. Rules engines, or business rule management systems (BRMS) as they prefer to be known, are gaining traction in application architectures. Business Process Management (BPM) platforms can have an adjunct or a primary role as well. Finally, newly emerging technologies such as declarative programming might offer advantages that outweigh the risks from investing in early stage technologies.
Large mainframe sites are often not ready to abandon the security of the mainframe for newer architectures, and are running mixed CICS/COBOL and Java environments, either on z/OS with zAAP processors for Java or on z/Linux using inexpensive IFL engines. The substantial economic advantages of using these specialized processors, with CPU and software costs a tiny fraction of general-purpose CPU and software costs, are not always fully appreciated by staff who focus on technical issues rather than costs.
Hybrid Modernization Issues
As its name implies, a possible hybrid strategic/tactical project will require us to look at both approaches for your project, on the assumption that it will start out with tactical improvements and eventually transition into a strategic project. This hybrid approach allows a site to achieve some of its goals sooner, demonstrating value in the project, and especially reduces risk in the overall project.
Steps 1-8 are common to both strategic and tactical projects, and so we discuss them prior to the assessment steps that apply only to one or the other. Steps 9S and 10S apply only to strategic projects, while 9T and 10T apply only to tactical. Hybrid projects may evaluate both branches.
Step 1 – Residual Value in Legacy Assets
Summary: step 1 is a deep technical analysis of the existing legacy system.
For both strategic and tactical assessments, we need to collect and analyze the legacy artifacts both qualitatively and quantitatively, under appropriate conditions of confidentiality and security of course. We need to understand technically:
- Legacy hardware and operating system(s)
- Legacy language(s) and quantities of source code in each
- Legacy database(s), data model(s), and physical quantities of data
- Legacy online processing transactions, TP monitor(s), and messaging protocols
Then we need to conduct a detailed source code analysis. We start with a parsing tool that will allow us to perform some level of code analysis, even if it is only a lexical analysis. Ideally, it will be the same tool planned for business rule extraction usage.
- Source programs broken down by application area
- Source programs broken down by update versus query only
- Source programs further broken down by online (conversational vs. pseudo-conversational / non-conversational) and batch processing types (periodic and transactional batch)
- Data integration dependencies and other issues among applications that are planned to remain on the legacy platform and between applications remaining and being modernized off the legacy platform
- Ideally, the programs can be evaluated for their Cyclomatic Complexity
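Cyclomatic complexity, where we can compute it, follows McCabe's standard formula over a program's control-flow graph. A minimal sketch in Python (the function name and the toy graph figures are illustrative, not drawn from any particular parsing tool):

```python
def cyclomatic_complexity(edges, nodes, connected_components=1):
    """McCabe's metric: M = E - N + 2P for a control-flow graph."""
    return edges - nodes + 2 * connected_components

# A program with a single IF statement: 4 nodes (entry, test, branch, exit)
# and 4 edges yields complexity 2 -- one linear path plus one decision.
print(cyclomatic_complexity(edges=4, nodes=4))  # -> 2
```

In practice the parsing tool computes the node and edge counts; programs scoring above roughly 10–20 are conventionally flagged as hard to test and maintain.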
We also need to understand the service level agreements under which operations are conducted for both batch and online processing.
A difficult task for many sites is to analyze their legacy applications as financial assets and liabilities. As IT people, we tend to think of our computer systems and the software running on them as somehow different from buildings, equipment, intellectual property, and other investments. Yet senior management is obligated to manage the organization’s assets for the greatest good, and this requires objective analysis.
We meet many people who start off with a preferred strategy. Some people want to just throw out the old software and start fresh. Other people have a longstanding emotional investment in the software that they have spent a career building and expanding. These people may be too close to the software to analyze its value dispassionately.
We perform separate structural and functional analyses of the applications under consideration for modernization. We separate them because an application design may be trapped in old technology and yet fully serve the business function. We have seen an application written in the 1960s, consisting of over 10 million lines of assembler code, which has a perfectly fine design that is serving the business well.
Excessive costs and inflexibility may not be a characteristic of the application design per se, but of its implementation. Sometimes fixing the implementation can liberate a great deal of value. On the other hand, sometimes the problem is the opposite: inherent functional design problems are implemented in a reasonably modern and low cost manner.
The structural analysis focuses on how each application is built and on how it does its job, but not so much on what it does. We look at the language(s), the database(s), the hardware platform(s), the operating system(s), the user interface(s), and deeper structural issues such as data model normalization. If practical, we compute or estimate formal complexity metrics to guide subsequent analyses. We evaluate adjunct software such as reporting, document management and data analytics. We group these results under infrastructure issues.
The functional analysis reverses this point of view. We focus on what each application does, and not at all with how it does it. When we focus on functionality issues, we completely ignore the language, database, platform and other infrastructure issues. We also ignore functionality that is related to the infrastructure. For example, upgrading a legacy application from indexed files to a relational database would automatically allow the use of an ad hoc reporting system. Ad hoc reporting through a tool would not be counted as part of the legacy system functionality, because it derives from the infrastructure. We only consider functionality that results from business rules expressed in the application programs.
If we were to construct a Venn diagram of the functionality of the old, “As-Is” system and show its intersection with the functionality of the desired new “To-Be” system, we would see 4 distinct areas. The most obvious is usually the Obsolete area, functionality that is no longer needed at all. Perhaps the least obvious is the Preserved area, where the functionality is just fine, however uncomfortable some technologists may be with the infrastructure and style of the implementation. The third area, Enhanced, contains the existing functionality that requires functional enhancements (though without changing business rules). And, finally, we have the fourth area which consists of Wholly New functionality.
When we consider Wholly New functionality, we have to carefully consider the possibility of using a Business Process Management (BPM) platform to integrate manual and automated work flow processes. If so, we need to exclude from this analysis the part of Wholly New that would be implemented with the BPM platform on top of the legacy application, and focus on Wholly New functionality within the legacy application itself. Similar consideration will be given to other adjunct platforms, such as document imaging or data analytics.
Now we come to the most important question of the functionality analysis: what percentage of the business logic in the legacy programs is to be assigned to each bucket: Obsolete, To Be Preserved, and To Be Enhanced. We generally assign whole program sources to each bucket and count the results, though estimates by subject matter experts (SMEs) generally suffice for a qualitative assessment, rounded to the nearest 10%. As an example, let’s say that these three percentages were 10%, 50% and 40%, respectively, totaling 100%.
Once this is complete, we estimate the quantity of code that would be required to implement the Wholly New functionality, and express that as a percentage of the existing code base. To extend our example, let’s say that this would be 30%, for a total of 130%.
These percentages drive much of the ensuing analysis, as we consider ratios of these percentages. A key ratio is (Preserved + Enhanced):(Wholly New). Using our example, this ratio would be 3:1, i.e., (50% + 40%):30%.
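The bucket arithmetic above can be sketched as a few lines of Python (the percentages are the hypothetical figures from the running example, not data from any real assessment):

```python
# Bucket percentages from the example, relative to the existing code base.
preserved, enhanced, obsolete = 0.50, 0.40, 0.10   # together: 100% of legacy code
wholly_new = 0.30                                   # estimated new code, as a % of legacy

# The key ratio that biases the strategy choice later in the analysis.
key_ratio = (preserved + enhanced) / wholly_new
print(f"(Preserved + Enhanced):(Wholly New) = {key_ratio:.0f}:1")  # -> 3:1
```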
A similar question is,
“how well does the legacy functionality support the business process?”
To answer this question, we ask to what extent the underlying business process has changed since the system was designed and implemented, and how much of the current business process is being supported by the system. This might seem odd to some people, who could reasonably expect the answer to be 100% by definition, but we have seen systems where fields in the database have been re-used for new purposes, output reports imported into spreadsheets and re-analyzed according to different business rules, and server-based databases operating in parallel that contain overlapping data. Clearly, these are cases where the business process has diverged from that of the original design, so the Preserved area will tend to be relatively small while Enhanced will tend to dominate. Such systems will tend toward re-architecting if a replacement strategy is not ultimately selected, as implementing a renovation strategy will be complex.
More typical is the situation where the legacy system has mostly kept up with business process changes, but there is a backlog of changes to the business process that cannot be implemented until the software is modified to support them. For these, renovation can be considered without the negative bias we apply where the business process has diverged.
Why do we consider re-architecting and renovation at all? Isn’t it better to just redesign and rewrite in all cases? There are those who say that it will cost you 80% of the cost of a rewrite to change more than 20% of an existing system, so why not just bite the bullet and rewrite it? This is a seductive argument, and for smaller scale applications, we would sign on to it.
But for larger applications, consider the very high failure and cost/delivery overrun rates found by Capers Jones and by Warren Reid. A more recent report by Gartner, though focused on the insurance industry, reaches a similar conclusion and is largely applicable to all:
Poor planning and an overly high level of optimism are resulting in a significant risk of failure for the legacy modernization initiatives of CIOs at life and P&C insurers. According to a Gartner survey, only 42% of projects meet the original budget, and 82% take longer than expected.
This simplistic “80-20 rule” argument does not scale to large applications. When it makes sense to adopt a strategic modernization strategy, we are perfectly happy to recommend it, but we do so in the context of financial feasibility. The ROI may fail organizational financial decision parameters, or the cost of a full re-design and rewrite may simply not be affordable regardless of the benefits. So, we have to approach this part of the analysis very carefully, to ensure that management has full information regarding the costs, risks, and opportunities.
Clearly, if an application were 0% Preserved and 10% Enhanced, then the legacy assets have so little residual value as to be mostly a liability. In this case, a replacement of some kind, either a COTS package or a de novo redesign/rewrite, is the only realistic option.
If our findings are at the other end of the spectrum – that 90+% of an application’s functionality should be preserved and enhanced – it would make sense to at least consider some type of renovation and/or re-architecting strategy. (Remember that we are not considering the infrastructure, the implementation, external tools such as ad hoc reporting, external integrated applications such as BPM-based workflow, or maintenance cost issues at this time.) Of course, life is seldom so neat. The typical case is somewhere in between, so we are dealing with shades of grey rather than a black or white result.
Our rough and ready rule is that if our key ratio is 1:1 or greater, then we are biased toward a renovation or re-architecting strategy that will extract residual value out of the legacy code base, but at less than 1:1 our bias turns toward a replacement strategy. As we will see below, this 1:1 boundary will move up or down depending on the other parameters of the analysis.
The residual value in the application is expressed qualitatively for an initial assessment analysis. If the initial assessment analysis points to significant business decisions that may be based on the results, then it may be worth the investment of time and money to derive a quantitative value.
For our qualitative assessment of residual value, we use the Preserved and Enhanced percentages offset against the effort of modernizing the infrastructure. A high key ratio value can be offset if the application is written in an obscure language on an obsolete platform using a non-standard database or file system. Conversely, a lower key ratio can be offset if implemented in a standard language on a standard platform with a relational database system.
So far, we have not considered the item of greatest importance to many projects – the optimization of business processes. There is a subtlety here that is rarely recognized – optimization occurs in the work flow and query part of the application. Risk control is centered in the update transactions. If we ensure that we have the update business rules correctly specified, then the work flow and query processes can be optimized with much less concern. However, if the update business rules are incomplete or in error, then work flow and query related rules won’t matter because the new system will be DOA – dead on arrival.
Step 2 – IT Risk Tolerance
Summary: step 2 analyzes the willingness of the organization to tolerate and accept risk, and challenges the findings by asking what management would be willing to pay to minimize risk.
The second step of our analysis addresses IT risk, including both financial issues and business culture issues, at a more detailed level than the preliminary analysis. Cost and business value are clearly financial issues, but risk can be more of a business cultural issue than a financial issue. We focus on how IT addresses the nuts and bolts of getting the job done.
Discussions of risk tend to make technology people uncomfortable. Technologists deal with finite state machines, so there should be no place for probabilistic assessments of risk. Everything we do as technology professionals is underlain by the assumption that these systems we build and maintain are fully deterministic.
However, the findings of complexity theory tell a different story. Modern software systems are fiendishly complex. Once we get beyond the level at which we can simultaneously hold all elements of an application design in our minds, we must begin to deal with probabilities, so we talk about the probability of an error occurring. Complex deterministic systems acquire inferential and probabilistic characteristics in practice, if arguably not in theory.
Consider a fundamental error in the way that we control IT risk – testing. We always ask, “is it tested?” We never ask the correct question, “how much was it tested?” The reality is that non-trivial IT systems are never 100% tested, because such testing (including edge cases) is not affordable. (Our dynamic business rule extraction process focuses on 100% branch and path coverage, not on 100% coverage of all permutations and edge cases, which would indeed be unaffordable.) Programs may be 99% tested, or 90%, or 50%, or even less, and the higher the complexity the lower the percentage of testing is likely to be – simply because of cost. So, we have risk in our systems every day, risk that we largely ignore, just as we ignore the highway risk as we plan our commutes to work. We infer that the residual risk is trivial, and act accordingly whether or not that is factually true.
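Why exhaustive testing is unaffordable is easy to demonstrate: the number of distinct execution paths grows exponentially with the number of independent decision points. A toy calculation (figures purely illustrative):

```python
# Each independent binary decision roughly doubles the number of execution
# paths through a program. Even at one test case per second, exhaustively
# exercising a modest program quickly becomes unaffordable.
for decisions in (10, 30, 50):
    paths = 2 ** decisions
    years = paths / (60 * 60 * 24 * 365)   # at one test per second
    print(f"{decisions} decisions -> {paths:,} paths (~{years:,.1f} years to test)")
```

Fifty independent decisions, a small number for any real program, already imply millions of years of testing at one case per second, which is why coverage targets are stated in branches and paths rather than permutations.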
And it is rare for senior management to be fully apprised of this key fact, and take proper precautions for the business. In 1996, an Ariane 5 rocket was destroyed because of a software error that had not been found in testing, at a cost of $400 million. To save money, it had been decided to forego the $100 million premium for launch insurance on that rocket. It’s a safe bet that the executive who decided to do without the insurance did not know that the software testing was less than 100%.
Since all systems contain latent risks of incorrect functioning or outright failure, we need to ascertain the level at which people are willing to tolerate risk in their systems. Obviously, if the consequence of an error is that you lose a $20 sale, that’s very different from losing a $400 million spacecraft or a $50 million aircraft with 100 people on board.
As part of the second step of our analysis, our interviews with senior management can generally provide a good grasp on sensitivity to cost issues (available resources) and on the business value issues (quality of service and business agility goals), but discussions of risk can expose communication problems. If you simply ask someone what is their tolerance for risk, you will usually get an unhelpful answer: “none.” This is patently false, of course. Did they let their children out of their sight? Did they drive to work this morning? Are their systems set up for real-time redundancies so that no single event can take them offline?
The reality that must be confronted head-on in our analysis is that not only are risk and cost in mutual opposition, but that risk, cost, and business value (quality of IT service with consequent business agility) are all intertwined.
It’s the old consultant’s analysis in a new form:
- minimum risk,
- minimum cost,
- maximum business value (quality+agility).
Pick any two.
And why is this? We can speak of complexity all we want, but that is an abstraction. The practical explanation is that, for a replacement strategy, we only think that we can express the business rules succinctly enough for an actual software implementation. It is not unusual to think that 500 business rules will define a system, only to discover well into the project that 500 has become 800, or even 2,000.
The project risk derives directly from a human inability to fully specify all implementation level detail in advance of a project starting with a white sheet of paper. It is simply too complex a task for the human mind when dealing with modernizing systems of moderate to large scale. People can tell you how they want the new system to be different, but they can’t tell you 100% completely how they want it to be. It was from this observation that our dynamic business rule extraction methodology was derived so that we could get to 100% of the business rules without an unreasonable expectation on the human subject matter experts.
Tool assisted tactical modernization projects using an automated re-architecting or renovation strategy have lower risk profiles, but these are not zero risk either, due to the nature of software analysis and transformation tools. We do consider re-architecting or renovation project designs, in the absence of dynamic BRE, to be lower risk and therefore constituting a more conservative technical strategy. Even though there are different optimization scenarios, there is no escaping the relationships among risk, cost and business value.
As business value (quality and agility) requirements rise, so do costs, so there is a temptation to skimp on risk mitigation, e.g., testing, or to believe vendor assertions about a “silver bullet” solution. Conversely, as resources are utilized to control risks, business value will decrease, because resources are diverted from improvements to quality and agility. This can be seen in terms of features implemented, the robustness and flexibility of the implementation, the ease of use of the system, and other attributes. Similarly, there may be compromises to save money in the short term that increase costs in the long term.
One common fallacy is that project risk can be contained by a fixed-price bid from an outsourced vendor. However, only financial risk can ever be constrained in this manner, not opportunity costs, and even the ability to constrain financial risk is not absolute. We’ll return to this point below as we discuss vendor risk.
The only way to quantitatively evaluate risk tolerance in a meaningful way is to ask, “how much will you pay to minimize risk on this project?” In other words, how much will you pay as an insurance premium and how much will you self-insure (i.e., how big a policy and with what deductible)? You have to put it in dollars and cents. Do they have armed guards for their children? Are they willing to pay for a giant SUV and the corresponding gasoline costs in order to minimize their commuting risk? Are they willing to pay for geographically dispersed data centers in order to ensure constant up-time?

This is an exceedingly difficult question to put to management, and frequently the question will be passed off to IT in the form of, “how much testing should we do?” But this is a business question masquerading as a technical question.
The relationship between risk mitigation achieved (low residual undetected defects) and cost of doing so forms a diminishing returns curve. What point on this curve can you pick that is clearly superior to another?
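One common way to picture that diminishing-returns curve is an exponential-decay model of residual defects against testing spend. The model below is purely illustrative (the function, the initial defect count, and the decay constant are hypothetical assumptions, not empirical data), but it shows why each additional unit of testing budget finds fewer defects than the last:

```python
import math

def residual_defects(spend, initial=1000.0, k=0.5):
    """Hypothetical model: defects remaining after `spend` units of testing."""
    return initial * math.exp(-k * spend)

# Each successive unit of spend finds fewer defects than the one before.
prev = residual_defects(0)
for spend in range(1, 6):
    now = residual_defects(spend)
    print(f"spend {spend}: {now:6.1f} defects remain, {prev - now:6.1f} found this unit")
    prev = now
```

Because no point on such a curve is clearly superior to its neighbors on technical grounds alone, where to stop is a business decision about acceptable residual risk, not a technical one.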
All too often IT will attempt to provide an answer to this question. But we are interested in the basis for this answer rather than the answer itself. The proper response is, “we’ll never find all the bugs, we just work diligently to find the ones that are likely to bite us. We’ll test as much as you want, and after that we just have to assume the operational risk that something we missed will be revealed in production.” There is no technical answer to this question, and management is ill served by any attempt at a technical answer to what is truly a business question.
What IT can do and do well is to optimize the results of testing for a given testing budget, just as IT maximizes business value for the budget available. There are a variety of strategies, from minimum risk project designs to test automation, comparison testing, and SME testing, as we will discuss below.
As we ask this question, we accompany it with this explanation to ensure that it is considered in the proper context. We point out that there are residual undetected faults in the software running today, and that maintenance programming can inadvertently trip one or more of these faults, or create new ones. This is not a reason to adopt the “do nothing” strategy we will discuss in a moment.
Once we have established a meaningful measure of risk tolerance for each application under consideration, we apply that measurement to the boundary line between tending toward a replacement strategy and tending toward a re-architecting/renovation strategy or a hybrid incremental modernization strategy. The lower the risk tolerance, the higher the boundary, and the more we are biased toward a conservative technical strategy, i.e., renovation or re-architecting. Conversely, the higher the risk tolerance, the lower the boundary, and the more we are biased toward a complete replacement strategy. Cost sensitivity has a similar effect: high sensitivity to cost biases toward renovation or re-architecting, while a preference for business value can bias toward replacement (either COTS or re-design/rewrite). When business optimization opportunities are driving the modernization, high business value can bias the analysis toward an incremental modernization, which is conservative but still yields a rewritten application; the alternative go-for-it rewrite would then argue for a test-driven modernization strategy to control risks while accelerating delivery over incremental strategies.
So, as we conclude step 2, we recognize the relative importance of the risk, cost and business value issues for each unique organization, and use this to guide the subsequent analysis.
Step 3 – Business Case Analysis
Summary: step 3 is understanding the business case, the why of pursuing application modernization, and of identifying the expected benefits to the business.
Is this trip really necessary? Is “do nothing” an option? Despite protestations to the contrary, “do nothing” is always an option in modernization analyses, because the systems in question are functioning in daily production. Indeed, a not infrequent occurrence of an issued RFP for modernization is a decision to reject all proposals, resulting in “do nothing” being the “winning” strategy. Therefore, the business case analysis must objectively analyze the pros and cons of continuing as is, because the costs of transitioning to the proposed new system could exceed return on investment requirements. As part of this, we ask about the direct costs of maintenance and indirect costs, such as business opportunities missed or current income endangered.
Most important of all, what payback period is required to fund the project? Typically, we find that tactical projects are required to have a 24 month or shorter financial payback period, 36 months at most, for internally funded projects. Then, once we know the period of analysis, we can ask, “what is the payoff for the business if we succeed?” Although we will discuss cost reductions below, it is almost never true that a modernization project can be justified by the savings in hardware and software within a 24-36 month period. The most professional of the hardware and software vendors have marketing analysts who constantly update their pricing to ensure that they never push their customers to the point where they have a big payback for leaving their existing environment. This is a modern variant on Jean-Baptiste Colbert’s 17th century adage,
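The payback screen itself is simple arithmetic; a minimal sketch with hypothetical figures:

```python
# Hypothetical payback calculation: the figures are illustrative,
# not from any actual engagement.
def payback_months(project_cost, monthly_benefit):
    """Months of benefit needed to recover the project cost."""
    if monthly_benefit <= 0:
        return float("inf")  # the project never pays back
    return project_cost / monthly_benefit

# A $3M tactical project yielding $100K/month in quantified benefits:
months = payback_months(3_000_000, 100_000)
assert months == 30  # inside the 24-36 month window, so marginal
```

A result beyond 36 months does not kill the project; it signals that internal funding is unlikely and that external financing or a strategic justification must carry the case.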
“the art of taxation consists in so plucking the goose as to obtain the largest amount of feathers with the least possible amount of hissing.”
The business case justification for application modernization will be found (or not found) in what it means for the business – the additional business income opportunities enabled or the operational cost savings outside of IT resulting from business process improvements. Going into detail on this topic is beyond the scope of this essay, but reductions in business operational costs on the order of 50% are not unusual for Business Process Management (BPM) implementations when applied to appropriate problems. Applying business architecture principles to optimize business processes can provide long term competitive advantages to the overall enterprise that are compelling. See, for example, the Business Architecture Guild.
On the other hand, if optimization is the goal, then we should be focusing on the business case for strategic modernization. The payback period is likely to be far longer for a strategic project; where that is a problem, a hybrid strategy may be called for in order to liberate value progressively rather than all at once.
However, a proper business case analysis will also ask, “what’s the downside for the business if we fail?” And let’s remember that a significant percentage of such projects fail outright, while many of those that do deliver experience cost/delivery overruns and functionality shortfalls. The assessed risk times the estimated costs of a failure must be added to the liability side of the analysis. Similarly, an estimated probability of cost overruns (including the financial impact of delivery overruns and the financial impact of functionality shortfalls) should be derived and compared to the failure impact. Or, the costs of insuring against it happening can be added instead.
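A minimal sketch of that expected-value arithmetic, with hypothetical probabilities and costs:

```python
# Illustrative risk provision: all probabilities and costs are hypothetical.
# Probabilities are given in whole percent so the arithmetic stays exact.
def risk_provision(p_failure_pct, cost_of_failure, p_overrun_pct, cost_of_overrun):
    """Expected-value liability to add to the business case."""
    return (p_failure_pct * cost_of_failure
            + p_overrun_pct * cost_of_overrun) // 100

# A 15% chance of outright failure costing $20M, plus a 40% chance of a
# $5M combined cost/schedule/functionality overrun:
provision = risk_provision(15, 20_000_000, 40, 5_000_000)
assert provision == 5_000_000  # $5M belongs on the liability side
```

The point is not precision in the inputs, which are necessarily estimates, but that the provision appears in the analysis at all.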
This risk provision is too frequently overlooked in IT business case analyses, usually for reasons of excess optimism but sometimes because of a business culture that looks on risk provisions as being unduly negative. Team players should not be negative, so there can be a perceived career risk from looking too closely at project risks.
However, we argue that this risk provision is important not only for the financial accuracy of the analysis, but also because it shows the importance of a proper risk assessment of the chosen technical and risk mitigation strategies. When projects do go off the rails, it can be argued that a failure of analysis at this point was the beginning of the problems that eventually led to the unpleasant result. Conversely, a proper analysis can point to risk mitigation strategies that can immunize a project against negative results. Prudence should not be interpreted as being negative, but as ensuring success.
Our Test Driven Modernization methodology will insure against project failure and unexpected cost/delivery overruns, but it comes at a cost. Very high quality testing, as we discussed above, increases apparent costs (though it will also reduce unquantified and unpredictable future costs in fixing residual defects at the end of the software development life cycle – it’s “pay me now or pay me more later”). This cost may be seen as wasteful by overly confident business analysts and technical staff, but it also needs to be quantified as much as possible. For one data point, a project modernizing 2.3 million lines of IMS COBOL into Java and Oracle took 6 programmers 18 months to construct the complete set of test cases needed to perform all the dynamic business rule extraction. This was a small part of a $100 million project, but it was an expense nonetheless.
At the conclusion of step 3, we will understand the business case for and against legacy modernization. If “do nothing” is the clear winner, our analysis may shut down at this point, but more typically the analysis will amplify the results of step 2. However, if the results of the step 3 analysis contradict step 2 findings, we resolve the contradictions before moving on.
Step 4 – IT Cost Savings
Summary: step 4 is determining quantitatively what IT cost savings are likely to accrue from a successful project. This is performed for a strategic as well as a tactical project since both produce IT savings, though for strategic the greater benefit is usually to the wider enterprise.
Part of the business case analysis is realistically assessing the extent of IT cost savings resulting from modernizing the technology. However, this step is usually relevant only if the current platform is a mainframe.
Although we argue that IT cost savings can almost never justify a legacy modernization project on their own, at least within a typical 24-36 month payback period, there are significant savings that can be achieved and these need to be considered. On the other hand, if a project is going to be financed, say, over a 7 or 10 year period, then it could well show a significant ROI on this factor alone, using external rather than internal funding.
Industry average costs for mainframes run on the order of $1,000 to $5,000 per MIPS per year, including the cost of the hardware, maintenance, and software charges. Even using the lower figure of $1,000/MIPS/year, mainframe capacity can cost more than 100 times as much as the equivalent capacity on a virtualized, distributed server in a public or private cloud. This disparity is so large that many IT executives with long-term mainframe experience cannot accept it until they duplicate the results themselves.
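A back-of-envelope comparison using the essay's low-end figure makes the scale of the disparity concrete (the distributed-side cost and the workload size are assumed illustrations, not quoted benchmarks):

```python
# Illustrative platform cost comparison. Only the $1,000/MIPS/year figure
# comes from the text; the rest are assumptions for the sake of arithmetic.
mainframe_cost_per_mips_year = 1_000   # low end of the $1,000-$5,000 range
distributed_equiv_per_year = 10        # assumed: ~1/100th of the mainframe

mips = 2_000  # a mid-sized mainframe workload (assumed)
mainframe_annual = mips * mainframe_cost_per_mips_year     # $2,000,000
distributed_annual = mips * distributed_equiv_per_year     # $20,000

assert mainframe_annual // distributed_annual == 100
```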
Of course, as a technical matter, comparing otherwise identical applications running on platforms with dissimilar architectures, infrastructures, and implementations must necessarily show some significant performance variance. And as soon as one raises a discussion about these relative benchmarks, someone jumps to their feet and talks about the problems associated with cross-platform benchmarks.
These critical remarks about benchmarks are generally both absolutely true and totally irrelevant. Typically cross-platform variance is on the order of +/- 50%, and pathological examples can show variances on the order of a factor of two or three, as you move an application from platform to platform. But when we compare platforms in which the cost difference is a factor of 10 or 100, technical factors are simply irrelevant. Only business factors matter: cost, stability and security.
Stability and security are issues that must be satisfied in order to even consider capturing the cost benefits moving from a mainframe to RISC or Intel platforms. It is absolutely true that proprietary platforms and mainframe platforms in general are as rock solid and secure as humanly possible. But the most relevant question is not “What is the best?” but rather “What is needed?” and “What can be afforded?” Today, Windows and Linux on Intel, properly managed, can deliver stability that exceeds the business requirements of most applications.
On the other hand, some applications can absolutely justify the high cost of mainframe technology. We performed a system design for a new application for a major US metropolitan police department in which the use of mainframe technology made sense, and we recommended an IBM mainframe running both mainframe Linux and z/OS, splitting the workload between the two environments. For another example, we have reviewed high-transaction-rate financial applications that have had a downtime cost in excess of $5 million per hour. From this point of view, $20 million on a mainframe is not a difficult decision to make. There are also cases where the central administration advantages of a mainframe outweigh the platform cost disadvantages. And, to be sure, IBM continues to improve the price/performance of its mainframes.
However, performing a benchmark or a complex return on investment calculation misses the point. The best way to find out what hardware and software platforms are really needed is to ask the question, “if you were implementing a replacement application today, and if you had your choice of any platform, would you or would you not choose a mainframe?”
Far too often we find the justification for a mainframe to boil down to the fact that it is currently running on a mainframe, a circular argument that we find wanting. But where there is a solid justification for a mainframe, we are very comfortable supporting that recommendation.
The conclusion of our step 4 analysis is compared to the business case analysis to see whether this factor increases the bias one way or the other, or whether it proves the platform question to be irrelevant.
Step 5 – Schedule Goals
Summary: step 5 addresses whether schedule goals are realistic with respect to the preferred project strategy.
When would you like to have the replacement system? When must you have it? What are the consequences if the delivery is late? What is your Plan B? Step 5, like step 4, is a check on the business case analysis.
Clearly, an aggressive schedule is going to have a negative impact on the cost, quality and risk parameters. It is also easy to lose sight of the fact that, although one woman can make a baby in 9 months, it is not possible to hire 9 women and induce them to produce a baby in one month.
At the conclusion of the schedule analysis, a finding of an aggressive schedule sharply biases the analysis toward a conservative technical strategy, in some cases overriding all other considerations. The less you have to change, the less likely you are to break something in the process, and the sooner you will be up and running.
Step 6 – Resource Analysis
Summary: step 6 addresses whether the financial and human resources available are comparable to the scale of the undertaking.
What resources do we have available? How elastic are those resources? Step 6 serves as a check on the step 2 analysis, but also allows us to establish as early in the assessment as possible whether or not an organization has a realistic match between goals and resources. Too frequently we find a case of champagne tastes on a beer budget, if not a lemonade budget.
But resources include more than just money, though money is very, very important. Almost as important is management commitment and the ready availability of subject matter experts (SMEs). One of the best encouragements towards a successful project is the dedication of key SMEs, without interruptions. On the other hand, interrupted availability of SMEs, particularly interruptions of unpredictable timing and duration, is one of the best ways to ensure late delivery, cost overruns, and quality shortfalls. Promising availability of SMEs and then not providing them consistently is also a good way to end up having arguments with your vendor over the project.
The step 6 analysis goes beyond just checking on the anticipated budgets. In step 6, we begin to construct straw man project plans for both strategic and tactical projects, at least to the level of detail whereby we can derive a preliminary budget. Even if the goal is a strategic project, a preliminary look at projected costs against alternatives is a useful reality check, if we don’t waste a lot of time on them. If it is reasonable to proceed, a risk and benefits estimate will also be prepared.
Step 7 – Other Issues
Summary: step 7 addresses many detailed technical and operational issues that may or may not affect the overall conclusions of the assessment. We look at these residual issues before we focus on the differences between tactical and strategic projects.
Testing is frequently cited as the largest single expense in any IT project, though that may be true only when testing is done to very high standards. The ugly truth that we faced up to in step 2 was that there have always been residual undetected faults in non-trivial systems, so we ask that the testing budget be based on business criteria. Scientifically thorough testing (all logical permutations plus edge cases) is not affordable in virtually all commercial IT projects. Indeed, even the measurement of testing thoroughness, known as test code coverage analysis, is rare in commercial testing even at the basic level of branch and path coverage (each true/false decision in each program is exercised at least once).
Validation Versus Comparison Testing
There are two general approaches to testing: determine if a program is operating correctly (“validation testing”), or determine if a program is operating the same as another program (“regression testing” or “comparison testing”), presumably one that it is replacing.
Though it may not be obvious at first glance, it is both more accurate and significantly less expensive to adopt a comparison strategy. However, comparison testing – when it is used at all – is primarily applied to tactical modernization projects. Our Test Driven Modernization methodology extends comparison testing to strategic modernization projects as well, but this is a recent innovation and so is not commonly done.
For validation testing, the tester must determine what constitutes valid functioning and what does not. That set of criteria must be documented, and data found or created that will appropriately exercise the code. Then the test must be performed, multiple times if problems are found. Validation testing must prove the program to be correct, and determining what “correct” means can take a very long time, in direct proportion to the complexity of the program code. It is not the execution of the test that is expensive, but the creation of the test case to execute.
By contrast, with comparison testing, we only have to execute the old program against equivalent data, and compare its results to the new program. Properly speaking, the tester should be using test code coverage analysis to ensure that the data being used tests the program thoroughly enough, but the creation of the test case is much simpler (though “simpler” does not imply “simple”). The tester does not have to learn the program and what it is supposed to be doing, only how to run it. Proving that it is the same as another program is therefore significantly easier than proving it is correct.
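The mechanics behind that cost advantage are simple: run both implementations against the same inputs and diff the outputs. A minimal sketch, where `legacy_program` and `new_program` are hypothetical stand-ins for the two implementations:

```python
def legacy_program(order):
    # Stand-in for the legacy implementation (hypothetical calculation).
    return round(order["qty"] * order["price"] * 1.05, 2)

def new_program(order):
    # Stand-in for the modernized implementation (hypothetical calculation).
    return round(order["qty"] * order["price"] * 1.05, 2)

def comparison_test(cases):
    """Run both implementations on the same inputs; report any mismatches."""
    mismatches = []
    for case in cases:
        old, new = legacy_program(case), new_program(case)
        if old != new:
            mismatches.append((case, old, new))
    return mismatches

cases = [{"qty": 3, "price": 19.99}, {"qty": 10, "price": 0.99}]
assert comparison_test(cases) == []  # proves equivalence, not "correctness"
```

Note that the tester never had to decide what a correct result looks like; the legacy system is the oracle.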
Test code coverage analysis is a white-box testing technique whereby the execution of a program is followed internally and reports of what did – and what didn’t – execute are prepared. Traditional coverage analysis offers both cumulative statistics over multiple test cases as well as coverage of the most recent test or set of tests. It is analogous to interactive debugging, except that the execution trace is recorded.
In the Java environment, coverage analysis is most commonly applied to JUnit test suites, using coverage tools such as JaCoCo or Cobertura. A test or set of tests is executed, and if a pre-determined threshold is reached, typically 80% or 90% coverage, then the code under test is considered to be adequately tested. Few executives, even those with technical backgrounds, realize that code is almost never 100% tested.
There is a subtlety here that is again almost always missed. People cite JUnit tests as constituting sufficient coverage analysis testing. But coverage analysis of the new application code will only expose errors in the new implementation. Comparison testing against the legacy application is needed to expose errors in business rules and requirements, and coverage analysis on the legacy system is required to expose omissions in business rules and requirements. This is the only way to test for errors and omissions in the business rules and associated specifications.
A colleague once questioned our approach to testing. “Why do you want to go to all of this trouble? When something breaks, we’ll just fix it.” The problem with this criticism is that it fails to take into consideration the likelihood that an error will be seen and recognized as such.
Business rules can become very complex. A given transaction may not be recognized as giving an erroneous result for a very long time, years in some cases. When finally exposed, then there is the problem of finding all instances in the data that were affected by the defect plus the effort of fixing all that data.
Unfortunately, it is not just a matter of fixing the code when a defect is recognized – we also have to fix the data, which by that point may be impossible. Because of the defect, subsequent processing on the affected data will also be defective, even though that later code was correct. (This is the process by which defects propagate throughout a set of data.)
From a management point of view, it becomes a case of “pay me now or pay me 10 times later.” This issue is taken into account during the risk tolerance analysis step.
Modernization Versus Maintenance Testing
There is a further issue specific to application modernization projects that makes testing more expensive than maintenance testing. Since application modernization strategies impact the whole system, all aspects of the system must be tested, down to the tiniest detail. By contrast, during normal program maintenance, we only have to test the changes to a program or small group of programs.
The cost of such broad brush testing is a direct function of the complexity of the system. A renovation project may be significantly less expensive than a re-design/rewrite project, but the testing budget should be the same at the same level of risk. So, we could end up with a $5 million rewrite effort with a $5 million validation testing budget, which does not seem unreasonable at first glance, but a $500,000 renovation effort with a $5 million testing budget will set off financial alarm bells.
As a real world example, we analyzed a library of COBOL code for a prospective modernization project. Our code analysis revealed 106,000 independent logical pathways through the various programs, in 1.1 million lines of code, making this old legacy application very complex indeed. Devising tests to exercise and validate each of these 106,000 pathways is a significant technical effort, but one that we have proven in practice to be cost/effective. However, testing to exercise all permutations of those test cases and all edge cases in the data is simply impractical for anything short of a lunar landing – if then.
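For sizing purposes, the pathway count is essentially the sum of per-program cyclomatic complexities. A rough sketch of that accounting, with hypothetical program names and decision counts:

```python
# Rough path accounting of the kind used to size a test effort.
# Program names and decision counts are hypothetical.
def cyclomatic_complexity(decision_points):
    """McCabe's M = D + 1 for a single-entry, single-exit program."""
    return decision_points + 1

# Independent logical pathways to cover = sum of per-program complexities.
programs = {"BILLING01": 42, "BILLING02": 17, "RATING01": 63}
paths_to_test = sum(cyclomatic_complexity(d) for d in programs.values())
assert paths_to_test == 125  # 43 + 18 + 64
```

Scaled across thousands of programs, this is how a 1.1 million line library yields a six-figure pathway count, while the number of input permutations across those paths grows combinatorially and quickly becomes untestable.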
Test Case Construction Exposes Erroneous and Missing Business Rules
It is our opinion, based on the experience of the two projects discussed in the Business Rule Extraction essay, that the analysis that accompanies building the test cases satisfactorily exposes errors and omissions in business rules for strategic modernization projects. Errors and omissions in the business rules are the source of the business risk in the project. (This is not an issue in a tactical modernization using a renovation strategy, but it can affect those tactical projects using a re-architecting strategy.)
How is this so? Well, a business rule controls updates to the application’s data, and errors/omissions in those rules will cause the consequences of that defect to propagate unpredictably. Defects in the specifications for functional and especially non-functional requirements which do not invoke business rules will not cause any consequences other than the inconvenience to the user of that function.
Defects can propagate into and throughout the related data only when the data is changing. The defect propagates to related data when incorrectly processed data is further processed even when processed according to correct business rules. An error or omission in the business rules casts a long shadow over the data.
As a result of these experiences, we consider 100% branch and path coverage to be sufficient to ensure 100% complete and correct extraction of active business rules. Extending it to McCabe cyclomatic test coverage is more complete from a testing perspective, but will probably not expose any additional business rules. However, using cyclomatic test coverage will not usually be significantly more expensive than branch and path coverage, so from a cost/effectiveness point of view either could be used.
Note that we do not require that edge cases in the data be a part of the dynamic business rule extraction process. This is because the essence of modernization is to prove that the new application is executing the same business rules as the legacy application. They do not have to be correct in the validation sense – they only have to be the same. We assert that this definition is sufficient because the existing set of rules is running the business, even if it would fail the edge cases. If there are undetected defects in the existing rules, the rules are nevertheless correct by definition because the business is operating successfully.
This does not exclude the possibility of some implementation defects – our assertion only applies to the business rules. For example, we saw one case where the new implementation intermittently produced a 1 pence error in Sterling currency calculations. Eventually it was traced to internal differences in rounding calculations in the respective compilers. The business rule was right – the implementation was wrong. Comparison testing caught it with no problem, though, because we directly compared the new results against the old.
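That class of failure is easy to reproduce. A minimal sketch using Python's decimal module (the amounts are illustrative; the original defect involved compiler-internal rounding, but the mechanism is the same: two rounding modes, a 1 pence discrepancy):

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

# Two platforms rounding the same mid-point amount under different modes.
amount = Decimal("2.345")  # pounds; the mid-point digit triggers the split

half_up = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
half_even = amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

assert half_up == Decimal("2.35")
assert half_even == Decimal("2.34")
assert half_up - half_even == Decimal("0.01")  # the "1 pence" error
```

Validation testing would likely pass both results as plausible; only a direct old-versus-new comparison surfaces the discrepancy reliably.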
The above discussion on validation versus comparison testing presumes that testing will be outsourced or else conducted by staff without an in-depth knowledge of the system. When you are hired to test a system that you don’t understand, you must test everything because you don’t know what can safely not be tested.
Subject Matter Expert Testing
When significant system knowledge is available to be leveraged in testing, subject matter expert (SME) testing can prove to be less expensive still than comparison testing, provided that SMEs are consistently available to the project. However, this only works for projects with low risk profiles where the potential cost of an undetected defect in production is low.
This is not because SMEs are inherently more efficient than other testers, but because they can triage the test cases that do not need to be tested thoroughly far more effectively than someone without intimate knowledge of the system. SME testing is therefore less thorough than outsourced testing, which must test everything; but if the SMEs really know the system well, the business risk reduction should be optimal for the available testing budget, given an appropriate risk profile. Again, the thoroughness of testing is tied back to the risk tolerance analysis.
Performance Bottlenecks
Many projects make the unfortunate assumption that the new architectures will give them all of the speed that they need. In many ways this is true, but there are bottlenecks.
For example, the most common bottleneck is batch processing. Batch programs were built on the implicit assumption that the program and database would be operating on the same platform, with tiny latencies. However, when a tactical modernization re-hosts a batch program onto a distributed architecture where the programs and the database sit on different platforms, the impact of network latencies must be considered. If fabric connected, it may be acceptable. But a 10 microsecond Ethernet latency multiplied by 1 billion I/O requests adds up to almost 3 hours. There are solutions for this, but they have to be considered for their impact.
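The latency arithmetic in that example is worth making explicit:

```python
# Network latency added to a batch run, using the figures from the text.
ethernet_latency_ns = 10_000     # 10 microseconds per I/O round trip, in ns
io_requests = 1_000_000_000      # 1 billion I/O requests in the batch run

added_seconds = ethernet_latency_ns * io_requests / 1_000_000_000
added_hours = added_seconds / 3600

assert added_seconds == 10_000
assert round(added_hours, 2) == 2.78   # "almost 3 hours" of pure latency
```

None of this time does any work; it is pure waiting, added to whatever the batch window already required.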
Another concern is very high performance systems that will require transaction rates at the limits of their technology, such as message rates in the hundreds or thousands per second or database rates in the range of hundreds of thousands of database calls per second. These require a careful look at the planned hardware and software architecture.
A related performance concern can sometimes be found in the data architecture of the legacy database or in the planned data model in the target database. Sometimes, legacy designs impose an I/O bottleneck that causes throughput to scale up to a limit that no amount of hardware can overcome. Similarly, excessive re-normalization of the database into a theoretically ideal data model can create a performance bottleneck where none existed on the legacy system.
Deployment Into Production
Plans for deployment into production need to be taken into account at the assessment phase, not left for later consideration. If a “big bang” deployment is acceptable and sufficient time permits, then there may be no problem. If it is going to take several weeks to make the cutover, and you cannot shut down your business for that period of time, then the project design must take this into consideration in the planning stages.
Related to deployment is the ability to reverse the deployment. The Maine Medicaid disaster is a salutary example: neither the vendor nor the IT staff recognized the early warning signs as signaling an impending disaster, and once they switched on the new system, they had no way to go back. Prudence requires that the ability to reverse a deployment be baked into the process.
Closely related to deployment is the issue of code maintenance during the project. With a “big bang” deployment, changes on the old platform to functions that have been completed on the new platform must be made to both sets of code and tested on both. If a data synchronization strategy is selected, the dual maintenance may not be necessary. (See discussion below).
Data migration is an issue which needs to be addressed from the first day in any modernization project. The physical migration of the data including validation can become highly complex and is frequently underestimated in project planning (along with data cleansing which we address in the next section).
Time is the most critical element in data migration. It can take days or even weeks to unload a legacy database, transfer the data to the new platform, and load into the new database. It is a rare system that can be shut down for such a length of time, so that provision must be made for parallel operations while the bulk load takes place and then for the incremental updates to be migrated. It makes a great deal of difference whether the legacy system can be shut down temporarily during the switchover and for how long. If measured in seconds or milliseconds, the switchover can become very complex indeed. Volumes, time criticality, the impact of data model renormalization, and related issues are examined and findings reported along with recommendations.
Synchronization is a complex issue closely related to data migration. First, the databases must be synchronized at the moment of switchover, which is figured into the time criticality findings above. This is mandatory for switching into production.
Second, however, synchronization affects the deployment planning for the modernized application quite fundamentally. If we synchronize once at the moment of switchover, then the project must plan for a “big bang” switchover into production which creates operational risks in case of unanticipated problems.
However, if we maintain the data synchronization between the legacy and new databases for an extended period of time, then we can plan for an incremental deployment and eliminate the “big bang” risk entirely. It means that the legacy and the new system can operate in a production parallel mode for an extended period of time which can run from days to years. In case of an identified defect that affects operations, you can immediately fall back to the legacy system at any time required, for selected transactions or all transactions.
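Operationally, extended synchronization reduces to per-transaction routing with an instant fallback switch. A minimal sketch, with hypothetical transaction types and handlers:

```python
# Sketch of per-transaction routing during an extended production-parallel
# period. Transaction names, handlers, and the cutover lists are hypothetical.
def legacy_handler(txn):
    return {"system": "legacy", "txn": txn}

def modern_handler(txn):
    return {"system": "modern", "txn": txn}

cut_over = {"BALANCE_INQUIRY", "ADDRESS_CHANGE"}  # proven on the new system
fallen_back = {"ADDRESS_CHANGE"}                  # defect found; reverted

def route(txn_type, payload):
    """Send a transaction to the new system only if it has been cut over
    and has not been fallen back; otherwise use the synchronized legacy."""
    if txn_type in cut_over and txn_type not in fallen_back:
        return modern_handler(payload)
    return legacy_handler(payload)

assert route("BALANCE_INQUIRY", {})["system"] == "modern"
assert route("ADDRESS_CHANGE", {})["system"] == "legacy"  # instant fallback
assert route("PAYMENT", {})["system"] == "legacy"         # not yet cut over
```

Because both databases remain synchronized, moving a transaction type between the two sets is a configuration change rather than a migration event.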
Extended data synchronization also allows for an automated testing scenario which can reduce the cost of testing. Deployment risks are fully managed in an extended data synchronization scenario.
Extended data synchronization can eliminate the dual maintenance issue discussed above. Once functions have been designated production on the new platform and the legacy functions decommissioned, then subsequent maintenance can be provided solely to the new code. However, while operating in production parallel mode, both sets of code must be updated simultaneously.
Our assessment report will go into depth on these issues, and provide guidance in the recommendations for how best to proceed.
Data cleansing is a related issue in legacy modernization that must be considered in designing and costing a project. However, data cleansing generally has the same costs across all legacy modernization projects, so it will not affect the decision directly.
We usually recommend cleansing the data on the original platform, so that data issues do not create false positives in testing the modernized application. A full investigation of data quality issues is properly speaking a separate project, so that our methodology asks subject matter experts for their opinion on budgeting for data cleansing. This can be expanded if requested.
Rationalizing Variant Systems
Rationalizing, i.e., merging two or more similar application systems is one of the most difficult and risk prone modernization activities one can undertake. Yet it is a business necessity in many cases, particularly as a result of an acquisition.
Like data cleansing, an assessment of the requirements of a merger project is properly speaking a separate project. However, unlike data cleansing, the requirement for a merger can impact the selection of a modernization strategy. This can determine that the project must be a strategic modernization or at least a hybrid project.
Pulling requirements from two systems during re-design would appear straightforward if viewed naively. In practice, it suffers from the same problems of specificity as any re-design project, squared. Nevertheless, this may be a necessary approach in some cases, and the project assessment must consider it carefully in the typical scenario where some of the business rules overlap between the two systems.
We approach a merger project with a thorough business rule extraction on both systems. We prefer to modernize the riskier one first, apply comparison testing until it is proven equivalent, and then begin to merge in the updated and new rules from the second system. This is the lower risk approach.
If merging variant systems is a requirement, it will be included in the assessment under our methodology. The costing impact will be assessed based on the degree of functional overlap, which will be investigated by discussions with subject matter experts.
Source Code Risk Factor
When any legacy modernization strategy plans to utilize the legacy source code, it is necessary to ensure that all of the source code is present and, furthermore, that it is the correct version. If the sources are not tightly controlled with a change management system, verifying that the project has the correct versions can increase the costs of re-architecting and renovating the system, and of extracting a complete and correct set of business rules. On the other hand, simply assuming that the sources are correct and up to date creates a significant risk of propagating out of date business rules.
Vendor Risk Factor
If some or all of a project is going to be outsourced, particularly in the case of an application replacement strategy, then our analysis will consider vendor risk. We do not consider a fixed price bid approach to provide adequate protection for a client’s interests in all cases.
Vendor risk comes in a variety of forms. It includes competence issues, such as failure to execute or failure to execute correctly, and ethical issues, such as what some call “playing the change order game” to maximize their revenue.
An essential problem is the source of vendor risk. All IT projects have, with apologies to Donald Rumsfeld, both known unknowns and unknown unknowns. These factors are only partly predictable, and can have unforeseeable consequences. The client generally seeks to minimize risk by shifting it to the vendor. Conversely, the vendor seeks to minimize its own risk by shifting it to the client. Therein ensues a struggle that is sometimes the subject of upfront negotiations, but is often ignored during bidding and contracting only to reappear once work begins.
This struggle can leave the client open to the change order game, which can be subtle. A vendor will respond to an RFP giving stated specifications with a low-ball bid that will indeed implement the specifications as stated, but knowing all the while that the specifications are incomplete and/or incorrect to some extent. When the inevitable changes to the specifications occur, change orders come at inflated prices so that the vendor can make a profit on the project. Unfortunately, this is a strategy that works very well, so that the unscrupulous vendor will prosper while an honest vendor will lose out. This is one of the problems with a client relying on a fixed price bid in order to shift risk to the vendor. The vendor’s tactics will defeat the intent of a fixed price bid by manipulating the refinement of specifications, and the client may not get the most competent vendor.
Professional economists deal with problems like this through game theory. For example, the change order game can be defeated by cost plus a fixed fee bids, provided actual cost can be established objectively. Risk shifting is best dealt with through a formal recognition that risk should be assumed by one party, the other party, or shared.
There is another way in which fixed price bids are defeated, though this tactic carries some risks for the vendor. We have seen vendors change the rules of the game part way through a contract essentially by bullying the client: “Yes, it’s true that we are 50% over budget and it’s our fault, but unless you find the money to pay us anyway we are going to stop work and you’ll have to sue us.” Because the client needs the project completed ASAP, and because it would be personally and professionally embarrassing to have a project go to litigation on their watch, client management will sometimes capitulate and pay up. This is a difficult problem to handle, and it takes careful planning on the part of the client before letting the contract and starting work.
If this is a concern, we discuss our Test Driven Modernization methodology as an antidote to the change order game. If we develop a complete set of tests, we will in the process develop a complete and correct set of business rules to feed into the requirements as well as creating the actual tests. Crucially, the RFP needs to specify that the tests constitute the actual requirements and that English descriptions of the tests and business rules may contain errors of interpretation that the vendor must own. Make it clear that there will be no change orders until the tests have been met, and you thereby defeat the strategy of a low-ball bid followed by overpriced change orders.
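To make the idea concrete, here is a minimal sketch of a test that doubles as a requirement. The transaction, the 5% rate and the truncation rule are all invented for illustration; real tests would be captured from the legacy system’s observed behavior.

```python
# Hypothetical sketch of a test that doubles as a requirement under a
# test driven modernization contract. The transaction, rate and
# truncation rule are invented for illustration; real tests would be
# captured from the legacy system's observed behavior.

def legacy_observed_interest(balance_cents: int, days: int) -> int:
    """Stand-in for the recorded legacy result of an interest accrual."""
    # As recorded from the legacy system: 5% simple annual interest,
    # truncated to whole cents, a detail prose specifications often omit.
    return balance_cents * 5 * days // (100 * 365)

def test_interest_accrual_matches_legacy():
    # The executable comparison, not its English description, is the
    # requirement the vendor must satisfy.
    assert legacy_observed_interest(100_000, 30) == 410  # $1,000.00, 30 days

test_interest_accrual_matches_legacy()
```

Under this arrangement, a vendor cannot claim a change order for a behavior the tests already pin down; the executable comparisons, not the prose around them, are the contract.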
There is also an economic reality that needs to be understood. Small vendors have shallow pockets. If you choose to do business with a small vendor, you cannot treat them as if they were one of the giants of the consulting world. Typically, you will get much better value for money with a small vendor, properly managed, but to do so means assuming most of the project risk. If you choose not to accept any project risk, then you must be prepared to pay the prices charged by the major consulting firms and system integrators.
In general, we approach vendor risk in several ways. In all cases, the personality and methodology of the administrative project manager and of the technical project manager are key to controlling vendor risk. Frequent intrusive oversight observations of the daily work process by a knowledgeable project architect, both on vendor premises and client premises, can reveal problems while they are still manageable. Even weekly project status meetings will not necessarily reveal all problems.
We recommend specifically including project risk in any RFP, and establishing upfront who owns the risk. If the risk is going to be pushed onto the vendor, we recommend a cost plus fixed fee bid rather than a simple fixed price bid. If the client is going to assume the risk, we recommend either agile development methods with frequent deliveries of working code, at intervals no longer than one month, or a minimum risk project design such as discussed below. If a replacement strategy is taken but the project is too large for agile methodologies, or if the Enhanced and Wholly New portions of a project are similarly too large for agile approaches, then a unique approach needs to be crafted. Remember that all waterfall project specifications are incomplete and contain errors within the stated specifications; be vigilant, and be ready for inevitable problems.
Step 8 – COTS Evaluation
Summary: step 8 evaluates whether or not a COTS solution is likely to be a realistic alternative.
If we could license a COTS package and plug it in unmodified, it would be the best solution from a risk point of view. We could travel to other sites, see the exact software that we would be using in daily production, and learn from the experiences of those sites. Without trivializing the effort of converting legacy data into a COTS package or of configuring the package with a complete set of business rules – both of which carry significant risks – this is by far the safest solution, provided the package is used without modification to the actual programs, or any such modifications are made by the vendor and supported by the vendor as ongoing processing options.
Unfortunately, this is rarely the case. Usually it is a choice of modifying the package or changing the underlying business process to match the package. Although there may, arguably, be benefits to adopting a new business process, doing so is not without cost. Management may feel that it is less expensive and/or less risky to change the package than to change the business process.
So, we are back to analyzing the existing system, even though this analysis will be to determine the specifications governing the required modifications to allow adoption of the package (or packages) under consideration.
If the business process is not going to change, the only fully complete and correct specification of the rules that govern the business process is the legacy source code itself. However, we only have to apply the functionality analysis from step 1 to the COTS analysis. The infrastructure of the legacy system is irrelevant here.
We will still have Obsolete functionality that we can ignore. We have functionality that needs to be Preserved in moving to the COTS package, and we have functionality that needs to be both preserved and Enhanced during the move to the package. Finally, there is the Wholly New functionality that must be implemented if not already supported by the package.
Assuming that any modifications that are to be made will be outsourced to the COTS vendor, the COTS project must provide that vendor with the full list of specifications required to go into production with the new system. Unless the modifications are trivial, we recommend against attempting to modify a system with which you have no experience.
But, as we discuss at length in the Business Rules Extraction essay, doing this precisely and completely is the most difficult part of the project. It is equally so with implementing a COTS solution.
As a result, a COTS project can be subject to the same problems in delayed delivery, cost overruns, and outright failure as a re-design/rewrite project, in exact proportion to the degree of modifications required. The dual tombstone in the graphic provides the most important lessons learned from COTS projects in trouble. Step 8 concludes with a straw man estimate of the costs, risks and benefits of pursuing each COTS solution under consideration.
Steps 1-8 are common for both strategic and tactical projects, but steps 9 and 10 are dependent on the preliminary finding. We may assess only strategic modernization or only tactical modernization from this point, though we will do both if a hybrid project is a reasonable possibility.
Strategic Modernization Assessments
Summary: steps 9 and 10 look at the two key issues unique to a strategic modernization, deriving a complete and correct set of specifications (along with relevant underlying business rules), and plans for optimizing business processes.
Step 9S – Specifications For a Re-Design/Re-write
The cost of a replacement application of course varies considerably from one application to another, even when normalized on a per LOC basis. We have seen (and worked on) projects that cost well over $100/LOC and as little as $10/LOC. The biggest variance comes not from the cost of programming, but from the costs of design and testing, specifically obtaining and proving business rules and deriving and refining specifications.
Years ago, the cost of programming the limited machines of the day dominated the cost of writing a new application. Today the cost of programming has been eclipsed by the cost of writing complete and correct specifications which are then programmed fairly easily, relative to 30+ years ago. Indeed, getting the specifications right is the major source of the risk that projects will run over on cost and delivery, while failing to deliver all desired features. Our Test Driven Modernization methodology is focused on controlling these costs and risks.
Recall that the discussion on failure statistics referred only to major projects, 10,000 function points or greater for the Capers Jones study referenced on the Publications page. This can be roughly equated to 1,000,000 lines of COBOL or C, or 1,000 programs. The 10% of projects that deliver on time/on budget were all of this size or greater. Although the study does not specifically state it, we can be assured that almost all of the sample projects were modernization projects, for the simple reason that virtually all projects today are replacing existing applications.
Our recommendations for moderate and large scale projects can be very different from our recommendations for otherwise similar small scale projects. The reason is very simple: complexity. As projects increase in size linearly, their complexity increases exponentially. A system of 1,000,000 lines of code will not be 10 times as complex as a system of 100,000 lines of code, but more like 20-50 times. A system of 10,000,000 lines will be 400-2500 times as complex. This has direct bearing on the relative risk of different sized projects.
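The 20-50x and 400-2500x figures correspond to complexity growing as roughly size^1.3 to size^1.7. The following sketch shows the arithmetic; the exponents are assumptions fitted to those figures, not measured values.

```python
# Illustrative arithmetic only: relative complexity under an assumed
# power-law growth model, complexity ~ size**k for k between 1.3 and 1.7.
# The exponents are assumptions fitted to the 20-50x and 400-2500x
# ranges quoted in the text, not measurements.

def relative_complexity(size_ratio: float, exponent: float) -> float:
    """Complexity of the larger system relative to the smaller one."""
    return size_ratio ** exponent

for ratio in (10, 100):  # 1M vs 100K LOC, then 10M vs 100K LOC
    low = relative_complexity(ratio, 1.3)
    high = relative_complexity(ratio, 1.7)
    print(f"{ratio}x the code: {low:.0f}x to {high:.0f}x the complexity")
```

The practical point is that doubling the budget does not begin to cover a tenfold increase in code size; risk must be assessed against the complexity, not the line count.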
As a result, we are very careful in discussing a rewrite of a project on the order of 10,000,000 lines, because of the risk. By contrast, we were recently asked about automated translation of 50 programs written in an obscure language into a standard language, and our recommendation was to rewrite them (though not to re-design them!). This is such a small amount of code that our risk concerns were minimal for a rewrite.
Consider the example system cited above in Step 7 – Testing, with 106,000 logical pathways through the program code. A full re-design/rewrite effort was estimated at about $8.5 million by our client, a major system integrator, based on the published specifications. We estimated that the published specifications included perhaps half of the existing business rules, none of which were obsolete. $8.5 million was very, very optimistic as total cost for this project.
It is not hard to see how this project would unfold without providing for this complexity. The vendor winning the project would start by conducting JAD sessions to add details to the published specifications, and then would begin to implement those specifications. Once implemented, they could not be put into production, because of the insufficiently detailed logic that would be revealed in testing. So a cycle would begin: adding additional specifications, increasing the cost through change orders, then more testing, then adding more specifications and more re-work to the cost, and so on. A cycle like this could go on for years, and frequently does. The application owner failed to understand the complexity of his own application and to make suitable provision for it.
“At the beginning of any project, your specifications are 70% complete … and 50% correct.”
Jim’s solution to this problem is agile programming, a solution that we agree with, up to a point. Agile is a brilliant technical and project management strategy for those smaller scale, low to moderate complexity applications.
However, in a modernization context for moderate to large scale applications, the assumptions underlying agile break down. The primary assumption is that the users know what they want and can storyboard the resulting system, allowing the final design to emerge. The problem is that they can’t, because people can’t tell you what they don’t know. It is faster, cheaper and safer to apply a waterfall design to extract the business rules, and then apply agile to an iterative implementation. This agile modernization works, whereas pure agile will get stuck in a long tail of revealed problems, or go into production with undetected problems that will cost far more, over a long time, to discover and to fix in both the application and the data.
The conclusion of step 9S is a rough order of magnitude estimate of costs, risks and benefits for a re-design and rewrite project, with options for business rule extraction and for automated testing of the resulting system to insure against production problems – assuming no process optimization. In Step 10S, we consider process optimization, and then go on to consider tactical strategies.
Step 10S – Process Optimization Strategy
Are you using an organized methodology such as business architecture to identify opportunities for optimization? How do you envision the actual manual and automated processing changing? Do you anticipate changes in the business rules (as opposed to changes in the functional and non-functional specifications)?
These are crucial questions if automated testing and/or database synchronization are going to be a part of the project design. Both require that there be equivalent update transactions (or sets of transactions) on both the legacy and new platforms.
There can be a perceived conflict between process optimization and automated testing or database synchronization. However, process optimization should only be concerned with query processing which does not affect either automated testing or database synchronization. Our fundamental axiom of modernization is that business rules are invariant across a modernization transformation of an application. If true, then there is no conflict, but if false then automated testing and database synchronization are excluded, and the risk profile of the project will increase substantially. Whether or not there is a conflict, exploring these issues can reveal a great deal about risk factors for the project and lead to specific recommendations for project management.
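As a minimal sketch of how the invariance axiom can be checked in practice, the same stream of update transactions can be replayed against both systems and the resulting data states compared. The apply functions here are hypothetical stand-ins for the legacy and modernized transaction processors.

```python
# Minimal sketch of checking the axiom that business rules are invariant
# across the transformation: replay the same update transactions against
# both systems and compare the resulting data states. The apply
# functions are hypothetical stand-ins for the two processors.

def replay_and_compare(transactions, legacy_apply, new_apply,
                       legacy_state, new_state):
    """Return the transactions after which the two data states diverge."""
    divergent = []
    for txn in transactions:
        legacy_state = legacy_apply(legacy_state, txn)
        new_state = new_apply(new_state, txn)
        if legacy_state != new_state:
            divergent.append(txn)
    return divergent
```

Any transaction in the divergent list marks either a defect in the modernized system or a genuine change in business rules, which must be reconciled before automated testing or database synchronization can be trusted.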
For this assessment, we will work with optimization estimates from the project team and – if available – the business architects or enterprise architects that are driving the optimization. We will report on these findings, and estimate their impact if relevant.
Tactical Modernization Assessments
Step 9T – Renovation Assessment
Summary: steps 9 and 10 look at the two key issues unique to a tactical modernization: retaining the existing legacy sources in some form, or retaining the existing legacy functionality in new code derived from the existing legacy code.
Renovation projects are characterized by preserving the original source code in some form. They may or may not include:
- Retaining the online source code as is and wrapping the code in some form to publish the transactions as services.
- Re-hosting the source code to a new platform.
- Using parsing technology to translate the source code into some new form.
Parsers work according to a selection of rules defined at the outset of the project and then refined as the project proceeds using feedback from testing. We frequently get involved with re-hosting and code translation, but only rarely with transaction wrapping, because wrapping creates little value – though in some situations it can be a good idea.
Re-hosting (or re-platforming) projects are often conflated with code translation because the two frequently appear together in the same project. However, we have done code translations for systems that remained on the original mainframe, and we have re-hosted code that was not modified at all but relied on emulation products to operate essentially unchanged in the new operational environment. They are separate activities, but we consider one, the other or both as part of renovation because the original source code is being maintained in some form. This is distinct from re-architecting, discussed in the next section, in which the legacy source code is analyzed but the results of the analysis are used to create a new application with completely new source code that nevertheless reproduces the functionality of the original very closely.
Although renovation projects can take many forms, the following are the most common:
- Language translations, often from COBOL to modified COBOL, from some 4GL to COBOL, or from either of these to Java or C#. There are dozens of other translations that have been created and used at one time or another, but they tend to be obscure and rare.
- Database translations, from VSAM, another indexed file system, or a pre-relational database to a relational database management system.
- Online translations, such as from IMS/TM or third party transaction processing monitors to CICS, or from any of these to a non-transactional messaging system such as Websphere MQ. These are unusual but far from unknown.
- Exotic transformations, such as restructuring to untangle spaghetti code into a more manageable form.
There are many misconceptions about renovation technologies. The most common is that translation is an off the shelf process: you send off the code (or bring the translator into your site) and after processing it is returned to you to compile, test and deploy. In reality, even the most sophisticated code translators require tuning to the unique set of source code under consideration. We worked with one mainframe code translation system that had 128 separate options that could be invoked, most of which resulted from unique requirements in prior projects.
The net result is that code translation is always an iterative process. The out-of-the-box translator is configured for your code; the code is run through; the results are examined; the configuration parameters are updated (or the translator internals modified); and the code is run through again. This process continues iteratively until the code is judged satisfactory.
Note that the best code translators will reprocess all the code on every update. This is very desirable because as testing proceeds, not only are defects corrected, but opportunities to improve the code are found and implemented. Even code that has been translated and tested satisfactorily can benefit from these improvements as the project proceeds.
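The iterative cycle described above can be sketched as a loop. The translate, evaluate and tune callables are placeholders for a real translation tool, its test harness and its configuration interface, and are assumptions for illustration only.

```python
# Sketch of the iterative translation cycle: translate the full code
# base, evaluate the result, tune the configuration, and repeat. The
# translate, evaluate and tune callables are hypothetical placeholders
# for a real tool, its test harness and its configuration interface.

def iterative_translation(source, translate, evaluate, tune, max_rounds=10):
    """Reprocess ALL of the code after every configuration change."""
    config = {}
    for _ in range(max_rounds):
        translated = translate(source, config)  # full re-translation
        defects = evaluate(translated)          # compile and test feedback
        if not defects:
            return translated, config
        config = tune(config, defects)          # refine the translation rules
    raise RuntimeError("translation did not converge; review defects manually")
```

Reprocessing the entire code base on every round is what lets improvements discovered late in testing flow back into code that was already translated and tested earlier.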
Another common misconception – fostered by vendors – is that COBOL and other procedural languages can be translated successfully to Java or C#. Strictly speaking, that is correct – it will translate, and the result will run correctly in a J2EE or .NET environment. However, the results will exhibit less object orientation than a system written from scratch, and will be unfamiliar to Java/C# programmers who have never worked with COBOL programs. On the other hand, the Java/C# code will be familiar to the legacy programmers and will help them make the transition from COBOL, or whichever legacy language(s) they have been using, into Java or C#. In other words, you are getting what was described to you, but you may not be getting what you were expecting to receive.
Our assessment will consider renovation according to the degree of interest on the part of the client, and provide an objective assessment of the pros and cons of this approach.
Step 10T – Re-architecting Assessment
Whereas renovation preserves the original code in some recognizable form, re-architecting preserves the functionality of the original programs without preserving the code. There are a small number of full service re-architecting vendors, and a few more that specialize in certain languages and transformations, such as 2-tier client server to modern n-tier architectures.
Re-architecting does not have the “JOBOL” problem in which COBOL is translated to Java but still looks like the original COBOL. The re-architected programs will be properly object oriented and restructured into appropriately sized modules.
Re-architecting can be considered as a half-way house between a re-design/rewrite project and a code translation project. Alternatively, it could be considered as a tool assisted pure rewrite project. Both descriptions are true. And the cost is half-way as well.
Hybrid Modernization Assessments
Summary: a strategic modernization may be the ultimate goal, but if risk mitigation and/or resource availability rule out a project going directly to the end state then an incremental modernization strategy combining elements of both tactical and strategic modernization projects will be considered.
If neither tactical nor strategic modernization is clearly preferable, then we will conduct steps 9 and 10 for both tactical and strategic considerations, and then provide one or more alternative paths that could be considered that involve elements of both tactical and strategic modernizations. One example hybrid project would be to use an incremental modernization methodology to renovate your code onto your final target platform and database, then to incrementally redesign the data model and rewrite individual programs. Another approach is to extract the business rules, populate a business rule management system, and write Java or C# “glue” code to invoke the rules and drive the user interface. Many other hybrid projects are possible depending on the characteristics of the technical details.
We conducted one study on a 7.5 million line library of COBOL operating against a pre-relational database where the site planned to go to Java. A very serious consideration was that the volumes of data were high, the transaction rate was high, and we were generously given 15 minutes to convert the data and make the switchover. The only way this can work is in an incremental modernization where we first convert the data – leaving the programs on the mainframe linked to the database server – then re-host the COBOL, and then rewrite into Java. Attempting anything else will fail on deployment because of the performance and uptime requirements.
Risk reduction in a major project is based on the fundamental principle underlying agile development methodologies: many small deliveries of working code is less risky than a few big deliveries. This hybrid design took this principle a step further, by allowing the legacy and replacement systems to operate in parallel against the same data.
Extending this principle, if we have as many steps as possible, the corresponding overrun rates should be close to zero, and the failure rate will be zero – for the simple reason that we always have a working system under this design.
Here is a summary of the steps for a minimum risk application modernization project, designed for one vendor client, that would be applicable to many project requirements:
- Using renovation technical strategies, replace the non-relational database with an RDBMS.
- Using renovation technical strategies, move the application to the target hardware, database and operating system environment and place into production. (Note, these first two steps may be reversed in order or combined into one.)
- Using our Business Rule Extraction methodology, extract the rules from the legacy code and populate a rules engine or a componentized design with an appropriate technical framework to handle the user interface. In parallel, the Wholly New development begins using agile programming.
- Using comparison testing, prove the business rules extraction by direct comparison with the legacy transaction, side by side, retiring each legacy transaction as the new transaction is proven equivalent to the old and placed into production, one module at a time. This can be done in an automated or manual fashion (though automated is less expensive in the end).
- Prove the Wholly New components using conventional validation testing as they are ready to be added into the system.
- Retire the old system when the last transaction, report and batch program has been replaced with a proven new technology equivalent. Alternatively, run the databases synchronized and retire the old system gradually.
- Implement the Enhanced functionality in the designated system modules, using conventional maintenance programming and validation of the changes, once the legacy components have been decommissioned.
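The comparison testing step in the list above can be sketched as follows. The module and transaction names are hypothetical; a real harness would replay recorded production inputs through both implementations.

```python
# Hedged sketch of comparison testing: a legacy transaction is retired
# only after its replacement matches it on every test input. The module
# tuples and inputs are hypothetical illustrations; a real harness would
# replay recorded production transactions through both implementations.

def prove_equivalent(inputs, legacy_txn, new_txn):
    """True only if the new transaction matches legacy on every input."""
    return all(legacy_txn(i) == new_txn(i) for i in inputs)

def retire_candidates(modules, inputs):
    """Yield names of modules whose new implementation is proven equivalent."""
    for name, legacy_txn, new_txn in modules:
        if prove_equivalent(inputs, legacy_txn, new_txn):
            yield name
```

Only modules that match the legacy behavior on every input become candidates for retiring their legacy counterparts; the rest stay in the side-by-side phase until the discrepancies are resolved.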
As new specifications evolve and are added to the project for Enhanced and Wholly New modules, they are integrated into the project just like any maintenance change. In fact, the transition from software development mode to software maintenance mode will be gradual and a matter of definition.
A few technical notes:
- Use of the rules engine in this case provided both the ability for business analysts to modify processing rules in the future, and an easy path into new technology for the legacy programming staff. However, this hybrid project methodology could have just as easily created a C#/.NET or Java/J2EE application.
- Because the legacy and new versions of the transactions ran side by side, the new could be phased into production with initially a few users, then a whole office, then the full user base. In addition, if any problem is missed in testing and found in production use, the users can drop back to the legacy transaction screen while the problem is fixed. This removes the largest risk of the project – moving the new code into production in a big bang. This side by side testing can be automated for projects of a sufficient scale. Doing so for one project with a $165 million budget cost less than $3 million.
This project design cannot fail, because we always have a fully functioning system. Furthermore, benefits begin to accrue early in the project along with the invoices, instead of invoices only during the project and all of the benefits only at the end.
But this is not the only possible hybrid design. COTS modification could be combined with business rule extraction, for example, to feed into JAD sessions. Similarly, business rule extraction could be combined with a re-design, or renovation might be combined with any other strategy just to get onto the new platform as soon as possible. The hybrid strategy assessment looks at all viable hybrid designs.
Summary – How to Get to Success
Reviewing an organization’s legacy assets is more like software archeology than modern software design and implementation, because we look for ways to extract residual value from those assets whenever practical.
We perform a preliminary assessment to determine whether the preferred direction for the project is strategic, tactical or hybrid. From that preliminary assessment, we implement our 10 step methodology to assess how best to get from the current legacy assets to a successful modernized application result:
1) In step 1,
- We analyze the Infrastructure of the application without any regard to functionality
- We analyze the program logic that supports the business functionality, and assign the associated program source into one of three buckets: (1) Obsolete, (2) To Be Preserved and (3) To Be Enhanced.
- We estimate the amount of new code that will be required for Wholly New functionality, and express this as a percentage of the sum of the other three buckets.
- We ask, of the code that is categorized as Preserved and as Enhanced, “how well does the program logic support the current business process?”
- If not 100%, we ask, “to what extent has the underlying business process changed since the system was designed and implemented”, and “how much of that change is being supported by the system?”
2) In step 2,
- We ask how to optimize among minimum risk, minimum cost, and maximum business value (quality + agility).
- We ask, “how much will you pay to minimize risk on this project?”
3) In step 3,
- We ask, “what payback period is required to fund the project internally and externally?”
- We ask, “what is the downside for the business if we fail?”
4) In step 4, if the current application runs on a mainframe, we ask: “if you were implementing a replacement application today and had your choice of any platform, would you or would you not choose a mainframe?”
5) In step 5, we ask whether there are any deadlines that must be met, and what the Plan B is if dates are missed.
6) In step 6, we derive straw man budgets for re-architecting & renovation projects, and perform a sanity check against available financial and personnel resources.
7) In step 7, we derive a rough order of magnitude estimate of costs, risks and benefits for a straw man re-design and rewrite project.
8) In step 8, we derive a rough order of magnitude estimate of costs, risks and benefits for a straw man COTS implementation and modification project.
9) In steps 9 and 10, depending on the preliminary assessment, we perform one or more of the following:
- We consider our findings in light of a strategic modernization design
- We consider our findings in light of a tactical modernization design
- We consider our findings in light of a hybrid modernization design
We conclude our assessment with a technical and business presentation on our findings and recommendations, from which much discussion proceeds, and deliver the formal report on findings and recommendations.