What is Data Migration? Explained

What is Data Migration?

Data migration is the process of moving data from one platform or format to another. It typically involves migrating data from a legacy system to a new system, preferably without disrupting or disabling active applications, and then redirecting all input/output activity to the new environment. Data migration is also the process of importing data from various source systems into one target system. It is a multi-step process that begins with an analysis of the legacy data and culminates in the loading and reconciliation of data into the new applications. This process involves scrubbing the legacy data, mapping the data from the legacy system to the new system, designing the conversion programs, building and testing the conversion programs, conducting the conversion, and reconciling the converted data.

Why do we need to migrate data?

In today's world, data migrations for business reasons are becoming common. While replacement of an old legacy system is the most common reason, other factors also play a significant role in the decision to migrate data into a new environment. Some of them are:

  • Databases continue to grow exponentially, requiring additional storage capacity.
  • Companies are switching to high-end server platforms such as Exchange Server 2007.
  • The need to reduce complexity and costs by moving to common, stable platforms.
  • The concept of virtualization and the resultant need for data to be transportable, making it possible to migrate to and from physical and virtual environments.
  • IT rationalization to facilitate compliance with legislative requirements.

A data migration strategy should be designed so that tomorrow's purchasing decisions fully meet both present and future business needs and deliver the maximum return on investment. Additionally, because data migration and business continuity planning are so closely intertwined, developing a data migration strategy will also enhance your business's disaster recovery options and improve the likelihood of a speedy recovery in the event that disaster strikes.

Migration – Strategic or Tactical

Although data migration projects are large and strategic, many companies view them as a tactical challenge rather than a strategic project. This leads them to look for a 'quick-fix' solution, but quick fixes do not work because data migration is much more complex than organizations initially expect and has a longer learning curve than they plan for. Moreover, the wide range of tools and technologies in the market adds further confusion to the process. This 'quick-fix' approach ultimately contributes to the high failure rate of application migration projects. Even organizations that have had good prior experience with migration frequently fail to leverage their data migration expertise; this ad hoc approach means there may be no mechanism to capture, leverage, or reuse best practices.
Organizations should therefore approach data migration as a strategic project rather than a tactical one. This means treating it as an ongoing process of making the data work, no matter what changes occur in the company's systems. A well-defined data migration strategy should address the challenges of identifying source data, interacting with continuously changing targets, meeting data quality requirements, creating appropriate project methodologies, and developing general migration expertise.

Data Migration Strategy

The key considerations and inputs for defining a data migration strategy are given below:

  • A strategy to ensure the accuracy and completeness of the migrated data post-migration
  • Agile principles that allow logical groups of data to be migrated iteratively
  • Plans to address the source data quality challenges faced currently, as well as the data quality expectations of the target systems
  • An integrated migration environment with proper checkpoints, controls, and audits in place so that fallout accounts and errors are identified, reported, resolved, and fixed
  • A solution to ensure appropriate reconciliation at the different checkpoints, confirming the completeness of the migration
  • Selection of the correct tools and technologies to cater for the complex nature of the migration
  • The ability to handle large data volumes during migration
  • Separation of migration development and testing activities from the legacy and target applications

A data migration strategy will involve the following steps (a minimal end-to-end sketch follows the list):

  • Identify the legacy/source data to be migrated
  • Identify any specific configuration data required from the legacy applications
  • Classify the migration process as manual or automated
  • Profile the legacy data in detail
  • Identify data cleansing areas
  • Map attributes between the legacy and target systems
  • Identify and map data to be migrated to the historical data store solution (archive)
  • Gather and prepare transformation rules
  • Conduct pre-migration cleansing where applicable
  • Extract the data
  • Transform the data along with limited cleansing or standardization
  • Load the data
  • Reconcile the data
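The extract, transform, load, and reconcile steps above can be illustrated with a minimal Python sketch. It uses an in-memory SQLite database, and the table names (legacy_customer, stg_customer) and the single cleansing rule are hypothetical placeholders rather than a prescribed design.

```python
import sqlite3

# Hypothetical legacy and staging tables in an in-memory SQLite database.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE legacy_customer (cust_id INTEGER, name TEXT, status TEXT)")
con.execute("CREATE TABLE stg_customer (cust_id INTEGER, name TEXT, status TEXT)")
con.executemany(
    "INSERT INTO legacy_customer VALUES (?, ?, ?)",
    [(1, " alice ", "A"), (2, "Bob", "a"), (3, "carol", "I")],
)

# Extract: pull the legacy rows to be migrated.
rows = con.execute("SELECT cust_id, name, status FROM legacy_customer").fetchall()

# Transform: limited cleansing/standardization (trim names, normalize status codes).
transformed = [(cid, name.strip().title(), status.upper()) for cid, name, status in rows]

# Load: write the transformed rows into the staging/target structure.
con.executemany("INSERT INTO stg_customer VALUES (?, ?, ?)", transformed)

# Reconcile: the source count must equal the target count.
src = con.execute("SELECT COUNT(*) FROM legacy_customer").fetchone()[0]
tgt = con.execute("SELECT COUNT(*) FROM stg_customer").fetchone()[0]
print("reconciled" if src == tgt else f"mismatch: source={src}, target={tgt}")
```

In a real migration each step would be a separate, restartable job, but the shape of the flow is the same.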

Phases of Data Migration

The data migration process involves the following phases:

  • Requirement phase
  • Design phase
  • Development phase
  • Testing phase
  • Delivery phase

Requirement Phase

The requirement phase is the beginning of the migration project. It identifies the systems from which data will be migrated (with a brief description of each), the kind of data available in those systems, the overall quality of that data, and the system to which the data should be migrated. Business requirements for the data help determine what data to migrate; these requirements can be derived from the agreements, objectives, and scope of the migration. The phase also defines what is in scope for the project and who the stakeholders are. Proper analysis gives a clear understanding of the scope, and is not complete until it has been tested against clearly identified project constraints. The following activities are performed during this phase:

  1. Legacy system understanding to baseline the Migration scope
  2. Migration Acceptance Criteria and sign-off
  3. Legacy data extraction format discussion and finalization
  4. Understanding target data structure
  5. Legacy data analysis & Profiling to identify the data gaps and anomalies
  6. Firming up Migration Deliverables and Ownership

Design Phase

During the design phase, which follows the analysis phase, the high-level approach and low-level design for the migration components are created. The design phase covers the following:

  • Approach for end to end Migration
  • Master and transactional data handling approach
  • Approach for historical data handling
  • Approach for data cleansing
  • Data Reconciliation and Error handling approach
  • Target Load and validation approach
  • Cut-over approach
  • Software and Hardware requirements for E2E Migration
  • Data model changes between the legacy and target systems (for example, changes in account structure or in package and component structure)
  • Data type changes between legacy and target
  • Missing/unavailable values for mandatory target columns
  • Business-process-specific changes or implementation of business rules
  • Changes in lists of values between legacy and target
  • Target attributes that need to be derived from more than one legacy column
  • Target-specific unique identifier requirements

The detailed design phase comprises the following key activities:

  • Data Profiling – an analysis of the candidate data sources to clarify the structure, content, relationships, and derivation rules of the data. The benefits of data profiling are improved data quality, a shorter implementation cycle for major projects, and a better understanding of the data for its users (a small profiling sketch follows this list).
  • Source to Target attribute mapping
  • Staging area design
  • Technical design
  • Audit, Rejection and Error handling
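As a simple illustration of data profiling, the sketch below computes the row count and, per column, the null count, distinct count, and min/max for a hypothetical legacy table; real profiling tools add pattern and relationship analysis on top of these basics.

```python
import sqlite3

def profile_table(con, table, columns):
    """Print basic profile metrics (nulls, distinct values, min/max) per column."""
    total = con.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {total} rows")
    for col in columns:
        nulls = con.execute(f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL").fetchone()[0]
        distinct = con.execute(f"SELECT COUNT(DISTINCT {col}) FROM {table}").fetchone()[0]
        lo, hi = con.execute(f"SELECT MIN({col}), MAX({col}) FROM {table}").fetchone()
        print(f"  {col}: nulls={nulls}, distinct={distinct}, min={lo}, max={hi}")

# Hypothetical legacy extract used for profiling.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE legacy_account (acct_id INTEGER, open_date TEXT, balance REAL)")
con.executemany("INSERT INTO legacy_account VALUES (?, ?, ?)",
                [(1, "20190101", 100.0), (2, None, 250.5), (3, "20200315", 250.5)])
profile_table(con, "legacy_account", ["acct_id", "open_date", "balance"])
```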

Development phase

After completing the design phase, the migration team works on building the migration components listed below:

  • ETL mappings for cleansing and transformation
  • Fallout framework
  • Reconciliation scripts

All of these components are unit tested before the dry runs.

Actual coding and unit testing are done in the construction phase. During development, structures similar to the target system are created; data from the different legacy systems is extracted into a staging area, where the source data is consolidated. The Construction and Unit Testing (CUT) phase includes the development of mappings in the ETL tool based on the source-to-target mapping sheet. Transformation rules are applied in the required mappings and the data is loaded into the target structures. Reconciliation programs are also developed during this phase. Unit test cases (UTC) are written to validate source systems, source definitions, and target definitions, and to check connection strings, data types, ports, and so on.
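As an example of the kind of unit test case mentioned above, the sketch below tests one hypothetical transformation rule (mapping legacy status codes to the target list of values); the rule and its values are illustrative assumptions, not an actual source-to-target mapping.

```python
import unittest

# Hypothetical transformation rule from the source-to-target mapping sheet:
# legacy status codes "A"/"I"/"S" map to target values "ACTIVE"/"INACTIVE"/"SUSPENDED";
# anything else falls out as "UNKNOWN" for the fallout framework to report.
STATUS_MAP = {"A": "ACTIVE", "I": "INACTIVE", "S": "SUSPENDED"}

def transform_status(legacy_status):
    return STATUS_MAP.get((legacy_status or "").strip().upper(), "UNKNOWN")

class TransformStatusTest(unittest.TestCase):
    def test_known_codes(self):
        self.assertEqual(transform_status("a"), "ACTIVE")
        self.assertEqual(transform_status(" I "), "INACTIVE")

    def test_unknown_or_missing_codes_fall_out(self):
        self.assertEqual(transform_status("X"), "UNKNOWN")
        self.assertEqual(transform_status(None), "UNKNOWN")

if __name__ == "__main__":
    unittest.main()
```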

Testing phase

The objectives of data migration testing are:

  • All required data elements have been fetched/moved.
  • Data has been fetched/moved for specified time periods as per the requirement specification document.
  • Data has originated from the correct source tables.
  • The appropriate target tables have been populated in the correct format.
  • The performance of migration programs and custom scripts is acceptable.
  • There has been no data loss during migration, and any loss that does occur is properly explained.
  • Data integrity is maintained.

The migrated data is tested through a reconciliation process. Reconciliation scripts are developed to check the record counts between the source system and the staging area into which the source data is pulled, and between the source staging area, the target staging area, and the rejection tables. For example: count of source records = count of target records + count of rejected records in the rejection table.
All functional validations are verified against the business rules through functional scripts. Reconciliation and functional checks can be performed manually or in an automated way: for example, one procedure can check reconciliation and write records to an error table if any reconciliation script fails, another procedure can check functionality and write records to an error table if any functional check fails, and a macro can execute all reconciliation and functional scripts in one shot. Preparing such procedures and macros falls under the automated approach.
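A minimal automated version of the count reconciliation described above might look like the sketch below. The table names (src_stage, tgt_stage, rejects, recon_errors) are hypothetical, and in practice this logic often lives in a stored procedure in the migration database.

```python
import sqlite3

def reconcile_counts(con):
    """Check: count of source records = count of target records + count of rejected records.
    On failure, write a row to a hypothetical error table for follow-up."""
    src = con.execute("SELECT COUNT(*) FROM src_stage").fetchone()[0]
    tgt = con.execute("SELECT COUNT(*) FROM tgt_stage").fetchone()[0]
    rej = con.execute("SELECT COUNT(*) FROM rejects").fetchone()[0]
    if src != tgt + rej:
        con.execute(
            "INSERT INTO recon_errors (check_name, detail) VALUES (?, ?)",
            ("source_vs_target_plus_rejects", f"source={src}, target={tgt}, rejected={rej}"),
        )
        return False
    return True

con = sqlite3.connect(":memory:")
for ddl in ("CREATE TABLE src_stage (id INTEGER)",
            "CREATE TABLE tgt_stage (id INTEGER)",
            "CREATE TABLE rejects (id INTEGER)",
            "CREATE TABLE recon_errors (check_name TEXT, detail TEXT)"):
    con.execute(ddl)
con.executemany("INSERT INTO src_stage VALUES (?)", [(i,) for i in range(5)])
con.executemany("INSERT INTO tgt_stage VALUES (?)", [(i,) for i in range(4)])
con.execute("INSERT INTO rejects VALUES (1)")
print("reconciled:", reconcile_counts(con))
```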

Delivery phase

The migrated data is moved to the QA environment to check its quality. During this phase, the cutover activities are performed and the migrations from the legacy systems into the target systems are carried out. After the data is loaded, reconciliation of the loaded records is verified. Reconciliation identifies the number of records that were successfully converted and tracks the records that failed during conversion and migration.
Analysis is performed on the failed records to determine the reasons for failure, and the necessary fixes are applied so that they can be loaded into the target database.
The cutover phase resembles the final tasks in the SDLC implementation phase, including data conversion, testing, changeover to the new system, and user training. Compared with traditional methods, the entire process is compressed; as a result, the new system is built, delivered, and placed in operation much sooner. Cutover is the final phase of the move, during which the data in the source and the destination is synchronized. From the project perspective, the final preparation for cutover includes:

  • Final test activities.
  • Resolution of any critical issues (documented in the upgrade script).
  • Creation of the cutover plan, based on the results and experience of the tests.
  • An optional dress-rehearsal test upgrade, performed based on this cutover plan.

Testing in Data Migration

Below are the scenarios that we can test during data migration activities.

Record count Check

In general, data in the staging area is populated from various source systems. One of the important scenarios to test after populating the staging area is the record count check, which can be categorized into two sub-scenarios (see the sketch after this list):

  1. Record count for inserted records: the count of records identified from the source system using the requirement document should match the count in the target system, i.e. the data populated after running the jobs.
  2. Record count for updated records: the count of records updated in the target tables in the staging area should match the record count identified from the source tables using the requirement document.
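The two record count checks could be scripted along the lines of the sketch below; the tables, the status-based rule, and the load_action audit column flagging rows as inserted or updated are assumptions for illustration only.

```python
import sqlite3

def record_count_checks(con):
    """Compare expected counts derived from the source against what the jobs
    actually inserted or updated in the staging target table."""
    # Hypothetical rule: active source rows are inserted, inactive ones update existing rows.
    expected_inserts = con.execute(
        "SELECT COUNT(*) FROM src_customer WHERE status = 'A'").fetchone()[0]
    expected_updates = con.execute(
        "SELECT COUNT(*) FROM src_customer WHERE status = 'I'").fetchone()[0]
    actual_inserts = con.execute(
        "SELECT COUNT(*) FROM stg_customer WHERE load_action = 'INSERT'").fetchone()[0]
    actual_updates = con.execute(
        "SELECT COUNT(*) FROM stg_customer WHERE load_action = 'UPDATE'").fetchone()[0]
    assert actual_inserts == expected_inserts, (expected_inserts, actual_inserts)
    assert actual_updates == expected_updates, (expected_updates, actual_updates)
    print("record count checks passed")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE src_customer (cust_id INTEGER, status TEXT)")
con.execute("CREATE TABLE stg_customer (cust_id INTEGER, load_action TEXT)")
con.executemany("INSERT INTO src_customer VALUES (?, ?)", [(1, "A"), (2, "A"), (3, "I")])
con.executemany("INSERT INTO stg_customer VALUES (?, ?)",
                [(1, "INSERT"), (2, "INSERT"), (3, "UPDATE")])
record_count_checks(con)
```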

Target tables and columns

If the jobs have to populate data in a new table, check that the required target table exists in the staging area and that the required columns are available in the target table according to the specification document. A small sketch of this check follows.
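The structural check could look like the sketch below. It uses SQLite's PRAGMA table_info for brevity; on most warehouse platforms the equivalent would be a query against information_schema.columns. The table and column names are hypothetical.

```python
import sqlite3

def check_table_structure(con, table, required_columns):
    """Verify that the staging target table exists and exposes the required columns."""
    existing = {row[1] for row in con.execute(f"PRAGMA table_info({table})")}
    if not existing:
        raise AssertionError(f"target table {table} does not exist in the staging area")
    missing = set(required_columns) - existing
    if missing:
        raise AssertionError(f"{table} is missing columns: {sorted(missing)}")
    print(f"{table}: structure OK")

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_customer (cust_id INTEGER, name TEXT, status TEXT)")
check_table_structure(con, "stg_customer", ["cust_id", "name", "status"])
```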

Data check

Data checking can be further classified as follows (a few of these checks are sketched after this list):

  1. Data validation for updated records: with the help of the requirement specification document, we can identify the records from the source tables whose data will be updated in the target table in the staging area. Take one or more updated records from the target table and compare their data with the records identified from the source tables. Check that, for updated records in the target table, the transformation happened according to the requirement specification using the source table data.
  2. Data validation for inserted records: with the help of the requirement specification document, we can identify the records from the source tables that will be inserted into the target table in the staging area. Take one or more inserted records from the target table and compare their data with the records identified from the source tables. Check that, for inserted records in the target table, the transformation happened according to the specification using the source table data.
  3. Duplicate records check: if the specification document states that the target table should not contain duplicate data, check that after running the required jobs the target table contains no duplicate records. To test this you may need to prepare a SQL query based on the specification document.
  4. Checking the data for specific columns: if the specification states that a particular column in the target table should contain only specific values, or that a date column should contain data within specific date ranges, check that the data in those columns meets the requirement.
  5. Checking for distinct values: if the specification document says that a column in the target table should contain only distinct values, check the distinct values in that column. You can use a SQL query like "SELECT DISTINCT <column> FROM <table>".
  6. Filter conditions: when the job is completed and data is populated in the target table, one important scenario to test is identifying the criteria used to retrieve the data from the source table (from the specification or other documents). Prepare a SQL query using those criteria and execute it on the source database; it should return the same data that was populated in the target table. The filter conditions used to retrieve the data from the source tables should match the criteria identified above.
  7. Many source rows to one target row: sometimes the requirement specifies that, from many similar records identified in the source, only one record has to be transferred to the target staging area. For example, if the criteria used to retrieve the records from the source tables return 100 records, i.e. there are 10 distinct records and each distinct record has 10 similar records (10 * 10 = 100), the requirement may state that only the 10 distinct records are to be transferred into the target table in the staging area. This condition needs to be verified and can be tested by writing SQL statements. In other cases the requirement specifies that all the records identified from the source have to be transferred to the target table, rather than one distinct record per group of similar records; in that case we have to check the data populated in the target table against that requirement.
  8. Check the data format: sometimes the data populated in the staging area must satisfy certain formats according to the requirement, for example a date column in the target table that should store data in the format 'YYYYMMDD'. This kind of format transformation has to be tested as stated in the requirements.
  9. Check for deleted records: in certain scenarios, records from the source systems that were already populated in the staging area may be deleted according to the requirements. In this case the corresponding records in the staging area should be marked as deleted as per the requirements.
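A few of the data checks above can be expressed as simple SQL assertions run from a test script, as sketched below; the table, columns, business key, and list of values are hypothetical and would come from the specification document in practice.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stg_order (order_id INTEGER, status TEXT, order_date TEXT)")
con.executemany("INSERT INTO stg_order VALUES (?, ?, ?)",
                [(1, "OPEN", "20240101"), (2, "CLOSED", "20240215"), (3, "OPEN", "20240301")])

# Duplicate records check on the business key (order_id).
dups = con.execute("""
    SELECT order_id, COUNT(*) FROM stg_order
    GROUP BY order_id HAVING COUNT(*) > 1
""").fetchall()
assert not dups, f"duplicate business keys found: {dups}"

# Specific-values check: status must come from the agreed list of values.
bad_status = con.execute(
    "SELECT COUNT(*) FROM stg_order WHERE status NOT IN ('OPEN', 'CLOSED')").fetchone()[0]
assert bad_status == 0, f"{bad_status} rows with unexpected status values"

# Data format check: order_date must be an 8-digit YYYYMMDD string.
bad_dates = con.execute(
    "SELECT COUNT(*) FROM stg_order WHERE order_date "
    "NOT GLOB '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'").fetchone()[0]
assert bad_dates == 0, f"{bad_dates} rows with badly formatted dates"

print("data checks passed")
```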

File processing

In a few scenarios, the data in the staging area has to be populated from a flat file generated by a source system. The following tests have to be performed before and after executing the jobs that process the flat file (a small sketch follows this section):

  • Check for the directories where the flat file has to be placed for processing.
  • Check for the directories where flat file has to be moved after processing is completed.
  • Check that file name format is as expected.
  • Check for the record count in the target table. It should be as specified in the requirement.

While populating data from the flat file into the target table, the data transformation has to happen according to the requirements.
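Pre- and post-processing checks for an inbound flat file could be scripted as in the sketch below; the directory layout, file-name pattern, and delimiter are illustrative assumptions, not a mandated convention.

```python
import csv
import re
from pathlib import Path

INBOUND_DIR = Path("/data/migration/inbound")        # hypothetical landing directory
PROCESSED_DIR = Path("/data/migration/processed")    # hypothetical post-processing directory
FILENAME_PATTERN = re.compile(r"^CUST_\d{8}\.csv$")  # e.g. CUST_20240131.csv

def pre_checks():
    # Both directories must exist before the job runs.
    assert INBOUND_DIR.is_dir(), f"missing inbound directory {INBOUND_DIR}"
    assert PROCESSED_DIR.is_dir(), f"missing processed directory {PROCESSED_DIR}"

def check_file(path, expected_record_count):
    # File name must follow the agreed format.
    assert FILENAME_PATTERN.match(path.name), f"unexpected file name: {path.name}"
    with path.open(newline="") as f:
        records = list(csv.reader(f))
    data_rows = len(records) - 1  # assume a single header row
    assert data_rows == expected_record_count, (
        f"{path.name}: expected {expected_record_count} records, found {data_rows}")

# Usage (after the load job runs, the target table count should also be compared):
# pre_checks()
# check_file(INBOUND_DIR / "CUST_20240131.csv", expected_record_count=1000)
```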

File Generation

Sometimes batch jobs populate the data into a flat file instead of tables in the database. This flat file may be used as an input file in other areas, so the correctness of the file generated after running the batch jobs needs to be verified. The following things need to be verified as part of testing (see the sketch after this list):

  • Check that the target directory where the generated flat file will be placed is available; if it is not, file generation will fail because there is no directory in which to place the file. This check has to be performed before running the job.
  • Since the generated flat file will be used as an input file by other systems in the integrated environment, check that the file format and the file name format meet the requirement.
  • Check that the data populated in the flat file meets the requirement, and validate it against the source data.
  • Check that the number of records populated in the file meets the specification.
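Validation of a generated outbound file can follow a similar pattern, as sketched below; the target directory, header layout, and expected record count are placeholders, not values from any real specification.

```python
import csv
from pathlib import Path

TARGET_DIR = Path("/data/migration/outbound")     # hypothetical output directory
EXPECTED_HEADER = ["acct_id", "name", "balance"]  # assumed layout from the spec

def validate_generated_file(filename, expected_record_count):
    # The directory must exist before the batch job runs, otherwise generation fails.
    assert TARGET_DIR.is_dir(), f"target directory {TARGET_DIR} is not available"
    path = TARGET_DIR / filename
    assert path.exists(), f"expected output file {path} was not generated"
    with path.open(newline="") as f:
        rows = list(csv.reader(f))
    assert rows and rows[0] == EXPECTED_HEADER, f"unexpected header: {rows[:1]}"
    assert len(rows) - 1 == expected_record_count, (
        f"expected {expected_record_count} records, found {len(rows) - 1}")

# Usage: validate_generated_file("ACCT_EXTRACT_20240131.csv", expected_record_count=500)
```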

Risks involved in Data Migration

Risk management is a part of every project, especially a large-scale migration project. Risks related to schedule, costs, and scope should be clearly identified and documented, and a plan must be put together to mitigate them. Risks specific to data migration come from a number of contributors, such as:

Data quality

There is a high risk associated with understanding the business rules and logic governing the flow of information between the source and target systems. Data from the source system may not map directly to the target system because of its structure, and multiple source databases may have different data models. Data in the source databases may also have inconsistent formatting or may not be represented the same way as in the target system, and expertise in the source data formats may be lacking. Understanding how data has been formatted in the source system is critical for planning how data entities will map to the target.

Extraction, transformation and load tool complexity

Business systems now have more functionality and modules, which makes this an extremely challenging exercise. In earlier days, migrations of data and metadata were easily accomplished by writing data directly into the database(s) using simple commands and flat files at the database layer. As business systems have matured, the database layer often no longer contains the hierarchy and relationship information; these are stored and managed within the application layer. ETL tools therefore have to use APIs, and in some cases these are not designed or recommended to support extraction and loading in bulk.

Performance of systems (extraction and target system persistence)

Poor application performance can delay the data load, so thorough testing with multiple runs and cycles is essential to establish the actual performance of the system. Load tests and stress tests need to be carried out to check the feasibility of running the application on high data volumes during peak hours. IT managers should ensure that the best performance engineers are monitoring the load process and can tune the systems properly. From a database perspective, it may be necessary to increase the allocated memory, turn off archiving, or alter the indexing.

Project coordination

Most data migration projects are strategic for the organization and are used to implement a new business system. This involves multiple roles during analysis, design, testing, and implementation. It requires dedication and well-planned coordination to meet the migration schedule, as slippage in any phase will impact the overall project.

Costs involved in Data Migration

While some of the costs associated with data migration are apparent, such as the purchase price of the necessary migration tools, others are not so obvious. There should be a proper plan for effective migration, and all stakeholders should understand and appreciate the hidden costs involved.

  • Planned and unplanned downtime: scheduled and unscheduled downtime is expensive because applications or data are unavailable. On average, downtime costs 3.6% of overall annual revenue, close to $18,000 per hour for small to midsized enterprises. Such high costs make downtime undesirable and financially disastrous, and it eventually hits the profits of the organization. We need to make sure that both the application and the data are available throughout the migration to avoid downtime-related costs. As a mitigation plan, IT managers can schedule the migration so that it disrupts operations as little as possible.
  • Data quality: the point of data migration is to provide good-quality data to the application. If the quality of the data in the target database or data warehouse does not meet expectations or standards, the purpose of the entire application is defeated. Bad data quality can cripple an application long before roll-out. If the data fails to pass the base set of validation rules defined in the target application, the data load will fail, which increases rework costs and delays go-live activities. Organizations must establish user confidence in the data; to fully trust it, the business needs traceability of each data element through the data profiling, validation, and cleansing processes. In application environments data quality is also about business rules: data must obey unique business rules as well as meet validation thresholds.
  • Man hours: staff time is precious, and it is safer to schedule migration activities during holidays or weekends to minimize the impact on the business, which reduces the downtime needed during critical hours. To reduce overtime costs, the business should look for ways to achieve zero-downtime migration. There is no guarantee that these steps will remove the need for overtime, but they will better place the IT department to deal with problems without disruptive and expensive unscheduled downtime.
  • Loss of data: one of the common problems in data migration is data loss and how frequently it occurs. On average, about a third of businesses have lost data during a migration, with about half of those losses translating directly into monetary losses. Migration strategies should be formulated to mitigate any risk of data loss; the best mitigation is a full backup of the source system before migrating to the target system.
  • Failure to validate: a large number of businesses fail to validate the results of their migrations properly, trusting the results of user testing alone rather than the data that was actually moved by the migration. This delays the identification of problems and results in either expensive unscheduled downtime during business hours or further evening or weekend overtime. To avoid such situations, businesses should deploy a solution with built-in validation capabilities.
  • Under-budgeting: proper planning of man hours, downtime, and so on is required to arrive at the cost of data migration and avoid surprises. Migrations are sometimes over- or under-budgeted, and according to industry analysts, in two thirds of migrations companies miscalculate the man-hour or downtime requirements.
  • Deferred/scrapped projects: projects may be scrapped or deferred due to the high cost and complexity of the migration. This may be only a short-term saving when weighed against the long-term benefits and costs of these projects. With a planned, effective strategy and investment in the necessary applications, a business can remove many of the factors that inhibit more efficient, reliable, and less costly systems.

Roll back strategy in Data Migration

A rollback strategy is mandatory for any data migration, as a mitigation plan to reinstate application activities in case of unforeseen situations or failures during migration. It has to be planned from the initial pilot phase and throughout all phases of the actual migration, and for every level specified in the migration planning. The strategy describes all the activities to be executed if a process fails or does not produce an acceptable result.
For each migration level, the strategy should specify individually: the criteria on which the decision to reset the modifications, and thus abort the migration, will be made; the tasks that must be executed to prepare the abort; the activities required to conduct the abort, in particular how the original data stock can be reconstructed; and the activities that are necessary after the abort has been executed.
In particular, an evaluation strategy should be developed to ensure that the legacy system will again be available with its complete functionality. On a broader note, we should plan for the following activities:

  • A plan to clean the target system data when the target is also a production environment
  • How to push transactions that occur during the migration/cut-over window back into the legacy applications
  • A plan to minimize the impact on downstream/upstream applications
  • Restoring operations from the existing or legacy systems
  • A communication plan for all stakeholders

Reconciliation of Data Migration

Data audit and reconciliation (A&R) is a critical process in determining the overall success of a data migration event. Reconciliation is the process of keeping migration activities such as cleansing, extraction, and loading under control. It is also instrumental in keeping the migration on track, ensuring the authenticity of the quality and quantity of the data migrated at each stage. Validation programs have to be introduced at the extract and upload levels to capture record counts. A&R checks the accuracy and completeness of the migrated data and deals with quantity/value criteria (number of objects migrated, etc.); it does not include verification that the migrated data supports the business processes. Reconciliation requirements are driven both by the SAP functional teams' and process owners' requirements and by the technical reconciliation requirements. Below are the tasks involved in the A&R process.

  • Run reports at source: A&R reports are executed on the legacy system and the number of records extracted is validated against it. Hash counts for key columns are noted.
  • Extract validation: data extracts are then validated against the data acceptance criteria. This also includes validation of the record counts and hash counts noted in the source A&R reports.
  • Run intermediate reconciliation in the transformation layer: for some objects, intermediate reconciliation can be performed at the data transformation layer.
  • Run reports at target: target A&R reports are then executed to validate the record counts and hash counts.

A&R is based around a number of control points (CPs), which are set up at major stages in the migration process. The aim is to demonstrate that all records and their key values can be accounted for throughout the migration; the reconciliation control points are mapped against the high-level migration architecture. The A&R process should be 100% reconcilable. The process is deemed to reconcile even if there is a discrepancy between the source and target systems, as long as it can be accounted for, i.e. the accounts that have a discrepancy have been identified and the size of the discrepancy is known and understood. In this scenario, the discrepancy must be compared with tolerances that have been agreed with the business process owners.
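A control-point check that combines a record count with a hash total (here simply the sum of a key numeric column) might look like the sketch below. In practice each control point would persist these figures for audit and compare discrepancies against agreed tolerances; the tables and key column are hypothetical.

```python
import sqlite3

def control_point(con, table, key_column):
    """Capture the record count and a hash total for a key column at a control point."""
    count, hash_total = con.execute(
        f"SELECT COUNT(*), COALESCE(SUM({key_column}), 0) FROM {table}").fetchone()
    return count, hash_total

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE legacy_account (acct_id INTEGER, balance REAL)")
con.execute("CREATE TABLE target_account (acct_id INTEGER, balance REAL)")
rows = [(1, 100.0), (2, 250.5), (3, 75.25)]
con.executemany("INSERT INTO legacy_account VALUES (?, ?)", rows)
con.executemany("INSERT INTO target_account VALUES (?, ?)", rows)

source_cp = control_point(con, "legacy_account", "balance")
target_cp = control_point(con, "target_account", "balance")
if source_cp == target_cp:
    print("control points reconcile:", source_cp)
else:
    # Any discrepancy must be identified, quantified, and compared with agreed tolerances.
    print("discrepancy:", source_cp, "vs", target_cp)
```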
