25 Data Migration Interview Questions & Answers
1. What then is meant by an incremental load?
How to Answer
This is a follow-up to the previous question. When an interviewer has a particular interest in a topic or wants to dig deeper into your knowledge of the subject, they will ask follow-up questions. You should be prepared for follow-up questions any time you've provided an answer to the interviewer.
Answer Example
"Incremental load refers to applying dynamic changes after the initial or full load process. These are performed as needed or when required in a specific period or on predefined schedules."
Experienced
"In incremental load refers to the process of populating the data warehouse tables after the initial load or full load has been performed. Any subsequent migration of data into the tables is considered an incremental load. These can be ad hoc or on predefined schedules."
2. Can you explain the differences between a connected and an unconnected lookup and when you should use each of them in a data migration?
How to Answer
This question asks you to compare two similar but different terms and to discuss when to use them in data migration. This is a typical operational question which starts with a definition and then asks you to describe how you would apply it. As a seasoned data migration operator, you should be able to answer this question easily.
Answer Example
"The two types of lookups are called connected and disconnected. A connected looked up is used for mapping and will return multiple values. It is called connected because it can be tied to another transformation. On the other hand, and unconnected lookup is used when it cannot be connected to another transformation. It returns a single value, but it is reusable."
Experienced
"Connected and unconnected lookups or similar but slightly different. They both query the tables and provide values as a result of the operation. The difference is that connected lookups are part of the workflow and can be tied to another transformation. Unconnected lookups only provide a single output and cannot be connected to other operations. Both types of lockups can be reused if necessary."
3. Please define the term workflow and discuss how it is used in the context of data migration.
How to Answer
This is a straightforward question, which by now you should recognize as another technical question. You should be very adept at answering this type of question at this point in the interview. Keep your answer concise and to the point, and allow the interviewer to ask a follow-up question if they are more interested in this topic.
Answer Example
"Workflow is simply the set of instructions that determines how the tasks are executed during a data migration process. Defining an effective workflow makes the process more efficient."
Experienced
"In the context of a data migration project, the term workflow refers to the progression of tasks that are executed during the data migration. Creating an efficient workflow expedites the migration process and results in a table within the data warehouse, which is more useful and has greater integrity."
4. How do you fine-tune the mapping of the data during data migration?
How to Answer
By asking this question, the interviewer is indicating that this topic is important to them. Hopefully, your pre-interview research helped you to discover this, so you're prepared to address this question.
Answer Example
"There are several steps involved in the fine-tuning of the mapping of data during a migration process. You can use a filter to qualify the data, utilize persistence to create a cache during the lookup operation, aggregate the data, sort it by a group, and use operators' when writing the functions. Another technique is to increase the cache size and commit intervals."
Experienced
"When mapping the data during a data migration, you can fine-tune it to increase the performance and reduce the retrieval time. Techniques available include filtering, qualifying, used utilizing caches, aggregating the data, sorting it into groups, and utilizing operators within your functional expressions. You can also increase the cache size and the commit intervals to optimize the mapping process."
5. How are the tables in data migration analyzed, and when does the analysis occur?
How to Answer
This is another technical question that contains two parts: the first is a "how," which is a process question, and the second is a "when," which is a clarifying question. You're reminded to answer every part of a question an interviewer asks, especially in a technically oriented interview. Doing this demonstrates both your listening skills and your attention to detail.
Answer Example
"The tables in a data migration process are analyzed by reviewing the statistics generated using the analyze command. The same statistics can be used by a cost-based optimizer to create a data retrieval plan. The analyze statement also supports the creation of the object's structures, space management, and other operations within the data warehouse tables. The tables are analyzed either when the index table is created or when the cluster is complete."
Experienced
"The analyst managing a data migration process use statistics to analyze the tables in the data tables. They use the analyze command to perform this function. The information generated by analyze can also help to optimize how the data is retrieved and make the process more effective. The best time to analyze the tables is when the index table is first created or after the cluster is complete."
6. Can you discuss what is meant by an operational data store and how it works?
How to Answer
An interviewer will ask you this question to continue to probe your knowledge of data migration and learn how you use this particular technique in managing a data warehouse environment. As the conversation continues, you'll notice links between the questions previously asked and the ones you are being asked currently. Make sure you are consistent in your answers throughout the interview.
Answer Example
"An operational data store or ODS is where the data is depository between the staging area and the data warehouse. The ODS acts as a temporary data warehouse allowing operators to analyze the business data before fully populates the tables in the data warehouse. Data can remain in the ODS anywhere from minutes two weeks, depending on the need for the data and the operational characteristics of the organization."
Experienced
"The operational data store or ODS functions like a cache between the staging area for incoming data and tables within the data warehouse. The ODS provides analyst the opportunity to review the data before is fully accepted into the data warehouse. Like the data warehouse, the ODS generates primary keys, takes care of any errors in the data, and will reject data that it doesn't need or that have been corrupted."
7. Can you discuss round-robin and hash partitioning in more detail?
How to Answer
You're being asked another follow-up question. Any time you provide an answer to an interviewer, you can anticipate follow-up questions. This indicates that the interviewer has a particular interest in this topic or needs more information than what you provided in the previous answer. Noting this will help you anticipate future questions and identify which issues are essential to the organization.
1st Answer Example
"In round-robin partitioning, the data is evenly distributed among all partitions. Using round-robin partitioning is appropriate when the number of rows in each of the partitions is the same. Hash partitioning applies a hash function to partition keys to group data among the partitions. The results in a group of rows being processed with the same partitioning key in the same partition."
2nd Answer Example
"As mentioned earlier, the two main types of partitions are round-robin and hash partitioning. Round-robin divides the data equally among the various partitions within a transaction and accelerates the transaction. Hash partitioning uses partition keys to group data among the partitions. The transaction is accelerated because rows with the same partitioning key are processed together."
8. Can you discuss partitioning in data migration, including the types of partitions and what benefits they bring to data migration?
How to Answer
You'll note that this question has three parts. The interviewer is asking you a general question about partitions, and specific questions about the types of partitions and their benefits. During an interview, you need to listen to the questions you are being asked carefully. Make sure you reply to each part of the question providing all the information about which the interviewer is asking. Interviewers will note when you don't do this.
Answer Example
"Partitioning in the practice of data migration is the sub-division of transactions. The two types of partitions in data migration are round-robin and hash partitions. Partitions are used to improve the performance of the data migration process."
Experienced
"Partitions are a crucial tool used in the data migration process. Partitions involve sub-dividing transactions. The two main types of partitions are round-robin and hash. The benefit partitioning provides is to improve the performance of a data migration transaction."
9. What is a PowerCenter as it relates to data migration, and what is the difference between a PowerCenter and a PowerMart?
How to Answer
This question is similar to the previous one in that it is a technical question asking you to define and compare two similar terms. You can use the same approach you already employed by first defining the terms and then describing the difference between them.
Answer Example
"A powercenter is a function within the data migration process which connects with data sources and extracts information from them. The sources are ERP applications, including SAP, Oracle Apps, and PeopleSoft. Powercenters process large volumes of data, whereas a powermart processes small volumes."
Experienced
"The two functions within a data migration system which access data from external sources are called powercenters and powermarts. Both of these connect with external sources, but only powercenters can connect with ERP systems such as PeopleSoft, Oracle Apps, and SAP. The difference between a powercenter and a powermart is the volume of data it can handle. Powercenters handle large volumes, while power marts can only handle small volumes."
10. What are views, and how do they differ from materialized views?
How to Answer
In this technical question, you are being asked to define and compare two different but similar terms. Make sure that you thoroughly answer the interviewer's questions by first providing the definition, and then discussing the differences between the terms. You may want to enhance this by also discussing how the terms are used within data migration.
1st Answer Example
"Views are representations of the data that are contained in the table. Views are created using the attributes of one or more tables. A view of a single table can be updated, but those with multiple tables cannot. Materialized views are created from an aggregated table which contains data from either fact tables or dimension tables. Materialized views can be updated."
2nd Answer Example
"The two main types of views are regular and materialized. They differ in terms of how the data is obtained and how the view can be updated. Standard views use data from either a single table or multiple tables. Single table views can be updated, but multiple fable views cannot. Materialized views aggregate data from several different tables, including a fact table or dimension table. Materialized views can be updated through Foot Data has already been aggregated."
11. What is meant by the term "snapshots" as it relates to data migration?
How to Answer
The interviewer is continuing to ask technical questions about various terms and aspects of a data migration system. As mentioned earlier, the majority of this interview will involve either operational or technical questions. Preparing for the interview should include reviewing a glossary of terms used in the data migration practice.
Answer Example
"Snapshots are copies of read-only data which is stored in a master table. The primary use of a snapshot is to maintain a copy of the master table at a remote site so that it can either be accessed quickly or replicated if needed."
Experienced
"Snapshots are a copy of the master data table which are stored remotely. They are only as current as when they were recorded and are only updated when a subsequent snapshot is made. Snapshots are typically used as a backup to the master table or to provide remote users with faster access to the data."
12. Can you tell me the names of the three layers in a data migration system?
How to Answer
This is a follow-up question to the one the interviewer previously asked, and you responded to. Follow-up questions are common during an interview, especially when the interviewer has a particular interest in a topic or is looking to solicit more information from you based on the answer you already provided. You should be prepared for follow-up questions throughout the interview.
Answer Example
"There are three layers and a data migration system. The first is the source layer which is the layer where the data first lands. The next layer is the integration layer, where the data is stored after it has been cleansed. The third and final layer is the dimension layer, where the presentation layer stands."
Experienced
"Data migration systems have three distinct layers. The first of these is the source layer, which obtains the data from external sources. The middle layer is where the data is transformed and screened to ensure it is appropriate for the data warehouse. The final layer is the dimension or presentation layer. This is where users access the data for various purposes."
13. With regard to data migration, what is referred to as a Three-Tier System?
How to Answer
This is another question that will test your knowledge of data migration and data warehousing. You can assume the interviewer will have access to a wide range of technical and operational questions they can ask you. While there is no way you can study and be prepared for every single question, you should be able to handle most of them. If you're asked a question you can't answer, simply tell the interviewer that you don't have that information handy, and then describe how you would go about locating it.
Answer Example
"The data warehouse is the 3-tier system in data migration. The first tier is the end-users or the front end. The middle tier in data migration provides end-users the data that they can use and which is secure. The final tier is the back end, where the data is stored."
Experienced
"With regard to data migration, a three-tiered system refers to the data warehouse. The middle tier serves as the interface to both the users on the front end and the data on the back end. It is what retrieves the data based on the user quarries and makes it available in a usable and secure format. The front end is the users, and the back-end is where the data is stored."
14. Can you explain the key features of a data migration system within a data warehouse?
How to Answer
This is a general or opening question. An interviewer will ask this type of question early in the interview to get you talking, to learn more about you, and to extract information they can use for subsequent questions. You can use this type of question to begin to steer the interview in a direction you are comfortable discussing.
Answer Example
"The data migration system, which is also known as the Extract-Transform-Load (ETL) system, is the foundation of a data warehouse. A properly designed data migration system retrieves data from a variety of sources, scans and confirms the data and enforces data quality and consistency standards, and finally delivers data in a readable format. This enables application developers to use the data to build applications and end-users to use it to make informed decisions."
Experienced
"Data Migration systems are a critical part of the overall data warehousing plan. Data must be migrated properly for it to be used effectively within the data warehouse. Key elements of a data migration system include verification, quality checks, standards, and formatting. If done properly, that data then becomes available to application developers and decision-makers."
15. What is the difference between an initial load and a full load process in the context of data migration?
How to Answer
By asking this question, the interviewer is testing your knowledge about different terms and processes involved with data migration. If you are experienced in this field, you should have no problem responding to this question. Keep your answer brief and allow the interviewer to ask a follow-up question if they need more information.
Answer Example
"In the context of data migration, the initial load is the process for populating all data warehousing tables for the first time. Full load also refers to the first time a data warehouse is populated. However, using the full load process, all the records are loaded in one batch after all the contents of the table have been erased."
Experienced
"Full load an initial load both refer to populating a data warehouse table for the first time. The differences with the initial load data is transferred into the data warehouse for the very first time. The table never contained any data. In the full load process, the data warehouse table is first erased to remove any data that was previously loaded, and then the new data is populated in a single batch."
16. Why is it important to have real-time data migration into a data warehouse?
How to Answer
This question is meant to explore your understanding of the link between data migration and the business objectives of the organization. The data migration in itself is only as useful as the information it provides to the organization to run the business. Understanding the link between these two processes is critical. The interviewer is seeking to confirm that you are aware of this.
Answer Example
"The purpose of a data warehouse is to provide useful information to the business and its decision-makers. Real-time data migration into the warehouse enables the decision-makers to have the latest data with which to manage the business. This is especially important for companies that use OLTP, such as retailers or financial firms."
Experienced
"Real-time data migration is one of the more critical functions of a data migration team. Businesses depend on usable data to make decisions and operate the company. Having real-time data provides decision-makers with the most up to date information. This is particularly important in businesses that operate in dynamic industries in which conditions are continually changing. It is also essential for companies that execute real-time transactions, including retailers and financial firms."
17. What is the best way to process a fixed-length flat file?
How to Answer
The interviewer is asking you this operational question to make sure you know how to perform this process. Your response should be in the form of a step-by-step description of what you would do to accomplish this task. You don't need to get too technical or provide too many details. The interviewer will ask a follow-up question if they need more information.
Answer Example
"A fixed-length file layout should include the file name, where the field begins, its length, and its data type, which will usually be text or number. You only need to supply those layout information once. The tool will remember this and expect a similar format for subsequent fixed-length flat files. Data validation is essential in this process in case the format of the data has changed. The most likely change will occur if the data includes a date field."
Experienced
"Fixed-length flat files are simple to process if you first define the layout of the files. The information included in this definition is usually the beginning of the file, its length, and the data type. This information is initially entered manually and will remain the same until the data migration tool encounters a file with a different format. This will cause the migration to fail and alert you that some manual intervention is required. It is recommended that you perform periodic validation processes to ensure that the data format has not changed."
18. Why is it important to analyze the impact of changes in a data migration system?
How to Answer
This is another operational question which the interviewer is using to determine the depth of your knowledge in this field. As the interview progresses, the questions will become more complicated and more specific. This signals that the interviewer is growing more confident in your abilities and is willing to investigate more complex topics.
1st Answer Example
"Examining the metadata associated with an object determines how it is impacted by a change in its structure or content. Changing data can break processes that are crucial to the function of the data warehouse. Allowing ad-hoc changes to data staging objects could be detrimental. Once a table is created in the staging area, you must perform an impact analysis before any changes are made to it. Many data migration tool vendors provide impact analysis functionality."
2nd Answer Example
"It is critical to analyze the impact of changes in a data migration system because any change could alter the functionality of the system. The best way to mitigate the impact of any changes to the system is to analyze them during the staging process. This will catch any errors or negative impacts before they are implemented when performing the full data migration process."
19. Can you describe the roles and responsibilities of a data migration team?
How to Answer
This is a general question which the interviewer will likely ask early in the process. Your response will give them some insight into your background and your understanding of both the data migration process and the roles that the administrators play in moving data from one source to another. You can use general questions to help steer the interview toward topics you are comfortable discussing.
Answer Example
"The primary role of the data migration team is to create and manage the data warehouse. The responsibilities encompassed in this role include delivering data to end-user tools, performing the cleaning and conforming steps, protecting and documenting the extraction of the data, and ensuring the consistency with the sources of the data."
Experienced
"The data migration team has several roles and responsibilities. Their primary responsibility is to ensure the data is extracted from sources and moved to the database in an efficient and trustworthy manner. The duties the team performs include extracting data from the original sources, assuring the quality and cleaning the data, applying labels and measures to the data, and delivering the data in a format that can be used by query tools, report writers, and dashboards."
20. Can you name some of the popular data migration tools available to database administrators?
How to Answer
By asking this question, the interviewer is seeking to understand what tools you are familiar with and whether the tools you use are the same ones they are currently using. Keep in mind that they may also be interested in whether you can bring new tools into their organization. Again, having done your research, you should be familiar with what they already use and can structure your answer to either match their tools or suggest new ones.
Answer Example
"There are several excellent data migration tools currently available. The ones I prefer include Oracle warehouse builder, SAP data services, IBM infosphere information server, and SAS data management. These address my needs and integrate well with the systems I typically use."
Experienced
"Choosing a data migration tool depends on several factors. These include the environment, the data sources, the data warehouse, and the platforms you are using. The tools I like include Power center Informatica, Elixir repertoire for data migration, data migrator (IBI), and SQL Server integration services (SSIS.) These cover 90% of my needs, are flexible, and are easy to work with."
21. What operations are executed in a data migration system, and in what order?
How to Answer
This is another technical question. The nuance of this question is that it not only asks you to define the steps, but also asks in what order they are performed. This ensures that you are knowledgeable about this process and have used it before.
Answer Example
"The typical data migration system consists of three distinct operations. These include extracting the data from the data source, which can be an Oracle, Microsoft, or any other relational database, transforming the data by performing a data cleansing and validation process, and writing the data into the target database or data warehouse."
Experienced
"Regardless of which data migration tool you are using to transfer data from the sources into a data warehouse, the operations performed are similar. The three distinct processes are extracting the data from the source, cleansing or validating it, and then writing it to the target database or data warehouse."
22. Can you explain what a data migration validator is and how it is used in data migration?
How to Answer
Yet another technical question. Remember, technical questions ask you to define a specific topic within the field you are interviewing for. Technical questions are best answered directly and concisely. You should anticipate follow-up questions which the interviewer will use to extract more information from you.
Answer Example
"A data migration validator is a tool used to test the data. I can facilitate the execution of data integration, data warehousing, and data migration projects. It makes uses a specific architecture to extract, load, and validate data from a variety of data sources, including databases, flat files, XML, Hadoop, and Business Intelligence systems."
Experienced
"The process of validation within a data migration project is meant to make the extraction of data from a variety of sources and integration of the data into the data warehouse easier and faster. Its function combines extraction, loading, and validation of the data into a single process. The data validation tool as agnostic with regard to the type of files it can work with."
23. What is the purpose of data profiling in the data migration system?
How to Answer
This is another operational question by which the interviewer is trying to determine your knowledge of this particular operation within the data migration process. You should be able to answer these types of questions easily based on your experience in this area. It is always advisable to brush up on the fundamentals in your field before an interview.
Answer Example
"Data profiling is a careful examination of the scope, quality, and content of a data source to enable a data migration system to be built. Its purpose is to verify that the data source has been well maintained before it is migrated to the data warehouse. If done effectively, the data will require minimal transformation and human intervention to load directly into the tables of the data warehouse."
Experienced
"Data profiling is the process used to verify that the data being imported into the warehouse is clean, formatted properly, and the correct information the administrator is anticipating. The purpose of data profiling is to minimize and the edits, data manipulation, or human interaction with the data after it has been migrated into the data warehouse."
24. What are some of the standard data formats in a data migration system?
How to Answer
This is an example of a technical question. Technical questions usually ask for definitions or brief explanations of the topic about which the interviewer is asking. Technical questions are best answered directly and succinctly, with little embellishment. The interviewer will ask follow-up questions if they need additional information.
1st Answer Example
"There are various standard formats in a data migration system. These include flat files, XML datasets, independent DBMS working tables, normalized entity/relationship schemas, and dimensional data models."
2nd Answer Example
"Most data that can work with a wide variety of data formats. This is important because administrators may not know where the data will be extracted from when designing the system or may not be familiar with the data formats in the source databases., Data formats most systems can manage include flat files, virtually any DBMS system, normalized entity/relationship schemas, XML datasets, and dimensional data models."
25. What are the day-to-day functions a data migration system performs?
How to Answer
The interviewer is asking this operational question to understand your knowledge and experience in this field. Before the interview, you should have done a great deal of research on the company and its operations. This will allow you to structure your answers to their questions to demonstrate how your skills and experience can help them address their business challenges.
Answer Example
"Data migration systems perform several different functions. This begins with building the development, test, and production suite of the data migration processes. It will then analyze and extract data from the sources identified by the operators. The system also verifies that the data is safe and complies with the standards of the organizations. Next, the data is formatted into dimensional schemas for the most effective delivery to end users, business intelligence tools, data-mining tools, OLAP cubes, and analytic applications. Finally, the system self-checks and can be tuned to optimize the performance of the overall data migration process."
Experienced
"A comprehensive data migration system performs a variety of functions. Initially, it will create subsystems that allow for the management of the system itself. It will then seek out external sources of data and began to import information into the data warehouse. During the import, the data is verified and scrubbed to make sure it is the appropriate information and that there are or no threats embedded within the data. The system then formats the data so that it can be used in a variety of different ways, including application development, data mining, and extraction for decision-making processes."