What is ETL and Is It Dead Or Alive?

Source:
www.informatec.com

Overview

What is ETL?

ETL is a process in data warehousing that is responsible for pulling data out of the source systems and placing it into a data warehouse. This process includes the performance of three tasks: Extracting the data, transforming the data, and then loading the data. We will see a detailed description of them later.

 

In this post, I will be discussing the concept of ETL, the process of ETL, with an explanation of the three phases, and speaking about the relevance that ETL has in today’s world. To get started with ETL, see this ETL wiki, which includes a collection of resources to get you acquainted with basic ETL and data integration concepts.

 

Data must be properly formatted and normalized in order to be loaded into these types of data storage systems, and ETL is used as shorthand to describe the three stages of preparing data. ETL also describes the commercial software category that automates the three processes.

 

ETL Process

In ETL, the following three tasks are performed:

 

Extract:

the extraction process is the first phase of ELT, in which the desired data is identified and extracted from one or more sources, including database systems and applications. The data is held in temporary storage where the subsequent two phases can be executed. Depending on the source system’s capabilities (for example, operating system resources), some transformations may take place during this extraction process. During extraction, validation rules are applied to test whether data has expected values essential to the data warehouse. Data that fails the validation is rejected and further processed to discover why it failed validation and remediate if possible. The data is extracted from source systems (SAP, ERP, other operational systems), data from different source systems is then converted into one consolidated data warehouse format which is ready for transformation processing.

 

Transform:

After data is extracted, it has to be physically transported to the target system or to an intermediate system for further processing. Transformations can include different processes like date clearing, filtering, and formatting, resorting rows or columns of data, joining data from two values into one, conversely, splitting data from one value into two, or, applying any kind of simple or complex data validation. The importance here comes, of course, in the scale of the data, which makes scalability essential in ETL.

 

Load:

The load phase moves the transformed data into the permanent, target database. It loads data into data warehouses or data repositories other reporting applications as well.

 

Once loaded, the ETL process is complete, although many companies perform ETL regularly in order to keep the data warehouse up to date.

 

The three phases of each ETL operation – extract, transform, and load – are usually tightly coupled together. As noted earlier, the data pipeline typically exists virtually on the memory of the machine performing the ETL, so each load from source to destination typically runs in one contiguous operation. However, ETL processes are typically grouped together into logical units and executed either in sequence or in parallel.

 

What is the relevance of ETL today?

When creating a data warehouse, it is common for data from disparate sources to be brought together in one place so that it can be analyzed for patterns and insights. It would be great if data from all these sources had a compatible schema from the outset, but this is rare. ETL takes data that is heterogeneous and makes it homogeneous. Without ETL it would be impossible to analyze heterogeneous data in an automated method and derive business intelligence from it.

I will talk about the technology changes that are driving the emergence of ELT, and provides a comparison of ELT vs. ETL.

Because of ETL, analysts, business users, and data scientists could interact and query the data without disturbing business operations. More importantly, most ETL tools

eliminated the need to code: They provide a drag, drop, click, and configure

interface which breaks the transformation steps into distinct “stages”. ETL

specialists could build step-wise routines that were easy to trace and debug

without the need to understand code written by engineers.

 

ETL vs. ELT

Both methods are efficient and are easily performed, but ELT is better in my opinion, as you will see why now.

 

Some think that the use of ELT over ETL would have a reduced cost compared to ETL, for example, it will have reduced time-to-market for changes and new initiatives, as SQL deployments take much less time than traditional code. Moreover, there is no need for a dedicated ETL infrastructure, which saves the company some cash as well.

Furthermore, ELT utilizes cloud-based databases better, as processing steps undertaken during off-hours are not billed as CPU hours. Finally, there is more flexibility in maintaining copies of operational data for audit purposes, and in storing intermediate processes as tables (Reusability, traceability etc.).

 

ETL can be contrasted with ELT (Extract, Load, Transform)  which transfers raw data from a source server to a data warehouse on a target server and then prepares the information for downstream uses. Moreover, many people think that ETL is outdated and ELT is now the relevant method in data integration.

 

For an insight on the topic, check this article.

 

Closing Thoughts

In conclusion, we saw that moving to an ELT paradigm has cost advantages that are directly measurable, but coupling an elimination of the intermediary ETL tool needs

changes to processes and a careful rethink of how analytics is delivered in any company.

 

References

ETL Process

http://datawarehouse4u.info/ETL-process.html

What is Extract, Transform, Load (ETL)?

https://www.informatica.com/services-and-training/glossary-of-terms/extract-transform-load-definition.html#fbid=RjHRT3238g7

Overview of ETL

https://docs.oracle.com/cd/B19306_01/server.102/b14223/ettover.htm

ETL vs. ELT

https://blog.holistics.io/etl-vs-elt-how-elt-is-changing-the-bi-landscape/

Leave a Reply