Nowadays it is important to understand the steps of the data transformation process, even if data transformation is not the main part of your work.
Because we live in a world where data is collected, stored, and analyzed in so many different formats, many of us need to be able to perform the basic steps required to convert data from one format to another.
This article explains these steps by describing a typical data transformation process.
The data transformation process
While the exact nature of data transformation varies from situation to situation, the following steps are the most common parts of the data transformation process.
Step 1: Data Interpretation
The first step in data transformation is to interpret your data to determine what type of data you currently have and what you need to transform it into.
Interpreting the data can be more difficult than it seems. As a simple example, many operating systems and applications make assumptions about how data is formatted based on the extension appended to a filename. Your computer therefore probably assumes that a file named video.avi is a video file and that text.doc is a Microsoft Word document.
The problem with these labels is that the actual data in a given file (or directory, or database) can be very different from what the name suggests. Users can add any extension to a filename, and changing the extension doesn't actually transform the data.
For this reason, interpreting data accurately requires tools that can look deeper into the structure of a file or database to see what's really inside, rather than relying on what a filename or database table name suggests. Tools such as the Linux command-line utility file, which identifies a file's type by inspecting its contents, are useful for this purpose.
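To illustrate what "looking deeper" means, here is a minimal Python sketch that reads the first bytes of a file and compares them with a few well-known signatures ("magic numbers"), ignoring the extension entirely. The function name sniff_format and the small signature table are illustrative assumptions; real tools such as file consult a far larger database of signatures.

```python
# Hypothetical, deliberately small signature table: (leading bytes, description).
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"PK\x03\x04", "ZIP archive (also used by .docx/.xlsx)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc)"),
]


def sniff_format(path: str) -> str:
    """Guess a file's format from its leading bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(16)
    # AVI files are RIFF containers with the form type "AVI " at byte offset 8.
    if head[:4] == b"RIFF" and head[8:12] == b"AVI ":
        return "AVI video"
    for prefix, description in SIGNATURES:
        if head.startswith(prefix):
            return description
    return "unknown"
```

A file named video.avi that actually contains PDF data would be reported as "PDF document", which is exactly the mismatch an extension-based guess would miss.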
Of course, you also have to specify the target format – i.e. the format in which your data should be available after the transformation is complete. If you are new to this format, you should read the documentation for the tool or system receiving your transformed data to determine what formats it supports or expects.
Step 2: Data quality check before translation
Once you (or your data transformation tool) have figured out what data format you're working with and what format the data will be transformed into, you should perform a data quality check. A data quality check allows you to identify problems in the source data, such as missing or corrupted values in a database, that can cause errors at later steps in the data transformation process.
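As a rough sketch of such a check, the Python function below scans CSV text that is assumed to start with a header row and reports rows with the wrong number of fields or with empty values. The function name check_csv_quality and the choice of which problems to flag are assumptions for illustration; a real quality check would also validate types, ranges, and formats specific to your data.

```python
import csv
import io


def check_csv_quality(text: str) -> list[str]:
    """Return human-readable problems found in CSV text (header row assumed)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header is None:
        return ["file is empty"]
    problems = []
    for row_no, row in enumerate(reader, start=1):
        # A row with too few or too many fields usually signals corruption.
        if len(row) != len(header):
            problems.append(f"data row {row_no}: expected {len(header)} fields, got {len(row)}")
            continue
        # Empty or whitespace-only fields are treated as missing values.
        for name, value in zip(header, row):
            if value.strip() == "":
                problems.append(f"data row {row_no}: missing value for '{name}'")
    return problems
```

For example, check_csv_quality("id,name\n1,\n") reports that data row 1 is missing a value for 'name'.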
Step 3: Data Translation
After you have improved the quality of the source data as much as possible, you can start the data translation process. Data translation means taking each part of your source data that does not fit the target format and replacing it with data that conforms to that format's requirements.
For example, you could convert an old HTML file written using an outdated HTML standard to HTML5, the newer standard and the one most modern web browsers expect. Part of the data translation process in this case would be to replace legacy HTML tags like <dir> (a tag used in older HTML versions to create lists) with <ul> (the list tag supported by modern HTML).
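The tag replacement described above can be sketched in a few lines of Python with a regular expression. This is a simplification under stated assumptions: the name replace_dir_tags is hypothetical, and a regex cannot tell a real tag from text inside comments or scripts, so a production converter would normally work on a parsed HTML tree instead.

```python
import re

# Matches opening and closing <dir> tags, with or without attributes, in any case.
DIR_TAG = re.compile(r"<(/?)dir(\s[^>]*)?>", re.IGNORECASE)


def replace_dir_tags(html: str) -> str:
    """Rewrite legacy <dir>...</dir> lists as <ul>...</ul>, keeping attributes."""
    # group(1) is "/" for closing tags; group(2) holds any attributes (or None).
    return DIR_TAG.sub(lambda m: f"<{m.group(1)}ul{m.group(2) or ''}>", html)
```

Calling replace_dir_tags("<dir><li>a</li></dir>") yields "<ul><li>a</li></ul>", while unrelated tags such as <div> are left alone.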
Data translation often involves not only replacing individual data elements with others, but also significantly restructuring the entire file.
For example, a CSV file, which stores records as lines of comma-separated values, would require significant restructuring to convert to an XML file, which organizes information in nested hierarchies of tags.
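A minimal Python sketch of this restructuring, using the standard csv and xml.etree.ElementTree modules, turns each CSV row into a record element with one child element per column. The function name csv_to_xml and the element names records and record are illustrative assumptions, and the code assumes the header names are valid XML element names and that every row has the same number of fields as the header.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(text: str, root_tag: str = "records", row_tag: str = "record") -> str:
    """Convert CSV text with a header row into nested XML elements."""
    reader = csv.DictReader(io.StringIO(text))
    root = ET.Element(root_tag)
    for row in reader:
        # Each flat CSV row becomes its own nested element in the tree.
        record = ET.SubElement(root, row_tag)
        for field, value in row.items():
            # Assumes header names are valid XML element names.
            ET.SubElement(record, field).text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

For the input "id,name\n1,Ada\n", this produces "<records><record><id>1</id><name>Ada</name></record></records>". ElementTree also escapes characters such as & automatically, which a hand-built string would not.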
Step 4: Data quality check after translation
To ensure your translated data is as useful as possible, you should perform a second data quality check. In this step of the process, you look for inconsistencies, missing information, or other errors that may have been introduced during the data translation process.
Even if your data was error-free before translation, problems can still arise during the translation itself.
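One way to sketch a post-translation check in Python is to parse the output, confirm it is well-formed, and compare it field by field with the source. This example assumes a CSV-to-XML translation in which each CSV row became a record element holding one child element per column; the function name check_translation and that layout are hypothetical choices for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET


def check_translation(csv_text: str, xml_text: str, row_tag: str = "record") -> list[str]:
    """Compare translated XML against its CSV source and report discrepancies."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"output is not well-formed XML: {exc}"]
    source_rows = list(csv.DictReader(io.StringIO(csv_text)))
    records = root.findall(row_tag)
    problems = []
    # Lost or duplicated records are a common translation error.
    if len(records) != len(source_rows):
        problems.append(f"expected {len(source_rows)} records, found {len(records)}")
    for index, (row, record) in enumerate(zip(source_rows, records), start=1):
        for field, value in row.items():
            element = record.find(field)
            if element is None:
                problems.append(f"record {index}: field '{field}' is missing")
            elif (element.text or "") != value:
                problems.append(f"record {index}: '{field}' changed from {value!r} to {element.text!r}")
    return problems
```

An empty list means no discrepancies were detected for the checks implemented here; it does not prove the output is correct in every respect.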
Conclusion
In most real-world scenarios, the data transformation steps described above are performed automatically by software tools. So if these steps sound like work you would rather not do by hand, don't worry.
Still, it's valuable for human operators to understand what their data transformation tools do in each step of the data transformation process and how each action adds up to enable the data transformation.
For more information on the importance of data quality, see this Forbes Insights report: The Data Differentiator – How Improving Data Quality Improves Business.