Welcome to the series “Pattern Generators for Dummies”, where we explore the world of metadata-driven systems and the power of generating code from patterns. As technology advances, the need for efficient and scalable systems grows, and metadata-driven systems with pattern generators are an increasingly popular answer to that challenge. This series aims to provide a step-by-step guide to understanding and using pattern generators in metadata-driven systems, with a focus on simplifying complex concepts.
Metadata management systems can be a significant upfront investment for a project. They typically require resources for planning, design, installation, and training, which can add to the overall cost. However, many organizations view metadata management as necessary for long-term success and cost savings, as it can streamline processes, improve data quality, and increase efficiency.
If the metadata management system is designed as a single, large, complex system, it can lead to increased risk in terms of costs, time to market, and maintenance. Additionally, flexibility and scalability may be limited, leading to difficulties in adapting to changing requirements or new technologies. A well-designed metadata management system should be modular and scalable, allowing for easy maintenance and evolution.
Modularity
Having worked with both monolithic and modular approaches, I recommend taking the modular approach when building a metadata-driven system, to ensure success and avoid common pitfalls.
The following figure shows the modular arrangement:
The generating infrastructure consists of the following basic parts:
- Metadata Management system – the system that stores, maintains, and versions metadata. You can read more about the requirements of Metadata Management systems in the article: https://darch.bobjankovsky.org/p=198.
- Metadata interface – By reading metadata directly from a metadata management system, you can automate code generation and reduce the time and effort needed to write code from scratch. However, it is important to consider the flexibility and reusability of the code generators, as a tight coupling to the metadata management system can limit their usefulness in the long term. Establishing a generic interface for communication between the metadata management system and the generators increases the flexibility and reusability of these tools and improves the overall efficiency of your development process. A well-structured Metadata interface layer also lets you build effective code generators on top of provisional metadata while you are still developing your comprehensive Metadata Management system. This approach improves time to market by increasing flexibility, reducing coupling between the metadata-driven system and the generators, and increasing the reusability of generators.
- Pattern – A functional pattern defines the desired outcome or function of data processing.
- Pattern template – A pattern template is a specific implementation of the functional pattern for a specific platform, language, or method. The pattern template provides the specific steps and code to perform the desired function. When porting the solution from one platform to another, we do not change the patterns; we change the pattern templates.
- Generator – The generator produces code from a pair of inputs: the transformation metadata – the mapping, in the format of the Metadata interface described above – and a pattern template (unless templates are hardcoded in the generators, which is also an option); see the orchestration sketch after this list.
- Target platform – The engine where our system does the processing. It can be a DBMS such as Oracle or Postgres, an ETL tool, or a data engineering platform such as Apache Spark.
- Platform specifics – Features of the platform, language, dialect, development standard, etc.
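To make the arrangement more concrete, the following Python outline sketches how these parts could be wired together. It is a minimal sketch; the names fetch_mapping, load_template, treat_template, and generate_code are purely illustrative and not part of any existing toolkit.

def fetch_mapping(mapping_id: str) -> str:
    # Metadata interface: returns transformation metadata as a JSON document,
    # regardless of which metadata management system produced it.
    raise NotImplementedError

def load_template(pattern: str, platform: str) -> str:
    # Pattern template: the implementation of a functional pattern
    # (e.g. "merge") for a specific target platform (e.g. Oracle).
    raise NotImplementedError

def treat_template(metadata: str, template: str) -> str:
    # Generator core: merges metadata into the template (sketched later in this chapter).
    raise NotImplementedError

def generate_code(mapping_id: str, pattern: str, platform: str) -> str:
    # Generator: the only place where the metadata and the platform-specific template meet.
    return treat_template(fetch_mapping(mapping_id), load_template(pattern, platform))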
Phases of Transformation
The transformation process can be divided into two main parts: data transformation and data population.
During the data transformation phase, the source data is transformed into the format required by the target structure. This phase involves mapping source data to the target data structure and ensuring that the data is in the correct format.
The data population phase then takes the transformed data and populates the target structure in the desired manner, whether it be a simple insert, a merge operation, or a more complex historical data management pattern. By separating the transformation process into these two distinct phases, organizations can more easily manage and optimize the entire data pipeline, improving the accuracy and efficiency of their data management processes.
I do not think it is very efficient to implement these phases separately in real implementations, but there are two reasons why I use this division here:
- The population phase is simple in terms of metadata; a data catalog can be used as a provisional source of metadata. Imagine a solution where you write all transformations as views on the source tables and then handle the population in a metadata-driven way. Well, not exactly metadata-driven, but at least metadata-aided.
- This is the first chapter, so we will practice on the population phase.
Metadata interface
The following diagram shows a fragment of our metadata interface. We use JSON as the metadata format because it allows great extensibility. We will reveal more of the interface as we move on to more complex patterns.
For now, this simple structure is enough.
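For illustration, a record conforming to this fragment of the interface could look like the sketch below (only the keys used in this chapter; the angle-bracketed values are placeholders, and a complete example appears in the Examples section):

{
 "SourceName":"<source table or view>",
 "TargetName":"<target table>",
 "cols":[
  {"ColTargetName":"<target column>","ColKeyFlag":"<Y or N>","ColSourceExpr":"<source expression>"}
 ]
}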
Pattern template
The simplest viable template for code generation should be able to handle both root attributes and a single level of lists. It should replace placeholders with values on both levels and provide delimiter definitions and simple filtering capabilities for the lists.
Template syntax:
{<key>}
e.g. {TargetName} … replaces the placeholder with the value of the relevant key.
{<key of list>:[<delimiter>|<list content>]}
e.g. {cols:[\n ,|"{ColTargetName}"]} … repeats the content for each item of the relevant list, joined by the delimiter. The content can contain keys of the list items as well as keys from the level above.
{<key of list>:[<delimiter>|<list content>?<filter list flag attribute>]}
e.g. {cols:[\n ,|"{ColTargetName}"?ColKeyFlag]} … the same as above, but only items whose flag attribute is "Y" are included. The flag attribute should contain either "Y" (yes) or "N" (no, the default).
{<key of list>:[<delimiter>|<list content>?!<filter list flag attribute>]}
e.g. {cols:[\n ,|"{ColTargetName}"?!ColKeyFlag]} … the same as the previous one, but negated, so only items whose flag attribute is not "Y" are included. The default value is "Y" in this case.
Yes, the simple template can be made more complex by adding more levels of metadata support, incorporating more advanced filtering and processing, or supporting more complex placeholder replacements. However, it is important to consider the complexity trade-off, as making the template too complex may result in it being difficult to implement on various platforms and may hinder its practicality and usability.
Generator
function TREAT_TEMPLATE(
p_metadata clob -- metadata of the generated code
,p_template clob -- template of the generation pattern
) return clob is
-- a generic function for using templates based on json parametrization
...
The source code of the TREAT_TEMPLATE function for Oracle is available at:
https://github.com/bobjankovsky/metaswamp/blob/main/oracle/shards/treat_template.sql
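The linked implementation is written in PL/SQL. As a rough illustration of the same idea (not the original code), a minimal processor for the template syntax described above could look like the following Python sketch; the name treat_template simply mirrors the Oracle function.

import json
import re

# {key} placeholders
KEY_RE = re.compile(r"\{(\w+)\}")
# {listKey:[delimiter|content]} constructs, optionally with a ?Flag or ?!Flag filter
LIST_RE = re.compile(r"\{(\w+):\[(.*?)\|(.*?)\]\}", re.S)

def treat_template(p_metadata: str, p_template: str) -> str:
    # Render a pattern template from JSON metadata (a sketch, not the PL/SQL original).
    data = json.loads(p_metadata)

    def fill(text, scope):
        # Replace simple {key} placeholders; unknown keys are left untouched.
        return KEY_RE.sub(lambda m: str(scope.get(m.group(1), m.group(0))), text)

    def expand(match):
        list_key, delimiter, body = match.groups()
        delimiter = delimiter.replace("\\n", "\n")   # a literal \n in the template means newline
        flag, negative = None, False
        if "?" in body:                              # optional filter suffix
            body, flag = body.rsplit("?", 1)
            if flag.startswith("!"):
                negative, flag = True, flag[1:]
        rows = []
        for item in data.get(list_key, []):
            if flag is not None:
                value = item.get(flag, "Y" if negative else "N")
                if (value == "Y") == negative:       # skip items the flag filters out
                    continue
            rows.append(fill(body, {**data, **item}))  # item keys plus keys from the level above
        return delimiter.join(rows)

    return fill(LIST_RE.sub(expand, p_template), data)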
Examples
We will define two tables:
create table ZZZ_SOURCE_DATA_SAMPLE(
THE_KEY_COLUMN_1 varchar2( 100 CHAR)
,THE_KEY_COLUMN_2 varchar2( 50 CHAR)
,SOME_VALUE_COL_1 varchar2(1000 CHAR)
,SOME_VALUE_COL_2 number(27,4)
,SOME_VALUE_COL_3 date
);
create table ZZZ_TARGET_DATA_SAMPLE(
THE_KEY_COLUMN_1 varchar2( 100 CHAR)
,THE_KEY_COLUMN_2 varchar2( 50 CHAR)
,SOME_VALUE_COL_1 varchar2(1000 CHAR)
,SOME_VALUE_COL_2 number(27,4)
,SOME_VALUE_COL_3 date
,SOME_OTHER_COL_1 varchar2(200 CHAR)
,constraint ZZZ_TARGET_DATA_SAMPLE_PK primary key (THE_KEY_COLUMN_1,THE_KEY_COLUMN_2)
);
The transformation will be represented by the following metadata:
{
"SourceName":"ZZZ_SOURCE_DATA_SAMPLE",
"TargetName":"ZZZ_TARGET_DATA_SAMPLE",
"cols":[
{"ColTargetName":"THE_KEY_COLUMN_1","ColKeyFlag":"Y","ColSourceExpr":"THE_KEY_COLUMN_1"},
{"ColTargetName":"THE_KEY_COLUMN_2","ColKeyFlag":"Y","ColSourceExpr":"THE_KEY_COLUMN_2"},
{"ColTargetName":"SOME_VALUE_COL_1","ColKeyFlag":"N","ColSourceExpr":"SOME_VALUE_COL_1"},
{"ColTargetName":"SOME_VALUE_COL_2","ColKeyFlag":"N","ColSourceExpr":"SOME_VALUE_COL_2"},
{"ColTargetName":"SOME_VALUE_COL_3","ColKeyFlag":"N","ColSourceExpr":"SOME_VALUE_COL_3"}
]
}
Insert pattern
The simplest way of populating the data.
The pattern template for INSERT in Oracle:
insert into {TargetName} (
{cols:[\n ,|{ColTargetName}]}
)
select
{cols:[\n ,|{ColSourceExpr} as {ColTargetName}]}
from {SourceName}
And finally the generated sample code:
insert into ZZZ_TARGET_DATA_SAMPLE (
THE_KEY_COLUMN_1
,THE_KEY_COLUMN_2
,SOME_VALUE_COL_1
,SOME_VALUE_COL_2
,SOME_VALUE_COL_3
)
select
THE_KEY_COLUMN_1 as THE_KEY_COLUMN_1
,THE_KEY_COLUMN_2 as THE_KEY_COLUMN_2
,SOME_VALUE_COL_1 as SOME_VALUE_COL_1
,SOME_VALUE_COL_2 as SOME_VALUE_COL_2
,SOME_VALUE_COL_3 as SOME_VALUE_COL_3
from ZZZ_SOURCE_DATA_SAMPLE
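Using the Python sketch from the Generator section, the same statement could be produced like this (a hypothetical invocation; treat_template refers to the sketch above, and the raw string keeps the literal \n delimiters intact):

metadata = """{
 "SourceName":"ZZZ_SOURCE_DATA_SAMPLE",
 "TargetName":"ZZZ_TARGET_DATA_SAMPLE",
 "cols":[
  {"ColTargetName":"THE_KEY_COLUMN_1","ColKeyFlag":"Y","ColSourceExpr":"THE_KEY_COLUMN_1"},
  {"ColTargetName":"THE_KEY_COLUMN_2","ColKeyFlag":"Y","ColSourceExpr":"THE_KEY_COLUMN_2"},
  {"ColTargetName":"SOME_VALUE_COL_1","ColKeyFlag":"N","ColSourceExpr":"SOME_VALUE_COL_1"},
  {"ColTargetName":"SOME_VALUE_COL_2","ColKeyFlag":"N","ColSourceExpr":"SOME_VALUE_COL_2"},
  {"ColTargetName":"SOME_VALUE_COL_3","ColKeyFlag":"N","ColSourceExpr":"SOME_VALUE_COL_3"}
 ]
}"""

insert_template = r"""insert into {TargetName} (
{cols:[\n ,|{ColTargetName}]}
)
select
{cols:[\n ,|{ColSourceExpr} as {ColTargetName}]}
from {SourceName}"""

print(treat_template(metadata, insert_template))  # prints an insert statement equivalent to the sample above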
MERGE pattern
The pattern template for MERGE in Oracle:
merge into {TargetName} trg$
using(
select
{cols:[\n ,|{ColSourceExpr} as {ColTargetName}]}
from {SourceName}
) src$
on(
{cols:[\n and |trg$.{ColTargetName} = src$.{ColTargetName}?ColKeyFlag]}
)
when matched then update set
{cols:[\n ,|trg$.{ColTargetName} = src$.{ColTargetName}?!ColKeyFlag]}
when not matched then insert(
{cols:[\n ,|trg$.{ColTargetName}]}
)values(
{cols:[\n ,|src$.{ColTargetName}]}
)
And finally the generated sample code:
merge into ZZZ_TARGET_DATA_SAMPLE trg$
using(
select
THE_KEY_COLUMN_1 as THE_KEY_COLUMN_1
,THE_KEY_COLUMN_2 as THE_KEY_COLUMN_2
,SOME_VALUE_COL_1 as SOME_VALUE_COL_1
,SOME_VALUE_COL_2 as SOME_VALUE_COL_2
,SOME_VALUE_COL_3 as SOME_VALUE_COL_3
from ZZZ_SOURCE_DATA_SAMPLE
) src$
on(
trg$.THE_KEY_COLUMN_1 = src$.THE_KEY_COLUMN_1
and trg$.THE_KEY_COLUMN_2 = src$.THE_KEY_COLUMN_2
)
when matched then update set
trg$.SOME_VALUE_COL_1 = src$.SOME_VALUE_COL_1
,trg$.SOME_VALUE_COL_2 = src$.SOME_VALUE_COL_2
,trg$.SOME_VALUE_COL_3 = src$.SOME_VALUE_COL_3
when not matched then insert(
trg$.THE_KEY_COLUMN_1
,trg$.THE_KEY_COLUMN_2
,trg$.SOME_VALUE_COL_1
,trg$.SOME_VALUE_COL_2
,trg$.SOME_VALUE_COL_3
)values(
src$.THE_KEY_COLUMN_1
,src$.THE_KEY_COLUMN_2
,src$.SOME_VALUE_COL_1
,src$.SOME_VALUE_COL_2
,src$.SOME_VALUE_COL_3
)
DIFF MERGE pattern
Similar to the previous one, but eliminating "empty" updates: the where clause of the update branch fires only when at least one non-key column actually differs (lnnvl handles the NULL comparisons), so rows whose values have not changed are not rewritten.
The pattern template for DIFF MERGE in Oracle:
merge into {TargetName} trg$
using(
select
{cols:[\n ,|{ColSourceExpr} as {ColTargetName}]}
from {SourceName}
) src$
on(
{cols:[\n and |trg$.{ColTargetName} = src$.{ColTargetName}?ColKeyFlag]}
)
when matched then update set
{cols:[\n ,|trg$.{ColTargetName} = src$.{ColTargetName}?!ColKeyFlag]}
where
{cols:[\n or |((src$.{ColTargetName} is not null or trg$.{ColTargetName} is not null) and lnnvl(src$.{ColTargetName} = trg$.{ColTargetName}))?!ColKeyFlag]}
when not matched then insert(
{cols:[\n ,|trg$.{ColTargetName}]}
)values(
{cols:[\n ,|src$.{ColTargetName}]}
)
And finally the generated sample code:
merge into ZZZ_TARGET_DATA_SAMPLE trg$
using(
select
THE_KEY_COLUMN_1 as THE_KEY_COLUMN_1
,THE_KEY_COLUMN_2 as THE_KEY_COLUMN_2
,SOME_VALUE_COL_1 as SOME_VALUE_COL_1
,SOME_VALUE_COL_2 as SOME_VALUE_COL_2
,SOME_VALUE_COL_3 as SOME_VALUE_COL_3
from ZZZ_SOURCE_DATA_SAMPLE
) src$
on(
trg$.THE_KEY_COLUMN_1 = src$.THE_KEY_COLUMN_1
and trg$.THE_KEY_COLUMN_2 = src$.THE_KEY_COLUMN_2
)
when matched then update set
trg$.SOME_VALUE_COL_1 = src$.SOME_VALUE_COL_1
,trg$.SOME_VALUE_COL_2 = src$.SOME_VALUE_COL_2
,trg$.SOME_VALUE_COL_3 = src$.SOME_VALUE_COL_3
where
((src$.SOME_VALUE_COL_1 is not null or trg$.SOME_VALUE_COL_1 is not null) and lnnvl(src$.SOME_VALUE_COL_1 = trg$.SOME_VALUE_COL_1))
or ((src$.SOME_VALUE_COL_2 is not null or trg$.SOME_VALUE_COL_2 is not null) and lnnvl(src$.SOME_VALUE_COL_2 = trg$.SOME_VALUE_COL_2))
or ((src$.SOME_VALUE_COL_3 is not null or trg$.SOME_VALUE_COL_3 is not null) and lnnvl(src$.SOME_VALUE_COL_3 = trg$.SOME_VALUE_COL_3))
when not matched then insert(
trg$.THE_KEY_COLUMN_1
,trg$.THE_KEY_COLUMN_2
,trg$.SOME_VALUE_COL_1
,trg$.SOME_VALUE_COL_2
,trg$.SOME_VALUE_COL_3
)values(
src$.THE_KEY_COLUMN_1
,src$.THE_KEY_COLUMN_2
,src$.SOME_VALUE_COL_1
,src$.SOME_VALUE_COL_2
,src$.SOME_VALUE_COL_3
)
Conclusion
To sum up, this chapter covered the topic of modularity and introduced the simplest and smallest viable aspect of pattern generation. In the next installment, we will delve deeper into more intricate patterns for populating data and potentially explore new platforms.