BI Trends 2011 – a synopsis

Boris Evelson wrote a white paper for Forrester on BI trends titled Trends 2011 And Beyond: Business Intelligence.  The document is intended for “Business Process Professionals” but I think it is valuable for anyone who thinks about BI and how to improve it.

This post is a synopsis (or, Joel’s Notes) intended to provide the highlights of the paper.

Executive Summary: In short, Evelson maintains that successful BI will be challenging as we move forward:  “While BI maturity in enterprises continues to grow, and BI tools have become more function-rich and robust, the promise of efficient and effective BI solutions remains challenging at best and elusive at worst. Why? Two main reasons: First, BI is all about best practices and lessons learned, which only come with years of experience; second, earlier-generation BI approaches cannot easily keep up with ever-changing business and regulatory requirements…(we) should adopt Agile BI processes, technologies, and architectures to improve (our) chances of delivering successful BI initiatives”.

Why do BI initiatives sometimes fail?

Lack of ability and flexibility can lead to unsuccessful BI initiatives

Implementing BI requires using best practices and building upon lessons learned

  1. End-to-end BI implementations require closely coordinated efforts to put together multiple components like data sourcing, integration, modeling, metrics, queries, reports, dashboards, portals, and alerts.
  2. It’s difficult to define future BI requirements as business and regulatory requirements change.
  3. The business may not have strong requirements to begin with.

BI technologies and processes have not kept pace with business realities

Standard BI solutions are neither flexible nor agile enough to react and adapt to information requirements that seem to change with ever-increasing speed.

Centralization has not led to agile, streamlined BI implementations

Although centralization can reduce costs, eliminate duplication of effort, and provide a “single version of the truth”, it is often too bureaucratic and cumbersome (anything but agile).

The figure below shows the path of BI from the ’90s to the ’10s.


Agility is the key to efficient and effective business processes

Untamed business processes: “Business processes that form in the seams and shadows of the enterprise, require a balance of human and system support, and cross department, technology, information, and packaged application silos to meet end-to-end business outcomes”

Because business processes can change periodically, traditional SDLC doesn’t work well for automating untamed processes.

Forrester defines Agile Business Intelligence as: “An approach that combines processes, methodologies, organizational structure, tools, and technologies that enable strategic, tactical, and operational decision-makers to be more flexible and more responsive to the fast pace of changes to business and regulatory requirements”.

And, I love this quote: “No technology or processes can address BI challenges if a company’s organizational structure and enterprise culture are not already on firm, agile ground” – very reminiscent of my earlier post “It’s not me, it’s my clubs”.

Best practices to prepare your organization for agile BI:

  • Insist on business ownership and governance of BI
  • Emphasize organizational and cultural change management
  • Decouple data preparation from data usage processes in end-to-end BI cycles
  • Approach and treat front- and back-office BI requirements and users differently
  • Establish hub-and-spoke organizational models
  • Use a combination of top-down and bottom-up approaches to BI design and applications
  • Use Agile development methodologies
  • Enable BI self-service for business users

Embrace change

Evelson continues “Once the organization is aligned for agility, the next step is to consider and implement agile BI processes”.

Once you’ve developed your swing, it’s time for a new set of clubs (Next generation technologies).

Next-generation technologies are the future of agile BI – there are four major sub-categories:


  1. Automated – eliminates manual work to free up resources for true value-added tasks.
  2. Unified – brings different BI use cases together in a unified platform.
  3. Pervasive – makes BI applications available whenever strategic, tactical, and operational decision-makers need to analyze information.
  4. Limitless – BI must operate on information without any borders or restrictions.

Evelson goes on to list many vendors that represent each of the four sub-categories above.  (Here are some of the vendors, without the additional information that the white paper lists about features.)
Automated – Composite Software Discovery, IBM InfoSphere Discovery, Appian, Lombardi, Savvion, Alteryx, Endeca, Cognos AAF, JackBe, Kalido, etc.
Unified – Attivio, Endeca, TIBCO Spotfire, Information Builders, Oracle, Cognos, Alteryx, Quiterian, Kalido, Business Objects, SAS, etc.
Pervasive – Most BI vendors, Appian, Fujitsu, Global360, HandySoft, Lombardi, Metastorm, Savvion, Business Objects, TIBCO Spotfire, Oracle, Information Builders, QlikTech, etc.
Limitless – Most BI vendors, PowerPivot, QlikTech, Quiterian, Attivio, Endeca, Saffron, TIBCO Spotfire, SISense, etc.

The real meat of the white paper is to say: Agile and Next-Gen BI may be the future of successful BI implementations.  Embrace it, keep an open mind about new technologies, don’t be afraid to let the business have it, but prepare your organization first.  There is no substitute for good process, best practices, strategic vision, and proper organizational structure.  Get these things right and you will be prepared to provide the kind of Business Intelligence that the business needs.

You can get the white paper here (you have to register).

Just give me everything…

I recently posted the following question on a group discussion in LinkedIn:

“Have you ever been gathering requirements for a DW/BI solution and the business says “Just give me everything in an ad hoc universe” or “we want everything that we had before”?  I thought it would be interesting to see different perspectives and techniques for dealing with this.  What are your thoughts?”

There were some good responses and I’d like to just pull out a few here as well as provide my own comments.

First of all, I’d like to point out that the consensus seems to be “start small and use an iterative approach”.  I like the way Barry Meltzer put it: “How do you eat an elephant? … One bite at a time”.  His approach is outlined as follows:

  • Describe the ‘bites’ (i.e., subject areas and desired outputs)
  • Determine which ‘bites’ must go together to be useful
  • Determine any dependencies between the ‘bites’
  • Discuss which ‘bites’ are ‘most broken’ and ‘most important’

“With that information in hand, you can work out a priority list (what do we attack first?) with your sponsors and then go into analysis, prototyping, etc”

Another person’s approach “is to have a patient hearing of business problems, challenges and try scoping the “Everything”. It’s an iterative process but works well, as business is getting a buy into it and when they see the articulation of their problems, “Everything” in front of them along with business case of “impact of not doing it”.”

Kent Graziano and Aran Nathanson both prefer to start with a prototype and iterate from there.  Aran mentioned a Ford quote that I like: “If I had asked my customers what they wanted they would have said a faster horse”.  I agree with Aran that the business users don’t always know (or know how to articulate) what they want or need.  He states that the prototype “will also give people something concrete they can understand and play with rather than a conceptual framework which might look great on a PowerPoint slide but has no real meaning for most users”.

Kent’s prototyping approach is to start with “some basic, obvious data (based on knowing a bit about their business), then iterate from there.  Of course it will help if you have an easy-to-use reporting or visualization tool, access to the right data (EDW, Data Vault, Data Mart, or even a virtualization layer on the OLTP data), and some engaged user champions”.

I do believe that the prototype approach with iterations is an effective way of getting to real business value, but be wary of unmanageable scope creep.  Be sure to set specific goals and deliverables for each iteration or risk falling into the project abyss.

One approach that I have found useful is to get a small group of stakeholders in a room to facilitate a discussion. I try to frame the discussion in a manner that lets the business know that in order to provide the best solution, I need to ensure that I understand the business need, the data supporting the business need, and how the data will be used (and viewed). I usually start by having the business list about 15 to 20 business questions that they are expecting to have answered with the BI solution. Once we have the questions, we can group them into categories and usually eliminate much redundancy. This process usually narrows the focus from “everything” to a much more manageable subset. Whiteboarding some mockups helps to ensure that we are all on the same page. With the subset and mockups, the business usually responds favorably to an iterative approach where we start with a solution that provides answers to the key questions that they have identified rather than the more expensive “everything” approach.

Obviously, there are many variables and factors in play, such as: resources, knowledge of the data, knowledge of the business, time, historical successes and failures, tools, etc.  And, as such, there is never just one solution.  But, maybe this discussion will play a part in the approach you choose when the business asks for “everything”.

If you are a member of “TDWI’s Business Intelligence and Data Warehousing Discussion…” on LinkedIn, you can view the full thread of the discussion at: http://goo.gl/I37Cm

Thanks to the following people for allowing me to quote them:

Barry Meltzer
Kent Graziano
Aran Nathanson (www.nprinting.com)

More on Validating the Data Model

Data Model Validation

“The validation activities are all related to the physical structure of the data. As such, validation of this model is the responsibility of IT personnel, and the primary roles belong to data analysts, database administrators, security administrators and programmers.”

Click on the link below for a quick read on the roles and responsibilities around data model validation.

More on Validating the Data Model

Advanced Display Options and an End to Custom Shapes (via Power Designing)

A nice solution to visually display metadata about a model utilizing new features in PowerDesigner (from Richard Kier’s Power Designing blog).

Advanced Display Options and an End to Custom Shapes Okay, so it’s been a LONG time since I’ve posted, but two small kids and a job (okay, mostly the kids) will do that.  Anyway … We’re preparing to upgrade to PowerDesigner 15.3 (sadly from 15.0) and one of the new features I’m very excited to get my hands on is the new advanced display controls available.  Previously, you were fairly limited to the information you could display on a diagram.  Each symbol gave you the ability to select among the … Read More

via Power Designing

It’s not me, it’s my clubs!

My golf game isn’t exactly on par these days but I’m pretty sure all I need is a new set of clubs.  I mean, I’m playing with some really old Wilson – Sam Snead blades and a wimpy driver.  If I just had a set of new Taylor Made Burner 2.0 irons and maybe the R11 TP driver, I think I could be ready to tour.  Well, maybe not…  Actually, I think if I took some lessons, hit the driving range on a regular basis, read up on the latest tips and techniques, maybe joined a league, and got some new clubs, I could probably be good enough to post my actual handicap here.

Can you make a similar correlation to Data Warehousing and Business Intelligence?  Yes, I believe so.  You can buy the latest and greatest tools, beef up your infrastructure, and invest in the hottest emerging technologies, and you may have data in the hands of the business in record time.  But, did you hit the green or are you in the rough?  (Is the data accurate, does it meet the business needs, and can you maintain it sufficiently and in a timely manner?)

Sure, the latest technologies will help you get there, but it’s your swing that really matters.  In data warehousing and business intelligence, the swing is composed of a strategic vision, partnership with the business, a well-defined architecture, design and development best practices and processes, and the right resources.  These are the building blocks that form the foundation from which success is created.

So, what does it take to build the right foundation?

  • Truly understand the needs of the business and develop a strategy to meet them
  • Research tools and trends, as well as companies that have implemented them
  • Select tools that align with the strategy
  • Create a well-documented architecture that adheres to a solid methodology
  • Utilize resources that truly understand data management and business intelligence (it’s not good enough to be an expert on the tool)

Oh, there are plenty of other things to consider such as a metadata strategy, data governance, master data management, etc.  But those are topics for another day…

There are some really cool tools and promising trends right now, such as Tableau, Spotfire, data virtualization, in-memory databases, cloud computing, column-oriented databases, predictive analytics, advanced data visualization, etc.  But, in order to implement these, a strong foundation is required.

Will new clubs help?  Yep, I think so.  But, I also need to work on my swing.

And, Honey if you are reading this, I have a great idea for a Father’s Day gift…

Updating very large tables in Oracle

How do you update very large tables in Oracle (tens or hundreds of millions of records)?  You don’t!

Ok, this isn’t anything new and it’s not rocket science, but maybe it’s a technique that you haven’t used before.  If you’ve tried to update millions or hundreds of millions of records in a table, you know that this can be an extremely time-consuming process that ties up resources (and may end up crashing anyway).  Here’s a simple solution that avoids the update statement altogether (and all the undo and redo associated with it):

Let’s say you have a table with 50 million records and you need to update one of the columns by performing a calculation or applying a function.

At a high level…

  • Create a new table by selecting everything from the old table and performing the calculation in the select clause

Create table TABLE_NEW nologging as select <perform calculation here> from TABLE_OLD;

  • Apply all constraints, indexes, grants, etc. to the new table
  • Drop the old table
  • Rename TABLE_NEW to TABLE_OLD;
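Putting the steps together, here is a minimal sketch (the column names, the calculation, and the constraint/index/grant details are hypothetical – re-create whatever actually exists on the old table):

create table TABLE_NEW nologging as
select order_id,
       customer_id,
       round(order_amount * 1.05, 2) as order_amount,  -- the "update" happens here
       order_date
from TABLE_OLD;

alter table TABLE_NEW add constraint table_new_pk primary key (order_id);
create index table_new_cust_idx on TABLE_NEW (customer_id);
grant select on TABLE_NEW to reporting_user;

drop table TABLE_OLD;
alter table TABLE_NEW rename to TABLE_OLD;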

Note that the nologging option minimizes redo generation for the create-table-as-select, which is where most of the performance improvement comes from (the trade-off is that the newly created table is not recoverable from the redo logs until it has been backed up).

That’s it!

Of course, there are other options like partition swapping or writing a stored procedure (never do procedurally what can be done in a single SQL statement).  But, I like this approach because it is safe and clean.

Writing effective definitions for the data model

It is the data modeler’s responsibility to document robust definitions of all objects in the data model.  Of particular importance are table and column definitions because they are the most visible objects to consumers of the data (Designers, Developers, Report Users, etc.).

Why is it important to write good table and column definitions?

  • Ensures consistent usage of the object – Providing a clear definition makes it more likely that the next designer or developer will continue to populate the object with the same intent that you had.
  • Allows for more accurate data integration – Data Integration involves pulling data from multiple systems into one.  When doing so, it is critical that the data you are integrating from different systems means exactly the same thing.  A combination of effective metadata and data profiling will ensure that your integration is accurate.
  • Provides a clear understanding of the contents for downstream audiences – Whether your database is transactional, an operational data store, or a data mart, it will be consumed by users and systems other than the one for which you designed it.  So, it is imperative that you provide enough information that everyone uses the data consistently.

What is an effective definition?
An effective definition is a description of an object that is clear and concise and includes as much information as possible (in as few words as possible) to eliminate any ambiguity about the object.

Parts of a definition:

  1. The term to be defined
  2. The class of object or concept to which the term belongs
  3. The differentiating characteristics that distinguish it from all others of its class
  4. Examples (not necessary, but they sometimes help to further define the entity or attribute)

For Example…
Region (term) is a logical and geographical representation of the sales area (class).  Region is the highest level of the sales area hierarchy: Region, District, Territory (differentiating characteristics).  It may include values such as “Latin America”, “United States – West”, “Canada”, etc. (examples)
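Once the definition is written, it is worth storing it in the database as well as in the model so that downstream consumers can see it from their own tools.  In Oracle, for example, a definition can be attached directly to the table or column as a comment (a quick sketch – the table and column names here are hypothetical):

comment on table SALES_REGION is 'A logical and geographical representation of the sales area; the highest level of the sales area hierarchy (Region, District, Territory).';
comment on column SALES_REGION.REGION_NAME is 'The name of the region, e.g., "Latin America", "United States - West", "Canada".';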

How to get started…
If possible, have a business subject matter expert provide definitions.  Refine the definition to meet the standards of an effective definition and have the business SME review for accuracy (and approve, if that process is set up).  If you don’t have a business SME or data steward, use whatever resources are at your disposal (application metadata, Wikipedia, Google, industry standards documentation, etc.).

Tips for writing an effective definition

  • Avoid beginning the definition with “Contains…”, “A table that…”, “A column that…”, “Information about…”
  • Remember your audience. You are not writing for those who already understand what the term means.
  • Use expressions that help to distinguish the term from closely related terms.
  • Avoid close synonyms as much as possible.
  • Use business resources, data stewards, other application metadata when possible.
  • Write good.  Write well!  Use good grammar.
  • Be accurate!

* “Parts of a definition” originated from the Purdue Online Writing Lab

Junk Cross Joiner with a full twist

No, it’s not a snowboard trick.  It’s the Oracle CROSS JOIN command used to populate a junk dimension.

First of all, if you are not familiar with junk dimensions, here’s a brief description:  Junk Dimensions (sometimes called mystery dimensions) are a way of cleaning up your dimensional model by consolidating flags, indicators, and other “junk” (that don’t really fit into your core dimensions) into one dimension.  Sometimes, designers throw these attributes in the fact table and sometimes they create many small dimensions.  By consolidating them into one “junk” dimension, your fact table remains clean and the model uncluttered by unnecessary dimensions.

Now, it’s not necessarily difficult to populate a junk dimension; however, it can be a tedious task if you have lots of junk.  The reason is that you need to populate your dimension to include every possible combination of attributes.  If you have hundreds of thousands or even millions of theoretical combinations, you may want to generate the dimension on the fly and only create the actual combinations (vs. theoretical possibilities).  However, if you have a fairly small number of unique combinations, it probably makes sense to create the dimension up front with all possible scenarios.  Here is a quick way to ensure that you’ve accounted for every possibility.

Let’s say we just have two columns that we want to put in the dimension: DISCOUNT_APPROVAL_CODE and PREFERRED_ORDER_FLAG.  The discount_approval_code only has two possibilities: ‘A’ and ‘R’.  And, the preferred_order_flag has two possibilities: ‘Y’ and ‘N’.

The first step is to create simple queries that list the distinct possibilities of each of the columns:
select 'A' discount_approval_code from dual
union
select 'R' from dual

Returns:

DISCOUNT_APPROVAL_CODE
A
R

select 'Y' preferred_order_flag from dual
union
select 'N' from dual

Returns:

PREFERRED_ORDER_FLAG
N
Y

Now, we cross join these two statements using the (you guessed it) CROSS JOIN command:
select discount_approval_code, preferred_order_flag
from
(select 'A' discount_approval_code from dual
union
select 'R' from dual)
cross join
(select 'Y' preferred_order_flag from dual
union
select 'N' from dual)

The result is a distinct list of the possible scenarios:

DISCOUNT_APPROVAL_CODE PREFERRED_ORDER_FLAG
A N
A Y
R N
R Y

Now, you wouldn’t need to put together a query for such a simple table, but if you have multiple columns and some with multiple possible values, you may want to consider it.  You’ll see that if we add one more column with four possible values, it increases the number of rows in the table to 16:

select discount_approval_code, preferred_order_flag, payment_method
from
(select 'A' discount_approval_code from dual
union
select 'R' from dual)
cross join
(select 'Y' preferred_order_flag from dual
union
select 'N' from dual)
cross join
(select 'Cash' payment_method from dual
union
select 'Check' from dual
union
select 'Credit' from dual
union
select 'Trade' from dual)

Result:

DISCOUNT_APPROVAL_CODE PREFERRED_ORDER_FLAG PAYMENT_METHOD
A N Cash
A N Check
A N Credit
A N Trade
A Y Cash
A Y Check
A Y Credit
A Y Trade
R N Cash
R N Check
R N Credit
R N Trade
R Y Cash
R Y Check
R Y Credit
R Y Trade

Now, let’s add the surrogate key and one more code column along with a description column and see what it looks like.

select rownum, x.*
from
(
select discount_approval_code, preferred_order_flag, payment_method, contractual_agreement_code, contractual_agreement_desc
from
(select 'A' discount_approval_code from dual
union
select 'R' from dual)
cross join
(select 'Y' preferred_order_flag from dual
union
select 'N' from dual)
cross join
(select 'Cash' payment_method from dual
union
select 'Check' from dual
union
select 'Credit' from dual
union
select 'Trade' from dual)
cross join
(select 'S' contractual_agreement_code, 'Signed' contractual_agreement_desc from dual
union
select 'P', 'Pending' from dual
union
select 'NA', 'Not Applicable' from dual)
) x

Result (48 records):

ROWNUM DISCOUNT_APPROVAL_CODE PREFERRED_ORDER_FLAG PAYMENT_METHOD CONTRACTUAL_AGREEMENT_CODE CONTRACTUAL_AGREEMENT_DESC
1 A N Cash NA Not Applicable
2 A N Check NA Not Applicable
3 A N Credit NA Not Applicable
4 A N Trade NA Not Applicable
5 A N Cash P Pending
6 A N Check P Pending
7 A N Credit P Pending
8 A N Trade P Pending
9 A N Cash S Signed
10 A N Check S Signed
11 A N Credit S Signed
12 A N Trade S Signed
13 A Y Cash NA Not Applicable
14 A Y Check NA Not Applicable
15 A Y Credit NA Not Applicable
16 A Y Trade NA Not Applicable
17 A Y Cash P Pending
18 A Y Check P Pending
19 A Y Credit P Pending
20 A Y Trade P Pending
21 A Y Cash S Signed
22 A Y Check S Signed
23 A Y Credit S Signed
24 A Y Trade S Signed
25 R N Cash NA Not Applicable
26 R N Check NA Not Applicable
27 R N Credit NA Not Applicable
28 R N Trade NA Not Applicable
29 R N Cash P Pending
30 R N Check P Pending
31 R N Credit P Pending
32 R N Trade P Pending
33 R N Cash S Signed
34 R N Check S Signed
35 R N Credit S Signed
36 R N Trade S Signed
37 R Y Cash NA Not Applicable
38 R Y Check NA Not Applicable
39 R Y Credit NA Not Applicable
40 R Y Trade NA Not Applicable
41 R Y Cash P Pending
42 R Y Check P Pending
43 R Y Credit P Pending
44 R Y Trade P Pending
45 R Y Cash S Signed
46 R Y Check S Signed
47 R Y Credit S Signed
48 R Y Trade S Signed
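Once the list of combinations looks right, the same query can feed a create-table-as-select to build the dimension itself, with rownum serving as the surrogate key (a sketch – the table and key names are hypothetical):

create table JUNK_DIM as
select rownum junk_key, x.*
from
(
select discount_approval_code, preferred_order_flag, payment_method, contractual_agreement_code, contractual_agreement_desc
from
(select 'A' discount_approval_code from dual
union
select 'R' from dual)
cross join
(select 'Y' preferred_order_flag from dual
union
select 'N' from dual)
cross join
(select 'Cash' payment_method from dual
union
select 'Check' from dual
union
select 'Credit' from dual
union
select 'Trade' from dual)
cross join
(select 'S' contractual_agreement_code, 'Signed' contractual_agreement_desc from dual
union
select 'P', 'Pending' from dual
union
select 'NA', 'Not Applicable' from dual)
) x;

alter table JUNK_DIM add constraint junk_dim_pk primary key (junk_key);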

Now, here’s the twist:  You don’t really need the CROSS JOIN command at all.  You can substitute it with a comma and you will get the exact same results via Cartesian joins.  What’s the difference?  Well, the CROSS JOIN command makes it look like you know what you are doing.  Seriously, the only reason to use CROSS JOIN is to show anyone else viewing the code that you intended to use a Cartesian join.
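For example, substituting a comma for the first CROSS JOIN in the query above produces exactly the same result:

select discount_approval_code, preferred_order_flag
from
(select 'A' discount_approval_code from dual
union
select 'R' from dual),
(select 'Y' preferred_order_flag from dual
union
select 'N' from dual)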

Generating a source to target from PowerDesigner

“Creating a method to export an Excel source to target mapping (utilizing Extended Attributes in Power Designer)”

In my last post, I demonstrated how to create Extended Attributes in Power Designer that can be used to track data lineage.  In this post, I will show you how to create a method that allows you to export an Excel source to target mapping that can be used for the creation of an ETL specification.

First of all, here are the Extended Attributes that we created.

And, here is an example of how they are utilized for a column.

Next, we will create the method.  Open your Extended Model Definition, right click on “Profile” and select “Add Metaclasses…”.

Check “Table”.

Next, right click on “Table” in the left column of the Extended Model Definitions Properties dialog box and select New => Method.

Name your new method something like “Export Source to Target” (this will be listed as a selection when you right-click on any table in the model).  Next, click on the “Method Script” tab and start coding.  The full script is listed at the end of this post; here is how it breaks down:

The first part of the script declares the variables, starts Excel, and defines the header row of the mapping spreadsheet.

The next part of the script loops through the columns of the table and pulls out the necessary information to populate the columns that we have just defined.  For the “Source…” columns, we pull information from the extended attributes that we created (see the previous post).  For the other columns, we pull information from the target column properties.

The last part of the script simply expands the width of each spreadsheet column to fit its contents.

And, that’s the entire method.  Now, we have to create a “Menu” item that allows us to call the method.

In the Extended Model Definitions properties dialog box, right click on “Table” and select New => Menu.

On your new menu, click on the “add new method” icon:

Check the method that you created.

And, that’s it!  Click “Ok” to close the Extended Model Definitions dialog box.

Now, select any table for which you have populated your Extended Attributes.  Right click on the table and select “Export Source To Target”.

And, voila!!!  Excel should open and populate your source to target mapping that looks something like this:

You can also do this at the model level so it creates a source to target document that contains every table in the model.  You just need to create the method on the Model object instead of the Table object, create the Menu on the Model object, and modify the method to loop through all of the tables.

Here is the entire method in text form so you can copy and paste if you like:

Sub %Method%(obj)
   Dim vExcel
   Dim lineNum
   Dim Column

   ' Start Excel and create a new workbook
   lineNum = 2
   Set vExcel = CreateObject("Excel.Application")
   vExcel.Visible = True
   vExcel.Workbooks.Add

   ' Write the header row of the mapping spreadsheet
   vExcel.Range("A1").Value = "Source Table"
   vExcel.Range("B1").Value = "Source Column"
   vExcel.Range("C1").Value = "Source Data Type"
   vExcel.Range("D1").Value = "Source Key"
   vExcel.Range("E1").Value = "Source Nulls Allowed"
   vExcel.Range("F1").Value = "Source Column Order"
   vExcel.Range("G1").Value = "Transformation Rules"
   vExcel.Range("H1").Value = "Target Table"
   vExcel.Range("I1").Value = "Target Column"
   vExcel.Range("J1").Value = "Data Type"
   vExcel.Range("K1").Value = "Key"
   vExcel.Range("L1").Value = "Mandatory"

   ' One row per column: source details come from the extended attributes,
   ' target details come from the column properties
   For Each Column In obj.Columns
      vExcel.Range("A" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Source_Table")
      vExcel.Range("B" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Source_Column")
      vExcel.Range("C" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Source_Data_Type")
      vExcel.Range("D" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Source_Key")
      vExcel.Range("E" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Nulls_Allowed")
      vExcel.Range("F" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Source_Column_Order")
      vExcel.Range("G" + CStr(lineNum)).Value = Column.GetExtendedAttribute("Transformation")
      vExcel.Range("H" + CStr(lineNum)).Value = Column.Table.Code
      vExcel.Range("I" + CStr(lineNum)).Value = Column.Code
      vExcel.Range("J" + CStr(lineNum)).Value = Column.DataType

      ' Flag the target column as a primary key, foreign key, or neither
      If Column.Primary Then
         vExcel.Range("K" + CStr(lineNum)).Value = "P"
      ElseIf Column.ForeignKey Then
         vExcel.Range("K" + CStr(lineNum)).Value = "F"
      Else
         vExcel.Range("K" + CStr(lineNum)).Value = "NA"
      End If

      ' Flag whether the target column is mandatory (NOT NULL)
      If Column.Mandatory Then
         vExcel.Range("L" + CStr(lineNum)).Value = "Y"
      Else
         vExcel.Range("L" + CStr(lineNum)).Value = "N"
      End If

      lineNum = lineNum + 1
   Next

   ' Autofit the spreadsheet columns
   vExcel.Columns("A:L").EntireColumn.AutoFit

End Sub

Fun with Extended Attributes

“Creating Column Level Extended Attributes in Power Designer”

In this post, I will demonstrate how to add extended attributes to a data model.  You can read the Power Designer documentation to learn how to do this, but here I will give a specific real-world example of how Extended Attributes can be useful.

In this case, we will be adding extended attributes at the column level that allow us to document a portion of the data lineage for a data mart.  If you own the data model for the source system, you can create dependencies from one model to another.  However, if you don’t have the source model, extended attributes can be a good way of documenting the dependencies.

We will add extended attributes that will also allow us to create a source to target mapping that can be used to assist in the creation of the ETL spec.

To create Extended Attributes, you first need to create a new Extended Model Definition.  To do so, right click on the Model and select New => Extended Model Definition.

Name your Extended Model Definition.

Right click on “Profile” to add a metaclass.

Select “Column”.  You will use this to add extended attributes to columns in the next step.

After the “Column” metaclass has been added to the Profile, right-click on “Column” to add “Extended Attributes”.  Extended attributes are basically custom attributes that you can add to a column to track things such as source column information.  More on that later…

Let’s name the first extended attribute “Source_Table” and change the data type to “Text”.

Now we’re going to repeat the above step to add the following extended attributes: Source_Column, Source_Data_Type, Nulls_Allowed, Source_Column_Order, Source_Key, Transformation.

Now that we’ve created the extended attributes, we can start using them in the diagram.  First, open the column properties dialog box and click on the “Extended Attributes” tab.  Then, enter the values for the new extended attributes.  In this case we have added source system information that enables data lineage and traceability.

Obviously, you can create whatever extended attributes you want or need for your source to target mapping.  These extended attributes will be available for every column in the model and can be used in any reports.

In my next post, I will demonstrate how to create a method that exports the source to target mapping to an Excel file utilizing these attributes.
