What is Data Modeling ? Differences between the Models and Data Modeling Tools

Data Modeling

A Data model is a conceptual representation of data structures (tables) required for a database
A data model visually represents the nature of data, business rules governing the data, and how it will be organized in the database.
Data modelers are responsible for designing the data model and they communicate with functional team to get the business requirements and technical teams to implement the database.
 


Various Data Modeling Tools

Data modeling tools are the only way through which we can create powerful data models. Following are the list of popular  data modeling tools.
Popular Data Modeling Tools
Tool Name  Company Name
Erwin                      Computer Associates
Embarcadero         Embarcadero Technologies
Rational Rose        IBM Corporation
Power Designer     Sybase Corporation
Oracle Designer     Oracle Corporation
 


Data Modeling Development Cycle

  • Gathering Business Requirements – First Phase:
    Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs.
  • Conceptual Data Modeling(CDM) – Second Phase:   
    This data model includes all major entities, relationships and it will not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE.
  • Logical Data Modeling(LDM) – Third Phase:   
    A logical data model is the version of the model that represents all of the business requirements of an organization.
  • Physical Data Modeling(PDM) – Fourth Phase:
    This is a complete model that includes all required tables, columns, relationship, database properties for the physical implementation of the database.
  • Database – Fifth Phase:
    DBAs instruct the data modeling tool to create SQL code from physical data model. Then the SQL code is executed in server to create databases.



DATA MODELING STANDARDS

  • Several data modelers may work on the different subject areas of a data model and all data modelers should use the same naming convention, writing definitions and business rules.
  • For example, when a data warehouse is designed, it may get data from several source systems and each source may have its own names, data types etc. These anomalies can be eliminated if a proper standardization is maintained across the organization.

Some general guidelines are listed below that may be used as a prefix or suffix for a table.

  • Lookup – LKP – Used for Code, Type tables by which a fact table can be directly accessed.
    e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP
  • Fact – FCT – Used for transaction tables:
    e.g. Credit Card Fact – CREDIT_CARD_FCT
  • History – HIST – Tables the stores history.
    e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST
  • Statistics – STAT – Tables that store statistical information.
    e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT
  • Key – Key System generated surrogate key.
    e.g. Credit Card Key – CRDT_CARD_KEY
  • Identifier – ID – Character column that is used as an identifier.
    e.g. Credit Card Identifier – CRDT_CARD_ID
  • Code – CD – Numeric or alphanumeric column that is used as an identifying attribute.
    e.g. State Code – ST_CD
  • Description – DESC – Description for a code, identifier or a key.
    e.g. State Description – ST_DESC
  • Indicator – IND – to denote indicator columns.
    e.g. Gender Indicator – GNDR_IND
  • Index – Index – IDX – for index names.
    e.g. Credit Card Fact IDX01 – CRDT_CARD_FCT_IDX01
  • Primary Key – PK – for Primary key constraint names.
    e.g. CREDIT Card Fact PK01- CRDT-CARD_FCT_PK01
  • Alternate Keys – AK – for Alternate key names.
    e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01
  • Foreign Keys – FK – for Foreign key constraint names.
    e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01

 


Steps to create a Data Model

  1. Collect Business Requirement
  2. Create domain.
  3. Create Conceptual Model.
  4. Create Entities and add definitions.
  5. Create Logical Model.
  6. Create attribute and add definitions.
  7. Create Physical Model
  8. Assign data type to attribute.
  9. Create primary or unique keys to attribute.
  10. Create check constraint or default to attribute.
  11. Create unique index or bitmap index to attribute.
  12.  Create foreign key relationship between entities.
  13. Add database properties to physical data model.
  14. Create SQL Scripts from Physical Data Model and forward that to DBA.
  15. Maintain Logical & Physical Data Model.
  16. Create a change log document for differences between the current version and previous version of the data model.



Role of a Data Modeler

  • Business Requirement Analysis.
  • Development of data model.
  • Review.
  • Creation of database.
  • Support & Maintenance

Conceptual Data Modeling

  • Conceptual data model includes all major entities and relationships and does not contain much detailed level of information about attributes and is often used in the INITIAL PLANNING PHASE
  • Conceptual data model is created by gathering business requirements from various sources like business documents, discussion with functional teams, business analysts, smart management experts and end users who do the reporting on the database. Data modelers create conceptual data model and forward that model to functional team for their review.

Logical Data Modeling

  • This is the actual implementation and extension of a conceptual data model. A Logical data model is the version of a data model that represents the business requirements(entire or part) of an organization and is developed before the physical data model.
  • As soon as the conceptual data model is accepted by the functional team, development of logical data model gets started. Once logical data model is completed, it is then forwarded to functional teams for review. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

Physical Data Modeling

  • Physical data model includes all required tables, columns, relationships, database properties for the physical implementation of databases. Database performance, indexing strategy, physical storage and denormalization are important parameters of a physical model.
  • Logical data model is approved by functional team and there-after development of physical data model work gets started. Once physical data model is completed, it is then forwarded to technical teams(developer, group lead, DBA) for review. The transformations from logical model to physical model include imposing database rules, implementation of referential integrity, super types and sub types etc.

Relational (OLTP) Data Modeling

  • Relational Data Model is a data model that views the real world as entities and relationships. Entities are concepts, real or abstract about which information is collected. Entities are associated with each other by relationship and attributes are properties of entities. Business rules would determine the relationship between each of entities in a data model.
  • The goal of relational data model is to normalize (avoid redundancy)data and to present it in a good normal form. While working with relational data modeling, a data modeler has to understand 1st normal form thru 5th normal form to design a good data model.

Dimensional Data Modeling

  • Dimensional Data Modeling comprises of one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization etc. Dimension tables store records related to that particular dimension and no facts(measures) are stored in these tables.
  • For example, Product dimension table will store information about products(Product Category, Product Sub Category, Product and Product Features) and location dimension table will store information about location( country, state, county, city, zip. A fact(measure) table contains measures(sales gross value, total units sold) and dimension columns. These dimension columns are actually foreign keys from the respective dimension tables.



Difference between Relational and Dimensional Modeling

Primary level of information Secondary level of information
Tables are units of storage Cubes are units of storage
Data is normalized and used for OLTP. Optimized for OLTP processing Data is denormalized and used in data warehouse and data mart. Optimized for OLAP
Several tables and chains of relationships among them Few tables and fact tables are connected to dimensional tables
Volatile(several updates) and time variant Non volatile and time invariant
Detailed level of transactional data Summary of bulky transactional data(Aggregates and Measures) used in business decisions

Some Important terms and abbrevations

Attribute          An attribute is a part of the description of the entity.
Entity               Any kind of information of importance to the business.
Relationship     A relationship (link) between two entities.
Cube                Collection of Dimension and Facts
Dimension       Textual description of the business
Facts                Numerical units of the business
Normalization   Process of removing the redundancies
OLTP                Online Transaction Processing
OLAP               Online Analytical Processing