
|
|
System Innovations
-- Integration modeling, data warehousing, data strategy
-- |
Steps of the Process
The model below defines
a methodology for building a data warehouse. As the model shows, it is a very
iterative process, where the idea is to get all the way through the process (or a
subprocess) with a component of data, then add another component and get it all the way
through the process again. It is a model that most of the data warehousing industry
now follows, developed because the warehousing process is essentially a process of
discovery. The iterative approach allows us to build in new knowledge gained during
data research, data design, build and testing, and application deployment, as we move
ahead in the overall project.

Project Initiation
Data Business Area Definition
Data Research
Data Sourcing
Data Model Completion
Perform Multidimensional
Modeling
Physical Model Implementation
Performance Tuning
Business Process Definition
Technical Architecture
Definition
Application Definition
Application Engineering
Distributed Deployment
Project Initiation
Define Business Case, Scope and Feasibility.
Seek ongoing sponsorship and executive buy-in.
Identify allies early, establish their goals, and include in the scope.
Back to steps
Data Business Area Definition
Keep the scope focused and manageable.
Determine areas of the business most important to the organization at this time.
Priority depends on:
How much information is available through automation already?
What are the industry trends and competition?
What are the events driving the business at this time?
How much is known and understood about this area of the business?
As the above factors are clarified, what will emerge is an understanding that
Certain areas can be
researched and "mined" for data immediately, the "low-hanging
fruit";
Some areas are well
understood and the amount of research can be estimated and started immediately, the
"near-term" deliverables;
And thirdly, other
areas are little-understood and need further research before they can even be estimated,
the more "long-term" areas of interest.
Conducting this type of
analysis early and often will keep a data warehouse project focused on the real goals of
the business and will assure that the research and sourcing that is undertaken is
feasible, meaningful and realistic.
Back to steps
Data Research
Determine how the data will be used (Business Process).
Inventory, question, define business rules for sending (source) applications.
Expect receiving business process will undergo a transformation as the result of new
awareness and new understanding due to having more information than ever before.
Expect the requirements to change quickly because the access to information represented by
an effective Data Warehouse will cause rapid growth and change in the functions using the
information.
Research data availability and value.
Continually evaluate priority and benefit of desired data vs. cost and time involved in
acquiring it.
Back to steps
Data Sourcing
Identify systems, suppliers and vendors that will provide data.
Prioritize based on revenue implications, strategic and competitive concerns as well as
usage and actual availability.
Gather valuable information from sourcing to be returned to the next iteration of Business
Area Definition and Data Research as data discovery results.
Equip and train teams performing sourcing to collect information to be returned to
Researchers.
Back to steps
Data Model Completion
Select a portion of the Enterprise Model to implement through the
analysis of the Business Area.
Define data relationships and resolve anomalies like many-to-many relationships.
Back to steps
Perform Multidimensional Modeling
Define Star schemas for data constructs, providing greater flexibility
and understandable data structures for the business user.
Define facts, dimensions, hierarchies and sparsity.
Define and model common business dimensions such as time, scenarios, geography, product
and customer using Modeling techniques like the hypercube, star diagrams and snoflakes.
Back to steps
Physical Model Implementation
Depending on the performance requirements and other architecture
considerations of the system, choose:
Relational DBMSs such as Sybase, SQL Server, Oracle, Informix/RS6000, DB2,
or
Multidimensional Databases like PaBLO, PowerPlay/Cognos, Mercury, ESSBase/Arbor, Lightship
Server, Acumate ES/Kenan, Express/Oracle, Gentium or Holos
or
Relational OLAP (or OLAP-on-Relational) tools like Metaphor, Information Advantage's
AXSYS, Prodea's Beacon, Redbrick Data Warehouse, Alpha Warehouse (ISI & Digital) or
Decision Warehouse from Sun Systems.
Back to steps
Performance Tuning
Determine workload
Bitmap indexing significantly improves response time over traditional indexing methods by
greatly reducing the number of read operations to the data. Bitmap indexing also
allows more users to access the warehouse simultaneously makes it easier for users to pose
a series of queries to analyze the data achieves an acceptable level of responsiveness
with a lower hardware expenditure than with traditional indexes.
Back to steps
Business Process Definition
Define/validate the process to be automated through accessing the data
warehouse.
Utilize Business Modeling to clarify and redesign if necessary.
Highlight decision-support requirements for analysis.
Back to steps
Technical Architecture Definition
Define the hardware and technical constructs to be applied in providing
the business solution.
Select core technologies (platform, networking, RDBMS, etc.).
Address structural issues.
Data Warehousing architecture includes:
- Sourcing of data from legacy systems and other applications (data collection, editing
and preparation).
- Transformation and data integration, with storage issues resolved and atomic vs.
Composite levels of granularity defined.
- Distribution of data to actual warehouse tables (further distribution to subject area
databases or Data Marts is sometimes in order).
Consideration should be given to the nature of the data, answering the primary
question of whether it serves the informational community or the operational.
Back to steps
Application Definition
Define the processes that will access and manipulate the data.
Consider incorporating an Application Server, removing the business logic to a
Three-tiered architecture which addresses performance, reliability and resource management
by moving complex application logic to an application server independent of server
databases and Client PCs:
Application server provides efficient data access, accelerated response, scheduled
background processing and serving of pre-processed reports.
Client is dedicated to presentation logic and services, and has an API for invoking
the applications in the middle layer.
Database server is dedicated to data services and file services.
Back to steps
Application Engineering
Develop physical design, test plans and scripts with expected results
and build plan.
Application Definition, Engineering and deployment are iterative, with successive
prototypes developed, providing early requirements gathering to third-level systems that
will graduate to production.
Always put test data in test tables; never load it to production. (Early prototyping
efforts in data warehousing often deliver data that is below performance thresholds for
accuracy. Performance threshold should be defined early in the process, and signed
off as part of the data research phase. Warehoused data is sometimes not required to
be 100% accurate - certain marketing applications for instance can tolerate a margin of
error up to 5%. Expectations must be documented and adhered to, but test data that
falls below standards should not be loaded to production databases.)
Back to steps
Distributed Deployment
Deploy architecture: server, client and application.
Consolidate meta-data (data about the data) for publication.
Data credibility hinges on managed expectations and consensus in the definition stage
Back to steps
Return to Data Warehousing
Top
of Page
[Company Profile] [Services] [Home] [Books
& Articles] [Laura's
Bio]
- Copyright 1998,
1999, 2000. Laura Brown,
LBPI, Inc. (DBA: System Innovations)
- Last Updated: August 18, 2000
|