Data warehousing tools

The term data warehousing was coined just six years ago by W.H. Inmon, a noted database expert and founder of Pine Cone Systems Inc. of Edgewood, Colo. Though widely used, it's also largely misunderstood. Before we look at specific types of data warehousing tools, let's clarify a few terms.



The term data warehousing was coined just six years ago by W.H. Inmon, a noted database
expert and founder of Pine Cone Systems Inc. of Edgewood, Colo. Though widely used,
it’s also largely misunderstood. Before we look at specific types of data warehousing
tools, let’s clarify a few terms.


A data warehouse is a large collection of data from multiple sources within an
organization. The data is used to support enterprise decision-making, forecasting and
planning. It is the end use of that data—the support of enterprise-level
decision-making—that defines a data warehouse.


A related, sometimes misused, term is data mart, which is a smaller, more localized
version of a data warehouse. A data mart is designed to serve a division, department,
location or other entity within an enterprise.


Typically, agencies will implement a data warehouse by assembling data from legacy and
operational database systems throughout the enterprise, although the warehouse itself is
not a production database. Because the original data comes from multiple sources in a
variety of usually incompatible formats, a lot of processing is needed both to create the
data warehouse and to use it.


Hundreds of products are at play in the data warehousing market, which today may be
second only to the Internet as a fertile ground for marketers. Products range from
all-in-one integrated suites, which, their makers claim, can handle the entire data
warehousing job—compilation, management and analysis—to specialized niche tools
that do only one job. Broadly, these specialized tools form three groups:


A data warehouse is an enterprise database that’s gone about as far as it can go.
To take it to these ends, you’re going to need more than a few software tools.
We’re talking about serious computer hardware and massive amounts of online storage,
sometimes measured in multiple terabytes and, in some cases, conceivably even exabytes.


The database management system component of data warehousing is not fundamentally
different from any other database.


According to the Meta Group Inc. of Stamford, Conn., most data warehouses today are
managed with standard Unix relational DBMSes from Sybase Inc., Oracle Corp., IBM Corp. or
Informix Software Inc. of Menlo Park, Calif. It is worth noting, however, that some
DBMSes, such as Informix’s Red Brick Warehouse, are designed for data warehousing.


In this Buyers Guide, I will concentrate on tools for assembling and converting the
data from its original sources, and products that perform sophisticated queries and
analyses.


I’ve also eliminated those products oriented toward smaller-scale projects. For
example, if a particular server is specified to run on Microsoft Windows 95 and use
Microsoft Access as its main DBMS, it’s just not a realistic candidate for managing
anything that should be called a data warehouse. Database, yes; data mart, possibly; data
warehouse, no way.


When creating a data warehouse, it’s almost inevitable that you’ll get data
from a lot of systems that don’t usually talk to one another—assorted
mainframes, distributed departmental-level systems such as IBM AS/400s, servers running a
variety of Unix flavors on a wide variety of hardware, and even from LAN-based data on
Novell NetWare, Banyan Vines or Microsoft Windows NT systems, and smaller, proprietary
systems, too.


And you may well be getting data from outside sources: customers, suppliers, and state
and federal agencies.


Assembling data from such multiple sources requires a serious effort at conversion,
both in making the physical transfers workable and in mapping the data structures into
something that will handle all the incoming data. The second of those two jobs is where
the greatest amount of work is involved.


Keep in mind that you may have to accommodate both fixed-length data fields and
records—those 80-column card systems persist—and variable-length fields. Data
from multiple-table relational databases, maybe normalized and maybe not, will have to be
matched with data from hierarchical network-based and object-oriented databases.  


For example, you’ll encounter a variety of date structures that have to be
reconciled over and above year 2000 issues. Expect also to encounter different accounting
systems, different ways of handling null entries, mismatched file-naming conventions and
duplicated part numbers—everything except a simple answer.


Before you can select a product, you need to create a comprehensive inventory of all
the machines, physical locations, DBMSes and networks from which you will get the data to
populate the warehouse. And you’ll also have to figure out how much data will have to
be moved and how often.


After you’ve done such an inventory, you’re ready to look at some of the
acquisition tools in the accompanying table. Given the uniqueness of each organization and
its warehousing needs, there’s no shortcut to doing your homework and seeing just
which products might be useful in your situation.


OK, you’ve collected the data from all your sources, massaged it so that you can
make comparisons, and it’s all assembled in a database managed by a good DBMS. Now
what?


Most queries that are run against databases outside of warehouses are specific, so
you’re looking for definite information that’s presented in a definite form.
That’s fine as far as it goes, but it’s generally not enough to justify having a
data warehouse.


The value of warehousing lies in its ability to let you discover information you
didn’t know you had, explore new possibilities, follow a line of reasoning wherever
it takes you, and do what-if analyses and create new scenarios. For this work, you need a
package that slices and dices the data in as much detail and as many dimensions as you
need.


The process is generally called OLAP, but other related terms are data mining, drilling
down, or, for one vendor, data diving. One factor that differentiates these OLAP tools
from query and reporting packages is that you use them interactively.


With most OLAP products, when you want more detail, you just double-click on a category
to drill down. Or you can show data as a graph by simply clicking on a button, or bring in
a new data dimension with a click or two. Equally important, you can usually save the OLAP
queries in a form that lets others continue the interaction to extract even more
information from the data.


Using OLAP tools, you often can discover correlations and relationships in your data
that you didn’t even suspect existed. You can analyze trends and population factors
more accurately and in finer detail, look at time-series and statistical factors, and view
different types of graphical representations and visualizations to see what they reveal.


Some OLAP tools are designed to work directly with warehouse databases on networked
workstations, while others are designed to download pieces of a data warehouse onto a
desktop PC and do most processing and analysis locally.


Establishing and maintaining a data warehouse is one of the biggest information
technology projects your organization may ever tackle, short of a complete re-engineering
of procedures. It’s mandatory that you put a lot of time into investigating
alternatives and products to find which ones will serve you best. This guide is only a
starting point—a product that isn’t listed here may be the right one for your
organization.


More information is available on the Web. Three good starting points are: Data
Warehousing at http://www.datawarehousing.com,
  the Data Warehousing Institute at http://www.dw-institute.com,
and the International Data Warehousing Association at http://www.idwa.org/.


The following books also can help:


Russell Kay writes about and reviews software and hardware in Worcester, Mass. E-mail
him at russellk@bix.com.

X
This website uses cookies to enhance user experience and to analyze performance and traffic on our website. We also share information about your use of our site with our social media, advertising and analytics partners. Learn More / Do Not Sell My Personal Information
Accept Cookies
X
Cookie Preferences Cookie List

Do Not Sell My Personal Information

When you visit our website, we store cookies on your browser to collect information. The information collected might relate to you, your preferences or your device, and is mostly used to make the site work as you expect it to and to provide a more personalized web experience. However, you can choose not to allow certain types of cookies, which may impact your experience of the site and the services we are able to offer. Click on the different category headings to find out more and change our default settings according to your preference. You cannot opt-out of our First Party Strictly Necessary Cookies as they are deployed in order to ensure the proper functioning of our website (such as prompting the cookie banner and remembering your settings, to log into your account, to redirect you when you log out, etc.). For more information about the First and Third Party Cookies used please follow this link.

Allow All Cookies

Manage Consent Preferences

Strictly Necessary Cookies - Always Active

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data, Targeting & Social Media Cookies

Under the California Consumer Privacy Act, you have the right to opt-out of the sale of your personal information to third parties. These cookies collect information for analytics and to personalize your experience with targeted ads. You may exercise your right to opt out of the sale of personal information by using this toggle switch. If you opt out we will not be able to offer you personalised ads and will not hand over your personal information to any third parties. Additionally, you may contact our legal department for further clarification about your rights as a California consumer by using this Exercise My Rights link

If you have enabled privacy controls on your browser (such as a plugin), we have to take that as a valid request to opt-out. Therefore we would not be able to track your activity through the web. This may affect our ability to personalize ads according to your preferences.

Targeting cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.

Social media cookies are set by a range of social media services that we have added to the site to enable you to share our content with your friends and networks. They are capable of tracking your browser across other sites and building up a profile of your interests. This may impact the content and messages you see on other websites you visit. If you do not allow these cookies you may not be able to use or see these sharing tools.

If you want to opt out of all of our lead reports and lists, please submit a privacy request at our Do Not Sell page.

Save Settings
Cookie Preferences Cookie List

Cookie List

A cookie is a small piece of data (text file) that a website – when visited by a user – asks your browser to store on your device in order to remember information about you, such as your language preference or login information. Those cookies are set by us and called first-party cookies. We also use third-party cookies – which are cookies from a domain different than the domain of the website you are visiting – for our advertising and marketing efforts. More specifically, we use cookies and other tracking technologies for the following purposes:

Strictly Necessary Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Functional Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Performance Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Social Media Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Targeting Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.