Data warehousing tools

Recent developments in data warehousing have run toward the emerging issues of getting information to the Web, developing enterprise information portals and integrating wireless technology. But the basic steps are still a challenge for developers.

By John H. MayerSpecial to GCNRecent developments in data warehousing have run toward the emerging issues of getting information to the Web, developing enterprise information portals and integrating wireless technology. But the basic steps are still a challenge for developers.The process of identifying, acquiring, transforming and loading data into the warehouse remains the most complex, time-consuming and costly portion of the development cycle.The difficulties information technology managers face when they tackle this task have been chronicled many times. A recent study by the Data Warehousing Institute of Gaithersburg, Md., for instance, found that almost one-third of IT managers surveyed indicated that their efforts to collect data did not meet expectations.More often than not, projects failed simply because the data that IT departments want to collect from legacy sources is inaccurate, inconsistent or difficult to acquire in its original form.This guide focuses on commercial data warehouse tools that perform the data extraction, transformation and loading (ETL) functions.Industry analysts typically define extraction as the process of identifying and retrieving a set of data from an operational system. Transformation tools play a key role as IT managers integrate data from multiple sources. They permit a system administrator to develop rules for integrating data from different sources and tables to form a single table or entity.Just as crucial is the process of formatting the output from the transformation process into a form acceptable to the target database's load utility. This typically includes the use of a scheduling mechanism to trigger the loading process.All of these functions usually are performed by a single set of tools. The tool sets also often support a metadata repository and a scheduling engine.IT managers facing the prospect of integrating data from multiple sources into a single repository traditionally have had to start by grappling with the build-or-buy question.There are excellent reasons to build your own ETL system. First, your in-house programming staff is invariably better acquainted with the legacy database and operating environment than any vendor could be. Second, few IT department personnel have the free time to evaluate, compare, select and install commercial tools, let alone learn to properly use them. And finally, few IT budgets today can afford the additional capital expense of the tools, especially for a first-time, proof-of-concept project.Despite the valid arguments for building your own system, vendors have over the past few years made a convincing case for buying one instead.The learning curve issue is a good example. Vendors have dramatically shortened the time it takes a developer to get up to speed with an ETL tool by integrating highly graphical and wizard-driven interfaces that guide developers through the implementation process.Today, developers can construct data flow diagrams to visually model a task and automatically extract, transform, validate and load data without writing a single line of code. And highly intuitive interfaces are simplifying the task of mapping such things as complex enterprise resource planning data to a target database.At the same time, vendors are extending the scope of their tool sets and putting an end to the days when developers had to write code to string together different functions. Products such as Oracle Warehouse Builder, for example, replace a bevy of individual point tools with a single common environment capable of modeling and design, data extraction, loading, aggregation and warehouse administration.In addition, many tools such as Warehouse Builder, IBM's Visual Warehouse, Computer Associates International Inc.'s DecisionBase and others add extensive metadata management capabilities.Tool performance isn't taking a backseat, either. Many vendors have recently added enhanced loading capabilities for the leading target databases. New loaders in Hummingbird Ltd.'s Genio Suite, for instance, significantly reduce the time required to manage large volumes of transactions.Other products, like Formation from Informix Software Inc., integrate parallelism into their architectures to take advantage of all available CPUs. By automatically segmenting a job across multiple CPUs, such tools boost performance while making more efficient use of enterprise resources. Users can even specify parallel settings on individual operators or groups of operators.Meanwhile, companies have introduced new techniques to help reduce the data processing workload, a particularly crucial consideration as warehouses sag under mountains of data and an escalating number of transactions.Products such as ChangeDataMove from BMC Software Inc. lets designers almost instantly reflect in the data warehouse incremental updates and changes in the operational transaction processing environment.By tracking the changed records during the input/output operation, the tool captures changes virtually as soon as they occur. That not only keeps target databases more current and consistent with source databases than they have been in the past, but it also allows the tool set to transform and transport to the target database only the data that has changed, thereby reducing the workload.When purchasing an ETL tool, a number of issues are worth considering. Will the tool support the key legacy databases and data types your application requires? Will the tool support performance levels that could easily reach hundreds of transactions per second without adversely affecting traditional operational performance?Take a close look at the architecture of the tool as well. Tools that offer reusable modules, for instance, often can save considerable development time and effort. If you have already designed a custom in-house process and are looking to replace it, examine where the current commercial product can improve efficiency. Will the purchase of this tool force you to revamp an established process? And if so, what are the costs involved?At the same time, look for tools that offer performance enhancements, such as loading files into memory for lookup or adding parallelism to maximize performance.Another issue to consider is the support a tool vendor can provide. Although vendors have made their products much easier to use, the tools are still software and require a lengthy learning curve.One way to prevent the learning curve from undermining your project is to ask the vendor, as a condition of the sale, to provide someone who understands how to design with the tool. Such a consultant will come at a high price, but the investment will be worth it if it guarantees success.Although vendors continually have enhanced their products, a number of factors in your legacy systems will affect how well a tool extracts, transforms or loads data. The availability of network bandwidth, for example, is one issue. Bandwidth bottlenecks can easily slow down the performance of the most optimized tool.The volume of data to be moved and collected in a project also is crucial. With the rise of the Web and electronic commerce, businesses are continually surprised at how much data they can collect in a short period of time. Yet the larger the prospective data warehouse, the longer and more complex the process of extracting, transforming and loading data.Other issues such as hardware configurations'the amount of memory and speed of disk performance on the source and target systems'also can influence the speed of the process, as will the type of interconnect used to access source and target data. Will the system have native access or use Open Database Connectivity or some other interface?Finally, the construction of the target database itself also will have a major impact on data transfer. Are there constraints in the data warehouse in terms of the complexity of the data structure or number of indices? And has the database been tuned to maximize performance?The task of extracting, transforming and loading the data from legacy sources into a central repository doesn't grow any simpler as the amount of data you collect skyrockets.But the good news is that there is a wider choice of more capable and comprehensive tools today than there ever has been.The key to selecting the right tools will lie in your own understanding of your operating environment and the unique requirements you face as you build your data warehouse.XXXSPLITXXX-
The right products can help you build a foundation for successfully managing data




































Formation from Informix Software offers a visual method of designing data transformtion jobs. Pricing for Formation starts at $7,500 per processor.





















Tips for buyers

' Before you decide to build a custom extraction, transformation and loading system, make sure you consider ongoing maintenance and enhancement costs, particularly as the system's environment grows increasingly complex.

' Before purchasing a commercial tool, talk to organizations that have used it in similar environments to judge whether the tool will save enough time to justify the investment.

' Consider requiring that the vendor provide an experienced consultant as a condition of the sale, to help accelerate the learning curve.

' Take a close look at the reusability of modules within the tool and their affect on your productivity.

' If the tool offers parallelism features, take full advantage of them to maximize performance.








Factor this















John H. Mayer of Belmont, Mass., writes about information technology.



















































































































































VendorProductPlatform supportLegacy data sourcesTargets supportedCommentsPrice
BMC Software Inc.
Houston
800-841-2031
www.bmc.com
ChangeDataMoveNT, Unix, OS/390IMS, IMS fast path, CICS/VSAM, VSAM batch, DB2Oracle, DB2 UDB, Sybase, SQL ServerNon-intrusive change capture tool; supports near-real- time transaction-based change data propagation to enable fast and complete synchronization of data$20,000 up
Computer Associates International Inc.
Islandia, N.Y.
800-225-5224
www.cai.com
DecisionBaseNT, UnixIMS, VSAM, physical sequential files, DB2, ADABASE, CA-IDMS, enterprise resource planning systemsDB2, OS/390, OS/2 DB2/400, Informix, SQL Server, Oracle, Red Brick, SybaseIntegrates data transformation with enterprise repository; includes metadata management capabilities$200,000
Constellar Corp.
Redwood Shores, Calif.
650-631-4800
www.constellar.com
Warehouse IntegratorClient: Win95, NT; Server: NT, HP-UX, Solaris, AIXAll major RDBMSes, other data types through Constellar Hub productOracle, Informix, SQL Server, Sybase, CA Ingres, Red Brick ESSbase, PaBLO, MicroStrategy DSSAgent, SQL/MP, ODBCDimension reference model supports management of many types of data warehouses across different MOLAP and ROLAP platforms; new version adds native support for Oracle Express and metadata auditing$100,000 up
Decisionism Inc.
Boulder, Colo.
800-938-8805
www.decisionism.com
Aclue Decision Supportware Version 2.4Client: Win95, NT; Server: NT, Solaris AIXFlat files, all major RDBMSes, ODBCOracle Financial Analyzer, Oracle Express, Hyperion Essbase, SQL Server 7.0 OLAP Services, all major RDBMSesSupports three OLAP platforms; includes comprehensive audit trail and drill-back capabilities, and advanced time-server management$37,500 up
Evolutionary Technologies International Inc.
Austin, Texas
800-856-8800
www.eti.com
ETI*ExtractClient: Win95, NT; Server: NT, AIX, Solaris, HP-UXFlat files, IMS, VSAM, Oracle, Sybase, Teradata, SQL Server, Informix, SAS, TurboI, IDMS, DB2, SAP R/3Most major relational databasesManages batch, legacy, near-real-time and clickstream data; latest release adds ability to control user access to objects in MetaStore, improved administration metadata audit trail$100,000 up
Hummingbird Ltd.
Toronto
877-359-4866
www.hummingbird.com
Genio Suite 4.0Client: Win95, NT; Server: NT, Win 2000, Solaris, AIX, HP-UXAll major file formats, relational databases and ERP productsOracle Express, Hyperion EssbaseProvides universal data exchange; latest version adds views of third-party metadata repositories including Business Objects' Universes and Cognos' Catalogs; also adds enhanced loading capabilities for Oracle8 and NCR teradata databasesGenio engine: $50,000 for NT, $75,000 for Unix; $20,000 for design tools
IBM Corp.
Armonk, N.Y.
888-411-1932
www.ibm.com
Visual WarehouseAIX, SolarisDB2, Oracle, Informix, SQL Server, CICS/VSAM, IMSDB2, Hyperion EssbaseProgram for large and small businesses provides a managed OLAP environment and Web-enabled infrastructure; comes bundled with DB2 UDB$36,950 up
Informatica Corp.
Palo Alto, Calif.
800-970-1179
www.informatica.com
PowerMart 4.6Client: Win95, NT; Server: NT, HP-UX, AIX, SolarisDB2, flat files, IMS, Informix, SQL Server, Access, Oracle Sybase, VSA, ODBCFlat files, Informix, SQL Server Access, Oracle, PeopleSoft ERP, SAP Business Info Warehouse, SybaseIntegrated suite of products includes a deploy folder wizard to guide developers, source-extraction mapping templates, flat-file acceleration, advanced session management and multilevel security$45,375 up for NT $66,000 up for Unix platforms
Informix Software Inc.
Menlo Park, Calif.
650-926-6300
www.informix.com
FormationNT, Solaris, HP-UXInformix, Oracle, SQL Server, ODBC, text files, EBCDIC files, binary filesInformix, Oracle, SQL Server, ODBC to othersIntegrated autoparallelism automatically segments jobs across multiple CPUs; includes user-specifiable parallel settings and native loaders for Informix, Oracle, SQL Server$7,500 up per processor
DataStageNT, UnixFlat files, VSAM, SQL ServerSQL ServerFormer Ardent Software product provides an automated workflow environment with reusable components$47,500 up
Oracle Corp.
Redwood Shores, Calif.
800-672-2531
www.oracle.com
Warehouse BuilderClient: NT; Server: Oracle8iOracle, DB2 via gateway, ODBC, flat files, IMS, VSAM, Oracle ERP, SAP R/3Oracle8i, 8i Catalog, Discoverer and ExpressIntegrates modeling and design, data extraction, movement and loading, aggregation, metadata management and tool integration into single solution; has a wizard-driven interface and is tightly integrated with Oracle8i; integrates metadata via Common Warehouse Model$25 per Universal Power Unit (for UPU, multiply MHz speed of Intel chip by 1; multiply speed of RISC chip by 1.5)
SAS Institute Inc.
Cary, N.C.
919-677-4444
www.sas.com
SAS/AccessNT, UnixDB2, SQL Server, Oracle Teradata, SAP R/3, Baan, flat files, text files, binary filesMost file structuresSupports a variety of file structures$31,664 up for server license with OLAP tool
Taurus Software Inc.
Palo Alto, Calif.
650-961-1323
www.taurus.com
DataBridgeClient: Win95, NT; Server: NT, HP-UX, AIX, Solaris, UnixWare, SGIMPE, Oracle, Image, KSAM, Allbase, fixed files, flat files, text files, CSV files, freeze filesUnix, Oracle, fixed files, flat files, text files, CSV files, Freeze files, DB2, Informix, SQL ServerETL tool with portable scripting language and a GUI that simplifies script development$29,000 to $79,000
BridgeWareSameMPE, Oracle, Image, KSAM, Allbase, fixed files, flat files, text files, CSV files, freeze filesUnix, Oracle, fixed files, flat files, text files, CSV files, Freeze files, DB2, Informix, SQL ServerAdds real-time, change detect component$20,000 to $44,000
X
This website uses cookies to enhance user experience and to analyze performance and traffic on our website. We also share information about your use of our site with our social media, advertising and analytics partners. Learn More / Do Not Sell My Personal Information
Accept Cookies
X
Cookie Preferences Cookie List

Do Not Sell My Personal Information

When you visit our website, we store cookies on your browser to collect information. The information collected might relate to you, your preferences or your device, and is mostly used to make the site work as you expect it to and to provide a more personalized web experience. However, you can choose not to allow certain types of cookies, which may impact your experience of the site and the services we are able to offer. Click on the different category headings to find out more and change our default settings according to your preference. You cannot opt-out of our First Party Strictly Necessary Cookies as they are deployed in order to ensure the proper functioning of our website (such as prompting the cookie banner and remembering your settings, to log into your account, to redirect you when you log out, etc.). For more information about the First and Third Party Cookies used please follow this link.

Allow All Cookies

Manage Consent Preferences

Strictly Necessary Cookies - Always Active

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data, Targeting & Social Media Cookies

Under the California Consumer Privacy Act, you have the right to opt-out of the sale of your personal information to third parties. These cookies collect information for analytics and to personalize your experience with targeted ads. You may exercise your right to opt out of the sale of personal information by using this toggle switch. If you opt out we will not be able to offer you personalised ads and will not hand over your personal information to any third parties. Additionally, you may contact our legal department for further clarification about your rights as a California consumer by using this Exercise My Rights link

If you have enabled privacy controls on your browser (such as a plugin), we have to take that as a valid request to opt-out. Therefore we would not be able to track your activity through the web. This may affect our ability to personalize ads according to your preferences.

Targeting cookies may be set through our site by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant adverts on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. If you do not allow these cookies, you will experience less targeted advertising.

Social media cookies are set by a range of social media services that we have added to the site to enable you to share our content with your friends and networks. They are capable of tracking your browser across other sites and building up a profile of your interests. This may impact the content and messages you see on other websites you visit. If you do not allow these cookies you may not be able to use or see these sharing tools.

If you want to opt out of all of our lead reports and lists, please submit a privacy request at our Do Not Sell page.

Save Settings
Cookie Preferences Cookie List

Cookie List

A cookie is a small piece of data (text file) that a website – when visited by a user – asks your browser to store on your device in order to remember information about you, such as your language preference or login information. Those cookies are set by us and called first-party cookies. We also use third-party cookies – which are cookies from a domain different than the domain of the website you are visiting – for our advertising and marketing efforts. More specifically, we use cookies and other tracking technologies for the following purposes:

Strictly Necessary Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Functional Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Performance Cookies

We do not allow you to opt-out of our certain cookies, as they are necessary to ensure the proper functioning of our website (such as prompting our cookie banner and remembering your privacy choices) and/or to monitor site performance. These cookies are not used in a way that constitutes a “sale” of your data under the CCPA. You can set your browser to block or alert you about these cookies, but some parts of the site will not work as intended if you do so. You can usually find these settings in the Options or Preferences menu of your browser. Visit www.allaboutcookies.org to learn more.

Sale of Personal Data

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Social Media Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.

Targeting Cookies

We also use cookies to personalize your experience on our websites, including by determining the most relevant content and advertisements to show you, and to monitor site traffic and performance, so that we may improve our websites and your experience. You may opt out of our use of such cookies (and the associated “sale” of your Personal Information) by using this toggle switch. You will still see some advertising, regardless of your selection. Because we do not track you across different devices, browsers and GEMG properties, your selection will take effect only on this browser, this device and this website.