How can "brute force" computing help you transform data management?

Whether you buy or sell information technology, you have benefited from a long succession of doublings of the number of transistors on a logic chip, every two to three years, as predicted by "Moore's Law." Looking ahead, brute force computing opens up opportunities to reshape business systems architectures and to transform the management of business data.

In May 2005, Intel's management conducted a scavenger hunt for an original copy of Gordon Moore's 1965 magazine article in which he first advanced his "Law." Intel offered a reward of $10,000 - modest, given that much of Intel's $140 billion market capitalization is a product of Moore's Law. Intel's CEO also predicts several more doublings.

The chart at the right illustrates the performance impact of the approximately 19 doublings since 1965, plus the next two to come. Recent and prospective increases in processing power are enormous, which makes the question of what you are going to do with the power offered by the next two doublings an important one.
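To put rough numbers on "enormous," here is a minimal back-of-the-envelope sketch, assuming an idealized clean doubling of transistor counts at each step:

```python
# Rough illustration of the cumulative effect of repeated doublings.
# Assumes an idealized clean doubling per step, per Moore's Law.
doublings_so_far = 19
upcoming = 2

growth_to_date = 2 ** doublings_so_far                       # ~524,288x the baseline
growth_after_next_two = 2 ** (doublings_so_far + upcoming)   # ~2,097,152x the baseline

print(f"Growth after {doublings_so_far} doublings: {growth_to_date:,}x")
print(f"Growth after {doublings_so_far + upcoming} doublings: {growth_after_next_two:,}x")
```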

Additionally, a single-chip perspective understates the opportunity, because computer manufacturers like Dell, HP, IBM and Sun are happy to package arrays of processors, or to use grid computing, to let your applications call on hundreds or even thousands of processors. They have also moved to 64-bit computing and to faster, larger disk storage. These suppliers very much need applications suited to putting those multi-processor platforms to productive, profitable business use, but the point of this article is that customers also need such applications to cut through today's data management thicket.

More compute power can, of course, disappear into your data center without a trace of improvement. Adding raw processing power may produce little or no performance gain because the design of the applications and databases creates choke points.

Using brute force computing to transform enterprise data management

To make your enterprise function more profitably, more effectively and with fewer compliance headaches, I suggest that you exploit brute force computing to transform data and, as a necessary adjunct, to apply business rules directly to raw data. Recent circumstances - such as the difficulty and expense involved in attaining Sarbanes-Oxley compliance - illustrate current shortcomings in data design and data management. Trading increasingly cheap and powerful hardware for better data management and data integrity is a "no-brainer," except that you have to rethink data management to make that trade-off produce the desired results.

One has to begin at the beginning - with the data itself. Enterprise "Truth" rests on a bedrock of "real" data concerning transactions and events. This "real" data is the information originating in sales orders, production orders, shippers, purchase orders, receiving transactions, customer payments, payroll transactions, capital expenditures, RFID scans and many others.

However, that bedrock of raw fact often is scattered and obscured by layers of "synthetic data." Synthetic data looks to the naked eye like "real" data, but is instead the result of rules applied to real data to summarize or to "bucket" it. As an example, the revenue figure for the month of June for a certain sales group would be a monetary figure that looks real, but is actually a synthetic result of arithmetic, time period definitions, revenue recognition rules, and other practices and conventions. It could be in dispute because of inaccuracies in the raw data, but it is more likely that disputes and errors are a consequence of the synthetic aspects - for example, doubts as to whether the rules were applied correctly, whether the correct rules were chosen, or whether this year's rules were consistent with last year's.

In years past, business application software and data management had to be designed to minimize consumption of scarce and expensive computing resources. As part of that accommodation, raw transaction data typically is transformed into synthetic data. Even a basic "fact" such as a given customer's "balance due" amount is a synthetic number, generated by adding up the payments due and subtracting the cash and credits applied. In a time of computing scarcity, storing the synthetic data field "balance due" avoided the response-time delay that would occur if the system waited for a user request and then computed the balance on the fly.
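As a minimal sketch of the alternative - computing the balance on the fly directly from raw transactions - the following assumes a hypothetical, simplified record layout; the field names and transaction kinds are illustrative, not any particular ERP's schema:

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical raw-transaction record; field names are illustrative only.
@dataclass
class Transaction:
    customer_id: str
    kind: str          # "invoice", "payment", or "credit"
    amount: Decimal

def balance_due(customer_id: str, transactions: list[Transaction]) -> Decimal:
    """Derive the balance directly from raw transactions - nothing is precomputed or stored."""
    balance = Decimal("0")
    for t in transactions:
        if t.customer_id != customer_id:
            continue
        if t.kind == "invoice":
            balance += t.amount            # payments due
        elif t.kind in ("payment", "credit"):
            balance -= t.amount            # cash and credits applied
    return balance
```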

As this simple example illustrates, software and database designers resorted to synthetic data "buckets" because it took the system far less time to retrieve the precomputed number than to retrieve the raw data and compute the equivalent result dynamically. However, as with holding physical goods inventory in a distribution center, there is no free lunch: the inventory clogs the shelves, becomes obsolete, and so on. There is also a high price paid in business process impact and in software and database complexity. For example, to keep the balance due field "fresh," the application has to recompute and save a new balance every time a new invoice goes out, a payment is applied or a customer credit is issued. Less sophisticated systems recompute balances nightly, and the lack of data freshness often creates business process disconnects and disputes. More sophisticated ERPs can keep synthetic data up to date, but at a huge price in system complexity and processing cost. The database itself becomes more complex, because a data bucket like "balance due" requires at least one additional data field, often several, or perhaps an entire table, plus validation, "triggers," and so forth. There is nothing "lean" about the process.

Synthetic data becomes progressively more opaque as synthesis is stacked on synthesis. To generate something as simple as a business-unit-level data bucket holding a single product's monthly sales figure for, say, June, one must define "June" (fiscal or calendar?), deal with returns and credits, multiple channels, and sometimes multiple versions of the product. Because the effects of the rules are embedded in the contents of the synthetic data bucket, it is difficult to detect "apples and oranges" mixtures of synthetic data that were computed under differing rules. If detected, it is often even more difficult to back out the inconsistencies. For managers who are puzzled when aggregate sales commissions go up even though aggregate sales went down, the answer often is that the synthetic data "bucket brigade" is at work.
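A small illustration of why such buckets resist mixing: the same raw sales lines can yield different "June" totals depending on which rules were in force when the bucket was built. The period definitions and returns treatment below are hypothetical stand-ins for real revenue recognition rules:

```python
from datetime import date
from decimal import Decimal

# Illustrative raw sales lines: (date, amount); negative amounts are returns.
sales_lines = [
    (date(2004, 5, 30), Decimal("120.00")),
    (date(2004, 6, 2),  Decimal("200.00")),
    (date(2004, 6, 15), Decimal("-50.00")),   # a return
    (date(2004, 6, 28), Decimal("300.00")),
]

def june_bucket(lines, period_start, period_end, net_returns: bool) -> Decimal:
    """Build a "June" bucket under an explicit, inspectable set of rules."""
    total = Decimal("0")
    for d, amount in lines:
        if not (period_start <= d <= period_end):
            continue
        if amount < 0 and not net_returns:
            continue                          # gross-sales rule: ignore returns
        total += amount
    return total

# Calendar June, returns netted out:
print(june_bucket(sales_lines, date(2004, 6, 1), date(2004, 6, 30), net_returns=True))    # 450.00
# A fiscal "June" running May 29 - June 25, gross of returns:
print(june_bucket(sales_lines, date(2004, 5, 29), date(2004, 6, 25), net_returns=False))  # 320.00
```

Once only the totals are stored, the choice of rules that produced them is invisible, which is exactly what makes mixed buckets so hard to detect.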

In response to Sarbanes-Oxley, there are Herculean labors underway to validate the thousands of buckets and their synthetic data contents. What is proposed here will lighten the load.

Use increased computer power to reduce data replication and synthesis

What is suggested is to run the business and its business systems directly from the "bedrock" of raw data, while using brute force computing to generate on the fly whatever totals, lists or reports are needed.

To oversimplify only slightly: if someone wanted to obtain the revenue and sales tax invoiced for the XYZ Company, in the brute force world the application software would simply sum the monetary values directly from what could be millions of sales orders, invoices and payments. Using "grids" of low-priced computers, those massive amounts of data could be "crunched" within seconds. The term "monetary value" is used on the assumption that, for a large company, multiple currencies are involved and that the program would also perform on the fly whatever currency conversion is appropriate. The CEO and the CFO could then attest rather comfortably to the sum, because there would be no mystery about the data or the rules applied to it.
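As a rough single-machine sketch of that approach - with the grid abstracted away as a local process pool, and with a hypothetical invoice-line layout and made-up exchange rates - the brute force summation might look like this:

```python
from decimal import Decimal
from multiprocessing import Pool

# Hypothetical static exchange rates to USD, applied on the fly.
RATES_TO_USD = {"USD": Decimal("1.00"), "EUR": Decimal("1.25"), "GBP": Decimal("1.80")}

def sum_chunk(chunk):
    """Sum revenue and sales tax for one slice of raw invoice lines: (currency, revenue, tax)."""
    revenue = tax = Decimal("0")
    for currency, rev, sales_tax in chunk:
        rate = RATES_TO_USD[currency]
        revenue += rev * rate
        tax += sales_tax * rate
    return revenue, tax

def brute_force_totals(invoice_lines, workers=8):
    """Split the raw lines across worker processes and combine the partial sums."""
    size = max(1, len(invoice_lines) // workers)
    chunks = [invoice_lines[i:i + size] for i in range(0, len(invoice_lines), size)]
    with Pool(workers) as pool:
        partials = pool.map(sum_chunk, chunks)
    total_revenue = sum(r for r, _ in partials)
    total_tax = sum(t for _, t in partials)
    return total_revenue, total_tax

if __name__ == "__main__":
    lines = [("USD", Decimal("100.00"), Decimal("8.00")),
             ("EUR", Decimal("80.00"), Decimal("15.20"))]
    # Totals converted to USD: revenue 200.00, sales tax 27.00
    print(brute_force_totals(lines, workers=2))
```

On a real grid the chunks would be distributed across many machines rather than local processes, but the shape of the computation - independent partial sums over raw lines, combined at the end - is the same.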

This option is becoming feasible because, in comparison to today's computing and storage resources, it is remarkable how little bedrock business transaction data exists. As a hypothetical example, if the XYZ Company reported annual revenues of $15 billion for 2004, with average revenue per sales line of $200, that company generated about 75 million sales lines in 2004. If it also generated an average of 20,000 characters of "real" data per sales line - shippers, invoices and so on - its entire order-to-cash process could be accommodated on three 500-gigabyte disk drives, with perhaps another eight or ten to provide redundancy and working space and to accommodate application software.
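The arithmetic behind that storage estimate, spelled out using the hypothetical figures above:

```python
# Back-of-the-envelope storage estimate for the hypothetical XYZ Company.
annual_revenue = 15_000_000_000        # $15 billion
revenue_per_sales_line = 200           # $200 average per sales line
bytes_per_sales_line = 20_000          # ~20,000 characters of raw data per line

sales_lines = annual_revenue // revenue_per_sales_line    # 75,000,000 lines
raw_bytes = sales_lines * bytes_per_sales_line            # 1.5e12 bytes = 1.5 TB
drives_needed = raw_bytes / (500 * 10**9)                 # 500 GB drives

print(f"{sales_lines:,} sales lines, {raw_bytes / 1e12:.1f} TB, {drives_needed:.0f} drives")
```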

The rules governing operations against raw data would go into a "rules engine" environment. A person performing a high-level business function would not know, or need to know, that stacks of rules are being invoked. To address compliance issues, it becomes much more straightforward to analyze the multi-layered rules and, if needed, to look for gaps or errors in the raw data.
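One way to picture the separation of rules from data - a toy sketch, not any real rules-engine product - is a stack of named, inspectable rule functions applied to the raw records at query time:

```python
from decimal import Decimal

# Toy rules-engine sketch: each rule is a named, auditable function applied to
# raw records at query time; nothing synthetic is stored back into the database.

def recognize_revenue(record):
    """Illustrative rule: only shipped lines count as revenue."""
    return record if record["status"] == "shipped" else None

def convert_currency(record, rates={"USD": Decimal("1.00"), "EUR": Decimal("1.25")}):
    """Illustrative rule: convert the amount to USD on the fly."""
    return dict(record, amount=record["amount"] * rates[record["currency"]])

RULE_STACK = [recognize_revenue, convert_currency]   # the order is explicit and auditable

def apply_rules(raw_records, rules=RULE_STACK):
    """Run every raw record through the rule stack; drop records a rule rejects."""
    for record in raw_records:
        for rule in rules:
            record = rule(record)
            if record is None:
                break
        else:
            yield record

raw = [{"status": "shipped", "currency": "EUR", "amount": Decimal("80.00")},
       {"status": "ordered", "currency": "USD", "amount": Decimal("50.00")}]
total = sum(r["amount"] for r in apply_rules(raw))   # only the shipped EUR line: 100.00
```

Because each rule is a distinct, named step, an auditor can examine the stack itself rather than reverse-engineering its effects out of stored synthetic buckets.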


Conclusion

We have huge amounts of raw computing power at affordable prices: yesterday's computers were toys in comparison. More capacity is on the way.

However, we cannot effectively exploit that raw power while continuing to rely on application and database design approaches that were optimized to accommodate older, less powerful computing capabilities. Whether you are a buyer or seller of application software and hardware, the important point is that we have reached one of those "inflection points" at which a change in scale - faster, bigger, more powerful - triggers a change in kind. The proposition described above - to run business systems directly off the raw data generated by business operations - is one way to exploit brute force computing to streamline your business and bullet-proof your audit trails.