Emerging Technologies for FP&A — Part 1. Databases
Posted by Michael Coveney
In this series of blogs I’m looking at a number of areas that FP&A departments must address if they are to add value to the organisations they serve in this technology driven age. In this blog I look at the changing nature of databases.
The predominant analytic tool for FP&A professionals are spreadsheets, probably due to their prevalence, versatility, low cost, and reporting capabilities. Where these are weak — in being single user, unable to present multiple views without duplicating data, difficult to setup/maintain large complex models, lack of workflow capabilities — more sophisticated analytic applications have plugged the gap. Most, if not all of these, depend on a database that provides easy access, scalability and integrity.
The Rise of the SQL database
A database is simply a way of storing data that can be searched and information retrieved at a later date. To this end there are two parts to a database — the way in which the data to be stored is structured and the tools that provide access to the content.
It can be argued that an electronic folder is a database, where data is stored as individual documents through which the file manager can be used to search for a specific text string within those documents. Similarly, a spreadsheet is a database with facilities for searching and producing analyses (e.g. summaries) from the stored content.
Analytic applications such as those used for planning, also have a database that typically holds both the data and the defined relationships between the data. For example, that a relationship exists between sales, volume and price for each product and location combination. This application ‘intelligence’ as well as what each data item represents is known as ‘metadata’.
The underlying database technology in these applications is typically a ‘relational’ database where both data and metadata are held in the form of tables that are made up of records and fields. These components can be considered to be the equivalent of spreadsheet’s sheets, rows and columns.
Some vendors use a proprietary SQL database, while others use a hybrid of their own, but they still tend to be made up of sets, records and fields. (For the purpose of this blog I am treating multidimensional databases as a specialised form of a relational database.) To analyse the data, the vendor will have created various functions using the database’s own access language, which is sometimes made available to customers.
However, with today’s information rich environment and the vast quantities of data that is generated covering multiple forms, the relational model fails in terms of performance and the type of analyses that can be produced.
This is mainly due to the structure used where data has to fit into predefined sets, records and fields and where rows have a unique primary key that is used as the basis for performing analyses.
In recent years, new forms of database have been developed that are designed to handle specific types of analyses on large data sets. These come under the name ‘NoSQL’ to indicate that these store data differently from relational databases and for which there is no standard query language. Their performance is staggering and can produce analyses that were previously impossible to do. Just think for a minute how fast Google responds to a search query? It’s able to do this as data is held in a way that optimises this type of query — and it’s not a relational database.
It is generally agreed that there are four main types of NoSQL database that come under the following names:
Key-value. These are simple databases where data is accessed via a dictionary or a map. The data for each ‘key-value’ can have a different structure. An example is storing the interactions of individual customers logging into a website. This type is ideal for simple searches but cannot really handle complex analyses.
Document. These databases are able to handle complex queries on huge amounts of data that can be stored in a structured, semi-structured or unstructured manner. This is because the data can be embedded with a description of what it contains (typically using XML or JSON) instead of having to store it within predefined fields. Each document is effectively a record that can have the same or a different structure. They also support data hierarchies where data is organised into groups called collections (analogous to the tables in relational databases). For example, a document database might include collections for customers, orders, and products, where each collection has its own and common fields.
Column. This type of database is similar to a relational database except that records are stored as columns, and the fields of a record are stored as rows. This provides vastly improved performance on large data sets that involve complex queries which makes them ideal for advanced analytics.
Graph. These store data as relationships using ‘graph theory’, which are mathematical structures used to model relations between objects. This can then be used to identifying more distant relationships and allow analyses such as which web pages customers are browsing and which products they’re buying to help create upsell/cross-sell recommendations.
In this blog we can only touch on brief summaries of these new database types. It’s worth mentioning that each type typically requires users to develop their own code who must have a good understanding of the technology architecture used. There are also many competing products in each category. As a consequence, using these databases is probably way beyond the capability of most accountants. But that doesn’t mean to say they should be ignored. As mentioned in previous blogs, FP&A departments must take on skills that can both track development, assess potential uses and determine when and where these technologies should be used.
In my next blog I will look at the kinds of applications that can be developed using these database technologies.