Driver-Based Planning Step 4: Finding data
Posted by Michael Coveney
In this series of blogs I am providing a comprehensive approach to the design, construction, roll-out and maintenance of a modern driver-based planning solution. In this blog I look at how to identify data, assess its validity, and what to do if it’s missing.
Data is the fuel that models consume in order to provide analyses that support management action. If data is missing or incomplete, the model could be worthless. So far in these blogs we have defined the measures that reflect how individual business process activities impact goals, and the measures that reflect external influences such as competitor prices and the inflation rate of raw materials. We now need to determine where this data will come from and what we need to do to make it ‘acceptable’ to the model. For each data item required we should record:
Its source and format. Some data will be internal to the organisation, either drawn from an existing transaction system or entered manually. Other items, such as inflation and exchange rates, will come from external sources. Some may arrive as a CSV file, while others may be accessible directly through a data ‘pipeline’ to the underlying transactional system. It’s important to record the format each item arrives in, as we may need to check that the software solution we choose has the right tools to load it.
What time interval it covers and how often it’s available. Most transaction data will be available on a daily basis, but some items from the general ledger, such as asset balances, may only be available monthly. Ledger data is often held in a year-to-date (YTD) format, so the previous month’s YTD value will need to be subtracted from the current month’s to get a monthly value. Other data may be weekly and so will need to be aggregated into monthly figures if that is the granularity of the model.
How reliable it is. Some data sources may be based on intuition or ‘best guesses’ and therefore be less accurate than others. This is particularly true of forecasts, which may themselves depend on factors not yet known. For these sources it is worth recording how accurate the values proved to be in previous periods, so that this reliability can be taken into account when decisions are based on them.
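The interval handling described above can be sketched in a few lines. This is a minimal illustration with hypothetical figures: converting YTD ledger values into monthly movements, and rolling weekly data up to the model’s monthly granularity.

```python
# Hypothetical figures for illustration only.

# General-ledger values held as year-to-date (YTD): subtract the
# previous month's YTD value from the current month's to get the
# monthly movement.
ytd = {"Jan": 120, "Feb": 250, "Mar": 390}  # YTD totals
months = list(ytd)
monthly = {
    m: ytd[m] - (ytd[months[i - 1]] if i > 0 else 0)
    for i, m in enumerate(months)
}
print(monthly)  # {'Jan': 120, 'Feb': 130, 'Mar': 140}

# Weekly data aggregated up to the model's monthly granularity.
weekly = [("Jan", 30), ("Jan", 28), ("Jan", 32), ("Jan", 30), ("Feb", 25)]
by_month = {}
for month, value in weekly:
    by_month[month] = by_month.get(month, 0) + value
print(by_month)  # {'Jan': 120, 'Feb': 25}
```

In practice these transformations would run inside whatever load tooling the chosen software provides; the point is that the model only ever sees data at its own granularity.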
With every data item there is a cost in time and effort to access it and put it into the right format for the model. There may also be a monetary cost: some data items have a purchase price, and others need converting or summarising into a usable format, which may require system tools to automate the transformation.
But what if some of the data is not available? You can’t simply ignore it once it has been established that it affects results. In that case, two things can be done: the first is to make an ‘educated’ guess at the values; the second is to look at ways of collecting the data in the future. The latter option may require its own analytic system, but it could be a wise investment if the results it affects are to become more predictable.
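One simple form of ‘educated’ guess is to estimate a missing value from its neighbours. This is a minimal sketch with hypothetical figures, assuming the measure moves roughly linearly between known periods; the flag it records lets the lower reliability of the guess be taken into account later.

```python
# Hypothetical series with a missing month (None); estimate it as the
# midpoint of the neighbouring known values, and flag it as a guess
# so its lower reliability can be taken into account downstream.
values = {"Jan": 100.0, "Feb": None, "Mar": 140.0}

months = list(values)
estimated = {}  # months holding guessed rather than actual data
for i, m in enumerate(months):
    if values[m] is None:
        prev_v = values[months[i - 1]]
        next_v = values[months[i + 1]]
        values[m] = (prev_v + next_v) / 2  # midpoint of neighbours
        estimated[m] = True

print(values)     # {'Jan': 100.0, 'Feb': 120.0, 'Mar': 140.0}
print(estimated)  # {'Feb': True}
```

A real model might use a more considered estimate (a trend, a seasonal average), but the principle is the same: fill the gap explicitly, and mark it as an estimate rather than a fact.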
One final consideration: in the past, data collection was fairly restricted in both availability and level of detail, but in today’s connected age the vast quantity of data can be overwhelming. For example, most websites can capture in extreme detail where users come from and the exact pages (and order) they looked at. This is potentially very valuable, but making it useful may require pre-processing with Big Data tools and techniques, and the use of ETL tools (Extract, Transform and Load) to summarise the data and place it into the format required by the planning model.
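To make the point concrete, here is a toy sketch, using hypothetical hard-coded records, of the kind of summarisation an ETL step might perform: reducing raw page-view events to the monthly counts a planning model can actually use.

```python
from collections import Counter

# Hypothetical raw web events: (date, page). A real ETL pipeline
# would extract these from web-server logs; here they are hard-coded.
events = [
    ("2024-01-03", "/pricing"),
    ("2024-01-17", "/pricing"),
    ("2024-01-21", "/signup"),
    ("2024-02-02", "/pricing"),
]

# Transform: roll detailed events up to (month, page) counts,
# the granularity the planning model actually needs.
monthly_views = Counter((date[:7], page) for date, page in events)
print(dict(monthly_views))
# {('2024-01', '/pricing'): 2, ('2024-01', '/signup'): 1, ('2024-02', '/pricing'): 1}
```

At real web-traffic volumes this aggregation would be done by dedicated Big Data or ETL tooling rather than in-memory Python, but the transform itself, collapsing detail down to the model’s granularity, is the same.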
Next time we will look at software requirements.