Data mining is the use of statistical techniques to analyse large quantities of data and uncover relationships and dependencies which may not be readily apparent. It is especially effective for aggregating and summarising data, establishing subgroups based on content, and finding trends which highlight possible anomalies.
As a simple example, consider a franchise whose shops stock 1000 products. Each shop makes 75-100 sales per day; 75 percent of these sales are for 1-3 products, and another 15 percent for 4-5 products. The franchise offers a customer loyalty card, for which 25 percent of its customers have registered; the card accrues points with each purchase toward an eventual gift or discount. The computerised system comprises Point of Sale (cash register), inventory management and loyalty card management (but not full-fledged customer relationship management), and uploads data to head office nightly.
The primary data sources available are the daily sales reports; ongoing inventory reports (stock on hand and re-order history); loyalty card usage history and points balance; and customer information gathered via the card application. There may also be qualitative information such as conversations between staff and customers about products.
Starting points for collating product information might include:
By combining the two sets of data, it should be possible to build a more comprehensive view of customer preferences. This research might include:
In designing the customer analyses, protecting the customer’s privacy is both essential and a legal obligation. The data used to analyse customer purchase patterns must be recoded so that only customer demographics remain. Details which could identify an individual customer should never be used directly in building an analysis; for instance, the customer’s date of birth could be recoded into an age band (25-29, 30-34) and the loyalty card address could be reduced to just the postcode.
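The recoding described above could be sketched as follows. This is a minimal illustration only, not a complete anonymisation scheme: the field names (`date_of_birth`, `postcode`, `street_address`, `card_number`), the sample record and the function names are all hypothetical, and it assumes the postcode is already held as a separate field of the card application rather than having to be parsed out of a free-text address.

```python
from datetime import date


def age_band(date_of_birth: date, today: date, width: int = 5) -> str:
    """Recode a date of birth into an age band such as '25-29'."""
    # Age in whole years: subtract one if this year's birthday has not yet occurred.
    age = today.year - date_of_birth.year - (
        (today.month, today.day) < (date_of_birth.month, date_of_birth.day)
    )
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def recode_customer(record: dict, today: date) -> dict:
    """Return only demographic fields; identifying details are not copied across."""
    return {
        "age_band": age_band(record["date_of_birth"], today),
        "postcode": record["postcode"],
    }


# Hypothetical loyalty card record, invented purely for illustration.
customer = {
    "name": "A. Customer",
    "card_number": "0000-0001",
    "date_of_birth": date(1995, 6, 15),
    "street_address": "1 Example Street",
    "postcode": "2000",
}

recoded = recode_customer(customer, today=date(2024, 1, 1))
print(recoded)
```

Building the recoded record from a short list of permitted fields, rather than deleting known identifying fields from a copy, means that any field added to the card application later is excluded from the analysis by default.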