Data Mining

Data mining can be described as the use of statistical techniques to analyse large quantities of data in order to uncover relationships and dependencies which may not be readily apparent.  It is especially effective for aggregating and summarising data, establishing subgroups based on the content, and finding trends which highlight possible anomalies.

As a simple example, consider a franchise whose shops stock 1000 products.  Each shop makes 75-100 sales per day; 75 percent of these sales are for 1-3 products, and another 15 percent for 4-5 products.  The franchise offers a customer loyalty card, for which 25 percent of its customers have registered; the card accrues points with each purchase toward an eventual gift or discount.  The computerised system comprises Point of Sale (cash register), inventory management and loyalty card management (but not full-fledged customer relationship management), and uploads data to head office nightly.

The primary data sources available are the daily sales reports; ongoing inventory reports (stock on hand and re-order history); loyalty card usage history and points balance; and customer information gathered via the card application.  There may also be qualitative information such as conversations between staff and customers about products.

Questions which the business might ask of this data include:

  • which products are most popular and which ones might be considered “niche”;
  • what is the effect of special promotions and occasions on sales;
  • what does a customer spend on average with/without a loyalty card;
  • which products appeal to which customer demographic group(s);
  • do some loyalty cards go unused and how can they be revived;
  • can the information be used to tailor promotions and to target customer groups;
  • are there data quality issues to be investigated and corrected?
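
As an illustration of one question above, average spend with and without a loyalty card, the following is a minimal sketch in Python.  The record layout (transaction id, card id or None, total spend) and the sample values are assumptions made for the example, not the franchise's actual sales report format.

```python
from statistics import mean

# Hypothetical sales records: (transaction id, loyalty card id or None, total spend)
sales = [
    ("T1", "C100", 12.50),
    ("T2", None, 4.00),
    ("T3", "C101", 20.00),
    ("T4", None, 6.00),
    ("T5", "C100", 7.50),
]


def average_spend_by_card_use(records):
    """Return (mean spend with a card, mean spend without a card)."""
    with_card = [total for _, card, total in records if card is not None]
    without_card = [total for _, card, total in records if card is None]
    # Guard against an empty group so that mean() does not raise StatisticsError.
    avg_with = mean(with_card) if with_card else None
    avg_without = mean(without_card) if without_card else None
    return avg_with, avg_without


avg_with, avg_without = average_spend_by_card_use(sales)
print(avg_with, avg_without)
```

A difference between the two averages is only a starting point; it does not by itself show that the card causes the higher or lower spend.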

Starting points for collating product information might include:

  • define and assign a set of product type categories within the inventory system (keep “miscellaneous/unknown” to below one percent);
  • aggregate individual product sales by time interval (week/month) to separate popular from niche products;
  • aggregate sales by product category to identify the leading categories, and leaders within each category;
  • factor in promotion and seasonal dates to assess the impact on sales (consider in-house versus manufacturers’ promotions);
  • analyse multiple product transactions to tabulate popular versus niche concurrent sales;
  • capture and log anomalous data for follow-up.

Starting points for collating customer information might include:

  • identify those loyalty cards whose last use was earlier than a selected date to detect inactivity.
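
Three of the starting points above (aggregating sales by time interval, tabulating concurrent sales, and detecting inactive loyalty cards) can be sketched in Python as follows.  The transaction layout, product codes and card ids are hypothetical, and the sketch counts one sale line per product per transaction, ignoring quantities.

```python
from collections import Counter
from datetime import date
from itertools import combinations

# Hypothetical transactions: (sale date, loyalty card id or None, products sold together)
transactions = [
    (date(2024, 1, 5), "C100", ["tea", "milk"]),
    (date(2024, 1, 9), None, ["tea"]),
    (date(2024, 1, 20), "C101", ["tea", "milk", "biscuits"]),
    (date(2024, 2, 2), None, ["saffron"]),
    (date(2024, 2, 14), "C100", ["milk", "biscuits"]),
]


def monthly_product_sales(records):
    """Count sale lines per (year, month, product); quantities are ignored."""
    counts = Counter()
    for sold_on, _, products in records:
        for product in products:
            counts[(sold_on.year, sold_on.month, product)] += 1
    return counts


def concurrent_pairs(records):
    """Count how often each unordered pair of products shares a transaction."""
    pairs = Counter()
    for _, _, products in records:
        # sorted(set(...)) gives every pair a single canonical ordering.
        for pair in combinations(sorted(set(products)), 2):
            pairs[pair] += 1
    return pairs


def inactive_cards(records, cutoff):
    """Return card ids whose most recent recorded use is earlier than cutoff."""
    last_used = {}
    for sold_on, card, _ in records:
        if card is not None:
            last_used[card] = max(sold_on, last_used.get(card, sold_on))
    return sorted(card for card, last in last_used.items() if last < cutoff)


monthly = monthly_product_sales(transactions)
pairs = concurrent_pairs(transactions)
stale = inactive_cards(transactions, cutoff=date(2024, 2, 1))

print(monthly.most_common(3))
print(pairs.most_common(3))
print(stale)
```

Cards which have never been used do not appear in the transactions at all, so finding them would require comparing against the list of issued cards.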

By combining the two sets of data, it should be possible to build a more comprehensive view of customer preferences.  This research might include:

  • reviewing the categories of product purchased by each customer group;
  • correlating loyalty card use to promotional sales and multiple-item purchases;

In designing the customer analyses, protecting the customer’s privacy is both essential and a legal obligation.  The data used to analyse customer purchase patterns must be recoded so that it contains only customer demographics.  Details which could identify an individual customer should never be used directly in building an analysis; for instance, the customer’s date of birth could be recoded into an age band (25-29, 30-34) and the loyalty card address could be reduced to just the postcode.
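
The recoding described above might look like the following Python sketch.  The field names (name, card_id, date_of_birth, street, postcode), the five-year band width and the sample customer are hypothetical; the actual card application data may be laid out differently.

```python
from datetime import date


def age_band(date_of_birth, as_of, width=5):
    """Recode a date of birth into an age band such as '25-29'."""
    # Subtract one year if the birthday has not yet occurred in the as_of year.
    birthday_pending = (as_of.month, as_of.day) < (date_of_birth.month, date_of_birth.day)
    age = as_of.year - date_of_birth.year - birthday_pending
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def recode_customer(record, as_of):
    """Keep only demographic fields; name, card id and street are dropped."""
    return {
        "age_band": age_band(record["date_of_birth"], as_of),
        "postcode": record["postcode"],
    }


# Hypothetical record as gathered via the card application
customer = {
    "name": "Example Customer",
    "card_id": "C100",
    "date_of_birth": date(1996, 8, 15),
    "street": "1 Example Street",
    "postcode": "2000",
}

anonymised = recode_customer(customer, as_of=date(2024, 3, 1))
print(anonymised)
```

Only the recoded record would then be passed to the analyses; the original record stays within the loyalty card management system.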
