Monday, December 02, 2013

Intro to Data Mining for Marketers - Part 1

Data mining can be defined as the process of "discovering patterns, meaning and insights in large datasets by using statistical and computational methods". It typically analyzes data stored in data warehouses, and that data may come from every part of the business, from production to management. Managers also use data mining to decide on marketing strategies for their products and to compare themselves with competitors. Data mining turns this data into real-time analysis that can be used to increase sales, promote new products, or drop products that do not add value to the company.


Data mining was born in the fields of Statistics and Computer Science (some might say Artificial Intelligence) and may also be referred to as “Statistical Learning”. From a statistical perspective, most early and recent advances have come from the Stanford Statistics department's school of thought, with names like Bradley Efron, Jerome H. Friedman, Trevor Hastie and Robert Tibshirani. By the way, don’t forget that Stanford University is only 7 miles away from Google.


Data Mining Framework

Using data mining techniques, we marketers need to master an approach that will provide decision makers with a priori knowledge about customers’ preferences and needs. Since there are many different kinds of customers with different kinds of needs and preferences, a simple, solid approach is meant to be a tool for performing market segmentation: divide the total market, choose the best segments, and design strategies for profitably serving the chosen segments better than the company’s competitors do. The example developed below is described for product development in the auto industry, but it can be applied successfully to any other setting where it is necessary to find the correlations between customer feelings or perceptions and the physical characteristics of a product. Yes, correlations, even through our statistical lenses.

Yes, arithmophobia is over, my friend!
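
To make "correlations" a little less abstract, here is a minimal sketch in Python of how one might measure the link between a physical characteristic and a customer perception. The function name `pearson` and the two lists (`horsepower`, `sporty_rating`) are made up purely for illustration; none of these numbers come from a real survey.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: one row per vehicle shown to respondents.
horsepower = [110, 150, 200, 250, 300]   # physical characteristic
sporty_rating = [2, 3, 3, 4, 5]          # average "feels sporty" score, 1 to 5

r = pearson(horsepower, sporty_rating)
print(f"correlation between horsepower and perceived sportiness: {r:.2f}")
```

A value close to +1 would suggest that the perception rises with the characteristic, a value close to -1 that it falls, and a value near 0 that there is no linear link. Correlation alone does not say which one drives the other.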


Any data mining application should start by understanding the business goals of the application, since the blind application of data mining techniques without the requisite domain knowledge often leads to the discovery of irrelevant or meaningless patterns. In order to understand the target customers of an automotive company, it would be helpful to examine the relationships between the vehicle's image and attributes and the customer's emotional benefits, which are tied to psychological needs, personality traits, and personal values. Thus, data mining can enable us to understand more completely how product-specific characteristics relate to customer needs and the benefits a customer hopes to obtain from them. For instance, for many people, cars, homes, restaurants and vacations provide emotional benefits as well as rational benefits. However, for a wealthy person who has everything, the emotional benefits provided by the status, prestige and superiority of an expensive automobile could outweigh rational benefits such as gas economy, lower maintenance and insurance costs, and resale value.

A target audience perhaps? "Free to do anything, in control, confident, sporty but with family."
Therefore, it will be beneficial to have a tool that will help us answer questions such as: Which, and how many, of the personality attributes used to describe the customer might be shaped by the vehicle’s image? What kind of vehicle will this customer or group of customers buy?

Data selection

This step calls for targeting a database or selecting a subset of fields to be used for the data mining. The following issues should be considered in developing a plan for collecting data efficiently:

  • Evaluation of existing data sources 
  • Specification of research approaches 
  • Data gathering (contact methods, sampling plans and instruments)

Survey research is a simple, efficient method to collect data. One of its advantages is flexibility: it can be used to obtain many different kinds of information in many different situations. Furthermore, depending on the survey design, it may also provide information more quickly and at a lower cost than manual processing. The survey may take the form of a questionnaire, which is very flexible because there are many ways to ask questions. In preparing the questionnaire, only questions that contribute to the research objectives should be asked. The questions may be closed-ended, meaning that they include all possible answers. In designing the survey, we also make sure that the questions are simple, direct and arranged in a logical order. The first question should create interest if possible, and difficult or personal questions should be asked last so that respondents do not become defensive.
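
As a rough illustration of closed-ended questions arranged in a logical order, here is a small Python sketch. The `ClosedQuestion` class, the question wording and the answer choices are all hypothetical, invented only to show the idea of a question that carries its own list of allowed answers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedQuestion:
    """A closed-ended question: the respondent must pick one of the listed choices."""
    text: str
    choices: tuple

    def validate(self, answer):
        # Reject anything that is not one of the predefined answers.
        if answer not in self.choices:
            raise ValueError(f"{answer!r} is not one of {self.choices}")
        return answer


# Hypothetical questionnaire: an engaging question first, the personal one last.
questionnaire = [
    ClosedQuestion("Which body style do you find most exciting?",
                   ("Coupe", "Sedan", "SUV", "Pickup")),
    ClosedQuestion("How important is fuel economy to you?",
                   ("Not important", "Somewhat important", "Very important")),
    ClosedQuestion("What is your household income range?",
                   ("Under 50k", "50k to 100k", "Over 100k", "Prefer not to say")),
]

for question in questionnaire:
    print(question.text, "->", " / ".join(question.choices))
```

Because every answer is checked against a fixed list, the responses can go straight into a table without any manual cleanup.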

Instead of a traditional mail questionnaire, a more modern approach is the computer interviewing process, in which respondents sit down at a computer, read questions from a screen, and type their own answers at their leisure. The beauty of this approach lies in its multiple benefits. First, the respondents’ answers are automatically stored in a database. Second, the survey is posted on the web, so it can be reached by an unlimited number of people. Filling out the survey takes little time even for a busy person: it is available to anybody at any time, and submitting the completed survey requires only a single click by the respondent, which an interactive survey implementation makes possible.

Third, the computers might be placed at different locations such as auto shows, dealerships, or retail locations. The biggest benefit is the collection of more relevant data, since people present at those locations are interested in automobiles and are therefore more likely to answer the questions carefully. The approach can be implemented so that the data is gathered from numerous computers at different locations and stored in a single, global database. As a fourth benefit, the same survey format will be accessible to different categories of people: experts (such as car designers) or people less familiar with the auto domain. A large and diverse group of respondents gives more reliable results than a small sample.
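
The idea of many computers feeding one global database can be sketched as follows. This is only an assumption-laden illustration using Python's built-in `sqlite3` module with an in-memory database; the table name `responses`, its columns and the helper functions are hypothetical, and a real deployment would use a shared database server rather than a local file or memory.

```python
import sqlite3


def create_store(conn):
    """Create the single table that holds answers from every location."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS responses (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               location TEXT NOT NULL,
               respondent_type TEXT NOT NULL,
               question TEXT NOT NULL,
               answer TEXT NOT NULL
           )"""
    )


def save_responses(conn, location, respondent_type, answers):
    """Store one respondent's answers (a dict of question -> answer)."""
    conn.executemany(
        "INSERT INTO responses (location, respondent_type, question, answer) "
        "VALUES (?, ?, ?, ?)",
        [(location, respondent_type, q, a) for q, a in answers.items()],
    )
    conn.commit()


def count_by_location(conn):
    """Return how many answers were collected at each location."""
    rows = conn.execute(
        "SELECT location, COUNT(*) FROM responses "
        "GROUP BY location ORDER BY location"
    )
    return dict(rows.fetchall())


# Hypothetical usage: two survey computers writing to the same store.
conn = sqlite3.connect(":memory:")
create_store(conn)
save_responses(conn, "auto show", "designer",
               {"body_style": "Coupe", "fuel_economy": "Somewhat important"})
save_responses(conn, "dealership", "customer",
               {"body_style": "SUV"})
counts = count_by_location(conn)
print(counts)
```

Keeping the location and the respondent type on every row is what later allows the answers of experts and non-experts, or of auto-show visitors and dealership visitors, to be compared.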

                                                      ...To be continued...