In this course you will be working under Blackwell's Chief Technology Officer Danielle Sherman, as a member of the Blackwell Electronics eCommerce Team. Blackwell Electronics has been a successful consumer electronics retailer in the southeastern United States for over 40 years. Last year, the company launched an eCommerce website. Your job is to use data mining and machine-learning techniques to investigate the patterns in customer sales data and provide insight into customer buying trends and preferences. The inferences you draw from the patterns in the data will help the business make data-driven decisions about sales and marketing activities.

First, you will install the open-source WEKA machine learning package and use it to understand the relationship between customer demographics and purchasing behavior. Next, you will use feature selection techniques in WEKA to determine which add-on product a customer is likely to buy. Finally, you will present to management, explaining your insights and your suggestions for improving the data mining process.

What is data mining?

Data mining is the application of computational methods, including machine learning, to recognize patterns in data sets. The information extracted from the data can then be transformed or interpreted for use in decision-making or additional analyses.

Beyond the technical description of the term, data mining can be intuitively understood as a set of tools and methods for discovering and characterizing the information that best explains some phenomenon or best answers a question, given data about past experiences.

Data mining is an exceptionally versatile tool that can be applied to virtually any situation where a pattern can be expected to exist and historical data exists. From shopping habits of consumers to biotechnology research to weather analysis, data mining can reveal new information almost anywhere.

One of the features of data mining that makes it so powerful is its ability to very rapidly discover complex patterns in huge amounts of data. Frequently, the patterns discovered through data mining would not have been discovered through conventional human analysis alone, or would have taken an intractable amount of time.

However, data mining, like any tool, has to be used appropriately to generate good results. Insufficient data, improper analysis, or improper validation can all lead to false or misleading results. It is vital to follow the best practices of data mining to obtain robust, valid results.

At the heart of what makes data mining so valuable is its ability to find hidden patterns as well as dispel myths. Good data mining can tell you what really works, based on evidence, and what is most likely to work in the future. It can act as an organization's virtual crystal ball when utilized properly.

Listen to Professor Ravi Starzl talk about the importance of data mining to modern business:

[Video]

He also gives an example of how it is transforming our daily lives, e.g., by improving healthcare:

[Video]

Are the skills I'll learn applicable to data mining in general?

Yes! The tools and methods that you will use in this course have wide applicability to the data mining tasks you will encounter in nearly all business sectors and other real-world applications. The skills practiced in this course represent current professional practice and include:

  • Using data mining tools to investigate patterns in complex data sets
  • Preprocessing data for data mining (e.g., transforming numeric values to nominal values, discretizing data)
  • Using decision tree classifiers to investigate classification and regression problems
  • Applying cross-validation methods
  • Interpreting and drawing inferences from the results of data mining
  • Assessing the predictive performance of classifiers by examining key error metrics
  • Identifying where learning methods fail and gaining insight into why with error analysis
  • Drawing relationships between learner performance and measured features to help understand model performance
  • Conducting feature selection to investigate the correlation between different features in a dataset
  • Presenting data mining results to management
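
To make the preprocessing skill above more concrete, here is a minimal sketch of equal-width discretization, one common way to turn a numeric attribute into a nominal one. It is written in plain Python for illustration and is not WEKA's implementation; the function name `equal_width_bins`, the `bin_N` labels, and the sample ages are all hypothetical.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a nominal label 'bin_0' ... 'bin_{n_bins-1}'."""
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    lo, hi = min(values), max(values)
    if lo == hi:
        # All values are identical, so everything falls into the first bin.
        return ["bin_0" for _ in values]
    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # The maximum value would compute to index n_bins, so clamp it
        # into the last bin.
        index = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin_{index}")
    return labels


# Hypothetical customer ages, made up for illustration only.
ages = [20, 25, 38, 41, 59, 80]
print(equal_width_bins(ages, 3))
```

With three bins over the range 20 to 80, each bin is 20 years wide, so the ages 20, 25, and 38 share the first label. Equal-width binning is only one option; other schemes, such as equal-frequency binning, may suit skewed data better.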

Items to Purchase

Please purchase the following book:

  • Witten, I. H., Frank, E., and Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques (3rd Edition). Burlington, MA: Elsevier/Morgan Kaufmann.
Data Analytics: Understanding Customers