Intro to Data Analytics/Business Data Mining

Homework 3 – Association Analysis

For this homework, please use the data set “BakerySales.csv” posted on Blackboard. The data set consists of line items of products purchased at a bakery. There are four attributes:
• TransID – transaction ID
• Product – item purchased
• Amount – price per product
• Type – specifies whether the product is either food or drink

Perform the following for this homework:

1. Import and load the data set “BakerySales.csv” into RapidMiner.

2. Take a screenshot of the data set.

3. Transform the data set into the appropriate format to run association analysis. Then, perform association analysis to generate some interesting rules. You will need to play around with the support and confidence values to get interesting rules. Take a screenshot of your final process stream.

4. Take a screenshot of your results (i.e., rules) after running association analysis. Interpret your results, and explain what support and confidence values you used. Why are these rules important, and how can you use these rules to improve sales for the bakery?
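
As a reminder of what the support and confidence values in steps 3 and 4 measure, here is a minimal sketch in Python (not RapidMiner). The baskets, the function names, and the thresholds 0.3 and 0.7 are all made up for illustration and are not taken from “BakerySales.csv”. It only considers one-item-to-one-item rules, which is a simplification of what an association rule operator does.

```python
from itertools import permutations

# Hypothetical baskets for illustration only; these are NOT rows from
# BakerySales.csv. Each set holds the products of one transaction.
TRANSACTIONS = [
    {"Coffee", "Bagel"},
    {"Coffee", "Muffin"},
    {"Coffee", "Bagel", "Muffin"},
    {"Tea", "Bagel"},
    {"Coffee", "Bagel"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in transactions if wanted <= basket)


def support(itemset, transactions):
    """Fraction of all transactions that contain the whole itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = count(antecedent, transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    return both / base if base else 0.0


def pair_rules(transactions, min_support, min_confidence):
    """All rules 'a -> b' between single items that meet both thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        c = confidence({a}, {b}, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


for a, b, s, c in pair_rules(TRANSACTIONS, min_support=0.3, min_confidence=0.7):
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Raising either threshold can only shrink the list of rules, and lowering it can only grow the list, which is why you need to try several values before the rules become interesting.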

Submission Instructions: Please type up your homework using the homework template posted on Blackboard under Assignments. You should include at least three screenshots: (1) data set loaded in RapidMiner, (2) final process stream, and (3) resulting rules.


I forgot to mention one thing regarding Homework 3 in class last Wednesday. When transforming the data set to horizontal format, select the Pivot operator, click “Show advanced parameters” in the Parameters panel, and uncheck “skip constant attributes”. If you leave it checked, the operator will skip the attributes whose value never changes within a group, and as a result rules will not be generated.
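
To picture what “horizontal format” means here, the following is a rough sketch in Python rather than RapidMiner. The sample rows and the helper name `to_horizontal` are hypothetical; only the four column names come from the homework description. It turns one-row-per-line-item data into one row per transaction with a true/false column for each product. It does not reproduce the Pivot operator's parameters.

```python
import csv
import io

# Hypothetical line items in the same four-column layout as the homework
# data set; the values themselves are invented for illustration.
SAMPLE = """TransID,Product,Amount,Type
1,Coffee,2.50,Drink
1,Bagel,1.75,Food
2,Coffee,2.50,Drink
3,Bagel,1.75,Food
3,Muffin,2.25,Food
"""


def to_horizontal(rows):
    """Return (products, table), where table maps each TransID to a dict
    of product -> True/False (was the product in that transaction?)."""
    rows = list(rows)
    products = sorted({row["Product"] for row in rows})

    # Collect the set of products bought in each transaction.
    baskets = {}
    for row in rows:
        baskets.setdefault(row["TransID"], set()).add(row["Product"])

    # One output row per transaction, one column per product.
    table = {
        trans_id: {product: product in items for product in products}
        for trans_id, items in baskets.items()
    }
    return products, table


products, table = to_horizontal(csv.DictReader(io.StringIO(SAMPLE)))
print("TransID", products)
for trans_id, row in table.items():
    print(trans_id, [row[product] for product in products])
```

Each printed row corresponds to one transaction, which is the shape association analysis expects as input.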

Some reminders about the homework: Remember to write a research question that you want answered using this data set and association rule analysis. Please also make sure that you not only interpret some of the interesting rules that are generated but also explain the support and confidence parameters you used in the “Interpretation of Results” section of the template. Also in the “Interpretation of Results” section, please answer the question posed in the homework document: “Why are these rules important, and how can you use these rules to improve sales for the bakery?”

Also, I attached 2 files:

one in Excel for the data

another one in Word

