Companies are using data mining to become more competitive. Chen et al. (2015) define data mining as the process of discovering interesting knowledge, such as patterns and significant structures, from massive data in databases or other information repositories. With data mining, a company can identify unique patterns in customer data, which it can then use to devise strategies. However, turning data into useful information has never been easy, and several challenges can put the data mining process at risk. Some companies that have gone this route have succeeded, while others have failed to achieve an optimal result. This paper discusses industry standards for data mining best practices, practices to avoid, and examples of successful and failed data mining.
Industry Standards for Data Mining Best Practices
Under the CRISP-DM (Cross Industry Standard Process for Data Mining) model, as described by Rivo et al. (2012), there are generally six phases that set the industry standards for data mining and predictive analytics in various sectors. This framework is considered preferable for large projects as it makes them faster, cheaper, more reliable, and more manageable.
The first phase, business understanding, is to know the requirements of the project and the business. It is necessary to identify the assumptions that apply to the situation and then create an effective data mining plan to achieve the identified business and project goals.
The second phase, data understanding, begins by collecting the necessary data from available sources. It is essential to become familiar with the data while performing data load and integration processes. The individual responsible then explores the data by answering data mining questions through querying, reporting, and visualization.
Once the data sources are identified, the third phase, data preparation, is to select, clean, construct, and format the data in the desired form. During this phase, data exploration can reveal unique patterns.
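The select, clean, construct, and format steps can be illustrated with a minimal Python sketch. The records, field names, and the `prepare` helper below are hypothetical and serve only to show the order of operations; a real project would apply the same steps with its own data and tooling.

```python
from datetime import date

# Hypothetical raw customer records; all field names are illustrative.
raw_records = [
    {"id": 1, "name": " Ada ", "signup": "2021-03-04", "spend": "120.50", "notes": "vip"},
    {"id": 2, "name": "Bob", "signup": "2020-11-30", "spend": "45.00", "notes": ""},
    {"id": 2, "name": "Bob", "signup": "2020-11-30", "spend": "45.00", "notes": ""},
    {"id": 3, "name": "Cy", "signup": "2022-01-15", "spend": "80", "notes": "n/a"},
    {"id": 4, "name": "Dee", "signup": "2022-06-01", "spend": "", "notes": ""},
]


def prepare(records, as_of):
    """Select, clean, construct, and format records for modeling."""
    seen_ids = set()
    prepared = []
    for rec in records:
        # Select: keep only the fields the model needs.
        row = {k: rec[k] for k in ("id", "name", "signup", "spend")}
        # Clean: skip repeated ids and rows with a missing spend value.
        if row["id"] in seen_ids or row["spend"] == "":
            continue
        seen_ids.add(row["id"])
        row["name"] = row["name"].strip()
        # Format: convert strings to typed values.
        row["spend"] = float(row["spend"])
        row["signup"] = date.fromisoformat(row["signup"])
        # Construct: derive a new attribute from existing ones.
        row["tenure_days"] = (as_of - row["signup"]).days
        prepared.append(row)
    return prepared


rows = prepare(raw_records, as_of=date(2022, 12, 31))
```

Here the duplicate record and the record with no spend value are dropped, and each remaining row gains a derived `tenure_days` attribute.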
In the fourth phase, modeling, the individual responsible selects the best modeling techniques for the dataset. The next step is to assess the quality and validity of the model, create more models while involving stakeholders, and ensure that all these models meet business initiatives.
The fifth phase, evaluation, brings forth new business requirements as new patterns emerge. Here, the go or no-go decision is made before proceeding to the deployment phase.
The final phase, deployment, involves presenting the information gained in the data mining process so that stakeholders can use it whenever necessary. It guides the formation of business strategy and supports business decisions over time.
Pitfalls in Data Mining and Practices to Avoid
Organizations have had to employ modern techniques, such as data compression and deduplication, to deal with growing information volumes. Yet storage remains a challenge even with modernized technology. According to Jadhav (2013), data managers have to decide what to save and for how long because technology may not offer the necessary capacity for large amounts of data.
Cloud technology has significant weaknesses in its security mechanisms, so data held in a public cloud is at risk of tampering. An attacker can tamper with the data exchanged between servers or even shut down the server itself (Jadhav, 2013). This issue can hinder the implementation of big data analytics tools.
Communication cost is a concern in computer networking research and applications (Cai et al., 2017). The challenge for most companies is to reduce the communication cost while meeting the additional storage needs of data processing. Bandwidth and latency are two factors that affect this cost.
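The way bandwidth and latency combine can be shown with a rough estimate. The sketch below assumes a simple model in which transfer time is the latency plus the payload size divided by the bandwidth. The function name and the example figures are hypothetical and do not come from the cited sources.

```python
def transfer_time_seconds(payload_bytes, bandwidth_bps, latency_s, round_trips=1):
    """Estimate the time to move a payload across a network link.

    Simplified model: each round trip pays the latency once, and the
    payload is sent at the full bandwidth (bits per second).
    """
    if bandwidth_bps <= 0:
        raise ValueError("bandwidth must be positive")
    return round_trips * latency_s + (payload_bytes * 8) / bandwidth_bps


# Hypothetical link: 100 Mbit/s bandwidth, 50 ms latency.
bulk = transfer_time_seconds(1_000_000_000, 100_000_000, 0.05)  # 1 GB payload
small = transfer_time_seconds(1_000, 100_000_000, 0.05)         # 1 KB payload
```

Under this model, the bulk transfer is dominated by bandwidth, while the small transfer is dominated by latency, which is why both features matter when estimating cost.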
Baeza-Yates (2022) and Jadhav (2013) note that big data presents many ethical challenges. Companies using this process to aid productivity or introduce new business processes could end up tracking employees’ every move and measuring their performance. Such monitoring may infringe on people’s rights even when it serves the company’s best interests. Companies can obtain consent from the parties involved to minimize ethical issues (Hutton & Henderson, 2017).
Sampling data is a process that requires careful execution. If the sample is biased, the bias may carry over to the knowledge gained or the decisions made when the results are extrapolated to other areas. It is, therefore, necessary to test the extracted data to ensure it is generalizable before drawing conclusions.
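One simple way to test whether a sample resembles the population is to compare how often each category appears in both. The sketch below is a minimal illustration under that assumption; the `region` values and helper names are hypothetical, and a real project might use a formal statistical test instead.

```python
from collections import Counter


def proportions(values):
    """Return the share of each distinct value in a list."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def max_proportion_gap(population, sample):
    """Largest absolute difference in category share between two lists."""
    pop = proportions(population)
    samp = proportions(sample)
    categories = set(pop) | set(samp)
    return max(abs(pop.get(c, 0.0) - samp.get(c, 0.0)) for c in categories)


# Hypothetical customer regions.
population = ["north"] * 60 + ["south"] * 40
skewed_sample = ["north"] * 18 + ["south"] * 2
balanced_sample = ["north"] * 12 + ["south"] * 8

skewed_gap = max_proportion_gap(population, skewed_sample)
balanced_gap = max_proportion_gap(population, balanced_sample)
```

A large gap, as with the skewed sample here, suggests that conclusions drawn from the sample may not generalize to the population.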
This challenge compromises the data outcome in a significant manner and could lead to a less powerful model.
In data mining, scientists should avoid the following practices:
- Believing that a pattern seen in the data proves a cause-and-effect relationship.
- Stretching conclusions too far as this could result in bias.
- Overusing a particular modeling method even in situations it does not suit.
- Ignoring the negative results.
- Skipping data quality checks.
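As an illustration of the last item in the list above, a basic data quality check can count missing values, duplicate records, and out-of-range values before any modeling begins. This is a minimal sketch; the `quality_report` helper, the field names, and the accepted age range are all hypothetical.

```python
def quality_report(rows, key_fields, numeric_ranges):
    """Count missing values, duplicate keys, and out-of-range numbers."""
    report = {"missing": 0, "duplicates": 0, "out_of_range": 0}
    seen = set()
    for row in rows:
        # Missing: any key field that is absent or empty.
        if any(row.get(field) in (None, "") for field in key_fields):
            report["missing"] += 1
        # Duplicates: the same combination of key fields seen before.
        key = tuple(row.get(field) for field in key_fields)
        if key in seen:
            report["duplicates"] += 1
        seen.add(key)
        # Out of range: numeric values outside the accepted bounds.
        for field, (low, high) in numeric_ranges.items():
            value = row.get(field)
            if isinstance(value, (int, float)) and not low <= value <= high:
                report["out_of_range"] += 1
    return report


# Hypothetical records with one problem of each kind.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 1, "age": 34},
    {"id": 3, "age": 212},
]
report = quality_report(records, key_fields=("id", "age"), numeric_ranges={"age": (0, 120)})
```

A report with non-zero counts signals that the data needs cleaning before the results of any model built on it can be trusted.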
A Company that has Successfully Practiced Data Mining
Amazon is an example of a company that has mined massive amounts of consumer data to present products and services that match customer demand. The company analyzes and summarizes information about its customers, designs strategies for promotions, and enhances its offerings (Zatari, 2015). It also applies data mining to various aspects of product marketing for competitive advantage.
The Steps and Precautions to Ensure the Success of the Data Mining Endeavor
One approach used by Amazon is basket analysis, which predicts customer behavior from past purchases and preferences (Zatari, 2015). The company can tell what customers are likely to purchase next based on what they bought previously. According to Zatari, Amazon uses remote computing to offer customers unique promotional offers based on their purchase history. Overall, Amazon uses data mining to screen purchases and return requests for signs of fraud, encourage people to buy more with each order, and run its fulfillment centers.
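The idea behind basket analysis can be sketched with the standard support, confidence, and lift measures for pairs of items. This is a generic illustration and not a description of Amazon's actual system; the baskets and the `pair_rules` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, one set of items per order.
baskets = [
    {"book", "bookmark"},
    {"book", "lamp"},
    {"book", "bookmark", "lamp"},
    {"lamp"},
    {"book", "bookmark"},
]


def pair_rules(baskets):
    """Compute support, confidence, and lift for every rule x -> y."""
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = {}
    for (a, b), count in pair_counts.items():
        for x, y in ((a, b), (b, a)):
            support = count / n                  # share of baskets with both items
            confidence = count / item_counts[x]  # share of x-baskets that also hold y
            lift = confidence / (item_counts[y] / n)  # confidence relative to y's base rate
            rules[(x, y)] = {"support": support, "confidence": confidence, "lift": lift}
    return rules


rules = pair_rules(baskets)
bookmark_to_book = rules[("bookmark", "book")]
```

In this toy data, every basket containing a bookmark also contains a book, so the rule from bookmark to book has a confidence of 1.0 and a lift above 1, which would support recommending books to bookmark buyers.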
How Amazon Kept Customer Data Safe
Amazon keeps customer data safe by adhering to several internationally recognized standards and protocols on data protection, privacy, and security. Examples include the Sarbanes-Oxley rules, ISO 27000, SAS 70, and the Federal Information Security Management Act (FISMA). Another safeguard is the separation of duties, whereby employees get access to customer systems only on an as-needed basis, which reduces the number of people who can make mistakes. The company also carries out various inspections to ensure that personnel perform according to the standards. Finally, it continuously updates its systems to prevent disruption during upgrades.
A Company that has Experienced a Failed Data Mining Experience
An example of a company that has experienced a failed data mining experience is Facebook, especially relating to the Cambridge Analytica scandal that involved up to 87 million Facebook users’ data (Isaak & Hanna, 2018). The company also allowed Netflix to read users’ private messages for targeted advertisements (Dance et al., 2018).
Facebook’s Pitfalls and What Could Be Done Differently
The company lost a lot of users after the data scandals, which led to a decline in shareholder value, loss of goodwill, and large organized boycotts. Many lawsuits also ensued, and the CEO, Mr. Zuckerberg, had to testify before the U.S. Congress. The company did not take stringent measures to secure user data, which led to ethical issues. The company should have collaborated with the partner firms and with users to share only necessary information, and only with the user’s consent. I would not have ignored users’ concerns about their data, as they have the right to privacy. I would have focused more on data security and on assuring users that their data is safe.
Conclusion
Data mining is crucial because companies such as Amazon can use it to gain a competitive advantage over their rivals and increase revenue through better targeting. However, it is necessary to ensure that standards are adhered to through appropriate measures and practices. Without these standards, data mining can fail and cause significant harm to companies, as seen in the Facebook scandals over the sharing of user data. Companies will continue to invest in predictive analytics as it has shown massive potential in the past.
References
Baeza-Yates, R. (2022). Ethical challenges in AI. Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining. https://doi.org/10.1145/3488560.3498370
Cai, H., Xu, B., Jiang, L., & Vasilakos, A. V. (2017). IoT-based big data storage systems in cloud computing: Perspectives and challenges. IEEE Internet of Things Journal, 4(1), 75-87. https://doi.org/10.1109/jiot.2016.2619369
Chen, F., Deng, P., Wan, J., Zhang, D., Vasilakos, A. V., & Rong, X. (2015). Data mining for the Internet of things: Literature review and challenges. International Journal of Distributed Sensor Networks, 11(8), 431047. https://doi.org/10.1155/2015/431047
Dance, G. L., LaForgia, M., & Confessore, N. (2018). As Facebook raised a privacy wall, it carved an opening for tech giants. The New York Times, 18.
Hutton, L., & Henderson, T. (2017). Beyond the EULA: Improving consent for data mining. Studies in Big Data, 147-167. https://doi.org/10.1007/978-3-319-54024-5_7
Isaak, J., & Hanna, M. J. (2018). User data privacy: Facebook, Cambridge Analytica, and privacy protection. Computer, 51(8), 56-59. https://doi.org/10.1109/mc.2018.3191268
Jadhav, D. K. (2013). Big data: the new challenges in data mining. International Journal of Innovative Research in Computer Science & Technology, 1(2), 39-42.
Rivo, E., de la Fuente, J., Rivo, Á., García-Fontán, E., Cañizares, M., & Gil, P. (2012). Cross-industry standard process for data mining is applicable to the lung cancer surgery domain, improving decision making as well as knowledge and quality management. Clinical and Translational Oncology, 14(1), 73-79. https://doi.org/10.1007/s12094-012-0764-8
Zatari, T. (2015). Data mining by Amazon. International Journal of Scientific & Engineering Research, 6(6).
Cite this article in APA
Editorial Team. (2023, May 6). The Data Mining Best Practices. Help Write An Essay. Retrieved from https://www.helpwriteanessay.com/essays/the-data-mining-best-practices/