FP-Growth on Spark for Big Data Pattern Mining: Applications in Data Science

- 325831
Complete Articles (CA)
Abstract

Frequent pattern mining is essential for uncovering associations and trends within transactional data. However, traditional algorithms often face scalability issues when processing large datasets. The FP-Growth algorithm addresses this challenge by mining frequent itemsets and association rules efficiently, without relying on the computationally intensive candidate generation step. In this work, we develop a distributed application using Apache Spark and the FP-Growth algorithm to process large-scale datasets. Using the parallelism of the MapReduce paradigm, our implementation achieved significant performance gains, including a speed-up factor of 9.5 and efficiency of up to 90% in certain configurations. These findings demonstrate that the FP-Growth algorithm, when deployed in a distributed environment, is a robust and scalable solution for big data analytics. Furthermore, the results emphasize the practical applicability of such distributed techniques in common Data Science tasks such as product recommendation, customer segmentation, and behavioral analysis.
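
To make the "no candidate generation" idea concrete, the following is a minimal single-machine sketch of FP-Growth in plain Python. It is not the authors' Spark application: the names `Node`, `build_tree`, and `fp_growth` and the toy baskets are illustrative assumptions. It only shows the core recursion (build a prefix tree, then mine conditional pattern bases), which a distributed implementation would partition across workers.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and a link to its parent."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns a header table (item -> list of tree nodes) and the
    support counts of the frequent items.
    """
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted(
            (i for i in set(items) if i in frequent),
            key=lambda i: (-frequent[i], i),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=()):
    """Return {itemset (sorted tuple): support count} without candidate generation."""
    header, frequent = build_tree(weighted_transactions, min_count)
    patterns = {}
    for item, support in frequent.items():
        patterns[tuple(sorted(suffix + (item,)))] = support
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in header[item]:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        patterns.update(fp_growth(conditional, min_count, suffix + (item,)))
    return patterns


# Toy market-basket data (illustrative only, not from the paper).
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
patterns = fp_growth([(b, 1) for b in baskets], min_count=3)
for itemset, support in sorted(patterns.items()):
    print(itemset, support)
```

Each recursive call works only on the prefix paths that co-occur with the current suffix, so itemsets are grown from the data itself rather than enumerated and then tested against it.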

Institutions
  • 1 UTFPR
  • 2 Universidade Tecnológica Federal do Paraná
Track
  • 4. AS&DS- Data Science and Analytics
Keywords
Data Science
Big Data
Pattern Mining
Apache Spark