FP-Growth on Spark for Big Data Pattern Mining: Applications in Data Science

RISTA, Luís Cassiano Goularte; BARBOSA, Marco; TEIXEIRA, Marcelo; FONSECA, Mauro

doi:10.59254/sbpo-2025-212364

Complete Articles (CA)

Abstract

Frequent pattern mining is essential for uncovering associations and trends within transactional data. However, traditional algorithms often face scalability issues when processing large datasets. The FP-Growth algorithm addresses this challenge by mining frequent itemsets, from which association rules are derived, without relying on the computationally intensive candidate generation step. In this work, we develop a distributed application using Apache Spark and the FP-Growth algorithm to process large-scale datasets. Leveraging the parallelism of the MapReduce paradigm, our implementation achieved significant performance gains, including a speed-up factor of 9.5 and a parallel efficiency of up to 90% in certain configurations. These findings demonstrate that the FP-Growth algorithm, when deployed in a distributed environment, is a robust and scalable solution for big data analytics. Furthermore, the results emphasize the practical applicability of such distributed techniques in common Data Science tasks such as product recommendation, customer segmentation, and behavioral analysis.
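To make "without candidate generation" concrete, the following is a minimal single-machine sketch of FP-Growth in plain Python. It is not the authors' Spark implementation: it compresses the transactions into an FP-tree (items sorted by descending support so common prefixes are shared), then recursively mines conditional pattern bases instead of enumerating candidate itemsets. The names `FPNode`, `build_fp_tree`, and `fp_growth`, the absolute `min_count` threshold, and the toy basket data are illustrative assumptions. In Spark itself, the corresponding API is `pyspark.ml.fpm.FPGrowth` (Scala: `org.apache.spark.ml.fpm.FPGrowth`), which takes a fractional `minSupport` and distributes the work using the PFP (parallel FP-Growth) scheme.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its accumulated count, and tree links."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs.

    Returns the frequent items with their supports and a header table
    mapping each frequent item to every tree node that holds it.
    """
    # First pass: count the support of every item.
    support = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_count}

    root = FPNode(None, None)
    header = defaultdict(list)
    # Second pass: insert each transaction with infrequent items dropped,
    # sorted by descending support (ties by item) so prefixes are shared.
    for items, count in weighted_transactions:
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return frequent, header


def fp_growth(transactions, min_count):
    """Return {frozenset(itemset): support} for every itemset with support >= min_count."""
    result = {}

    def mine(weighted_transactions, suffix):
        frequent, header = build_fp_tree(weighted_transactions, min_count)
        for item, sup in frequent.items():
            itemset = suffix | {item}
            result[itemset] = sup
            # Conditional pattern base: the prefix path above each node of `item`,
            # weighted by that node's count.
            base = []
            for node in header[item]:
                path = []
                parent = node.parent
                while parent.item is not None:
                    path.append(parent.item)
                    parent = parent.parent
                if path:
                    base.append((path, node.count))
            if base:
                mine(base, itemset)

    mine([(t, 1) for t in transactions], frozenset())
    return result


if __name__ == "__main__":
    # Hypothetical market-basket data, for illustration only.
    baskets = [
        ["bread", "milk"],
        ["bread", "diaper", "beer", "eggs"],
        ["milk", "diaper", "beer", "cola"],
        ["bread", "milk", "diaper", "beer"],
        ["bread", "milk", "diaper", "cola"],
    ]
    for itemset, sup in sorted(fp_growth(baskets, 3).items(),
                               key=lambda kv: (-kv[1], sorted(kv[0]))):
        print(sorted(itemset), sup)
```

With `min_count=3`, the toy baskets yield eight frequent itemsets: the four single items bread, milk, diaper, and beer, plus the pairs {bread, milk}, {bread, diaper}, {milk, diaper}, and {diaper, beer}. Each pair has support 3. Because every conditional tree depends only on the transactions that contain its suffix item, the recursion can be split by item across workers. This independence is the property that parallel variants such as Spark's exploit.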

Programme

13:30 to 15:10 on 10/06/2025

Orquídea

Institutions

¹ UTFPR
² Universidade Tecnológica Federal do Paraná

Track

4. AS&DS - Data Science and Analytics

Keywords

Data Science

Big Data

Pattern Mining

Apache Spark

SBPO 2025

Anais do Simpósio Brasileiro de Pesquisa Operacional
Book of abstracts of the LVII Brazilian Symposium on Operations Research