This paper was published through Galoá (DOI: 10.59254/sbpo-2025-212364).
Frequent pattern mining is essential for uncovering associations and trends in transactional data. However, traditional algorithms often scale poorly on large datasets. The FP-Growth algorithm addresses this challenge by efficiently mining frequent itemsets and association rules without the computationally intensive candidate-generation step. In this work, we develop a distributed application that uses Apache Spark and the FP-Growth algorithm to process large-scale datasets. By exploiting the parallelism of the MapReduce paradigm, our implementation achieved significant performance gains, including a speed-up factor of 9.5 and an efficiency of up to 90% in certain configurations. These findings demonstrate that FP-Growth, when deployed in a distributed environment, is a robust and scalable solution for big data analytics. Furthermore, the results highlight the practical applicability of such distributed techniques to common Data Science tasks such as product recommendation, customer segmentation, and behavioral analysis.
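To illustrate what "without candidate generation" means, here is a minimal single-machine sketch of FP-Growth. It is not the authors' Spark implementation and does not model how work is distributed across a cluster. It compresses the transactions into a prefix tree (FP-tree) whose items are ordered by descending support, then mines it recursively through conditional pattern bases. The names `FPNode`, `build_tree`, `fp_growth`, and `frequent_itemsets`, the use of an absolute count for `min_support`, and the sample baskets are all hypothetical choices made for this example.

```python
from collections import defaultdict


class FPNode:
    """One FP-tree node: an item, its count, and links to its parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, frequent supports)."""
    # First pass: count the support of every item.
    support = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            support[item] += count
    frequent = {i: s for i, s in support.items() if s >= min_support}

    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node that holds that item
    # Second pass: insert frequent items of each transaction, most frequent first,
    # so transactions sharing a prefix share a path in the tree.
    for items, count in transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, frequent


def fp_growth(transactions, min_support, suffix=()):
    """Yield (itemset, support) pairs by recursing on conditional trees, not candidates."""
    header, frequent = build_tree(transactions, min_support)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        new_suffix = (item,) + suffix
        yield frozenset(new_suffix), frequent[item]
        # Conditional pattern base: the prefix path above each node holding `item`,
        # weighted by that node's count.
        conditional = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Mine the smaller conditional tree for itemsets that extend `new_suffix`.
        yield from fp_growth(conditional, min_support, new_suffix)


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    return dict(fp_growth([(t, 1) for t in transactions], min_support))


# Hypothetical market-basket data, used only to exercise the sketch.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
itemsets = frequent_itemsets(baskets, min_support=3)
for itemset, count in sorted(itemsets.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Each transaction is scanned only twice per tree (once to count supports, once to insert), and every frequent itemset is produced exactly once by extending a suffix, so no candidate itemsets are ever enumerated and then tested against the data as in Apriori-style methods.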
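The speed-up and efficiency figures above are not defined in this summary. Assuming the conventional definitions, where $T_1$ is the runtime with a single worker and $T_p$ the runtime with $p$ workers (the exact baseline and unit of parallelism, such as cores or executors, are not specified here), they relate as follows. The timings in the worked line are hypothetical and serve only to show the arithmetic.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

% Hypothetical example: T_1 = 120\,\mathrm{s},\ T_4 = 40\,\mathrm{s}
S_4 = \frac{120}{40} = 3, \qquad E_4 = \frac{3}{4} = 0.75 \;(75\%)
```

Under these definitions, an efficiency of 90% means the parallel run achieves 90% of ideal linear scaling for the given number of workers.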