Enhancing Big Data Performance Through Graph Coloring-Based Locality of Reference


  • Methq Kadhum Alnoori, Electrical Engineering Department, College of Engineering, Mustansiriyah University, Baghdad, Iraq. https://orcid.org/0009-0003-3096-3000
  • Mohammad Malkawi, Software Engineering Department, Information and Computing College, Jordan University of Science and Technology, Irbid, Jordan. https://orcid.org/0000-0003-0109-8196
  • Enas Rawashdeh, Management Information System, Amman College, AlBalqa Applied University, Amman, Jordan. https://orcid.org/0009-0005-2986-9345




Big Data, Graph Representation, Locality of References, Performance, Synthetic Memory


Efficiency is crucial when storing and retrieving data from the vast number of records in a Big Data repository. Such systems operate on a subset of the data that fits within the combined physical memory of a cluster of servers; analyzing all of the data becomes impractical once its size exceeds the available memory capacity. Retrieving data from virtual storage, primarily hard disks, is significantly slower than accessing main memory, which increases access time and degrades performance. To address this, the proposed model aims to enhance performance by identifying the most suitable data locality structure within a big data set and reorganizing the data schema accordingly, where locality refers to a particular access pattern. This allows transactions to be executed on data residing in the fastest available memory layer, such as cache, main memory, or disk cache.
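The effect described above can be illustrated with a small sketch. It is not the graph coloring model of the paper, whose details the abstract does not give; it substitutes a much simpler first-seen co-access reordering to show why a layout that matches the access pattern lets each transaction touch fewer pages. The function names, the page size, the toy transactions, and the latency figures are all assumptions chosen for illustration.

```python
def pages_touched(layout, transaction, page_size):
    """Count the distinct pages a transaction reads under a given record layout."""
    position = {record: index for index, record in enumerate(layout)}
    return len({position[record] // page_size for record in transaction})


def reorder_by_coaccess(layout, transactions):
    """Place records that appear in the same transaction next to each other.

    Simplified stand-in for a locality-aware reorganization: records are laid
    out in the order transactions first touch them (hypothetical heuristic).
    """
    seen = set()
    new_layout = []
    for transaction in transactions:
        for record in transaction:
            if record not in seen:
                seen.add(record)
                new_layout.append(record)
    # Records that no transaction touches keep their original relative order.
    new_layout.extend(record for record in layout if record not in seen)
    return new_layout


def effective_access_time(hit_ratio, fast_ns, slow_ns):
    """Average access time when a fraction hit_ratio of accesses hit the fast layer."""
    return hit_ratio * fast_ns + (1.0 - hit_ratio) * slow_ns


# Toy data (assumed): 8 records, 4 records per page, 4 two-record transactions.
PAGE_SIZE = 4
layout = list(range(8))
transactions = [[0, 4], [1, 5], [2, 6], [3, 7]]

before = sum(pages_touched(layout, t, PAGE_SIZE) for t in transactions)
new_layout = reorder_by_coaccess(layout, transactions)
after = sum(pages_touched(new_layout, t, PAGE_SIZE) for t in transactions)
print(before, after)  # 8 4

# Assumed latencies: 100 ns for main memory, 10 ms for a hard disk.
print(effective_access_time(0.9, 100, 10_000_000))
```

In this toy case the reordered layout halves the number of page reads. The second function shows why that matters: with the assumed latencies, even a 10% miss rate to disk dominates the average access time.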








How to Cite

Enhancing Big Data Performance Through Graph Coloring-Based Locality of Reference. (2024). Journal of Engineering and Sustainable Development, 28(4), 467-472. https://doi.org/10.31272/jeasd.28.4.5
