Original Research Article

A mediation system for continuous spatial queries on a unified schema using Apache Spark

Pages 115-141 | Received 29 Aug 2022, Accepted 23 Oct 2023, Published online: 09 Nov 2023
