A survey of dynamic query compilation methods

Authors: Sharygin E.Y., Buchatskiy R.A.

Journal: Trudy ISP RAN / Proceedings of the Institute for System Programming of the RAS

Published in: vol. 29, issue 3, 2017.

Open access

Efficient use of the processor is a decisive factor in the performance of analytical systems, especially as the volume of processed data grows. At the same time, the growing capacity of available main memory makes it possible to substantially reduce the number of accesses to slow disk storage, and thereby pushes I/O subsystem optimizations, traditional for most data processing systems, into the background. One of the most effective ways to improve processor utilization and to reduce overhead, which shows up primarily in the cost of interpreting query plans, is to compile queries into executable code at run time (dynamic compilation). Interest in dynamic query compilation methods has recently been growing in both academic and applied development. This article surveys the literature on dynamic query compilation, mainly for relational DBMSs. It presents work from recent years, describes the architectural features of the methods, classifies the work, and reports the main results.
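The contrast between interpreting a query plan and compiling it at run time can be illustrated with a minimal sketch. It is not taken from any of the surveyed systems. The node types `Col`, `Const` and `BinOp` and the functions `interpret`, `to_source` and `compile_predicate` are hypothetical names introduced only for this example. The sketch generates Python source and compiles it to bytecode with the built-in `compile` and `exec`, whereas production systems typically emit machine code (for example through LLVM).

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


# Hypothetical expression-tree nodes for a WHERE predicate (illustration only).
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Const:
    value: float


@dataclass(frozen=True)
class BinOp:
    op: str  # one of "+", "*", "<", ">", "and"
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Const, BinOp]
Row = Dict[str, Any]

_OPS = {
    "+": lambda a, b: a + b,
    "*": lambda a, b: a * b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "and": lambda a, b: a and b,
}


def interpret(expr: Expr, row: Row) -> Any:
    """Interpretation: the tree is walked and dispatched on for every row."""
    if isinstance(expr, Col):
        return row[expr.name]
    if isinstance(expr, Const):
        return expr.value
    return _OPS[expr.op](interpret(expr.left, row), interpret(expr.right, row))


def to_source(expr: Expr) -> str:
    """Translate the tree into a Python expression string, once per query."""
    if isinstance(expr, Col):
        return f"row[{expr.name!r}]"
    if isinstance(expr, Const):
        return repr(expr.value)
    if expr.op not in _OPS:
        raise ValueError(f"unsupported operator: {expr.op}")
    return f"({to_source(expr.left)} {expr.op} {to_source(expr.right)})"


def compile_predicate(expr: Expr) -> Callable[[Row], Any]:
    """Dynamic compilation: generate code specialized for this one query."""
    source = f"def predicate(row):\n    return {to_source(expr)}\n"
    namespace: Dict[str, Any] = {}
    exec(compile(source, "<query>", "exec"), namespace)
    return namespace["predicate"]


if __name__ == "__main__":
    # WHERE price * qty > 100 AND qty < 10
    where = BinOp(
        "and",
        BinOp(">", BinOp("*", Col("price"), Col("qty")), Const(100)),
        BinOp("<", Col("qty"), Const(10)),
    )
    rows = [{"price": 30.0, "qty": 4}, {"price": 5.0, "qty": 3}]
    predicate = compile_predicate(where)
    print([r for r in rows if interpret(where, r)])
    print([r for r in rows if predicate(r)])
```

In the interpreted path the per-node dispatch cost is paid for every row. In the compiled path the tree is traversed once, and the resulting function contains only the operations that this particular query needs.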


Keywords: dynamic compilation, JIT compilation, query languages, push model, code specialization

Short URL: https://sciup.org/14916435

IDR: 14916435   |   DOI: 10.15514/ISPRAS-2017-29(3)-11

Research article