«

»

In-Memory Databases

Share

In-memory databasesIn this post I present a high-level review of three of the mainstream in-memory database management systems, namely Oracle’s TimesTen, IBM’s solidDB and SAP’s HANA. It is not a comprehensive feature-by-feature evaluation though. There are also other in-memory database systems out there (such as Altibase, Michael Stonebraker’s VoltDB and Microsoft’s IMDB) that are not covered here either. I also investigate why SAP HANA in particular is becoming so popular.

“In-Memory” has become a buzzword of late, so we need to distinguish between in-memory databases and in-memory analytics. An in-memory database management system primarily uses main (RAM) memory for all its data storage, as opposed to a conventional database management system that stores its data on disk (although it too caches some of its data in memory to speed up operations). An in-memory database does all the reading and writing of data in memory, vastly improving the speed and response times of transaction processing and analytical applications. In-memory analytics on the other hand, is where the data is loaded into memory (typically temporarily from disk) in order to apply analytical operations to the data whilst it is in memory. With in-memory analytics, the data is hardly ever updated.

Oracle TimesTen

Oracle acquired TimesTen in 2005. It now markets this product as both a standalone in-memory database and as an in-memory database cache for the Oracle database management system.

Oracle TimesTen is a full-featured relational database that is designed to run in the application space and store all its data in main memory. TimesTen uses a memory-optimized data layout and access methods for highly concurrent workloads.

Durability and recoverability is achieved through a combination of transaction logging and database checkpointing to disk. With high speed replication, TimesTen can be configured for high availability and instant failover.

Oracle TimesTen supports full SQL transaction semantics and includes OCI, Pro*C and PL/SQL for compatibility with Oracle’s relational database.

TimesTen is typically implemented as an application library from which it is linked to the application. This improves performance by eliminating the client-server connection to the database. However, conventional client/server access can be used when a database is shared by a number of applications. TimesTen does not require database administration interaction, so it can be completely embedded in an application.

Oracle TimesTen is the database “engine” of the Oracle Exalytics appliance.

IBM solidDB

IBM acquired solidDB in 2008. It is a persistent, relational, in-memory database with high performance and reliability for real-time applications.

The solidDB database consists of two integrated database engines: one is a traditional disk-based engine and the other is a main memory engine allowing you to create tables that reside permanently in main memory. The indexes created for those tables are also stored totally in main memory. When using the in-memory database capability you may choose, for each table, whether the table is stored on disk or in memory.

When deployed in a two-node, hot-standby configuration, solidDB maintains two copies of the data synchronized at all times, therefore in the case of a system failure, applications can recover access to solidDB in less than a second without any loss of data. It persists data to disk through checkpointing and transaction logging. It provides powerful schema mapping and data transformation capabilities and scales both vertically and horizontally, horizontally with two-node shared memory access to increase performance and reduce response times. solidDB’s access methods implement efficient concurrency control mechanisms to help ensure transactional integrity across large numbers of concurrent operations.

solidDB is designed to take full advantage of 64-bit memory, as well as the processing scalability provided by multicore, multiprocessor architectures.

IBM’s solidDB is easy to deploy and administer, therefore it runs virtually unattended and it can be embedded directly into applications. Linking applications directly with the server code means that the application-level requests can be executed within the same address space. Applications then access the memory-resident data without the overhead incurred by multiple context switches on each interaction between the application and the server.

Applications access solidDB through standard ODBC, JDBC and SQL interfaces. solidDB supports SQL-92 entry-level, with selected features of SQL-98 and SQL 2003, with full ACID transactions.

IBM’s solidDB is also employed as the in-memory caching software to accelerate IBM DB2 and Informix, Oracle, Microsoft SQL Server and Sybase.

SAP HANA

SAP AG announced the general availability of their in-memory database, SAP HANA, in June 2011.

The SAP HANA database is an in-memory database that combines transactional data processing, analytical data processing, and application logic processing functionality in memory.

SAP HANA is a full database management system (DBMS) with a standard SQL interface, high availability, transactional isolation and recovery (ACID properties). It supports most entry-level SQL92 statements, and SAP applications that use Open SQL can run against HANA without any changes. Additional functions, such as freestyle search, is implemented as SQL extensions. BI applications are supported through multidimensional expressions (MDX) for Microsoft Excel and business intelligence consumer services (BICS), and an internal interface for BusinessObjecs. For analytical planning, you can iterate values on aggregated analytical reports. There are prepackaged algorithms in the predictive analysis library of SAP HANA to perform advanced statistical calculations.  HANA has built-in text support, from its predecessor BI Accelerator that was based on the TREX search engine and Inxight functionality has also been integrated into the HANA text functionality.

SAP has leveraged its application expertise to execute specific business application functions natively in memory. Examples include currency conversion and different business calendars.

SAP HANA supports both row-based and column-based stores within the same engine. You can use the type of store most applicable to the operations you expect to perform on the data. In general row-based storage is good for transactional applications, while column-based storage is better for reports and analytics. Column-based storage compresses the better too.

With massively parallel multicore processors, SAP HANA optimises the SQL for parallel execution, which scales well with the number of cores. It performs aggregation operations by spawning a number of threads that act in parallel, each of which has equal access to the data resident on the memory on that node. HANA also supports distribution across hosts, where large tables may be partitioned to be processed in parallel.

HANA is the database “engine” of the SAP HANA Analytics appliance as well.

Why HANA now?

Of these, SAP HANA is getting a lot of coverage lately. Why would that be? On a high level there does not seem to be that much difference between Oracle’s TimesTen, IBM’s solidDB and SAP’s HANA. Likewise, just storing data in the RAM, and treating this as a faster storage medium for the database, is nothing new. Sybase ASE – the conventional relational database acquired by SAP last year – also has an “in-memory” database option, where the whole database is operated in memory.

Is there anything else that distinguishes HANA to cause it to take the world by storm? On closer inspection, three aspects of HANA stand out:

  • HANA’s combination of a row and column store is fundamentally different from any other database engine on the market today, which allows it to perform OLTP and analytics processing in memory, at the same time.
  • In SAP HANA everything is tuned for the performance required for real-time data processing. Even if RAM is faster than disk, the CPUs still waste cycles waiting for data from RAM. HANA reduces these idle cycles by making sure that there is as much useful data in the CPU caches as possible, through its unique CPU-cache-aware algorithms and data structures. Likewise, it uses late materialization to decompress columnar structures as late as possible, or to run operations directly on the compressed data.
  • SAP HANA is also sold as an appliance, but only on Intel Xeon CPUs, as opposed to on the usual number of hardware platforms. Developers from Intel spent months with SAP coding the core of the HANA appliance, using all the possible features of Intel Xeon chipset architecture, such as HyperThreading, Intel Turbo Boost and Threading Building Blocks. (So you may need to watch HANA’s performance on other platforms or in a virtualized environment). The SAP HANA High Performance Analytic Appliance can perform large-scale data analyses on 500 billion records in less than a minute, taking analytics to an entirely new dimension. This represents a complete data warehouse in RAM, and as a result, much accelerated real-time analytics.

Are these the components of a winning formula, or is HANA merely in the correct hands at the correct place at the correct time?

1 comment

1 ping

  1. Martin

    Here is an excellent blog post about using HANA as an EDW platform:
    http://www.saphana.com/community/blogs/blog/2013/04/03/the-hana-edw?goback=%2Egde_4089577_member_228968734

  1. In-memory data warehouses » Martin's Insights

    [...] a previous post on in-memory databases I gave a high level overview of SAP HANA, Oracle TimesTen and IBM’s solidDB. In this post I [...]

Leave a Reply