NoSQL and Elastic Caching in Papyrus

Mike Gualtieri posted on his Forrester Research blog on Application Development about NoSQL and Elastic Caching. Quote: ‘The NoSQL idea is pretty simple: Not all applications need a traditional relational database management system (RDBMS) that uses SQL to perform operations on data. Rather, data can be stored and retrieved using a single key. The NoSQL products that store data using keys are called Key-Value stores (aka KV stores).’ Mike sees the difference as: ‘Ultimately, the real difference between NoSQL and elastic caching may be in-memory versus persistent storage on disk.’

I posted about the powerful clustering and caching algorithms of the Papyrus Platform some time back, so it was interesting to read about combining NoSQL and Elastic Caching. The Papyrus Platform uses both of these concepts at its lowest layer to support the metadata repository, rule engine, and the distributed, object-relational database and transaction engine. Even the strict security layer and the easy-to-use thick- and thin-client GUI frontends benefit from the powerful object replication and caching.

  • Reliability and Scaling: Papyrus offers the benefits of reliability and scaling through replication. Persistence and storage management concepts are defined per object type and node type. Data can be spread across thousands of nodes, and user PCs can have their own local node and storage. This will even be true for mobile-phone users once our mobile kernel becomes available later this year for iPhone, WinMobile, and Symbian.
  • Fast Key-Value Access: Papyrus supports straight key-value access as well as PapyrusQL object-relational access (similar to XPath), offering query and search across data in widely distributed KV storage nodes. These nodes can also be offline (dumped to tape or DVD).
  • Distributed execution: Papyrus executes object-state engines and methods (implemented in PQL), events, and rules. The application is deployed automatically to the local node where the data resides, or to any other chosen node. No clever developer effort is required to distribute the load across multiple servers.
  • Change of data structures: Thanks to the Papyrus WebRepository and its class versioning, we can add fields to objects without restructuring database tables; new instances simply carry the new fields. Data is NOT stored in XML format because XML parsing performance is dreadful. Papyrus uses field-length-keyed, hex-codepaged strings that can be parsed 20 times faster.
  • Latency: Papyrus can use transient objects that are not saved to disk when the data does not have to be persisted, which significantly reduces the latency of data operations. Because in-memory operation can be chosen per object type (class or template), it is not a drawback for large or persistent objects.
  • Reliability: Papyrus provides distributed caching with data-replication algorithms that store the data on multiple nodes. If one of the nodes goes down, the load balancer in V7 moves the user session to another node and continues with the proxy objects there. A more efficient object distribution for an HA cluster will be available in Q410.
  • Scale-out: With Papyrus you can add and remove nodes during operation. Currently the application chooses how objects are distributed across nodes; the next release in Q410 will provide this distribution at the system level as part of the backup and recovery procedure.
  • Execute in data location: Using distributed code execution, developers can move the workload to where the data resides rather than moving the data to the application. Executing methods on the owner node of the tool is the basic functionality; full distribution is straightforward with PQL.
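To illustrate why a field-length-keyed encoding parses faster than XML, here is a minimal Python sketch of length-prefixed field storage. The actual Papyrus storage format is proprietary; the field layout, hex widths, and function names below are purely illustrative assumptions.

```python
# Illustrative sketch of length-prefixed ("field-length-keyed") encoding.
# Unlike XML, decoding needs no tag scanning or validation: each field is
# a fixed-width hex length header followed by raw payload bytes, so the
# parser always knows exactly how far to jump. All names are hypothetical.

def encode_fields(fields):
    """Encode {key: value} strings as key-length/key/value-length/value."""
    out = []
    for key, value in fields.items():
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        out.append(f"{len(k):04x}".encode())  # 4 hex digits: key length
        out.append(k)
        out.append(f"{len(v):08x}".encode())  # 8 hex digits: value length
        out.append(v)
    return b"".join(out)

def decode_fields(buf):
    """Decode the buffer back into a dict by jumping, not scanning."""
    fields, pos = {}, 0
    while pos < len(buf):
        klen = int(buf[pos:pos + 4], 16); pos += 4
        key = buf[pos:pos + klen].decode("utf-8"); pos += klen
        vlen = int(buf[pos:pos + 8], 16); pos += 8
        fields[key] = buf[pos:pos + vlen].decode("utf-8"); pos += vlen
    return fields

record = {"customer": "ACME", "status": "open"}
assert decode_fields(encode_fields(record)) == record
```

The decoder performs one integer conversion and one slice per field, with no character-by-character scanning for closing tags, which is the essential reason such formats outparse XML.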

Enterprise application developers and architects do not have to build architectures with the above features themselves, as they are embedded in the Papyrus Platform's peer-to-peer kernel engine. Papyrus thus provides all the benefits of NoSQL and Elastic Caching without the technical complexity:

  • Achieve savings by reducing RDBMS licenses and maintenance.
  • Add a scaling layer in front of databases, SOA, or MQ messaging.
  • Build Web applications with shared session and application data.

Papyrus Application Scalability

Many Papyrus Platform installations have grown substantially in recent times, so application scalability has become a common subject. The principal scalability of the Papyrus Platform is unlimited thanks to its peer-to-peer cluster design, but it is restricted by the synchronicity needs of the application. Simply reading or displaying documents or content from storage has no limitation, but keeping multiple write accesses in sync creates scalability issues. Clearly, when the number of users doubles from 1000 to 2000, the hardware needs to be monitored and, if necessary, scaled. When the number of documents or process tasks doubles from 1000 to 2000 per hour, it also requires consideration of how that load can be safely spread across server nodes.

Scaling Papyrus Platform applications is very simple compared to, for example, three-tiered Web/Java/SQL/SOA application clusters. I have posted a more detailed discussion of Java application scalability on my ‘Real World’ blog.

User complaints that the ‘application is slow’, without any measured details, are not helpful but have to be taken seriously. Often the user has no way of knowing that the processing requirements for similar-looking documents can differ substantially: the document might be simple for the user while the backend process is fairly complex. We therefore propose that proactive monitoring be established. The Papyrus Platform has all the means for such monitoring available; simple dashboards can be created and summary reports are available.

Scalability is not only about tuning or maintaining an acceptable response time for a growing number of users. It is unreasonable to expect that the system will handle growth in users and transactions automatically; no system does. Papyrus has many load-balancing and tuning options, and many are set either by default or by the system. We found that some automatic functions had to be changed for weak networks, or when growth was not gradual but, for example, many users, nodes, or new documents and processes were added at once. That is related to the automatic version control and deployment of the Papyrus Platform.

Document applications are a complicated conglomerate of GUI, process, and rule-execution threads that read and write data through a number of service interfaces or databases. The performance of the SOA backend service interfaces or the database is much more relevant to scalability than the user frontend. By common database or transaction measurements, the Papyrus Platform executes millions of transactions per hour. That is, however, irrelevant; the question is how that translates into user experience!

The following measurements are used to define the quality of scalability:

1) IRT – Initial Response Time: from request to first usable feedback

2) TRT – Total Response Time: from request to completed function

3) SRP – Service Processing Time: the time the request is actively being processed

4) SQT – Service Queuing Time: the time the request waits for processing

5) ATT – Average Transaction Throughput: transactions performed in a time window

These values cannot be taken from the system itself; the measurement of user experience has to be defined in the application. In a multi-tiered Java application that is practically impossible, because a screen refresh may or may not be connected to a previous user entry. Papyrus assigns a JOB-ID to each user data entry and enables tracing through all functions and servers. A JOB-ID triggers the five measurement points above and thus enables real-world measurement. It is planned to provide a ‘User Experience Dashboard’ for each user that will display a kind of ‘VitalSigns’ statistic as real-time feedback.
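The idea of deriving the five measurements from a single JOB-ID can be sketched as follows. This is a minimal Python illustration, not the Papyrus API: the class name, the event names, and the timestamp scheme are all assumptions made for the example.

```python
# Sketch of per-request measurement keyed by a JOB-ID, assuming every
# tier records a timestamp against the same ID. Metric names follow the
# article: IRT, TRT, SRP (processing), SQT (queuing). Hypothetical API.
import time
import uuid

class JobTrace:
    def __init__(self):
        self.job_id = uuid.uuid4().hex     # JOB-ID assigned per user entry
        self.t_request = time.monotonic()  # moment the user submits
        self.marks = {}

    def mark(self, event):
        """Record a named milestone (e.g. from another tier or server)."""
        self.marks[event] = time.monotonic()

    def metrics(self):
        m, t0 = self.marks, self.t_request
        return {
            "IRT": m["first_feedback"] - t0,        # first usable feedback
            "TRT": m["completed"] - t0,             # completed function
            "SRP": m["completed"] - m["dequeued"],  # active processing
            "SQT": m["dequeued"] - m["enqueued"],   # waiting for a worker
        }

trace = JobTrace()
trace.mark("enqueued")
trace.mark("dequeued")
trace.mark("first_feedback")
trace.mark("completed")
m = trace.metrics()
assert m["TRT"] >= m["IRT"] >= 0  # total time includes first feedback
```

ATT, the fifth value, would then be derived separately by counting completed JOB-IDs per time window rather than from a single trace.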

The Papyrus Platform uses the Application Performance Analyzer (APA) to measure these values and relate them to general statistical data about CPU, I/O, and RAM usage.

APA Tuning and Monitoring

APA offers a unique level of insight across all application functions, from the user's click on the desktop or portal to the final display. Next to the elapsed-time measurements, measuring resource usage is the key to understanding and tuning: how much CPU, RAM, disk I/O, and network bandwidth is consumed per transaction and in total has to be known.

Web/Java/SOA applications carry substantial overhead for sticky load balancing, transaction-safe Java caching and database clustering, and parsing and validating the XML data for SOA communication. Rather than taking on the immense complexity of clustering multi-layered caches (with multiple conversions from tables to cache pages to objects and back), the Papyrus Platform collapses the horizontal structures and works with a purely vertical object model from definition to storage. The proxy replication mechanism of the Papyrus Platform uses a partitioned caching concept in which there is always a unique data owner, and each server node uses a replicated copy that is updated by either pull or push. The objects do not have to be serialized and deserialized, as they are cached, replicated, and binary-stored as is. The same object-caching mechanism works transparently for all objects, regardless of whether they are populated via Web services, external databases, or the Papyrus objectspace.
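The partitioned-caching concept of a unique data owner with push-updated replicas on the other nodes can be sketched in a few lines of Python. All class and method names here are illustrative assumptions; Papyrus's actual proxy replication mechanism is proprietary.

```python
# Sketch of partitioned caching: every object has exactly one owner node
# (picked deterministically from its key), and writes are push-replicated
# to proxy copies on the other nodes so reads stay local. Illustrative
# only; not the Papyrus implementation.

class Node:
    def __init__(self, name):
        self.name = name
        self.owned = {}    # authoritative copies this node owns
        self.proxies = {}  # replicated copies owned elsewhere

class Cluster:
    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}

    def owner_of(self, key):
        # Deterministic partitioning: hash the key to pick one owner.
        names = sorted(self.nodes)
        return self.nodes[names[hash(key) % len(names)]]

    def write(self, key, value):
        owner = self.owner_of(key)
        owner.owned[key] = value
        # Push update: refresh the proxy copy on every other node.
        for node in self.nodes.values():
            if node is not owner:
                node.proxies[key] = value

    def read(self, node_name, key):
        node = self.nodes[node_name]
        # A local hit (owned or proxy) avoids a round-trip to the owner.
        if key in node.owned:
            return node.owned[key]
        return node.proxies.get(key)

cluster = Cluster([Node("a"), Node("b"), Node("c")])
cluster.write("doc:42", {"status": "open"})
assert cluster.read("a", "doc:42") == {"status": "open"}
assert cluster.read("b", "doc:42") == {"status": "open"}
```

Because there is a single owner per key, no distributed write-locking is needed; the design choice trades a small replication cost on writes for purely local reads on every node.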

As a consequence of this future-proof design, the Papyrus Platform scales linearly as servers and nodes are added to the application clusters. It is, however, important to segment the database properly. Papyrus provides transparent object access and search across any number of nodes without any changes to an application, but it is important to understand that even with a powerful system such as Papyrus, the application has to be designed for scalability as much as the system.