NoSQL is designed for operational needs — real-time applications that interface with customers or support APIs in a microservice pattern. NoSQL allows for high-performance, agile processing of information at massive scale (all NoSQL systems include clustering for either scale or resilience), a key feature of the new generation of operational databases. Unstructured data is supported, allowing access via different interfaces to drill down into the freshest and most vital data an organization collects. As such, the NoSQL distributed database infrastructure has been the solution of choice for some of the largest enterprises.
There are many features that differentiate Aerospike from its NoSQL competitors. The differentiator that makes it the choice for some of the world’s most demanding companies is its hybrid memory architecture (HMA). Aerospike’s HMA provides high performance and sub-millisecond latency with low hardware spend — and thus, lower total cost of ownership (TCO) — allowing for enormous scaleup at significantly less cost than that of pure RAM. This enables richer and more compelling user experiences that are key to success in the Internet age.
In an interview with Aerospike CEO John Dillon, we find out how this NoSQL database is enabling companies in myriad industries to achieve mission-critical projects at scale. Headquartered in Mountain View, California, the company was founded in 2009 by database and networking industry veterans Brian Bulkowski and Srini V. Srinivasan, now its Chief Technology Officer and Chief Development Officer, respectively. With the stakes in the Internet enterprise game getting higher, the companies that switch not just to NoSQL database technology, but to the NoSQL database that helps them best apply both analytics and operational processing, will gain a lasting competitive advantage and come out ahead.
1. Why are companies making a switch to NoSQL database technology? Is it driven by the need to rethink big data strategy?
The rapid expansion of the digital world has resulted in significant changes in data volumes (petabytes or exabytes of data vs. terabytes), data processing velocity (high speed vs. low speed), and data variety (structured, unstructured and hybrid/semi-structured vs. structured), which together are what we commonly refer to as “big data”. Another trend is the increasing interdependency and complexity of data accelerated by the Internet, Web 2.0, and social networks, as well as by open and standardized access to data sources from a large number of different systems. India is at the forefront of this shift due to its fast-growing consumer base and its willingness to rapidly deploy and engage with this new technology.
Yet what’s important is not the amount of data an organization creates on a day-to-day basis, but what a company does with that data: specifically, the insights it can glean from these datasets. The real value of big data for an organization is that it can be analyzed for insights that lead to better decisions and actions.
2. Do big data architectures need to be re-thought?
No, not in all cases. In many instances, they can be supplemented. But in some cases, new architectures like hybrid memory (pioneered by Aerospike) can enable a solution that is an order of magnitude better than comparable solutions. After building a data warehouse and creating innovative batch processes to better understand customers, the next step is to bring those insights online, and to create a richer, more engaging, and sometimes more fun user experience. This is the natural progression and extension of batch-oriented and data scientist-oriented approaches — and the next step for big data.
3. How does an enterprise use analytics techniques to improve usability through customization and point-of-contact decisions, and excel in delivering a superior user experience?
Previously, companies used data management tools based on the relational model (e.g., classic RDBMSs). What we’re seeing today, however, is that the amount of data has grown so large and complex, and the decisioning requirements so demanding, that datasets can’t be effectively managed or analyzed using conventional RDBMSs. In fact, in an increasing number of cases, the use of relational databases leads to problems, due to their (1) fixed schema, which makes them ill-suited for changing business requirements, as schema changes are problematic and time-consuming, (2) throughput that is too low and latency that is too high for the new requirements, and (3) limited ability to scale cost-effectively, if at all.
To handle this problem, enterprises are complementing traditional RDBMSs with a rich set of alternative DBMSs such as NoSQL and NewSQL. NoSQL systems in particular are distributed, non-relational databases designed for large-scale data storage and for massively-parallel, high-performance data processing across a large number of commodity servers. They arose out of a need for agility, performance, and scale, and can support a wide set of use cases, including exploratory and predictive analytics in real-time. NoSQL database systems sprang up alongside major Internet companies such as Google, Yahoo!, and Amazon; each had challenges in dealing with huge quantities of data in real-time, something that conventional RDBMS solutions could not cope with. Originally motivated by Web 2.0 applications, NoSQL systems — in contrast to traditional DBMSs and data warehouses — are designed to scale to hundreds of millions and even billions of users doing updates as well as reads.
4. Is everything in big data technology either Hadoop or NoSQL? What advantages does NoSQL have over traditional Hadoop technology?
NoSQL databases refer to non-relational database solutions, such as Aerospike and others. Hadoop is a complex multipart framework that includes “data lake” bulk storage, batch computation frameworks, and query systems. At first glance, NoSQL databases and Hadoop appear to be similar, if not competitive, technologies. Both manage large and rapidly growing data sets, both can handle a variety of data formats, and both can leverage commodity hardware working together as a cluster. But whereas NoSQL is a database infrastructure that can be used on the front-edge of the user experience to create a richer real-time experience (page-by-page or application-action-by-application-action) and handle the heavy demands of big data, Hadoop is a batch-oriented system that enables one to find long-term insights and patterns through sophisticated data analysis. Using MapReduce, Hadoop distributes a dataset among multiple servers and operates on that data. The results of the MapReduce processing are then usually recombined and stored in Hadoop’s own distributed filesystem HDFS, which makes data available to other computing nodes in the Hadoop cluster. Eventually, those results are either presented to decision makers for reporting, or can be displayed online, such as the classic search engine results that are pre-computed through a PageRank-type algorithm.
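The map, regroup, and reduce flow described above can be illustrated with a minimal, single-process sketch. This is not Hadoop code: the function names, the word-count task, and the two in-memory "shards" standing in for data spread across servers are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(shard):
    """Emit a (word, 1) pair for every word in one shard of the dataset."""
    return [(word.lower(), 1) for line in shard for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values collected for each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(shards):
    """Map each shard independently, then recombine the partial results."""
    mapped = chain.from_iterable(map_phase(shard) for shard in shards)
    return reduce_phase(shuffle(mapped))


# Two hypothetical shards standing in for data held on two servers.
shards = [
    ["big data at scale", "data in real time"],
    ["real time data"],
]
word_counts = run_job(shards)
print(word_counts)
```

Each shard is mapped without reference to the others, which is what lets a real cluster run the map step in parallel; only the regrouping step needs to see output from every shard. The whole job must finish before any result is available, which is the batch characteristic discussed here.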
The companies we are seeing choose NoSQL alongside Hadoop are doing so for several reasons:
a) They Run Real-Time Customer Facing Applications at Scale
Hadoop is naturally suited for data analytics. In the three-stage Hadoop process, data is loaded into HDFS, processed through MapReduce, and then results are retrieved from HDFS. The process is inherently a batch operation, suited for analytical or non-interactive computing tasks. NoSQL is designed for operational needs — real-time applications that often interface with customers or parties external to the organization. It provides the ability to query the data, so users can drill down into the data as it changes. NoSQL allows for high-performance, agile processing of information at massive scale. It stores unstructured data across multiple processing nodes, as well as across multiple servers.
b) They Have a Pressing Need for Speed at Scale
If a company needs to process ever-larger volumes of data, Hadoop can handle it. Hadoop can process petabytes of data compared to the maximum range of terabytes of data that most first-generation NoSQL databases can handle. It should be noted that Aerospike’s hybrid memory architecture sets it apart from earlier NoSQL technologies: it can support real-time applications that extend past hundreds of terabytes. While some of these users are augmenting Hadoop with other processing systems such as data grids, Spark-style in-memory processing, or stream processing (through Storm, Akka, Heron, or others), many find that Aerospike has a role to play in conjunction with such analytics architectures by expanding the amount of data available for streaming analysis.
However, it’s not just the velocity of streaming inputs a technology can handle that matters, but also the velocity of a system’s outputs, as these are what makes an enterprise competitive. In contrast to the minutes or hours required by Hadoop processes, NoSQL works on small subsets of data that it’s able to process in milliseconds or sub-milliseconds. Thus, when there is a need for speed, NoSQL is the technology to implement.
To address the need for low latency, NoSQL databases such as Aerospike have filled the widespread need for in-memory key-value stores, along with projects such as Apache Spark for in-memory, real-time computations.
c) They Are Internet-First or Mobile-First Enterprises
Hadoop is specifically designed for large-scale data processing for use by internal analysts. While this scale is necessary for any Internet-first activity, where data can be collected at will for analysis, the ability to act on those insights is just as important.
But taking these insights from this batch system and moving them towards the front edge requires a new technology, namely NoSQL, which is designed for real-time, interactive access to data. Typical NoSQL use cases are characterized by end user interactivity, as is the case for Web applications or virtually any application requiring rapid reads and writes of data.
d) They Have a Use Case That Goes Beyond Batch Analytics
Hadoop can provide powerful analysis, but it’s batch-oriented and can be highly troublesome — if not downright impractical — for real-time use cases. However, it is excellent at petabyte scale.
With its Hadoop connector, Aerospike NoSQL fits easily in the Hadoop ecosystem. In all integrations with Hadoop, the underlying concept is to be able to enrich the operational data on Aerospike, as well as to provide the operational data from Aerospike to the Hadoop ecosystem for enterprise-wide analytics — and in turn, enrich the analytics data set with updates from operational data on Aerospike. In real-time applications, Aerospike is also used as a results store, appending incremental updates from machine learning algorithms running on enterprise-wide data, and providing operational data for models feeding real-time web applications.
5. Where are some of the most common use cases of Aerospike seen? Is it in healthcare, retail, e-commerce, or BFSI?
Aerospike defines itself as an enterprise-class, NoSQL database solution for real-time applications that delivers predictable performance at scale, superior uptime, and high availability at the lowest total cost of ownership (TCO) compared to first-generation NoSQL and relational databases. We currently offer two versions of our database software: the free, open source Community Edition, and the commercial, enterprise-grade Enterprise Edition. The latter includes all the features of the Community Edition, plus many premium features. It also includes access to tested and certified builds, hot patches and 24x7x365 enterprise support. Founded in 2009, Aerospike has now been in production for over 7 years in some of the world’s most demanding companies.
In its early years, Aerospike tested its mettle in the rigorous adtech space and quickly dominated the industry with such flagship customers as Yahoo! and eBay. Its adtech use cases include real-time bidding, which uses real-time auctions to broker online ads (so the right ad can be presented to the right visitor within a fraction of a second), as well as programmatic video ads.
Popular for high-performance operational uses at scale, Aerospike has seen ever-increasing adoption thanks to its ability to provide scale and agility while ensuring availability, uptime, and unmatched performance. Many of the world’s most successful Internet businesses rely on Aerospike as the ideal solution to support their critical systems of engagement (SoEs), which require low latency, high throughput, and massive scalability. Aerospike’s early customers included Exelate, Snapdeal, InMobi, AppNexus, Kayak, Adform, and Neustar.
Over the last few years, Aerospike has diversified its customer base beyond these sectors by catering to real-time, mission-critical use cases in other industries, including financial services (digital payments, banking, etc.), e-commerce, telecommunications, martech, media, publishing, and manufacturing.
As Aerospike’s engineering team continues improving database performance and adding features, this has opened the door for a new set of use cases, including:
- Fraud prevention and digital payments applications that analyze hundreds of contextual data points for billions of users and devices in real time to identify questionable transactions.
- Real-time, analytics-based risk monitoring and alerting.
- Caching layer consolidation: Aerospike’s unique hybrid memory architecture (HMA) eliminates the need for a caching layer, giving a competitive advantage to the firms building systems of engagement.
6. Can you share some of your customer success stories?
Here are just a few of our customer success stories:
A Fortune 100 brokerage and investment management company needed to consistently and reliably support ever-growing workloads during trading hours. Without this capability, they faced consequences such as losing revenue and using delayed (stale) data for financial risk analysis. The company replaced its underperforming and costly cache-and-relational-database combination with Aerospike. The firm is now able to achieve high trading volumes with very low latencies and incorporate the most recent transactional insights into its risk profiling.
A global digital payment provider’s data architecture couldn’t cope with analyzing large data sets quickly enough to make accurate decisions. Consequently, at times, legitimate transactions were denied, while fraudulent purchases were approved, hurting profits and leaving customers dissatisfied. Thanks to Aerospike, this provider became able to process a large volume of data, with latencies in the sub-milliseconds. This resulted in improved fraud detection, reduced false positives and negatives, and enhanced customer satisfaction.
A global telco solutions provider needed a high-performance database for subscriber data management that could cater to the unique characteristics of their customers, based on real-time information about usage patterns, subscriber preferences, needs, and lifestyles. Aerospike provided this company with the high, predictable performance it needed to process a large volume of data, and helped it replace its traditional data architecture.
Here’s what Aerospike’s customers have to say about the performance:
“Aerospike was the only product that was solving three of our main requirements. Now, Aerospike has become a very big part of our whole stack and is something you can use out of the box that works.”
– Mohit Saxena, co-founder and VP of Technology at InMobi
“2.5 million impressions a second at peak, although we can go much higher, and we see north of 90 billion impressions per day and this is a 24×7 business with 100% uptime with Aerospike. We run Aerospike heavily, peaking at 3 million reads per second and well over 1 1/2 million writes a second in a very cost-effective way. I don’t think there’s any technology we’ve run into that even comes close.”
– Geir Magnusson, CTO of AppNexus
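To put the figures in this quote on a common scale, the daily total can be converted to a per-second average and compared with the stated peak. The sketch below uses only the numbers quoted above and treats "north of 90 billion" as exactly 90 billion, so the average is a lower bound and the ratio an upper bound.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

impressions_per_day = 90_000_000_000  # "north of 90 billion impressions per day"
peak_per_second = 2_500_000           # "2.5 million impressions a second at peak"

# Average rate implied by the daily total.
average_per_second = impressions_per_day / SECONDS_PER_DAY

# How far the quoted peak sits above that average.
peak_to_average = peak_per_second / average_per_second

print(f"average: {average_per_second:,.0f} impressions/s")
print(f"peak is {peak_to_average:.1f}x the average")
```

The average works out to roughly 1.04 million impressions per second, so the quoted peak is about 2.4 times the daily average.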
7. Could you tell us how new DBMS players are upending RDBMS leaders such as Oracle and IBM? How does Aerospike, a NoSQL DBMS, score over legacy players like Oracle?
New DBMS players are creating systems that reside alongside, and work in concert with, existing RDBMS implementations. We see this reflected in the new big data architectures: companies are augmenting their traditional transactional and relational processing systems with big data systems as well as NoSQL systems. The older architectures were built for a substantially different hardware base, at a time when storage costs were massively higher. Hence, building at Internet scale (or building a system of engagement with a richer and more personalized user experience) became the purview of new databases.
Aerospike has a unique position in the database industry as a whole for having created a system with extraordinary speed — driving 99% worst-case latencies to under a millisecond — without resorting to a DRAM memory architecture, thus enabling higher levels of scale. In this era of speed and scale, Aerospike scores over existing architectures.
It should be said that for all the rise of NoSQL, RDBMSs aren’t exactly losing ground. There are several reasons for this. Firstly, as much as companies may be keen on adopting NoSQL to bring mounting quantities of unstructured data under control, most of their workloads remain small-scale and transactional in nature, which is the specialty of the RDBMS. Secondly, analytics tooling for NoSQL is still in its early stages. Thirdly, enterprises have spent the last three decades using relational databases; it’s virtually impossible, and in many cases even undesirable, to change their culture overnight. Lastly, it turns out that sometimes an RDBMS is indeed the best solution for a particular problem. Consequently, there are a number of use cases where Aerospike sits as an adjunct to RDBMSs and/or supports RDBMSs.
8. What are your plans for the India market? Could you share any India-specific use cases?
Aerospike was founded in 2009 by Dr. Srini V. Srinivasan and Brian Bulkowski on the precept of world-class, round-the-clock support. With our founders’ background in building 24x7x365 teams, it was inevitable that we would establish support centers worldwide early on, including in India and Europe. But given the fact that Aerospike is a highly technical product, we felt that we could not have support centers outside the US without also having development centers with world-class engineers in these locations. You could say that we believed so strongly in world-class support that we decided to adopt a geographically distributed development model for our company. Ever since, a significant portion of our development projects emanate from our Bangalore office, as the city is among the best places in the world to hire deep technical talent (e.g., database experts, system programmers in C, etc.).
In the near-decade since Aerospike’s inception, Internet and mobile access in India has grown by leaps and bounds. Aerospike has been front and center in this trend, providing the infrastructure to support the real-time use cases that emanate from this transformation. The initial use cases include real-time display advertising and real-time bidding for companies such as InMobi and Pubmatic. Subsequent use cases benefiting from Aerospike’s extremely high and predictable performance, as well as its high availability and uptime, include real-time inventory management for enterprises like Snapdeal and Flipkart. Aerospike continues to expand its use cases in India in a myriad of new areas, such as movie ticket purchasing, grocery delivery, chat applications, and real-time, user-level mobile network status information. Of Aerospike’s customers, many are market leaders in the telco, e-commerce, online transportation, and cashless payment systems sectors.
Aerospike has been focused on the India market since 2010 — merely a year after it was created. As mentioned earlier, Aerospike set up world-class technical teams concomitantly in its headquarters in Mountain View, California and in Bangalore in 2011 to maximize our ability to use worldwide technical talent. But another reason for establishing a presence in India was to leverage the fast-growing Asia market, including China, India, Japan, and, now, countries in Southeast Asia. Some of our early customers hailed from these regions in addition to North America. In fact, our early start in India has created very high visibility with both technical and business leaders in the country. Over the years, we have seen our investment in India show great returns. As a result, we’ve doubled down on it, both to address the fast-growing market in India, and to use the country as a springboard for accessing other parts of Asia.
What’s our goal for the India market? Simply put, to be the key real-time database infrastructure used to power the growth of mission-critical, real-time applications in India’s rapidly growing online and digital economy.
9. Can you list features that make Aerospike a leading choice for analyzing big data?
In the era of big data, business competitiveness depends on creating real-time opportunities from massive — and growing — data sets, with latency requirements that become stricter each year. Insights must be generated in just microseconds, and systems of engagement must make analysis and action happen at the same moment. Vital data generated through interactions with customers, partners, and ecosystems must be the basis for instantaneous activity, providing continuous feedback that informs every action.
The Aerospike database provides businesses with the technology they need to enable their systems of engagement (SoEs) to meet these stringent requirements. How? By allowing analytics systems access to the most timely operational data. An innovative set of features and capabilities delivers the combination of high throughput and low latency that an analytics system needs, whether that system is built into the application or runs on a framework like Spark, allowing it to drive value from every business moment.
Aerospike’s key features include:
- Hybrid memory architecture (HMA): With the database index in memory, it uses attached SSDs as block devices to store data. Access to the index without disk I/O enables predictable high performance. Aerospike’s HMA enables the use of flash storage (SSD, PCIe, NVMe) in parallel on one machine to perform reads at sub-millisecond latencies at very high throughput (100K to 1M) in the presence of a heavy write load. This use of SSD enables enormous vertical scaleup at a total cost of ownership (TCO) five times lower than that of pure RAM.
- Real-time engine: Multi-threaded processes provide simultaneous access to the data across all available cores to scale up to millions of transactions per second per server at sub-millisecond latencies.
- Dynamic cluster management: Very rapid reaction to hardware or network failures and changes, together with uniform distribution of data and transaction workload, makes capacity planning and scale-up or scale-down decisions precise and simple for Aerospike clusters.
- Smart Client™: Aerospike’s Smart Client™ automatically distributes both data and traffic to all the nodes in a cluster. Automatic load balancing of the client improves both performance and correctness.
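The core idea behind the hybrid memory architecture listed above, an index held in memory that points at records stored on flash, can be illustrated with a toy sketch. This is not Aerospike's implementation or API: the `HybridStore` class, its methods, and the use of an ordinary file as a stand-in for an SSD block device are all assumptions made for illustration.

```python
import os
import tempfile


class HybridStore:
    """Toy key-value store: index in RAM, record bytes in a file.

    The file is a hypothetical stand-in for an SSD used as a block device.
    """

    def __init__(self, path):
        self._index = {}                # key -> (offset, length), held in memory
        self._data = open(path, "w+b")  # append-only data file

    def put(self, key, value: bytes):
        self._data.seek(0, os.SEEK_END)
        offset = self._data.tell()
        self._data.write(value)
        self._data.flush()
        self._index[key] = (offset, len(value))  # newest write wins

    def get(self, key):
        location = self._index.get(key)  # index lookup needs no disk I/O
        if location is None:
            return None
        offset, length = location
        self._data.seek(offset)          # one positioned read fetches the record
        return self._data.read(length)

    def close(self):
        self._data.close()


with tempfile.TemporaryDirectory() as tmp:
    store = HybridStore(os.path.join(tmp, "records.dat"))
    store.put("user:1", b"alice")
    store.put("user:2", b"bob")
    store.put("user:1", b"alice-v2")     # overwrite appends; the index repoints
    first = store.get("user:1")
    second = store.get("user:2")
    missing = store.get("user:3")
    store.close()

print(first, second, missing)
```

Because the lookup that locates a record never touches storage, each read costs at most one device access, which is the property the feature list credits for predictable latency. A real system would also need to reclaim the space left behind by overwritten records, which this sketch omits.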
In addition to predictable high performance, Aerospike customers benefit from high availability and uptime, superior scalability, and low TCO. Enhancements to Aerospike’s architecture are ongoing. For example, developments in 2017 have included supporting larger cluster sizes as well as enhancing query and scan capabilities with in-database filtering (a key internal database component necessary for more extended analytics) in Aerospike Server version 12, released this Spring.
The above features have made Aerospike a predominant choice for a myriad of big data use cases, including fraud detection in payment systems, risk management in trading systems, revenue assurance for telcos, real-time bidding and ad serving in display advertising systems, and inventory management in e-commerce systems.