Saturday, July 23, 2011

NoSQL, NewSQL and MDM


Master Data Management (MDM) is all about fixing poor data quality at its source and managing constant change. As new database technologies evolve, they will change the MDM solutions landscape by improving performance and scalability when working with large datasets (billions of rows). Most MDM solutions use an RDBMS (MSSQL, DB2, Oracle) to manage data, and performance tuning is a well-known pain point in today's MDM solutions. Let's look at the exciting new things happening in the database world, as these improvements will come to the MDM landscape sooner or later.

Evolving DB Landscape

According to Matthew Aslett, senior analyst at the 451 Group, there are currently three trends in the industry:

  • NoSQL databases, designed to meet the scalability requirements of distributed architectures and/or schemaless data management requirements
  • NewSQL databases, designed to meet the requirements of distributed architectures or to improve performance such that horizontal scalability is no longer needed
  • Data grid/cache products, designed to store data in memory to increase application and database performance


What is NoSQL?

NoSQL is an all-encompassing term for databases that do not use SQL, or use very little of it. The most popular types of NoSQL databases are:

  1. Wide Column Store / Column Families - Hadoop / HBase, Cassandra, Cloudata, Cloudera, Amazon SimpleDB
  2. Document Store - MongoDB, CouchDB, Citrusleaf
  3. Key Value / Tuple Store - Azure Table Storage, Membase, GenieDB, Tokyo Cabinet / Tyrant, MemcacheDB
  4. Eventually Consistent Key Value Store - Amazon Dynamo, Voldemort
  5. Graph Databases - Neo4j, InfiniteGraph, Bigdata
  6. XML Databases - MarkLogic Server, EMC Documentum, eXist
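To make the differences between these categories more concrete, here is a rough sketch of how one master record might be shaped in four of the data models listed above. Plain Python structures stand in for the real stores; the customer names, IDs, source-system codes and the `parents_of` helper are all hypothetical and are not taken from any of the products named.

```python
# A hypothetical "customer" master record expressed in four NoSQL data models.
# Plain Python structures stand in for the real stores; no database is used.

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": '{"name": "Acme Corp", "country": "US"}'}

# Document store: nested, schemaless documents; fields may differ per record.
documents = [
    {"_id": 42, "name": "Acme Corp", "addresses": [{"city": "Austin"}]},
    {"_id": 43, "name": "Globex", "tax_id": "XX-0000000"},  # no addresses field
]

# Wide column store: row key -> column family -> column -> value.
wide_column = {
    "42": {
        "profile": {"name": "Acme Corp", "country": "US"},
        "sources": {"crm": "A-17", "erp": "000981"},
    }
}

# Graph database: nodes plus typed edges between them.
nodes = {42: {"name": "Acme Corp"}, 43: {"name": "Globex"}}
edges = [(43, "SUBSIDIARY_OF", 42)]


def parents_of(node_id):
    """Return the names of nodes that node_id is a subsidiary of."""
    return [
        nodes[dst]["name"]
        for src, rel, dst in edges
        if src == node_id and rel == "SUBSIDIARY_OF"
    ]
```

For MDM, the graph shape is a natural fit for hierarchies (for example `parents_of(43)` walks the subsidiary edge), while the document shape tolerates records whose attributes vary by source.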

Other technologies used with NoSQL:
1. Memcached - a general-purpose distributed memory caching system originally developed by Danga Interactive for LiveJournal and now used by many other sites. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM, which reduces the number of times an external data source (such as a database or API) must be read. Memcached runs on Unix, Windows and MacOS and is distributed under a permissive free software license.

2. MapReduce - a patented software framework introduced by Google in 2004 to support distributed computing on large data sets across clusters of computers. MapReduce comes in different flavors, the most common being Hadoop MapReduce.
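The MapReduce idea can be illustrated with a minimal single-process sketch: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. This is only an illustration of the programming model, not Hadoop's or Google's API; the function names and sample records below are assumptions made for the example, and a real framework would run the phases in parallel across a cluster.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, 1) pairs: one per word in the input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially in one process."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: company names as they might arrive from source systems.
counts = run_job(["Acme Corp", "acme corporation", "Globex Corp"])
```

Because each map call sees only one record and each reduce call sees only one key, the work can be spread across many machines, which is what makes the model attractive for very large datasets.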

Good:

1. Elastic scaling
2. Big data - the volumes of “big data” that can be handled by NoSQL systems, such as Hadoop, outstrip what can be handled by the biggest RDBMS.
3. Economics - NoSQL databases typically use clusters of cheap commodity servers
4. Flexible data models

Bad:

1. Support - lack of vendor support is a turnoff for enterprises.
2. Analytics and business intelligence - NoSQL databases offer few facilities for ad-hoc query and analysis.
3. Administration - NoSQL today requires a lot of skill to install and a lot of effort to maintain.
4. Expertise - almost every NoSQL developer is in a learning mode.
5. ACID - most NoSQL databases do not offer full ACID (atomicity, consistency, isolation, durability) guarantees, which makes them better suited to non-transactional workloads.
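For contrast, here is a small sketch of what atomicity buys you on the relational side. It uses Python's built-in `sqlite3` module purely as a stand-in for an RDBMS (SQLite is an assumption made for the example, not one of the MDM back ends mentioned earlier), and the `customer` table and `merge_duplicates` function are hypothetical. A duplicate-merge either applies completely or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme Corp'), (2, 'Acme Corporation')")
conn.commit()


def merge_duplicates(conn, survivor_id, duplicate_id, merged_name):
    """Delete the duplicate and rename the survivor as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("DELETE FROM customer WHERE id = ?", (duplicate_id,))
            conn.execute(
                "UPDATE customer SET name = ? WHERE id = ?",
                (merged_name, survivor_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]


# A NULL name violates the NOT NULL constraint, so the earlier DELETE is undone.
failed_merge = merge_duplicates(conn, 1, 2, None)
rows_after_failure = row_count(conn)

# A valid merge applies both statements together.
good_merge = merge_duplicates(conn, 1, 2, "Acme Corporation")
rows_after_success = row_count(conn)
```

In a store without multi-statement transactions, the application itself would have to detect the half-finished merge and repair it.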

What is NewSQL?

As per the 451 Group, “NewSQL” is shorthand for the various new scalable/high performance SQL database vendors. [...NewSQL vendors] have in common the development of new relational database products and services designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity.

The 451 Group would include (in no particular order) Clustrix, GenieDB, ScalArc, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, and NimbusDB, as well as Drizzle, MySQL Cluster with NDB, and MySQL with HandlerSocket. The latter group (products that improve performance to the point where horizontal scalability is no longer a necessity) includes Tokutek and JustOne DB. The associated “NewSQL-as-a-service” category includes Amazon Relational Database Service, Microsoft SQL Azure, Xeround, Database.com and FathomDB.

It is only a matter of time before these new technologies mature and make their way into the MDM landscape. Exciting times ahead!
