Sunday, May 24, 2015

CHOOSING A DATA STORAGE OPTION

CHOOSING A DATA STORAGE OPTION

No one approach is right for all scenarios. If anyone says that a particular technology is the answer, the first thing to ask is "What is the question?" because different solutions are optimized for different things. The relational model has definite advantages; that’s why it’s been around for so long. But there  are also downsides to SQL that can be addressed with a NoSQL solution.
Often, what we see work best is a composite approach in which SQL and NoSQL are used in a single solution. Even when people say they’re embracing NoSQL, a closer looks reveals that they’re using several different NoSQL frameworks—they’re using CouchDBRedis, and Riak for different things. Even Facebook, which uses NoSQL solutions extensively, uses different NoSQL frameworks for different parts of the service. The flexibility to mix and match data storage approaches is one of the qualities that’s nice about the cloud; it’s easy to use multiple data solutions and integrate them in a single app.
Here are some questions to think about when you’re choosing an approach:
Data semanticWhat is the core data storage and data access semantic (are you storing relational or unstructured data)? Unstructured data such as media files fits best in Blob storage; a collection of related data such as products, inventories, suppliers, customer orders, etc., fits best in a relational database.
Query supportHow easy is it to query the data? What types of questions can be efficiently asked? Key/value data stores are very good at getting a single row when given a key value, but they are not so good for complex queries. For a user-profile data store in which you are always getting the data for one particular user, a key/value data store could work well. For a product catalog from which you want to get different groupings based on various product attributes, a relational database might work better. NoSQL databases can store large volumes of data efficiently, but you have to structure the database around how the app queries the data, and this makes ad hoc queries harder to do. With a relational database, you can build almost any kind of query.
Functional projectionCan questions, aggregations, and so on be executed on the server? If you run SELECT COUNT(*) from a table in SQL, the DBMS will very efficiently do all the work on the server and return the number you’re looking for. If you want the same calculation from a NoSQL data store that doesn't support aggregation, this operation is an inefficient “unbounded query” and will probably time out. Even if the query succeeds, you have to retrieve all the data from the server and bring it to the client and count the rows on the client.  What languages or types of expressions can be used? With a relational database, you can use SQL. With some NoSQL databases, such as Azure Table storage, you’ll be using [OData](http://www.odata.org/)[,](http://www.odata.org/) and all you can do is filter on the primary key and get projections (select a subset of the available fields).
Ease of scalabilityHow often and how much will the data need to scale? Does the platform natively implement scale-out? How easy is it to add or remove capacity (size and throughput)? Relational databases and tables aren’t automatically partitioned to make them scalable, so they are difficult to scale beyond certain limitations. NoSQL data stores such as Azure Table storage inherently partition everything, and there is almost no limit to adding partitions. You can readily scale Table storage up to 200 terabytes, but the maximum database size for Azure SQL Database is 500 gigabytes. You can scale relational data by partitioning it into multiple databases, but setting up an application to support that model involves a lot of programming work.
Instrumentation and ManageabilityHow easy is the platform to instrument, monitor, and manage? You need to remain informed about the health and performance of your data store, so you need to know up front what metrics a platform gives you for free and what you have to develop yourself.
OperationsHow easy is the platform to deploy and run on Azure? PaaS? IaaS? Linux? Azure Table storage and Azure SQL Database are easy to set up on Azure. Platforms that aren’t built-in Azure PaaS solutions require more effort.
API SupportIs an API available that makes it easy to work with the platform? The Azure Table Service has an SDK with a .NET API that supports the .NET 4.5 asynchronous programming model. If you're writing a .NET app, the work to write and test the code will be much easier for the Azure Table Service than for a key/value column data store platform that has no API or a less comprehensive one.
Transactional integrity and data consistencyIs it critical that the platform support transactions to guarantee data consistency? For keeping track of bulk emails sent, performance and low data-storage cost might be more important than automatic support for transactions or referential integrity in the data platform, making the Azure Table Service a good choice. For tracking bank account balances or purchase orders, a relational database platform that provides strong transactional guarantees would be a better choice.
Business continuityHow easy are backup, restore, and disaster recovery? Sooner or later production data will become corrupted and you’ll need an undo function. Relational databases often have more fine-grained restore capabilities, such as the ability to restore to a point in time. Understanding what restore features are available in each platform you’re considering is an important factor to consider.
CostIf more than one platform can support your data workload, how do they compare in cost? For example, if you use ASP.NET Identity, you can store user profile data in Azure Table Service or Azure SQL Database. If you don't need the rich querying facilities of SQL Database, you might choose Azure Table storage in part because it costs much less for a given amount of storage.
Microsoft generally recommends that you should know the answer to the questions in each of these categories before you choose your data storage solutions.
In addition, your workload might have specific requirements that some platforms can support better than others. For example:
  • Does your application require audit capabilities?
  • What are your data longevity requirements—do you require automated archival or purging capabilities?
  • Do you have specialized security needs? For example, your data might include personally identifiable information (PII), but you have to be sure that PII is excluded from query results.
  • If you have some data that can't be stored in the cloud for regulatory or technological reasons, you might need a cloud data storage platform that facilitates integration with your on-premises storage.

No comments:

Post a Comment