How do NoSQL databases work? Simply Explained!
NoSQL databases power some of the biggest sites. They're fast and super scalable, but how do they work?
Behind the scenes, they use a keyspace to distribute your data across multiple servers or partitions. This allows them to scale horizontally across many thousands of servers.
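A minimal sketch of the idea, assuming four in-memory dictionaries stand in for four servers. `NUM_PARTITIONS`, `partition_for`, `put` and `get` are hypothetical names; real databases use their own hash functions and placement schemes.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical cluster size

# One dict per partition stands in for one server.
partitions = [{} for _ in range(NUM_PARTITIONS)]


def partition_for(key: str) -> int:
    """Hash the partition key and map it onto one of the partitions.

    hashlib is used instead of hash() because hash() of a str is salted
    per process, so it would route keys differently after a restart.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def put(key: str, value: dict) -> None:
    partitions[partition_for(key)][key] = value


def get(key: str):
    # Only the one partition that owns the key is consulted.
    return partitions[partition_for(key)].get(key)


put("product:42", {"name": "MacBook", "price": 1999})
put("product:7", {"name": "Hand sanitizer", "price": 3})
print(get("product:42"))  # {'name': 'MacBook', 'price': 1999}
```

Plain modulo keeps the sketch short, but changing `NUM_PARTITIONS` would move most keys to a different partition; systems such as Cassandra place keys on a consistent-hashing token ring so that adding a node moves only a fraction of them.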
NoSQL databases can operate in multiple modes: as a key-value store, a document store, or a wide-column store.
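To make the three modes concrete, here is one hypothetical customer record in each shape, sketched with plain Python dictionaries rather than any real product's API. The helpers `kv_lookup` and `documents_in_city` are made up for illustration.

```python
import json

# Key-value: the store keeps an opaque blob behind each key.
kv_store = {"user:1001": json.dumps({"name": "Ada", "city": "London"}).encode("utf-8")}

# Document: the store understands the fields inside each document.
document_store = {
    "1001": {"name": "Ada", "city": "London", "orders": [{"id": 1, "total": 30}]},
    "1002": {"name": "Linus", "city": "Helsinki"},
}

# Wide column: row key -> column family -> column -> value.
# Rows in the same table may carry different columns.
wide_column_store = {
    "1001": {"profile": {"name": "Ada", "city": "London"}, "orders": {"order:1": "30"}},
    "1002": {"profile": {"name": "Linus"}},
}


def kv_lookup(key: str) -> dict:
    # The client must fetch by key and decode the value itself.
    return json.loads(kv_store[key])


def documents_in_city(city: str) -> list:
    # A document store can evaluate a filter like this on its side.
    return [doc_id for doc_id, doc in document_store.items() if doc.get("city") == city]
```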
You can run your own NoSQL database with software like Cassandra, CouchDB, MongoDB or Scylla. You can also use a cloud version like AWS DynamoDB, Google Cloud Bigtable or Azure Cosmos DB.
This video incorrectly conflates query language with storage architecture. These concepts are independent of each other. Columnar, distributed storage provides most of the horizontal scaling discussed in this video. In today's leading distributed/columnar databases, the data can be accessed with either SQL or NoSQL depending on the access requirements of the application problem you need to solve while realizing the performance benefits of horizontal scaling.
Cosmos DB even offers two storage modes. The default option is a typical key-value store (KVS) like the one described here, while the analytical mode is columnar.
But does it scale?? laughs in old meme language
In SQL, you can also partition databases; that's called sharding. You can also duplicate your databases; that's master and slave databases.
That's racist
Although your comment is a very good one, it's not liked by the author, because it nullifies the purpose of NoSQL.
@googleJay Numbers? Also, why is this racist? I am all for switching from blacklist to deny-list, because what does color have to do with it? Ancient Rome had masters and slaves. Maybe we should express that the "master" does all the work here. So I would call them the "active" database and the "follower". With interrupt controllers and IDE drives it would be high-priority and low-priority controller/drive, or really primary and secondary, because interrupt numbers and drive numbers count upwards.
@jaeken "Master" and "slave" have nothing to do with race to begin with. People in the past often enslaved people of the same race. And if someone had no bad feelings toward other races but still enslaved someone, that person wasn't a racist.
Sharding is essential for large relational databases, but it's usually a hard and expensive problem to solve that involves making strategic decisions based on your data. NoSQL DBs give you that out of the box.
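The sharding these comments describe can also be done by hand on top of ordinary SQL databases. A minimal sketch, assuming three in-memory SQLite databases stand in for three servers; `shard_for`, `insert_customer` and `get_customer` are hypothetical helpers, and replication (the primary/replica setup) is not shown.

```python
import sqlite3
import zlib

SHARD_COUNT = 3  # hypothetical number of database servers

# Each in-memory SQLite connection stands in for one server with the same schema.
shards = [sqlite3.connect(":memory:") for _ in range(SHARD_COUNT)]
for conn in shards:
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")


def shard_for(customer_id: int) -> sqlite3.Connection:
    # A stable hash of the shard key picks the server.
    return shards[zlib.crc32(str(customer_id).encode("utf-8")) % SHARD_COUNT]


def insert_customer(customer_id: int, name: str) -> None:
    conn = shard_for(customer_id)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (customer_id, name))


def get_customer(customer_id: int):
    row = shard_for(customer_id).execute(
        "SELECT name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    return row[0] if row else None


for cid, name in [(1, "Ada"), (2, "Grace"), (3, "Linus")]:
    insert_customer(cid, name)
```

The hard part in practice is everything this leaves out: queries that span shards, rebalancing when a shard is added, and keeping related rows on the same shard.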
Thanks for this great simple and short explanation of data storage in NoSQL-databases. Exactly what I was looking for 🙂
Relational databases can be way more efficient at accessing well-structured data, as it is placed in predictable memory locations, and the processing needed to maintain relations isn't comparable to the overhead a NoSQL DB has when accessing data. The choice of SQL vs NoSQL really depends on the specific application; one isn't better than the other.
Yes, he mentioned that. You just repeated what he said lol
Exactly. It's like when people ask what's the best programming language.
It's a bit late after your post, but what resources does the server need to maintain the relationships? That should only be the case IF you make the relations explicit to the SQL server, right?
Actually no, this heavily depends on how you access the data in the database and how many branch and cache misses you have. If you regularly access a significant amount of the data that is arranged so that the other data to be accessed can be loaded with it into the cache, then a normal SQL database is going to be outperformed by orders of magnitude by the alternatives. There is actually an error in the basic explanation though: as the unique key in the first "wide scaling" example is formed by a hash function, there is a relation between the unique key and the stored data, so it doesn't do away with relationships, it just leans really heavily into a single one. What should really be explained here is how NoSQL databases break "Normal Forms" and atomicity of operations to improve performance.
Absolutely. But you'll still get the religious nuts here claiming that database 'x' is the cure to all of the world's ills and a golden bullet for all of those hated platforms by Oracle, Microsoft and IBM that the majority of the world still runs on. NoSQL is good for some things and crap at others. Just like relational databases. But the fanboiz will never have this.
In MySQL, you don't HAVE to use the relational structuring of your data. You can simply have multiple tables and treat them as truly separate tables, where you run multiple queries to get all your data, each query running on one table, for example. Then you can make the relations in your program if you need them.
Well, that is the point of the relational model: relationship enforcement is optional. Tables are just sets of tuples.
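A minimal sketch of the approach described in this exchange (separate tables, one query per table, the join done in application code), using Python's built-in `sqlite3` module instead of MySQL to keep the example self-contained. The table layout and the `orders_with_customer_names` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 30.0), (11, 1, 12.5), (12, 2, 99.0);
""")


def orders_with_customer_names():
    # Query 1 reads only the orders table.
    orders = conn.execute("SELECT id, customer_id, total FROM orders ORDER BY id").fetchall()
    if not orders:
        return []
    # Query 2 reads only the customers those orders refer to.
    ids = sorted({customer_id for _, customer_id, _ in orders})
    placeholders = ",".join("?" * len(ids))
    names = dict(conn.execute(
        f"SELECT id, name FROM customers WHERE id IN ({placeholders})", ids
    ).fetchall())
    # The "join" happens here, in application code.
    return [(order_id, names.get(customer_id), total) for order_id, customer_id, total in orders]


print(orders_with_customer_names())  # [(10, 'Ada', 30.0), (11, 'Ada', 12.5), (12, 'Grace', 99.0)]
```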
What almost everyone seems to be leaving out is that the arrangement and retrieval of data is not the whole story. Normalizing data makes it quite logical when you want to analyze it later. SQL is useful that way.
@delturge Very true. Doing RDBMS well means actually thinking about and planning the data structure. This leads to better architecture and design, and makes using that data for other purposes, including adding new features, easier. NoSQL tends to be a lot messier, kinda like spaghetti code. Sure, some devs do a great job with NoSQL. But MOST just see it as an excuse so they don't have to bother learning or thinking about databases.
@Me__Myself__and__I The issue is performance and very large tables that have billions of records. SQL coding is far, far easier than NoSQL, since NoSQL code has to handle any relational lookups or decoding of the value (i.e., when it contains a structure with multiple fields).
I feel like NoSQL is what you get when a coder who never learned to clean up his room or organize anything creates a database.
It's basically a stack made of everything XD
I started watching, thinking "what the hell is NoSQL?". The further I went, the more I was just thinking "holy shit why"
That is EXACTLY true. NoSQL is ideal for certain limited use cases, but it is very popular because lazy programmers don't need to bother learning SQL or good database design. And since NoSQL is so "flexible", these same lazy developers often end up storing JSON that is inconsistent over time and so incredibly hard to work with and maintain long-term.
Me myself I, so you're saying that all tech giants like Apple, Amazon, Facebook, etc. use NoSQL just 'cuz their engineers "are lazy and never bothered to learn SQL" xD. You are so delusional, it's actually scary.
@evergreen- No, I'm the rare developer with extensive experience in both development and databases/DBA work. Plus, apparently you can't read. I said there are some non-typical cases where NoSQL is a good choice. But it is a bad choice for the majority of typical business systems. Also, anyone who confuses mega-scale projects at huge companies like Amazon, Google, Facebook, etc. with the work that 95% of devs do is delusional.
"EVERYTHING OLD IS NEW AGAIN!" What you are describing is also called an Indexed Sequential Access Method (ISAM) database/file. I believe it was invented by IBM in the 60s or 70s. This was long before relational databases existed. Other types of databases existed, but they were very complex and hard to program and maintain. In those days "management" wanted reports about what was in these data files. IBM invented software to "query" these files called Report Program Generator (RPG). It was easy to use and could be set up to do various queries quickly. It was superseded by RPG-II, which lived on for many, many years. Digital Equipment Corporation (DEC) had their own version that came out in the 70s on PDP-11 and VAX computer systems. It had its own query/report generator language called Datatrieve. Datatrieve was both an interactive query tool and a "fixed" report generator.
I am more familiar with the DEC implementation of ISAM. First, unlike many modern implementations, the "partitions" lived within the same file. Accessing any file (other than by direct mapping) went through a layer called Record Management Services (RMS). This was mostly invisible to the programmer/end user, but it did allow simple sequential access to ISAM files from any utility/program without having to understand how the data was actually organized on disk. The "key partition" was "hidden" when using sequential reads/writes. While a unique primary key was required, you could have multiple secondary keys, and these did not have to be unique. Accessing a record via the primary or a secondary key was fast! The key "definition" (metadata) was actually stored within the file. It could be accessed using a utility called the File Definition Language (FDL). FDL was also used to define/create a new file with different keys. If you wanted to add a key to an existing ISAM file, it was really quite easy using Datatrieve: you would define the new file using FDL, then use Datatrieve to read the old file and populate the new file. Large ISAM files did require some maintenance for optimal results. Tools were supplied to do this.
Although, curiously, the default storage engine for MySQL used to be "MyISAM" (InnoDB has been the default since MySQL 5.5), and InnoDB does a much better job of implementing SQL principles like ACID.
When I think of ISAM, I think of spinning tape. But it's not indexed; the PK is hashed. If you want that, go back to IDMS... that would make me very happy. All calls will be to a CALC. I was an IDMS DBA for a long time, and it was a pretty simple system to manage and navigate if you understood pointers. No... we want RDBMS and huge Cartesian products.
@JimAllen-Persona Datatrieve was DEC's answer to RPG. It was very much a "Swiss Army knife"! It could do interactive queries or be programmed to generate "reports". It also had a callable interface. It could also be extended by writing your own "functions" and linking them into the product. One of the coolest features was that it could "search" flat (plain, sequential) files as if they were ISAM. Slow, but it worked! Last, using the DEC Convert utility and its companion File Definition Language, you could quickly load an ISAM file OR add another key. It could also be used with DEC's relational database (Rdb) and hierarchical database by only changing the file description. Sadly, very few people purchased the product, and even fewer people used ISAM from any of the multitude of high-level languages.
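A toy, in-memory sketch of the access pattern described in this thread: records kept in primary-key order and found by binary search, plus non-unique secondary keys and a plain sequential scan. `IsamLikeFile` is a hypothetical class and does not model RMS's actual on-disk format.

```python
import bisect
from collections import defaultdict


class IsamLikeFile:
    """Records kept in primary-key order, plus non-unique secondary indexes."""

    def __init__(self, secondary_fields):
        self._keys = []      # sorted, unique primary keys
        self._records = []   # records, kept parallel to self._keys
        self._secondary = {field: defaultdict(list) for field in secondary_fields}

    def insert(self, key, record):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            raise KeyError(f"duplicate primary key {key!r}")
        self._keys.insert(i, key)
        self._records.insert(i, record)
        for field, index in self._secondary.items():
            index[record[field]].append(key)  # duplicates allowed here

    def get(self, key):
        # Binary search on the primary key.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._records[i]
        return None

    def find_by(self, field, value):
        # A secondary key can match many records.
        return [self.get(k) for k in self._secondary[field].get(value, [])]

    def scan(self):
        # Sequential read in primary-key order; the caller never sees the index.
        return list(zip(self._keys, self._records))


staff = IsamLikeFile(secondary_fields=["dept"])
staff.insert(2, {"name": "Bob", "dept": "ops"})
staff.insert(1, {"name": "Ann", "dept": "ops"})
staff.insert(3, {"name": "Cy", "dept": "dev"})
print([key for key, _ in staff.scan()])  # [1, 2, 3]
```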
You can store key-value data in a relational database. And you can partition a relational database across multiple servers based on the primary key or something else. A relational DB can do everything NoSQL can, but not the other way around.
WRONG. Relational DB cannot be distributed across hundreds of servers.
@ydvisual5530 Yes, they can. You can hash and partition (shard) a relational database.
Also, I forgot to mention one thing. Let's say that you have billions and billions of records on thousands of servers. What do you do with them? Pick one record by key? How often do you know the key of a record? Most often you pick the key from a search result list. When you go to an online shop, do you say "I want to buy products with SKUs 5, 7 and 25"? Not really. You filter by category, you search by keyword, you sort by price, and then you pick products. None of these are possible with NoSQL. NoSQL could be useful in a few specific cases, but its usage is very limited.
@ydvisual5530 lol, look up what a Galera Cluster is.
Oops, sorry guys, I was really drunk last night and was just trolling with that comment, haha, sorry. I looked at it this morning and laughed. Sorry about that. I don't know much about what I am talking about anyway, haha. I have recently been working with Oracle's OBIEE. It's quite powerful. Anyway, sorry about that!
There has been a style of database around since the late 1960s that offers the best of both, plus many features that both SQL and NoSQL lack. It's Pick and its many variants and descendants. Hashed, variable length, delimited, and its native programming language is a high-octane version of that old favorite, BASIC. It may not scale to Amazonian proportions, but it is great for most of the real world. The company I worked for started using it in 1996. By the time I retired in 2009, they had grown to over $100 million in annual sales and are still thriving on Pick clone Unidata.
Now I got it. It took me a while to understand how NoSQL works. I've been working with SQL Server and MySQL for a few years, so the NoSQL idea was confusing my brain a little bit lol. Thanks
Teradata (since the 1980s) and IBM's DB2 Parallel Edition (since the mid-1990s) are examples of SQL databases that use sharding to scale horizontally in a linear fashion. Although both of these typically have more than just a key and a single value, they could be used with only one value (or one value in XML format) just like NoSQL databases, but they can also have multiple values. The problem with Teradata and DB2 Parallel Edition is that their license costs are significantly higher than those of NoSQL databases, which are typically open-source software with only support fee costs.
Keep in mind that the most popular NoSQL database, which is MongoDB, is not Open Source. It is free to use, which does help with its popularity somewhat.
2:50 is such a surprising random Indian accent moment
Thank you for a fantastic overview! Appreciate it, well done!
Just as NoSQL servers might have an SQL layer, SQL servers like Postgres already have the ability to partition, or to keep dynamic data in JSON blobs (including a subset of SQL to query it). Not sure how powerful that is in comparison, though. Also, your queries only double if you keep them within the PK range; otherwise they don't, since each partition has to perform your query, I would suppose.
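The comments above about querying by key versus filtering by category come down to one difference. A minimal sketch, assuming a hash-partitioned store with four in-memory partitions and hypothetical helpers: a key lookup reads one partition, while a category filter has to run on every partition, with the results merged afterwards (scatter-gather). Many real systems add secondary indexes to reduce that cost; those are not modeled here.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical

partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(sku: str) -> int:
    # crc32 is a stable, non-cryptographic hash; good enough for a sketch.
    return zlib.crc32(sku.encode("utf-8")) % NUM_PARTITIONS


def put(item: dict) -> None:
    partitions[partition_for(item["sku"])].append(item)


def get_by_key(sku: str):
    # Key lookup: exactly one partition is read.
    part = partitions[partition_for(sku)]
    return next((item for item in part if item["sku"] == sku), None)


def query_by_category(category: str) -> list:
    # Non-key filter: every partition runs the filter (scatter), and the
    # partial results are merged and sorted by the caller (gather).
    matches = []
    for part in partitions:
        matches.extend(item for item in part if item["category"] == category)
    return sorted(matches, key=lambda item: item["price"])


for sku, category, price in [("sku-5", "laptops", 1999), ("sku-7", "laptops", 899),
                             ("sku-25", "hygiene", 3)]:
    put({"sku": sku, "category": category, "price": price})
```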
I've been using NoSQL databases for a while but could never understand what the partition key was or how to use it, but your video has finally explained it, thanks!
Excellent information and presentation. Thank you!
@3:30 a hash is used rather than just the primary key's range directly because the hash will always partition evenly (i.e. the magnitude of the computed hash is randomly spread throughout the range)
If it is a good hash.
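A small experiment illustrating the point about hashing versus using the key range directly (with the reply's caveat that the hash must be a good one), assuming a hypothetical ID space of one million sequential integers and four partitions. The newest 1,000 IDs all fall into the last range partition, while hashing spreads them over every partition.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4
KEY_SPACE = 1_000_000  # hypothetical: IDs run from 0 to 999,999


def range_partition(key: int) -> int:
    # Equal, contiguous slices of the ID space.
    return key * NUM_PARTITIONS // KEY_SPACE


def hash_partition(key: int) -> int:
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


# The 1,000 most recent sequential IDs, e.g. the newest orders.
recent_ids = range(KEY_SPACE - 1_000, KEY_SPACE)
by_range = Counter(range_partition(k) for k in recent_ids)
by_hash = Counter(hash_partition(k) for k in recent_ids)
print(by_range)  # Counter({3: 1000}): one hot partition
print(by_hash)   # spread over all four partitions
```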
True - This worked very well in the mid-1970's when computer resources were extremely limited / expensive. I designed a simple hash within a custom access method for storing & retrieving all medical & dental claims history, as part of a software package for our client, for Equitable Ins. The file size was 16X IBM's max at that time, even after data compression & spanning multiple hard drives! So part of the hash was which of the 16 physical files the data was in. In testing we achieved an average of 1.07 physical (system level) read requests per logical application request -- before a cache was added! Code was also tight -- about 50 Assembly Language instructions to calc physical record # (or block), and about 70 AL / machine instructions within the IBM access method (BDAM) to convert our record # to CHR (Cylinder / Head / Record). This was still in use in the 1990's.
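A rough sketch of the two steps described (one hash that picks both the physical file and the block within it, then a block-to-cylinder/head/record conversion), assuming CRC-32 as the hash and a made-up disk geometry. The original Assembler/BDAM code is not shown in the comment, so `locate`, `to_chr` and every constant except the 16 files are hypothetical.

```python
import zlib

NUM_FILES = 16            # from the comment: data spread over 16 physical files
BLOCKS_PER_FILE = 50_000  # hypothetical
HEADS_PER_CYLINDER = 19   # hypothetical disk geometry
RECORDS_PER_TRACK = 10    # hypothetical disk geometry


def locate(claim_id: str) -> tuple[int, int]:
    """One hash gives both the physical file and the relative block in it."""
    h = zlib.crc32(claim_id.encode("utf-8"))
    file_no = h % NUM_FILES                          # low part picks the file
    block_no = (h // NUM_FILES) % BLOCKS_PER_FILE    # the rest picks the block
    return file_no, block_no


def to_chr(block_no: int) -> tuple[int, int, int]:
    """Convert a relative block number to (cylinder, head, record)."""
    per_cylinder = HEADS_PER_CYLINDER * RECORDS_PER_TRACK
    cylinder = block_no // per_cylinder
    head = (block_no // RECORDS_PER_TRACK) % HEADS_PER_CYLINDER
    record = block_no % RECORDS_PER_TRACK + 1        # records counted from 1
    return cylinder, head, record
```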
I have been doing this for over 30 years with relational DBs. We have never used fixed foreign keys (DB-managed ones), but have always done this ourselves in the code. This means our relational DBs have been standalone tables (with indexing) since the start, and all our queries just link the keys to get the data. Scaling is still an issue, but we can split, chop, or move data into archives/servers at will and the DB doesn't break.
Foreign keys are one of those things that SQL offers to maintain data integrity, but they have a performance cost. They are best used in development and test so that when the code violates data integrity, there is a noisy failure; but once the code is working properly, it should be left out of the production environment.
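A minimal sketch of the "noisy failure" using SQLite through Python's `sqlite3` module (an engine chosen here for a self-contained example, not one the commenter names). SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection, which makes it easy to compare both settings. The table names and helpers are hypothetical.

```python
import sqlite3


def make_db(enforce_fks: bool) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite leaves foreign-key enforcement off unless this pragma is set per connection.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce_fks else 'OFF'}")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             customer_id INTEGER REFERENCES customers(id));
    """)
    return conn


def insert_orphan_order(conn: sqlite3.Connection) -> bool:
    """Try to insert an order for a customer that does not exist."""
    try:
        conn.execute("INSERT INTO orders (id, customer_id) VALUES (1, 999)")
        return True
    except sqlite3.IntegrityError:
        return False


print(insert_orphan_order(make_db(enforce_fks=True)))   # False: noisy failure
print(insert_orphan_order(make_db(enforce_fks=False)))  # True: the bad row slips in
```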
Thanks bro, you've made it easy to understand.
Thanks for explaining. Simple and to the point. Subscribed!
Simple explanation, perfect.
excellent, clear, highlighting and unveiling key examples, in a powerful and simple fashion
Nicely explained! Thank You.
That's what I'm looking for. I want to store inconsistent entities in my database; otherwise I'd have to make many relationships between tables and many queries to construct the object of an entity, not to mention the complexity. Thanks a lot, now I know how to make things easier.
Brilliant explanation, thanks! Earned a sub
Ouf... There is so much debatable stuff in there, I don't know where to start. Let me just say that NoSQL databases have a lot more disadvantages than what is explained here. The main one being that they are very slow in input (write) operations, making them unusable for most real-life use: no supermarket chain or telecom company would use a NoSQL database to capture real-time transactions. Their market acceptance, which is still very low, testifies that NoSQL databases are a niche product.
Yes, I agree with that. This video is actually very misleading, and people upvoting it should really spend more time understanding the difference between the two models. This video basically shows "problems" with SQL databases versus "advantages" of NoSQL databases... Like Robert said, there are WAY more disadvantages to NoSQL databases, and you will face a lot of different issues with different problems that SQL databases can solve.
@livelaurent Same feeling. I watched it to the end and was like "well? what are the drawbacks of NoSQL? Everyone would use them if there were none"
Apache Kafka
@ArneChristianRosenfeldt It's not a database; Kafka is essentially a pub/sub framework.
THIS. The hesitance to adopt NoSQL is not without reason.
Super helpful explanation. Thanks dude
Really like your videos. I have a request: make a video on OOP, or polymorphism specifically.
So simply explained that I understood it! Many thanks ;)
And just like that you gained a new subscriber. Thanks for the excellent video.
About NoSQL consistency: the problem is not "eventual consistency" of mirrors or replicas; relational databases also have mirrors, replicas, standby copies and whatnot. The problem is that _logical_ consistency between different pieces of data is now entirely the application's responsibility, because there are no referential constraints, check constraints or other consistency checks performed by the database server itself. Several more questions worth exploring:
- How do NoSQL databases handle atomicity of complex transactions, various isolation levels, etc.? Is the answer "they don't, it is too resource-intensive", or something else?
- How do NoSQL databases (I would prefer to call them data stores) compare to ancient index-sequential files?
- How do schema-less databases handle update-intensive loads?
- How do they compare to schema-constrained but very fast "index plus physical address" database systems such as ancient CODASYL?
I have an impression we are reinventing the wheel, yet again.
2 & a half minutes, and I know what NoSQL is. Thanks, video creator.
Perfect video! Helped me so much!
Wonderful Explanation. I loved it.
This video is so good that I have watched it a few times and also referenced it in my paper in college.
Thank You for providing Clarity.
Thanks .. I definitely learn new things from your videos
Precise, Simple Explanation, Adorable. Keep Posting
Very intuitive explanation - thank you!
Thank you, that was a great primer.
Such a great explanation, thank you so much! And I have a request, sir: kindly make a video on what a data warehouse is and what data mining is, in detail.
Thanks for this simple explanation.
Thanks for the wonderful explanation.
Concise and interesting, thanks!
So I am at 1:37 and "not only SQL" has not been mentioned yet. This picture of the buildings reminds me of the "New Testament". How am I supposed to follow this metaphor? I mean, network hardware was insanely overpriced for some time, but today? A mainframe CPU runs at 5 GHz, a commodity CPU at 3 GHz. There is no "vertical" anymore. So your database uses a RAID -- uh, that is horizontal right there.
Was looking for an explanation regarding NoSQL and this was the best one I've found. Straight to the point and complete. Congrats! Subscribed and looking for more videos in your channel :)
Nice video! Can you do a video about Kademlia distributed hash tables?
Thank you very much. I was struggling to understand NoSQL, and this video not only helped me understand it but also answered several questions I had.
Simple and straightforward explanation of NoSQL. Looking forward to more videos...
Interesting, short and precise.
Very informative. Thank you so much.
What a wonderfully simple take on the subject. Thanks for sharing and making!
Not only simple, sometimes dead wrong and inaccurate. See comments.
Worth every minute watching this video :)
Wonderful & easy-to-understand video which really shows that you've mastered this area of technology.
The comments on this video make me very happy. Lots of people calling out the inaccuracies and bullshit in this video. Shows that there are still many people out there who actually understand databases and tradeoffs.
The video is actually very good in explaining NoSQL databases. I like SQL databases and after watching this video I still prefer SQL databases and don't see many advantages using NoSQL but many drawbacks.
Hate to disappoint you, but most of the bullshit in this comment thread is coming from the RDBMS proponents. It's pretty clear that none of these folks have actually worked with a NoSQL database in recent times or have any idea why they exist.
@sguillory6 So enlighten us: why do they exist? I mean, other than letting programmers be lazy so they don't have to bother learning things like proper database design or thinking about things like performance.
Awesome explanation! Thanks a lot for sharing. Thumbs up and Subscribed.
Very nicely explained - I learned a lot as IT admin.
Thank You for the awesome video, Very clear explanation regarding NoSQL.
It's a very nice video to better understand the NoSQL database concept.
Awesome explanation
I'm happy to have learned that both concepts of DBs have their legitimacy.
Precise & short: the perfect explanation
MongoDB can filter by field and also by nested fields, and it has aggregation pipelines. It has fast writes due to an LSM tree.
Best explanation ever. Just WoW
Precise. Illustrations at every point, and even someone without DB experience can understand easily. As the name suggests... Simply Explained.
very nice explanation, thank you for the info :)
One thing that was "glossed over" was the initial size of the "key partition", which of course is related to the algorithm used to create the key based on the data field. If the key was too small, a new record could create a "collision" and the key would have to be split, effectively making it a sequential search. If the key was too large, there would be a lot of wasted space in the key partition.
Nicely explained.
Great explanation, got the gist in a few minutes!
This video is brutal, thanks brother!
Well explained. Thanks!!
excellent understanding
best explanation I've seen yet
I have a question: when the data stored in the data storage becomes large and humongous across multiple servers, will there be any issues when users try to collect data, maybe from the data in the early servers? And when users update data, will it cause any problems?
That was really helpful. Thanks.
Very clear, thank you!
Have been going through YouTube, watching everything to try to understand the topic. You are the first one to succeed. Brilliant explanation. Thank you!
Super explanation!
Thanks for the explanation. But how is a NoSQL DB any different from a filesystem?
Nice explanation. Can you share some practical videos where you demonstrate the use of these NoSQL tools?
You got me on the NoSQL
Very insightful. Thanks.
Nice video, great explanation. In my opinion, they are two different types of databases for different purposes. Cheers
Great video simply explained
A lot of hate for NoSQL in the comments 😂 I was looking for an efficient way to implement a shipping system; the problem was in storing some data, and I found that using NoSQL is the easiest, fastest and most efficient method to do that. SQL and NoSQL are both "powerful tools" to achieve goals; stop comparing them and enjoy them both.
So how does the DB know whether it needs to return the MacBook or the hand sanitizer if they both have the same hash?
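The usual answer is that a hash only selects a bucket, and the stored key is then compared to find the right entry. A minimal chained-hash-table sketch (a hypothetical class, not any particular database's implementation); `num_buckets=1` forces every key into the same bucket to show that collisions still resolve correctly.

```python
import zlib


class ChainedHashTable:
    """Each bucket holds (key, value) pairs; colliding keys share a bucket."""

    def __init__(self, num_buckets: int = 8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key: str) -> list:
        return self.buckets[zlib.crc32(key.encode("utf-8")) % len(self.buckets)]

    def put(self, key: str, value) -> None:
        bucket = self._bucket(key)
        for i, (stored_key, _) in enumerate(bucket):
            if stored_key == key:
                bucket[i] = (key, value)  # overwrite the existing entry
                return
        bucket.append((key, value))

    def get(self, key: str):
        # The hash only picks the bucket; comparing the full key picks the entry.
        for stored_key, value in self._bucket(key):
            if stored_key == key:
                return value
        raise KeyError(key)


table = ChainedHashTable(num_buckets=1)  # every key lands in the same bucket
table.put("macbook", 1999)
table.put("hand-sanitizer", 3)
print(table.get("hand-sanitizer"))  # 3
```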
Great explanation, thank you so much! After watching so many videos and getting thoroughly confused, you helped clear it up. Thanks again.
Could you please make a video on Wide column vs column family vs columnar vs column oriented DBs with some examples?
Sir, when I record a video with a video recorder app from the Google Play Store, does that video also get saved with the app's developer? Please reply.....
That's literally the best ever video on explaining "NoSQL", so simply. Thanks! 💯
Great explanations, hope you can provide some examples of when to use SQL and NoSQL.
I think NoSQL is really only for the GAFAMs, and perhaps scientists with huge databases trying to train AIs, things like that, but I'm pretty sure other people don't deal with enough data to justify it.
Looks like another example of a question with a simple and easily understandable wrong answer.
Thanks. Short and precise. Nice work!
I've been working with this NoSQL approach since 2007 and am completely happy with it. However, I actually used it on a MySQL database engine, because back then NoSQL DBs didn't exist yet. Kinda crazy to think of it, eh...
The moment I realized this is animated in MS PowerPoint completely threw me off haha. Great explanation!
that was a great lecture. Quality communication!
You should do the same thing regarding SAP HANA databases.
You can easily design software to scale SQL over multiple servers. Have customers in one DB, products on another, and orders on another. The advantage is easy searching.
Easy to understand. 😍
You would need to hash to a pointer, not the actual data itself, otherwise it would be too inefficient. Foreign keys would need to be hashed that way as well, but then you lose the neat splitting across partitions, or else you would have to duplicate data... On top of all that, you would need some conflict resolution in the hashing function for keys that produce the same hash result. So it is not quite so simple. All in all, it is all swings and roundabouts. You need to look at various combinations of warehousing options to suit your needs. As would be the case for all applications, you need to look at the best alternatives (or combination of alternatives) to achieve the scale and performance to produce the desired results.
a hash key collision sounds like a super rare occasion.
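How rare collisions are depends on how many keys share how many hash values; this is the birthday problem. For n keys and m equally likely hash values, the probability of at least one collision is 1 - ((m - 0)/m) * ((m - 1)/m) * ... * ((m - n + 1)/m). A short sketch, assuming a uniform hash; `collision_probability` is a hypothetical helper.

```python
def collision_probability(n_keys: int, n_slots: int) -> float:
    """Probability that at least two of n_keys share a slot, assuming a uniform hash."""
    if n_keys > n_slots:
        return 1.0  # pigeonhole: a collision is certain
    p_all_distinct = 1.0
    for i in range(n_keys):
        p_all_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_all_distinct


print(round(collision_probability(23, 365), 3))  # 0.507, the classic birthday result
```

With 100,000 keys in a 32-bit hash space the same function already returns a value above one half, so any real store has to expect and handle collisions rather than treat them as impossible.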
Pick works quite well with hashing to a group which then stores the whole of the data item, with multiple items in a group just stored one after the other in the group, linking extra frames from the overflow spare frames of disk space. A file with 23 groups is guaranteed to have hash collision with 24 items stored in it...
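A toy model of the scheme described: items hash to one of a fixed number of groups, items in a group are stored one after another, and a full group links an overflow frame. It assumes CRC-32 as the hash and a made-up frame capacity of two items; `GroupHashedFile` is hypothetical and does not reproduce any Pick system's on-disk layout. With 23 groups and 24 items, at least one group must hold two items (the pigeonhole principle), and reads still find each item by comparing stored IDs.

```python
import zlib


class GroupHashedFile:
    """Toy model: items hash to one of `modulo` groups; full groups grow overflow frames."""

    def __init__(self, modulo: int = 23, items_per_frame: int = 2):
        self.modulo = modulo
        self.items_per_frame = items_per_frame           # hypothetical frame capacity
        self.groups = [[[]] for _ in range(modulo)]      # group -> list of frames

    def _frames(self, item_id: str) -> list:
        return self.groups[zlib.crc32(item_id.encode("utf-8")) % self.modulo]

    def write(self, item_id: str, data) -> None:
        frames = self._frames(item_id)
        for frame in frames:                             # update in place if present
            for i, (stored_id, _) in enumerate(frame):
                if stored_id == item_id:
                    frame[i] = (item_id, data)
                    return
        if len(frames[-1]) >= self.items_per_frame:
            frames.append([])                            # link an overflow frame
        frames[-1].append((item_id, data))

    def read(self, item_id: str):
        # Walk the group's frames in order, comparing the stored item IDs.
        for frame in self._frames(item_id):
            for stored_id, data in frame:
                if stored_id == item_id:
                    return data
        return None

    def items_in_group(self, group_no: int) -> int:
        return sum(len(frame) for frame in self.groups[group_no])


f = GroupHashedFile()
for i in range(24):
    f.write(f"item{i}", i)
print(max(f.items_in_group(g) for g in range(f.modulo)) >= 2)  # True
```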
Me: *watch this video then go to google* "What database does Facebook Use?". Google: "MySQL". Me: "Yep I'm good with relational databases then."