Damn boy that’s some big data

The phrase “big data” is being thrown around a lot these days. So what is big data? There are two main pieces: the actual raw data itself (which as the name suggests is big…more on that in a bit), and the techniques used to analyze that data. Apparently “Big Data” is also the name of some music remixing group if you’re into that sort of thing.

Even though the name big data is pretty stupid, it captures the idea perfectly. Big data is exactly what it sounds like: data that is absolutely massive – so massive that a human can’t comprehend it, much less make any use of it. If you even try, it will blow your mind. You have been warned.

Think of an Excel spreadsheet (or maybe Google Sheets for younger readers). Usually if you're making some kind of basic chart or list, you only use a few rows and columns. But if you scroll really far to the right, the columns go on seemingly forever: you get columns W, X, Y, Z, AA, AB and so on for a long way. Similarly, if you scroll down, the row numbers keep growing and growing (though they do eventually hit a cap). If every single one of those boxes were filled, you'd have an example of big data. Once data gets this big, storing it or using it is really difficult, so special techniques have been created to help us do those things. These are simply known as "big data techniques," but oftentimes (like 98% of the time as far as I can tell) these techniques are confusingly also called "big data." So as I mentioned earlier, the term big data refers to both big datasets and the tools/techniques used to analyze those datasets.
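
Here's a quick Python sketch of the scrolling-spreadsheet idea: it turns a column number into its letter label (A through Z, then AA, AB and so on) and multiplies out how many boxes a completely full sheet would have. The caps it uses (1,048,576 rows and 16,384 columns, ending at column XFD) are the grid limits of Excel 2007 and later; the function names are just made up for this example.

```python
# Grid limits of Excel 2007 and later.
MAX_ROWS = 1_048_576
MAX_COLS = 16_384


def column_label(n: int) -> str:
    """Convert a 1-based column number to its letter label (1 -> A, 27 -> AA)."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    label = ""
    while n > 0:
        # Spreadsheet labels are "base 26 with no zero", so shift by one first.
        n, remainder = divmod(n - 1, 26)
        label = chr(ord("A") + remainder) + label
    return label


def total_cells(rows: int = MAX_ROWS, cols: int = MAX_COLS) -> int:
    """Number of boxes in a sheet where every row and column is filled."""
    return rows * cols


if __name__ == "__main__":
    print([column_label(n) for n in (23, 24, 25, 26, 27, 28)])  # ['W', 'X', 'Y', 'Z', 'AA', 'AB']
    print(column_label(MAX_COLS))  # XFD
    print(f"{total_cells():,} cells")  # 17,179,869,184 cells
```

So a completely full sheet would hold over 17 billion boxes, which is way past anything a person could read through by hand.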

It's important to note that these large datasets have no special properties other than their size. The size can come in two flavors: data points (columns) and data entries (rows). Tax season just ended, so let's use that as an example. All the different pieces involved in your return (income, amount already paid, amount owed, charitable contributions, number of dependents and the other billion things) are data points, or different columns. Your entire return is a single data entry; your neighbor's is another, and everyone in the US who actually pays taxes is an individual data entry. So lots of data points, lots of data entries, or both is what turns regular data into big data.
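
The tax example maps neatly onto a table. In this rough Python sketch, a made-up `TaxReturn` record stands in for a return (it has only six hypothetical fields, and the numbers are invented): each field is a data point (column) and each return is a data entry (row).

```python
from dataclasses import dataclass, fields


@dataclass
class TaxReturn:
    """One data entry (a row). Each field is one data point (a column).
    Hypothetical, tiny subset of what a real return contains."""
    filer_id: str
    income: float
    amount_paid: float
    amount_owed: float
    charitable_contributions: float
    dependents: int


def table_shape(entries: list[TaxReturn]) -> tuple[int, int]:
    """Return (rows, columns): how many data entries and how many data points."""
    return len(entries), len(fields(TaxReturn))


# Made-up numbers, purely for illustration.
returns = [
    TaxReturn("you", 52_000.0, 6_100.0, 250.0, 300.0, 1),
    TaxReturn("your_neighbor", 71_500.0, 9_800.0, 0.0, 1_200.0, 2),
]

print(table_shape(returns))  # (2, 6)

# More filers means more rows; more fields per return would mean more columns.
print(table_shape(returns * 1000))  # (2000, 6)
```

Scale the row count up to every filer in the country, or the field list up to everything on a real return, and you're in big data territory.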

In this series I am assuming the data in question looks like a spreadsheet…but that is not always the case. Data that can be broken down and stored spreadsheet-style is often called "structured data," while "unstructured data" is any data that isn't broken into nice rows and columns. Processing unstructured data also falls into the big data realm…but that's all I'll say about that. Also, I only used Excel as an example…most folks working in the big data field aren't using Excel. They are using much more advanced tools (like R, Hadoop, SQL, etc.), which I won't get into in this series; I just wanted to mention them. In fact, once data gets really big you couldn't even open it in Excel (a sheet only has room for a little over a million rows anyway).
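
To make the structured vs. unstructured distinction concrete, here's a small Python sketch: a CSV snippet that splits cleanly into named columns, a blob of free text that doesn't, and a hypothetical `fits_in_excel` helper that checks a row count against Excel's 1,048,576-row limit (counting one header row by default). The sample data is made up.

```python
import csv
import io

EXCEL_MAX_ROWS = 1_048_576  # row limit of Excel 2007 and later

# Structured: every record breaks cleanly into the same named columns.
structured = io.StringIO("name,income,dependents\nAlice,52000,1\nBob,71500,2\n")
rows = list(csv.DictReader(structured))

# Unstructured: free text with no obvious rows or columns to split it into.
unstructured = "Dear IRS, I moved twice this year and lost my paperwork somewhere along the way."


def fits_in_excel(num_rows: int, has_header: bool = True) -> bool:
    """True if num_rows data rows (plus an optional header row) fit on one sheet."""
    return num_rows + (1 if has_header else 0) <= EXCEL_MAX_ROWS


print(rows[1]["name"], rows[1]["income"])  # Bob 71500
print(fits_in_excel(1_000_000))  # True
print(fits_in_excel(2_000_000))  # False
```

Note that `csv.DictReader` hands every value back as a string, so a real pipeline would still need to convert `income` and `dependents` to numbers.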

OK, now that we have our bases covered, let's continue. If you haven't paid your taxes yet, you should probably get on that shit. If you have, read on for more big data info!

Written By

admin