Tursio’s Post


T-13 DAYS 𝑭𝒓𝒐𝒎 𝑯𝒂𝒅𝒐𝒐𝒑 𝑴𝒂𝒑𝑹𝒆𝒅𝒖𝒄𝒆 𝒕𝒐 𝑪𝒍𝒐𝒖𝒅 𝑫𝒂𝒕𝒂 𝑾𝒂𝒓𝒆𝒉𝒐𝒖𝒔𝒆𝒔

The rise of big data in the 2010s was rooted in the belief that big data is not really a database workload. It involves processing large volumes of semi-structured or even unstructured data that traditional databases are not suited for. The solution people came up with was MapReduce and its open-source implementation, Hadoop, where non-expert users could write imperative programs and let the system scale them in an embarrassingly parallel fashion. Hadoop was built for processing existing data, much as generative AI is built for generating new data.

Hadoop gained a lot of traction, offering an easier-to-use alternative to traditional databases that could also process data directly from cloud repositories. Over time, however, Hadoop became more like a database than people had imagined. Projects like Hive, Impala, and Spark introduced database techniques such as declarative query processing, query optimization, data layouts, indexing, and partitioning. Hadoop evolved from a MapReduce engine to a data lake platform, and then to modern cloud data warehouses that offer the same scalability, flexibility, and ease of use envisioned in the big data movement. Indeed, we have come full circle, with databases absorbing all the goodness of Hadoop and MapReduce.

The best part? Hadoop MapReduce-style processing continues to live on as workloads in modern databases, ones that have gone through a generational change.

Stay tuned for the next post in our countdown series! #GenerativeAI #ExcitingThingsAhead #EnterpriseData
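The shift described above, from hand-written MapReduce jobs to declarative query processing, is easy to see side by side. Here is a minimal PySpark sketch (not from the post; the events.csv file and user_id column are hypothetical) that counts records per user first in imperative MapReduce style over an RDD, and then as a declarative SQL query whose physical execution is left to the query optimizer.

```python
# Minimal sketch, assuming a hypothetical "events.csv" with a user_id column.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("imperative-vs-declarative").getOrCreate()

# Imperative, MapReduce-style: spell out each map and reduce step by hand.
counts_rdd = (
    spark.sparkContext.textFile("events.csv")       # hypothetical input file
    .map(lambda line: (line.split(",")[0], 1))      # map: emit (user_id, 1)
    .reduceByKey(lambda a, b: a + b)                # reduce: sum counts per key
)

# Declarative, warehouse-style: state what you want; the engine decides how
# to partition, shuffle, and execute it.
spark.read.option("header", True).csv("events.csv").createOrReplaceTempView("events")
counts_df = spark.sql("SELECT user_id, COUNT(*) AS n FROM events GROUP BY user_id")
```

Both versions produce the same per-user counts; the difference is who plans the execution, the programmer or the system.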
