Arindam Banerji’s Post

Global Vice President, CTO - Data Sc., ML, LLMs, RAG, DSPy, NLP, Deep Learning (Retail, Supply Chain)

4 Stages of Data Modernization for AI (Concluding): How each stage of next-gen data engineering supports today’s AI.

Summary: The changing needs of modern AI applications force us to rethink how we do data engineering. Data engineering today must be reshaped to enable knowledge creation and reasoning engines, without giving up the operational and semantic needs of traditional insight generation.

As described in part 2, this data engineering shift is structured as a set of stages, each addressing specific needs or gaps in AI enablement:
1. Trusted actionable insights
2. Traditional ML for quarter-over-quarter (qoq) revenue/profitability
3. LLM apps, vision products, etc.
4. Multi-component inference, agents, and systems

Intelligence & Ops artifacts added to each stage of data engineering

Each stage needs specific add-on components to provide the semantic intelligence and operational effectiveness that the modern array of AI apps requires. Taken together, these add-on mechanisms are called Data-Intelligence-Ops (see the graphic in the attached paper).

Formally, DataIntelligenceOps is an abstract set of operations meant to increase (a) the semantic intelligence, (b) the operational intelligence, and (c) the governance abilities of data. It builds on top of existing investments in data lakes, cloud EDW, dbt automation, ELT, feature stores, etc. The main architectural artifacts are:
· Semantic Intelligence Enhancements: a broad set of components for complex data products, which can be aggregated or configured through a low-code IDE.
· Connected DataOps: an architecture that “causally” ties together observability, lineage, storage/gov/sec-Ops, programmable pipelines, and data contracts to create an embedding layer for the intelligence enhancements above. It is implemented as a full-featured knowledge graph that captures platform-wide metadata.
· Governance as Code enablement: Governance DAGs embeddable within pipelines simplify governance and let policy implementations execute seamlessly as part of the pipeline.

The effect of DataIntelligenceOps is to enhance the “intelligence” of a firm’s data, thus facilitating today’s AI apps.

Parting thoughts:
1. AI apps are rapidly increasing in complexity and capability, so the old boundaries of data engineering no longer apply.
2. The way to enable this AI-led shift is to move to a modern style of data engineering that systematically adds semantic and operational value across 4 stages of maturity.
3. In many cases, firms will choose to skip a stage to move faster, and nothing prevents that.
4. Existing building blocks such as ingestion mechanisms, pipeline tools, cloud EDW, etc., remain unaffected: this is not a rip-and-replace design.
5. Data engineering must now support knowledge enablement, reasoning engines, and qoq AI ROI.

Paper - https://lnkd.in/gqG25drN
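The post names two architectural ideas without showing an implementation: Governance DAGs embedded within pipelines, and a Connected DataOps layer that records platform metadata such as lineage in a knowledge graph. Below is a minimal Python sketch, under our own assumptions, of how the two could fit together. Policy checks run as ordinary nodes of the pipeline DAG, and every executed step writes lineage triples into a toy metadata graph. All names (`PII_COLUMNS`, `mask_pii`, `assert_no_pii`, `MetadataGraph`, `run_pipeline`) and the sample data are hypothetical. Only the standard-library `graphlib.TopologicalSorter` (Python 3.9+) is a real API. This is an illustration, not the architecture from the attached paper.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Hypothetical policy for this sketch: columns treated as PII.
PII_COLUMNS = {"email"}


class PolicyViolation(Exception):
    """Raised when a governance node in the DAG rejects its input."""


class MetadataGraph:
    """Toy stand-in for a platform-wide metadata graph; stores lineage triples only."""

    def __init__(self) -> None:
        self.triples: set[tuple[str, str, str]] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def upstream(self, node: str) -> set[str]:
        """Return every step that `node` transitively derives from."""
        found: set[str] = set()
        frontier = [node]
        while frontier:
            current = frontier.pop()
            for s, p, o in self.triples:
                if s == current and p == "derived_from" and o not in found:
                    found.add(o)
                    frontier.append(o)
        return found


def mask_pii(rows: list[dict]) -> list[dict]:
    """Governance node: drop PII columns before data moves downstream."""
    return [{k: v for k, v in row.items() if k not in PII_COLUMNS} for row in rows]


def assert_no_pii(rows: list[dict]) -> list[dict]:
    """Governance node: fail the run if any PII column is still present."""
    leaked = {k for row in rows for k in row} & PII_COLUMNS
    if leaked:
        raise PolicyViolation(f"PII columns present: {sorted(leaked)}")
    return rows


def revenue_by_region(rows: list[dict]) -> dict[str, float]:
    """Ordinary business transform: total order amount per region."""
    totals: dict[str, float] = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


def run_pipeline(
    steps: dict[str, Callable], deps: dict[str, tuple[str, ...]], graph: MetadataGraph
) -> dict[str, object]:
    """Run steps in dependency order and record one lineage edge per dependency."""
    outputs: dict[str, object] = {}
    for name in TopologicalSorter(deps).static_order():
        parents = deps.get(name, ())
        outputs[name] = steps[name](*(outputs[p] for p in parents))
        for parent in parents:
            graph.add(name, "derived_from", parent)
    return outputs


RAW_ORDERS = [
    {"order_id": 1, "region": "EU", "amount": 120.0, "email": "a@example.com"},
    {"order_id": 2, "region": "US", "amount": 80.0, "email": "b@example.com"},
    {"order_id": 3, "region": "EU", "amount": 30.0, "email": "c@example.com"},
]

STEPS = {
    "raw_orders": lambda: RAW_ORDERS,
    "mask_pii": mask_pii,
    "assert_no_pii": assert_no_pii,
    "revenue_by_region": revenue_by_region,
}

# Governance nodes sit inside the same DAG as the business transform.
DEPS = {
    "raw_orders": (),
    "mask_pii": ("raw_orders",),
    "assert_no_pii": ("mask_pii",),
    "revenue_by_region": ("assert_no_pii",),
}

if __name__ == "__main__":
    lineage = MetadataGraph()
    result = run_pipeline(STEPS, DEPS, lineage)
    print(result["revenue_by_region"])  # {'EU': 150.0, 'US': 80.0}
    print(sorted(lineage.upstream("revenue_by_region")))
```

If `assert_no_pii` is rewired to read directly from `raw_orders`, the run fails with `PolicyViolation`. Putting the policy inside the DAG makes a bypass fail loudly at execution time instead of being caught in a separate review step. A real metadata knowledge graph would hold far more than `derived_from` edges. The single predicate here only keeps the sketch small.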
