Why did the data catalog break up with its partner? Because it couldn't handle so many relationships... On a serious note, though: how many entities can your catalog responsively support? How many do you need to support, and how does your catalog scale? Alex Solutions supports tens of millions; frankly, if the infrastructure is big enough to support the knowledge graph in all its glory, the limits are probably relatively boundless. But it remains an important question. Consider that a table is just one entity. If it has ten attributes, that jumps to 11 entities, and each of those 10 attributes has a relationship with the table, so your database now contains at least 21 records. Start adding views, stored procedures, ETL and reporting applications, and it very quickly explodes to hundreds, then thousands, then millions of entries. Start adding data people, controls, technology descriptors, business processes, quality measures, KPIs and metrics, and it extends further. Can your catalog adequately serve up the answers you need in this context?
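For a feel of how quickly those numbers compound, here is a back-of-the-envelope sketch in Python; the table and attribute counts are purely hypothetical illustrations, not figures from any particular catalog.

```python
# Back-of-the-envelope sketch of catalog growth: each table is one entity,
# each attribute is another entity plus one relationship back to its table.
# The input numbers are hypothetical, purely to illustrate the scaling.

def catalog_records(tables: int, attrs_per_table: int) -> dict:
    entities = tables * (1 + attrs_per_table)      # the table plus its attributes
    relationships = tables * attrs_per_table       # attribute -> table links
    return {"entities": entities,
            "relationships": relationships,
            "total_records": entities + relationships}

# One table with ten attributes: 11 entities + 10 relationships = 21 records.
print(catalog_records(tables=1, attrs_per_table=10))

# A modest estate: 5,000 tables averaging 20 attributes each, before views,
# ETL jobs, reports, owners, KPIs or quality rules are even added.
print(catalog_records(tables=5_000, attrs_per_table=20))
```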
Clinton Jones’ Post
More Relevant Posts
-
In a sea of data, integration is your lifeline. Here's why. Knowing how to combine different data sources effectively is a superpower in a world where data is king.
1) ETL (Extract, Transform, Load):
→ This is the classic method where data is first extracted from various sources.
→ It is then transformed (like cleaning or formatting), and finally loaded into a database or a data warehouse.
2) Data Warehousing:
→ This involves gathering data from multiple sources into a single, comprehensive database for better analysis and reporting.
3) Data Virtualization:
→ Imagine being able to view and analyze data from different sources without having to move or copy it.
→ That's what data virtualization does: it creates a virtual layer that allows you to access and work with data from various systems in real time.
4) Middleware Tools:
→ These are software applications that help different programs communicate with each other.
→ They ensure that the data from one application can be read and used by another.
Understanding these techniques is crucial in our data-driven world. It helps us make better decisions based on comprehensive information. What techniques are you using to integrate your data?
________
P.S. Need help with your Digital Transformation journey? Check out my featured section to set up a 1:1 Call.
P.P.S. I post daily at 7.15 am ET. Like this? Please Repost ♻️ so the community can benefit.
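To make the first technique concrete, here is a minimal ETL sketch in Python, assuming a hypothetical orders.csv source file and a local SQLite database as the load target; it illustrates the extract, transform and load pattern, not a production pipeline.

```python
# Minimal ETL sketch: extract from a CSV, transform (clean/format), load into SQLite.
# File and column names (orders.csv, region, amount) are hypothetical placeholders.
import csv
import sqlite3

def extract(path):
    # Extract: read raw rows from the source file.
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: clean text, convert types, round amounts.
    cleaned = []
    for row in rows:
        cleaned.append({
            "region": row["region"].strip().upper(),
            "amount": round(float(row["amount"]), 2),
        })
    return cleaned

def load(rows, db_path="warehouse.db"):
    # Load: write the cleaned rows into a warehouse-style table.
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    con.executemany("INSERT INTO sales (region, amount) VALUES (:region, :amount)", rows)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("orders.csv")))
```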
-
Why is transformation important in ETL? (Part 2)
Business Logic Implementation: Transformation allows the implementation of business rules, calculations, and logic to derive meaningful insights from raw data. This may include aggregations, calculations, filtering, and enrichment to turn raw data into actionable intelligence.
Normalization and Denormalization: Transformation processes can normalize or denormalize data structures based on the requirements of the target system or analytical workload. Normalization reduces redundancy and improves data integrity, while denormalization optimizes query performance and simplifies data retrieval.
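A small sketch of what that transformation step can look like in practice, using pandas; the column names, the discount rule and the regional roll-up are hypothetical examples of business logic and denormalization, not a prescribed approach.

```python
# Sketch of the transformation step: business-rule enrichment plus aggregation.
# All column names and the 10% bulk-discount rule are hypothetical examples.
import pandas as pd

raw = pd.DataFrame({
    "order_id":   [1, 2, 3, 4],
    "region":     ["east", "east", "west", "west"],
    "quantity":   [5, 3, 10, 2],
    "unit_price": [20.0, 20.0, 15.0, 15.0],
})

# Business logic: derive revenue, apply a bulk-order discount, filter out small orders.
raw["revenue"] = raw["quantity"] * raw["unit_price"]
raw.loc[raw["quantity"] >= 10, "revenue"] *= 0.9   # hypothetical discount rule
enriched = raw[raw["revenue"] > 50]

# Denormalization for the analytical workload: one aggregated row per region.
summary = enriched.groupby("region", as_index=False)["revenue"].sum()
print(summary)
```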
-
What is a data warehouse, really? Is it just the storage point for the different data sources across an enterprise, or the sum of all the moving parts that make it possible? For the end user, who only sees data at rest, the lineage of the data that was Extracted, Transformed and Loaded (ETL) to form it is inconsequential; until there is a breakdown in the continuous integration/continuous delivery (CI/CD) processes. That raises the question of points of failure. Several of the resources we depend on today rely on a few points of failure, some of which are highlighted in red in the diagram below, such that when they collapse for whatever reason, the scramble to fix them begins. A few ways to handle these failures:
1. The simple "after a few seconds, try again, pretty please" automation, preferably with the same parameters
2. Don't fail at all, by keeping resource calls below throttling/unhealthy queuing thresholds
3. Windowing batch calls to the smallest practical size with latency in mind
4. Scaling resources to exceed requests
5. A few others
While some points of failure, such as the libraries used in open source software, are completely out of our control, we can ensure 99.99% uptime for up-to-date data for business and technical users with just a few of these little techniques. How are you minimizing CI/CD failures?
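As an illustration of the first technique (try again with the same parameters after a short pause), here is a minimal Python sketch; fetch_batch() is a hypothetical stand-in for a flaky upstream call, and the delay values are arbitrary.

```python
# Minimal retry sketch: re-run the same call with the same parameters after a
# short, growing pause, instead of failing the whole pipeline run immediately.
import time

def with_retries(fn, attempts=4, base_delay=2.0):
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception as exc:            # in practice, catch specific errors
            if attempt == attempts:
                raise                       # out of attempts: surface the failure
            wait = base_delay * attempt     # back off a little more each time
            print(f"attempt {attempt} failed ({exc}); retrying in {wait:.0f}s")
            time.sleep(wait)

def fetch_batch():
    # Hypothetical stand-in for a throttled or flaky upstream resource call.
    raise ConnectionError("throttled")

# with_retries(fetch_batch)  # would retry 4 times, then raise the final error
```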
-
Whether an ETL pipeline or a broader data pipeline is best for your organization depends on several factors. The characteristics of the data are critical to this decision. Data pipelines are ideal for real-time, continuous data streams that require immediate processing and insight. ETL pipelines, on the other hand, are suitable for structured data that can be processed in batches where latency is acceptable. Business requirements also play an important role. Data pipelines are ideal for use cases that require real-time data analysis, such as monitoring, fraud detection, or dynamic reporting. In contrast, ETL pipelines are best suited to scenarios that require extensive data consolidation and historical analysis, like data warehousing and business intelligence. Scalability requirements must also be considered. Data pipelines offer high scalability for real-time data processing and can efficiently handle fluctuating data volumes. ETL pipelines are scalable for large batch processing tasks but may ultimately require more infrastructure and resources. https://lnkd.in/e_TSf988
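A rough sketch of the two shapes being contrasted, in plain Python; the in-memory queue, the field names and the fraud threshold are hypothetical stand-ins, not any vendor's API.

```python
# Contrast of the two pipeline shapes: batch ETL vs. continuous streaming.
# All names, fields and thresholds are hypothetical placeholders.
import queue

def batch_etl(records):
    """ETL pipeline: transform a whole batch, then load once; latency in hours is fine."""
    transformed = [{"user": r["user"], "amount": round(r["amount"], 2)} for r in records]
    return transformed                      # imagine this loaded into a warehouse table

def streaming_pipeline(events: "queue.Queue", alert):
    """Data pipeline: handle each event as it arrives; latency in seconds matters."""
    while True:
        event = events.get()                # blocks until the next event
        if event is None:                   # sentinel to stop the demo loop
            break
        if event["amount"] > 10_000:        # toy fraud-detection rule
            alert(event)

# Usage of the streaming shape, with an in-memory queue standing in for a real stream:
q = queue.Queue()
q.put({"user": "a", "amount": 25_000})
q.put(None)
streaming_pipeline(q, alert=lambda e: print("flagged:", e))
```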
-
One of the biggest concerns in core system migration is crystal clear: "Will all our data be accurate in the new system?" Our client had every reason to worry about data quality:
1. New processes: they were built directly on the contemporary application.
2. Complex queries: with a vast volume of data points, they needed accurate and consistently structured queries.
3. Data transformation: many data points in the new database came from combining and transforming multiple sources.
But here's the good news! 💪 With a relentless focus on data quality and a commitment to continuous improvement, we successfully tackled these challenges head-on. Curious about how we did it? Check out the use case for insights and strategies that can help you in your own data migration journey! 👉 Let's connect if you have questions! #DataMigration #SystemMigration #DataQuality
-
Troubleshooting data pipelines can be tough due to their complexity. I've been there! I once had a service incident escalate to senior leaders due to my incomplete understanding of the data pipeline. The service was overwhelming users with inaccurate tasks, and the business suffered from a prolonged root cause investigation. While I eventually resolved the issue, it took unnecessary stress and time. I learned valuable context along the way, but knowing the answers to the following questions about my data pipeline sooner would have simplified the resolution process:
1. What is the root source of your data? (e.g. data vendor, user input, upstream service)
2. How is the data moved between the root source and the client service? (e.g. service bus, FTP, APIs)
3. What ETL processing happens in the data pipeline between the root source and your end service? (data enrichment, calculations, format conversions/manipulation)
4. What individuals/teams own each step in the data pipeline?
5. What validation, monitoring and alerting capabilities do they have in place? How will you find out about any issues at their component level?
Understand the end-to-end journey of your data, and your road becomes much smoother.
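For question 5, here is a tiny sketch of the kind of component-level validation and alerting that can surface issues early; the freshness window, row-count tolerance and notify() target are hypothetical placeholders.

```python
# Tiny sketch of component-level validation and alerting for a pipeline stage.
# Thresholds and the notify() target are hypothetical; print() stands in for a real alert.
from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at: datetime, max_lag: timedelta, notify) -> bool:
    """Alert if this pipeline stage hasn't delivered data recently enough."""
    lag = datetime.now(timezone.utc) - last_loaded_at
    if lag > max_lag:
        notify(f"stale data: last load {lag} ago exceeds allowed {max_lag}")
        return False
    return True

def check_row_count(source_rows: int, target_rows: int, notify, tolerance=0.01) -> bool:
    """Alert if rows were lost between the root source and the client service."""
    if source_rows == 0 or abs(source_rows - target_rows) / source_rows > tolerance:
        notify(f"row count drift: source={source_rows}, target={target_rows}")
        return False
    return True

# Example wiring:
check_freshness(datetime.now(timezone.utc) - timedelta(hours=6),
                max_lag=timedelta(hours=1), notify=print)
check_row_count(source_rows=10_000, target_rows=9_700, notify=print)
```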
-
Data cleanup might not sound exciting, but it's one of our eight "must-haves" for a successful data migration. Read our latest blog post to discover the other seven: https://bit.ly/43Zzyfy #DataCleanup #DataMigration #PLM
Streamlining PLM Data Migrations: 8 Proven Strategies for Seamless Transitions
xlmsolutions.com
-
If you want to make solid decisions efficiently with your complex real-world data, a data exchange solution can give your company the functionality and reliability it needs. 📈 But how do you choose the ideal data file exchange solution for your use case and needs? Here are the key factors you should keep in mind: https://buff.ly/3TKJ3fd #datamanagement #dataimport #dev #developers #developertools
Data file exchange checklist: How to choose the ideal solution
flatfile.com
-
As businesses grow, so does the complexity of managing data across platforms. ETL tools play a pivotal role in ensuring that data flows smoothly from diverse sources to meaningful insights. But with the variety of available tools, how do you choose one that fits your needs today and can scale with you into the future? From automation to real-time data processing, selecting the right ETL tool is about aligning technology with your goals for speed, efficiency, and flexibility. In 2024, data integration will continue to be a game-changer for competitive advantage. Explore which ETL tool can support your data journey and drive your business forward.
-
Data Warehouse:
- Structured repository for processed, filtered data
- Optimized for fast queries and analysis
- Schema-on-write (predefined structure)
Pros and Cons:
Data Warehouse Pros:
1. Optimized for fast querying and reporting
2. Ensures data quality and consistency
3. Suitable for business users without advanced technical skills
Data Warehouse Cons:
1. Less flexible for new types of analyses
2. Can be expensive to scale
3. Limited to structured data
Use a Data Warehouse when:
- You need consistent, high-performance querying for reporting
- Working primarily with structured data
- Serving business users who need reliable, easy-to-use data
Avoid a Data Warehouse when:
- Dealing with rapidly changing data structures
- Storing large volumes of unstructured or semi-structured data
- Needing to keep raw data for undefined future use
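A small sketch of what schema-on-write means in practice, using SQLite from Python; the table and column names are made up, and the point is only that structure is enforced at load time rather than at query time.

```python
# Schema-on-write sketch: the warehouse table's structure is declared up front,
# and rows must fit it when loaded. Table and column names are hypothetical.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE daily_sales (
        sale_date TEXT NOT NULL,
        region    TEXT NOT NULL,
        revenue   REAL NOT NULL
    )
""")

# Rows that match the predefined schema load cleanly...
con.execute("INSERT INTO daily_sales VALUES ('2024-05-01', 'EMEA', 1250.0)")

# ...while rows that violate it are rejected at write time, which keeps downstream
# queries fast and consistent, and is also why the model is less flexible for
# unstructured or rapidly changing data.
try:
    con.execute("INSERT INTO daily_sales (sale_date, region) VALUES ('2024-05-01', 'APAC')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

print(con.execute("SELECT region, SUM(revenue) FROM daily_sales GROUP BY region").fetchall())
```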