Google Data Fusion
Google Cloud Summit Madrid

By: Enrique Sola Gayoso, Big Data consultant at Bosonit.


We are thrilled to bring you the latest updates from BosoTrends! In an exciting development, our team had the incredible opportunity to visit the renowned Google office and enjoy the Google Summit with Carlos de Antonio, immersing ourselves in a world of innovation and cutting-edge technology. This unique experience allowed us to gain valuable insights, forge new connections, and further expand our knowledge in the ever-evolving realm of Google Cloud. We are delighted to share our highlights and explore the fascinating intersection of our Bosonit expertise with the pioneering environment of Google.

About Data Fusion

Data analytics poses a significant challenge due to the scattered nature and varying formats of data. This often requires multiple integration tasks to be completed before valuable insights can be derived. Data Fusion addresses this challenge by providing a comprehensive solution for enterprise data integration, encompassing ingestion, ETL, ELT, and streaming. With an execution engine optimized for SLAs and cost efficiency, Data Fusion simplifies the lives of ETL developers, data analysts, and data engineers working in Google Cloud, Hybrid Cloud, or Multi-Cloud environments. It serves as a centralized hub for all data integration activities, enabling streamlined and efficient data processing.

Data Fusion in Google Cloud is a powerful service that enables organizations to integrate, transform, and analyze data from various sources in a unified and scalable manner. With Data Fusion, users can build data pipelines and workflows to efficiently ingest, process, and manage data, regardless of its format or location.

One of the key benefits of Data Fusion is its visual interface, which allows users to design data integration and transformation flows using a drag-and-drop approach. This intuitive interface eliminates the need for complex coding and enables data engineers and analysts to collaborate effectively in building data pipelines.

Data Fusion supports a wide range of data sources, including structured, semi-structured, and unstructured data, enabling organizations to handle diverse data types such as relational databases, CSV files, JSON documents, and more. It also integrates seamlessly with other Google Cloud services, such as BigQuery and Cloud Storage, for efficient data storage and processing.

By leveraging Data Fusion, organizations can accelerate their data integration processes, reduce development time, and improve operational efficiency. The service provides built-in data quality, validation, and transformation capabilities, ensuring data accuracy and consistency throughout the pipeline. It also supports real-time data processing, enabling organizations to make faster and more informed decisions based on up-to-date data.


Data integration capabilities offered by Data Fusion include:

  1. Optimized analytics and accelerated data transformations: Data Fusion enables efficient data integration, enhancing the speed and effectiveness of analytics and data transformations.
  2. Broad range of connectors and formats: With support for over 200 connectors and formats, Data Fusion allows you to seamlessly extract and blend data from diverse sources, empowering you to work with a wide variety of data types.
  3. Visual pipeline development: Data Fusion provides a visual environment for developing data pipelines, improving productivity and ease of use.
  4. Data wrangling and collaboration: Data Fusion offers data wrangling capabilities to prepare and operationalize data, facilitating collaboration between business and IT teams.
  5. REST API for pipeline management: You can leverage the extensive REST API to design, automate, orchestrate, and manage the lifecycle of pipelines, enabling streamlined management and control.
  6. Support for various data delivery modes: Data Fusion supports batch, streaming, and real-time data delivery modes, making it a comprehensive platform suitable for both batch and streaming-related use cases.
  7. Operational insights and optimization: Data Fusion provides operational insights to monitor data integration processes, manage SLAs, and optimize integration jobs, ensuring efficient and effective data processing.
  8. Unstructured data parsing and enrichment: Data Fusion offers capabilities to parse and enrich unstructured data using Cloud AI, enabling tasks such as converting audio files to text, sentiment analysis with NLP, extracting features from images and documents, and converting HL7 to FHIR formats.
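
As an illustration of item 5, the sketch below builds (but does not send) the HTTP request that would start a batch pipeline. It assumes the CDAP-style path that Data Fusion instances expose, `/v3/namespaces/<namespace>/apps/<pipeline>/workflows/DataPipelineWorkflow/start`. The endpoint URL, the pipeline name `daily_sales_load`, and the token are hypothetical placeholders, and the path should be checked against the current API reference before relying on it.

```python
from urllib.parse import quote
from urllib.request import Request


def build_start_request(api_endpoint, pipeline, token, namespace="default"):
    """Build (without sending) a POST request that starts a batch pipeline."""
    # Assumed CDAP-style path; verify it against the API reference.
    url = (
        f"{api_endpoint.rstrip('/')}/v3/namespaces/{quote(namespace, safe='')}"
        f"/apps/{quote(pipeline, safe='')}/workflows/DataPipelineWorkflow/start"
    )
    return Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})


# Hypothetical endpoint, pipeline name, and token, for illustration only.
request = build_start_request(
    "https://example-instance.datafusion.example.com/api",
    "daily_sales_load",
    token="ACCESS_TOKEN",
)
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from the network call makes it easy to inspect or log the target URL before a scheduler or CI job actually triggers the pipeline.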


Data Fusion's data consistency features empower businesses to make confident decisions by ensuring reliable data:

  1. Structured transformations and data quality checks: Data Fusion mitigates the risk of errors by offering structured methods for specifying transformations and performing data quality checks using the Wrangler tool. Predefined directives further enhance data consistency.
  2. Data observability for quality identification: With Data Fusion, you can track data profiles during the integration process, enabling you to identify and address quality issues. This data observability empowers informed decision-making based on the health and reliability of the data.
  3. Handling data drift and change: As data formats evolve over time, Data Fusion assists in managing data drift. It detects changes in data formats and provides customization options for error handling, ensuring consistent and accurate data processing despite variations.
  4. Metadata: You can collect technical, business, and operational metadata for datasets and pipelines and easily discover metadata with a search.
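
The ideas in items 1 and 2 can be sketched in a few lines. The example below is plain Python, not Wrangler directive syntax: it applies an ordered list of small, named transformation steps to each record and then computes a simple profile (missing values per column). The helper names `trim`, `to_float`, `apply_steps`, and `profile`, as well as the sample records, are hypothetical and exist only for this illustration.

```python
def trim(column):
    """Return a step that strips surrounding whitespace from a string column."""
    def step(record):
        value = record.get(column)
        if isinstance(value, str):
            record[column] = value.strip()
        return record
    return step


def to_float(column):
    """Return a step that converts a column to float, or None when it cannot."""
    def step(record):
        try:
            record[column] = float(record[column])
        except (KeyError, TypeError, ValueError):
            record[column] = None
        return record
    return step


def apply_steps(records, steps):
    """Apply each step, in order, to a copy of every record."""
    cleaned = []
    for record in records:
        row = dict(record)
        for step in steps:
            row = step(row)
        cleaned.append(row)
    return cleaned


def profile(records, columns):
    """Count missing (None or empty) values per column."""
    return {
        column: sum(1 for r in records if r.get(column) in (None, ""))
        for column in columns
    }


# Hypothetical sample data.
raw = [
    {"customer": " Ana ", "amount": "12.50"},
    {"customer": "Luis", "amount": "n/a"},
    {"customer": "", "amount": "7"},
]
cleaned = apply_steps(raw, [trim("customer"), to_float("amount")])
missing = profile(cleaned, ["customer", "amount"])
print(missing)
```

Declaring the transformations as an explicit, ordered list is what makes them reviewable and repeatable, and the profile gives a quick signal of how many records failed a step.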


The benefits related to data protection are:

  1. Secure access to on-premises data: Data Fusion enables secure access to on-premises data through private IP connections, ensuring the confidentiality and integrity of data during transmission.
  2. Encryption for data at rest: By default, Data Fusion encrypts data at rest, providing an added layer of security. Additionally, users have the option to utilize Customer Managed Encryption Keys (CMEK) to maintain control over data encryption across supported storage systems.
  3. Data exfiltration protection: Data Fusion offers data exfiltration protection through the use of VPC Service Controls. These controls establish a security perimeter around platform resources, preventing unauthorized access and enhancing data security.
  4. Integration with Cloud Key Management Service (KMS): Sensitive information such as passwords, URLs, and JDBC strings can be securely stored in Cloud KMS. Data Fusion also supports integration with external Key Management Systems, ensuring robust key management and protection.
  5. Integration with Cloud Data Loss Prevention (DLP): Data Fusion seamlessly integrates with Cloud DLP, enabling advanced data protection capabilities. Users can leverage Cloud DLP to mask, redact, and encrypt data in transit, safeguarding sensitive information from unauthorized disclosure.
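
To make the difference between masking and redaction in item 5 concrete, here is a small local sketch. It does not call Cloud DLP; it only shows the two operations on email addresses, using a deliberately simplified regular expression that is an assumption of this example. The function names and the sample sentence are hypothetical.

```python
import re

# Simplified email pattern, assumed for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_email(text, mask_char="*"):
    """Masking: replace every character of the local part, keep the domain."""
    def _mask(match):
        local, domain = match.group(0).split("@", 1)
        return mask_char * len(local) + "@" + domain
    return EMAIL.sub(_mask, text)


def redact_email(text, placeholder="[EMAIL]"):
    """Redaction: remove the whole value and leave a placeholder."""
    return EMAIL.sub(placeholder, text)


line = "Contact ana.perez@example.com for the invoice."
print(mask_email(line))
print(redact_email(line))
```

Masking keeps part of the value's shape (here, the domain), which can be useful for debugging, while redaction removes the value entirely.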

I have personally embarked on a journey to prepare for the Google Cloud Professional Certification. As I delve into the intricacies of the Google Cloud platform, I will be sharing my progress, study tips, and resources in upcoming newsletters. Join us as we explore the highlights of our visit to Google and my preparations for the Google Cloud Professional Certification.

Stay tuned for an enlightening edition filled with insights, industry trends, updates from our team's visit to the Google office, and my journey towards achieving the Google Cloud Professional Certification.

Sara Geovanna Alvear Muñoz

Web Developer | Community Manager

It was a great experience; I would have loved to attend 😊

Santiago Urizarna Varona

Quality and Communication at ASPRODEMA-RIOJA

Honestly, a development of this kind is very interesting, especially for organizations that provide services to people and aim to offer them individualized services. Multiple data types and sources, formats, ...