Beyond Text-to-Speech: Decoding Samsung's Research Journey to Live Voice Translation

Samsung, a titan in technology innovation, has long recognized the indispensable nature of seamless communication in its global strategy. The introduction of the Samsung Live Translation Feature is a testament to the company's commitment to making digital communication more inclusive and accessible.

This feature highlights Samsung’s dedication to enhancing user experience and underscores its role in shaping the future of mobile technology.

By integrating sophisticated translation capabilities directly into its devices, Samsung has enriched its product ecosystem and reinforced its position in the competitive tech market. 

This strategic move enhances user engagement, opens new markets, and ultimately drives brand loyalty by delivering practical, real-world utility at the touch of a button.

The impact of such innovations extends beyond mere convenience. As Samsung continues to refine this technology, it sets new standards for what smartphones can achieve, transforming them into not just tools of communication but bridges to a more unified global community.

How does Real-time Voice Translation work?

Samsung's real-time voice translation feature is a sophisticated application of artificial intelligence and machine learning technologies. Here’s a general breakdown of how it works:
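The individual stages are not listed in the source. A common design for speech-to-speech translation, assumed here rather than taken from Samsung's documentation, chains three steps: speech recognition, machine translation, and text-to-speech. The minimal Python sketch below wires those steps together. Every name in it (`TranslationPipeline`, `toy_recognize`, `toy_translate`, `toy_synthesize`, `TOY_LEXICON`) is hypothetical, and the toy stages only stand in for real models.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationPipeline:
    """Hypothetical three-stage pipeline (an assumption, not Samsung's design)."""
    recognize: Callable[[bytes], str]    # speech recognition: audio -> source text
    translate: Callable[[str], str]      # machine translation: source -> target text
    synthesize: Callable[[str], bytes]   # text-to-speech: target text -> audio

    def run(self, audio: bytes) -> bytes:
        source_text = self.recognize(audio)
        target_text = self.translate(source_text)
        return self.synthesize(target_text)


# Toy stand-ins so the sketch runs without any models.
def toy_recognize(audio: bytes) -> str:
    # Pretend the audio bytes are already UTF-8 text.
    return audio.decode("utf-8")


TOY_LEXICON = {"hello": "hola", "friend": "amigo"}


def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words pass through unchanged.
    return " ".join(TOY_LEXICON.get(word, word) for word in text.lower().split())


def toy_synthesize(text: str) -> bytes:
    # Pretend that encoding the text produces audio.
    return text.encode("utf-8")


pipeline = TranslationPipeline(toy_recognize, toy_translate, toy_synthesize)
print(pipeline.run(b"Hello friend"))
```

Keeping each stage behind a plain callable means any one of them could be swapped for an on-device or cloud model without touching the others, which is one plausible reason such features are often structured this way.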

Samsung Live Translation: A Timeline of Innovation

2014: S Translator, a Text and Speech Translation Mobile App

In 2014, Samsung released S Translator, a text and speech translation app on the Galaxy S5. The app assisted users in foreign countries through a conversational mode that quickly converted spoken words between languages. The Samsung App Store provided high-quality language packs for numerous regions, offering broad global coverage. 

Source: Android Central

This application marked an early step in Samsung's research and development of text-to-speech and translation technologies.

2016: AI-Powered Development of the Existing S Voice

Samsung's patent filings show that it was working toward video-to-chat transcription and automated interpretation. Two examples are US9961294B2, which adaptively generates subtitles in environments with excessive or low ambient noise to preserve the user experience, and US10599784B2, which covers automated interpretation together with a machine translation method for real-time scenarios.

Research alone was not enough, however; Samsung was still looking for success on the product side. It therefore acquired Viv Labs, which billed itself as "the intelligent interface to everything," for 238.93 billion won. Viv Labs' founders were co-founders of Siri, the assistant now offered by Apple, and the acquisition appears to have been aimed at bringing a comparable product to Samsung's devices.

Viv has developed a unique, open artificial intelligence (AI) platform that allows third-party developers to use and build conversational assistants and integrate a natural language-based interface into renowned applications and services.

2017: Bixby Introduction and Innoetics’ Acquisition

On March 20, 2017, Samsung introduced Bixby alongside the Galaxy S8 smartphone. Initially, Bixby's capabilities were limited compared to established competitors. It offered basic voice commands, image recognition, and contextual awareness within Samsung's ecosystem of apps and services.

In July, Samsung acquired Innoetics for under $50 million. Innoetics has developed text-to-speech and voice-to-speech systems. Its technology reportedly can analyze a person's voice, train on what that person is saying, and then read out an entirely unrelated piece of text in that same voice.

2018: Expanding Language Support and Capabilities

Samsung focused on expanding Bixby's language support and improving its natural language processing abilities.

Samsung launched Bixby Voice, described as the first of its kind, bringing years of research to life.

In 2018, Samsung also focused on developing a method for natural language generation. One example is US11100296B2, which aims to generate natural language responses efficiently and accurately by using latent variables, attention information, and neural network models.

Also, patent filings by Samsung in the AI domain have seen a jump from 2018 onwards.

The company also introduced Bixby 2.0, which featured enhanced conversational abilities and deeper integration with third-party apps. It was reportedly able to distinguish between the different people speaking to it.

2019: Bixby Marketplace and Developer Tools

Samsung launched Bixby Marketplace, which allowed developers to create and users to download Bixby Capsules—similar to apps—to extend Bixby's functionality. This year also marked the introduction of Bixby Templates and Natural Language Categories, which simplified the creation and discovery of these capsules.

Further, Samsung collaborated with Professor Yoshua Bengio to expand its SAIT AI Lab into the Montreal Institute for Learning Algorithms (MILA). This lab aims to strengthen Samsung’s fundamentals in AI research and drive competitiveness in system semiconductors.

Samsung and Professor Yoshua Bengio have patented a speech recognition technology (US11282501B2) that improves accuracy across different dialects. 

The invention combines a parameter generation model with a dialect-trained speech recognition model, enhancing performance in interpreting diverse accents and linguistic variations. This advancement aims to make voice-controlled devices more accessible and reliable for users from various linguistic backgrounds.
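As a rough illustration of the idea described above, the sketch below lets a small "parameter generator" turn a dialect vector into per-feature scales, which then adapt the inputs of a shared recognition layer. This is a toy under stated assumptions, not the patented method: the functions `generate_dialect_scales` and `recognize_scores`, the weight matrices, and the one-hot dialect encoding are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(matrix: Matrix, vector: Vector) -> Vector:
    # Plain matrix-vector product, one dot product per row.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def generate_dialect_scales(dialect: Vector, generator_weights: Matrix) -> Vector:
    """Hypothetical parameter generator: dialect vector -> one scale per feature."""
    return matvec(generator_weights, dialect)


def recognize_scores(features: Vector, scales: Vector,
                     recognizer_weights: Matrix) -> Vector:
    """Hypothetical recognizer layer: adapt features to the dialect, then score labels."""
    adapted = [f * s for f, s in zip(features, scales)]
    return matvec(recognizer_weights, adapted)


# Made-up numbers: two dialects (one-hot), two acoustic features, two labels.
GENERATOR_WEIGHTS = [[1.0, 2.0],   # scale of feature 0 under dialect A / dialect B
                     [1.0, 0.5]]   # scale of feature 1 under dialect A / dialect B
RECOGNIZER_WEIGHTS = [[1.0, 0.0],
                      [0.0, 1.0]]

features = [3.0, 4.0]
for name, dialect in (("A", [1.0, 0.0]), ("B", [0.0, 1.0])):
    scales = generate_dialect_scales(dialect, GENERATOR_WEIGHTS)
    print(name, recognize_scores(features, scales, RECOGNIZER_WEIGHTS))
```

The same audio features yield different label scores for dialect A and dialect B, which is the intuition behind pairing a parameter generation model with a shared recognizer.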

2021: Enhanced Natural Language Processing

Bixby received significant upgrades to its natural language processing capabilities, allowing for a more nuanced understanding of human speech patterns and colloquialisms. This improvement laid the groundwork for future translation features.

Further, Samsung filed a patent, KR20220170330A, that describes a text-to-speech model. 

2022: Multimodal Interactions and Improved Voice Recognition

Samsung introduced multimodal interactions, allowing Bixby to process and respond to voice, touch, and visual input combinations. The company also improved Bixby's voice recognition accuracy across various accents and dialects.

Samsung was also researching text-to-speech models. For example, one research paper explores an efficient acoustic model that provides simple but consistent control over voice pitch and speaking rate while still generating high-quality synthesized speech from text.

Moreover, Joon Hee Choi was hired back as director, leading various AI projects and a seven-member AI/ML team. Choi was previously at Qualcomm and has expertise in AI and large language models (LLMs).

Reports also suggested that Samsung was working on real-time call transcription.

2023: Bixby Text Call and a Teaser of the Real-Time Translation Feature

In February 2023, Samsung announced enhancements to Bixby, including Bixby Text Call in English, which lets users respond to calls by typing a message that Bixby then voices to the caller. Samsung also introduced a custom voice creator for Bixby, enhancing the personalization of interactions.

Source: Samsung

In November 2023, Samsung teased a real-time translation feature, showcasing Bixby's potential to break down language barriers in real-world scenarios.

2024: Launch of Bixby Real-Time Translation

In February 2024, Samsung introduced a real-time translation feature for its Bixby voice assistant, adding a new capability to the voice technology landscape.

Galaxy AI appears to combine Samsung's text-to-speech research with Bixby's real-time call transcription, as suggested by a Samsung press release stating, “Galaxy AI is Integrated with Bixby, Making the Power of Mobile AI Easier than Ever to Access.”

Tracing Bixby's Live Translate Call Technological Advancements

To understand how Samsung achieved such remarkable progress with Bixby, particularly in real-time voice translation, one must examine various sources of information:

1. Research Papers:

Samsung Research, Samsung's research arm, regularly publishes papers on artificial intelligence, natural language processing, and machine translation topics. Analyzing these publications allows one to gain insights into the underlying technologies powering Bixby's advancements.

2. Patent Filings:

Patent applications and grants provide a wealth of information about Samsung's R&D efforts. These documents often contain detailed descriptions of novel technologies and methodologies that may be incorporated into Bixby.

3. Strategic Collaborations:

Samsung's partnerships with academic institutions, research organizations, and technology companies can offer clues about the direction of Bixby's development. Collaborations in neural machine translation or speech recognition may have directly contributed to the real-time translation feature.

4. Key Hires and Acquisitions:

Tracking Samsung's hiring patterns and acquisitions in relevant fields can provide insights into the company's focus areas. For instance, acquiring a startup specializing in real-time language processing could indicate a strategic move to enhance Bixby's capabilities.

5. Developer Documentation and APIs:

As Samsung expands Bixby's capabilities, developer documentation and API changes can reveal new features and functionalities being added to the platform.

6. Conference Presentations and Tech Demos:

Samsung often showcases its latest technologies at industry conferences and events. Presentations and demonstrations related to Bixby can offer glimpses into upcoming features and improvements.

Conclusion

Samsung's transformation in AI, from Bixby to Galaxy AI, is a remarkable achievement. This journey, marked by years of dedicated research and development, strategic partnerships, and innovative thinking, has positioned Samsung at the forefront of AI assistant technology.

Partnering with a technology intelligence firm like GreyB can provide invaluable insights and strategic guidance for companies seeking to understand and potentially replicate Samsung's success. By leveraging expert analysis of patents, research papers, and market trends, businesses can gain a competitive edge in the rapidly evolving landscape of voice assistants and AI-powered translation technologies.

As the field advances, staying informed about the latest developments and understanding the underlying technologies will be crucial for companies looking to compete. Whether you're a technology firm, a research institution, or an investor, the expertise provided by technology intelligence consultants can help you navigate this complex and fast-paced domain.

Contact a consultant today to learn more about how GreyB's technology intelligence services can support your research and development efforts in voice assistant technology. Gain the insights you need to drive innovation and stay ahead in the competitive world of AI and language processing.

Author: Mayank Maloo

