
Google Cloud Speech API Improves Longform Audio Recognition -- Adds New Language Variants

Cloud Speech, Google’s neural network-powered speech-to-text API, gives developers an easy way to convert audio into text. The service can also perform this conversion in real time and in environments with background noise.




With the most recent updates to the service, Cloud Speech can now transcribe files up to three hours (180 minutes) long, a 125 percent increase over the previous maximum file length of 80 minutes. If an application requires support for files longer than three hours, users can apply for a quota extension via Google’s Cloud Support; these requests are granted on a case-by-case basis. Although not everyone will need this update, it should benefit those who transcribe longer audio files, such as companies that offer transcription services.

The update also adds support for 30 more language varieties to Cloud Speech, which already supported 89 languages. Given the reach of technology today, this can open up voice and audio services to more people around the world. A complete list of the languages supported by Google Cloud Speech is available on the service’s “Language Support” page.

Google also unveiled new word-level timestamps, which enable users to jump to a particular moment in an audio file where the associated text was spoken. Conversely, this feature can also be used to display the associated text while playing back the audio. According to Google, this was its most requested feature. By adding this functionality, developers can allow users to quickly find the information they are looking for, especially when it comes to longer audio transcriptions.
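
To illustrate how word-level timestamps support both uses described above, here is a minimal Python sketch. It does not call the Cloud Speech API; the TimedWord class, the helper functions, and the sample data are hypothetical stand-ins, and the sketch assumes each recognized word arrives with a start and end offset in seconds.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TimedWord:
    """Hypothetical record: one recognized word with offsets in seconds."""
    word: str
    start: float
    end: float


def find_phrase_start(words: List[TimedWord], phrase: str) -> Optional[float]:
    """Return the start time of the first occurrence of phrase, or None."""
    target = phrase.lower().split()
    if not target:
        return None
    tokens = [w.word.lower() for w in words]
    # Slide a window of len(target) words across the transcript.
    for i in range(len(tokens) - len(target) + 1):
        if tokens[i:i + len(target)] == target:
            return words[i].start
    return None


def word_at(words: List[TimedWord], t: float) -> Optional[str]:
    """Return the word being spoken at playback time t, or None."""
    for w in words:
        if w.start <= t < w.end:
            return w.word
    return None


# Made-up sample transcript, for illustration only.
transcript = [
    TimedWord("welcome", 0.0, 0.4),
    TimedWord("to", 0.4, 0.5),
    TimedWord("the", 0.5, 0.6),
    TimedWord("quarterly", 0.6, 1.1),
    TimedWord("review", 1.1, 1.6),
]

seek_to = find_phrase_start(transcript, "quarterly review")
current_word = word_at(transcript, 1.2)
```

Here find_phrase_start gives a player the offset to seek to for a searched phrase, while word_at covers the converse case of highlighting the text as the audio plays. A linear scan keeps the sketch short; for multi-hour transcripts, a binary search over the start times would be a natural refinement.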

Whether it is a voice-powered personal assistant or an automated phone menu, more devices and services are adding the ability to record and interpret audio and speech. It is therefore important that services like Google Cloud Speech continue to iterate and improve in order to provide the best possible experience for developers and users alike.
