Transcription robot created by a Brazilian leads global downloads – 07/10/2023 – Tech

A curious, card-carrying nerd, Brazilian Jonatas Grosman created the most downloaded audio transcription model in the world. In doing so, the computer science PhD surpassed tools from companies like Facebook, Google and Microsoft.

Downloaded more than 71.9 million times on Hugging Face, a platform for sharing artificial intelligence code, the Brazilian’s Portuguese-language model was created while he was undergoing treatment for lung cancer.

The researcher’s path through the Department of Informatics at the Technical Scientific Center of PUC (Pontifical Catholic University) in Rio de Janeiro was a long one. When he finished his master’s degree in 2017, his plan was to keep studying natural language processing (NLP).

However, that did not work out. During the first two years, he changed topics several times. When he finally settled on one, he chose to identify bias in language models. The idea was to improve on the results of existing work. But lung cancer got in the way of his plans.

Grosman put his doctorate on hold to stay with his family and undergo treatment, which involved localized surgery and chemotherapy. To distract himself from the adverse situation, he looked for something he could do that involved programming.

“I came across a piece of work by Facebook Research, now Meta AI, which was related to speech recognition, basically transcribing audio. They proposed a model that I found interesting, and I started to implement it.”

While studying how to work with neural networks, complex systems that try to make artificial intelligence work like the human brain, Grosman took part in the 2021 competition run by Hugging Face, a company that promotes open technology initiatives. Because these initiatives rely on openly available code, they are called “open source”.

By the end of the competition, he had built the best speech recognition models for training robots to understand languages such as English, Spanish, Portuguese, Russian, German, French, Italian and Polish. With them, the robots can transcribe into text what they hear in audio.

The experience changed his life. In 2022, back in the doctoral program, he changed topics once more and went on to write the thesis “Assessing the Robustness of Large Pre-trained Models in Speech Recognition”. That same year, he won another edition of the Hugging Face competition.

“I am very flattered to have won both competitions, and also when I see how many times my model has been downloaded.”

He currently works as a venture partner at Lanx Capital Investimentos, connecting startups with the investment firm.

Putting the thesis into practice

Grosman says the AI solution, initially used to distract him from a difficult time, is now helping others.

“Anyone can download my models and use them for commercial purposes, and even earn a lot of money from them, while I don’t earn a dime. My intention is to help the open source community.”

Grosman explains that the solution can be used for a range of activities, from transcribing interviews to producing automatic subtitles on YouTube. “Many people have already contacted me asking for help. There was one person who asked for help using it to write X-ray reports. She would record the report as audio and then use my model to transcribe it.”

Because Grosman’s model was trained to recognize only the sound of common words, the user had to make some adjustments. After a period of additional training, the solution began to recognize terms common in the medical field. On another occasion, Grosman’s model was used to transcribe call center conversations.

Life before model creation

The scientist recalls that his interest in technology was awakened in adolescence. At the age of 15, he started servicing computers, both hardware and software. He swapped RAM, fixed parts and configured networks. What was just a hobby became a source of income.

Because he liked working with the physical side of machines, he even enrolled in a technical course in mechanics during high school. But the interest did not last long. Years later, he earned a degree in information systems from the Faculty of Technological Education of the State of Rio de Janeiro.

He worked as a programmer and as a research assistant at the National Laboratory for Scientific Computing and, later, at the National Observatory. His return to academia got a little push from Professor Hélio Côrtes Vieira Lopes, of the Department of Informatics at CTC/PUC-Rio.

“With him, I decided that my thesis would be in the NLP area, which is, in very heavy quotation marks, a way of making the computer understand information, which can come as text or as sound. I had to build an intelligence to extract information from texts.”

Now Grosman’s creation is doing more than that. Because it is open source, the English version of his model has already been modified by others. One user, for example, trained the robot to go beyond words and identify emotions in speech.
