Clones with artificial intelligence mess up reality – 7/1/2023 – Tech

A recording of someone reading half of this text is enough to generate a digital clone of their voice. That is less than five minutes of audio, more than enough for an AI (artificial intelligence) to imitate the speech with a fidelity capable of fooling a listener, at a cost of US$5 (R$25).

To generate a video copying your face, you would need to film that same reading three times, with the image in at least 4K resolution and good lighting.

A studio green screen in the background is recommended to make cutouts easier. The cost is US$500 (R$2,500) per year, and the result is usable content that still betrays its artificiality. The expectation, however, is that this uncanniness will not last long.

The technology is advancing at a gallop, making audiovisual production easier but also further blurring the line between real information and fabrication.

Folha tested the Portuguese-language features of two of the main digital cloning services. The first, from ElevenLabs, allows voice imitation in all of its paid plans, which start at US$5. The more expensive tiers increase the quantity and quality of the creations.

Launched in January, the company says it has already surpassed one million users. With its tools, it aims to build, by the end of the year, an instant dubbing system for multiple languages that preserves the original voice.

The imitation is generated instantly after the audio sample is uploaded to the platform's own website. The original content needs to be between two and five minutes long, and what matters most is its quality (no background noise). The AI can then read any text in the cloned voice.
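For the technically curious, the flow the report describes boils down to two calls against ElevenLabs' public REST API. The sketch below assumes the /v1/voices/add and /v1/text-to-speech endpoints as publicly documented at the time; exact paths and parameters may differ between API versions.

```python
# Minimal sketch of the cloning flow described above, assuming
# ElevenLabs' documented REST endpoints; not an official example.
import requests

API_KEY = "YOUR_XI_API_KEY"  # issued with any paid plan
BASE = "https://api.elevenlabs.io/v1"
HEADERS = {"xi-api-key": API_KEY}

# 1) Upload a two-to-five-minute, noise-free sample to create the clone.
with open("sample.mp3", "rb") as f:
    resp = requests.post(
        f"{BASE}/voices/add",
        headers=HEADERS,
        data={"name": "my-clone"},
        files={"files": ("sample.mp3", f, "audio/mpeg")},
    )
resp.raise_for_status()
voice_id = resp.json()["voice_id"]

# 2) Have the cloned voice read arbitrary text.
tts = requests.post(
    f"{BASE}/text-to-speech/{voice_id}",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={"text": "Any text can be read with the cloned voice."},
)
tts.raise_for_status()
with open("clone.mp3", "wb") as out:
    out.write(tts.content)  # MP3 audio in the cloned voice
```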

In the tests, the best results came from professional-quality audio recorded in a studio. The timbre resembles the original, but the monotonous rhythm of the synthetic speech sounds uncanny.

With voice recorded on a cell phone, the result was unusable: the AI compensates for the lack of quality by blending the cloned voice with others in the final audio, garbling both sound and accent. Extracting audio from YouTube videos, something a scammer could also do, improved the situation.

To test Elai's artificial intelligence, which generates videos, the report filmed the reading of the same text three times, each take lasting about two minutes. An alternative would be to use photos for training.

The instructions were to speak slowly, move little and look directly into the camera. Lapses on that last requirement carried over to the clone, which sometimes looks away; the company warned of this effect and suggested a new recording, which was not made.

The manipulation is evident. The avatar's body is rigid and its face expressionless; the lips open and close but do not match what is said. The head movement, on the other hand, accurately mimics the original. The result does resemble a person talking and yields acceptable videos, but it is still not a good option for a digital influencer hoping to leave a robot covering their vacation.

The recordings to feed the AI were emailed to the Elai team, and three days later the custom avatar was available in the system where videos are created. Apart from the filming, nothing required technical knowledge, and creating a video took just a few minutes.

The service would cost US$500 per year, covering avatar generation and maintenance as well as access to the platform; it was offered to Folha free of charge for the tests.

The technology aims to make audiovisual production cheaper. “Creating a one-minute video can take up to five hours, not counting translation. With AI, it takes 10 minutes, and with one click it’s in multiple languages,” says Vitalii Romanchenko, CEO of Elai. He says the company has approximately 2,000 customers, concentrated mostly in the US and Western Europe.

Those numbers put Elai behind Synthesia, the sector's benchmark. In a statement, that company says it has 15,000 corporate clients, who use its technology to create training materials, institutional videos and product marketing.

EVOLUTION AND DANGERS

Experts expect these AIs to improve rapidly. “It’s still the beginning of this technology,” says Romanchenko. The executive says the main challenge now is getting avatars to gesture and express emotion.

This development also brings concerns. Their ease of use makes these AIs attractive for scams, hacking and disinformation. Using synthetic speech, a Wall Street Journal reporter fooled her US bank's phone voice-recognition system.

“I already see criminals learning to use AIs that manipulate video so they visually resemble someone the victim trusts,” says Marina Ciavatta, CEO of Hekate, a cybersecurity training company.

According to the expert, one tactic is to use information from social networks to make the scams more convincing, which is why she recommends limiting online exposure. It is also wise to keep one's guard up and verify information across different channels.

Companies in the industry try to curb misuse of their tools by requiring users to declare that they have the right to use the image or voice behind the synthetic media. In practice, this can be easily circumvented: with ElevenLabs, for example, it is enough to answer a form question claiming to have authorization to use the voice when creating the clone.

Synthesia requires the people being digitally cloned to record a specific phrase authorizing the use of their image; in other words, consent on video.

The companies also say they moderate the content generated on their platforms, using a mix of human and automated moderation to block uses that violate their policies, such as the generation of discriminatory content.

On the 15th, ElevenLabs launched a tool that detects audio generated with its technology with, according to the company, 99% accuracy as long as the file has not been edited afterwards. The system classified correctly in all of the report's tests, which used 20 files mixing synthetic and real audio.
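A spot check like the report's amounts to a small tally over labeled files. The sketch below is only an illustration: the `detector` callable is a hypothetical stand-in, since the report used the classifier through ElevenLabs' website rather than any programmatic interface.

```python
# Sketch of an accuracy spot check over labeled audio files.
from pathlib import Path
from typing import Callable, Iterable, Tuple

def detection_accuracy(cases: Iterable[Tuple[Path, bool]],
                       detector: Callable[[Path], bool]) -> float:
    """cases pairs each file with its ground truth (True = synthetic);
    detector returns True when it flags the audio as AI-generated."""
    results = [detector(path) == is_synthetic for path, is_synthetic in cases]
    return sum(results) / len(results)

# Hypothetical usage mirroring the report's 20-file test:
# cases = [(p, True) for p in Path("synthetic").glob("*.mp3")] + \
#         [(p, False) for p in Path("real").glob("*.mp3")]
# print(detection_accuracy(cases, my_detector))  # the report saw 100%
```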

This type of detection tool is still not widespread, and the companies themselves cannot say precisely whether a given video was made with their technology. Today it is possible to rely on inconsistencies in the content to spot synthetic media, but that window narrows quickly as AI improves.

“We’re talking about a year until things are so realistic that the average consumer will have great difficulty separating the real from the synthetic,” says Sophie Nightingale, professor of psychology at Lancaster University (England).

Research she co-authored evaluated people's ability to distinguish real faces in photos from AI-generated ones, a more mature category than video. The result: the faces are indistinguishable, and on average study participants rated the fake people as more trustworthy.

The impact is already starting to show. Recently, a fake image of the Pope in a white puffer jacket fooled the internet, and fabricated arrest photos of former US President Donald Trump set off a stir.

Industry coalitions try to mitigate these effects by embedding information in files that identifies media generated by AI, a kind of label flagging the manipulation; adopting practices that ease the detection of synthetic content, however, is not mandatory. Elai and Synthesia belong to one such coalition, the Content Authenticity Initiative.
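To give a concrete sense of what such a label is, the sketch below builds a simplified provenance manifest in the spirit of the C2PA standard that the Content Authenticity Initiative promotes. It is not the real C2PA format or tooling, only an illustration of the core idea: a record of how a file was made, bound to the file's bytes by a hash so that later edits are detectable.

```python
# Simplified, illustrative stand-in for a C2PA-style provenance label.
import hashlib
import json

def file_sha256(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def make_manifest(path: str, generator: str) -> dict:
    # Records who/what made the file, asserts it is AI media, and binds
    # the label to the file's bytes with a hash.
    return {
        "claim_generator": generator,
        "assertions": [{"action": "created",
                        "digital_source_type": "trainedAlgorithmicMedia"}],
        "content_hash": file_sha256(path),
    }

def label_intact(path: str, manifest: dict) -> bool:
    # If the file was edited after labeling, the hash no longer matches.
    return file_sha256(path) == manifest["content_hash"]

manifest = make_manifest("clone.mp3", "example-ai-tool/1.0")
print(json.dumps(manifest, indent=2))
print("label intact:", label_intact("clone.mp3", manifest))
```

A real manifest is also cryptographically signed, so that stripping or forging the label is itself detectable; this sketch omits that step for brevity.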

The challenge of learning to navigate a world in which it is harder to tell the real from the synthetic remains open. “On the one hand, we don’t want people to simply accept everything they see and hear as the truth, because we know content can be manipulated. On the other, we don’t want to completely undermine our society and democracy because people no longer trust anything,” says Nightingale.
