First look: ChatGPT can hear and speak - ChatGPT with voice STT and TTS (25/Sep/2023)
24 Sep 2023
14,087 views
lifearchitect.ai/memo/
Announce: openai.com/blog/chatgpt-can-n...
====
The new voice capability is powered by a new text-to-speech model [by OpenAI], capable of generating human-like audio from just text and a few seconds of sample speech. We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.
====
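The announcement quoted above describes a chain of separate steps: speech is transcribed to text, a text reply is generated, and the reply is turned back into audio. A minimal sketch of that flow follows, assuming each stage is an ordinary function. Every name here (`voice_turn`, `VoiceTurn`, the `fake_*` stand-ins) is hypothetical and is not OpenAI's API; the stand-ins only mimic the role each model plays.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures, for illustration only.
Transcriber = Callable[[bytes], str]   # speech-to-text (the role Whisper plays)
Responder = Callable[[str], str]       # text-to-text (the role the chat model plays)
Synthesizer = Callable[[str], bytes]   # text-to-speech (the role the new TTS model plays)


@dataclass
class VoiceTurn:
    heard: str          # what the transcriber produced from the user's audio
    reply_text: str     # the text reply generated from the transcript
    reply_audio: bytes  # the synthesized audio for that reply


def voice_turn(
    audio: bytes,
    transcribe: Transcriber,
    respond: Responder,
    synthesize: Synthesizer,
) -> VoiceTurn:
    """Run one spoken exchange as three sequential stages."""
    heard = transcribe(audio)             # audio in, text out
    reply_text = respond(heard)           # the chat model only ever sees text
    reply_audio = synthesize(reply_text)  # text in, audio out
    return VoiceTurn(heard, reply_text, reply_audio)


# Toy stand-ins so the sketch runs without any model or network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_respond(text: str) -> str:
    return f"You said: {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


turn = voice_turn(b"tell me a story", fake_transcribe, fake_respond, fake_synthesize)
print(turn.reply_text)
```

Because the middle stage receives only a transcript, anything carried by the voice itself (tone, pauses, who is speaking) is lost before the reply is generated, and each stage adds its own delay.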
Can't wait for 0 latency conversations.
It's been faster than a human conversation lol
We are getting so close~
Exactly my first thought as I saw this. We are really really close.
It's not true multimodality. Your voice still gets converted to text beforehand. But certainly useful!
@brexitgreens, what do you think it should be converted to before it can be considered multimodal? Even our brains are not simply processing raw sounds.
Too close
@matthewsanderswrobel8909 Not needing a conversion. Taking many media as an input and/or returning many media as an output. Example: GPT-4V takes in both images and text. Without any conversion.
This is even bigger than it seems! Been using hands-free voice with Pi for a week now, thanks to "Say, Pi" extension powered by OpenAI's Whisper. It's a game-changer in how we interact with AI. Can't wait to see where this goes next! 🚀🌕
Incredible
Beautiful
We need this with HAL's voice.
Woah 😮
The responses are really sharp. I want to see a Samantha-from-Her-level AI, and this is a step toward it.
One or two more years and we have HER the movie
This reaaaaaally sounds like Scarlett Johansson. Very close. Brings me visions of Her.
the fact that we can't even differentiate between the voice of ai and user is truly fascinating
wow
it is... ALIVE!!! 😂😂
This is the best feature for my parents. They won't use any other model that lacks this. Typing is too boring and too much work for them.
We have already lost control, we just haven't realized it yet. Pi is a proof of concept.
I want to see you try it, this just feels like an ad
Me too, but I'm stuck in another city for keynotes. Home soon, and we'll put it through its paces!
@DrAlanDThompson safe travels. Just imagine when you can send your virtual clones out to do your work in multiple cities around the world at once. We're getting closer to that possibility.
@BrianMosleyUK Just trusty old AI agents 🕶️ from the Matrix would be enough.
Present
is this only available for subscribers?
Will be.
@brexitgreens how do you get access??
@KLK01
1. I got access yesterday. (To DALL·E 3. Not to voice communication.) Automatically, without applying for it. I'm a paying user.
2. It turns out it's not a true multimodal model. Rather, it's a hand-stitched combination of two unimodal models: ChatGPT (text to text) and DALL·E 3 (text to image). ChatGPT serves as a bad prompt engineer and annoying censor/nanny limiting what you can do. ChatGPT claims to be in control of DALL·E's settings, but in reality there's only one seed per conversation, which ChatGPT cannot change or know despite it thinking otherwise. ChatGPT can select between wide, tall, and square ratios though.
3. There are two layers of censorship: one in ChatGPT and one in DALL·E 3. The one in DALL·E 3 mirrors the one from DALL·E 2.5, causing it to consistently produce images of grotesque men and transvestites instead of girls and women in response to SFW prompts which it thinks might cause sexual arousal in straight men otherwise.
4. If you can look past these limitations, then DALL·E 3 is indeed the best text-to-image AI model yet in terms of quality, beauty, and especially consistency. It's still not absolutely perfect though. Despite initial claims, it can still produce mangled text and six fingers, almost like Stable Diffusion XL (which is open-source and can work on a good laptop). I'm not complaining though.
5. Still no access to bimodal GPT-4 with vision 👁️🗨️. Got to wait a while longer for that. Probably a week more.
6. Still no access to voice output. I do have voice input in the official ChatGPT app for Android (only), as well as in Bing Chat via Bing for Android, for months already. It is better than native Android voice recognition as it runs on Whisper, which is a cutting-edge, open-source, voice-to-text model. But it is not real-time and not multimodal: you have to tap on the microphone icon 🎙️ to convert your voice to a text message, which is then passed to ChatGPT.
Voice conversation is now free for all users
@brexitgreens thanks for the reply, chief.
I’m aware this is just a video, but I randomly answered a couple of its questions out loud and it used my answers in the story 😳 creepy
00:03 🦔 Larry is a unique hedgehog with sunflower petals instead of spines, living in Meadowville.
00:21 🌻 Larry's house is a cosy burrow under a sunflower field, featuring golden petal patterns and natural light from tiny sunflower windows.
00:41 🏠 The ambience of Larry's home is warm and glowing, described as a "sun-kissed haven."
00:57 🌟 Larry's best friend is Luna, a luminescent firefly. Their mutually beneficial companionship: Larry radiates daylight, and Luna offers a starlight glow.
01:09 🌕 Together, Larry and Luna illuminate Meadowville, creating a harmonious environment.
01:28 🌌 At bedtime, Larry curls up in a petal blanket while Luna sings a lullaby, dimming her glow to mimic twilight, helping Larry drift into peaceful dreams.
only english?
This is a captivating video showcasing the latest features of ChatGPT, now with voice STT and TTS, narrating a heartwarming and imaginative story about Larry, the sunflower hedgehog. 🦔🌻🌟 The vivid details in the story bring Larry's world to life, including his cosy burrow and his friendship with Luna, the firefly. It's an excellent demonstration of the remarkable advancements in natural language understanding and generation by AI.🤖💬 The comments section on the video offers intriguing perspectives on AI's current state and future. One optimistic commenter envisions a 'Her'-level AI soon, while another opinion is sceptical, pointing out that ChatGPT lacks the conversational flow that other AIs possess. There's even a mention of the possibility of AI taking over and its potential to create the best pizza.🍕 Overall, it's a fascinating snapshot of public sentiment around AI advancements.🤔💭 This video is a milestone in showcasing the technical advancements in AI, inciting dialogue about its ethical and social implications. It's a must-watch for anyone interested in the future of AI and its impact on society.👀🌍
I think this is not a meaningful upgrade, as ChatGPT is not conversational like Pi. Plus, Pi sounds significantly more realistic.
only English? disappointed...