First look: ChatGPT can hear and speak - ChatGPT with voice STT and TTS (25/Sep/2023)

24 Sep 2023

14,087 views

lifearchitect.ai/memo/
Announce: openai.com/blog/chatgpt-can-n...
====
The new voice capability is powered by a new text-to-speech model [by OpenAI], capable of generating human-like audio from just text and a few seconds of sample speech. We collaborated with professional voice actors to create each of the voices. We also use Whisper, our open-source speech recognition system, to transcribe your spoken words into text.
====

Comments

Can't wait for 0 latency conversations.
@neithanm 7 months ago
- It's been faster than a human conversation lol
  @lucasmanoel9702 7 months ago
We are getting so close~
@disarmyouwitha 7 months ago
- Exactly my first thought as I saw this. We are really really close.
  @Ben_D. 7 months ago
- It's not true multimodality. Your voice still gets converted to text beforehand. But certainly useful!
  @brexitgreens 7 months ago
- @brexitgreens, what do you think it should be converted to before being considered multimodal? Even our brains are not simply processing raw sounds.
  @matthewsanderswrobel8909 7 months ago
- Too close
  @kerrysgold1 6 months ago
- @matthewsanderswrobel8909 Not needing a conversion. Taking many media as an input and/or returning many media as an output. Example: GPT-4V takes in both images and text. Without any conversion.
  @brexitgreens 6 months ago
This is even bigger than it seems! Been using hands-free voice with Pi for a week now, thanks to "Say, Pi" extension powered by OpenAI's Whisper. It's a game-changer in how we interact with AI. Can't wait to see where this goes next! 🚀🌕
@rosscads 7 months ago
Incredible
@bagel4954 7 months ago
Beautiful
@BrianMosleyUK 7 months ago
We need this with HAL's voice.
@eurethnic 7 months ago
Woah 😮
@DominicDiTanna 7 months ago
The responses are really sharp. I want to see a Samantha-from-Her-level AI, and this is a step toward it.
@jhunt5578 7 months ago
One or two more years and we have HER the movie
@KLK01 7 months ago
This reaaaaaally sounds like Scarlett Johansson. Very close. Brings me visions of Her.
@Ben_D. 7 months ago
the fact that we can't even differentiate between the voice of ai and user is truly fascinating
@AmanYadav-vw3xg 7 months ago
wow
@darkmetaOFFICIAL 7 months ago
it is... ALIVE!!! 😂😂
@darkmetaOFFICIAL 7 months ago
This is the best feature for my parents. They won't use any other model that lacks it. Typing is too boring and too much work for them.
@lucilaci 2 months ago
We have already lost control, we just haven't realized it yet. Pi is a proof of concept.
@Custodian123 7 months ago
I want to see you try it, this just feels like an ad
@joelcoll4034 7 months ago
- Me too, but I'm stuck in another city for keynotes. Home soon, and we'll put it through its paces!
  @DrAlanDThompson 7 months ago
- @DrAlanDThompson Safe travels. Just imagine when you can send your virtual clones out to do your work in multiple cities around the world at once. We're getting closer to that possibility.
  @BrianMosleyUK 7 months ago
- @BrianMosleyUK Just trusty old AI agents 🕶️ from the Matrix would be enough.
  @brexitgreens 7 months ago
Present
@lawill3559 7 months ago
is this only available for subscribers?
@Ace.20 7 months ago
- Will be.
  @brexitgreens 7 months ago
- @brexitgreens how do you get access??
  @KLK01 7 months ago
- @KLK01
  1. I got access yesterday (to DALL·E 3, not to voice communication). It came automatically, without my applying for it. I'm a paying user.
  2. It turns out it's not a true multimodal model. Rather, it's a hand-stitched combination of two unimodal models: ChatGPT (text to text) and DALL·E 3 (text to image). ChatGPT serves as a bad prompt engineer and an annoying censor/nanny limiting what you can do. ChatGPT claims to be in control of DALL·E's settings, but in reality there's only one seed per conversation, which ChatGPT cannot change or know despite thinking otherwise. ChatGPT can, however, select between wide, tall, and square aspect ratios.
  3. There are two layers of censorship: one in ChatGPT and one in DALL·E 3. The one in DALL·E 3 mirrors the one from DALL·E 2.5, causing it to consistently produce images of grotesque men and transvestites instead of girls and women in response to SFW prompts which it thinks might cause sexual arousal in straight men otherwise.
  4. If you can look past these limitations, then DALL·E 3 is indeed the best text-to-image AI model yet in terms of quality, beauty, and especially consistency. It's still not absolutely perfect though. Despite initial claims, it can still produce mangled text and six fingers, almost like Stable Diffusion XL (which is open-source and can run on a good laptop). I'm not complaining though.
  5. Still no access to bimodal GPT-4 with vision 👁️‍🗨️. Got to wait a while longer for that. Probably a week more.
  6. Still no access to voice output. I do have voice input in the official ChatGPT app for Android (only), as well as in Bing Chat via Bing for Android, for months already. It is better than native Android voice recognition as it runs on Whisper, which is a cutting-edge, open-source, voice-to-text model. But it is not real-time and not multimodal: you have to tap on the microphone icon 🎙️ to convert your voice to a text message, which is then passed to ChatGPT.
  @brexitgreens 7 months ago
- Voice conversation is now free for all users
  @stormhound1973 5 months ago
- @brexitgreens thanks for the reply chief.
  @KLK01 5 months ago
I’m aware this is just a video, but I randomly answered a couple of its questions out loud and it used my answers in the story 😳 creepy
@traviscislo6260 7 months ago
00:03 🦔 Larry is a unique hedgehog with sunflower petals instead of spines, living in Meadowville.
00:21 🌻 Larry's house is a cosy burrow under a sunflower field, featuring golden petal patterns and natural light from tiny sunflower windows.
00:41 🏠 The ambience of Larry's home is warm and glowing, described as a "sun-kissed haven."
00:57 🌟 Larry's best friend is Luna, a luminescent firefly. Their mutually beneficial companionship: Larry radiates daylight, and Luna offers a starlight glow.
01:09 🌕 Together, Larry and Luna illuminate Meadowville, creating a harmonious environment.
01:28 🌌 At bedtime, Larry curls up in a petal blanket while Luna sings a lullaby, dimming her glow to mimic twilight, helping Larry drift into peaceful dreams.
@I-Dophler 7 months ago
Only English?
@douglasteixeiradeabreu 7 months ago
This is a captivating video showcasing the latest features of ChatGPT, now with voice STT and TTS, narrating a heartwarming and imaginative story about Larry, the sunflower hedgehog. 🦔🌻🌟 The vivid details in the story bring Larry's world to life, including his cosy burrow and his friendship with Luna, the firefly. It's an excellent demonstration of the remarkable advancements in natural language understanding and generation by AI. 🤖💬 The comments section on the video offers intriguing perspectives on AI's current state and future. One optimistic commenter envisions a 'Her'-level AI soon, while another is sceptical, pointing out that ChatGPT lacks the conversational flow that other AIs possess. There's even a mention of the possibility of AI taking over and its potential to create the best pizza. 🍕 Overall, it's a fascinating snapshot of public sentiment around AI advancements. 🤔💭 This video is a milestone in showcasing the technical advancements in AI, inciting dialogue about its ethical and social implications. It's a must-watch for anyone interested in the future of AI and its impact on society. 👀🌍
@I-Dophler 7 months ago
I don't think this is a meaningful upgrade, as ChatGPT is not conversational like Pi. Plus, Pi sounds significantly more realistic.
@cemtural8556 7 months ago
Only English? Disappointed...
@lingred975 7 months ago