AI Weekly: 06/19/23
A $113M seed round, police robots, and hyperrealistic avatars - we’re living in a sci-fi world
Good morning and welcome to this week’s edition of AI Weekly!
In this week’s news, Mistral AI, a four-week-old OpenAI competitor, raised a $113 million “seed” round, while OpenAI launched a set of new features for ChatGPT.
In other news, Epic released MetaHuman Animator, a tool that creates hyperrealistic animated avatars of individuals, and Apple announced plans to use AI and machine learning to create virtual avatars of wearers of its new Apple Vision Pro headset.
In our weekly regulatory update, UK Prime Minister Rishi Sunak announced that OpenAI, Google DeepMind, and Anthropic have committed to providing early or priority access to their AI models for research into evaluation and safety.
Oh, and an airport in Singapore now has two police robots with 360-degree vision patrolling its terminals. Fun stuff, right? Enjoy reading more about the above and much more AI news below!
- ZG
Here are the most important stories of the week:
TEXT
OpenAI has introduced new features and pricing updates for certain versions of ChatGPT. Link.
GPT-4 and GPT-3.5-turbo now support "function calling": developers describe a function to the model, and it returns a JSON object containing the arguments needed to call it, leaving the actual execution to the application.
GPT-3.5-turbo has an expanded "context window" that enables it to reference more previous text from the conversation, improving its ability to generate accurate responses without forgetting information.
The new 16K-context version of GPT-3.5-turbo is priced at twice the cost of the standard version, while the standard version gets a 25 percent price cut.
These updates aim to enhance the interaction capabilities of ChatGPT and provide users with more accurate and context-aware responses.
While function calling lets the chatbot answer questions like the current weather in Boston, simpler alternatives such as a Google search remain available.
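The "function calling" flow can be sketched without the API itself: the developer supplies a JSON Schema for each function, the model replies with a function name plus JSON-encoded arguments, and the application executes the call. A stdlib-only sketch with a simulated model reply (the schema, weather function, and reply below are illustrative stand-ins, not OpenAI's actual objects):

```python
import json

# Hypothetical function schema, in the shape described in OpenAI's
# function-calling announcement: name, description, JSON Schema parameters.
GET_WEATHER_SCHEMA = {
    "name": "get_current_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def get_current_weather(city: str, unit: str = "celsius") -> str:
    # Stand-in for a real weather API call.
    return f"22 degrees {unit} in {city}"

DISPATCH = {"get_current_weather": get_current_weather}

def handle_model_message(message: dict) -> str:
    """Dispatch a (simulated) model reply that requests a function call.

    The model returns only the function name and its arguments as a JSON
    string; the application, not the model, executes the function.
    """
    call = message["function_call"]
    args = json.loads(call["arguments"])
    return DISPATCH[call["name"]](**args)

# A simulated assistant message of the kind the API would return.
simulated_reply = {
    "role": "assistant",
    "function_call": {
        "name": "get_current_weather",
        "arguments": '{"city": "Boston", "unit": "celsius"}',
    },
}
print(handle_model_message(simulated_reply))
```

The key design point is that the model never runs code itself; it only emits a structured request that the application validates and dispatches.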
Mistral AI, a Paris-based startup founded by alumni of Google DeepMind and Meta, has raised $113 million in seed funding to compete against OpenAI in the development of large language models and generative AI. Link.
The company aims to make AI useful by focusing on open source solutions and targeting enterprises. It plans to release its first models for text-based generative AI in 2024.
The funding round was led by Lightspeed Venture Partners, with participation from other investors including Redpoint, Index Ventures, and prominent individuals in the tech industry.
Mistral AI aims to build models using publicly available data to avoid legal issues and will open-source its models and datasets. It believes that open source can overcome potential misuse and provide tactical advantages in security.
The startup's focus is on enterprise customers, helping them understand and utilize AI effectively in their respective fields.
The investment from Lightspeed highlights the belief in Mistral AI's expertise in language models and the potential value in the AI market, comparing it to infrastructure plays like cloud computing.
Meta is working on making the next version of LLaMA, its open-source LLM, commercially available. Link.
This move comes despite recent inquiries and concerns from lawmakers, including a letter from two U.S. senators questioning LLaMA's leak to 4chan shortly after its announcement.
Meta's commitment to open-source AI sets it apart from other big tech companies, with its Fundamental AI Research Team (FAIR) led by Yann LeCun.
The decision to make LLaMA commercially available aligns with the increasing focus on regulating artificial intelligence and the emergence of new open-source LLMs.
Meta's CEO, Mark Zuckerberg, emphasized the integration of generative AI into all of the company's products and reaffirmed its commitment to an "open science-based approach" to AI research.
Zuckerberg also stated in an interview that LLaMA will power access to AI agents for small businesses and content creators using Facebook's apps.
Amazon is experimenting with AI to simplify reviews and enhance the shopping experience. Link.
The new feature includes a block of text summarizing customer reviews, with a note indicating that the information is AI-generated.
The AI-generated summaries aim to reduce review barriers and help customers discover potentially superior products.
Amazon confirmed the testing of this feature and stated its significant investment in generative AI across its businesses.
Specific details about the model used or the training process were not provided by Amazon.
This AI application by Amazon represents a quieter and less sensational use of the technology amid the ongoing debates surrounding AI's risks, benefits, and regulation.
Vectara, a conversational search platform for enterprises that unlocks data in text-based files by enabling developers to build conversational AI apps, has raised $28.5 million in seed funding led by Race Capital with participation from Emad Mostaque, the founder of Stability AI. Link.
The platform offers AI-powered, API-based search technology capable of handling queries of any length, ambiguity, and language across multilingual documents.
Vectara aims to address the challenges faced by industries in efficiently retrieving, summarizing, and maintaining data privacy for their vast corporate data stores.
Vectara allows users to ask questions about their company's data and provides a summary with citations to the source data set, enabling better utilization of data for insights.
The platform trains its AI models on publicly licensed data sources to mitigate inaccuracy and bias, while the content chosen by customers for indexing grounds the summarization provided.
Vectara's technology finds applications in various use cases such as legal discovery, ecommerce search, news monitoring, and financial analysis, and it aims to revolutionize user interfaces by enabling verbal expression of search intent.
IMAGE/VIDEO
Google Shopping introduces a virtual try-on experience to help users visualize how clothing will look on different body types. Link.
The feature initially focuses on women's tops from brands like H&M, Anthropologie, Everlane, and Loft.
Users can select different body sizes and see how the clothing looks on a range of diverse models with various skin tones, ethnicities, hair types, and body shapes.
The virtual try-on experience utilizes a generative AI model to realistically depict how clothing would drape, wrinkle, fold, cling, and stretch on the models.
Google Shopping also introduces new filtering options, powered by machine learning and visual matching algorithms, to help users find specific clothing items based on color, style, and pattern preferences.
Levi's previously announced using AI-generated models for online shopping but later clarified its stance after facing backlash, whereas Google's models are real people with AI used to shape the clothing around them.
Epic is releasing MetaHuman Animator, a tool that captures an actor's facial performance using an iPhone and applies it to a hyperrealistic "MetaHuman" character in the Unreal Engine. Link.
The tool emphasizes speed, with the final animation available in minutes, allowing studios to save money and be more creative by enabling quick experimentation and multiple takes.
MetaHuman Animator can apply facial animation to characters with just a few clicks and even animates a character's tongue based on audio performance.
The combination of iPhone capture and MetaHuman technology offers high-level detail and fidelity, supporting the use of existing vertical stereo head-mounted camera systems for even greater accuracy.
Epic's Blue Dot short film showcases the capabilities of the animation tool, featuring actor Radivoje Bukvić delivering a monologue with minimal post-capture interventions.
Instructional videos and documentation are available for developers interested in using MetaHuman Animator, accessible through the MetaHuman hub on the Epic Developer Community.
Apple made a strong statement about its focus on AI at its WWDC event, highlighting AI's role in its forthcoming hardware and software features. Link.
iOS 17 will have computer vision capabilities to suggest recipes based on iPhone photos and introduce an upgraded autocorrect powered by an AI model that learns a user's frequently used words.
Apple's Vision Pro augmented reality headset will use AI and machine learning to create virtual avatars of wearers, replicating facial contortions with accuracy.
The company aims to demonstrate its commitment to AI and bounce back from past underperformance in the field, including losing talented machine learning scientists to other companies.
Apple's move to ship products infused with AI showcases its seriousness about the technology and sets a benchmark for competitors.
The company's AI efforts were notable even if they weren't highly publicized during the event.
Meta has released I-JEPA, a machine learning model that learns abstract representations of the world through self-supervised learning on images. Link.
Initial tests show that I-JEPA performs well on computer vision tasks and is more efficient than other state-of-the-art models, requiring fewer computing resources for training.
Self-supervised learning is inspired by how humans and animals learn by observing the world, allowing AI systems to learn through raw observations without human-labeled data.
I-JEPA differs from other self-supervised models by predicting high-level abstractions rather than pixel-level details, making it both less error-prone and less costly.
The model is implemented using a vision transformer and a predictor ViT to generate semantic representations for missing parts of an image.
I-JEPA is memory and compute-efficient, requiring less fine-tuning and achieving strong performance on computer vision tasks with minimal training data, offering potential applications in robotics and self-driving cars.
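The distinguishing objective, predicting in representation space rather than pixel space, can be shown with a toy example: encode a visible context patch, predict the masked patch's representation from it, and score the prediction against the target's encoding. The "encoder" and "predictor" below are trivial stand-ins for I-JEPA's vision transformers, invented purely for illustration:

```python
def encode(patch: list[float]) -> list[float]:
    # Stand-in encoder: summarize a patch by its mean and value range.
    return [sum(patch) / len(patch), max(patch) - min(patch)]

def predict_target(context_repr: list[float], weights: list[float]) -> list[float]:
    # Stand-in predictor: element-wise scaling of the context representation.
    return [w * c for w, c in zip(weights, context_repr)]

def repr_loss(pred: list[float], target: list[float]) -> float:
    # Squared error measured in representation space, not pixel space.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

context_patch = [0.2, 0.4, 0.6, 0.8]   # visible region of the "image"
target_patch = [0.3, 0.5, 0.7, 0.9]    # masked region to predict

pred = predict_target(encode(context_patch), weights=[1.1, 1.0])
loss = repr_loss(pred, encode(target_patch))
print(round(loss, 4))
```

Because the loss compares two short summary vectors rather than every pixel, the model is never asked to reconstruct irrelevant low-level detail, which is the source of the efficiency gains described above.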
Synthesia, a startup using AI to create synthetic videos, raised $90 million in a Series C funding round led by Accel, with strategic investment from Nvidia and participation from other investors. Link.
The funding brings Synthesia's total raised to $156.6 million and values the company at $1 billion.
Synthesia has over 50,000 customers, experiencing a year-over-year user growth rate of 456% and generating over 15 million videos on its platform.
The company's AI technology allows users to create instructional videos with AI avatars, selecting an avatar, language, and inputting text to generate videos.
Synthesia's clients include Tiffany's, IHG, Teleperformance, Moody's Analytics, and entities of the United Nations.
It is used by 35% of the Fortune 100 for training and marketing purposes.
Concerns have been raised about the potential misuse of Synthesia's technology for creating deepfakes, but the company claims to vet customers, require consent, and suspend accounts violating its terms of service.
The investment from the Series C funding will be used to enhance Synthesia's avatars' expressiveness and improve the platform's speed and collaboration features.
SPEECH/AUDIO
Meta has unveiled Voicebox, a generative text-to-speech model that generates audio clips, similar to how GPT and Dall-E generate text and images respectively. Link.
Voicebox has been trained on over 50,000 hours of unfiltered audio, including recorded speech and transcripts from public domain audiobooks in multiple languages.
The system can generate more conversational speech across different languages and performs almost as well as models trained on real speech, with just a 1 percent error rate degradation.
Voicebox can infill speech based on context and generate portions in the middle of an audio recording without recreating the entire input.
Voicebox is capable of editing audio clips, removing noise and replacing misspoken words, similar to using image-editing software for photographs.
Meta's AI outperforms current state-of-the-art models in terms of intelligibility and audio similarity, and operates up to 20 times faster than existing text-to-speech systems. However, the app and source code are not publicly released due to potential risks of misuse.
The Beatles will release what Paul McCartney calls “the last Beatles record,” using AI to extract John Lennon’s voice from an old demo. Link.
The technology was used during the making of the documentary series "The Beatles: Get Back" by director Peter Jackson, who utilized AI to separate the voices of the Beatles from background sounds.
The specific demo, likely an unfinished 1978 love song called "Now and Then," was provided to McCartney by Yoko Ono and became the basis for the last Beatles record.
McCartney finds AI technology both scary and exciting, expressing curiosity about its future implications.
AI technology also enabled McCartney to virtually "duet" with Lennon at the Glastonbury Festival last year.
Holly Herndon, an artist who has worked with AI, explains that source separation, a process facilitated by machine learning, allows the extraction of a voice from a recording for further manipulation and accompaniment.
McCartney will soon open an exhibition featuring his previously unseen photographs taken during the early days of the Beatles, titled "Eyes of the Storm."
Meta has released its AI-powered music generator called MusicGen and open-sourced it. Link.
MusicGen can convert a text description into around 12 seconds of audio, optionally guided by reference audio or melody.
The tool was trained on 20,000 hours of music, including licensed music tracks and instrument-only tracks from stock media libraries.
MusicGen's output is reasonably melodic but not on par with professional musicians, comparable to Google's AI music generator, MusicLM.
Ethical and legal issues surround generative music as homemade tracks using AI to mimic authentic sounds have gone viral, prompting copyright concerns.
Lawsuits in progress will likely shape the future of music-generating AI, addressing artists' rights and the use of their work to train AI systems.
CODE/DEVTOOLS
Bito, a B2B startup, has launched Bito AI, an AI coding assistant powered by ChatGPT, and has raised $3.2 million in funding. Link.
Bito AI learns from a user's own codebase, keeping the information on the user's device for security and privacy. Natural language queries are routed to GPT-3.5 and GPT-4.
Developers can ask Bito AI to complete software development tasks in any of 25 supported natural languages and get results in 50 programming languages.
Bito AI offers features such as generating unit tests, explaining code, improving performance, checking for security issues, and providing insights into technical concepts.
The platform seamlessly integrates into a developer's coding environment, eliminating the need to toggle back and forth to a web page for results.
Bito AI is currently in its Alpha release, free to use, and compatible with Visual Studio Code, JetBrains IDEs, and the CLI. Developers report a 31 percent increase in productivity and significant time saved on routine tasks.
HARDWARE
AMD, known for its comeback in the chip industry, has unveiled an AI chip called the MI300X that aims to challenge Nvidia's dominance in AI hardware and software. Link.
The AI market is projected to reach $800 billion over the next decade, creating a significant opportunity for companies like AMD.
The MI300X is AMD's most advanced GPU and will begin shipping in Q3 2023, with mass production starting in Q4.
Amazon Web Services is considering partnering with AMD as a supplier, potentially giving AMD a flagship client and a strong foothold against Nvidia.
Nvidia currently holds 80% of the market share for AI chips and recently crossed a $1 trillion valuation.
The MI300X is expected to compete with Nvidia's Grace Hopper Superchip and attract interest from major clients like Microsoft. AI chips excel at training large-language models and processing vast amounts of data.
Hugging Face has announced a partnership with AMD as part of their Hardware Partner Program. Link.
The collaboration aims to deliver state-of-the-art transformer performance on AMD CPUs and GPUs.
The partnership will provide the Hugging Face community with access to the latest AMD platforms for training and inference.
AMD and Hugging Face will optimize performance on platforms including Instinct MI2xx and MI3xx GPUs, Radeon Navi3x GPUs, Ryzen client CPUs, EPYC server CPUs, and the Alveo V70 AI accelerator.
The collaboration will support a range of architectures and frameworks, including transformer models for natural language processing, computer vision, and speech, as well as generative AI models and deep learning recommendation models.
Hugging Face will work closely with AMD to optimize key models and integrate the AMD ROCm SDK into their open-source libraries, starting with the transformers library.
HEALTHCARE
Google has introduced new capabilities for its visual search tool, Lens, to help parents determine the seriousness of skin conditions. Link.
Users can upload a photo of their or their child's skin ailment to Lens and search for visually similar conditions, providing guidance in identifying the issue.
Lens is not a diagnostic tool and should not be seen as a substitute for professional medical advice from a doctor.
The feature allows users to make more informed decisions about whether to consult a pediatrician or try simple remedies like applying calamine lotion.
In addition to the skin condition lookup, Google also announced that Lens will be integrated with its AI chatbot Bard, allowing users to include images in prompts for identification purposes.
These new Lens features are accessible on Android and iOS through the Google app.
POLICY/LAW/ETHICS
UK Prime Minister Rishi Sunak announced that OpenAI, Google DeepMind, and Anthropic have committed to providing early or priority access to their AI models for research into evaluation and safety. Link.
The UK government has allocated £100 million to an AI safety taskforce, dedicating more funding to AI safety than any other government.
The government aims to make the UK the intellectual and geographical home of global AI safety regulation and will host the first-ever Summit on global AI Safety.
This shift in approach towards AI safety comes after concerns about the risks posed by AI technology and meetings between the Prime Minister and CEOs of AI companies.
The involvement of AI giants in publicly funded research raises concerns about industry capture and the shaping of future AI rules.
The government should ensure the involvement of independent researchers, civil society groups, and those at risk of harm from automation to produce robust and credible AI safety efforts.
The European Union (EU) has taken a significant step toward setting the world's first rules on the use of AI. Link.
The EU AI Act, once approved, will apply to companies developing and deploying AI systems in the EU, regardless of their location.
The Act categorizes AI applications into high-risk, low-risk, and prohibited categories based on their potential harm or risks.
Prohibited AI systems include real-time facial recognition in public spaces, predictive policing tools, and social scoring systems.
High-risk AI applications, such as those used to influence voters or recommend content on social media platforms, face tight restrictions and transparency requirements.
Violations of the regulations can result in hefty fines, with penalties of up to €40 million or 7% of a company's worldwide annual turnover, whichever is higher.
The Singapore Police Force has introduced two robots to patrol Changi Airport, with plans to deploy more across the city-state in the future. Link.
The robots serve as an additional police presence and have 360-degree vision; they stand 1.7 meters tall and can extend to 2.3 meters.
They can enforce cordons, warn bystanders using blinkers, sirens, and speakers, and allow the public to communicate with the police by pushing a button on the robots.
The robots have in-built speakers and LCD panels for broadcasting audio and visual messages.
The Singapore Police Force aims to enhance operational efficiency and capabilities with the integration of robotics.
Singapore has previously deployed robots for civic duties, including enforcing social distancing during the pandemic and cleaning metro stations, while also exploring technologies like flying taxis.
OpenAI CEO Sam Altman called for enhanced collaboration between the U.S. and China on AI development and safety during a conference hosted by the Beijing Academy of Artificial Intelligence. Link.
Altman has been actively advocating for AI regulation and has met with policymakers around the world to influence the development of AI regulations.
He signed a statement emphasizing the need to mitigate the risks of AI, positioning OpenAI as supportive of responsible tech.
Altman's appeals to China may be related to Beijing's serious approach to AI regulation, which poses less immediate threats to OpenAI's business interests compared to Chinese companies.
Altman's support for regulation is strategic, and he hopes that the United States will take a relatively laissez-faire approach to global AI standards to benefit OpenAI.
The Biden Administration is focused on cooperating with U.S. allies, while the EU has pursued more aggressive AI regulation. The U.S. faces debates on the extent of regulation, which may pit China hawks against big-tech hawks.
Stolen OpenAI API tokens are being advertised on the Discord server of the r/ChatGPT subreddit, allowing unauthorized access to OpenAI's language models like GPT-4 and running up charges on the stolen accounts. Link.
One valuable OpenAI account with a limit of $150,000 worth of usage has been stolen and is being offered for free to other members through a website and a dedicated Discord server.
The stolen API keys were obtained by scraping a coding project collaboration website called Replit, where users unintentionally included their OpenAI API keys in publicly accessible code.
The pirate responsible, known as Discodtehe, has been increasing their usage of the stolen API key and has shared screenshots of the account usage.
Discodtehe also created a website where users can request free access to the OpenAI API by setting their account's default organization to one tied to a stolen account.
OpenAI conducts automated scans of repositories to revoke any discovered API keys and advises users to rotate their keys if they suspect exposure. Users have also expressed concerns about OpenAI's authentication process and called for better security measures.
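The automated repository scans mentioned above amount to searching source text for key-shaped strings. A simplified illustration using a regular expression (the "sk-" prefix and minimum length reflect OpenAI's key format at the time of writing; real scanners, including OpenAI's own, are more sophisticated, and key formats can change):

```python
import re

# Illustrative pattern for OpenAI-style API keys accidentally committed
# to public code: the "sk-" prefix followed by a long alphanumeric tail.
KEY_PATTERN = re.compile(r"sk-[A-Za-z0-9]{20,}")

def find_exposed_keys(source: str) -> list[str]:
    """Return all key-shaped strings found in a blob of source code."""
    return KEY_PATTERN.findall(source)

snippet = '''
import openai
openai.api_key = "sk-EXAMPLEabcdefghijklmnopqrstuv"  # oops: hard-coded
'''
print(find_exposed_keys(snippet))
```

Scans like this are why hard-coding a key anywhere public, including Replit projects, is effectively equivalent to publishing it; secrets belong in environment variables or a secrets manager.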
OTHER
OctoML has launched OctoAI, a self-optimizing compute service for AI that focuses on helping businesses use existing open-source models and fine-tune them with their own data or host their custom models. Link.
The platform abstracts away the complexities of ML infrastructure by automatically choosing the right hardware based on user priorities, optimizing models, and selecting the appropriate hardware (Nvidia GPUs or AWS Inferentia machines) for running the models.
OctoAI aims to simplify the process of putting ML models into production, which is often a challenge for many projects.
Users can set their own parameters and hardware preferences if they desire full control, but the CEO believes most users will opt to let OctoAI manage these aspects.
OctoML provides accelerated versions of popular foundation models, such as Dolly 2, Whisper, FILM, FLAN-UL2, and Stable Diffusion, which run faster and more cost-effectively compared to the vanilla models.
While OctoML will continue working with customers who only need model optimization, the company's primary focus moving forward will be on the OctoAI compute platform.
Versed, a European startup whose AI platform lets anyone create a role-playing game (RPG) by writing text-based stories and instructions, has raised €1.6 million ($1.7 million) in a pre-seed funding round led by Google's Gradient Ventures. Link.
Versed's AI interprets the narrative and assigns characters and locations from its in-house database to create immersive RPG worlds without requiring coding skills.
Versed uses a combination of language models and its own algorithms to match assets and build worlds, incorporating the writer's descriptions of characters and locations.
The platform leverages the Unity game engine for gameplay, with a focus on making game development accessible to storytellers without technical or design expertise.
Versed plans to offer a subscription-based model where gamers can subscribe to individual creators or types of adventures, with Versed taking a cut, to support creators and ensure ongoing content availability.
Gensyn AI, founded in 2020, has raised $43 million in Series A funding led by a16z crypto to develop a decentralized machine learning compute protocol. Link.
The aim of Gensyn is to connect all machine learning capable compute hardware globally and make it accessible to engineers, researchers, and academics on a pay-as-you-go model.
Gensyn says it has solved the problem of verifying that machine learning training tasks were properly executed on remote devices, which sets it apart.
Gensyn is built on a layer one proof-of-stake blockchain based on the Substrate protocol, offering scalability and low verification overhead.
The funding round included participation from CoinFund, Canonical Crypto, Protocol Labs, Eden Block, and several angel investors.
The funds raised will be used to expand the Gensyn AI team, cover production costs, and launch a test network later this year to further develop the decentralized trust layer for machine learning.