AI Weekly 06/26/23
Stability, Meta, MosaicML and more drop new models, Dropbox launches a new AI fund, and OpenAI is considering creating an App Store for GPT-powered apps
Good morning and welcome to this week’s edition of AI Weekly! This week’s news features a slate of new models from some of the biggest players at the foundation layer of the tech stack. MosaicML released a new version of its open-source language model, which it claims to have trained at a fraction of the cost of competitors’ models.
Also, Stability AI released their most recent version of Stable Diffusion, which knows how to properly generate images of hands.
Meta has also doubled down on its entrance into the foundation model game with its new model, Voicebox, that generates speech from text and is able to perform tasks like editing, noise removal, and style transfer.
In non-foundation model news, Dropbox launched a $50 million AI-dedicated venture fund, ElevenLabs officially announced their $19 million Series A round, and OpenAI is reportedly considering creating a marketplace for AI models, similar to an AI ‘app store.’ Continue reading about more AI news below!
- ZG
Here are the most important stories of the week:
TEXT
Inflection, an AI startup focused on creating personalized AI, has unveiled its Pi conversational agent powered by the Inflection-1 language model. Link.
Inflection-1 is a large language model comparable to GPT-3.5 in terms of size and capabilities.
The company claims that Inflection-1 performs competitively with or surpasses other models in its tier, citing benchmarks against GPT-3.5, LLaMA, Chinchilla, and PaLM-540B.
Published results indicate that Inflection-1 excels in tasks like middle- and high-school level exams and common sense benchmarks but falls behind in coding compared to GPT-3.5 and GPT-4.
Inflection plans to release results for a larger model comparable to GPT-4 and PaLM-2(L) in the future.
The AI community has yet to establish formal divisions for AI models akin to boxing weight classes, and there is still no consensus on model sizes and capabilities. Independent evaluation and wider use of Inflection's model will be needed to validate its claimed benchmarks.
MosaicML, an AI startup based in San Francisco, has announced the release of its language model, MPT-30B, which has been trained at a fraction of the cost of competitors' models. Link.
MPT-30B offers improved capabilities for summarization and reasoning over large amounts of data, making it attractive for enterprise applications such as dialog systems and text summarization.
MosaicML utilized optimization techniques such as ALiBi and FlashAttention and had access to Nvidia H100 GPUs, resulting in faster training times and higher GPU utilization.
The company aims to democratize advanced AI technology by making it more accessible and transparent in terms of cost, time, and difficulty.
MosaicML allows businesses to train models on their own data using the company's model architectures and deploy them through its inference API, enabling enterprises to build custom models at a lower cost.
The availability of MPT-30B as an open-source model, along with MosaicML's model tuning and deployment services, positions the startup as a competitor to OpenAI in the market for large language models.
MosaicML's vision for the future of AI involves creating tools that assist experts in various industries, enhancing data quality, and empowering users to build more effective AI models.
By offering a more affordable and powerful option, MosaicML's MPT-30B has the potential to drive a new era of accessible and impactful AI solutions for enterprises.
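The ALiBi technique mentioned above replaces positional embeddings with a per-head linear penalty on attention scores, which is part of how MPT models handle long inputs. A minimal sketch of the idea (the slopes and bias shape here follow the ALiBi paper in general form, not MosaicML's specific implementation):

```python
def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes, as in the ALiBi paper,
    # assuming n_heads is a power of two.
    start = 2 ** (-8 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Bias added to causal attention scores: 0 for the current position,
    # increasingly negative for more distant (earlier) key positions,
    # and -inf for future positions to preserve causal masking.
    return [[-slope * (q - k) if k <= q else float("-inf")
             for k in range(seq_len)]
            for q in range(seq_len)]
```

Because the penalty simply grows with distance, models trained this way can extrapolate to sequences longer than those seen during training, which is relevant to the long-document summarization use cases described above.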
Observe.ai has introduced a 30-billion-parameter contact center LLM, along with a generative AI suite aimed at enhancing agent performance. Link.
The company highlights the calibration and control features of its LLM, allowing users to fine-tune and customize the model according to specific contact center requirements.
Observe.ai's LLM is trained on real-world contact center interactions, providing customized AI-based tasks such as call summarization, automated QA, and coaching.
The generative AI suite aims to improve agent performance in customer interactions across phone calls, chats, queries, and complaints.
The contact center LLM has shown superior performance compared to GPT-3.5, with a 35% boost in conversation summarization accuracy and a 33% improvement in sentiment analysis.
Observe.ai prioritizes customer data privacy by training its LLM exclusively on redacted data, ensuring the absence of personally identifiable information (PII) and implementing stringent data protocols.
Dropbox has introduced AI-powered products, Dropbox Dash and Dropbox AI, to enhance knowledge work, improve productivity, and provide a personalized work experience. Link.
Dropbox Dash is a universal search tool that enables users to quickly locate information across all tools, content, and apps with a single search bar.
Dash integrates with major platforms like Google Workspace, Microsoft Outlook, Salesforce, and Notion, organizing all content in one platform and offering a personalized experience.
Dropbox AI provides quick access to information within file previews, generating summaries from documents and video previews. It also features an "Ask Questions" capability to extract information from lengthy Dropbox documents and videos.
The company plans to expand the capabilities of Dropbox AI to include folders and entire Dropbox accounts in the future.
Dropbox aims to address the challenges of modern work environments by providing AI-powered tools that alleviate overwhelm, improve content search and organization, and provide valuable insights to users.
IMAGE/VIDEO
Stability AI has announced SDXL 0.9, an advanced version of its Stable Diffusion text-to-image model, offering improved image and composition detail. Link.
SDXL 0.9 can be accessed through ClipDrop with API access coming soon, and research weights are available with an open release planned for mid-July.
The model allows for creative use cases in generative AI imagery, enabling the generation of hyper-realistic creations for various industries such as film, television, music, design, and industrial applications.
Examples demonstrate the progress made by SDXL 0.9 compared to the beta version in terms of generating aesthetic and realistic images based on prompts.
The SDXL series offers additional functionalities like image-to-image prompting, inpainting, and outpainting, expanding its capabilities beyond basic text prompting.
SDXL 0.9 boasts one of the largest parameter counts among open source image models, with a 3.5B parameter base model and a 6.6B parameter model ensemble pipeline, resulting in improved composition and processing power.
Otter.ai has introduced Otter AI Chat, an AI chatbot designed for work meetings, which transcribes and condenses meeting data into a conversation-like format. Link.
OtterPilot, the collaborative chatbot, generates content such as blog posts and follow-up emails based on meeting data, facilitating collaboration among team members.
Unlike platforms like ChatGPT, Otter AI Chat sources information directly from team meetings, providing more accurate and relevant insights.
The chatbot can communicate simultaneously with all team members or engage in one-on-one conversations.
Otter.ai is known for its transcription services and other features catering to remote workers, including summarizing meeting contents and creating action item lists.
Otter AI systems transcribe a significant amount of text daily, and the AI Chat feature will be available to all users soon, with a focus on data privacy and no storage of information by third parties.
SPEECH/AUDIO
Meta Platforms' AI research arm introduced Voicebox, a machine learning model that can generate speech from text and perform tasks like editing, noise removal, and style transfer. Link.
Voicebox is trained using Meta's "flow matching" technique, which allows it to learn from varied speech data without requiring careful labeling.
The model is trained to predict a segment of speech given surrounding audio and the complete text transcript, enabling it to generate natural-sounding speech from text in a generalizable way.
Voicebox can perform tasks it has not been specifically trained for, such as generating speech for new text, replicating voices across languages, and editing out mistakes in speech.
The model has limitations in transferring to conversational speech and providing full control over attributes like voice style and tone.
Due to concerns about potential misuse, Meta has not released the Voicebox model but shared technical details in a paper and included a classifier model to detect speech and audio generated by Voicebox for risk mitigation.
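Flow matching, the technique Voicebox is trained with, learns a velocity field that transports noise samples to data samples along simple paths. A toy, scalar-valued sketch of the conditional flow matching objective (this illustrates the general recipe from the flow matching literature, not Meta's audio-specific setup):

```python
import random

random.seed(0)

def flow_matching_loss(model, x1_batch):
    # Conditional flow matching, simplified to scalars: interpolate between
    # a noise sample x0 and a data sample x1 along a straight path, and
    # regress the model's predicted velocity onto the constant target (x1 - x0).
    total = 0.0
    for x1 in x1_batch:
        x0 = random.gauss(0.0, 1.0)      # noise sample
        t = random.random()              # random time in [0, 1]
        xt = (1.0 - t) * x0 + t * x1     # point on the noise-to-data path
        target = x1 - x0                 # velocity of the straight path
        total += (model(xt, t) - target) ** 2
    return total / len(x1_batch)

# A toy model that always predicts zero velocity, just to exercise the loss.
loss = flow_matching_loss(lambda xt, t: 0.0, [0.3, -1.2, 0.8, 2.0])
```

The appeal for speech is that this objective needs only (audio, transcript) pairs rather than carefully labeled data, which matches the "varied speech data without careful labeling" point above.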
ElevenLabs, a startup specializing in synthetic voices generated by AI, has raised $19 million in a Series A funding round led by Nat Friedman, Daniel Gross, and Andreessen Horowitz. Link.
The funding round values ElevenLabs at $99 million post-money, a significant figure for a startup that launched just over a year ago.
The company's AI text-to-speech models can generate speech using synthetic voices, cloned voices, or entirely novel artificial voices that mimic different genders, ages, and ethnicities.
ElevenLabs is launching a workflow called Projects, which allows users to edit and create long-form spoken content within the platform.
The startup's technology has applications in scalable and multilingual audiobook creation, voicing video game characters, generating voiceovers for digital articles, supporting accessibility for visually impaired individuals, and powering AI radio.
ElevenLabs faced negative publicity when its tool was used for malicious purposes, but the company has introduced safeguards and an AI Speech Classifier to detect AI-generated content.
YouTube is testing a new tool in collaboration with AI-powered dubbing service Aloud to help creators automatically dub their videos into other languages. Link.
The tool builds upon YouTube's support for multi-language audio tracks, which allows creators to add dubbing to their videos and reach a wider international audience.
Previously, creators had to partner with third-party dubbing providers, but Aloud allows them to dub videos at no additional cost.
Aloud transcribes and translates videos, generates dubbed versions, and allows creators to review and edit the transcription.
The tool is currently being tested with hundreds of creators and will be opened to all creators soon.
YouTube aims to make translated audio tracks sound like the creator's voice, with more expression and lip sync, and plans to introduce features such as voice preservation, better emotion transfer, and lip reanimation using generative AI.
Parrot, a transcription platform for the legal and insurance industry, has raised $11 million in a Series A funding round co-led by Amplify Partners and XYZ Venture Capital. Link.
The round brings Parrot's total raised to $14 million.
The company offers speech-to-text depositions and has introduced a new feature that provides deposition summaries in seconds.
Parrot was founded in 2019 by attorney Eric Baum and a team of engineers with AI and speech-to-text transcription expertise.
The platform aims to streamline the deposition process using large language models (LLMs) and bring technology to an underserved legal industry.
Parrot plans to invest in AI for the legal and insurance domains and develop tools to address industry challenges such as booking depositions and accessing accurate transcripts.
CODE/DEVTOOLS
Prophecy, a data engineering startup, has introduced data copilot, a generative AI assistant that can create trusted data pipelines from natural language prompts, saving time for data engineers. Link.
The data copilot tool uses natural language queries to suggest pipelines that bring data together for desired reports, allowing users to preview and accept or decline the suggested pipelines.
The tool aims to reduce the bottleneck on data engineering resources and improve the consistency and quality of data products.
Prophecy creates a comprehensive knowledge graph of a company's data models, incorporating technical metadata, business metadata, and historical queries and code.
The knowledge graph is used by a large language model to translate natural language queries into performant data pipelines, with the system continuously learning and improving based on user feedback.
Prophecy also introduced a platform for building generative AI solutions on top of privately owned enterprise data, enabling the creation of chatbots backed by OpenAI models that answer questions based on internal documents and context.
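To make the knowledge-graph grounding concrete, here is a hypothetical sketch of how table and column metadata might be serialized into an LLM prompt so the model can map a natural language request onto real schema names; the function and field names are illustrative, not Prophecy's actual API:

```python
def build_pipeline_prompt(query, knowledge_graph):
    # knowledge_graph: {table_name: [column, ...]} — a simplified stand-in
    # for the technical/business metadata a real system would carry.
    tables = "\n".join(
        f"- {name}: columns {', '.join(cols)}"
        for name, cols in sorted(knowledge_graph.items())
    )
    return (
        "You are a data-pipeline assistant.\n"
        f"Available tables:\n{tables}\n"
        f"Request: {query}\n"
        "Respond with a pipeline definition using only the tables above."
    )

kg = {"orders": ["order_id", "customer_id", "total"],
      "customers": ["customer_id", "region"]}
prompt = build_pipeline_prompt("total sales by region", kg)
```

Feeding back which suggested pipelines users accept or decline is how such a system would "continuously learn" in the sense described above.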
Harness, a developer toolkit company, has announced the release of its AI Development Assistant (AIDA), a generative AI assistant designed to improve developer productivity. Link.
AIDA is aimed at optimizing stages across the software development lifecycle: writing, building, and testing code; ensuring security and reliability; deploying and verifying changes; and managing costs.
The AI assistant offers automatic resolution of build and deployment failures, helping developers identify and fix issues that arise from changes made in the development process.
It also assists in finding and fixing security vulnerabilities, with developers having the final say in implementing the fixes.
AIDA utilizes natural language processing to help control cloud costs by providing suggestions on how to find savings.
AIDA is meant to speed up the development process rather than replace developers, with the goal of making them 30%-50% more efficient in their work.
ROBOTICS
Engineers at Carnegie Mellon University (CMU) have developed a model that allows robots to learn new skills by watching videos of humans performing tasks. Link.
The Vision-Robotics Bridge (VRB) method enables robots to perform household tasks like opening drawers and picking up objects after observing the actions in a video.
The VRB model requires no human oversight and can teach new skills to robots in just 25 minutes.
The model identifies contact points and understands the motions required to complete a task, allowing the robot to perform similar actions in different environments.
The research demonstrates that robots can learn from internet and YouTube videos, expanding their knowledge and capabilities.
The robots involved in the study successfully learned 12 new tasks during real-world tests, and the researchers plan to further develop the VRB system for multi-step tasks.
DeepMind has developed an AI model called RoboCat that can perform various tasks across different models of robotic arms. Link.
DeepMind claims RoboCat is the first model to solve and adapt to multiple tasks using different real-world robots.
The model was trained on a combination of image and action data collected from robots in simulation and in the real world.
Researchers collected demonstrations of tasks using a robotic arm controlled by a human and fine-tuned RoboCat on the task to create specialized models.
RoboCat was trained on a total of 253 tasks and benchmarked on 141 variations, showing varied success rates ranging from 13% to 99%.
DeepMind aims to reduce the number of demonstrations needed to teach RoboCat new tasks to fewer than 10 in future developments.
CHIPS
Cisco Systems has launched networking chips for AI supercomputers to compete with Broadcom and Marvell Technology. Link.
The chips from Cisco's SiliconOne series are being tested by major cloud providers, including Amazon Web Services, Microsoft Azure, and Google Cloud.
The speed of communication between individual chips has become crucial with the increasing popularity of AI applications like ChatGPT.
Cisco's latest Ethernet switches, the G200 and G202, offer double the performance of the previous generation and can connect up to 32,000 GPUs.
The new chips aim to enable more efficient AI and machine learning tasks, requiring fewer switches and reducing lag.
In April, Broadcom introduced the Jericho3-AI chip, capable of connecting up to 32,000 GPU chips together.
POLICY/LAW/ETHICS
Senate Majority Leader Chuck Schumer has stated that lawmakers will need to start from scratch in figuring out how to regulate the new wave of artificial intelligence in the U.S. Link.
Schumer acknowledged the complexity of AI and the lack of historical precedent for Congress to work off of in addressing its regulation.
Despite the challenges, Schumer expressed confidence that Congress is capable of addressing AI usage and mentioned his plans to reveal the framework of the SAFE Innovation Act, a bill aimed at protecting and harnessing the potential of AI.
The interest in AI regulation among lawmakers coincides with the introduction of AI-powered services by various companies, such as Microsoft's Teams Premium, powered by OpenAI's ChatGPT.
President Biden also recently met with AI experts and researchers in San Francisco to discuss managing the risks associated with the new technology.
Hugging Face CEO Clement Delangue testified at a U.S. House Science Committee hearing, emphasizing the importance of open science and open-source AI for incentivizing innovation and aligning with American values and interests. Link.
Delangue credited open-source technologies such as PyTorch, TensorFlow, Keras, transformers, and diffusers for powering AI progress and positioning the U.S. as a leading country in AI.
The testimony comes in the context of concerns raised about the potential misuse of open-source AI models, such as Meta's LLaMA, expressed in a letter from Senators to Mark Zuckerberg.
Hugging Face, a New York-based startup, has emerged as a hub for open-source code and models and has played a prominent role in the open-source AI community.
Delangue highlighted that open science and open source support the development of AI startups and enable civil society, nonprofits, academia, and policymakers to counterbalance the power of large private companies.
Hugging Face's approach to ethical openness involves institutional policies, technical safeguards, and community incentives to ensure accountability, mitigate biases, reduce misinformation, and promote the involvement of all stakeholders in the value creation process.
The UK government is allocating £21 million to National Health Service (NHS) trusts to accelerate the deployment of promising AI tools in hospitals. Link.
The NHS has faced controversy in the past regarding its data-sharing partnership with Google's DeepMind.
The UK aims to position itself as a leader in AI development, offering guidelines and funding for AI projects.
The AI Diagnostic Fund has been launched to bring AI imaging and decision-support tools to diagnose and treat heart conditions, cancer, and strokes.
The government aims to deploy AI tools across all NHS stroke networks by the end of the year and utilize AI to analyze chest x-rays for early lung cancer detection.
NHS Trusts can apply for funding for any AI diagnostic tool, subject to justification based on value-for-money.
OTHER
AWS is launching a $100 million Generative AI Innovation Center to advance its AI accelerator efforts. Link.
The center aims to diversify interest in generative AI beyond the dominant ChatGPT and expand AWS's market share in cloud computing for generative AI services.
The center will serve as an AI sandbox and tutoring service, connecting AWS AI experts with customers to guide them in building and deploying custom generative AI products and services.
AWS encourages customers to start by cleaning up their data and then collaborate with AWS to brainstorm and create prototypes.
The expected use cases for generative AI include enhancing customer experiences, optimizing business operations, and increasing creative production.
AWS emphasizes the importance of cloud computing for training on large-scale data, arguing that on-premises data centers and server farms are less effective for this purpose.
Google testers can now utilize a new Duet AI feature in Google Workspace, specifically in Google Sheets. Link.
The feature allows users to describe their desired actions, and Duet AI creates custom templates to facilitate those actions.
This feature is particularly useful for tasks involving complex organization and tracking, such as product roadmaps, company retreats, and team budgets.
If the feature functions as intended, it has the potential to significantly save users time.
The feature is currently available in Workspace Labs.
Google has been expanding Duet AI's capabilities across various Workspace tools, including Docs, Gmail, and Slides, with plans to introduce more generative AI features in the future. Additionally, Bard AI can now export data to Google Sheets for organization and modification.
Parallel Domain, a San Francisco-based startup, has launched an API called Data Lab that allows customers to generate synthetic datasets using generative AI. Link.
The API gives machine-learning engineers control over dynamic virtual worlds, allowing them to simulate various scenarios.
Engineers can generate objects not available in the startup's asset library and layer real-world randomness on top of the 3D simulation.
The goal is to provide autonomy, drone, and robotics companies with more control and efficiency in building large datasets for training models.
Data Lab enables customers to create new datasets in near real-time, compared to weeks or months it took previously.
The API has the potential to accelerate the development of autonomous driving systems and other industries where computer vision is used.
Dropbox has launched Dropbox Ventures, a $50 million venture fund focused on AI startups. Link.
The fund aims to support startups that are developing AI-powered products to shape the future of work.
VC investments in AI have been increasing, with AI startups receiving over $52 billion in funding in the last year alone.
Dropbox has announced new AI-powered features for its cloud storage product, including Dropbox Dash, a universal search bar that can search across various tools and platforms.
Another feature, Dropbox AI, summarizes and extracts information from files stored in a Dropbox account using OpenAI's model.
Dropbox emphasizes its commitment to building fair, reliable, and privacy-conscious AI technologies.
OpenAI is reportedly considering creating a marketplace for AI models, similar to an AI 'app store'. Link.
The marketplace would enable companies to buy and sell customized chatbots tailored to specific needs.
The goal is to make advanced chatbots more useful across industries, allowing businesses to provide quick answers to industry-specific questions using current data.
OpenAI's marketplace could compete with its current partners like Salesforce and Microsoft, who have their own stores for selling AI chatbots built on OpenAI's technology.
Aquant and Khan Academy have already shown interest in OpenAI's marketplace, as they have developed customized versions of ChatGPT for their respective purposes.
OpenAI launched ChatGPT plugins earlier this year, but they have not gained much popularity, indicating the need for further product development and market research before launching a full-scale marketplace.
Opera has launched Opera One, a new version of its browser that includes an AI-powered chatbot called Aria. Link.
Aria lives within the browser's sidebar and can answer questions, generate text or code, brainstorm ideas, and more.
The chatbot is powered by Opera's Composer AI engine and connects to OpenAI's GPT model.
Users need to sign up for an Opera account to use the tool and can access Aria by clicking the Aria icon on the left side of the screen.
Aria can be used through a command line-like overlay or by highlighting text on a webpage for translation, explanation, or finding related topics.
While Aria offers similar functionalities to the Bing chatbot on Microsoft Edge, it lacks some features like a conversation style menu and one-click options for generating text.