AI Weekly: 09/25/23

OpenAI releases DALL-E 3, YouTube supercharges its video creation tools, and Google's Bard allows us to generate insights from our emails and documents.

Good morning and welcome to this week’s edition of AI Weekly! In this week’s news, OpenAI has released its next-generation text-to-image model, DALL-E 3, and YouTube introduced a range of new AI-powered tools aimed at assisting creators in video production.

In Google news, the company's chatbot, Bard, now integrates with Gmail, Docs, and Drive, enabling users to summarize emails, highlight key points in documents, or even turn that information into charts or bulleted summaries.

In devtool news, GitHub has extended the availability of its AI-powered Copilot Chat tool to individual users in Visual Studio and Visual Studio Code, following its public beta launch for business users in July.

Continue reading below for more of last week’s exciting AI news!

- ZG

Here are the most important stories of the week:

TEXT

More information has been released about Google’s next-generation multimodal AI model, Gemini, whose development is spearheaded by a team from the merged AI divisions of DeepMind and Google Brain. Link.

  • Touted as a notable advancement in natural language processing, Gemini is anticipated to process both images and text, enhancing functionalities like analyzing visual graphs in text, and augmenting code-generating capabilities to compete with Microsoft's GitHub Copilot, powered by OpenAI.

  • The tech community has been closely monitoring the upcoming release, speculating on Gemini's potential to surpass the capabilities of GPT-4.

  • The model draws inspiration from AlphaGo, a historic AI by Google's DeepMind that defeated a professional human Go player, and aims to meld the technological strengths of AlphaGo with the extensive language capabilities of large models like ChatGPT.

  • Early user feedback suggests an advantage for Gemini over GPT-4, attributed to Google's extensive consumer product data and internet information, which might enhance the model's comprehension of user intents.

  • The testing phase has also hinted that Gemini may generate fewer incorrect responses, a known issue in AI referred to as hallucinations, an edge that Google's access to high-grade chips should further bolster.

Google's Bard AI chatbot now integrates with Gmail, Docs, and Drive, enabling users to extract and utilize information from these platforms efficiently by asking Bard to summarize emails, highlight key points in documents, or even transition that information into charts or bulleted summaries. Link.

  • Bard’s extended functionalities, which Google calls extensions, aim to alleviate the burden of manually sorting through emails or documents to find specific information.

  • Although integrating Bard with personal emails and documents raises privacy concerns, Google assures that the data accessed will neither be used to train Bard’s public model nor be viewed by human reviewers.

  • To utilize this feature, users can either direct Bard to search within Gmail by prefacing questions with @mail or simply ask Bard to check for specific information, enhancing user convenience and the seamless interaction between the chatbot and personal data.

  • Beyond Gmail, Docs, and Drive, Bard is also integrating with Maps, YouTube, and Google Flights, enabling users to obtain real-time flight information, discover nearby attractions, or find YouTube videos on specific topics, making Bard a more versatile and helpful tool.

  • Additional updates to Bard include a "Google It" button to cross-verify Bard’s responses through Google Search, and support for Google Lens and code generation/debugging capabilities, highlighting Google's ongoing efforts to enhance Bard's usefulness and accuracy.

Microsoft introduced "Copilot," a new AI-powered assistant for Windows PCs, designed to assist users with various computing tasks such as generating text, finding focus music, organizing windows on the screen, and aiding with creative endeavors around photos or video. Link.

  • This innovation aims to amalgamate Microsoft's existing AI tools across different applications into a "single experience," enhancing the coherence and utility of AI assistance for users.

  • Copilot, integrated with OpenAI's ChatGPT within Bing, represents a step towards natural language interaction where users can articulate their requests naturally, and the technology can respond, create, or act accordingly.

  • Set to be launched as a free update to Windows 11 starting September 26, Copilot illustrates a new phase in AI that alters how users interact with and benefit from technology, as described by Microsoft consumer chief marketing officer Yusuf Mehdi.

  • Copilot is engineered to assimilate context and intelligence from the web, work data, and user's current PC activity to offer refined assistance, with a strong emphasis on preserving user privacy and security.

  • Microsoft's announcement came alongside Amazon's significant update to Alexa for more natural conversation capabilities. At the same event, Microsoft also launched new hardware, including the Surface Laptop Studio 2, Surface Laptop Go 3, and Surface Go 4 tablet.

San Francisco-based AI startup Galileo has updated its LLMs Studio to help users understand and explain LLM output through new monitoring and metrics capabilities, shedding light on occurrences like AI model hallucinations. Link.

  • The updated studio enables real-time evaluation and observation of both the inputs to and the outputs from the LLMs, providing insights into the generation logic of model outputs and helping to optimize the models through new metrics and guardrails.

  • Galileo intercepts API calls going into the LLMs and those for generated output, offering near real-time data on model performance and output accuracy, aiming for a continuous improvement of LLM applications.

  • Galileo's update introduces guardrail metrics allowing users to set limitations on what the model can generate in terms of information, tone, and language to ensure compliance with regulatory standards, especially in sensitive sectors like finance and healthcare.

  • A new metric called "groundedness" is introduced to evaluate whether a model’s output is relevant or within the bounds of its training data, helping to detect when a model deviates from the expected contextual responses.

  • Galileo's advancements aim to reduce risks associated with inaccurate or inappropriate model outputs, providing a more transparent and controlled use of LLMs, and addressing industry challenges such as hallucinations and model grounding in real-time operational settings.
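
For readers who want a feel for the interception and groundedness ideas above, here is a rough Python sketch. It is not Galileo's implementation or API: `monitored_call`, `groundedness`, `CallRecord`, the word-overlap score, and the 0.5 threshold are all hypothetical names and choices made for illustration, and `fake_llm` stands in for a real model. The sketch also scores output against a supplied reference text, which is an assumption; the summary above describes the real metric only in general terms.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class CallRecord:
    """One intercepted model call: what went in, what came out, and a score."""
    prompt: str
    output: str
    groundedness: float
    flagged: bool


def _tokens(text: str) -> set:
    # Lowercased alphanumeric words; a deliberately crude tokenizer.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness(output: str, context: str) -> float:
    """Toy metric: share of distinct output words that also occur in the context."""
    out = _tokens(output)
    if not out:
        return 0.0
    return len(out & _tokens(context)) / len(out)


def monitored_call(llm: Callable[[str], str], prompt: str, context: str,
                   threshold: float = 0.5) -> CallRecord:
    """Wrap a model call so the prompt and the output are both captured and scored."""
    output = llm(prompt)
    score = groundedness(output, context)
    # Flag the call when too little of the output is supported by the context.
    return CallRecord(prompt, output, score, flagged=score < threshold)


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model endpoint.
    return "The refund window is 30 days"


context = "Our policy: the refund window is 30 days from delivery."
record = monitored_call(fake_llm, "How long is the refund window?", context)
print(record.groundedness, record.flagged)
```

A production system would likely use something far richer than word overlap, but the shape is the same: sit between the caller and the model, record both sides of the exchange, and attach a score that a guardrail can act on.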

IMAGE/VIDEO

OpenAI unveiled an enhanced version of DALL-E, dubbed DALL-E 3, integrated with ChatGPT to simplify the process of crafting prompts for image generation. Link.

  • DALL-E 3 uses ChatGPT to assist users in refining prompts for image requests, enabling a more interactive and precise input for image generation directly within the chat app.

  • The integration helps expand a short prompt to be more descriptive, guiding DALL-E 3 to produce better-aligned image outputs.

  • Besides ChatGPT integration, DALL-E 3 now generates higher-quality images, particularly with longer prompts, and handles challenging content like text and human hands more effectively.

  • New features in DALL-E 3 aim to address algorithmic bias and improve safety, such as rejecting requests that mimic the style of living artists or portray public figures, while allowing artists to opt out of having their artwork used in training future models.

  • The DALL-E 3 rollout to premium ChatGPT users is set for October, with subsequent access for research labs and API customers, keeping OpenAI competitive in the rapidly evolving generative AI and image-synthesizing domain.

YouTube introduced a range of new AI-powered tools aimed at assisting creators in video production and expanding their audience reach during their annual "Made On YouTube" product event. Link.

  • CEO Neal Mohan emphasized the potential of generative AI to simplify creative expression and the intent to make these powerful tools accessible to a broader user base.

  • Among the new features is "Dream Screen," an experimental tool for YouTube Shorts, allowing creators to add AI-generated video or image backgrounds to their vertical videos by simply typing an idea into a prompt.

  • Other AI tools unveiled include a brainstorming aid for video outlines, music search via descriptive phrases, and an AI-powered dubbing tool for language translation, facilitating global content sharing.

  • Noted creator Alan Chikin Chow expressed excitement for the AI dubbing feature, which could help him reach a wider international audience, also acknowledging the role of AI as a collaborative tool in enhancing creative work.

  • Amid the technological advancements, some industry experts expressed concerns regarding potential misuse like misinformation spread through deepfakes, intellectual property rights issues, and broader societal risks associated with powerful AI tools.

Capsule, a startup specializing in AI-powered video editing, is launching its enterprise-focused editor to the public, aiming to accelerate video production for content and marketing teams. Link.

  • The platform leverages AI to simplify video editing, offering an intuitive user interface and browser-based editing capabilities, eliminating the need for high-end computing resources.

  • Users can easily generate titles, captions, images, and motion graphics using AI, making video creation more accessible without formal editing experience.

  • CapsuleScript, a video scripting language developed by Capsule, powers the video editing process, incorporating layout, animation, dynamic expressions, and modular components.

  • After successful beta testing with over 160 companies, including HubSpot and Zapier, Capsule is launching into public beta.

  • The platform offers free access for individual business users with company email addresses, while enterprise pricing is competitive with other creative tools. Capsule has raised $7.75 million in funding to date from various investors.

SPEECH/AUDIO

Amazon unveiled a new Alexa voice assistant powered by its new Alexa LLM with improved conversational capabilities and contextual understanding. Link.

  • The new Alexa can respond to more conversational phrases and requires less specific commands, making it easier for users to interact with their smart home devices.

  • Amazon has funneled over 200 smart home APIs into the new LLM, enabling it to proactively manage and control various connected devices based on context.

  • This Alexa version can handle multiple requests within a single command, allowing users to execute complex actions with ease.

  • Developers can leverage Alexa's new cognitive functions and integrate their products and services into the more conversational format using tools like Dynamic Controller and Action Controller.

  • The new Alexa will launch first in a preview program in the US, with additional invitation-only previews for smart home features at a later date.

CODE/INFRA

GitHub has extended the availability of its AI-powered Copilot Chat tool to individual users in Visual Studio and Visual Studio Code, following its public beta launch for business users in July. Link.

  • Copilot Chat assists users in real-time as they write code, offering a platform to learn new languages or frameworks, troubleshoot bugs, and answer coding questions using simple, natural language interactions within the Integrated Development Environment (IDE).

  • This service is now accessible in public beta for GitHub Copilot individual users, enhancing their coding experience by providing on-the-go assistance.

  • Interested individuals can subscribe to GitHub Copilot's individual plan at a cost of $10 per month or $100 per year to access this feature.

  • The realm of coding assistance has seen a growing presence of AI chatbots, with tech giants like Google and Amazon launching similar tools aimed at aiding developers.

  • The introduction of Copilot Chat aligns with a broader industry trend leveraging AI to simplify and enhance the coding process, showcasing GitHub’s commitment to offering innovative solutions for both teams and individual developers.

HiddenLayer, a cybersecurity startup based in Austin, Texas, has secured a $50 million Series A funding round led by M12 (Microsoft's Venture Fund) and Moore Strategic Ventures, with participation from Booz Allen Ventures, IBM Ventures, Capital One Ventures, and Ten Eleven Ventures. Link.

  • The company focuses on safeguarding AI and machine learning (ML) models used by enterprises, with clients including Fortune 100 firms in sectors like finance, government, defense, and cybersecurity.

  • HiddenLayer's "MLSec" Platform passively monitors the performance and operations of enterprise ML/AI models in real-time, scanning for vulnerabilities, offering recommendations for hardening, and detecting malicious code or malware injections.

  • The platform provides a dashboard for security managers to assess the security state of their AI models, prioritizes security issues, and facilitates compliance, auditing, and reporting.

  • HiddenLayer offers consulting services by Adversarial Machine Learning (AML) experts, including threat assessments, training, and red team exercises to test and improve clients' defenses.

  • The company plans to expand its team by hiring 40 personnel by the end of the year and continue growing its client base after raising the Series A funding.

Anyscale, the lead commercial vendor behind the open-source Ray framework for distributed machine learning training and inference, has announced the general availability of Anyscale Endpoints, which enables organizations to fine-tune and deploy open-source LLMs easily. Link.

  • Anyscale has expanded its partnership with Nvidia to optimize Nvidia's software for inference and training on the Anyscale Platform.

  • The company highlights its success metrics, including Instacart's ability to train models up to 12 times faster with 100 times more data and Pinterest's 40% reduction in AI processing costs when training thousands of models using Ray.

  • Anyscale Endpoints is a service that provides API access to open-source LLMs without requiring organizations to deploy or manage the models on their own.

  • The company is enabling fine-tuning for open-source LLMs to help organizations customize models for improved performance and quality, especially for smaller, more cost-efficient models.

  • Anyscale is also launching Private Endpoints, allowing organizations to deploy Anyscale Endpoints within their own virtual private cloud (VPC), providing control over sensitive data and backend deployment customization, making it more efficient and cost-effective to work with LLMs.

Secoda, a Toronto-based AI-powered platform for data search, cataloging, lineage, and documentation, has raised $14 million in a Series A funding round, bringing its total funding to $16 million. Link.

  • The investment was led by existing investor Craft Ventures and included participation from Abstract Ventures, Y Combinator, Garage Capital, and notable data ecosystem leaders.

  • Secoda's platform aims to make it easy for enterprise users, regardless of their technical background, to search, understand, and use company data, with a user experience similar to searching on Google.

  • The company addresses the challenge of disjointed data in enterprise IT stacks, where data is siloed across various systems, making it difficult for employees to access and use relevant data.

  • Secoda integrates with business intelligence and transformation tools as well as data warehouses to create a unified data catalog. Users can write documentation and search the catalog using natural language queries, thanks to a ChatGPT-powered assistant.

  • With the new funding, Secoda plans to strengthen its engineering team, conduct further research and development, and introduce Secoda Monitoring to ensure data quality and accuracy, aiming to provide users with insights and operational efficiency tracking for their data teams.

POLICY/LAW/ETHICS

Anthropic has released a Responsible Scaling Policy (RSP) aimed at mitigating catastrophic risks associated with advanced AI models. Link.

  • The RSP highlights Anthropic's commitment to reducing risks linked to AI systems that could cause large-scale devastation, leading to "thousands of deaths or hundreds of billions of dollars in damage."

  • The policy introduces AI Safety Levels (ASLs), a risk tiering system ranging from ASL-0 (low risk) to ASL-3 (high risk) to reflect and manage potential AI risks.

  • Anthropic acknowledges that the policy is not static but will evolve as the company learns and gains feedback. The goal is to ensure responsible scaling of AI systems without reckless expansion.

  • The policy includes measures for independent oversight, requiring board approval for policy changes to prevent potential bias and uphold safety standards.

  • Anthropic's commitment to AI safety and ethics aligns with growing industry scrutiny and regulation, setting a high standard for ethical and safe AI development.

Some in the music industry are exploring legal avenues that protect against the unauthorized use of an artist's likeness, sidestepping copyright complexities and drawing on existing state-level publicity rules. Link.

  • The creative industries, including music, are increasingly affected by AI technologies, with examples such as AI-generated music, voice cloning, and deepfakes becoming prominent.

  • Copyright issues are at the forefront of AI regulation debates, with AI models using vast amounts of copyrighted data for training, leading to legal disputes in both music and other creative fields.

  • Likeness laws primarily focus on protecting a person's reputation and identity, including their voice, face, or name, and can provide a promising legal framework for artists concerned about AI imitating their work.

  • AI poses unique challenges under copyright law, as it generates work that may resemble an artist's style without directly copying any specific content.

  • The current state of US likeness law is fragmented, with only 14 states having specific statutes, making it challenging to establish a comprehensive legal framework for regulating AI-generated content.

OTHER

A study by researchers from the University of California-Irvine and MIT challenges the existing assumptions regarding the energy use of generative AI models, asserting that AI systems emit significantly less carbon dioxide equivalent (CO2e) than humans when generating text or images. Link.

  • The paper finds that producing a page of text via AI systems like ChatGPT emits 130 to 1,500 times less CO2e, and creating an image via AI systems like DALL-E 2 emits 310 to 2,900 times less CO2e, than human-driven processes.

  • The study underpins the potential of AI in performing several significant activities with considerably lower emissions, indicating a positive environmental impact.

  • However, this study sparked a debate among AI experts regarding the complexity of measuring interactions between climate, society, and technology, and the challenges that arise in accurately accounting for these factors.

  • Critics point out flaws in the study's methodology, particularly its direct comparison of human emissions with those of AI models, and argue that there's a lack of real-world data and transparency from tech companies on hardware usage, energy consumption, and energy sources, which are crucial for accurate carbon footprint estimates.

  • The authors emphasize the importance of a transparent, science-based approach to understand the environmental impact of AI and invite others to test their results, underscoring the broader discussion of sustainability in AI development and usage.
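
To put the study's multipliers in more familiar terms, the short Python sketch below converts "N times less" into AI emissions as a percentage of the human baseline. It assumes "N times less" means the human figure divided by N, which is our reading and not something the summary spells out; the function name `ai_share_of_human` is made up for this example, and the only inputs are the four multipliers quoted above.

```python
def ai_share_of_human(times_less: float) -> float:
    """Return AI emissions as a percentage of the human baseline.

    Assumes "N times less" means human_emissions / N.
    """
    if times_less <= 0:
        raise ValueError("multiplier must be positive")
    return 100.0 / times_less


# Multipliers as reported in the summary above: (low end, high end).
REPORTED = {
    "page of text": (130, 1500),
    "image": (310, 2900),
}

for task, (low, high) in REPORTED.items():
    # The larger multiplier gives the smaller share, so print it first.
    print(f"{task}: {ai_share_of_human(high):.3f}% to "
          f"{ai_share_of_human(low):.3f}% of the human figure")
```

Under that reading, the reported ranges put AI at well under 1% of the human figure for both tasks. The critics' point stands regardless: the arithmetic is only as good as the underlying estimates.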

Gizmo, a generative AI learning startup, secured $3.5 million in seed funding to globally scale its platform, hire additional engineers, and introduce new features, offering a gamified quiz-based learning experience generated from users' class notes. Link.

  • The AI can extract information for quiz generation from various formats like PDFs, PowerPoint presentations, web pages, or YouTube videos, with an additional feature to import notes from other flashcard tools like Quizlet and Anki.

  • Proprietary LLMs are utilized to transform user content into quizzes, and users can also request AI-generated notecard decks from scratch on specific topics which can be shared privately or publicly.

  • Gizmo employs learning science techniques like active recall and spaced repetition in its multiple-choice quizzes, aiming to enhance memory retention in a fun, engaging manner similar to how language learning app Duolingo uses daily streaks for user engagement.

  • A subscription model offers users additional benefits like unlimited lives and AI-generated quizzes for $8.80 per month or $52.80 per year.

  • With a user base of over 300,000 and growing at 50% month-over-month, Gizmo is set to roll out new features including on-platform note-taking, collaborative study sessions, and direct AI questioning.
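
As a quick sanity check on the subscription pricing mentioned above, this small Python sketch compares paying monthly for a year against the annual price. The helper name `annual_savings` is our own, and the only figures used are the prices quoted in this issue.

```python
def annual_savings(monthly: float, yearly: float) -> tuple:
    """Return (amount saved, percent saved) when paying yearly instead of monthly."""
    full_year_at_monthly_rate = monthly * 12
    saved = full_year_at_monthly_rate - yearly
    percent = 100 * saved / full_year_at_monthly_rate
    # Round to cents and to one decimal place for display.
    return round(saved, 2), round(percent, 1)


# Gizmo: $8.80 per month or $52.80 per year (from this issue).
print(annual_savings(8.80, 52.80))

# GitHub Copilot for individuals: $10 per month or $100 per year (from this issue).
print(annual_savings(10, 100))
```

On these numbers, Gizmo's annual plan works out to half the cost of twelve monthly payments, a much steeper discount than the roughly one-sixth saved on Copilot's annual plan.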

KYP.ai, a European productivity optimization software startup, has raised nearly $18.7 million in a Series A funding round led by OTB Ventures, with participation from existing investors 42CAP and Tola Capital. Link.

  • The company plans to use the funding to scale up in the U.S. and expand its customer base in Europe and Asia, helping clients adopt new generative AI models, apps, agents, and tools.

  • KYP.ai offers a secure web app management tool called "Productivity 360°," which helps automate various business tasks by analyzing data infrastructure, identifying inefficiencies, and suggesting automation solutions.

  • The platform also provides a "heatmap" to visualize repetitive tasks and opportunities for automation, resulting in average annual savings of $2.7 million and a 37% increase in automation across the client base.

  • KYP.ai assists businesses in adopting new generative AI tools and technologies, acting as a consultant and resource for clients looking to leverage Gen AI for efficiency gains.

  • Notable clients using KYP.ai's Productivity 360° include DHL, Mindsprint BPS, Hollard, Qinecsa, Allied Global, and Alorica, with the platform helping organizations adapt to hybrid and remote workforces and software processes.