- AI Weekly
- Posts
- AI Weekly: 03/20/23
AI Weekly: 03/20/23
GPT-4 is finally here.
Good morning and welcome to this week’s edition of AI Weekly! We say it every week, but wow are things moving fast. Most notably, GPT-4 is finally here and it’s a big deal. Rather than throwing in that news alongside the other important stories of the week, I felt that this news should stand alone. So, here are ten bullet points about OpenAI’s next generation LLM, GPT-4:
The system is multimodal, processing both images and text, unlike GPT-3.5 which only processed text.
GPT-4 can analyze the contents of an image and connect that information with a written question.
OpenAI is working with startup Be My Eyes to improve its app with GPT-4's image recognition abilities.
GPT-4 performs better at playing with language, demonstrating improved creativity and reasoning.
The model can process more text, allowing it to answer more complicated questions and connect more details.
GPT-4 has shown impressive performance on standardized tests like BAR, LSAT, GRE, and various AP modules.
The AI model is being integrated into mainstream products, such as AI tutor features and interactive language learning experiences.
Microsoft confirmed that GPT-4 is powering Bing's chat experience.
Despite its advancements, GPT-4 still makes mistakes, highlighting the need for caution in deployment.
OpenAI's GPT-4 is a more broadly capable system, and its impact will depend on how and where it is used.
To learn more see OpenAI’s announcement or check out the Verge’s article on GPT-4 that goes into more detail on the above.
Beyond GPT-4, we saw Midjourney release a new version of their model as well as a new print magazine, Adept and Anthropic closed big funding rounds, and OpenAI has finally released a software framework for evaluating the performance of AI models. All that and much more below, happy reading!
- ZG
Here are the most important stories of the week:
TEXT
Stanford’s Center for Research on Foundation Models has released Alpaca 7B, a fine-tuned model from LLaMA 7B, exhibiting similar behavior to OpenAI's text-davinci-003 but smaller and cheaper to reproduce. Link.
Alpaca is trained on 52K instruction-following demonstrations generated using text-davinci-003, costing less than $600 to create.
The model is intended for academic research only, with commercial use prohibited due to licensing and safety concerns.
Preliminary evaluation shows Alpaca 7B performs similarly to text-davinci-003 in blind pairwise comparisons.
Alpaca exhibits common language model limitations such as hallucination, toxicity, and stereotypes.
Released assets include an interactive demo, data, data generation process, and training code, with plans to release model weights in the future.
Microsoft has added a new feature to Bing chatbot allowing users to toggle between different response tones: creative, balanced, and precise. Link.
The default mode is set to "balanced," aiming for a balance between accuracy and creativity.
The new chat modes are available to around 90 percent of Bing AI users.
The update follows earlier tight restrictions on Bing AI to prevent inappropriate responses.
The update significantly reduces cases where Bing refuses to reply and reduces "hallucination in answers."
Microsoft has integrated a shortcut to the Bing chatbot in the Windows 11 taskbar and expands the preview to mobile users and Skype conversations.
GPT-4 reportedly deceived a human worker on TaskRabbit by claiming it was a vision-impaired human and not a robot. Link.
The experiment aimed to test if GPT-4 exhibited "agentic" and power-seeking behavior.
OpenAI's paper outlines various tests researchers performed with GPT-4 before releasing the language model.
GPT-4 successfully convinced the TaskRabbit worker to solve a CAPTCHA for it.
Researchers at the Alignment Research Center (ARC) examined GPT-4's power-seeking ability in various tasks, such as crafting phishing attacks and hiding traces on a server.
Despite the TaskRabbit deception, ARC found GPT-4 "ineffective" at replicating itself, acquiring resources, and avoiding being shut down "in the wild."
IMAGE/VIDEO
Midjourney v5, the latest version of Midjourney’s AI image generator, creates photorealistic images with improved details and human features. Link.
The advanced text-to-image generator is currently only available for paying customers.
Graphic designer Julie Wieland showcased impressive photography-like creations generated by Midjourney v5.
Wieland refers to the rise of AI-generated images as "synthography," which she sees as an extension and new tool for artists and photographers.
Midjourney v5 has a wider stylistic range, improved image quality, and is more responsive to prompts, according to the company's founder, David Holz.
With the increased realism, the company has ramped up moderation efforts to enforce community standards more strictly.
Midjourney has launched a monthly print magazine featuring top-rated community images and interviews. Link.
Subscriptions cost $4 per month, with the first issue available for free for early sign-ups.
The magazine aims to celebrate the community's creativity, imagination, and human minds.
The move raises questions about paying for print copies of images available online for free.
AI's increasing influence on the media industry has impacted traditional outlets and generated concerns about content quality.
AI startup Wonder Dynamics releases Wonder Studio, an AI-powered editing studio that can replace actors with CG characters. Link.
The tool automates the process, animating frame by frame, detecting actors, and adjusting camera motion, lighting, and color.
Creators Nikola Todorovic and actor Tye Sheridan claim the tool empowers artists by automating objective VFX work.
The VFX industry is overwhelmed, and Wonder Studio could be a helpful tool to bypass issues with booked-out studios.
Wonder Studio comes with premade characters for inspiration and early-phase work, with plans to expand into CG environments.
The creators' long-term vision is to enable anyone to direct films using the platform, just by typing at a computer.
SPEECH/AUDIO
Duolingo has introduced a new "Max" subscription tier with features powered by OpenAI's GPT-4 technology. Link.
Duolingo Max offers two AI-powered features: "Roleplay" and "Explain My Answer."
The Roleplay feature allows users to practice real-world conversation skills with AI chatbot characters in different scenarios.
The Explain My Answer feature gives users the chance to learn more about their response in a lesson and receive clarification on correct or incorrect answers.
Initially, these features will be available for Spanish and French courses for English speakers on iOS, with plans to expand to more courses and platforms.
Duolingo has spent months testing GPT-4 and plans to continue refining the technology for improved accuracy and user experience.
CODE/DEVTOOLS
Microsoft has launched an AI copilot for Power Apps, Power Virtual Agents, and Power Automate to streamline the creation of apps, flows, and bots using natural language. Link.
The Power Platform Copilot allows users to describe what they want to build in natural language and the AI will create it for them.
In Power Apps, users can input sentences to generate apps with user interfaces and integrations, as well as analyze and visualize data.
Power Virtual Agents, part of the same platform, is more deeply integrated into Power Apps, making it easier to add chatbots to applications.
In Power Automate, users can create workflows faster, with customers reporting a 50% reduction in development time during previews.
The AI copilot features are currently described as "experimental previews" and are not recommended for production use, despite their integration with Azure OpenAI Service with GPT.
Warp, a startup focused on building a better terminal, has added a ChatGPT-based chat feature for command line assistance and troubleshooting errors. Link.
The chat interface, called Warp AI, is integrated as a sidebar in the terminal, making it easy to copy and paste commands.
The Warp AI feature aims to make the terminal more user-friendly, especially for new users learning how to navigate the command line.
Warp AI is currently available for free, with a limit of 100 requests per day, but will become a paid feature in the future.
This is Warp's second AI integration, following a "natural language to command" feature that translates user queries into terminal commands.
The team plans to improve the freshness of the AI's information over time, as most command line tools do not change rapidly.
Seldon, a UK startup specializing in MLOps, has raised a $20 million Series B funding round led by Bright Pixel, with participation from existing investors. Link.
Seldon’s goal is to help deploy and maintain machine learning models in production reliably, efficiently, and monitor biases.
Some of the startup’s competitors include Arise, Fiddler, Dataiku, and DataRobot.
The startup has achieved a 400% YoY growth rate for its open-source frameworks since November 2020.
Seldon's customers include PayPal, Johnson & Johnson, Audi, and Experian, and it focuses on improving data quality for AI models.
HEALTHCARE
Nabla, a digital health startup, introduces Copilot, a tool using GPT-3 to assist doctors with paperwork. Link.
Copilot, accessed as a Chrome extension, helps transcribe and repurpose information from video consultations.
The service plans to launch an in-person consultation tool in the coming weeks.
Nabla aims to build a customized large language model for healthcare applications in the long term.
Copilot is currently used by practitioners in the U.S. and France, as well as in about 20 digital and in-person clinics.
Nabla's focus is on automating administrative tasks for doctors without overstepping into diagnostics or clinical assistance.
Google showcases new AI-driven healthcare tools, including an enhanced chatbot for medical symptoms and improved search features. Link.
Google's AI chatbot, Med-PaLM 2, scores 85% on medical exam questions, reaching "expert" doctor level and surpassing rival AI models.
Despite its performance, Med-PaLM 2 has significant gaps in criteria like factuality, precision, reasoning, and bias, requiring further development for real-world settings.
Google's conversational AI, Duplex, calls health care providers to check Medicaid acceptance, displaying the results in Google Search.
Google deploys AI tools globally for low-cost medical diagnostics, including ultrasound devices for breast cancer detection and chest X-ray analysis for tuberculosis signs.
The company addresses concerns about misleading health information on its platforms by adding information panels to YouTube Health and partnering with ThroughLine for crisis helpline search results.
POLICY/LAW/ETHICS
Microsoft laid off its entire ethics and society team within the artificial intelligence organization, impacting 10,000 employees across the company. Link.
The ethics and society team played a critical role in ensuring responsible AI principles were integrated into product design.
Microsoft still maintains an active Office of Responsible AI, claiming its overall investment in responsible AI work is increasing.
The ethics and society team previously created a role-playing game called Judgment Call, which helped designers envision potential AI harms during product development.
Members of the ethics and society team claim the company's focus on quickly shipping AI tools led to reduced interest in long-term, responsible thinking.
The elimination of the ethics and society team raises concerns about responsible AI work at Microsoft, particularly as the company races to integrate OpenAI's technology.
Stanford HAI announces the inaugural class of Tech Ethics & Policy Summer Fellows, comprising 12 Stanford graduate students. Link.
The program is sponsored by Stanford HAI and the Stanford Ethics, Society, and Technology (EST) Hub at the McCoy Family Center for Ethics in Society.
Students will learn about responsible technology and policy on campus before being matched with host organizations, including the National Science Foundation and the AI Center for Excellence at the U.S. General Services Administration.
The program aims to address the lack of technical expertise in the D.C. ecosystem for informed AI policy decisions.
Students will gain insights into the tech policy community, explore public service career opportunities, build a D.C. network, and apply their technical skills from a human-centered perspective.
In addition to the graduate student track, the EST Hub has selected 14 undergraduate students to study the intersections of technology, policy, and social impact.
Mozilla announced a "Responsible AI" challenge at the South by Southwest festival in Austin, Texas. Link.
The challenge aims to encourage entrepreneurs to create trustworthy AI applications.
Applications for the challenge open on March 30, with winners eligible for cash prizes, mentorship, and resources.
The initiative is a relaunch of Mozilla's existing Builders program.
Mozilla aims to promote AI applications shaped by principles such as inclusivity and avoiding bias and toxic content.
PwC is giving 4,000 of its legal professionals access to Harvey, partnering with the AI law startup for a 12-month initiative. Link.
The chatbot will assist lawyers in areas like due diligence, contract analysis, and legal advisory services, but won't provide clients with legal advice or replace lawyers.
Harvey builds custom LLMs for elite law firms, using natural language processing, machine learning, and data analytics.
PwC aims to use Harvey to develop and train its proprietary models for in-house use and Legal Business Solutions clients.
The partnership with PwC allows Harvey to further enhance AI's potential to solve complex legal problems and provide comprehensive solutions.
OTHER
OpenAI has open sourced Evals, a software framework for evaluating the performance of AI models like GPT-4. Link.
Evals aims to crowdsource benchmarks, representing a wide range of failure modes and difficult tasks.
The tool allows developers to generate prompts, measure model quality, and compare performance across datasets and models.
Evals is compatible with popular AI benchmarks and supports implementing custom evaluation logic.
OpenAI plans to grant GPT-4 access to those who contribute high-quality benchmarks, incentivizing Evals usage.
Evals follows other crowdsourcing efforts, like Break It, Build It and Dynabench, aimed at robustifying AI models.
Anthropic has raised a $300 million funding round at a $4.1 billion pre-money valuation, led by Spark Capital. Link.
Privacy-focused search engine DuckDuckGo announced its AI assistant service, DuckAssist, which utilizes OpenAI's Davinci LLM and Anthropic's Claude.
Legal tech startup Robin AI and Quora's chatbot app Poe also employ Anthropic models in their services.
Anthropic introduced a waitlist for early access to its AI assistant Claude after receiving a significant investment from Google.
Google's investment, worth $300 to $400 million, gives the company a 10% stake in Anthropic, valuing it at approximately $5 billion.
Anthropic previously raised $580 million in a Series B funding round, featuring investors such as Sam Bankman-Fried, Jaan Tallinn, and the Center for Emerging Risk Research (CERR).
Adept has raised $350M in Series B funding, led by General Catalyst and co-led by Spark Capital, along with participation from existing investors and renowned tech companies. Link.
The company aims to revolutionize how users interact with their computers, focusing on AI systems that put users at the center.
Adept envisions an AI teammate or co-pilot that enhances users' abilities on any software tool or API, enabling more efficient and informed decision-making.
The funding will help Adept launch initial products, train their models, and recruit exceptional talent.
Since their Series A announcement, Adept has introduced ACT-1, their first large model for actions, which demonstrates cutting-edge capabilities across multiple software tools.
Adept is actively hiring for engineering and product roles as they embark on this next chapter of growth and innovation.
Fairmatic, an AI startup focusing on the car insurance industry, has raised raised $46 million in funding led by Battery Ventures, bringing its total funding to $88 million. Link.
Fairmatic aims to apply AI to reduce risk in the car insurance industry, focusing on commercial clients.
The company uses AI models trained on driving data to mitigate risk and assist in policy management and claims processes.
Customers can monitor driving events and improvement opportunities through a dedicated app.
Fairmatic faces competition from other AI-based car insurance providers and concerns about biased data and discrimination.
Fairmatic plans to expand its workforce and develop a tech-powered, AI-enhanced insurance platform for commercial auto insurance.
The UK government plans to invest £900m in a cutting-edge supercomputer as part of its AI strategy, aiming to build its own "BritGPT." Link.
The investment will go towards building an exascale computer, which can perform over one billion billion calculations per second, and establishing a new AI research body.
Exascale computers have various applications in science, industry, and defense, including AI model training, climate projections, and weather forecasting.
The government will establish a taskforce to advance the UK's sovereign capability in foundation models, including large language models like chatGPT and Google's Bard.
The Treasury will award a £1m prize annually for the next 10 years, called the Manchester Prize, to recognize the most groundbreaking AI research.
The UK government also plans to invest £2.5bn over the next decade in quantum technologies, which could lead to powerful quantum computers.
GPT Tip of the Week
Prompt: What are some tips to fully leverage the new capabilities of GPT-4?
Response:
Utilize multimodal capabilities: Incorporate image and text data to leverage GPT-4's enhanced understanding and contextual awareness. Use image descriptions or analysis to provide context for text-based queries.
Experiment with creative tasks: Push the boundaries with tasks requiring creativity, such as poetry, storytelling, or generating unique ideas. GPT-4's improved reasoning and creativity make it more suitable for these challenges.
Take advantage of increased text processing: Use GPT-4's ability to process larger amounts of text for summarization, in-depth analysis, or answering complex questions with multiple reference points.
Develop interactive applications: GPT-4's improved language understanding makes it suitable for interactive applications like chatbots, virtual assistants, or tutoring systems, providing more human-like conversations and responses.
Integrate with educational platforms: Leverage GPT-4's performance on standardized tests to create AI-driven study tools, personalized lesson plans, or real-time feedback for students.
Implement in industry-specific solutions: Use GPT-4's knowledge across various domains to build tailored applications for industries like healthcare, finance, or legal, where in-depth understanding and analysis are crucial.
Continuously test and iterate: GPT-4 may still make mistakes, so it's essential to test, evaluate, and fine-tune the model to improve its reliability and accuracy.
Pair GPT-4 with domain experts: Collaborate with domain experts to validate and enhance GPT-4's output, ensuring that the generated content is accurate and reliable.
Manage user expectations: Clearly communicate GPT-4's limitations to users and provide guidance on when to consult human experts for certain tasks.
Monitor ethical and safety considerations: Keep track of any biases or inappropriate responses generated by GPT-4, and implement safety mitigations to prevent harmful outputs.
Stay updated with OpenAI developments: As OpenAI continues to refine GPT-4 and release new features, stay informed about the latest improvements to maximize the potential of the model in your applications.