AI Weekly: 02/27/23
Amazon shakes up the game with a new multimodal model, OpenAI launches a new developer platform, and one design tool reaches a million users in just 134 days.
Good morning and welcome to another edition of AI Weekly! What an exciting week it’s been in the world of AI. As I wrote a few weeks ago, every week it feels like a new foundation model is being released. This week, Amazon released its first multimodal competitor to GPT-3.5, a European model called Luminous went toe-to-toe with its American counterparts in what some are calling the “Ryder Cup of AI,” and China continues to bolster its resources, applying unprecedented valuations to foundation-layer companies.
On the application side, one company is using AI to save endangered species, while another is creating synthetic voices so that one day we can all sound like Kendrick Lamar. We also saw one AI presentation designer claim to be the fastest productivity tool ever to reach 1 million users.
On the policy front, 60 countries came together to endorse a non-binding resolution at the first-ever summit on Responsible Artificial Intelligence in the Military. All that and more below. Enjoy!
- ZG
Here are the most important stories of the week:
TEXT
Luminous, a European AI language model, performs on par with the world's leading AI language models while being more efficient. Link.
The benchmark tests include text classification, evaluation, and generation, as well as answering questions about specific text contents.
Luminous follows a decoder-only autoregressive architecture and is trained on a curated multilingual corpus containing sources in English, German, French, Italian, and Spanish.
The benchmarking setup uses EleutherAI’s Evaluation Harness (lm-eval) package and measures completion correctness with both soft accuracy and exact-match accuracy.
Luminous outperforms smaller models on additional benchmark tasks, and its performance improves as model size increases.
Few-shot prompting helps to boost performance in completion tasks, and it allows for faster and cheaper inference using smaller models without compromising on accuracy.
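To make the evaluation setup above concrete, here is a minimal sketch of a few-shot, exact-match evaluation loop. The `generate` callable is a placeholder for any completion endpoint; it is not Aleph Alpha's or lm-eval's actual API.

```python
# Minimal sketch of a few-shot, exact-match evaluation loop.
# `generate` is a placeholder for any completion endpoint; it is not
# Aleph Alpha's or lm-eval's actual API.
from typing import Callable, Sequence, Tuple

def build_few_shot_prompt(examples: Sequence[Tuple[str, str]], query: str) -> str:
    """Prepend a handful of solved examples to the query."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {query}\nA:"

def exact_match_accuracy(
    generate: Callable[[str], str],
    examples: Sequence[Tuple[str, str]],
    eval_set: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of eval items whose completion exactly matches the gold answer."""
    hits = 0
    for question, gold in eval_set:
        prompt = build_few_shot_prompt(examples, question)
        completion = generate(prompt).strip()
        hits += int(completion == gold.strip())
    return hits / len(eval_set)
```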
Major news outlets have accused OpenAI of using their articles to train its ChatGPT software without proper licensing. Link.
Dow Jones, the parent company of the Wall Street Journal, said they take the misuse of their journalists’ work seriously and are reviewing the situation.
The concerns arose when a computational journalist discovered their work had been used to train ChatGPT and, upon querying the chatbot, received a response naming 20 news outlets.
Other companies, including GitHub and Stability AI Ltd., have been sued for allegedly using copyrighted content without proper compensation or consent.
CNN believes that using its articles to train ChatGPT violates its terms of service and plans to reach out to OpenAI to be paid for licensing the content.
The use of artificial intelligence in the news industry has been controversial, with some journalists fearing the technology will take over their jobs and others worried about the spread of misinformation.
OpenAI LLC is launching an initiative to provide more transparency, improve quality and make ChatGPT customizable for users. Link.
The startup aims to address concerns raised about ChatGPT's politically biased, offensive or objectionable responses, with a new research effort and clearer guidelines for human reviewers.
OpenAI is releasing guidelines it provides to reviewers and plans to share demographic information about them, without violating privacy rules.
OpenAI engineers are developing features to let users customize ChatGPT, allowing outputs that others may disagree with.
The company plans to collect public input on improving default settings and restrictions for ChatGPT, and explore partnerships for third-party safety and policy audits.
OpenAI's goal is to reduce errors, improve the quality of the AI model's default settings, and create a more transparent and customizable experience for users.
Kaito, an AI-powered search engine for crypto, raised $5.3 million in a seed round led by Dragonfly Capital. Link.
The capital will be used to scale the team and accelerate development of the product, which aims to aggregate information on the social crypto landscape through platforms like Twitter, Discord, governance forums, Mirror, Medium, podcast transcripts and research, to name a few.
Kaito separates search results into "facts and opinions." It is currently focused on aggregating off-chain information and will eventually add on-chain content, similar to Glassnode or Dune Analytics.
The platform leverages AI to optimize its search engine through ranking, topic mining, personalization, recommendation, speech-to-text transcription, and AI-generated content.
The startup's search engine had been in private alpha for its institutional investors since December and is now available to the public. A retail investor-focused search engine will go live in the second quarter of 2023.
Kaito is also planning to roll out a token-based community that will leverage a similar model to Wikipedia where users can contribute to the platform through a tiered system.
IMAGE/VIDEO
Conservation AI, a U.K.-based nonprofit, has deployed more than 70 AI-powered cameras across the world to track even the rarest and most reclusive of species in real time, enabling conservationists to protect them from threats such as poachers and fires before it’s too late to intervene. Link.
The organization was founded four years ago by researchers at Liverpool John Moores University and works with 150 organizations across the globe, including conservation societies, safaris, and game reserves.
The Conservation AI platform, built on NVIDIA Jetson modules for edge AI and the NVIDIA Triton Inference Server, analyzes footage in just four seconds, identifies species of interest, and alerts conservationists and other users to potential threats via email.
The platform now enables conservationists to identify trends and species activities in real time, rapidly modeling trends in biodiversity and habitat health using a huge database of images and other metadata that would otherwise take years to analyze.
The team uses NVIDIA Jetson AGX Xavier modules to analyze drone footage from remote areas streamed to a smart controller that can count species population or alert conservationists when species of interest are detected.
The Conservation AI platform has been deployed by Chester Zoo, a renowned conservation society based in the U.K., to detect poachers in real time, including those hunting pangolins in Uganda.
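For illustration only, here is a rough sketch of the detect-then-alert flow described above. The `detect_species` function is a stand-in for a call to an object-detection model served by something like Triton; none of this is Conservation AI's actual code.

```python
# Illustrative detect-and-alert loop; `detect_species` is a stand-in for a
# call to an object-detection model (e.g. one served by NVIDIA Triton).
import smtplib
from email.message import EmailMessage

SPECIES_OF_INTEREST = {"pangolin", "rhino", "human"}  # "human" can flag possible poachers

def detect_species(frame_path: str) -> list[str]:
    """Placeholder for running the detection model on a camera or drone frame."""
    raise NotImplementedError

def send_alert(recipient: str, frame_path: str, species: list[str]) -> None:
    """Email conservationists when species of interest appear in a frame."""
    msg = EmailMessage()
    msg["Subject"] = f"Species alert: {', '.join(species)}"
    msg["From"] = "alerts@example.org"
    msg["To"] = recipient
    msg.set_content(f"Detections of interest in {frame_path}: {', '.join(species)}")
    with smtplib.SMTP("localhost") as server:
        server.send_message(msg)

def process_frame(frame_path: str, recipient: str) -> None:
    flagged = [s for s in detect_species(frame_path) if s in SPECIES_OF_INTEREST]
    if flagged:
        send_alert(recipient, frame_path, flagged)
```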
SPEECH/AUDIO
Voicemod, led by CEO Jamie Bosch, has raised $14.5 million to offer tools to create entirely synthesized (unreal) voices. Link.
With AI, users can 'wear' these voices in real time, enabling them to speak with a voice that isn't theirs, similar to Snapchat lenses or TikTok filters.
The sing-to-sing voice conversion feature, powered by AI, allows users to sing in someone else's voice, supercharging karaoke games.
Voicemod has acquired another audio effects startup, Voctro Labs, whose technology is being blended with its own to create an amped-up hybrid platform.
The sing-to-sing feature is one of many new projects and products that Voicemod plans to launch using generative AI technology.
Voicemod provides audio tools directly to consumers/creators, as well as via SDK and APIs for third parties to integrate into their own products, creating demand for its tech across the gamer-creator ecosystem.
CODE/DEVTOOLS
OpenAI is launching a new developer platform called Foundry that allows customers to run the company's newer machine learning models on dedicated capacity. Link.
Foundry is designed for cutting-edge customers running larger workloads, allowing inference at scale with full control over model configuration and performance profile.
The company delivers a static allocation of compute capacity dedicated to a single customer, with version control and more robust fine-tuning for OpenAI's latest models.
Foundry offers service-level commitments for instance uptime and on-calendar engineering support, with rentals based on dedicated compute units with three-month or one-year commitments.
The compute costs of running AI models are expensive, with some instances on Foundry costing $78,000 for a three-month commitment or $264,000 over a one-year commitment.
OpenAI is under increasing pressure to turn a profit after a multibillion-dollar investment from Microsoft; monetization efforts so far include ChatGPT Plus along with the OpenAI-powered Bing Chat and Copilot.
LangChain and Chroma have partnered to provide an easy-to-use framework for AI-native app development, combining LangChain's AI framework with Chroma's vector store and embeddings database. Link.
LangChain offers a flexible and modular framework for developing AI-native applications such as chatbots, question answering services, and agents.
Chroma is a vector store and embeddings database designed from the ground up to make it easy to build AI applications with embeddings, with a focus on ease of use and lightweight performance.
Chroma's integration with LangChain provides an easy and seamless way to power LLM applications during local development.
Through the partnership, Chroma aims to be the first, easiest, and best choice for most developers building LLM apps with LangChain.
The launch of Stable Diffusion and ChatGPT has sparked an explosion of AI creativity, and LangChain and Chroma aim to stay at the forefront of this transformative era in AI development.
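For local development, the integration looks roughly like the sketch below, which follows the pattern from LangChain's documentation at the time of writing; exact import paths may differ between versions, and an OpenAI API key is assumed to be set in the environment.

```python
# Rough sketch of using Chroma as a LangChain vector store during local
# development. Import paths follow early-2023 LangChain releases and may
# differ in later versions; OPENAI_API_KEY is assumed to be set.
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma

texts = [
    "LangChain is a framework for building LLM-powered applications.",
    "Chroma is an embeddings database for AI-native apps.",
]

# Embed the documents and store them in a local Chroma collection.
db = Chroma.from_texts(texts, OpenAIEmbeddings())

# Retrieve the most similar document for a query.
docs = db.similarity_search("What is Chroma?", k=1)
print(docs[0].page_content)
```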
DESIGN
San Francisco-based productivity software maker Tome has raised $43 million in a Series B funding round led by Lightspeed Venture Partners, valuing the company at $300 million. Link.
The company, which uses AI to automatically generate presentation slides, claims to have reached a million users in just 134 days, making it the fastest productivity tool to ever reach that milestone.
Tome's AI can generate everything from children's bedtime stories to 3D prototypes but has resonated most as a PowerPoint and Google Slides alternative.
Tome’s software rethinks the PowerPoint system of slide-friendly, uniform tiles through software that can automatically place and size text and images dropped onto its canvas. Unlike a traditional presentation, Tome tiles are modular, with changes to one dynamically updating others.
AI models from OpenAI are integrated into Tome, making a number of queries behind the scenes to fulfill a user’s presentation prompt.
Tome is now looking to embed the outputs of other software startups within its platform.
MULTIMODAL
Amazon recently released Multimodal-CoT, a model that outperforms GPT-3.5 by 16 percentage points on the ScienceQA benchmark while being 784x smaller. Link. Link.
Large language models (LLMs) use chain-of-thought (CoT) prompting to generate intermediate reasoning chains for complex reasoning.
Existing CoT studies have focused on language only, but Multimodal-CoT proposes incorporating both language and vision modalities into a two-stage framework.
The two-stage framework separates rationale generation and answer inference to improve the generated rationales for answer inference.
Multimodal-CoT under 1 billion parameters outperforms the previous state-of-the-art LLM (GPT-3.5) by 16 percentage points (75.17% → 91.68% accuracy) on the ScienceQA benchmark.
Multimodal-CoT even surpasses human performance on the ScienceQA benchmark.
Link to code is attached.
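To show the shape of the two-stage framework, here is an illustrative sketch; `model_generate` stands in for the paper's fused vision-language model and is not the released Multimodal-CoT code.

```python
# Illustrative two-stage flow: rationale generation, then answer inference.
# `model_generate` stands in for the paper's fused vision-language model.
from typing import Callable

def multimodal_cot_answer(
    model_generate: Callable[[str, bytes], str],
    question: str,
    image: bytes,
) -> str:
    # Stage 1: generate a rationale conditioned on the question and the image.
    rationale = model_generate(f"Question: {question}\nRationale:", image)
    # Stage 2: infer the answer conditioned on the question, image, and rationale.
    return model_generate(
        f"Question: {question}\nRationale: {rationale}\nAnswer:", image
    )
```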
You.com has launched YouChat 2.0, a new "multimodal conversational AI" system that combines advanced conversational AI with community-built apps, offering a unique and interactive experience with each query. Link.
With its blended large language model known as C-A-L (Chat, Apps, and Links), YouChat 2.0 can serve up charts, images, videos, tables, graphs, text, or code embedded in its responses to user queries, reducing the need for multiple tabs.
YouChat 2.0 offers a rich visual experience, blending the power of chat with up-to-date information and dynamic content from apps such as Reddit, TikTok, StackOverflow, Wikipedia, and more.
Search results now include integrated generative AI apps that allow users to create content directly within the search results, including writing text, coding, and generating images.
YouChat 2.0 provides users with accurate information and cites its sources, offering a level of trust and reliability that is often lacking in conventional search engines.
You.com is taking steps to increase the trust and reliability of its conversational search and invites developers to submit apps to its open platform to work together to create the ultimate chat-search-do engine.
HEALTHCARE
Stanford researchers have teamed up with Stability AI to explore the potential of text-to-image models in generating medical images to address the lack of diverse healthcare data in rare disease training. Link.
Radiologists prepare a detailed text report describing their findings for each image they analyze. By adding these reports as training data for the Stable Diffusion model, the team hoped the model could learn to create synthetic medical imaging data when prompted with relevant medical keywords.
The researchers found that with some additional training, the general-purpose latent diffusion model performed surprisingly well at the task of creating images of human lungs with recognizable abnormalities.
Foundation models trained on natural images and language have not performed well when given domain-specific tasks, as professional fields such as medicine and finance have their own jargon, terminology, and rules. The team therefore added medical training data to their Stable Diffusion model to account for medical keywords.
The researchers examined three sub-components of the Stable Diffusion model's architecture: the variational autoencoder, the text encoder, and the U-Net, and found that each component provided sufficient context to create medically accurate images.
After experimenting with prompts and benchmarking their efforts using both quantitative quality metrics and qualitative radiologist-driven evaluations, the scholars found their best-performing model could be conditioned to insert a realistic-looking abnormality on a synthetic radiology image while maintaining a 95% accuracy on a deep learning model trained to classify images based on abnormalities.
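As a rough illustration of how such a fine-tuned model would be prompted, the sketch below uses the `diffusers` StableDiffusionPipeline; the checkpoint path is hypothetical, not the Stanford team's released weights.

```python
# Sketch of prompting a (hypothetical) radiology-fine-tuned latent diffusion
# model with medical keywords. "path/to/finetuned-chest-xray" is a placeholder,
# not the Stanford team's released weights.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "path/to/finetuned-chest-xray", torch_dtype=torch.float16
).to("cuda")

image = pipe("chest x-ray showing a right-sided pleural effusion").images[0]
image.save("synthetic_cxr.png")
```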
Medical AI Research Center (MedARC) has been launched to advance the field of AI in healthcare through open, collaborative research. Link.
Large deep learning models have enabled previously inconceivable applications in other fields, but the lack of training with medical data has limited their use in healthcare.
MedARC aims to develop large AI models tailored to medical applications and build interdisciplinary teams of clinicians and ML researchers and engineers to address clinical needs.
The Center welcomes involvement from machine learning researchers, clinicians, academics, and others to contribute to their current projects or propose new collaborative research projects.
MedARC operates on the principle of doing science in the open and aims to open source models and datasets, publish all results as preprints, and publish in reputable peer-reviewed journals and conferences.
The founder and CEO of MedARC is Tanishq Mathew Abraham, a 5th-year Biomedical Engineering PhD candidate at UC Davis who is interested in applying generative AI to microscopy and digital pathology.
POLICY/LAW
Representatives of 60 countries endorsed a non-binding resolution at the first-ever summit on Responsible Artificial Intelligence in the Military (REAIM). Link. Link. Link.
The resolution calls for the responsible development, deployment, and use of military AI, and outlines how AI may be put to military use, how it could transform global politics, and how governments ought to approach it.
The resolution calls for collaboration on guidelines for responsible use of military AI, data protection mechanisms, human oversight, and exchange of information and best practices.
The US released a 12-point declaration covering military AI development, deployment, governance, safety standards, and limitations, which called for a ban on AI control of nuclear weapons and clear descriptions of the uses of military AI systems.
In 2021, 125 UN member nations sought to add AI weapons to a pre-existing resolution that bans or restricts the use of certain weapons, but the effort failed due to opposition from the US and Russia.
The resolution is seen as a step toward channeling AI into nonlethal military uses and establishing rules to rein in fully autonomous weapons.
New generative AI systems like ChatGPT and Dall-E raise legal questions about who owns the data that trains their systems, copyright of output, and responsibility for defamatory or dangerous information. Link.
Laws specific to AI don't exist in most of the world, so these issues will have to be addressed through existing law.
Legal uncertainty surrounds whether AI developers should pay for the rights to training data and whether generative AI output can be copyrighted.
It's unclear if AI systems can commit slander or libel, but their creators could be held liable if they were negligent in the creation of the systems.
There are questions around who is responsible if AI systems offer private or dangerously false information.
Both courts and lawmakers will likely play a role in determining how these and other legal issues play out.
OTHER
Amazon Web Services (AWS) is collaborating with Hugging Face to facilitate AI work in Amazon's cloud. Link.
AWS provides tools for developers to create AI-based software, such as proprietary computing chips for training AI algorithms and services that reduce chatbot development time.
Hugging Face is a central hub for sharing open-source code and models for AI developers.
Hugging Face and AWS are working together to make it easier for developers to take code from the site and run it on AWS cloud.
AWS is dedicating significant engineering resources to building shared products with Hugging Face.
The next generation of Bloom, an open-source AI model that competes with OpenAI's ChatGPT, will be run on Trainium, a proprietary AI chip created by AWS.
Inside the ChatGPT Race in China. Link.
ChatGPT has become very popular in China, even though it is not currently available in the country.
Chinese people have accessed ChatGPT through VPNs or paid workarounds, or have seen results through screenshots and social media videos.
ChatGPT's ability to answer questions in Chinese has exceeded expectations, as it can understand traditional and pop-cultural references in China and even mimic the writing style of propaganda mouthpiece editors and social media influencers.
Major Chinese tech companies are now developing their own ChatGPT-like products, such as Baidu's "Ernie Bot" and Alibaba's internal tool.
The current trend is driven by excitement and FOMO, as few tech products have managed to grab as much public attention as ChatGPT.
There has been some irrational corporate behavior, with the stock prices of Chinese companies that have little background in AI rising after they announced plans to use ChatGPT-like tech.
There is a real opportunity for Chinese companies to catch up with the US in the chatbot arms race, as they have access to better Chinese-language AI training materials and are commercially motivated to develop new products quickly.
The Nvidia A100 chip has become one of the most critical tools in the artificial intelligence industry, powering many AI applications. Link.
Companies like Microsoft and Google are racing to integrate cutting-edge AI into their search engines as billion-dollar competitors such as OpenAI and Stability AI release their software to the public.
Nvidia takes 95% of the market for graphics processors used for machine learning, and the A100 is ideally suited for machine learning models that power tools like ChatGPT, Bing AI, or Stable Diffusion.
Large language models like ChatGPT require hundreds of GPUs to train, and powerful chips like the A100 are also needed for inference, i.e. using the model to generate text, make predictions, or identify objects inside photos.
Nvidia stands to benefit from the AI hype cycle, with its AI chip business reporting continued growth and rising sales of over $3.6 billion in the fourth quarter of 2022.
Companies that find themselves with a hit AI product often need to acquire more GPUs to handle peak periods or improve their models, and the cost of A100s can add up, with many data centers using systems that include eight A100 GPUs working together.
A researcher from Stanford has developed a prompting strategy to improve the performance of smaller and open-source LLMs. Link.
The "Ask Me Anything" (AMA) strategy involves studying the corpus and training procedure of LLMs to create effective prompts.
The AMA strategy combines multiple imperfect prompts with weak supervision to create predictions for the best inputs.
The researcher identified question-answering (QA) prompts as the most effective for LLMs.
Prompt chaining and aggregation were used to efficiently format task inputs and improve performance without labeled data.
The open-source GPT-J-6B model exceeded the few-shot performance of the GPT3-175B model on 15 of 20 benchmarks with AMA.
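A much-simplified sketch of the AMA idea: reformat a single input into several QA-style prompts and aggregate the predictions. The paper aggregates with weak supervision; plain majority voting is used here for brevity, and `generate` is a placeholder completion function.

```python
# Simplified AMA sketch: multiple imperfect QA-style prompts, aggregated by
# majority vote (the paper uses weak supervision instead). `generate` is a
# placeholder for any LLM completion function.
from collections import Counter
from typing import Callable, Sequence

def ama_predict(
    generate: Callable[[str], str],
    prompt_templates: Sequence[str],
    task_input: str,
) -> str:
    predictions = []
    for template in prompt_templates:
        # Each template recasts the task as a question, e.g.
        # "Is the following review positive? {x}\nAnswer yes or no:"
        predictions.append(generate(template.format(x=task_input)).strip().lower())
    # Majority vote across the imperfect prompts.
    return Counter(predictions).most_common(1)[0][0]
```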
Key Papers
Toolformer: Language Models Can Teach Themselves to Use Tools. Link.
LMs excel at solving new tasks from just a few examples or instructions, but struggle with basic functionality like arithmetic or factual lookup, where simpler, specialized systems do better.
Toolformer is a model trained to use external tools via simple APIs in a self-supervised way with only a handful of demonstrations for each API.
Toolformer decides which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction.
Toolformer incorporates a range of tools like calculator, Q&A system, search engines, translation system, and calendar.
Toolformer achieves improved zero-shot performance across various downstream tasks, often competitive with larger models.
Toolformer does not sacrifice its core language modeling abilities.
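To give a feel for the mechanism, here is a small sketch of executing an inline tool call of the kind Toolformer learns to emit (e.g. "[Calculator(400/1400)]"). Only a calculator is shown, and this is not the paper's training or decoding code.

```python
# Sketch of executing inline tool calls like "[Calculator(400/1400)]".
# Only a calculator tool is shown; this is not the paper's code.
import re

def run_calculator(expression: str) -> str:
    # Restrict to digits and basic operators before evaluating.
    if not re.fullmatch(r"[0-9+\-*/(). ]+", expression):
        raise ValueError(f"unsupported expression: {expression}")
    return str(round(eval(expression), 2))

def execute_tool_calls(text: str) -> str:
    """Replace each [Calculator(...)] marker with its computed result."""
    pattern = re.compile(r"\[Calculator\((.*?)\)\]")
    return pattern.sub(lambda m: run_calculator(m.group(1)), text)

print(execute_tool_calls("The ratio is [Calculator(400/1400)]."))  # The ratio is 0.29.
```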
Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models. Link.
Large language models can perform various reasoning tasks using chain-of-thought prompting.
The quality of prompts depends on demonstrations given to the models, but creating them by hand is costly.
Synthetic prompting leverages a few handcrafted examples to prompt the model to generate more examples by itself.
The method alternates between backward and forward processes to generate new examples.
The backward process generates a solvable and clear question that matches a sampled reasoning chain.
The forward process produces a more detailed reasoning chain, improving the quality of the example; the overall method outperforms existing prompting techniques on numerical, symbolic, and algorithmic reasoning tasks.
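A high-level sketch of the backward/forward loop, with `generate` standing in for LLM calls; the prompt wording is illustrative, not the paper's exact templates.

```python
# High-level sketch of synthetic prompting's backward/forward loop.
# `generate` stands in for LLM calls; prompt wording is illustrative only.
from typing import Callable, List, Tuple

def synthesize_examples(
    generate: Callable[[str], str],
    seed_examples: List[Tuple[str, str]],  # (question, reasoning chain) pairs
    n_new: int,
) -> List[Tuple[str, str]]:
    synthesized = []
    for i in range(n_new):
        seed_chain = seed_examples[i % len(seed_examples)][1]
        # Backward process: ask for a clear, solvable question that the
        # sampled reasoning chain answers.
        question = generate(
            f"Write a clear, solvable question answered by this reasoning:\n{seed_chain}"
        )
        # Forward process: re-solve the question to get a more detailed chain.
        detailed_chain = generate(f"Solve the question step by step:\n{question}")
        synthesized.append((question, detailed_chain))
    return synthesized
```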
GPT Tip of the Week
Prompt: What are some best practices for evaluating the performance of an AI model?
Response:
Define clear evaluation metrics that align with the intended use case of the model
Use both quantitative metrics (e.g., accuracy, precision, recall) and qualitative measures (e.g., human feedback) to get a comprehensive understanding of the model's performance
Ensure that the evaluation data set is diverse and representative of the real-world scenarios the model will encounter
Regularly re-evaluate the model's performance and make necessary adjustments to improve its accuracy and effectiveness
Consider the limitations and potential biases in the evaluation process and take steps to mitigate them.
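As a small worked example of the quantitative metrics mentioned above, the snippet below computes accuracy, precision, and recall with scikit-learn on toy labels.

```python
# Toy example of the quantitative metrics above, using scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0]  # ground-truth labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # model predictions

print("accuracy: ", accuracy_score(y_true, y_pred))   # 0.75
print("precision:", precision_score(y_true, y_pred))  # 0.75
print("recall:   ", recall_score(y_true, y_pred))     # 0.75
```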