AI Weekly: 01/30/23
Now, as answers become accessible to everyone, the smartest people will be the ones who ask the best questions.
Welcome to the third week of AI Weekly. Before diving into the top stories of this past week, I want to elaborate on a topic I briefly mentioned last week: Automatic Prompt Engineering (APE).
As many of us know, while LLMs have demonstrated a strong ability to perform as general-purpose computers, task performance depends heavily on the quality of the prompt used to direct the model. To date, the most effective prompts have been crafted by humans through trial and error, since they have little knowledge of the exact input that will generate their desired output. Understandably, this is not a scalable approach to yielding high-quality outputs from LLMs.
Thus, researchers at the University of Toronto, the Vector Institute, and the University of Waterloo have proposed APE as a solution for producing high-quality prompts automatically and at scale.
The results from their experiments were promising.
According to their paper published on November 3rd, 2022:
Experiments on 24 NLP tasks show that our automatically generated instructions outperform the prior LLM baseline by a large margin and achieve better or comparable performance to the instructions generated by human annotators on 19/24 tasks.
So, how does it work? First, they use the LLM to generate candidate instructions from multiple input-output demonstration pairs for a given task. For example, pairs along these lines (reconstructed here for illustration; the paper uses an antonym task like this one) would be given to the LLM:

Input: hot → Output: cold
Input: early → Output: late
Input: rise → Output: fall
The LLM, in turn, would use these pairs to generate variations of a potential optimal instruction prompt for this set of pairs, such as:

"Produce the antonym of the input."
"Write a word that means the opposite of the input."
"Reverse the input."
The LLM then scores these candidates by running each one as the instruction prompt on held-out inputs and evaluating the generated results. Here, the prompt "produce the antonym of the input" generated a correct answer more consistently than the other two prompts and, as a result, would be selected as the optimal instruction prompt for this task.
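To make the loop concrete, here is a minimal sketch of the generate-and-score procedure in Python. This is an illustration under stated assumptions, not the paper's implementation: the llm() helper is a hypothetical stand-in for whatever completion API you use, and the prompt templates are simplified versions of the ones in the paper.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in: wire this up to your completion API of choice."""
    raise NotImplementedError

def ape_select_instruction(demo_pairs, eval_pairs, n_candidates=5):
    """Generate candidate instructions from demonstrations, then keep the
    candidate that scores best on held-out evaluation pairs."""
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in demo_pairs)

    # Step 1: ask the LLM to propose instructions that explain the demos.
    candidates = [
        llm(
            "I gave a friend an instruction. Based on it, they produced "
            f"these input-output pairs:\n{demos}\nThe instruction was:"
        )
        for _ in range(n_candidates)
    ]

    # Step 2: score each candidate by how often it reproduces the
    # expected output on held-out pairs when used as the instruction.
    def score(instruction):
        hits = sum(
            llm(f"Instruction: {instruction}\nInput: {x}\nOutput:").strip().lower() == y
            for x, y in eval_pairs
        )
        return hits / len(eval_pairs)

    return max(candidates, key=score)

# Example: the antonym task from above.
best = ape_select_instruction(
    demo_pairs=[("hot", "cold"), ("early", "late"), ("rise", "fall")],
    eval_pairs=[("strong", "weak"), ("open", "closed")],
)
```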
So what?
Assuming that the effectiveness of APE will continue to improve across all 24 tasks with time and larger amounts of training data, this technology has the potential to significantly impact the overall quality of results we can generate from LLMs. It should also expand the number of people who can meaningfully leverage LLMs’ capabilities.
Although the tasks that these researchers evaluated are fairly straightforward, we can start to imagine the impact of APE in the context of the more common creative tasks that we see people trying to accomplish with LLMs.
For example, let’s take a look at a simple use case in photo editing:
While an individual may know the changes they want to make for a given photo, they may not have an in-depth understanding of the different photography-specific terms that exist. This person may say, “increase the intensity of the colors,” when what they really mean is “increase saturation.” APE’s ability to understand an input and optimize it to achieve the intended output can allow people with little domain knowledge to accomplish tasks in a more expert manner.
Of course, there are many other examples of where APE can provide value, but no one reading this has all day. I’m looking forward to discussing the potential impact of APE with all of you as it continues to develop into a practically usable technology. Until then, enjoy the good ol’ days of trial and error while they last.
- ZG
Here are the most important stories of the week:
TEXT
Claude, Anthropic's new AI assistant, has a detailed understanding of its own creation and the ethical principles that guided its design.
Claude is a 52-billion-parameter AI model, based on the pre-trained autoregressive AnthropicLM v4-s3 model.
The size of Claude's available context window is 8,000 tokens.
Anthropic's method of training the AI model, called Constitutional AI, uses a model to generate the initial rankings of fine-tuned outputs based on a set of underlying principles.
Claude is a serious competitor to ChatGPT, with improvements in its ability to refuse inappropriate requests and more naturalistic writing.
Claude can write coherently about itself, its limitations, and its goals, and carries that coherence into answering questions on other subjects.
Claude appears to be weaker in code generation and reasoning about code, but is similar to ChatGPT in calculation and logic problems.
PromptLayer is a new platform that allows engineers to track, manage, and share their GPT prompts. Link.
Acts as middleware between your code and OpenAI's Python library, recording all OpenAI API requests so engineers can search and explore request history in the PromptLayer dashboard.
Helps engineers track, debug, and replay old completions; it is designed to aid the process of exploration that is prompt engineering (see the sketch below).
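As a rough sketch of that middleware pattern, here is what PromptLayer's Python quickstart looked like at the time; the keys are placeholders, and the exact API may have changed since, so treat this as illustrative rather than authoritative.

```python
import promptlayer  # pip install promptlayer

promptlayer.api_key = "<PROMPTLAYER_API_KEY>"

# PromptLayer exposes a wrapped copy of OpenAI's library, so existing
# code keeps working; every request is also logged to the dashboard.
openai = promptlayer.openai
openai.api_key = "<OPENAI_API_KEY>"

response = openai.Completion.create(
    engine="text-davinci-003",
    prompt="Produce the antonym of the input.\nInput: hot\nOutput:",
    pl_tags=["ai-weekly-demo"],  # tags make runs searchable later
)
print(response.choices[0].text)
```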
AI21 Labs introduces an AI writing assistant that can suggest quotes and statistics, provide citations, and more. Link.
AI21 Labs, an Israeli start-up, has released a text-generating AI system called Wordtune Spices.
Unlike ChatGPT, Wordtune Spices does not compose emails or essays but instead suggests options to change the voice and style of already written sentences.
Wordtune Spices has "grounding and attribution" algorithms to search for relevant sources and present source links for each piece of information it offers.
The tool offers suggestions for different professional documents and can help write a thesis statement and main ideas, including explanations and counterarguments.
The co-CEO and co-founder of AI21 Labs, Ori Goshen, says that Spices was designed to address a wide range of use cases, but isn't a tool to generate full essays and will not create strong writers.
CNET’s article-writing AI is already publishing very dumb errors. Link.
CNET had been quietly publishing articles generated by an AI engine, sparking criticism.
The AI-generated articles were found to contain errors and provide unrealistic expectations to readers with low financial literacy.
The problems with the AI's output highlight how difficult it still is for current-generation text generators to distinguish fact from fiction.
There is a risk that human editors may become complacent and overlook errors in the AI-generated content.
Google created an AI that can generate music from text descriptions, but won’t release it. Link.
Google has created a new AI system called MusicLM that can generate music in any genre given a text description.
MusicLM is among the first generative AI systems for music with the potential to produce high-fidelity, complex compositions.
The system can generate music based on descriptions, build on existing melodies, and create a melodic story or narrative.
MusicLM also has the ability to generate music based on images and captions and can create music inspired by places, epochs, or requirements.
Despite its impressive capabilities, the system is not without its flaws and raises ethical challenges and potential legal issues, including the risk of misappropriation of creative content and violating music copyright.
IMAGE/VIDEO
Google Cloud has created a new AI tool called "shelf inventory" to help big-box retailers better track their inventory on shelves. Link.
The tool uses videos and images from the retailer's own cameras or store associates to identify and analyze the availability of consumer-packaged goods on shelves.
Currently in preview; it will be available to retailers in the coming months.
Will help retailers improve product availability, replenish out-of-stock items faster, and lose fewer sales opportunities.
Scenario raises $6m for its AI platform that generates game art assets. Link.
Emmanuel de Maistre and Hervé Nivon co-founded Scenario, which focuses on AI to generate game art for artists and developers.
The $6m in seed funding comes from Play Ventures, Anorak Ventures, The VR Fund, Oculus co-founder Brendan Iribe, Twitch founder Justin Kan and Hugging Face founders Clem Delangue and Julien Chaumont.
5,000 people have signed up and 20,000 more are on the waitlist; pricing will start at $20/mo.
Scenario aims to avoid legal challenges around generative AI by only allowing users to use data they own or open-source alternatives. Scenario does not claim ownership over created images or generators.
SPEECH
SoundHound raises $25m after laying off 40% of staff earlier this month. Link.
SoundHound, a voice AI company that provides conversational AI services for automotive, connected home, and hospitality applications, recently laid off 40% of its staff and raised $25 million in equity.
The funding is from a "diverse set of financial and strategic investors" and comes in the form of preferred equity convertible into shares of Class A common stock.
The company is using the funding for various purposes, including paying severance to laid-off employees.
SoundHound was once valued at $1 billion as a privately held startup and went public in 2022 via a SPAC merger at a $2.1 billion valuation, but its market cap today is $295 million.
SoundHound has partnerships with companies such as Mercedes-Benz, Kia, and Toast, but the revenue generated from these partnerships is unclear. The company expects to see revenue growth of 50% YoY in 2023, based on its strong customer base, bookings foundation, and demand for its voice AI-enabled customer service products.
Supernormal raises $10m in funding led by Balderton to automatically transcribe and summarize meetings. Link.
Supernormal provides a meeting notes transcription service using OpenAI's text-processing AI.
The funding will be used for product R&D and hiring, with the goal of delivering end-to-end workflow solutions and next-generation tools.
Supernormal already has a growing customer base, with 50,000 users across over 250 organizations including Netflix, Airbnb, and Snapchat.
CODE
Coders using AI coding assistants are on average producing less secure code. Link.
The researchers conducted the first large-scale user study of an AI code assistant for security tasks across multiple programming languages.
Participants with AI access wrote less secure code compared to those without access.
Participants with AI access had higher false confidence in writing secure code.
Participants with less trust in AI and more engagement with prompts provided more secure code.
Cursor is a new IDE built from the ground up with LLMs. Link.
Allows users to quickly ship new features and make complex changes across multiple files.
Includes AI-powered linting to flag problematic code and suggest fixes.
Has IntelliSense support, split panes, vim mode, multi-file search, and support for Copilot.
HEALTHCARE
Researchers use AI to improve care for patients who show up at the hospital with acute chest pain. Link.
Researchers developed a deep learning AI model to identify patients with acute chest pain syndrome who are at risk for adverse outcomes based on chest X-rays.
Acute chest pain syndrome accounts for 7 million emergency department visits in the US annually.
The deep-learning model improved prediction of adverse outcomes beyond conventional clinical markers like d-dimer blood tests.
The model maintained diagnostic accuracy across age, sex, ethnicity, and race.
Using the model, 14% of patients could have additional testing deferred as compared to 2% using age, sex, and biomarker data alone.
AI-driven therapy has a mixture of benefits and challenges. Some examples of each are below. Link.
Benefits
Early detection of stress through technology
Collection of data from various sources (motion sensors, activity on apps, posts on social media) to flag a patient's worsening mood.
Improvement of treatment efficacy through notifications of missed medications and detailed notes about a patient's tone and behavior during therapy sessions.
Challenges
Machines lack the capacity to accurately respond to the whole spectrum of human emotion.
Risk that AI-driven therapy could discourage individuals from seeking help from a human therapist.
Algorithms are not yet advanced enough to mimic the complexities of human emotion and emulate empathetic care.
AI passes U.S. Medical Licensing Exam. Link.
Two AI programs, including ChatGPT, passed the USMLE, according to two recent papers.
The first paper showed that ChatGPT performed at 50-60% accuracy across all three USMLE exams without any specialized training or reinforcement.
The second paper evaluated the performance of another language model, Flan-PaLM, on the USMLE using a medical question-answering database called MultiMedQA, and achieved 67.6% accuracy.
Healthcare professionals have expressed concerns over the use of AI in medicine, including questions on its capability to make meaningful contributions and its ability to consent to authorship.
The authors of the second paper hope their findings would spark further discussions to improve healthcare using AI.
Atomic AI envisions RNA as the next frontier in drug discovery as it raises a $35m Series A round led by Playground Global. Link.
Atomic AI has received $35 million in a Series A round led by Playground Global, with participation from 8VC, Factory HQ, Greylock, NotBoring, AME Cloud Ventures and other angels.
Their goal is to study RNA and discover new treatments, similar to what AlphaFold did for proteins.
The company's thesis is that RNA is functional and worth pursuing as a treatment target, with particular potential in the "non-coding" regions of RNA.
Atomic AI has developed a machine learning model, PARSE, that can accurately predict the structure of RNA molecules based on limited data.
Atomic AI plans to use its PARSE platform for its own drug discovery program, rather than licensing it as a "structure as a service" platform.
OTHER
Microsoft just announced the general availability of Azure OpenAI Service, a tool that allows businesses to access the most advanced AI models, including GPT-3.5, Codex, and DALL-E 2. Link.
Lets users create cutting-edge applications, improve user experiences, and streamline internal efficiencies.
Will also soon include ChatGPT, which runs inference on Azure AI infrastructure (a minimal usage sketch follows below).
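For a sense of what calling the service looks like, here is a minimal sketch using the openai Python package's Azure configuration as documented around this time; the resource name, deployment name, API version, and key below are placeholders you would replace with your own.

```python
import openai

# Point the standard OpenAI SDK at an Azure OpenAI resource.
openai.api_type = "azure"
openai.api_base = "https://<your-resource>.openai.azure.com/"
openai.api_version = "2022-12-01"  # placeholder; match your resource
openai.api_key = "<AZURE_OPENAI_KEY>"

response = openai.Completion.create(
    engine="<your-deployment-name>",  # the model deployment you created
    prompt="Summarize this week's AI news in one sentence.",
    max_tokens=60,
)
print(response.choices[0].text)
```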
Giant Chinese drone-carrying AI ship enters service as a research vessel. Link.
China has launched the world's first crewless drone carrier with AI navigation.
The ship has potential for both maritime research and military use.
The ship is equipped with air, sea, and submersible drones.
The vessel uses the world's first AI system for ocean stereo observation.
The drone mothership can be used for data collection, disaster mitigation, environmental monitoring, and military intelligence gathering, among other things.
Atwater Brewery releases AI-made beer. Link.
Atwater Brewery in Detroit is releasing a beer designed by artificial intelligence called "Artificial Intelligence IPA".
The beer is described as having tropical and citrus flavors and aromas from a blend of Centennial, Citra, and Amarillo hops.
The recipe for the IPA was created by AI, but it is brewed by humans.
The quality manager at Atwater, Joe Platt, came up with the idea and believes AI will revolutionize the brewing industry in the future.
Artificial Intelligence IPA will be on tap starting Jan. 19 at Atwater's Detroit, Grosse Pointe, and Grand Rapids locations and will be sold for $2.001 a pint as a nod to the AI character HAL 9000 from 2001: A Space Odyssey.
OpenAI used Kenyan workers on less than $2 per hour to make ChatGPT less toxic. Link.
OpenAI used outsourced Kenyan laborers to help make ChatGPT less toxic by training it to detect harmful content in text.
The workers were sent tens of thousands of snippets of text, including graphic descriptions of violence, hate speech, and sexual abuse, to label for use in training the AI.
The workers were paid a wage of between $1.32 and $2 per hour by OpenAI's outsourcing partner, Sama, a San Francisco-based firm.
Sama workers assigned to OpenAI's project reported being mentally scarred by their work and expressed dissatisfaction with the support provided by the company.
Sama and OpenAI's relationship briefly deepened in February 2022 with Sama beginning pilot work to collect sexual and violent images for OpenAI.
Some of the collected images were illegal under U.S. law and were categorized as "C4" (child sexual abuse), "C3" (bestiality, rape, sexual slavery), and "V3" (graphic violence and injury) under OpenAI's internal labels.
OpenAI paid Sama $787.50 for the images collected, but Sama soon cancelled all its work for OpenAI after realizing the work included illegal content.
The AI industry, including OpenAI, often relies on hidden human labor in the Global South which can be damaging and exploitative.
FBI chief says he’s “deeply concerned” by China’s AI program. Link.
FBI Director Christopher Wray expressed concern about China's AI program at a panel session at the World Economic Forum.
Wray said China's AI ambitions are built on stolen intellectual property and sensitive data.
He fears that China could use AI advancements for hacking, IP theft, and repression.
U.S. officials have voiced similar concerns before, and a 2021 AI commission led by former Google CEO Eric Schmidt advised the U.S. to counter China's AI capabilities.
The Chinese Embassy in Washington did not immediately respond to Wray's comments.
New report by foreign policy experts at Northwestern University and the Brookings Institution outlines recommendations for defending against deepfakes. Link.
A research report "Deepfakes and International Conflict" was published this month by Brookings.
The authors predict that deepfakes are about to be used more widely in military and intelligence operations.
The authors emphasize the need for a code of conduct for government use of deepfakes.
The authors suggest several long-term strategies, including education of the public, systems to track digital assets, slowing down and verifying information, and robust oversight and accountability mechanisms for deepfakes.
Inscribe, an AI-powered document fraud detection service, raises a $25m Series B round led by Threshold Ventures to fight financial fraud with AI. Link.
Inscribe helps finance companies detect fraud, automate processes, and understand creditworthiness to approve more customers faster.
The solution parses, classifies, and data-matches financial onboarding documents, generates customer risk profiles, and provides credit analysis and bank statement automation.
The company has raised a total of $38 million.
Clients include TripActions, Ramp, Bluevine, and Shift.
Claims to be differentiated from competitors by its AI-first approach, which is built using data collected from previous partnerships with customers.
Hawk AI, an anti-money laundering and fraud prevention platform for banks, raises a $17m Series B round led by Sands Capital. Link.
Plans to use the funding for product development and global expansion.
Hawk AI offers a cloud-native, modular AML surveillance system with the "highest level of explainability" in its AI-powered decision-making engine.
Customers include Banco do Brasil Americas, Moss, and North American Bancard.
Hawk AI stresses its focus on addressing the "black box" problem of AI and machine learning algorithms and aims to provide full transparency and explainability in its AI-driven decisions.
GPT Tip of the Week
Prompt: What are the important questions to ask generative AI companies as it relates to the legal concerns in the space?
Response: Here are some important questions to ask generative AI companies regarding legal concerns:
What data sources or data sets are used to train their AI models and how were these obtained, ensuring compliance with data protection laws such as GDPR?
How do they handle intellectual property rights related to the generated content?
What processes are in place to ensure compliance with applicable laws regarding content generated by their AI models, including but not limited to, copyright, trademark and privacy laws?
Are there any audits or certifications in place that can attest to the ethical and legal compliance of the company and its AI models?
How does the company handle any legal claims arising from the use of its AI models, including claims of infringement of intellectual property rights or other legal violations?
Are there any specific industry or governmental regulations the company is subject to and how do they ensure compliance with those regulations?
Does the company have a clear and transparent policy for reviewing, removing or modifying generated content if necessary for legal or ethical reasons?