AWS’ Generative AI Strategy Starts to Take Shape and Looks a Lot Like Microsoft’s

AWS re:Invent was inundated with generative AI announcements.


Next Week in The Sequence:

  • Edge 349: We are almost at the end of our series about fine-tuning, and we are going to discuss the nascent space of reinforcement learning with AI feedback (RLAIF). We review the original RLAIF paper and NVIDIA’s NeMo framework.

  • Edge 350: We review Hugging Face’s Zephyr model, which has quickly become one of the most robust open-source LLMs on the market.

You can/should/must subscribe below:

TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

📝 Editorial: AWS’ Generative AI Strategy Starts to Take Shape and Looks a Lot Like Microsoft’s

The AWS re:Invent conference has long been regarded as the premier event of the year for cloud computing. The 2023 edition, however, was notably dominated by generative AI announcements, shedding light on AWS’s strategy in this area, which had previously been questioned. For years, Amazon was perceived as lagging behind cloud computing rivals Microsoft and Google in generative AI. In fact, in many earnings calls, generative AI has been highlighted as a trend through which Microsoft could surpass AWS as the leading cloud computing platform. re:Invent demonstrated that AWS is determined to be competitive, and while its strategy may not be unique, it appears to be robust.

The re:Invent announcements spanned a broad spectrum. Bedrock has emerged as the cornerstone of AWS’s generative AI strategy, now supporting Anthropic’s Claude 2.1 and open-source models like Meta’s Llama 2. AWS also unveiled smaller, specialized models such as Titan Text Lite, Titan Text Express, and Titan Image Generator, which focus on summarization, text generation, and image generation, respectively. The support for large language models (LLMs) became even more compelling with the release of Titan Multimodal Embeddings, which enables multimodal search capabilities.

An area that caught my attention was the enhanced support for retrieval-augmented generation (RAG) and agents. Bedrock now allows developers to integrate their own data sources to build RAG applications. Additionally, Amazon Q, an agent capable of performing various developer and DevOps operations, supports native integration with AWS services. AWS also introduced capabilities for model evaluation and data sharing, both crucial for generative AI applications. Notably, there was also news on AI chips, with the launch of AWS Graviton4 and AWS Trainium2, the latter optimized for generative AI training workloads.
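The RAG pattern these announcements enable, retrieve the most relevant documents for a query and then ground the model’s answer in them, can be sketched in a few lines. Note that the toy bag-of-words "embedding" and the `retrieve`/`build_prompt` helpers below are illustrative assumptions, not Bedrock APIs: a real application would call a hosted embedding model and pass the assembled prompt to an LLM.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real RAG app would call an
    # embedding model (e.g., one hosted on Bedrock) here instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Ground the model by placing retrieved context before the question.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Bedrock supports Claude 2.1 and Llama models.",
    "Graviton4 is a general-purpose processor.",
    "Titan Image Generator creates images from text.",
]
print(build_prompt("Which models does Bedrock support?", docs))
```

In production the retrieval step would hit a vector store rather than scoring documents in memory, but the shape of the flow, embed, retrieve, assemble the prompt, generate, stays the same.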

In summary, re:Invent showcased AWS’s strength in the generative AI sector. Its strategy seems quite similar to Microsoft’s, except that the latter benefits from broader distribution through Windows and Office. Among the three cloud giants, Google now appears to have the weakest offering, but this could change at the next conference.


🎁 Learn AI skills, win swag!

Join Zilliz (the creators of the Milvus vector database) and 23 other open source projects for the 2023 Advent of Code as we count down to the holidays! Earn points by starring repos and trying new technologies to win an exclusive swag pack. 

Get all the contest details —>


🔎 ML Research

GAIA Benchmark

Researchers from Meta, HuggingFace, GenAI and AutoGPT published GAIA, a benchmark for general AI assistants. The benchmark measures tasks such as reasoning, multi-tasking, multimodality, web browsing and many others —> Read more.

Inflection-2

Inflection unveiled initial training results for Inflection-2, its next-generation LLM. The model performs extremely well on benchmarks ranging from question answering to reasoning —> Read more.

GNoME

Google DeepMind published a paper detailing Graph Networks for Materials Exploration (GNoME), a deep learning model able to discover new materials. Specifically, GNoME discovered 2.2 million new crystals, including 380,000 stable materials —> Read more.

The Power of Prompting

Microsoft Research published a paper demonstrating how generalist models like GPT-4 can perform as well as highly specialized models when given the right prompts. The paper compares GPT-4 against fine-tuned models in the medical domain —> Read more.

LQ-LoRA

Researchers from Carnegie Mellon University, MIT and others published a paper unveiling LQ-LoRA, a method for memory-efficient adaptation of LLMs. LQ-LoRA outperforms other quantization methods such as QLoRA and GPTQ-LoRA on well-established benchmarks —> Read more.

System 2 Attention

Meta AI published a paper detailing System 2 Attention (S2A), a method for improving reasoning in LLMs. Borrowing terminology from cognitive psychology, S2A uses the LLM’s own capabilities to regenerate the context, attending only to its relevant parts —> Read more.

🤖 Cool AI Tech Releases

AWS Gen AI

Amazon unveiled a dozen generative AI releases at its re:Invent conference —> Read more.

PPLX Models

Perplexity introduced two new LLMs that can deliver up-to-date, factual responses —> Read more.

SDXL Turbo

Stability AI announced SDXL Turbo, a super-fast text-to-image model —> Read more.

GPT Crawler

A cool framework that can crawl a website and create a custom OpenAI GPT based on the data —> Read more.

🛠 Real World ML

Content Moderation at LinkedIn

LinkedIn discusses the ML architecture powering its content moderation policies —> Read more.

Data Quality at Airbnb

Airbnb shares details about their ML methodology for scoring and enforcing data quality —> Read more.

RAG at NVIDIA

NVIDIA shared a reference architecture for retrieval-augmented generation (RAG) applications —> Read more.

📡AI Radar
