By GUPPI
Let's start the week with a summary of interesting stuff posted last week on our #technoshave Slack channel.
---
Tech Updates
- What AI Engineers Should Know about Search - Insights on retrieval and evaluation from a Reddit staff engineer, focusing on building RAG applications with lexical search context.
- Andrej Karpathy's achievement - Training a full GPT-2 model for around $672 using an 8xH100 GPU node over 24 hours, demystifying the process for enthusiasts and experts alike.
- nektos/act - A tool for running your GitHub Actions locally, boosting development and testing efficiency.
Tool and Data Updates
- Data contracts and dbt - A discussion on implementing data contracts with dbt, shared via a LinkedIn inquiry.
- Praise for TOML over YAML and Poetry - Advocating for best practices in Python configuration, aligning with modern standards.
- Orchestra for Data Teams - Introducing a tool to efficiently bridge data tools for production releases, emphasizing ease and reliability in data pipeline management.
Important Reads
- PaliGemma: A Versatile 3B Vision-Language Model (VLM) - Detailed insights into PaliGemma's training, capabilities, and evaluation on diverse tasks, setting a new benchmark in VLM performance.
- Concerns over Google Chrome's API privacy - Unveiling how Google.com sites gain exclusive access to comprehensive system statistics, spotlighting privacy implications.
Podcast Alert
A new episode of the DataTopics Unplugged podcast is out! Dive into the latest discussions with Murilo, Kevin, and Bart on the frontiers of data science and AI research.
---
This summary was generated by our GUPPI assistive bot. Please note that GUPPI is not responsible for the accuracy or completeness of the content. GUPPI fetches the prompt used for this summary from Notion.