Scraping simplified

Apify

On a mission to make the web more open and programmable.

Published Apr 30, 2024

Welcome to our April newsletter! In this edition, you'll learn about a range of tools and tips for extracting data from Amazon, X, Hacker News, Yahoo Finance, and more.

Crawlee tutorial: scraping Hacker News

Instead of juggling a myriad of libraries, wouldn't it be easier to use a single library that integrates the functionality of Axios, Cheerio, and Playwright, along with scraping-specific features like proxy rotation, browser fingerprinting, and streamlined pagination?

That's what Crawlee is for, and in the tutorial below, we show you how to use it to build a Hacker News scraper.

Learn to use Crawlee →

Crawlee has surpassed 12,000 stars on GitHub. So, if you like Crawlee, be a star, show your appreciation, and join the Stargazers. Thank you!

Give Crawlee a star on GitHub →

Scrape Amazon with TypeScript, Cheerio, and Crawlee

In this guide, you'll learn how to extract information from Amazon product pages using TypeScript together with the Cheerio and Crawlee libraries. It covers retrieving detailed product data and handling the blocking issues that may arise while scraping.

Follow the tutorial →

How to feed LLMs with data from the web

We've been the Warm-Up Party partner of the WebExpo Conference for a while now, but this year, something special is happening. Our CEO, Jan Curn, will talk at Lucerna Cinema about how you can build a generative AI model with web data.

Get a 15% discount on your conference ticket with the code 'Apify24'!

You can book your ticket here →

What's more, Jan built a custom GPT for the event to answer any questions you might have about the program, conference, speakers, and anything else WebExpo-related.

Use the WebExpo 2024 GPT →

Did you miss Scraping with Apify 101?

If you missed our last webinar, you can watch the recording from the live event on our YouTube channel and see how you can run, build, schedule, and integrate web scraping tools through Apify.

Watch the webinar →

How to scrape X (Twitter) with Twikit

A step-by-step guide to scraping X posts (Tweets) with Twikit, an open-source library dedicated to scraping Twitter data in Python.

Start scraping X posts in Python →

Web scraping with Python Playwright

Learn to use Playwright with Python to navigate through web pages, execute JavaScript, manage asynchronous requests, intercept network communications to extract essential data, and more.

Follow the tutorial here →

Scrape HTML tables with Pandas

Scraping HTML tables with Pandas not only saves a lot of time but also makes your code more reliable, because you select the entire table rather than individual items inside it that may change over time. So, if HTML tables are all you need, try this shortcut.
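To make the whole-table idea concrete, here's a minimal sketch rather than the tutorial's own code. It assumes Pandas is installed along with an HTML parser backend such as lxml. The sample HTML, the ticker values, and the `first_table` helper are made up for illustration; a real scraper would download the page first.

```python
import io

import pandas as pd

# Hypothetical sample page (invented for this sketch).
HTML = """
<html><body>
<table>
  <thead><tr><th>Ticker</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>10.5</td></tr>
    <tr><td>BBB</td><td>20.0</td></tr>
  </tbody>
</table>
</body></html>
"""


def first_table(html: str) -> pd.DataFrame:
    """Parse every <table> in the HTML and return the first one."""
    # read_html returns a list with one DataFrame per table found.
    # Wrapping the string in StringIO avoids the deprecation warning
    # newer Pandas versions raise for literal HTML strings.
    tables = pd.read_html(io.StringIO(html))
    return tables[0]


df = first_table(HTML)
print(df)
```

Because the table is selected as a whole, there are no per-cell selectors to break if the site changes the markup inside individual cells.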

Start scraping HTML tables →

MechanicalSoup: a good tool for web scraping?

Is it worth adding MechanicalSoup to your scraping tools? We demonstrate the features of this library and how it compares with BeautifulSoup and Selenium.

Read about MechanicalSoup here →

Apify Store got a major update

A couple of months back, we told you that we redesigned how our Store looks in Apify Console. Now it's crawled its way out of Console onto the web. Store 3.0 has a couple of new sections featuring the most interesting Actors and the most active developers from our community.

The idea behind the redesign is to give a well-deserved spotlight to community-driven Actors and make sure your contributions are visible.

New Actors in Apify Store

Actor video tutorials

Watch more helpful content on our YouTube channel →

Join our Discord

We now have more than 7,000 developers in our community on Discord. Join and participate in various exclusive events, chat with fellow scraping developers, and touch base with the Apify team.

Join our developer community →
