Scraping simplified
Welcome to our April newsletter! In this edition, you'll learn about a range of tools and tips for extracting data from Amazon, X, Hacker News, Yahoo Finance, and more.
Crawlee tutorial: scraping Hacker News
Instead of juggling a myriad of libraries, wouldn't it be easier to have a single library that integrates the functionality of Axios, Cheerio, and Playwright, along with scraping-specific features like proxy rotation, browser fingerprinting, and streamlined pagination?
That's what Crawlee is for, and in the tutorial below, we show you how to use it to build a Hacker News scraper.
Crawlee has surpassed 12,000 stars on GitHub. So, if you like Crawlee, be a star, show your appreciation, and join the Stargazers. Thank you!
Scrape Amazon with TypeScript, Cheerio, and Crawlee
In this guide, you'll learn how to extract detailed data from Amazon product pages using TypeScript with the Cheerio and Crawlee libraries, and how to handle the blocking issues that may arise during scraping.
How to feed LLMs with data from the web
We've been the Warm-Up Party partner of the WebExpo Conference for a while now, but this year, something special is happening. Our CEO, Jan Curn, will talk at Lucerna Cinema about how you can build a generative AI model with web data.
Get a 15% discount on your conference ticket with the code 'Apify24'!
What's more, Jan built a custom GPT for the event to answer any questions you might have about the program, conference, speakers, and anything else WebExpo-related.
Did you miss Scraping with Apify 101?
If you missed our last webinar, you can watch the recording from the live event on our YouTube channel and see how you can run, build, schedule, and integrate web scraping tools through Apify.
How to scrape X (Twitter) with Twikit
A step-by-step guide to scraping X posts (Tweets) with Twikit, an open-source library dedicated to scraping Twitter data in Python.
Web scraping with Python Playwright
Learn to use Playwright with Python to navigate through web pages, execute JavaScript, manage asynchronous requests, intercept network communications to extract essential data, and more.
Scrape HTML tables with Pandas
Using Pandas for scraping HTML tables not only saves a lot of time but also makes code more reliable because you're selecting the entire table, not individual items inside the table that may change over time. So, if HTML tables are all you need, try this shortcut.
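To give a feel for the shortcut, here is a minimal sketch rather than the code from the linked article. It assumes pandas is installed along with an HTML parser such as lxml. The sample markup, the `extract_tables` helper, and the column names are hypothetical and only stand in for a real downloaded page.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Ticker</th><th>Price</th></tr></thead>
  <tbody>
    <tr><td>AAA</td><td>10.5</td></tr>
    <tr><td>BBB</td><td>20.0</td></tr>
  </tbody>
</table>
"""


def extract_tables(html: str) -> list[pd.DataFrame]:
    """Parse every <table> element in the markup into a DataFrame."""
    # read_html returns one DataFrame per table it finds, so no
    # per-cell selectors are needed. StringIO wraps the literal string
    # because recent pandas versions prefer a file-like object.
    return pd.read_html(StringIO(html))


tables = extract_tables(SAMPLE_HTML)
prices = tables[0]  # the sample page contains a single table
print(prices)
```

Because the whole table is read in one call, a renamed CSS class or a reordered cell is less likely to break the scraper than a selector written for each individual value.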
MechanicalSoup: a good tool for web scraping?
Is it worth adding MechanicalSoup to your scraping tools? We demonstrate the features of this library and how it compares with BeautifulSoup and Selenium.
Apify Store got a major update
A couple of months back, we told you that we redesigned how our Store looks in Apify Console. Now it's crawled its way out of Console onto the web. Store 3.0 has a couple of new sections featuring the most interesting Actors and the most active developers from our community.
The idea behind the redesign is to give a well-deserved spotlight to community-driven Actors and make sure your contributions are visible.
New Actors in Apify Store
Actor video tutorials
Watch more helpful content on our YouTube channel →
Join our Discord
We now have more than 7,000 developers in our community on Discord. Join and participate in various exclusive events, chat with fellow scraping developers, and touch base with the Apify team.