AWS Web Scraping Introduction
Cloud-based web scraping platforms are convenient for “self-service” scraping, provided you have the technical knowledge to build web scrapers and want to try web scraping yourself. Although such platforms offer a friendly user interface, even the simplest scraping task shows that quite a bit of technical knowledge is still required. In this topic, we’ll explore web scraping on the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) platform, running WebHarvy from the cloud.
WebHarvy – A Powerful Web Scraper
WebHarvy is a web scraper that extracts web content (emails, URLs, HTML, and images) from target websites and saves the data in various formats. With WebHarvy you do not need to write any code to scrape data; to extract the required data, you just select it with a mouse click. WebHarvy detects patterns of data automatically; if you need to scrape different items such as a name, price, or email address from a target page, the required configuration is made automatically.
Web Scraping from Cloud
To use WebHarvy, you need Windows. Mac users can run WebHarvy by installing Windows through Boot Camp or by running it in Parallels. If you do not want to run it on your local computer, you can run WebHarvy from the cloud using the AWS Elastic Compute Cloud (EC2) platform, which provides secure compute capacity in the cloud. Amazon EC2 lets you run a remote Windows instance in the cloud and connect to it via Remote Desktop. Note that EC2 incurs charges, although they are minimal, and you can first use a free tier for 12 months. Once you are connected to the Windows instance through Remote Desktop, download and install WebHarvy. Make sure that .NET 3.5 is also installed on the Windows instance, since WebHarvy needs it to run. Once you have installed WebHarvy, you can start extracting data right away:
- Open WebHarvy.
- Navigate to the target page.
- Click on Start Config on the toolbar and select the data items to capture.
- The captured data is shown below in the Captured Data Preview pane.
- Click on Start Mine on the toolbar.
- Once the mining process is finished, click on the Export button.
- Select the desired format and export the extracted data.
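Once the data has been exported, it often needs a light cleanup before use. The following is a minimal Python sketch of that step, assuming the export format chosen was CSV. The column names (`name`, `price`, `email`) and the file name `webharvy_export.csv` are hypothetical; they depend on the data items you selected during configuration and on the name you gave the export.

```python
import csv
import io


def load_rows(csv_file):
    """Read exported rows into dicts, trimming whitespace and skipping empty rows."""
    reader = csv.DictReader(csv_file)
    rows = []
    for raw in reader:
        # Ignore surplus fields (key None) and treat missing values as empty strings.
        row = {
            key.strip(): (value or "").strip()
            for key, value in raw.items()
            if key is not None
        }
        if any(row.values()):
            rows.append(row)
    return rows


def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


# Hypothetical sample standing in for an exported CSV file.
SAMPLE_EXPORT = """name,price,email
Widget A, 19.99 ,sales@example.com
Widget A,19.99,sales@example.com
Widget B,5.00,

"""

rows = drop_duplicates(load_rows(io.StringIO(SAMPLE_EXPORT)))
for row in rows:
    print(row)

# With a real export (assumed file name and encoding), the call would look like:
# with open("webharvy_export.csv", newline="", encoding="utf-8") as f:
#     rows = drop_duplicates(load_rows(f))
```

Only the standard library is used, so the script can run on the same EC2 Windows instance or on your local machine after you copy the exported file over.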