Octoparse is a powerful automated web scraping software with an easy-to-use point-and-click user interface, which enables users to apply different patterns to extract data from different websites with ease.
An Automated Web Scraping Tool 2017
It provides advanced functions such as Smart Mode, Cloud Extraction, and API Access that help users capture data from static or dynamic websites without any programming knowledge. Export formats include CSV, Excel, HTML, and TXT, and extracted data can also be exported to databases such as MySQL, SQL Server, and Oracle.
Octoparse offers three editions to meet your data extraction needs: Free, Standard, and Professional. It is one of the best free web scraping tools on the market. The two paid editions add a cloud platform with multiple cloud servers for web scraping.
Distinct Features of Octoparse
- Visual Workflow Designer: Octoparse provides a simple, user-friendly Visual Workflow Designer for extracting data in bulk. Users configure an extraction rule that tells the program which web pages to crawl and which data fields to collect.
- No coding needed: All you need to do is follow a few simple steps to configure an extraction rule. Octoparse also has a rich set of tutorials on how to extract data.
- Smart Mode: This feature turns web pages into Excel with one click: enter your target URL in the text box and click “SMART”. The program creates the extraction rule automatically, which lowers the barrier to entry for anyone who needs data. Smart Mode works best on list or table pages such as category pages and search results pages. It usually takes less than a minute to get the data for one page.
- Cloud Extraction: Cloud Extraction lets users run data extraction tasks on the cloud platform, which is 4 to 10 times faster than Local Extraction.
If a web page takes around 1 second to load, a single task running on 4 cloud servers can scrape 4*7*24*3600 (about 2.4 million) web pages per week. When 2 extraction tasks run at the same time, 2 cloud servers are assigned to each task, and each task scrapes 2*7*24*3600 (about 1.2 million) pages per week.
- Scrape data from behind a login.
- Scrape data from websites with infinite scroll, such as Twitter or Facebook.
- Scrape a website with pagination.
- XPath Tool and RegEx Tool: These tools help you target exactly the data you want by making it much easier to define an XPath or write a regular expression. You can also modify the XPath in Octoparse to pinpoint the data on the web page.
- Incremental Extraction: This function allows you to extract the updated data without having to configure another rule. Updated data is identified by new URLs that are generated by new pages.
- Ad Blocking: This feature removes annoying ads, including banners and pop-ups, when you scrape a website with Octoparse. To use it, choose the Ad Blocking option during the Basic Information step. Ad blocking shortens loading time and reduces the number of web requests, which boosts extraction speed.
- API Access: Octoparse has APIs available for you to access data. Users can create an API to connect the system to the scraped data in real time. To use Octoparse APIs, users must get the task ID of an extraction task. The easiest way to get the task ID is to right click a task and select “Create an API”.
- Schedule Data Extraction: Octoparse enables users to run an extraction task at a scheduled time. Once the schedule is set, the program automatically runs the task at that time.
- Various Exporting Capabilities: Octoparse provides export formats such as CSV, Excel, HTML, and TXT. It also enables users to export extracted data into databases (MySQL, SQL Server, and Oracle).
- Proxies & IP Rotation: Octoparse lets you scrape websites by rotating anonymous proxy servers to prevent your IP address from being blacklisted. The cloud platform has a large pool of proxy servers, so users don’t have to connect to different proxies manually. Alternatively, you can add a list of external proxy servers yourself and configure them for automatic rotation.
- Support: The website has extensive tutorials for both beginners and experienced users. For technical support, users can reach the support team through Skype, Facebook Messenger, and email.
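The Cloud Extraction estimate in the list above is simple arithmetic, and a short sketch makes it easy to rerun with other numbers. The function name `pages_per_week` and its parameters are made up for illustration; the only inputs taken from the text are the 1-second page load and the 4-server and 2-server cases.

```python
# Back-of-the-envelope throughput for Cloud Extraction.
# Assumption: each server loads pages one after another, around the clock.
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 seconds


def pages_per_week(servers, seconds_per_page=1.0):
    """Pages one task can load in a week when it runs on `servers` servers."""
    return int(servers * SECONDS_PER_WEEK / seconds_per_page)


one_task = pages_per_week(servers=4)         # a single task gets all 4 servers
each_of_two_tasks = pages_per_week(servers=2)  # two tasks get 2 servers each

print(one_task)           # 2419200
print(each_of_two_tasks)  # 1209600
```

Slower pages scale the result down proportionally, so a 2-second load time halves both figures.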
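For readers who have not used XPath or regular expressions before, the sketch below shows the general idea behind the XPath Tool and RegEx Tool item, using only Python's standard library rather than Octoparse itself. The page snippet, class names, and price format are invented for the example: an XPath-style path selects the elements, and a regular expression trims the selected text down to the number.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment (not taken from any real site).
PAGE = """
<html><body>
  <ul id="results">
    <li class="item"><a href="/p/1">Red kettle</a><span class="price">Price: $24.99</span></li>
    <li class="item"><a href="/p/2">Blue mug</a><span class="price">Price: $7.50</span></li>
  </ul>
</body></html>
"""

root = ET.fromstring(PAGE)
PRICE_RE = re.compile(r"\$(\d+\.\d{2})")  # captures the digits after the dollar sign

rows = []
# The path expression picks every <li class="item"> anywhere in the document.
for item in root.findall(".//li[@class='item']"):
    name = item.find("a").text
    price_text = item.find("span[@class='price']").text
    match = PRICE_RE.search(price_text)
    rows.append((name, float(match.group(1)) if match else None))

print(rows)
```

ElementTree supports only a subset of XPath and needs well-formed markup, so this is an illustration of the two techniques, not a description of how Octoparse evaluates them.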
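The Incremental Extraction item says updated data is identified by new URLs. One plausible way to picture that, assuming nothing about Octoparse's internals, is a set of already-seen URLs that each run is compared against. The helper `new_urls` and the example.com addresses are hypothetical.

```python
def new_urls(seen, current):
    """Return URLs from `current` that are not in `seen`, in order, and record them."""
    fresh = []
    for url in current:
        if url not in seen:
            seen.add(url)      # remember it so the next run skips it
            fresh.append(url)
    return fresh


# URLs collected by an earlier run (illustrative values).
seen = {"https://example.com/post/1", "https://example.com/post/2"}

# URLs found on the listing page during the current run.
listing = [
    "https://example.com/post/3",
    "https://example.com/post/2",
    "https://example.com/post/1",
]

to_scrape = new_urls(seen, listing)
print(to_scrape)  # ['https://example.com/post/3']
```

Only the unseen URL is scraped, and a second run over the same listing would return an empty list.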
Cons: For now, Octoparse cannot handle CAPTCHAs. Smart Mode cannot deal with complex websites that require users to log in. Moreover, it lacks fine-grained logging and error-handling facilities.
In conclusion, Octoparse is a feature-rich visual scraping application and worth a try. It can help you get public web data easily and efficiently.