Scraping E-commerce through Brands
September 24, 2020
Data, E-commerce, Tutorial
In the previous blog post, we explained how to retrieve the necessary data the classical way: by going through the categories, then the sub-categories, then the products, and so on. It is the most basic and intuitive way of gathering data with the Web Scraper extension; however, a problem can arise when the website layout changes or products are moved around. When that happens, previously created sitemaps for that website may break or stop working properly.
There is, however, a way to build scrapers for e-commerce websites that is less likely to break when the website changes. Most e-commerce sites have a visible section called “brands”, “A-Z”, or something similar: a section where products are grouped by label (such as brand). This section lists all the products on the website, and the products stay presented in the same layout even after the main, most visible categories have been rearranged.
This section is a huge relief for the scraper: because the layout is the same for every label, in most cases all the products can be retrieved with one simple set of selectors.
On the demonstrated website, the section is easy to find.
We start the scraping process as always - by creating a sitemap and designating the starting point - the “Start URL”.
From here the process is very similar to classical scraping; the only difference is that we select the brand categories instead of the product categories.
Then visit the first brand and create the pagination and the product URL selectors.
And lastly, create the three text selectors that will retrieve the titles, the prices, and the colors of the products.
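The selectors described above can be sketched as a sitemap built in Python and printed as JSON. This is only an illustration: the field names follow the JSON that the Web Scraper extension exports, to the best of our knowledge, so compare them against an export from your own version. The start URL, the sitemap name, and every CSS selector are made-up placeholders, not taken from the demonstrated website.

```python
import json


def build_brand_sitemap(start_url):
    """Return a sitemap-style dict that walks brand -> page -> product."""

    def link(sel_id, parents, css):
        # Link selector: follows every matching link on the parent page.
        return {"id": sel_id, "type": "SelectorLink", "parentSelectors": parents,
                "selector": css, "multiple": True, "delay": 0}

    def text(sel_id, css):
        # Text selector: reads one value from each product page.
        return {"id": sel_id, "type": "SelectorText", "parentSelectors": ["product"],
                "selector": css, "multiple": False, "regex": "", "delay": 0}

    return {
        "_id": "brands-example",  # hypothetical sitemap name
        "startUrl": [start_url],
        "selectors": [
            # All CSS selectors below are placeholders (assumptions).
            link("brand", ["_root"], "ul.brands a"),
            # Pagination lists itself as a parent so later pages are followed too.
            link("pagination", ["brand", "pagination"], "a.next"),
            # Products are collected from the first brand page and from every next page.
            link("product", ["brand", "pagination"], "a.product-link"),
            text("title", "h1"),
            text("price", "span.price"),
            text("color", "span.color"),
        ],
    }


sitemap = build_brand_sitemap("https://example.com/brands")
print(json.dumps(sitemap, indent=2))
```

If the field names match your version of the extension, the printed JSON can be pasted into its sitemap import dialog and adjusted from there.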
And this is what our selector graph looks like. Compared to the one for the classical way of scraping, it is visibly shorter; the process itself does not differ much, there are just fewer steps.
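To make "shorter" concrete, here is a small sketch that measures the longest chain of selectors in a graph. Both selector lists are assumptions written for this comparison: the brand one mirrors the steps in this post, and the classical one assumes a category and a sub-category level as described in the previous post. Only the `id` and `parentSelectors` fields are used.

```python
def graph_depth(selectors):
    """Length of the longest selector chain starting at _root.

    Self-references (such as a pagination selector that is its own parent)
    are skipped so the walk always terminates.
    """
    children = {}
    for sel in selectors:
        for parent in sel["parentSelectors"]:
            children.setdefault(parent, []).append(sel["id"])

    def depth(node, seen):
        best = 0
        for child in children.get(node, []):
            if child not in seen:
                best = max(best, 1 + depth(child, seen | {child}))
        return best

    return depth("_root", {"_root"})


def sel(sel_id, *parents):
    return {"id": sel_id, "parentSelectors": list(parents)}


# Hypothetical graphs, reduced to ids and parents.
brand_graph = [
    sel("brand", "_root"),
    sel("pagination", "brand", "pagination"),
    sel("product", "brand", "pagination"),
    sel("title", "product"), sel("price", "product"), sel("color", "product"),
]
classical_graph = [
    sel("category", "_root"),
    sel("subcategory", "category"),
    sel("pagination", "subcategory", "pagination"),
    sel("product", "subcategory", "pagination"),
    sel("title", "product"), sel("price", "product"), sel("color", "product"),
]

brand_depth = graph_depth(brand_graph)
classical_depth = graph_depth(classical_graph)
print(brand_depth, classical_depth)
```

With these assumed graphs the brand version is one level shorter, because the category and sub-category levels collapse into a single brand level.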
This approach may not be suitable when it is necessary to know which subcategories the products come from. It is also very important to check whether there are any pagination limits. Bigger e-commerce websites, such as AliExpress, limit pagination so that only a restricted number of products is shown, even though they are known to list thousands of products. If this kind of problem arises, scraping all the products becomes more complicated. For websites with pagination limits, it is usually necessary to work out a filtering strategy and build the scrapers around those filters. But, as mentioned previously, it really depends on each individual e-commerce website and the way it displays its products.
Scraping through brands is more of a tip than a separate method of scraping. It does not guarantee that a sitemap will never break; however, scraping through the label page makes it significantly more likely that the scraper stays unaffected when the products in the main categories are rearranged.
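One possible filtering strategy, sketched under assumptions, is to split a listing by price range until every range fits under the pagination cap, and then create one start URL per range. Whether a site offers a price filter, how its URL parameters look, and how to read the product count for a filter are all site-specific; `count_products` below is a hypothetical stand-in for that lookup, and the data is fake.

```python
def split_price_ranges(count_products, low, high, cap):
    """Split [low, high) into price ranges that each hold at most `cap` products.

    `count_products(low, high)` is a hypothetical callback that reports how
    many products the site lists in that price range. A range of width 1 is
    returned as-is even if it is still over the cap.
    """
    if count_products(low, high) <= cap or high - low <= 1:
        return [(low, high)]
    mid = (low + high) // 2
    return (split_price_ranges(count_products, low, mid, cap)
            + split_price_ranges(count_products, mid, high, cap))


# Fake catalogue: 1000 products, one at each price from 0 to 999.
prices = list(range(1000))


def fake_count(low, high):
    return sum(1 for p in prices if low <= p < high)


# Assume the site shows at most 300 products per listing.
ranges = split_price_ranges(fake_count, 0, 1000, cap=300)
print(ranges)
```

Each resulting range would then become its own filtered start URL in the sitemap, so that no single listing runs into the limit.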