Web scraping and other automated data acquisition methods rose to prominence through the application of dynamic pricing. Put simply, pricing data is collected from competitors (and, sometimes, other sources), and prices are matched accordingly through mathematical modeling. Of course, the modeling might be as simple as “bring it lower”.
Dynamic pricing exploded onto the scene because it’s so easy to understand and so effective. Unfortunately, that resulted in dynamic pricing overshadowing all other web scraping business cases, of which there are many.
Understanding web scraping
Web scraping, if we set aside all the technical details, is a simple process. An automated program acquires a set of URLs, whether generated automatically or supplied through human input. It then goes through the URLs, extracts the data contained in each page, and moves on to the next one.
As the script moves through the URLs, the collected data is stored locally in memory. The script then searches the collected pages for specific information; sometimes it might accept user input for keywords or other options.

In the end, the extracted data is exported into some format, with CSV and JSON being popular options. If manual analysis is required, the data might also need to be parsed along the way to make it understandable for humans. Dynamic pricing applications, for example, implement fully (or at least largely) automated data management to reduce the “downtime” between price changes.
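The loop described above can be sketched in a few lines of Python. Everything here is illustrative: the URLs, sample pages, and helper names are invented, and `fetch_page` stubs out what would be a real HTTP request in production.

```python
import csv
import json
import io
import re

# Hypothetical pages standing in for live HTTP responses,
# so the sketch runs without network access.
PAGES = {
    "https://example.com/a": "<html><title>Product A - $19.99</title></html>",
    "https://example.com/b": "<html><title>Product B - $24.50</title></html>",
}

def fetch_page(url):
    # In a real scraper this would be an HTTP request (e.g. via urllib).
    return PAGES[url]

def extract_title(html):
    # Crude extraction for the sketch; real scrapers use an HTML parser.
    match = re.search(r"<title>(.*?)</title>", html, re.S)
    return match.group(1) if match else ""

def scrape(urls, keyword=None):
    # Walk the URL list, extract data, optionally filter by a keyword.
    records = []
    for url in urls:
        title = extract_title(fetch_page(url))
        if keyword is None or keyword.lower() in title.lower():
            records.append({"url": url, "title": title})
    return records

def export_json(records):
    return json.dumps(records, indent=2)

def export_csv(records):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["url", "title"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

records = scrape(list(PAGES), keyword="product")
print(export_json(records))
```

A real pipeline would add retries, proxy rotation, and proper HTML parsing, but the overall shape (acquire URLs, extract, filter, export) stays the same.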
If it were truly that simple, however, there wouldn’t be automated data collection service providers like us. While on the face of it there’s nothing complicated about the process, scraping at scale requires an enormous infrastructure and various supporting tools, such as proxies.
Technical expertise is the lifeblood of any scraping project’s long-term survival. From our personal experience, even high-tech businesses struggle with in-house implementations of web scraping due to the complexities involved in the process. In fact, that was the catalyst for Oxylabs to start building the Scraper APIs we have now.
Data use cases
If it were just dynamic pricing, our efforts would have mostly gone to waste. While it’s a powerful and popular application of data extraction, it’s far from the only one. Data can be used in many creative ways, and some companies might benefit from several of them at once.
A fairly simple example, though one that cannot be fully automated, is market research. Web scraping can be used over the entire course of a product’s or service’s development. Enumerating all the competitors is a great start, as in-depth data can then be acquired about their offerings.
Additionally, the entire internet can be searched for comments, reviews, and feedback left on forums and websites that can reveal other opportunities. Products and services can have common pain points that can easily be solved before entering a market.
Another use case revolves around much deeper analysis. Venture capital and financial service companies were among the first to start eyeing web scraping. Ever since the landmark paper “Twitter mood predicts the stock market” was published, investment moguls have rushed to take advantage of what is now called “alternative data”.
It got its name in contrast to traditional sources such as company financial statements and statistical reports. Alternative data comprises sources such as the aforementioned tweets, search trends, or even such unusual things as satellite imagery.
Investment companies and venture capitalists look for secondary signals in alternative data. Doing so, however, takes a lot of effort and creativity. For example, changes in the vacancy of retail store parking lots might indicate changes in business health. The question is how closely related the two factors actually are; a slight miscalculation can lead someone down a negative ROI road.
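To make the “how closely related” question concrete, here is a toy Pearson correlation check in Python. The vacancy and revenue figures are entirely made up for illustration; real alternative-data work involves far larger samples and careful controls.

```python
import math

# Hypothetical quarterly observations: parking-lot vacancy rate (%)
# derived from satellite imagery, and reported revenue ($M).
vacancy = [12.0, 18.0, 25.0, 31.0, 40.0]
revenue = [210.0, 198.0, 185.0, 170.0, 152.0]

def pearson(xs, ys):
    # Pearson correlation coefficient: covariance over
    # the product of standard deviations.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

r = pearson(vacancy, revenue)
print(f"correlation: {r:.2f}")
```

A correlation near -1 here only means the invented numbers move together; in practice, establishing that the signal is predictive rather than coincidental is exactly where the effort and creativity go.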
A glimpse into objective data
Much has been claimed about how important data is nowadays. I think these claims are understatements, at best. Free (or nearly free) access to data opens up completely new and previously unavailable opportunities.
The current trend towards customer engagement in marketing is a great example of how external data revolutionizes the process. Most of the time, getting input from customers requires sending out feedback forms or attempting to gather metrics such as Net Promoter Scores.
As any statistician would eagerly point out, data collected in such a manner doesn’t perfectly reflect the real situation of the business. Feedback forms are usually filled out by loyal or highly engaged customers. While their opinion is valuable, you’d likely also want to hear from the average customer and even your detractors.
A highly nitpicky statistician would add that the results would be heavily skewed, because the audience is composed only of those who already consider the products and services valuable (or really annoying, in some cases). Finally, there’s a psychological side to it – customers know someone will be reading those forms, so their responses are not fully objective.
Web scraping can support these feedback forms by providing a glimpse into the fully objective side of the business. When someone posts a complaint on a forum that is completely unrelated to the business, they likely don’t expect that a representative will find it. Additionally, these offhand comments might unveil things that would never appear in feedback forms or NPS.
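As a rough sketch of how such offhand comments might be mined once scraped, here is a toy keyword-based classifier in Python. The comments and word lists are invented for illustration; real sentiment analysis would use a proper NLP model rather than keyword matching.

```python
# Hypothetical scraped forum comments; in practice these would come
# from a scraping pipeline, not a hard-coded list.
comments = [
    "The checkout on their site kept failing for me, really frustrating.",
    "Honestly their delivery was faster than I expected, great service.",
    "Meh, the product is fine but support never answered my email.",
]

# Tiny invented word lists; a real system would use a trained model.
NEGATIVE = {"failing", "frustrating", "never", "broken"}
POSITIVE = {"great", "faster", "love", "excellent"}

def classify(comment):
    # Count sentiment keywords after stripping basic punctuation.
    words = {w.strip(".,!?").lower() for w in comment.split()}
    neg = len(words & NEGATIVE)
    pos = len(words & POSITIVE)
    if neg > pos:
        return "negative"
    if pos > neg:
        return "positive"
    return "mixed"

results = {c: classify(c) for c in comments}
for comment, label in results.items():
    print(f"[{label}] {comment[:50]}")
```

Even this crude pass surfaces the kind of unprompted pain points (a failing checkout, unanswered support email) that rarely make it into a feedback form.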
Data is not the future. The future is data. Opportunities opened up by large scale automated public data acquisition are in their infancy. In fact, I would go so far as to say that these processes will become so ubiquitous and so important that they will become the primary battleground for business competition.