After more than half a billion of its member profiles appeared on the black market, LinkedIn has issued a statement assuring its users that any parties that aggregate LinkedIn user data will be stopped and held accountable. However, Karolis Toleikis, CEO at IPRoyal—a residential proxy provider—suggests that there has to be a distinction made between malicious actors who use web scraping to perpetrate such incidents and genuine companies that employ the technology to make more informed business decisions.
On April 8, the personal information of 500 million LinkedIn users was reported to have been listed for sale on a hacker site. In response, LinkedIn denied any security flaws on its end, stating that the information had been opportunistically aggregated via web scraping. Furthermore, the company reasserted its commitment to blocking all third parties from harvesting LinkedIn member data without its users’ consent.
While this instance involves an anonymous actor attempting to profit by selling the aggregated data on the dark web, LinkedIn has in the past also taken legal action against public companies that offer analytic products built from publicly available LinkedIn user information. Probably the best-known example is the ongoing legal battle between LinkedIn and HiQ, a data analytics startup, which began with LinkedIn’s attempts to stop HiQ from aggregating its public user information and resulted in a U.S. court allowing the startup to resume its scraping operations.
“When unfortunate incidents like the one that happened on April 8 take place, the practice of web-scraping often gets conflated with various forms of malicious activity like hacking, fraud, and information theft. However, that does not represent the true potential of this technology. In reality, it is web-scraping applications utilized by genuine businesses that really drive the rapid growth of the web scraping industry,” said Mr. Toleikis.
He went on to explain how companies can benefit from integrating data scraping into their business model. “Collecting and analyzing large amounts of data is a process designed to extract insights about a particular industry or uncover underlying patterns of consumer behavior which our customers typically employ to make better business decisions. Data collected via data scraping can facilitate processes such as web content creation, business intelligence, finding sales leads, conducting marketing or advertising research, and developing personalization.”
That said, the LinkedIn incident is not an isolated one. In April 2021 alone, the data of over 500 million Facebook and Clubhouse users was listed for sale. While scraping publicly available data remains legal in most countries, in some cases such information is used to perpetrate serious criminal offenses. According to Toleikis, it remains largely up to residential proxy providers to ensure that web scraping is used in a legal manner.
“Residential proxy providers should always make sure that they are working with genuine companies and individuals only. Therefore, using an advanced user identity verification process is essential. This way, if the proxy provider suspects that their client has plans to engage in something unlawful, such as targeted phishing attacks, spamming campaigns, or online fraud, they can be immediately held accountable for their actions,” said Mr. Toleikis.
As with any technology that has a wide range of applications, web scraping, if used properly, can assist genuine companies with business intelligence, brand protection, and product personalization.