Web scraping


Advice from Eva

Are you a fan of web scraping, do you suspect your site has been scraped, or do you have no idea what it is? Stay right there!


💥 Lastminute.com was ordered to pay €50,000 in damages for using data from Ryanair's website to feed its own offers, because Ryanair's terms and conditions expressly restricted third-party use of its database.


🔎 Scraping means automatically extracting data from a website in order to reuse it elsewhere. It is used in particular to build or enrich a database.
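To make this concrete, here is a minimal sketch of what a scraper does, written in Python with only the standard library. It assumes a hypothetical page where prices sit in elements with a `price` class; the HTML string and the `extract_prices` helper are invented for illustration, and a real scraper would first download the page and deal with much messier markup.

```python
from html.parser import HTMLParser


class PriceExtractor(HTMLParser):
    """Collects the text of every element whose class attribute contains 'price'."""

    def __init__(self):
        super().__init__()
        self.prices = []
        self._in_price = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "price" in classes:
            self._in_price = True

    def handle_endtag(self, tag):
        # Any closing tag ends the price element (good enough for this flat sample).
        self._in_price = False

    def handle_data(self, data):
        if self._in_price and data.strip():
            self.prices.append(data.strip())


def extract_prices(html):
    """Hypothetical helper: return the price strings found in an HTML document."""
    parser = PriceExtractor()
    parser.feed(html)
    parser.close()
    return parser.prices


# Hypothetical page content, standing in for a downloaded web page.
sample = """
<ul>
  <li>Dublin <span class="price">€49</span></li>
  <li>Madrid <span class="price">€72</span></li>
</ul>
"""
print(extract_prices(sample))  # ['€49', '€72']
```

Run this over thousands of pages and you have the raw material for a database, which is exactly what the legal questions below are about.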


⚖️ But is it really legal?


Yes and no. Well, it depends, as is usual in law.


✔️ In principle, you can scrape any site that does not require a login.


❌ For sites that require a login, scraping is governed by the site's terms of use. And as you can imagine, most sites prohibit reusing their data for commercial purposes.


For example, LinkedIn's T&Cs state: "You agree not to develop, support, or use any software, (...) to web scrape the Services or otherwise copy profiles and other data from the Services."


💡 Has your site been scraped? Here are the legal grounds you can invoke:


🌀 The sui generis database right: this intellectual property right protects you, as the producer of a database, if you can demonstrate that building up its content required a substantial investment (financial, material or human).


And that proof is rarely handed to you on a plate!


🌀 A GDPR violation: inevitably, if the scraped data includes personal data (personal phone numbers, email addresses, etc.).


🌀 Unfair competition: if you prove that the scraper is effortlessly poaching your customers by using your data


🌀 Or parasitism: if you prove that the scraper is lining their pockets with your work without paying a single penny, the big rat 🐀


🌀 Your T&Cs: quite simply, you can hold the scraper contractually liable.


💡 How can you prevent scraping on your site?


➡️ Require the creation of a user account

➡️ Block suspicious IP addresses

➡️ Use captchas
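On the technical side, a "suspicious" IP address is often one that sends far more requests than a human visitor would. The Python sketch below shows one possible approach, not a standard: the hypothetical `SuspiciousIPBlocker` blocks an IP once it exceeds `max_requests` within a sliding window of `window_seconds`. The thresholds are arbitrary assumptions, and in practice this kind of rule usually lives in a reverse proxy or firewall, with blocks that expire and a CAPTCHA for borderline cases.

```python
import time
from collections import defaultdict, deque


class SuspiciousIPBlocker:
    """Hypothetical helper: blocks an IP once it exceeds max_requests within window_seconds."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock                  # injectable so the logic can be tested
        self.history = defaultdict(deque)   # ip -> timestamps of recent requests
        self.blocked = set()                # IPs blocked for good (no expiry in this sketch)

    def allow(self, ip):
        """Return True if the request may go through, False if the IP is (now) blocked."""
        if ip in self.blocked:
            return False
        now = self.clock()
        stamps = self.history[ip]
        # Drop timestamps that have left the sliding window.
        while stamps and now - stamps[0] > self.window_seconds:
            stamps.popleft()
        stamps.append(now)
        if len(stamps) > self.max_requests:
            self.blocked.add(ip)
            return False
        return True


# Demo with a frozen clock: five requests at the same instant from one IP.
blocker = SuspiciousIPBlocker(max_requests=3, window_seconds=1.0, clock=lambda: 0.0)
print([blocker.allow("203.0.113.7") for _ in range(5)])  # [True, True, True, False, False]
```

In a web app, you would call `allow()` with the client's IP at the start of each request and refuse the request (for example with HTTP 429 or 403) when it returns `False`.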


💡 My best tips for running your own scraping campaigns:


➡️ Read the TOS (even if it's boring).

➡️ Respect robots.txt files: they tell visiting bots which parts of the site they may or may not access.

➡️ Choose the right time: heavy scraping can cause technical problems on the target site, which is not cool for its users. So avoid peak hours!
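If you write your own scraper in Python, the standard library's `urllib.robotparser` can check robots.txt rules for you. The sketch below assumes a hypothetical bot name (`MyScraperBot`) and parses an example robots.txt from a string; on a real site you would call `set_url()` and `read()` to fetch the actual file. It skips disallowed URLs and pauses between requests, using the site's `Crawl-delay` when one is declared and an assumed 2-second default otherwise. Picking off-peak hours is left to however you schedule the job.

```python
import time
from urllib import robotparser

USER_AGENT = "MyScraperBot"  # hypothetical bot name


def build_rules(robots_txt):
    """Parse robots.txt content given as a string."""
    rules = robotparser.RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def polite_urls(urls, rules, default_delay=2.0, sleep=time.sleep):
    """Yield only the URLs robots.txt allows, pausing between consecutive fetches."""
    delay = rules.crawl_delay(USER_AGENT)  # None if the site declares no Crawl-delay
    if delay is None:
        delay = default_delay
    first = True
    for url in urls:
        if not rules.can_fetch(USER_AGENT, url):
            continue  # disallowed by robots.txt: skip it
        if not first:
            sleep(delay)  # space out requests so we don't hammer the server
        first = False
        yield url


# Example robots.txt (hypothetical content).
example_robots = """\
User-agent: *
Disallow: /account/
Crawl-delay: 1
"""
rules = build_rules(example_robots)
urls = [
    "https://example.com/flights",
    "https://example.com/account/settings",
    "https://example.com/hotels",
]
for url in polite_urls(urls, rules):
    print("fetching", url)  # the actual download would go here
```

With this example file, the `/account/` page is skipped and the two remaining pages are fetched one second apart.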


🍀 Otherwise, just ask for permission. It works more often than you think. And don't forget to return the favor when you can! 🌟


Better Call Eva
