Is scraping data part of your game plan?

Averill Dickson | Aug. 27, 2015
Watch out: the legal rules in this area will only get firmer and tighter.

It was my colleague’s birthday last Monday. Her closest friends knew. The rest of us in the office knew, once she bought us all free coffees. And Google.

She wasn’t surprised that Google knew, but found it unnerving that Google chose to let her know that it knew by displaying a personalised version of its logo with birthday-like icons and a "Happy Birthday Sandra" label. Did she provide that information to Google at some point? Probably. Did Google scrape it from another online cache? No idea. How would she know?

As advisors to technology companies, we are occasionally asked about the legal risks associated with harvesting data via scraping and similar means. If this is a major part of your game plan, here are the starters you should be thinking about.

Consider the sites you are extracting information from – and look at their website terms of use. If the terms of use specifically prohibit accessing the site by any means other than a designated browser, or prohibit using the information for your commercial purposes, then this should be a red flag. In a legal world that is still getting to grips with whether scraping should or shouldn’t be allowed, this is an easy mechanism for the website owner to demonstrate that you are in the wrong.

Consider the information you are extracting – are you taking information that is personally sensitive (such as personal contact details) or commercially sensitive (such as brand names)? The legal rules around taking personal information, and around using another’s brand for one’s own commercial purposes, are both well developed and highly protectionist. Are you taking images or compilations of information (e.g. product catalogues) that are likely to be seen as proprietary, either because they are highly original or because they would have been labour-intensive to collate?
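As a rough illustration of the first question, harvested records can be screened for fields that look like personal contact details before anyone decides to keep them. The sketch below is only that: the function name, the field names and the two patterns are hypothetical, the patterns are deliberately simple and will both miss and over-match real data, and a flagged field is a prompt for review, not a legal test.

```python
import re

# Hypothetical, deliberately simple patterns (assumptions for illustration).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")


def flag_personal_fields(record):
    """Return the names of fields whose text looks like contact details."""
    flagged = []
    for name, value in record.items():
        text = str(value)
        if EMAIL_RE.search(text) or PHONE_RE.search(text):
            flagged.append(name)
    return flagged


# Made-up record standing in for one harvested listing.
record = {
    "product": "Desk lamp, brushed steel",
    "seller_contact": "jane@example.com",
    "support_line": "+65 6123 4567",
}
print(flag_personal_fields(record))
```

Fields flagged this way are the ones to check against the personal information rules before the data goes anywhere near a product.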

Although the law is always a step behind technological developments, the sentiment of the legal cases to date in a number of jurisdictions is that the website owner should be able to prevent scrapers from harvesting information without authorisation.

Typically, harvested data includes product descriptions and pictures reproduced from other sites. As soon as you reproduce another person’s text or images, you raise legal issues of potential copyright infringement. These risks are lessened if (a) the images and text are not reproduced in whole, (b) the text is not reproduced verbatim but, as your school teacher would say, restated in your own words (taking care, however, not to mislead or misstate any aspect of the goods or services), or (c) the images are unoriginal, do not reproduce trade marks, are sourced from a different place to the product description, or otherwise are less likely to be the subject of copyright held by the same owner as the other extracted information.
