Friday, June 07, 2019

Copyright and Third Party Data #analytics

Can you use material on the Internet for personal use just because it is there? For example, if someone posts a data file on a web page (e.g., on GitHub), is it OK for me to use it without asking permission? Can I use it to explain a data analysis concept in class? Can I share it with my students? Can I use it in an assignment, or an exam?

This past year I have used several data sets that I found by searching the World Wide Web. I always acknowledge the source in my lecture notes when I can. One data set I used in a class last year was the Body Dimensions data (Heinz et al., 2003)*. Try an exact-match search for the file name "bdims.csv", and you will see several locations where this file can be downloaded. The journal in which this data set was published does not hold copyright over the material it publishes, and advises researchers to seek permission directly from the authors. So I emailed the lead author (Professor Grete Heinz) to ask permission to use her data set, and she kindly responded that I could. Should I have to do this for every data set that I use? Will I get a response from everyone?

Sometimes permission is granted on the web page where the data is made available. For example, Fingal County Council here in Dublin have an Open Data policy and advise users of the data as follows:

Citizens are free to access and use this data as they wish, free of charge, in accordance with the Creative Commons Attribution 4.0 International License (CC-BY).

Another great example is from our Central Statistics Office:

CSO Publications: Rights and Permissions

  • Statistics disseminated on this site are copyright of The Government of Ireland.
  • The statistics and other information provided on this site are accessible free of charge and licensed under Creative Commons Attribution (version 4.0 cc-by).
  • Reproduction is authorised subject to acknowledgement of the source.

This heightened awareness is great, and many web sites now publish statements such as these so that users and researchers can be reassured that they are not stealing data. The grey area is when no such statement is given, even though free use may be intended. Either way, from now on I intend to have my students acknowledge all sources of their data and, where available, cite the permissions to use it. This is good practice, though it does add an extra step for students to complete.


*Heinz, G., Peterson, L. J., Johnson, R. W., and Kerk, C. J. (2003). Exploring Relationships in Body Dimensions. Journal of Statistics Education, 11(2).
