When is textual data open data?

Do you have practical guidelines that clarify when language data can be considered open data? How and when does the PSI Directive apply?


  • There is a distinction between "open data" and documents that fall within the scope of the rules on the re-use of public sector information (PSI).
    Documents within the scope of the PSI rules are those held by public sector bodies (i.e. documents whose re-use those bodies can authorise). The PSI Directive defines public sector bodies as:
    - the State, regional or local authorities and
    - bodies governed by public law (established for the purpose of general interest, having legal personality and financed for the most part by the State, regional or local authorities or other bodies governed by public law).
    Such documents should be made available for re-use, which means that individuals should be granted permission to re-use them.
    Some public sector bodies (such as public broadcasters or universities) are excluded from the scope of these rules, as are many types of documents. For more detailed information, consult the national implementation of the PSI Directive.

    In many countries, Open Data Portals have been created to increase the re-usability of public sector information. The datasets made available via those portals are usually re-usable under very permissive conditions (i.e. under waivers such as CC0, or under licenses such as CC BY 4.0 or national Open Government Licenses).

    According to the Open Definition (https://opendefinition.org), Open Data are data that anyone can access and re-use, subject at most to the conditions of attribution (BY) and share-alike (SA). Data published on Open Data Portals under such permissive conditions are therefore Open Data. However, many documents fall within the scope of the PSI rules but are not (yet) Open Data. Conversely, many categories of Open Data are not Public Sector Information (e.g. academic papers available under a CC BY 4.0 license).
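
    The two tests described above (is a document within the scope of the PSI rules, and is it Open Data?) are independent of each other. The following is a minimal Python sketch of that distinction, not a legal test. The Dataset class, its field names and the condition codes other than BY and SA are hypothetical and were introduced only for illustration; whether a given body or document is actually excluded has to be checked against the national implementation of the PSI Directive.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Conditions the Open Definition tolerates, as summarised in the text above.
ALLOWED_CONDITIONS = frozenset({"BY", "SA"})


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record describing a dataset; all field names are assumptions."""
    name: str
    held_by_public_sector_body: bool
    excluded_from_psi: bool  # e.g. held by a public broadcaster or a university
    # Re-use conditions attached to the data; None means no re-use
    # permission has been granted (yet).
    conditions: Optional[FrozenSet[str]]
    accessible_to_anyone: bool = True


def in_psi_scope(ds: Dataset) -> bool:
    """Held by a public sector body and not covered by an exclusion."""
    return ds.held_by_public_sector_body and not ds.excluded_from_psi


def is_open_data(ds: Dataset) -> bool:
    """Accessible to anyone, with re-use subject at most to BY and/or SA."""
    if ds.conditions is None:
        return False
    return ds.accessible_to_anyone and ds.conditions <= ALLOWED_CONDITIONS


# CC0 imposes no conditions, CC BY 4.0 imposes attribution only.
portal_dataset = Dataset("portal dataset (CC0)", True, False, frozenset())
academic_paper = Dataset("academic paper (CC BY 4.0)", False, False, frozenset({"BY"}))
unlicensed_report = Dataset("ministry report, no licence yet", True, False, None)
noncommercial_corpus = Dataset("corpus (CC BY-NC)", False, False, frozenset({"BY", "NC"}))
```

    In this sketch, portal_dataset is both within the PSI scope and Open Data, academic_paper is Open Data but not PSI, unlicensed_report is PSI but not (yet) Open Data, and noncommercial_corpus is neither, because a non-commercial restriction goes beyond BY and SA.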