Open Data Node

About Open Data Node

Open Data Node (ODN for short) is an open source software platform that provides users with tools and functions for effective long-term publishing of (linked) open data in an automated, repeatable and easy-to-use fashion. It supports the whole life cycle of data publication, management and exchange. ODN is accompanied by a generic Methodology for Open Data publishing and is equipped with a set of APIs that allow third parties to build applications on top of it.

ODN is the main output of the COMSODE FP7 project.

Features

  • It provides powerful ETL capabilities, both for Linked Data and for tabular/relational data, which allow publishers to convert, clean, enrich and link data before publishing it as Open Data
  • To help data users actually understand and use the data, it also provides data publication and presentation functions
  • To further help data publishers with the entire Open Data publication process (as described, for example, in the COMSODE Methodology; see here), it also provides cataloguing functionality
  • Data publishers will also benefit from the capability to integrate with internal systems, the modular design and the Open Source nature of ODN
  • Simple installation

Simple installation

On Debian systems, after you set up the COMSODE repository, you can simply run:

aptitude install odn-simple

and you have ODN installed and running.

For details, follow the installation instructions.

ETL capabilities

In short: ODN pumps data from the inside (internal systems) to the outside. To do that properly in the context of Open Data:

  • ODN supports both Linked Data (a new paradigm for data publication and usage) and tabular/relational data (the currently prevailing technology)
  • It can create repeatable publication jobs, which can convert formats, clean and enrich the data, and even link the data to other data
  • Publishers can schedule such jobs to automate the publication of updates and keep datasets up to date without repeated manual labour
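
The idea of a repeatable publication job can be illustrated with a minimal Python sketch. It is not UnifiedViews code and does not use any ODN API: the function names, the sample columns and the example.org URIs are assumptions made up for illustration only.

```python
import csv
import io

# Hypothetical raw export from an internal system (illustration only).
RAW_CSV = """id,name,city
1, Alice ,bratislava
2,Bob,
3, Carol ,Kosice
"""

# Hypothetical lookup used to link city names to external identifiers.
CITY_URIS = {
    "bratislava": "http://example.org/city/bratislava",
    "kosice": "http://example.org/city/kosice",
}


def extract(text):
    """Parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Trim whitespace and drop rows that have no city."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["city"]:
            cleaned.append(row)
    return cleaned


def link(rows):
    """Enrich each row with a URI for its city, if one is known."""
    return [dict(row, city_uri=CITY_URIS.get(row["city"].lower())) for row in rows]


def run_job(text):
    """One repeatable job: the same input always yields the same output."""
    return link(clean(extract(text)))


if __name__ == "__main__":
    for record in run_job(RAW_CSV):
        print(record)
```

Because `run_job` has no hidden state, a scheduler could call it again whenever the source data changes. In ODN itself, scheduling is handled by the platform rather than by code like this.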

Figure: ODN Scheme

A very important aspect is caching of the data: Open Data intended for publication is stored inside ODN. As a result, internal systems are insulated from possible overload or attacks arriving via Open Data publishing. Even in the rare cases when ODN goes down, internal systems remain operational and the organization publishing the data can still function.

ETL capabilities are supplied by the powerful UnifiedViews tool. For more information, visit the page dedicated to UnifiedViews.

Publication and presentation functions

ODN supports both basic methods of publishing data (as described in the post Understanding Data Accessibility):

  • batch access: ODN can prepare file dumps in various formats (CSV, XML, RDF, etc.), possibly compress them (ZIP) and make them available (via HTTP).
  • access through API: ODN can publish tabular/relational data via a REST API and Linked Data via a SPARQL endpoint.
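
Both access methods can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not ODN's implementation: the function names, the `dataset.csv` member name and the example.org endpoint are hypothetical. The only external convention relied on is that the SPARQL Protocol accepts a query in the `query` parameter of a GET request.

```python
import csv
import io
import zipfile
from urllib.parse import urlencode


def build_zip_dump(rows, fieldnames, member_name="dataset.csv"):
    """Serialize rows to CSV and return the bytes of a ZIP archive holding it."""
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(member_name, csv_buffer.getvalue())
    return zip_buffer.getvalue()


def build_sparql_url(endpoint, query):
    """Build a SPARQL Protocol GET URL; the endpoint is supplied by the caller."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    dump = build_zip_dump([{"id": "1", "name": "Alice"}], ["id", "name"])
    print(len(dump), "bytes in the ZIP dump")
    # Hypothetical endpoint address, for illustration only.
    print(build_sparql_url("http://example.org/sparql",
                           "SELECT * WHERE { ?s ?p ?o } LIMIT 5"))
```

The dump would then be served as a static file over HTTP, while the URL shows what a data user's client might request from a SPARQL endpoint.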

Thanks to incorporating CKAN and other tools, ODN also provides data users with functions to preview, analyse and visualize data. As of now, that works mainly for tabular data. Later on, we will extend it to Linked Data as well, using the visualisation tool LDVMi.

Cataloguing functionality

This functionality is provided by including a customized CKAN catalogue in ODN in two roles:

  1. CKAN in the role of “internal catalogue” is the main entry point for data publishers into ODN. This catalogue is private, visible only to the data publisher and its authorized personnel. From this catalogue, publishers manage many aspects of their Open Data publication. Once a dataset is properly prepared for publication, it can be marked as “public” and ODN will automatically make such public data visible to the general public in …
  2. … CKAN in the role of “public catalogue”. This public catalogue is the main entry point for the general public (a.k.a. data users). In this catalogue, they see only datasets explicitly marked as “public”, and they use it to search for datasets, learn about them, and look at and obtain the data.
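
The relationship between the two catalogue roles can be sketched as a simple visibility filter. The sketch below is a simplification under assumptions: the sample records and function names are hypothetical, the `private` flag is modelled on the field of the same name in CKAN dataset records, and how ODN actually synchronizes the two catalogues is not described here.

```python
# Hypothetical records in the internal catalogue (illustration only).
INTERNAL_CATALOGUE = [
    {"name": "budget-2015", "title": "City budget 2015", "private": False},
    {"name": "contracts-draft", "title": "Contracts (draft)", "private": True},
    {"name": "public-transport", "title": "Public transport stops", "private": False},
]


def public_catalogue(datasets):
    """Return only datasets explicitly marked as public.

    A missing flag is treated as private, so nothing leaks by accident.
    """
    return [d for d in datasets if d.get("private", True) is False]


def search(datasets, keyword):
    """Case-insensitive keyword search over dataset titles."""
    keyword = keyword.lower()
    return [d for d in datasets if keyword in d["title"].lower()]


if __name__ == "__main__":
    visible = public_catalogue(INTERNAL_CATALOGUE)
    print([d["name"] for d in visible])
    print([d["name"] for d in search(visible, "transport")])
```

Treating a missing flag as private is a deliberate design choice in this sketch: a dataset becomes visible only after the publisher has explicitly marked it as public.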

Integration functions, modular design, Open Source implementation

For the basic use cases, the main focus is on the ability to integrate with various kinds of data sources: various formats (XLS, XML, CSV, etc.) and technologies (SQL, JDBC, SPARQL, etc.), accessed via the file system or remotely (HTTP, etc.), are supported “out of the box”.
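
One way to picture integration with different kinds of sources is a set of small adapters that turn each source into the same stream of records. The Python sketch below does this for a CSV source and an SQL source; the adapter names and the sample `stops` table are hypothetical, and an in-memory SQLite database stands in for an internal system.

```python
import csv
import io
import sqlite3


def rows_from_csv(text):
    """Yield each CSV record as a dictionary."""
    yield from csv.DictReader(io.StringIO(text))


def rows_from_sql(connection, query):
    """Yield each SQL result row as a dictionary keyed by column name."""
    cursor = connection.execute(query)
    columns = [column[0] for column in cursor.description]
    for values in cursor:
        yield dict(zip(columns, values))


if __name__ == "__main__":
    # In-memory database standing in for an internal system (illustration only).
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE stops (id INTEGER, name TEXT)")
    connection.execute("INSERT INTO stops VALUES (1, 'Main Square')")

    combined = list(rows_from_csv("id,name\n2,Harbour\n"))
    combined += list(rows_from_sql(connection, "SELECT id, name FROM stops"))
    print(combined)
```

Once every source yields dictionaries, later processing steps do not need to know where a record came from.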

In broader terms, thanks to the Open Source implementation of ODN, which also takes open standards into account, ODN can be enhanced with additional modules, incorporated into bigger information systems, integrated with the existing infrastructure used by data publishers, etc. It can even be modified.

For example, ODN’s Single Sign-On (SSO): thanks to midPoint, CAS and LDAP, it can be integrated with the user management, authentication and authorization systems that organizations may already be using.

Support for correct publishing

In order to publish Open Data properly, i.e. using open formats, in machine-readable form and in a timely manner, Open Data Node does the following:

  1. it extracts (harvests) data from internal systems using any available interface and method, doing so safely, effectively and at low cost,
  2. it processes that data, performing format conversions, cleansing, anonymization, enrichment, linking, etc. (and, as part of that, also compiles some metadata about the data),
  3. it stores the results (both data and metadata), effectively serving as a cache that protects internal systems from overload in case of high demand for data from users,
  4. it makes the results available to the general public and businesses, supporting both common users (with the usual office tools on PCs or other devices) and application developers (equipped with powerful software development tools and above-average hardware), and it also implements automated and efficient distribution of updated data increments and dataset replication (including metadata),
  5. it allows all this to be automated, easy to use and easy to maintain.
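
The distribution of updated data increments mentioned in step 4 can be illustrated with a small Python sketch that compares two snapshots of a dataset. It is only an assumption about what an increment could look like: the function names, the `id` key and the added/changed/removed structure are hypothetical and do not describe ODN's actual format.

```python
def compute_increment(previous, current, key="id"):
    """Compare two snapshots and return added, changed and removed records."""
    old = {record[key]: record for record in previous}
    new = {record[key]: record for record in current}
    return {
        "added": [new[k] for k in new if k not in old],
        "changed": [new[k] for k in new if k in old and new[k] != old[k]],
        "removed": [old[k] for k in old if k not in new],
    }


def apply_increment(previous, increment, key="id"):
    """Rebuild the current snapshot from the previous one plus an increment."""
    records = {record[key]: record for record in previous}
    for record in increment["removed"]:
        records.pop(record[key], None)
    for record in increment["added"] + increment["changed"]:
        records[record[key]] = record
    return sorted(records.values(), key=lambda record: record[key])


if __name__ == "__main__":
    version_1 = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
    version_2 = [{"id": 2, "name": "B2"}, {"id": 3, "name": "C"}]
    increment = compute_increment(version_1, version_2)
    print(increment)
    print(apply_increment(version_1, increment) == version_2)
```

A data user who already holds the previous snapshot only needs to download the increment, which is typically much smaller than the full dataset.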

In short: Open Data Node helps publishers of Open Data deal with the complexity of source data and continuously delivers easy-to-use, high-quality Open Data to users.

Stable release

ODN version 1.0, the first stable release, was published in April 2015. This means that we now provide further upgrades in a way that does not disrupt your operations, i.e. they are backward compatible or, where that is not feasible, come with an easy migration to the new release.

Future

While ODN 1.0 provides a lot of basic functionality (and can already help in building applications; see Building an application on Open Data with Spinque), there is still work to do to make ODN even better. Using feedback from various pilots in the EU and from other users around the world, we will further refine ODN.

 

For more information about all past releases and the future roadmap, please see the ODN documentation:

https://utopia.sk/wiki/display/ODN/Roadmap+and+releases